What are the core technologies of big data
1. Data collection and preprocessing: Flume NG is a real-time log collection system that supports custom data senders within the logging system for gathering data; ZooKeeper is an open-source, distributed coordination service for distributed applications that provides data synchronization services (see the ZooKeeper sketch after this list).
2. Data storage: Hadoop is an open-source framework designed for offline, large-scale data analysis, and HDFS, its core storage engine, is widely used for data storage. HBase is a distributed, column-oriented, open-source NoSQL database that can be viewed as a wrapper around HDFS for data storage (see the HBase sketch after this list).
3. Data cleansing: MapReduce, Hadoop's computing engine, is used for parallel computation over large-scale data sets, which suits cleansing work such as filtering malformed records and deduplicating (see the Hadoop Streaming sketch after this list).
4. Data query and analysis: Hive's core job is to translate SQL statements into MapReduce programs; it maps structured data onto database tables and provides HQL (Hive SQL) query functionality. Spark keeps distributed datasets in memory, so in addition to interactive querying it can also optimize iterative workloads (see the Spark SQL sketch after this list).
5. Data visualization: connect the analysis results to BI platforms so they can be presented visually in support of guided decision-making (see the plotting sketch after this list).
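Below are small, self-contained sketches for the steps above. First, step 1's coordination service: a minimal sketch of ZooKeeper-based data synchronization using the kazoo Python client; the ensemble address and znode paths are placeholders, not part of the original text.

```python
from kazoo.client import KazooClient

# Connect to a ZooKeeper ensemble (placeholder address; point at your cluster).
zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

# Publish a piece of shared configuration so distributed collectors
# (e.g. Flume NG agents) all see the same synchronized value.
zk.ensure_path("/config/collectors")
zk.create("/config/collectors/batch_size", b"1000")

# Any node in the cluster can now read the same value.
value, stat = zk.get("/config/collectors/batch_size")
print(value.decode(), stat.version)

zk.stop()
```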
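For step 2, a minimal sketch of writing to and reading from HBase with the happybase Python client, assuming a running HBase Thrift server and an existing table web_logs with column family cf (the host, table, and family names are hypothetical).

```python
import happybase

# Connect through the HBase Thrift gateway (placeholder host).
connection = happybase.Connection("127.0.0.1")
table = connection.table("web_logs")  # hypothetical table with family 'cf'

# HBase stores cells as (row key, family:qualifier) -> bytes; this
# column-oriented key-value layout sits on top of HDFS files.
table.put(b"row-20240101", {b"cf:url": b"/index.html", b"cf:status": b"200"})

# Point lookup by row key.
row = table.row(b"row-20240101")
print(row[b"cf:url"].decode())

connection.close()
```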
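For step 3, a minimal cleansing job written for Hadoop Streaming, which lets plain Python scripts serve as the mapper and reducer of a MapReduce job; the record format (three tab-separated fields, with the first field as the key) is an assumption for illustration.

```python
#!/usr/bin/env python3
# mapper.py -- emit "key<TAB>record" for well-formed lines only,
# silently dropping malformed records (the cleansing step).
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 3:                 # keep only complete records
        print(f"{fields[0]}\t{line.rstrip()}")
```

```python
#!/usr/bin/env python3
# reducer.py -- keep one record per key (deduplication).
import sys

previous_key = None
for line in sys.stdin:
    key, _, record = line.rstrip("\n").partition("\t")
    if key != previous_key:              # shuffle sorts input by key
        print(record)
        previous_key = key
```

These would be submitted with the streaming jar that ships with Hadoop, along the lines of `hadoop jar hadoop-streaming.jar -input /raw -output /clean -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py` (paths are placeholders).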
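For step 4, a minimal PySpark sketch showing both halves of that paragraph: an HQL-style query against a Hive-registered table, and in-memory caching, which is what speeds up iterative workloads. The table page_views is hypothetical.

```python
from pyspark.sql import SparkSession

# enableHiveSupport() lets Spark read tables registered in the Hive metastore.
spark = (SparkSession.builder
         .appName("query-analysis")
         .enableHiveSupport()
         .getOrCreate())

# An HQL-style query; Hive itself would translate this into MapReduce
# programs, while Spark runs it on its own in-memory engine.
views = spark.sql("SELECT url, count(*) AS hits FROM page_views GROUP BY url")

# cache() keeps the distributed dataset in memory, so repeated
# interactive or iterative queries avoid re-reading from disk.
views.cache()
views.orderBy("hits", ascending=False).show(10)

spark.stop()
```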
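Finally, for step 5, a hand-rolled stand-in for a BI dashboard: a minimal pandas/matplotlib sketch that charts aggregated analysis output. The input file hits_by_url.csv and its columns are assumptions, not something named in the original text.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Aggregated output exported from the query/analysis layer
# (hypothetical file with columns: url, hits).
df = pd.read_csv("hits_by_url.csv")

# A simple bar chart of the top pages -- the kind of view a BI
# platform would render for decision makers.
top = df.nlargest(10, "hits")
top.plot.bar(x="url", y="hits", legend=False)
plt.ylabel("hits")
plt.tight_layout()
plt.savefig("top_pages.png")
```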