What are the big data collection platforms?
1. First, the platform collects data according to requirements.
2. The platform stores the collected data.
3. It then analyzes and processes the data.
4. Finally, it presents the data visually, including reports and monitoring dashboards.
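As a rough illustration, here is a minimal Java sketch of those four stages; the interfaces and names are hypothetical and do not correspond to any particular platform's API.

```java
import java.util.List;
import java.util.Map;

// Toy sketch of the four stages above: collect -> store -> analyze -> visualize.
public class PipelineSketch {
    interface Collector { List<String> collect(); }                          // 1. gather raw records
    interface Store     { void save(List<String> records); }                 // 2. persist them
    interface Analyzer  { Map<String, Long> analyze(List<String> records); } // 3. process the data
    interface Dashboard { void render(Map<String, Long> metrics); }          // 4. report / monitor

    static void run(Collector c, Store s, Analyzer a, Dashboard d) {
        List<String> records = c.collect();
        s.save(records);
        d.render(a.analyze(records));
    }
}
```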
An excellent big data platform should perform well across big data analysis, big data programming, data warehousing, practical use cases, artificial intelligence, data mining, and so on.
Here are several mainstream, well-regarded big data collection platforms:
1. Apache Flume
Apache Flume is an open-source data collection system that is highly reliable, highly extensible, easy to manage, and supports custom extensions. It is a distributed, reliable, and available system that runs in a Java runtime environment and is used to efficiently collect, aggregate, and move large volumes of log data from many different sources into centralized data storage.
The main functions are as follows:
1. Log collection: Flume lets you customize data senders in the logging system to gather data.
2. Data processing: Flume can perform simple processing on data and write it to a variety of (customizable) data receivers. It can collect data from sources such as the console, RPC (Thrift-RPC), text files, tail (UNIX tail), syslog (the syslog logging system, supporting both TCP and UDP modes), and exec (command execution).
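To make the collection side concrete, here is a minimal sketch of sending an event to a Flume agent with Flume's Java RPC client; the hostname and port are assumptions and must match an Avro source configured on the agent.

```java
import java.nio.charset.StandardCharsets;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class FlumeClientSketch {
    public static void main(String[] args) throws EventDeliveryException {
        // Connect to a Flume agent whose Avro source listens on localhost:41414 (assumed).
        RpcClient client = RpcClientFactory.getDefaultInstance("localhost", 41414);
        try {
            // Wrap one log line in a Flume event and deliver it to the agent.
            Event event = EventBuilder.withBody("sample log line", StandardCharsets.UTF_8);
            client.append(event);
        } finally {
            client.close();
        }
    }
}
```

The agent then routes the event through its channel to whatever sink (HDFS, another agent, and so on) its configuration names.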
2. Fluentd
Fluentd is an open-source data collector for the unified logging layer. Fluentd lets you unify data collection and consumption so you can make better use of, and better understand, your data. Fluentd is a member project of the Cloud Native Computing Foundation (CNCF) and is released under the Apache 2.0 License. Fluentd is highly extensible, and users can write custom (Ruby) input/buffer/output plugins.
The main functions are as follows:
1. Input: responsible for receiving data or actively fetching it. Supports syslog, HTTP, file tail, and more.
2. Buffer: responsible for the performance and reliability of data collection; different buffer types, such as file or memory, can be configured.
3. Output: responsible for sending data to destinations such as files, AWS S3, or another Fluentd instance.
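As an example of the input side, the sketch below sends a structured event to a local Fluentd instance, assuming the fluent-logger-java library and Fluentd's default forward port 24224; the tag names are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

import org.fluentd.logger.FluentLogger;

public class FluentdSketch {
    // Logger pointed at a local Fluentd instance on the default port 24224 (assumed).
    private static final FluentLogger LOG = FluentLogger.getLogger("app", "localhost", 24224);

    public static void main(String[] args) {
        Map<String, Object> record = new HashMap<>();
        record.put("level", "info");
        record.put("message", "user logged in");
        // Emits an event tagged "app.auth"; Fluentd routes it by tag to the configured output.
        LOG.log("auth", record);
        LOG.close();
    }
}
```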
3. Apache Chukwa
Chukwa can collect all kinds of data into files suitable for Hadoop processing and store them in HDFS, so Hadoop can run various MapReduce jobs on them. Chukwa also provides many built-in functions to help collect and organize data, for example:
1. Monitoring changes to the log files of application nodes in real time, writing the incremental file contents into HDFS, and deduplicating and sorting the data at the same time.
2. Monitoring data arriving from sockets and periodically executing specified commands to capture their output.
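To illustrate the first of those ideas (this is not Chukwa's actual classes or adaptors), here is a minimal Java sketch that tails a local log file and appends only the newly written bytes to a file in HDFS; the paths and polling interval are assumptions.

```java
import java.io.RandomAccessFile;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TailToHdfsSketch {
    public static void main(String[] args) throws Exception {
        String localLog = "/var/log/app.log";                            // hypothetical local file
        Path hdfsSink = new Path("hdfs://namenode:8020/logs/app.log");   // hypothetical HDFS path

        FileSystem fs = FileSystem.get(hdfsSink.toUri(), new Configuration());
        long offset = 0;

        try (FSDataOutputStream out = fs.create(hdfsSink, true)) {
            byte[] buf = new byte[4096];
            while (true) {
                // Read only the bytes appended since the last pass, mirroring the
                // incremental "tail" behaviour described above.
                try (RandomAccessFile in = new RandomAccessFile(localLog, "r")) {
                    in.seek(offset);
                    int n;
                    while ((n = in.read(buf)) > 0) {
                        out.write(buf, 0, n);
                        offset += n;
                    }
                }
                out.hflush();         // make the written data visible to HDFS readers
                Thread.sleep(5000);   // poll interval: an arbitrary choice
            }
        }
    }
}
```

A real collector would also deduplicate and sort the data, as Chukwa does, typically in a downstream MapReduce step.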
There are many other excellent platforms as well. Developers can learn about them from their official documentation and choose a platform based on the characteristics and needs of their project.