What is big data and what are the typical cases of big data?

"Big data" refers to data sets so large and so varied that they cannot be captured, managed, and processed with traditional database tools. First, the volume of data is large: a single data set typically runs to 10 TB or more, and in practice many enterprise users combine multiple data sets, reaching petabyte scale. Second, the variety of data is high: data comes from many sources, and its types and formats are increasingly rich, breaking through the previously defined category of structured data to include semi-structured and unstructured data. Third, the processing speed (velocity) is fast: even huge volumes of data can be processed in real time. The last characteristic is the high veracity of the data. As interest grows in new data sources such as social data, enterprise content, and transaction and application data, the limitations of traditional data sources are being broken, and enterprises increasingly need effective means to ensure the data's authenticity and security.

Data collection: ETL tools are responsible for extracting data from distributed, heterogeneous data sources, such as relational databases and flat data files, into a temporary staging layer, where the data is cleaned, transformed, and integrated, and finally loaded into a data warehouse or data mart, forming the basis for online analytical processing and data mining.
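The extract-transform-load pipeline described above can be sketched in a few lines. This is a minimal, illustrative example, not a real ETL tool: the field names, the CSV source, and the in-memory "warehouse" list are all assumptions made for the sketch.

```python
# Minimal ETL sketch: extract rows from a flat (CSV) source, clean and
# normalize them, then load them into a target list standing in for a
# data warehouse table. All names and data here are illustrative.

import csv
import io

def extract(source_text):
    """Extract: read raw records from a flat CSV data source."""
    return list(csv.DictReader(io.StringIO(source_text)))

def transform(rows):
    """Transform: clean values and normalize types before loading."""
    cleaned = []
    for row in rows:
        name = row["name"].strip().title()   # normalize whitespace and case
        amount = float(row["amount"])        # cast text to a numeric type
        if amount >= 0:                      # drop invalid records
            cleaned.append({"name": name, "amount": amount})
    return cleaned

def load(rows, warehouse):
    """Load: append the cleaned rows into the target store."""
    warehouse.extend(rows)
    return warehouse

raw = "name,amount\n alice ,10.5\nBOB,-3\ncarol,7\n"
warehouse = []
load(transform(extract(raw)), warehouse)
print(warehouse)   # two cleaned rows; the negative-amount record is dropped
```

In a production pipeline each stage would talk to real systems (databases, message queues, a warehouse such as Hive), but the three-stage structure stays the same.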

Data access: relational databases (SQL), NoSQL databases, etc.

Infrastructure: cloud storage, distributed file storage, etc.

Data processing: NLP (Natural Language Processing) is the field that studies language problems in human-computer interaction. The key to natural language processing is enabling the computer to "understand" natural language, so the field is also referred to as NLU (natural language understanding) or computational linguistics. It is both a branch of language information processing and one of the core topics of artificial intelligence (AI).
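A full NLP system is far beyond a short example, but one basic building block, tokenizing text and counting word frequencies, can be shown in a toy sketch. The text and the regular expression here are illustrative assumptions, not a description of any particular NLP toolkit.

```python
# Toy illustration of a basic NLP preprocessing step: split text into
# lowercase word tokens, then count their frequencies.

import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

text = "Big data needs NLP, and NLP needs big data."
tokens = tokenize(text)
freq = Counter(tokens)
print(freq.most_common(3))   # the most frequent tokens with their counts
```

Frequency counts like these underlie many higher-level tasks (indexing, keyword extraction, simple text classification), even though "understanding" requires much more.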

Statistical analysis: hypothesis testing, significance testing, difference analysis, correlation analysis, t-tests, analysis of variance, chi-square analysis, partial correlation analysis, distance analysis, regression analysis (simple regression, multiple regression, stepwise regression, regression prediction and residual analysis, ridge regression, logistic regression, curve estimation), factor analysis, cluster analysis, principal component analysis, and fast clustering methods.
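As a concrete taste of one technique from this list, here is simple (one-variable) regression fitted by ordinary least squares in pure Python. The data points are invented for the example; real analyses would use a statistics package.

```python
# Simple linear regression by ordinary least squares:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).

def simple_regression(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# The sample data satisfy y = 2x + 1 exactly, so OLS recovers those values.
slope, intercept = simple_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)   # 2.0 1.0
```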

Data mining: classification, estimation, prediction, affinity grouping or association rules, clustering, description and visualization, and mining of complex data types (text, Web, graphics and images, video, audio, etc.).
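Clustering, one of the mining tasks listed above, can be sketched with a minimal k-means loop on one-dimensional points. Real toolkits handle high-dimensional data and smarter initialization; this sketch, with invented data, only illustrates the assign-then-update iteration.

```python
# Minimal k-means on 1-D points (pure Python).

def kmeans_1d(points, centers, iterations=10):
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Two obvious groups around 1.0 and 10.0; the centers converge to their means.
centers = kmeans_1d([1.0, 1.2, 0.8, 10.0, 10.4, 9.6], [0.0, 5.0])
print(sorted(centers))   # [1.0, 10.0]
```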

Model prediction: prediction model, machine learning, modeling and simulation.
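A prediction model can be as simple as nearest-neighbor lookup: predict a value for a new input by copying the label of the closest training example. The training pairs below are invented for illustration; this is a toy stand-in for real machine-learning models.

```python
# Toy prediction model: 1-nearest-neighbor regression in pure Python.

def predict_1nn(train, x):
    """train: list of (input, label) pairs; return the label of the
    training input closest to x."""
    nearest_input, nearest_label = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_label

train = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]
print(predict_1nn(train, 2.2))   # 20.0 (2.0 is the nearest training input)
```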

Presentation of results: cloud computing, tag clouds, relationship diagrams, etc.

To understand the concept of big data, start with "big", which refers to the size of the data. Big data generally refers to data volumes above 10 TB (1 TB = 1,024 GB). Big data differs from the earlier notion of massive data, and its basic characteristics can be summarized by four V's (Volume, Variety, Value, and Velocity): large volume, great variety, low value density, and high processing speed.

First, the data volume is huge, jumping from the TB level to the PB level.

Second, there are many types of data, such as web logs, videos, pictures, geographic information, and so on.

Third, the value density is low. Take video as an example: during continuous monitoring, the potentially useful data may amount to only one or two seconds.

Fourth, the processing speed is fast, following the so-called "one-second rule": useful information should be extracted from the data within seconds. This last point is fundamentally different from traditional data mining technology. The Internet of Things, cloud computing, the mobile Internet, connected cars, mobile phones, tablets, PCs, and the various sensors spread around the world are all sources or carriers of data.

Big data technology refers to the technology of quickly obtaining valuable information from all kinds of massive data. The core of solving big data problems is big data technology. At present, "big data" refers not only to the scale of the data itself, but also to the tools, platforms, and data analysis systems used to collect and process it. The purpose of big data research and development is to advance big data technology, apply it to related fields, and drive breakthroughs by solving enormous data processing problems. So the challenge of the big data era lies not only in how to process huge volumes of data, but also in how to extract valuable information from them quickly.