What are big data and big data technology?

Stop ignoring big data. Hard work matters, but it is just as important, if not more so, to follow the trends of the times and choose a good direction.

At present, the big data jobs offered by enterprises fall into the following categories, based on the work involved:

① Primary analysis, including business data analysts.
② Mining algorithms, including data mining engineers, machine learning engineers, deep learning engineers, algorithm engineers, AI engineers, data scientists, etc.
③ Development and operation, including big data development engineers, big data architecture engineers, big data operation and maintenance engineers, data visualization engineers, data acquisition engineers, database administrators, etc.
④ Product operation, including data operation managers, data product managers, data project managers and big data sales staff.

Big data is itself an abstract concept. Broadly speaking, it refers to a collection of data that conventional software tools cannot capture, store, manage and process within an acceptable time.

At present, the industry has no unified definition of big data, but it is generally agreed that big data has four characteristics: volume, velocity (speed), variety (diversity) and value. These are known as the "4Vs" for short: a huge amount of data, fast data generation and processing, diverse data types and low value density, as shown in Figure 1. Each characteristic is briefly described below.

1) Volume: Big data involves a huge amount of data.

The scale of data collections keeps expanding, from GB to TB to PB. In recent years, data volumes have even begun to be measured in EB and ZB.
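
To make the jump from GB to ZB concrete, the following Python sketch converts between the units named above. It assumes decimal (SI) units, where each step is a factor of 1,000; the text does not say whether decimal or binary units are meant. The helper names `to_bytes` and `humanize` are made up for this illustration.

```python
# Decimal (SI) storage units; each step up is a factor of 1000.
# Assumption: the article does not specify decimal vs. binary units.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def to_bytes(value, unit):
    """Convert a quantity in the given unit to bytes."""
    return value * 1000 ** UNITS.index(unit)


def humanize(num_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


# How many gigabytes fit in one zettabyte?
gb_per_zb = to_bytes(1, "ZB") // to_bytes(1, "GB")
print(gb_per_zb)
print(humanize(to_bytes(1500, "TB")))
```

Under these assumptions, one ZB is a trillion GB, which is why data at this scale no longer fits the tools built for gigabyte-sized data sets.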

For example, video surveillance in a medium-sized city can generate tens of TB of data a day. Baidu's homepage navigation needs to serve more than 1-5PB of data every day; printed out, this data would fill more than 500 billion A4 sheets. Figure 2 shows the amount of data generated by the Internet every minute.

2) Velocity (speed): Big data is generated, processed and analyzed at an ever-increasing pace.

The acceleration stems from the real-time nature of data creation and from the need to integrate streaming data into business and decision-making processes. Data must be processed quickly, and the processing model has begun to shift from batch processing to stream processing.
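
The difference between the two processing models can be sketched in a few lines of Python. This is only an illustration with made-up readings, not a description of any particular big data framework: the batch function waits for the complete data set, while the streaming function updates its result as each record arrives.

```python
from typing import Iterable, Iterator, List


def batch_mean(readings: List[float]) -> float:
    """Batch style: wait for the full data set, then compute once."""
    return sum(readings) / len(readings)


def streaming_mean(readings: Iterable[float]) -> Iterator[float]:
    """Stream style: emit an updated mean after every incoming record."""
    count = 0
    mean = 0.0
    for x in readings:
        count += 1
        mean += (x - mean) / count  # incremental update, constant memory
        yield mean


# Hypothetical sensor readings, used only for illustration.
readings = [10.0, 20.0, 30.0, 40.0]
batch_result = batch_mean(readings)
stream_results = list(streaming_mean(readings))
print(batch_result)
print(stream_results)
```

Both approaches agree on the final value, but the streaming version has a usable answer after every record, which is the property real-time decision-making depends on.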

The industry has a name for the required processing speed, the "1-second law", which means that high-value information can be obtained quickly from all kinds of data. This fast processing ability is the essential difference between big data and traditional data processing technology.

3) Variety (diversity): Big data comes in many data types.

The data types generated and processed by the traditional IT industry are relatively simple, and most of them are structured data. With the emergence of new channels and technologies such as sensors, smart devices, social networks, Internet of Things, mobile computing and online advertising, countless types of data have been generated.

Nowadays, data is no longer only structured. It also includes semi-structured and unstructured data, such as XML, email, blogs, instant messages, video, photos, clickstreams and log files. Enterprises need to integrate, store and analyze data from complex traditional and non-traditional information sources, both internal and external.
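
As a rough illustration of how these categories differ in practice, the Python sketch below reads one small sample of each kind using only the standard library. The sample records, field names and log format are invented for this example.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: fixed columns, as in a relational table or CSV export.
csv_text = "user_id,amount\n42,19.99\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing tags, but no fixed table schema.
xml_text = "<order><user id='42'/><item sku='A1' qty='2'/></order>"
order = ET.fromstring(xml_text)
qty = int(order.find("item").get("qty"))

# Unstructured: free text; fields have to be extracted with patterns.
log_line = "2024-01-01 12:00:00 ERROR payment failed for user 42"
match = re.search(r"(\w+) payment failed for user (\d+)", log_line)
level, user_id = match.group(1), match.group(2)

print(rows[0]["amount"], qty, level, user_id)
```

Each source needs a different parser before the same user ID can be compared across them, which is the integration burden the paragraph above describes.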

4) Value: Big data has a low value density.

As the amount of data grows, the value density per unit of data decreases, while the overall value of the data increases. Take surveillance video as an example. In an hour-long video, the useful footage may last only one or two seconds, yet it can be very important. Many experts now liken big data to gold and oil, meaning that big data holds enormous commercial value.
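
The surveillance-video example can be put into numbers with a short Python calculation. The function name `value_density` is an assumption made for this sketch, and the two-second figure is the upper end of the range given above.

```python
def value_density(useful_seconds: float, total_seconds: float) -> float:
    """Fraction of the footage that carries useful information."""
    return useful_seconds / total_seconds


hour = 60 * 60  # one hour of video, in seconds
density = value_density(2, hour)  # two useful seconds in one hour
percent = round(density * 100, 3)
print(f"{percent}% of the footage is useful")
```

Even in the better case, well under a tenth of one percent of the footage is useful, so almost all of the processing effort goes into finding the small part that matters.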

By processing big data, we can uncover its latent commercial value and turn it into substantial profits.