
What are the differences between big data and traditional industries?

Speaking of data analysis: with the development of big data in recent years, data has come to be seen as the key technology and core engine in the integration of the physical and information worlds. Industry after industry is rushing into the era of big data. The boundary between traditional industries and the Internet industry has begun to blur, with the two crossing over, complementing and penetrating each other. Traditional manufacturing is no longer a simple produce-and-resell model; it is increasingly about listening to the voice of the market. Whatever the market needs, the consumer-facing end responds with more diversified and personalized offerings.

At present, the main differences between the two are as follows:

First: Structured data and unstructured data

Traditional industries deal mostly with structured data, that is, row-based data that is stored in databases and can be expressed logically as a two-dimensional table. For example, the ERP systems of manufacturing enterprises run on databases such as Oracle and SQL Server. The Internet industry deals mostly with unstructured data, which cannot be described in two dimensions: office documents of every format, plain text, pictures, XML, HTML, assorted reports, images and audio-visual material, and so on. Typical applications include medical imaging systems, educational video on demand, video surveillance, land GIS, design institutes, file servers (PDM/FTP) and media asset management.
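
To make the distinction concrete, here is a minimal Python sketch (the table, columns and sample records are hypothetical, not taken from any real ERP system): the structured production order fits a fixed two-dimensional schema and lives comfortably in a relational table, while the free-text note and the image bytes do not, and can only be kept as opaque blobs with a little metadata.

```python
import sqlite3

# Structured data: every record fits a fixed two-dimensional schema (rows x columns),
# so it can live in a relational database such as Oracle, SQL Server or SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE production_orders "
             "(order_id TEXT, product TEXT, quantity INTEGER, due_date TEXT)")
conn.execute("INSERT INTO production_orders VALUES (?, ?, ?, ?)",
             ("PO-001", "gearbox", 500, "2019-11-30"))

# Unstructured data: a free-text maintenance note or an image has no fixed columns.
# It cannot be meaningfully expressed as a row in a two-dimensional table;
# at best the raw content is stored as a blob alongside a little metadata.
maintenance_note = "Line 3 vibration increased after the bearing swap; operator suspects misalignment."
photo_bytes = b"\x89PNG..."  # placeholder for real image bytes

conn.execute("CREATE TABLE raw_documents (doc_id TEXT, kind TEXT, content BLOB)")
conn.execute("INSERT INTO raw_documents VALUES (?, ?, ?)", ("DOC-001", "text", maintenance_note))
conn.execute("INSERT INTO raw_documents VALUES (?, ?, ?)", ("DOC-002", "image", photo_bytes))

for row in conn.execute("SELECT order_id, product, quantity FROM production_orders"):
    print(row)  # ('PO-001', 'gearbox', 500)
```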

Second: The amount of data

The Internet industry, represented by companies such as Baidu, Tencent and Alibaba, generates massive data at every moment because of the nature of its business, and that data is often at the PB level. How big is 1 PB? It is 2 to the 50th power bytes. If that figure means little on its own, consider that the Records of the Grand Historian (Shiji) contains about 520,000 Chinese characters, so 1 PB can hold at least a billion copies of it. By contrast, a traditional manufacturing plant produces less than 100 GB of data in three months. That is a big difference.
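
The arithmetic behind that comparison can be checked in a few lines; the sketch below assumes two bytes per Chinese character, which is an encoding assumption added for the example rather than a figure from the text.

```python
# 1 PB = 2**50 bytes; the Records of the Grand Historian has roughly 520,000 characters.
PB = 2 ** 50                      # bytes in one petabyte
chars_per_copy = 520_000          # approximate character count of the Shiji
bytes_per_char = 2                # assumption: a two-byte encoding such as GBK/UTF-16
bytes_per_copy = chars_per_copy * bytes_per_char

copies_per_pb = PB // bytes_per_copy
print(copies_per_pb)              # about 1.08 billion copies

# A factory producing under 100 GB per quarter is several orders of magnitude smaller.
factory_quarter = 100 * 2 ** 30   # 100 GB in bytes
print(PB // factory_quarter)      # roughly 10,000 such quarters fit in a single petabyte
```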

Third: The way of looking at data and the purpose of data analysis

The Internet industry analyzes and mines this massive data. Whether it is historical or real-time, data is no longer static and stale; any record left forgotten on a server may be reused to uncover the relationships between people, behavior and phenomena. For example, every "Double Eleven" puts the "hand-chopping" bargain hunters in front of a painful choice: there are so many discounted goods, so what should I buy? In the end I accidentally max out my credit card, buy a pile of things I don't need, and tearfully live on Master Kong instant noodles for half a year...

Every day Google receives more than 3 billion search queries from around the world. After years of accumulating this data, Google established links between search keywords such as "cough" and "fever" and areas where flu was spreading, and in 2009 it successfully predicted the spread of winter flu in the United States, down to individual regions and states. Traditional industries do not care much about past data. They generally take stock at the end of the month and produce some financial analysis reports; historical data is kept in a backup store and is only searched when a problem arises.

Fourth: The efficiency and security of data access

The Internet industry often stores users' personal and behavioral information and requires absolute safety and accuracy of that data. Take 12306, the Chinese railway ticketing site: at the end of every year, hundreds of millions of people rush to buy tickets for the Spring Festival travel season. Near the purchasing peak, the requirement is that a slowly loading page is acceptable, but the user's purchase information must be absolutely safe. If a user pays for a high-speed rail ticket and the payment is not recorded, that is a serious problem when hundreds of millions of people's money is at stake.

Traditional industries do not face data volumes or traffic of that scale; they mainly deal with problems such as concurrency and deadlocks to keep their systems reliable and stable. If a purchase record or production record is occasionally lost, it is tolerable, because ordinary users do not only enter data into the system but also keep paper records.
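
As a hedged illustration of the "money must never be lost" requirement, the sketch below uses a database transaction so that a ticket and its payment are recorded together or not at all; the tables and sample bookings are invented for the example and have nothing to do with the real 12306 system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tickets  (ticket_id TEXT PRIMARY KEY, passenger TEXT, train TEXT);
CREATE TABLE payments (payment_id TEXT PRIMARY KEY, ticket_id TEXT, amount_cents INTEGER);
""")

def book_ticket(conn, ticket_id, passenger, train, payment_id, amount_cents):
    """Record the ticket and its payment atomically: either both rows are
    committed, or (if anything fails mid-way) neither is."""
    try:
        with conn:  # opens a transaction, commits on success, rolls back on error
            conn.execute("INSERT INTO tickets VALUES (?, ?, ?)",
                         (ticket_id, passenger, train))
            conn.execute("INSERT INTO payments VALUES (?, ?, ?)",
                         (payment_id, ticket_id, amount_cents))
        return True
    except sqlite3.Error:
        return False  # nothing was written; the user's money is not silently lost

print(book_ticket(conn, "T-1", "Zhang San", "G101", "P-1", 55300))  # True
print(book_ticket(conn, "T-2", "Li Si", "G102", "P-1", 61000))      # False: duplicate payment id, whole booking rolled back
print(conn.execute("SELECT COUNT(*) FROM tickets").fetchone()[0])   # 1 -- the half-finished booking left no ticket row
```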

Fifth: Big data technology can quickly obtain valuable information

Given the characteristics of the Internet industry described above, the ever-growing volume of data also brings a series of problems.

For example, suppose algorithms A and B both solve the same problem. On a small dataset, algorithm A clearly outperforms algorithm B; as far as the algorithms themselves are concerned, A delivers the better results. Yet it turns out that as the amount of data keeps growing, algorithm B running on a large dataset beats algorithm A running on a small one. This observation has been a landmark insight for computer science and its derivative fields: as data grows larger and larger, it is the data itself, rather than the algorithms and models used to study it, that guarantees the validity of the analysis results. Even without an accurate algorithm, enough data can yield conclusions that are close to the facts.
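
The concluding point can be illustrated with a toy experiment (entirely illustrative, not from the original argument): a deliberately crude memorize-and-look-up predictor, the one-nearest-neighbour rule, reconstructs an unknown function almost exactly once it is given enough data.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_function(x):
    # The "fact" we are trying to recover; any smooth nonlinear function works here.
    return np.sin(3 * x) + 0.5 * x

def one_nearest_neighbour(train_x, train_y, query_x):
    """A deliberately crude 'algorithm': memorize the data and answer every
    query with the label of the single closest training point (1-D case)."""
    order = np.argsort(train_x)
    xs, ys = train_x[order], train_y[order]
    pos = np.clip(np.searchsorted(xs, query_x), 1, len(xs) - 1)
    left_closer = (query_x - xs[pos - 1]) <= (xs[pos] - query_x)
    return ys[np.where(left_closer, pos - 1, pos)]

query_x = np.linspace(0.0, 3.0, 1000)
truth = true_function(query_x)

for n in (50, 500, 50_000):
    train_x = rng.uniform(0.0, 3.0, size=n)
    train_y = true_function(train_x)  # noiseless labels, for clarity
    pred = one_nearest_neighbour(train_x, train_y, query_x)
    print(f"n={n:>6}: mean abs error = {np.abs(pred - truth).mean():.5f}")
# The error shrinks steadily as n grows: even this naive method gets
# arbitrarily close to the underlying function given enough data.
```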

Because it can handle a variety of data structures, big data makes the fullest possible use of the behavioral data people leave on the Internet. Before big data emerged, any data a computer could process first had to be structured and loaded into a corresponding database. Big data technology greatly relaxes these structural requirements: the many dimensions of information people leave online, such as social connections, geographic location, behavioral habits and preferences, can be processed in real time to sketch a full, multi-dimensional portrait of each individual.
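
As a hedged illustration of that relaxed structural requirement, the sketch below folds heterogeneous, loosely structured event records (the field names and events are made up for the example) into a per-user profile without first forcing them into a fixed relational schema.

```python
from collections import defaultdict

# Loosely structured events: each record carries whatever fields its source produced.
events = [
    {"user": "u1", "type": "checkin",  "city": "Hangzhou"},
    {"user": "u1", "type": "search",   "query": "running shoes"},
    {"user": "u1", "type": "purchase", "item": "running shoes", "price": 299.0},
    {"user": "u2", "type": "checkin",  "city": "Chengdu"},
    {"user": "u2", "type": "like",     "topic": "photography"},
]

def build_profiles(event_stream):
    """Fold mixed-schema events into a simple per-user profile,
    keeping whichever dimensions happen to be present."""
    profiles = defaultdict(lambda: {"cities": set(), "interests": set(), "spend": 0.0})
    for e in event_stream:
        p = profiles[e["user"]]
        if "city" in e:
            p["cities"].add(e["city"])
        if "query" in e:
            p["interests"].add(e["query"])
        if "topic" in e:
            p["interests"].add(e["topic"])
        if e.get("type") == "purchase":
            p["spend"] += e.get("price", 0.0)
    return dict(profiles)

for user, profile in build_profiles(events).items():
    print(user, profile)
# u1 {'cities': {'Hangzhou'}, 'interests': {'running shoes'}, 'spend': 299.0}
# u2 {'cities': {'Chengdu'}, 'interests': {'photography'}, 'spend': 0.0}
```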

Loading the large volumes of unstructured and semi-structured data that a company creates into a relational database for analysis costs too much time and money. Big data analysis is therefore often associated with cloud computing, because analyzing large data sets in real time requires a framework such as MapReduce to distribute the work across dozens, hundreds or even thousands of computers. In short, big data technology is the ability to quickly extract valuable information from all kinds of data. Simply put, big data needs distributed storage and processing, for example Hadoop = HDFS (the distributed file system for data storage) + HBase (the database) + MapReduce (data processing) + other components, handling data in a distributed way rather than with the traditional disk-array approach to storage and processing.
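
To show the shape of the MapReduce pattern mentioned above, here is a minimal single-process sketch; real Hadoop jobs are written against the Hadoop APIs and run across many machines, so this only mirrors the map / shuffle / reduce phases in miniature.

```python
from collections import defaultdict
from itertools import chain

documents = [
    "big data needs distributed storage",
    "distributed storage and distributed processing",
    "big data is not small data",
]

def map_phase(doc):
    # MAP: turn one input record into (key, value) pairs -- here (word, 1).
    return [(word, 1) for word in doc.split()]

def shuffle_phase(mapped_pairs):
    # SHUFFLE: group all values by key, the step Hadoop performs across the network.
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # REDUCE: collapse each key's list of values into a single result.
    return key, sum(values)

mapped = chain.from_iterable(map_phase(d) for d in documents)
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts["distributed"], counts["data"])  # 3 3
```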

The Internet has greatly changed people's lives; fast-moving, ever-changing information surrounds us every day, and we need better ways to handle that change anytime, anywhere. Big data technology will profoundly change the Internet world and, with it, the way we produce and live. As the technology develops, big data analysis is becoming easier and cheaper, and it is easier than ever to turn data into business understanding. More and more people are joining the ranks of big data and data analysis, ready to build their own careers there.

Editor, 2019-10-21