What are the similarities and differences between data analytics in the Internet and traditional industries?
At present, the main differences between the two lie in the following points:
One: structured data and unstructured data
Traditional industries work mostly with structured data, that is, row-based data stored in a database whose logic can be expressed as a two-dimensional table. Typical examples are enterprise ERP systems in manufacturing that run on databases such as Oracle and SQL Server. The Internet industry works more with unstructured data, which cannot be described in a two-dimensional table: office documents in all formats, text, pictures, XML, HTML, reports of all kinds, images, and audio/video information. Typical applications include medical imaging systems, educational video on demand, video surveillance, land GIS, design institutes, file servers (PDM/FTP), and media resource management.
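The distinction can be sketched in a few lines of Python. This is only an illustration: the `work_orders` table, its columns, and the sample review text are hypothetical and do not come from any real ERP system.

```python
import sqlite3

# Structured data: every record fits the same two-dimensional table layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE work_orders (order_id INTEGER PRIMARY KEY, part TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO work_orders (order_id, part, quantity) VALUES (?, ?, ?)",
    [(1, "gear", 200), (2, "shaft", 50)],  # hypothetical rows
)
# Because the columns are known in advance, a question becomes a simple query.
total_quantity = conn.execute("SELECT SUM(quantity) FROM work_orders").fetchone()[0]

# Unstructured data: free text (or an image, audio clip, video) has no columns.
# Its content must be interpreted before anything can be counted or compared.
review = "The gear arrived late, but the quality was excellent."
mentions_gear = "gear" in review.lower()

print(total_quantity)  # 250
print(mentions_gear)   # True
conn.close()
```

The table answers a numeric question directly, while the free text first needs some form of interpretation, here reduced to a naive keyword check.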
Two: the volume of data
The Internet industry deals with massive data. Because of the nature of the industry, huge amounts of data are produced every moment, and data sets are often measured in petabytes. How big is a petabyte? 1 PB equals 2 to the 50th power bytes. If that is hard to picture, consider that the Historical Records (Shiji) has about 520,000 Chinese characters, and 1 PB can store at least 1 billion copies of it. Baidu, Tencent, and Alibaba are representative enterprises. By contrast, three months of production data from a traditional manufacturing plant is less than 100 GB. This is a big difference.
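The arithmetic behind these figures can be checked with a short calculation. It assumes 2 bytes per Chinese character (as in GBK or UTF-16), which the text does not state, and it takes the 100 GB plant figure as an upper bound.

```python
# Assumption: each Chinese character takes 2 bytes (e.g. GBK or UTF-16).
BYTES_PER_CHAR = 2
CHARS_IN_BOOK = 520_000   # approximate length given in the text
PETABYTE = 2 ** 50        # bytes

book_bytes = CHARS_IN_BOOK * BYTES_PER_CHAR
copies_per_pb = PETABYTE // book_bytes

# Three months of plant data, taken at the 100 GB upper bound from the text.
plant_bytes = 100 * 2 ** 30
plant_share_of_pb = plant_bytes / PETABYTE

print(f"{copies_per_pb:,} copies per PB")
print(f"100 GB is {plant_share_of_pb:.4%} of 1 PB")
```

Under this assumption the result is a little over 1 billion copies. With 3 bytes per character (UTF-8) it drops to roughly 720 million, so the "1 billion" figure depends on the encoding.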
Three: different views of data and different purposes of analysis
The Internet industry analyzes and mines this massive data, whether it is historical or real-time. Data is no longer static and stale: any data forgotten on a server may be reused to find how it relates to us, to behavior, and to phenomena. Take "Double 11" as an example. Every year when "Double 11" arrives, the "hand choppers" (compulsive online shoppers) face a painful choice: there are too many discounted goods, so what is worth buying? In the end, after one careless moment, the credit card is maxed out on a pile of goods they do not need, and they are left tearfully eating "Master Kong" instant noodles for half a year...
Google receives more than 3 billion search queries from around the world every day. After years of accumulating data, Google established a link between search keywords such as "cough" and "fever" and the regions where flu was spreading, so in 2009 it successfully predicted the spread of winter flu in the U.S., accurate down to the region and state. Traditional industries do not pay much attention to past data. Usually inventory is taken at the end of the month and some financial analysis reports are produced; historical data is stored in a backup database and is consulted only when a problem arises.
Four: data lookup efficiency and security
The Internet industry often stores users' personal behavioral information, which must be kept absolutely secure and accurate. Take 12306 as an example: at the end of each year it faces the pressure of hundreds of millions of travelers buying tickets. Around the Spring Festival ticket-buying peak, the firm requirement is that pages may open more slowly, but users must still be able to purchase tickets and their ticket information must be absolutely safe. If a user pays for a high-speed train ticket and the money is not received, then with the ticket money of hundreds of millions of people at stake, this is definitely going to be a big problem.
Traditional industries do not have such large data volumes or access loads. They mostly need to solve problems such as concurrency and deadlock to keep the system highly reliable and stable. Occasionally a purchase record or production record is lost, but because users generally keep paper records in addition to entering data into the system, this can be tolerated.
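The requirement that a payment and its ticket are never recorded one without the other is usually met with a database transaction. The following is a minimal sketch using Python's built-in `sqlite3` module; the `payments` and `tickets` tables and the `book_ticket` helper are hypothetical and say nothing about how 12306 is actually built.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (order_id INTEGER PRIMARY KEY, amount REAL NOT NULL)")
conn.execute("CREATE TABLE tickets (order_id INTEGER PRIMARY KEY, seat TEXT NOT NULL UNIQUE)")
conn.commit()

def book_ticket(order_id, amount, seat):
    """Record payment and ticket together; return True only if both were saved."""
    try:
        # "with conn" commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute("INSERT INTO payments VALUES (?, ?)", (order_id, amount))
            conn.execute("INSERT INTO tickets VALUES (?, ?)", (order_id, seat))
        return True
    except sqlite3.IntegrityError:
        return False

first = book_ticket(1, 100.0, "05A")
second = book_ticket(2, 100.0, "05A")  # seat already sold: whole booking is rolled back

payment_count = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
print(first, second, payment_count)  # True False 1
```

The second booking fails on the duplicate seat, and its payment row is rolled back as well, so no user is charged without receiving a ticket.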
Five: big data technology for fast access to valuable information
Based on the above characteristics of the Internet industry, when the amount of data continues to increase, it also brings a series of problems.
For example, suppose there is an algorithm A and an algorithm B for solving a certain problem; when run on a small amount of data, the result of algorithm A is significantly better than that of algorithm B. That is to say, as far as the algorithm itself is concerned, algorithm A is capable of delivering a better result; however, it has been found that, when the amount of data is constantly increasing, the result of algorithm B run on a large amount of data is better than the result of algorithm A run on a small amount of data. This discovery was a landmark revelation for both the discipline of computing and computer-derived disciplines: as data gets larger, the data itself (not the algorithms and models used to study it) guarantees the validity of the results of data analysis. Even in the absence of precise algorithms, with enough data, it is possible to get close to the truth.
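The effect can be reproduced with a toy simulation. The choice of estimators is an assumption made purely for illustration: "algorithm A" is the sample mean and "algorithm B" is the sample median, both estimating the center of normally distributed data. The text does not say which algorithms it has in mind.

```python
import random
import statistics

random.seed(0)
TRUE_CENTER = 10.0

def mean_abs_error(estimator, sample_size, trials):
    """Average distance between the estimate and the true center over many trials."""
    errors = []
    for _ in range(trials):
        sample = [random.gauss(TRUE_CENTER, 1.0) for _ in range(sample_size)]
        errors.append(abs(estimator(sample) - TRUE_CENTER))
    return statistics.fmean(errors)

# Algorithm A: sample mean (the more efficient estimator for normal data).
# Algorithm B: sample median (noisier than the mean at the same sample size).
a_small = mean_abs_error(statistics.fmean, sample_size=20, trials=2000)
b_small = mean_abs_error(statistics.median, sample_size=20, trials=2000)
b_large = mean_abs_error(statistics.median, sample_size=2000, trials=100)

print(f"A on 20 points:   {a_small:.4f}")
print(f"B on 20 points:   {b_small:.4f}")
print(f"B on 2000 points: {b_large:.4f}")
```

On equal amounts of data A has the smaller error, but B with a hundred times more data is far more accurate than A on the small sample, which is the pattern the paragraph describes.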
Because it can handle a wide range of data structures, big data technology can make maximum use of the data on human behavior recorded on the Internet. Before the emergence of big data, the data that computers could handle needed to be structured up front and recorded in appropriate databases. Big data technology greatly relaxes these requirements on data structure: the social information, geographic location, behavioral habits, preferences, and other dimensions of information that people leave on the Internet can be processed in real time to outline each individual's characteristics fully and from every angle.
Big data analytics is often associated with cloud computing because analyzing large data sets quickly requires a framework like MapReduce to distribute work to dozens, hundreds, or even thousands of computers. In short, big data technology is the ability to quickly obtain valuable information from many types of data. Simply put, big data relies on a distributed storage and distributed processing architecture such as Hadoop = HDFS (file system, for data storage) + HBase (database) + MapReduce (data processing) + other components, rather than only the traditional disk-array approach to storing and processing data.
The Internet has dramatically changed people's lives. Massive, fast-moving, and constantly changing information surrounds people every day, and we need better ways to process it in order to cope with this change anytime, anywhere. Big data technology will profoundly change the Internet world and the whole way of production and life. As technology develops, big data analysis is becoming easier and cheaper and makes it possible to understand a business faster than before. More and more people are entering the field of big data and data analytics, ready to build their careers there.
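The MapReduce idea itself is small enough to sketch in a single process. The example below is a word count in plain Python, not Hadoop's API; the function names `map_phase`, `shuffle`, and `reduce_phase` and the sample documents are made up for illustration.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

documents = ["big data needs big clusters", "data beats algorithms"]

# In a real cluster each document (or file block) would be mapped on a different machine.
mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

print(word_counts["big"], word_counts["data"])  # 2 2
```

Because each map call looks at only one document and each reduce call at only one key, both phases can be spread across many machines, which is what a framework such as Hadoop MapReduce automates.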
Edited on 2019-10-21