Talking about Data Mining and Data Warehouse

1 Data mining

1.1 Difference between data mining and traditional data analysis

The essential difference between data mining and traditional data analysis, such as queries, reports and online analytical processing (OLAP), is that data mining extracts information and discovers knowledge without a clear prior hypothesis. The information obtained by data mining should have three characteristics: it is previously unknown, valid and practical. In other words, data mining looks for information or knowledge that intuition cannot find, or that even runs counter to intuition; the more unexpected the information, the more valuable it may be. Traditional data analysis, by contrast, tends to pull the required data out of a large database and examine it with dedicated analysis software, guided by a question the analyst already has in mind. This is why data mining differs so much from traditional analysis methods.

1.2 The application value of data mining

(1) Classification: first select a labeled training set from the data, then apply a classification technique to this training set to build a classification model, and use the model to classify unlabeled data.

(2) Estimation: similar to classification. The difference is that classification produces discrete output, while estimation produces continuous values; the number of classes is fixed in advance, whereas the number of possible estimated values is not.

(3) Clustering: grouping records. Unlike classification, clustering does not depend on predefined classes and does not need a training set. China Mobile uses Ma Kewei Analysis System, an advanced data mining tool, to cluster and analyze users' WAP browsing behavior and to conduct precise marketing through customer grouping.

(4) Discovery of association rules and sequence patterns: an association is a connection in which, when one thing happens, other things tend to happen as well. For example, people who buy beer every day may also buy cigarettes, and the strength of this link can be described by the support and confidence of the association. A sequence differs from an association in that it is a connection over time. For example, if banks adjust interest rates today, the stock market will change tomorrow.

(5) Prediction: a model obtained through classification or estimation is used to predict unknown variables.

(6) Deviation detection: describes the few extreme or exceptional cases among the objects being analyzed and reveals the underlying reasons for them.

In addition, data mining is widely used in customer analysis, logistics and enterprise resource optimization, anomaly detection and enterprise analysis model management.
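The support and confidence mentioned under association rules can be made concrete with a small calculation. The following is a minimal sketch, not a full rule-mining algorithm; the basket data and the function names `support` and `confidence` are invented for illustration. Support is taken as the fraction of transactions containing all items of the rule, and confidence as the support of the whole rule divided by the support of its antecedent.

```python
from typing import List, Set


def support(transactions: List[Set[str]], itemset: Set[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(transactions: List[Set[str]],
               antecedent: Set[str],
               consequent: Set[str]) -> float:
    """Of the transactions containing `antecedent`, the fraction that
    also contain `consequent`."""
    base = support(transactions, antecedent)
    if base == 0.0:
        return 0.0
    return support(transactions, antecedent | consequent) / base


# Hypothetical shopping baskets, made up for this example.
baskets = [
    {"beer", "cigarettes"},
    {"beer", "cigarettes", "chips"},
    {"beer", "bread"},
    {"milk", "bread"},
    {"cigarettes"},
]

# Rule under test: beer -> cigarettes
rule_support = support(baskets, {"beer", "cigarettes"})          # 2 of 5 baskets
rule_confidence = confidence(baskets, {"beer"}, {"cigarettes"})  # 2 of 3 beer baskets

print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

Here two of the five baskets contain both items, and two of the three baskets with beer also contain cigarettes. A practical miner would additionally prune candidate itemsets against minimum support and confidence thresholds.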

2 Data warehouse

2.1 Characteristics of data warehouse

(1) Subject-oriented data set. A data warehouse is organized around subjects such as customers, suppliers, products and sales. It focuses on data modeling and analysis for decision makers, rather than on the daily operations and transaction processing of the organization.

(2) Integrated data set. The data in the data warehouse is extracted from the original, scattered databases, cleaned, and then systematically processed, summarized and organized. Inconsistencies in the source data must be eliminated so that the data warehouse holds consistent, global information about the whole enterprise.

(3) Time-variant data set. A data warehouse provides information from a historical perspective. Its data usually contains historical information, from which the development and future trends of the enterprise can be quantitatively analyzed and predicted.

(4) Non-volatile data set. The data in the data warehouse is mainly used for enterprise decision analysis. The operations involved are mostly queries, with few modifications or deletions, and the data usually only needs to be loaded and refreshed periodically. In general only two operations are needed, initial loading and data access, so the data is relatively stable and is rarely or never updated.

2.2 Types of data warehouses

According to the types of data they manage and the scope of the enterprise problems they solve, data warehouses can generally be divided into three types: enterprise data warehouses (EDW), operational data stores (ODS) and data marts.

(1) An enterprise data warehouse is a general-purpose data warehouse. It contains a large amount of detailed data as well as a large amount of summarized or aggregated data, and it is non-volatile and history-oriented. This kind of data warehouse is used to make strategic or tactical decisions that span the various areas of the enterprise.

(2) An operational data store can provide decision support on operational data, and it can also serve as a staging area when loading data into a data warehouse. Like an EDW, an ODS is subject-oriented and integrated; unlike an EDW, it is volatile and contains only current, detailed data, with no cumulative or historical data.

(3) A data mart is a portion of the data separated from a data warehouse for a specific application purpose or scope, and can also be called departmental data or subject data. Several data marts together can form an EDW.

2.3 Comparison between Data Warehouse and Traditional Database

Data warehouses and traditional databases are both related and different. The data warehouse did not emerge to replace the database; at present, most data warehouses are still managed by a relational database management system. Databases and data warehouses complement each other, and each has its own advantages. The differences between the two can be compared in the following respects:

(1) Different starting point: a database is designed around transactions; a data warehouse is designed around subjects.

(2) Different stored data: a database generally stores current online transaction data; a data warehouse generally stores historical data.

(3) Different design rules: database design avoids redundancy as far as possible and generally follows the normal forms; data warehouse design intentionally introduces redundancy and uses denormalized structures.

(4) Different functions: a database is designed for capturing data, and a data warehouse is designed for analyzing data.

(5) Different basic elements: a transactional database is built from normalized relational tables, whereas a data warehouse is typically built from fact tables and dimension tables.

(6) Different capacity: the typical capacity of a database is much smaller than that of a data warehouse.

(7) Different service objects: a database is designed for efficient transaction processing and serves the staff who handle the enterprise's day-to-day business; a data warehouse is designed for data analysis and decision making and serves the enterprise's senior decision makers.
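The subject-oriented, analysis-focused design described above is often realized as a star schema: one fact table of measurements surrounded by dimension tables that describe them. The following is a minimal sketch using Python's built-in `sqlite3` module; the table names (`fact_sales`, `dim_product`, `dim_date`), the columns and the sample rows are all hypothetical and chosen only to illustrate the layout.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Two dimension tables describe "what" and "when";
# the fact table stores the numeric measurements.
conn.executescript("""
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT,
    category   TEXT
);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    year    INTEGER,
    quarter INTEGER
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    quantity   INTEGER,
    amount     REAL
);
""")

conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Lager", "Beverages"), (2, "Stout", "Beverages"), (3, "Rye loaf", "Bakery")],
)
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(1, 2023, 4), (2, 2024, 1)],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1, 1, 10, 50.0), (2, 1, 4, 28.0), (3, 1, 6, 18.0),
     (1, 2, 8, 40.0), (3, 2, 5, 15.0)],
)

# A typical analytical query: revenue by product category and year.
rows = conn.execute("""
    SELECT p.category, d.year, SUM(f.amount) AS revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date    AS d ON d.date_id    = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()

for category, year, revenue in rows:
    print(category, year, revenue)
```

The query aggregates over history rather than touching a single transaction, which is the access pattern a data warehouse is designed for.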

3 The relationship between data warehouse and data mining

Data mining does not require a data warehouse to be built first. Establishing a huge data warehouse, which means unifying data from different sources, resolving all data conflicts and then importing all the data into the warehouse, is a major project that may take several years and millions of dollars to complete. If the goal is only data mining, you can import one or several transaction databases into a read-only database, treat it as a data mart, and then do data mining on it.
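The lightweight approach described above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the `orders` table and its rows are invented, both databases live in memory, and SQLite's `PRAGMA query_only` stands in for whatever read-only mechanism a production system would use.

```python
import sqlite3

# Hypothetical transaction (OLTP) database with a single orders table.
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
oltp.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.0), ("alice", 15.0)],
)
oltp.commit()

# Copy the transaction data into a separate database that acts as the data mart.
mart = sqlite3.connect(":memory:")
oltp.backup(mart)

# Reject any further writes on the data mart connection.
mart.execute("PRAGMA query_only = ON")

# Analysis runs against the snapshot, not against the live transaction system.
totals = dict(
    mart.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
)
print(totals)

# Confirm that the snapshot cannot be modified through this connection.
try:
    mart.execute("DELETE FROM orders")
    write_rejected = False
except sqlite3.OperationalError:
    write_rejected = True
print("write rejected:", write_rejected)
```

Working on a copy keeps analytical queries from competing with transaction processing, and the read-only setting guards the snapshot against accidental changes during mining.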