Method and Implementation of Data Mining
As a relatively new data-processing technology, data mining has several distinctive features. First, it deals with very large volumes of data, which is the reason the field exists. Second, the data may be incomplete, noisy and random, with a complex structure and high dimensionality. Finally, data mining is interdisciplinary, drawing on statistics, computer science, mathematics and other fields. The following algorithms and models are common and widely used:
Traditional statistical methods: ① Sampling techniques: when facing a large amount of data, it is neither possible nor necessary to analyze all of it, so a reasonable sample should be drawn under the guidance of sampling theory. ② Multivariate statistical analysis: factor analysis, cluster analysis, etc. ③ Statistical forecasting methods, such as regression analysis and time series analysis.
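As an illustration of the sampling idea, here is a minimal sketch of stratified sampling using only the Python standard library. The customer records, the `segment` field and the 10% fraction are hypothetical, chosen only to show that each stratum keeps its share of the sample.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Draw roughly `fraction` of the records from each stratum defined by `key`."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))  # keep at least one record per stratum
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical data: 900 "retail" and 100 "corporate" customers.
customers = [
    {"id": i, "segment": "retail" if i < 900 else "corporate"}
    for i in range(1000)
]
sample = stratified_sample(customers, key=lambda r: r["segment"], fraction=0.1)
print(len(sample), sum(r["segment"] == "corporate" for r in sample))
```

Sampling within each segment, rather than over the whole table, keeps small groups from disappearing from the sample by chance.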
Visualization technology: use charts and other graphical means, such as histograms, to express data features intuitively; many of these methods come from descriptive statistics. One of the difficult problems faced by visualization technology is the visualization of high-dimensional data.
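A histogram is the simplest case: values are grouped into bins and the bin counts are drawn. The sketch below computes the bins and prints a text-only chart; the ages and the bin width of 10 are made-up example values.

```python
from collections import Counter

def histogram(values, bin_width):
    """Return (bin_start, count) pairs, sorted by bin start."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return sorted(bins.items())

ages = [22, 25, 27, 31, 34, 35, 38, 41, 45, 52]  # hypothetical sample
for start, count in histogram(ages, bin_width=10):
    print(f"{start}-{start + 9}: {'#' * count}")
```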
Decision tree: a tree structure is built from a series of rules and can be used for classification and prediction. Commonly used algorithms include CART, CHAID, ID3, C4.5 and C5.0.
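To make the idea of a rule concrete, the following sketch finds a single split on one numeric variable by minimizing Gini impurity, the criterion used by CART-style trees. It is a one-level illustration only, not a full implementation of any of the algorithms named above, and the `ages` and `buys` data are invented.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Return (threshold, weighted_gini) for the best rule 'x <= threshold'."""
    best = (None, float("inf"))
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold halfway between neighbours
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical training data: customer age and whether they bought.
ages = [22, 25, 31, 38, 45, 52]
buys = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(ages, buys)
print(threshold, impurity)
```

A full tree repeats this search recursively on the left and right subsets until a stopping rule is met.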
Neural network: simulates the function of biological neurons; the data are processed and adjusted through an input layer, one or more hidden layers and an output layer, and the result is used for classification or regression.
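The layer-by-layer calculation can be sketched as a forward pass through a tiny network with two inputs, two hidden neurons and one output. The weights below are hand-picked for illustration (they make the network compute XOR) and are not the result of training; a real network would learn them from data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: each neuron computes sigmoid(w . x + b)."""
    return [
        sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
        for ws, b in zip(weights, biases)
    ]

# Hand-picked (not trained) weights, chosen so the network computes XOR.
HIDDEN_W, HIDDEN_B = [[20.0, 20.0], [20.0, 20.0]], [-10.0, -30.0]
OUTPUT_W, OUTPUT_B = [[20.0, -20.0]], [-10.0]

def forward(x):
    hidden = layer(x, HIDDEN_W, HIDDEN_B)        # input layer -> hidden layer
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]  # hidden layer -> output layer

predictions = [round(forward(x)) for x in ([0, 0], [0, 1], [1, 0], [1, 1])]
print(predictions)
```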
Genetic algorithm: an optimization technique based on the theory of natural evolution that simulates gene combination (crossover), mutation and selection.
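The three operations can be seen in a minimal sketch on a toy problem: evolving a bit string toward all ones. The objective, population size, mutation rate and the use of elitism are illustrative assumptions, not settings taken from the text.

```python
import random

def fitness(bits):
    return sum(bits)  # toy objective: count the 1-bits

def evolve(n_bits=20, pop_size=30, generations=50, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]   # selection: keep the fitter half
        children = [population[0][:]]           # elitism: copy the best unchanged
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)      # combination: one-point crossover
            child = mum[:cut] + dad[cut:]
            # mutation: flip each bit with a small probability
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = children
    return max(population, key=fitness), history

best, history = evolve()
print(fitness(best), history[0])
```

Because the best individual is always carried over, the best fitness never decreases from one generation to the next.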
Association rule mining: association rules describe relationships in the data in the form "A1 ∧ A2 ∧ … ∧ An → B1 ∧ B2 ∧ … ∧ Bn". Mining generally takes two steps: ① find the large (frequent) itemsets; ② use the large itemsets to generate association rules.
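The two steps can be sketched as follows: a level-wise, Apriori-style search for the frequent (large) itemsets, followed by rule generation from them. This is a simplified illustration without the candidate pruning of the full Apriori algorithm; the transactions and both thresholds are invented.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Step 1: level-wise search for itemsets with support >= min_support."""
    n = len(transactions)
    support = {}
    current = {frozenset([item]) for t in transactions for item in t}
    while current:
        counts = {c: sum(c <= t for t in transactions) / n for c in current}
        kept = {c: s for c, s in counts.items() if s >= min_support}
        support.update(kept)
        # Join frequent k-itemsets into (k+1)-item candidates.
        current = {a | b for a in kept for b in kept if len(a | b) == len(a) + 1}
    return support

def association_rules(support, min_confidence):
    """Step 2: split each frequent itemset into antecedent -> consequent."""
    rules = []
    for itemset, s in support.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = s / support[antecedent]
                if confidence >= min_confidence:
                    rules.append((set(antecedent), set(itemset - antecedent), confidence))
    return rules

# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
support = frequent_itemsets(transactions, min_support=0.6)
rules = association_rules(support, min_confidence=0.9)
print(rules)
```

Here support is the fraction of transactions containing an itemset, and the confidence of A → B is support(A ∪ B) / support(A).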
In addition to the common methods above, there are the rough set method, the fuzzy set method, Bayesian belief networks, the k-nearest neighbor method (KNN) and so on.
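Of these, KNN is the easiest to show in a few lines: a new point takes the majority label of its k closest training points. The sketch assumes Euclidean distance and k = 3, and the two labelled clusters are invented. It needs Python 3.8 or later for `math.dist`.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (features, label). Majority vote among the k nearest points."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled points forming two clusters.
train = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B"),
]
print(knn_predict(train, (2, 2)), knn_predict(train, (6, 5)))
```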
Implementation process of data mining
We have discussed the definition, functions and methods of data mining. The key question now is how to implement it. The general data mining process is as follows:
Problem understanding and formulation → data preparation → data collation → modeling → evaluation and interpretation.
Problem understanding and formulation: before starting data mining, the most basic task is to understand the data and the actual business problem, and on that basis to pose the question and define the goal clearly.
Data preparation: obtain the original data, extract a subset of it, and build a data mining database. Note that if the enterprise's existing data warehouse already meets the requirements of data mining, it can be used directly as the data mining database.
Data collation: because the data may be incomplete, noisy and random, with a complex structure, it must first be collated: clean the incomplete records, make a preliminary description and analysis, and select or transform the variables relevant to the mining task.
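A minimal sketch of this step on a handful of made-up records: rows with a missing target are dropped, missing numeric values are filled with the median, and one variable is transformed. The field names, the median imputation and the log transform are illustrative assumptions; the right cleaning rules depend on the data and the business problem.

```python
import math
from statistics import median

# Hypothetical raw records; None marks a missing value.
raw = [
    {"age": 34, "income": 52000, "churned": "no"},
    {"age": None, "income": 61000, "churned": "yes"},
    {"age": 29, "income": None, "churned": "no"},
    {"age": 41, "income": 58000, "churned": None},  # target missing
]

def collate(records, target, numeric_fields):
    # Clean: drop records whose target is missing (copies leave `records` untouched).
    rows = [dict(r) for r in records if r[target] is not None]
    # Fill remaining gaps in numeric fields with the median of the observed values.
    for field in numeric_fields:
        fill = median(r[field] for r in rows if r[field] is not None)
        for r in rows:
            if r[field] is None:
                r[field] = fill
    return rows

clean = collate(raw, target="churned", numeric_fields=["age", "income"])
# Change of variable: a log transform of income as a derived field.
for r in clean:
    r["log_income"] = math.log(r["income"])
print(clean)
```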
Modeling: according to the goal of data mining and the characteristics of data, choose the appropriate model.
Evaluation and interpretation: evaluate the data mining results, select the best model, apply it to the practical problem, and interpret the results using domain knowledge.
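Model selection can be sketched as scoring each candidate on data held out from modeling and keeping the one with the highest score. The hold-out set, the two candidate models and the choice of accuracy as the metric are all hypothetical.

```python
def accuracy(model, data):
    """Fraction of (feature, label) pairs the model predicts correctly."""
    return sum(model(x) == y for x, y in data) / len(data)

# Hypothetical hold-out set: (age, bought?) pairs not used for modeling.
holdout = [(22, "no"), (27, "no"), (33, "no"), (36, "yes"), (48, "yes"), (30, "yes")]

# Two hypothetical candidate models.
candidates = {
    "always_no": lambda age: "no",
    "age_rule": lambda age: "yes" if age > 34.5 else "no",
}

scores = {name: accuracy(model, holdout) for name, model in candidates.items()}
best_name = max(scores, key=scores.get)
print(scores, best_name)
```

Accuracy alone can mislead when one class is rare, so the metric should be chosen to match the business goal defined in the first step.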
The above process is not completed in a single pass; some or all of the steps may need to be repeated.