Traditional Culture Encyclopedia - Traditional customs - Three classic data mining algorithms

Three classic data mining algorithms

Algorithms can be said to be the core of many technologies, and the same is true for data mining.

There are many algorithms in data mining. It is the existence of these algorithms that enables our data mining to solve more problems.

If we master these algorithms, we can successfully carry out data mining work. In this article, we will briefly introduce the classic algorithms of data mining, hoping to help you.

1. KNN algorithm The full name of KNN algorithm is called k-nearest neighbor classification, which is K nearest neighbor, referred to as KNN algorithm. This classification algorithm is a relatively mature method in theory and one of the simplest machine learning algorithms.

.

The idea of ??this method is: if a sample is the most similar k in the feature space, that is, most of the nearest samples in the feature space belong to a certain category, then the sample also belongs to this category.

The KNN algorithm is often used for classification in data mining and plays a vital role.

2. Naive Bayes algorithm Among the many classification models, the two most widely used classification models are the Decision Tree Model and the Naive Bayesian Model (NBC).

The Naive Bayes model originated from classical mathematical theory and has a solid mathematical foundation and stable classification efficiency.

At the same time, the NBC model requires few estimated parameters, is not very sensitive to missing data, and has a relatively simple algorithm.

Theoretically, the NBC model has the smallest error rate compared with other classification methods.

But this is not always the case. This is because the NBC model assumes that attributes are independent of each other. This assumption is often not true in practical applications, which has a certain impact on the correct classification of the NBC model.

When the number of attributes is relatively large or the correlation between attributes is large, the classification efficiency of the NBC model is not as good as that of the decision tree model.

When the attribute correlation is small, the NBC model performs best.

This algorithm has a high usage rate in data mining work. An excellent data miner must know how to use this algorithm.

3.CART algorithm CART, which is Classification and Regression Trees.

It is our common classification and regression tree. There are two key ideas under the classification tree.

The first is about the idea of ??recursively dividing the independent variable space; the second is about pruning with validation data.

These two ideas also determine the status of this algorithm.

In this article, we introduce to you the relevant knowledge about the KNN algorithm, Naive Bayes algorithm, and CART algorithm. In fact, these three algorithms occupy a high position in data mining, so if you want to engage in the data mining industry, you must not

Ignore the learning of these algorithms.