
This article is mainly aimed at beginners in machine learning and introduces commonly used machine learning algorithms. Practitioners are of course also welcome to join the discussion.

The basic questions philosophy tries to answer are where I come from, who I am, and where I am going. The search for those answers loosely mirrors the machine learning routine: organize data -> mine knowledge -> predict the future. Organizing data is feature design, producing samples that meet a specific format; mining knowledge is modeling; predicting the future is applying the model.

Feature design depends on an understanding of the business scenario. Features can be divided into continuous features, discrete features, and combined (higher-order) features, as in the sketch below. This article focuses on machine learning algorithms, which fall into supervised learning and unsupervised learning.
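For concreteness, here is a minimal sketch (not from the original article) of the three feature types, using made-up fields such as an age value and a city label; the crossing of the two is one simple way to form a combined higher-order feature:

```python
import numpy as np

def one_hot(value, vocabulary):
    """Encode a discrete value as a one-hot vector over a fixed vocabulary."""
    vec = np.zeros(len(vocabulary))
    vec[vocabulary.index(value)] = 1.0
    return vec

# Hypothetical vocabulary and fields, purely for illustration.
CITIES = ["beijing", "hangzhou", "shanghai"]

def build_features(age, city):
    continuous = np.array([age / 100.0])   # continuous feature, roughly scaled
    discrete = one_hot(city, CITIES)       # discrete feature, one-hot encoded
    crossed = continuous[0] * discrete     # a simple second-order (combined) feature
    return np.concatenate([continuous, discrete, crossed])

print(build_features(30, "hangzhou"))
```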

There are many unsupervised learning algorithms. In recent years the industry has paid particular attention to topic models. LSA -> PLSA -> LDA are the typical algorithms of the three development stages of topic models, and the main difference between them lies in their modeling assumptions: LSA assumes a document has only one topic, PLSA assumes the topic probability distribution is fixed (θ is constant), and LDA assumes the topic probabilities of documents and words are themselves variable.
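As a small illustration of applying a topic model in practice (assuming scikit-learn is available; the toy corpus and topic count below are invented for the example):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# A minimal sketch (not the article's own code) of fitting an LDA topic model
# with scikit-learn on a toy corpus.
docs = [
    "machine learning model training data",
    "stock market price trading data",
    "deep learning neural network model",
    "market economy price inflation",
]

counts = CountVectorizer().fit_transform(docs)            # bag-of-words counts
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(counts)                     # per-document topic distribution
print(doc_topic)                                          # each row sums (approximately) to 1
```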

The essence of the LDA algorithm can be understood with the help of "God's dice". For details, see the article "LDA Math Gossip" (LDA数学八卦) written by Rickjin, which is easy to follow and also covers a good deal of the underlying mathematics; it is highly recommended.

Supervised learning can be divided into classification and regression. The perceptron is the simplest linear classifier; it is rarely used on its own in practice today, but it is the basic unit of neural networks and deep learning.
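A minimal sketch of the perceptron's learning rule, with toy data, learning rate, and epoch count chosen purely for illustration:

```python
import numpy as np

# The perceptron nudges its weights only on misclassified samples.
def train_perceptron(X, y, lr=0.1, epochs=20):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):                 # yi is +1 or -1
            if yi * (np.dot(w, xi) + b) <= 0:    # misclassified (or on the boundary)
                w += lr * yi * xi                # classic perceptron update
                b += lr * yi
    return w, b

X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
print(train_perceptron(X, y))
```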

When a linear function fits the data and classifies by thresholding its output, it is easily disturbed by noisy samples, which hurts classification accuracy. Logistic regression uses the sigmoid function to constrain the model output to between 0 and 1, which effectively weakens the negative impact of noisy data; it is widely used to predict the click-through rate of internet advertisements.

The parameters of a logistic regression model can be solved by maximum likelihood. First the objective function L(θ) is defined; then the product in the objective is turned into a sum by taking the logarithm (maximizing the likelihood becomes minimizing the negative log-likelihood loss); finally the parameters are solved by gradient descent, as in the sketch below.
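A hedged sketch of that procedure, with an invented toy dataset, learning rate, and iteration count (a bias column is folded into X):

```python
import numpy as np

# Maximum-likelihood training of logistic regression: maximizing L(theta) is
# done by minimizing the negative log-likelihood with gradient descent.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, iters=1000):
    theta = np.zeros(X.shape[1])
    n = len(y)                                   # y contains 0/1 labels
    for _ in range(iters):
        p = sigmoid(X @ theta)                   # predicted click probability
        grad = X.T @ (p - y) / n                 # gradient of the negative log-likelihood
        theta -= lr * grad
    return theta

X = np.array([[1.0, 0.5], [1.0, 2.0], [1.0, -1.0], [1.0, -2.5]])  # first column is the bias term
y = np.array([1, 1, 0, 0])
print(fit_logistic(X, y))
```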

Compared with linear classifiers, nonlinear classifiers such as decision trees have stronger classification ability. ID3 and C4.5 are typical decision tree algorithms; their modeling processes are basically similar, and the main difference lies in the definition of the gain function (the splitting objective), illustrated below.
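To make that difference concrete, here is a small sketch of the two gain functions, assuming the usual definitions of information gain (ID3) and gain ratio (C4.5); the toy labels and feature values are made up:

```python
import numpy as np
from collections import Counter

def entropy(labels):
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, feature_values):
    total = entropy(labels)
    n = len(labels)
    cond = 0.0
    for v in set(feature_values):
        subset = [l for l, f in zip(labels, feature_values) if f == v]
        cond += len(subset) / n * entropy(subset)
    return total - cond                              # ID3 splitting criterion

def gain_ratio(labels, feature_values):
    split_info = entropy(feature_values)             # intrinsic information of the split
    return information_gain(labels, feature_values) / split_info  # C4.5 criterion

labels = ["yes", "yes", "no", "no", "yes"]
feature = ["sunny", "rain", "rain", "sunny", "sunny"]
print(information_gain(labels, feature), gain_ratio(labels, feature))
```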

Linear regression and linear classification have similar forms; the essential difference is that the target of classification is a discrete value while the target of regression is a continuous value. The different targets lead to different loss functions: regression usually defines its loss based on least squares. Under the assumption that the observation error follows a Gaussian distribution, the least-squares solution is equivalent to the maximum-likelihood solution, as the sketch below illustrates.
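A minimal least-squares sketch on synthetic data (the true weights and noise level are invented for the example):

```python
import numpy as np

# Linear regression by least squares: under Gaussian observation noise, the
# maximum-likelihood estimate coincides with the solution of min ||Xw - y||^2.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(-1, 1, 50)])   # bias + one feature
true_w = np.array([2.0, -3.0])
y = X @ true_w + rng.normal(0, 0.1, 50)                      # Gaussian noise

w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)                # least-squares solution
print(w_hat)                                                 # close to [2.0, -3.0]
```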

When gradient descent is used to solve for model parameters, it can run in batch mode or in stochastic (incremental) mode. Generally speaking, batch mode gives a more accurate gradient per step, while stochastic mode has a lower per-step cost.
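A rough sketch of the two modes on a small least-squares problem; the step sizes and iteration counts are arbitrary choices:

```python
import numpy as np

def batch_gd(X, y, lr=0.1, iters=1000):
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / len(y)        # gradient over the whole batch
        w -= lr * grad
    return w

def stochastic_gd(X, y, lr=0.05, epochs=50):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):                 # one cheaper, noisier update per sample
            w -= lr * (xi @ w - yi) * xi
    return w

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])                    # exactly y = 1 + x
print(batch_gd(X, y), stochastic_gd(X, y))       # both approach [1.0, 1.0]
```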

As mentioned above, although the perceptron is the simplest linear classifier, it can be regarded as the basic unit of deep learning, and its parameters can be learned with methods such as autoencoders.

One of the advantages of deep learning can be understood as feature abstraction: higher-order features are learned from low-level features and describe more complex information structures. For example, edge and contour features describing texture structure are abstracted from pixel-level features, and further learning yields higher-order features that represent local objects.

As the saying goes, two heads are better than one. Whether in linear classification or deep learning, a single model fights alone. Is there a way to further improve accuracy on a data-processing task? Of course: ensemble models exist to solve exactly this problem. Bagging is one such method: for a given task, multiple sets of model parameters are trained with different models/parameters/features, and the final result is produced by voting or weighted averaging, as sketched below.
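A hand-rolled bagging sketch, assuming scikit-learn decision trees as the base models (the toy data and number of models are invented):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Bagging by hand: several trees are trained on bootstrap resamples of the same
# data and their predictions are combined by majority vote.
def bagging_predict(X_train, y_train, X_test, n_models=5, seed=0):
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_models):
        idx = rng.integers(0, len(y_train), size=len(y_train))   # bootstrap sample
        model = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
        votes.append(model.predict(X_test))
    votes = np.array(votes)
    return np.array([np.bincount(col).argmax() for col in votes.T])  # majority vote

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
print(bagging_predict(X, y, np.array([[0.5], [4.5]])))
```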

Boosting is another method of model ensembling. The idea is to increase the loss weights of misclassified samples at each iteration, so that later models focus on them and the overall accuracy on the data improves. Typical algorithms include AdaBoost and GBDT; a hand-rolled AdaBoost sketch follows.
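A simplified AdaBoost sketch with decision stumps as weak learners; the number of rounds and the toy data are arbitrary, and this is not meant as a faithful reimplementation of any particular library:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# After each round, the weights of misclassified samples are increased so the
# next weak learner focuses on them.
def adaboost(X, y, rounds=5):
    n = len(y)                                   # y contains +1 / -1 labels
    w = np.full(n, 1.0 / n)                      # uniform sample weights to start
    learners, alphas = [], []
    for _ in range(rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(np.sum(w * (pred != y)) / np.sum(w), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)    # weight of this weak learner
        w *= np.exp(-alpha * y * pred)           # boost the weights of the mistakes
        w /= w.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def predict(learners, alphas, X):
    return np.sign(sum(a * m.predict(X) for a, m in zip(alphas, learners)))

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])
learners, alphas = adaboost(X, y)
print(predict(learners, alphas, X))
```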

Different model ensembling methods can be chosen for different task scenarios; in deep learning, applying Dropout to the hidden nodes achieves a similar effect.
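A minimal sketch of (inverted) dropout on a hidden-layer activation vector; the drop probability and the example activations are arbitrary illustration values:

```python
import numpy as np

# During training each hidden node is dropped with probability p, which acts
# like averaging over many thinned sub-networks, similar in spirit to ensembling.
def dropout(activations, p=0.5, training=True, seed=0):
    if not training:
        return activations                       # no dropout at prediction time
    rng = np.random.default_rng(seed)
    mask = rng.random(activations.shape) >= p    # keep each node with probability 1 - p
    return activations * mask / (1.0 - p)        # rescale so the expectation is unchanged

h = np.array([0.2, 1.5, -0.3, 0.8])
print(dropout(h))
```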

Having introduced these basic machine learning algorithms, let us discuss the basic criteria for judging whether a model is good or bad. Underfitting and overfitting are two common situations, and a simple way to tell them apart is to compare the training error with the test error. For underfitting, more features can be designed to improve the model's training accuracy; for overfitting, the number of features can be trimmed to reduce model complexity and improve the model's test accuracy.

The number of features is an intuitive reflection of model complexity. Fixing the set of input features before training is one method; another common method is to introduce regularization constraints on the feature parameters into the objective/loss function during training, so that high-quality features are selected as part of the training process, as in the sketch below.
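A sketch of that second approach using an L2 (ridge) penalty as the regularization constraint; the penalty strength lam and the synthetic data are invented for the example:

```python
import numpy as np

# Adding an L2 penalty to the least-squares loss shrinks feature weights during
# training, limiting model complexity without manually pruning features.
def ridge_fit(X, y, lam=1.0):
    d = X.shape[1]
    # closed-form solution of min ||Xw - y||^2 + lam * ||w||^2
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))                    # 10 features, most of them useless
w_true = np.array([3.0, -2.0] + [0.0] * 8)       # only the first two matter
y = X @ w_true + rng.normal(0, 0.1, 40)
print(ridge_fit(X, y, lam=1.0).round(2))         # small weights on the useless features
```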

Model tuning is meticulous work; in the end the model must give reliable predictions in real scenarios and solve practical problems. I look forward to applying what I have learned! This article is reproduced from Ali Technology, and authorization is required for reprinting.