How to become a qualified algorithm engineer?

Becoming a qualified development engineer is no simple matter. You need to master a range of skills, from development to debugging to optimization, and each takes considerable effort and experience to acquire. Becoming a qualified machine learning algorithm engineer (hereinafter "algorithm engineer") is harder still, because on top of the general skills of an engineer you must also master a large body of knowledge about machine learning algorithms.

Let's break down the skills a qualified algorithm engineer needs and see exactly which ones are required.

1. Basic development capability

An algorithm engineer is an engineer first, and so must master the skills that every development engineer needs.

Some students misunderstand this point. They think an algorithm engineer only needs to think up and design algorithms, and that someone else will implement whatever scheme they come up with. This idea is wrong. In most positions at most companies, the algorithm engineer is responsible for the whole process, from algorithm design to implementation to deployment.

I have seen some companies adopt an organizational structure that separates algorithm design from algorithm implementation. Under this structure it is unclear who is responsible for the algorithm's results, and both the designers and the developers end up frustrated. The specific reasons are beyond the scope of this article, but I hope you will remember that every algorithm engineer needs to master basic development skills.

2. Probability and statistics fundamentals

Probability and statistics are among the cornerstones of machine learning. From a certain point of view, machine learning can be regarded as a systematic way of thinking about and understanding an uncertain world, built on probabilistic reasoning. Learning to look at problems from a probabilistic perspective and to describe them in probabilistic language is one of the most important foundations for understanding machine learning deeply and using it skillfully.

Probability theory covers a lot of ground, but its ideas all take concrete form in specific distributions, so it is very important to learn the commonly used probability distributions and their properties well.

For discrete data, you need to understand the Bernoulli, binomial, multinomial and Poisson distributions, along with the beta and Dirichlet distributions, which serve as conjugate priors for the Bernoulli/binomial and multinomial parameters respectively.

For continuous data, the Gaussian distribution and the exponential family of distributions are the most important. These distributions run through the various models of machine learning, and they also appear in all kinds of data from the Internet and the real world. Only by understanding how data is distributed can we know how to handle it.
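As a small illustration of what "knowing a distribution's properties" means in practice, the sketch below computes the binomial probability mass function directly, checks its mean and variance against the textbook formulas, and compares it with a Poisson distribution in the large-n, small-p regime. The function names and parameter values are illustrative assumptions, not something the article prescribes.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Illustrative parameters (assumed for this sketch).
n, p = 20, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                                   # should be 1
mean = sum(k * q for k, q in enumerate(pmf))       # theory: n * p
variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))  # theory: n * p * (1 - p)

print(f"total probability: {total:.6f}")
print(f"mean: {mean:.4f}  (theory: {n * p:.4f})")
print(f"variance: {variance:.4f}  (theory: {n * p * (1 - p):.4f})")

# With many trials and a small success probability, Binomial(n, p)
# is close to Poisson(n * p).
big_n, small_p = 1000, 0.003
for k in range(6):
    b = binomial_pmf(k, big_n, small_p)
    q = poisson_pmf(k, big_n * small_p)
    print(f"k={k}: binomial={b:.5f}  poisson={q:.5f}")
```

Checks like these are a cheap way to confirm that a distribution you plan to assume for a dataset actually behaves the way you expect.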

In addition, you need to master the theory of hypothesis testing. In this so-called era of big data, the most deceptive thing is probably data itself. Only by mastering hypothesis testing, confidence intervals and related theory can we judge whether a conclusion drawn from data is real: for example, whether two groups of data really differ, or whether a metric has really improved after a strategy was introduced. Problems like these come up constantly in practical work. Without these skills, you are effectively working blind in the era of big data.
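The "did the metric really improve?" question can be sketched as a two-proportion z-test, one common choice among several possible tests. The click counts below are made up for illustration, the function names are hypothetical, and the test relies on the normal approximation, so it assumes reasonably large samples.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value). Uses the pooled standard error under the
    null hypothesis that both groups share the same true rate.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def diff_confidence_interval(success_a, n_a, success_b, n_b, level=0.95):
    """Normal-approximation confidence interval for (rate_b - rate_a)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    margin = NormalDist().inv_cdf(0.5 + level / 2) * se
    diff = p_b - p_a
    return diff - margin, diff + margin


# Made-up experiment: control vs. two candidate strategies, 10,000 users each.
control = (1000, 10_000)      # 10.0% click rate
strategy_1 = (1100, 10_000)   # 11.0% click rate
strategy_2 = (1030, 10_000)   # 10.3% click rate

z1, p1 = two_proportion_z_test(*control, *strategy_1)
z2, p2 = two_proportion_z_test(*control, *strategy_2)
low1, high1 = diff_confidence_interval(*control, *strategy_1)

print(f"strategy 1: z={z1:.3f}, p={p1:.4f}, 95% CI=({low1:.4f}, {high1:.4f})")
print(f"strategy 2: z={z2:.3f}, p={p2:.4f}")
```

Both strategies show a higher raw click rate than the control, but only the first difference is distinguishable from noise at the 5% level with these sample sizes, which is exactly the kind of distinction the paragraph above is about.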

On the statistics side, you also need to master the commonly used parameter estimation methods, such as maximum likelihood estimation, maximum a posteriori estimation and the EM algorithm. Like optimization theory, these apply to all models and are the most basic of foundations.
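To make the difference between two of these estimators concrete, here is a minimal sketch for the simplest case: estimating the success probability of a Bernoulli variable. The Beta(alpha, beta) prior, the function names and the sample counts are assumptions chosen for illustration.

```python
def bernoulli_mle(successes: int, trials: int) -> float:
    """Maximum likelihood estimate of the success probability."""
    return successes / trials


def bernoulli_map(successes: int, trials: int, alpha: float, beta: float) -> float:
    """Maximum a posteriori estimate under a Beta(alpha, beta) prior.

    The posterior is Beta(alpha + successes, beta + failures); this
    returns its mode. Assumes alpha >= 1 and beta >= 1.
    """
    return (successes + alpha - 1) / (trials + alpha + beta - 2)


# Tiny sample: 3 successes out of 3 trials.
mle_small = bernoulli_mle(3, 3)
map_small = bernoulli_map(3, 3, alpha=2, beta=2)

# Large sample with the same observed rate: 300 out of 300.
mle_large = bernoulli_mle(300, 300)
map_large = bernoulli_map(300, 300, alpha=2, beta=2)

print(f"n=3:   MLE={mle_small:.4f}  MAP={map_small:.4f}")
print(f"n=300: MLE={mle_large:.4f}  MAP={map_large:.4f}")

# A uniform prior, Beta(1, 1), makes MAP coincide with MLE.
print(bernoulli_map(3, 3, alpha=1, beta=1) == mle_small)
```

With only three observations the prior pulls the MAP estimate away from the extreme value that MLE reports, and as data accumulates the two estimates converge.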

3. Machine learning theory

Although more and more open source toolkits work out of the box, this does not mean algorithm engineers can skip learning and mastering the basic theory of machine learning. There are two main reasons:

First, only by mastering the theory can we use the various tools and techniques flexibly instead of copying them blindly. Only on this basis do we truly have the ability to build a machine learning system and keep optimizing it. Otherwise one is merely a machine learning "brick mover" doing rote work, not a qualified engineer, unable to solve problems when they arise, let alone optimize the system.

Second, the purpose of learning the basic theory is not only to learn how to build a machine learning system. More importantly, the theory embodies a set of ideas and ways of thinking, including probabilistic thinking, matrix-based thinking and optimization-based thinking. These ways of thinking are very helpful for data processing, analysis and modeling in today's big data era. If you lack them and still approach big data problems with old non-probabilistic, scalar thinking, both the efficiency and the depth of your thinking will be very limited.

Machine learning theory is very broad in both its core content and its reach, and a single article cannot cover it exhaustively. Here I therefore list some core points and introduce material that is useful in practical work, so that you can explore and learn further after mastering these basics.

4. Development languages and tools

Once we have mastered enough theory, we need adequate tools to put that theory into practice. In this section, we introduce some commonly used languages and tools.

5. Architectural design

Finally, let's spend some time on the architecture design of machine learning systems.

Machine learning system architecture refers to the overall system that supports machine learning training, prediction, and stable, efficient serving, together with the relationships among its components.

Once business scale and complexity grow to a certain point, machine learning inevitably moves toward systematization and platformization. At that stage you need to design an overall architecture suited to the characteristics of the business and of machine learning itself, covering the upstream data warehouse and data flows, model training, and online serving. Learning this kind of architecture is not as straightforward as the earlier topics, and there are few ready-made textbooks. It is mostly an abstract summary built on a large amount of practice, continually evolving and improving on existing systems. But it is undoubtedly the most worthwhile investment on an algorithm engineer's career path. The advice I can offer here is to practice more, summarize more, abstract more and iterate more.

6. Current state of the machine learning algorithm engineering field

It is fair to say that this is the best era for machine learning algorithm engineers, with strong demand for such talent in all walks of life. Typical fields include the following:

Recommendation systems. A recommendation system solves the problem of efficiently matching and distributing information when the volume of data is massive. In this process, machine learning plays an important role in candidate recall, result ranking and user profiling.

Advertising systems. Advertising systems have much in common with recommendation systems, but there are also significant differences. Besides the platform and its users, the interests of advertisers must be considered. Two parties become three, which makes some problems more complicated. The way machine learning is used is similar to recommendation.

Search systems. Machine learning is widely used both in the infrastructure of search systems and in upper-layer ranking. In many websites and apps, search is a very important traffic entry point, so optimizing the search system with machine learning directly affects the efficiency of the whole site.

Risk control systems. Risk control, especially for Internet finance, is another important battlefield for machine learning that has emerged in recent years. It is no exaggeration to say that the ability to use machine learning largely determines the risk control ability of an Internet finance company, and risk control ability is itself the core competitive strength that safeguards these companies' business. The relationship between the two speaks for itself.

But as the saying goes, "The higher the salary, the greater the responsibility", and companies' requirements for algorithm engineers are steadily rising. Overall, a high-level algorithm engineer should be able to handle the whole process of "data collection → data analysis → model training and tuning → model deployment", and to keep optimizing every link in it. An engineer may start with a single link of this process when getting started, and then steadily expand the range of his or her abilities.

Beyond the fields listed above, many traditional industries are also continually tapping machine learning to solve long-standing problems, and the industry's future holds great potential.