
What is the application of artificial intelligence in TV?

The application of artificial intelligence in TV human-computer interaction

Whether traditional TV or smart TV, the problem to be solved is the same: how to let users get content conveniently. There are two key points here, "convenience" and "content". In terms of convenience, traditional TV and Internet TV are alike: both rely on a remote control for human-computer interaction. Content is the biggest difference between traditional TV and Internet TV, and needs no elaboration. The rapid development of artificial intelligence technology has greatly improved the user experience on both points.

Keywords: artificial intelligence; human-computer interaction; deep learning; far-field speech; natural language processing (NLP); automatic speech recognition (ASR)

The concept of "artificial intelligence (AI)" appeared in 1956, but, limited by the computing power and algorithm theory of the time, it was not applied in everyday life, and few people knew of it. With the development of GPU capability and deep learning theory, artificial intelligence has finally moved from laboratory theory into production and has made rapid progress in many fields. Internet TV is one of them.

Before discussing the application of artificial intelligence in TV, we need to clarify some basic concepts. Artificial intelligence refers to intelligence displayed by man-made machines. Such intelligence may simulate human thinking, or it may be completely different from it. At present, research centers on "learning by itself as people do". Machine learning is a branch of artificial intelligence, and deep learning is a branch of machine learning. The study of entirely non-human ways of thinking remains a philosophical question.

Whether it is traditional TV or Internet TV, the problem to be solved is the same: how to let users get content conveniently. There are two key points here, "convenience" and "content". In terms of convenience, traditional TV and Internet TV are alike: both rely on a remote control for human-computer interaction. Content is the biggest difference between traditional TV and Internet TV, and needs no elaboration. The rapid development of artificial intelligence technology has greatly improved the user experience on both points.

First, convenience

Because artificial intelligence technology can now reach a 9% intention recognition rate in natural language processing, it has become possible to control a TV and obtain content directly in natural language. The emphasis here is on "natural language": a request such as "Give me some European gangster movies on the same level as The Godfather" is natural language, unlike the "machine language" commands, such as "increase the volume by 2%", that some brand manufacturers often use. Understanding and responding to natural language is one of the key indicators of a TV set's level of artificial intelligence.

TVs with voice remote controls from a few years ago cannot be called artificial intelligence TVs, mainly because they could only recognize fixed instructions. An artificial intelligence TV can not only understand natural language but also learn online by itself, so that it generalizes to more user intentions by analogy and becomes more accurate the more it is used.

In dialogue, human beings automatically carry context along. For example, a user first asks, "Are there any good movies?", and may then say, "No Hollywood movies" or "Only this year's movies". Dialogue that depends on context in this way is called multi-round dialogue. Whether a TV supports multi-round dialogue is also a key indicator of its level of artificial intelligence.
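The idea of carrying context from one round to the next can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular TV implements it: the `DialogueState` class, the shape of the `intent` dictionaries (standing in for the output of an NLP component) and the catalog entries are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DialogueState:
    """Constraints accumulated over the turns of one conversation."""
    content_type: Optional[str] = None
    excluded_regions: set = field(default_factory=set)
    year: Optional[int] = None


def apply_turn(state, intent):
    """Merge the intent parsed from one utterance into the running state.

    `intent` stands in for the output of an NLP component (assumed here to
    be a plain dict); keys that are absent leave earlier constraints intact.
    """
    if "content_type" in intent:
        state.content_type = intent["content_type"]
    state.excluded_regions |= set(intent.get("exclude_regions", ()))
    if "year" in intent:
        state.year = intent["year"]
    return state


def search(catalog, state):
    """Return titles that satisfy every constraint collected so far."""
    return [
        item["title"]
        for item in catalog
        if (state.content_type is None or item["type"] == state.content_type)
        and item["region"] not in state.excluded_regions
        and (state.year is None or item["year"] == state.year)
    ]


# Made-up catalog entries, for illustration only.
CATALOG = [
    {"title": "Film A", "type": "movie", "region": "Hollywood", "year": 2017},
    {"title": "Film B", "type": "movie", "region": "Europe", "year": 2017},
    {"title": "Film C", "type": "movie", "region": "Europe", "year": 2015},
]

state = DialogueState()
for intent in (
    {"content_type": "movie"},           # "Are there any good movies?"
    {"exclude_regions": ["Hollywood"]},  # "No Hollywood movies"
    {"year": 2017},                      # "Only this year's movies"
):
    apply_turn(state, intent)
    print(search(CATALOG, state))
```

Each round narrows the previous result instead of starting a new search, which is the behavior the user expects from a follow-up such as "No Hollywood movies".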

In addition to semantic understanding, convenience is also reflected in far-field sound pickup. With it, users no longer need to "hold down and talk" into the remote control; they can call out to the TV from anywhere in the living room. Typical scenes: "Storm ears, what good movies are recommended recently?" "How do I make fish-flavored shredded pork?" "Remind me to go to the airport at 7:00 tomorrow morning."

Figure 1 Voice-invoked services on Storm TV

Far-field sound pickup is realized by a microphone array, which remained a laboratory research subject until Amazon launched the Echo smart speaker and brought it into mass production. A microphone array needs at least two microphones, and 4-mic, 6-mic and even 8-mic schemes are currently on the market.

The array can pick out the distinctive waveform of the user's speech from background noise and, through beamforming, receive sound accurately from the direction of the user's position while suppressing noise from other directions. Manufacturers choose different microphone arrays according to the characteristics of the device. Generally speaking, TVs use linear arrays and smart speakers use circular arrays.
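The simplest form of beamforming is delay-and-sum, and a toy version may make the idea concrete. The sketch below assumes a four-microphone linear array and made-up integer sample delays; real products use fractional delays derived from the array geometry and more elaborate adaptive algorithms, so this is an illustration of the principle rather than any vendor's implementation.

```python
import math


def simulate(source, delays, length):
    """Recordings of `source` as it reaches each microphone `delays` samples late."""
    return [
        [source[t - d] if t - d >= 0 else 0.0 for t in range(length)]
        for d in delays
    ]


def delay_and_sum(channels, delays):
    """Undo the target's per-microphone delay, then average the channels.

    Sound from the steered direction lines up and adds coherently; sound
    from other directions is misaligned and partly cancels.
    """
    n = len(channels[0]) - max(delays)
    return [
        sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
        for t in range(n)
    ]


def mean_square_error(output, reference, skip=3):
    """Mean squared difference, ignoring the first `skip` start-up samples."""
    pairs = list(zip(output, reference))[skip:]
    return sum((o - r) ** 2 for o, r in pairs) / len(pairs)


# Assumed geometry: the user is toward one end of the array and an
# interfering source toward the other, so the delays are mirrored.
TARGET_DELAYS = [0, 1, 2, 3]
INTERFERER_DELAYS = [3, 2, 1, 0]
LENGTH = 64

tone = [math.sin(2 * math.pi * t / 8) for t in range(LENGTH)]
speech = simulate(tone, TARGET_DELAYS, LENGTH)
noise = simulate(tone, INTERFERER_DELAYS, LENGTH)
mixed = [[s + n for s, n in zip(cs, cn)] for cs, cn in zip(speech, noise)]

steered = delay_and_sum(mixed, TARGET_DELAYS)
print(mean_square_error(mixed[0], tone))  # one microphone alone
print(mean_square_error(steered, tone))   # array steered at the user
```

The delays and the tone period were chosen so that the interfering source cancels almost completely in this contrived setup. In a real room, with broadband speech and reverberation, sound from other directions is only attenuated.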

Figure 2 The two main layouts of microphone arrays

The author has been following the development of the Amazon Echo. Actual experience shows that pure voice interaction still has major shortcomings, and that far-field voice works better when applied to a TV. For example, users often do not know what to say to a smart speaker that has no display at all. Facing a TV with a big screen, they are much less anxious, because the screen keeps reminding them what they can currently say to operate the TV. Google calls this interaction mode "visual feedback" and applies it in the latest "Google Assistant for Android TV" system, released in October this year. The interaction mode of Storm AI TV is currently similar. Amazon has also recognized the problem and soon launched the "Echo Show", which has a screen, as a supplement.

Figure 3 Visual feedback tips of Storm AI TV

The latest technology can not only recognize human speech but also distinguish the voiceprints of different people, enabling more advanced operations such as shopping, payment and personalized recommendation. Amazon and Google abroad, and iFLYTEK and Ruoqi in China, all have this technology. Natural language understanding and far-field voice processing will eventually free TV users from the remote control, a leap in human-computer interaction no smaller than that of Apple's iPhone, which has no keyboard, only a touch screen.

Next, content

Beyond natural language understanding, artificial intelligence is in fact used even more widely in personalized content recommendation. AI can infer the user's unspoken intent from large numbers of conversations and behaviors, understand the user's preferences and habits, and then proactively recommend content the user may like. Sometimes the system recommends a type of content the user has never encountered, and the user exclaims, "This is wonderful", without having realized that this kind of content would appeal to him. This kind of intelligent recommendation is widely used in Internet products; a typical example is Toutiao (Today's Headlines).

Traditional personalized content recommendation is mainly based on a tag system. First, operators must label all the content by hand with tags such as "horror", "gore", "anime" and "urban"; the workload is enormous, and accuracy depends entirely on the operators' skill. Then the system builds a profile of each user from the user's behavior and extracts tags to match against the content. Various specialized recommendation algorithms grew out of this process, with engineers continually tuning parameters to improve the open rate.
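The tag-matching step of a traditional recommender can be sketched as follows. The titles, tags and scoring rule are hypothetical; a production system would weight tags, decay old behavior and tune many parameters, so treat this as a minimal illustration of "build a user profile, then match tags".

```python
from collections import Counter

# Hand-assigned tags, as an operator might enter them (made-up data).
ITEM_TAGS = {
    "Title A": {"horror", "gore"},
    "Title B": {"horror", "urban"},
    "Title C": {"anime"},
    "Title D": {"urban", "anime"},
    "Title E": {"gore"},
}


def build_profile(watched, item_tags):
    """User profile: how often each tag occurs in what the user watched."""
    profile = Counter()
    for title in watched:
        profile.update(item_tags[title])
    return profile


def recommend_by_tags(profile, item_tags, watched, top_n=3):
    """Rank unwatched items by the summed profile weight of their tags."""
    scores = {}
    for title, tags in item_tags.items():
        if title in watched:
            continue
        score = sum(profile[tag] for tag in tags)
        if score > 0:
            scores[title] = score
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


watched = {"Title A"}
profile = build_profile(watched, ITEM_TAGS)
print(recommend_by_tags(profile, ITEM_TAGS, watched))
```

The weakness described above is visible even here: the result is only as good as the hand-entered `ITEM_TAGS`.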

The AI-based personalized recommendation system both differs from and relates to the traditional one. The biggest difference is the tag system. The "tags" in an AI recommendation system are extracted automatically from records such as content and behavior logs, without operator involvement. For example, attribute tags for a film can be extracted from its metadata (synopsis, director, cast and so on), and attribute tags for a user from the user's Weibo posts and Douban comments. GPUs then carry out large-scale matrix operations that gradually reduce the high-dimensional vector data to three-dimensional space, and recommendations are finally made according to how the points cluster in that space. The principle is similar to "collaborative filtering" in traditional recommendation systems: simply put, if a person likes a movie, his close friends may like that movie too.
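As a rough illustration of the collaborative filtering principle mentioned here, the sketch below uses user-based collaborative filtering with cosine similarity. This is a swapped-in textbook technique, not the GPU-based dimensionality reduction pipeline described above, and the user names, film titles (other than The Godfather) and ratings are made up.

```python
import math

# Hypothetical ratings on a 1-5 scale.
RATINGS = {
    "alice": {"The Godfather": 5, "Film X": 4},
    "bob": {"The Godfather": 5, "Film X": 5, "Film Y": 5},
    "carol": {"Film Z": 5, "Film W": 4},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors (dicts)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def recommend_for_user(target, ratings, top_n=2):
    """Score items the target has not seen by similar users' ratings."""
    seen = ratings[target]
    scores = {}
    for other, theirs in ratings.items():
        if other == target:
            continue
        similarity = cosine(seen, theirs)
        if similarity <= 0:
            continue  # users with nothing in common contribute nothing
        for item, rating in theirs.items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


print(recommend_for_user("alice", RATINGS))
```

Here "bob" has tastes close to "alice", so the film only he has rated is suggested to her, while "carol", who shares no rated films with her, has no influence.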

Personalized recommendation, with "a thousand faces for a thousand users", in turn drives changes in the TV interface.

Traditional TV organizes channels around the "program schedule": miss a program and you wait for the next airing. Internet TV is entirely on-demand: a huge amount of content is there to watch at will, but if you cannot find it, that is your problem. AI-based smart TV turns the traditional "people looking for content" into "content looking for people": AI brings the dishes you might like to your table and lets you taste before you buy. "Tasting" means previewing highlight scenes from the full film, using short clips to guide users to long films, which makes choosing easier and saves users' time. Note that a highlight scene is not merely a promotional trailer; how to choose the scenes is a subject in itself and deserves a separate article.

Greater possibilities

The application of artificial intelligence in TV is not limited to human-computer interaction and video recommendation; it can be used to recommend any content service. As mentioned earlier, far-field voice has changed the TV's human-computer interaction mode, so the TV interface is no longer bound by the tree menu structure of traditional TV. It can accommodate more content services, and users can reach them directly.

Typical uses of a TV with artificial intelligence include:

● "Help me find a classic art-house film from the 1980s";

● "Play a few Jay Chou songs";

● "Buy some Sanyuan milk like the one I bought last time", "Yes" and "Buy two more boxes";

● "How do I get to Dayanpear?", "Yes, the nearest one";

● "Remind me to turn off the stove in half an hour";

● "Good night (turn off the smart electrical equipment at home and put the TV to sleep)".

It can be seen that a TV set with artificial intelligence goes far beyond traditional TV sets in how it is used and what it is used for. It can help users choose content and services, control smart home appliances, keep reminders, and even place orders for daily necessities. These are not imagined scenes; they have already become reality.

The TV is still a TV, but it is no longer just a TV. It has become the big-screen terminal of a family assistant, and the brain of this "family assistant" is artificial intelligence.