How to design the function of car voice software from scratch

As intelligent hardware becomes common in vehicles, more and more of them ship with speech recognition. At present, the most important function of the in-vehicle system is in-car voice. I have worked with in-car voice functions for many years. Here I want to share how to design a car voice function from scratch, from the perspective of voice service integration. Comments and discussion are welcome.

The main voice technology solution providers in the in-vehicle market include iFlytek, Nuance, Baidu, Spirits, and Yunzhisheng (Unisound). Tencent has promoted its voice service in the vehicle field less widely and less aggressively, so its service and quality currently lag behind the mainstream providers, although a stronger push in the future cannot be ruled out. Alibaba's voice service is used mainly in AliOS, which is currently mass-produced mainly on Roewe models.

Speech recognition ability. Note: at speeds below 80 km/h, the in-vehicle recognition rate in an enclosed cabin can be kept above 95%.

Speech synthesis ability. Note: how human-like TTS broadcasts sound depends heavily on R&D investment, and the actual experience varies widely.

Spoken-language and dialect understanding. Note: high robustness is the key; without it the voice function is of little practical value.

Semantic recognition. Note: for online service integration, the providers' ability to integrate resources and services is basically the same, with slight differences.

Multi-round dialogue. Note: some providers support multi-round dialogue in specific scenarios, but the current experience is not very good.

Market competition is fierce, there is still no clear business model, and every provider is in the capital-investment stage. Their feature sets are gradually converging, and their role is shifting from pure technology provider to technology platform and provider of complete solutions.

NIO (Weilai) NOMI voice assistant

For in-vehicle projects with average development capability and low customization requirements, voice services are usually integrated by doing secondary development on the voice SDK supplied by the solution provider, or by lightly customizing and adapting the provider's APK. This saves a great deal of development cost and protects the quality of the core voice service module.

Intelligent vehicle: a highly integrated system platform supports voice usage scenarios better and unifies voice, system, and vehicle, which produces a better linkage effect.

Intelligent rearview mirror: this is generally installed as an aftermarket product. Compared with the intelligent vehicle, the rearview mirror system is somewhat lighter. It gives more system resources to the driving recorder function, and the voice function handles only some simple tasks.

Intelligent HUD: the core resources go to projecting vehicle, road, and safety information during driving, with more attention paid to the quality of the visual presentation. Voice is an important auxiliary means of operation.

Car speaker companion: voice service is the core function of a speaker product, whether for the car or the home. Car speakers target scenes inside the cabin and focus on the user's interactive dialogue experience and the richness of in-car life services.

Business architecture diagram

This involves several roles: the self-built TSP platform, the voice service provider, hardware manufacturers, Internet service providers, and third-party hardware. In the overall business architecture, hardware is the carrier and the service platforms are integrated on top of it; the result is packaged and delivered to end car owners and users.

The key logical step when a car owner or user performs a voice operation is deciding whether the request goes to the self-built platform or to a third-party one. The data then has to be filtered, the corresponding service resources prepared, and the returned result executed.
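The routing decision described above can be sketched in a few lines. This is only an illustration under stated assumptions: the domain names, the `VoiceRequest` type, and the `route` and `filter_request` functions are hypothetical and do not come from any provider's SDK.

```python
from dataclasses import dataclass

# Hypothetical domain names: the requests the self-built platform serves.
SELF_BUILT_DOMAINS = {"vehicle_control", "custom_service"}


@dataclass
class VoiceRequest:
    domain: str  # semantic domain returned by recognition, e.g. "music"
    text: str    # recognized utterance


def filter_request(req: VoiceRequest) -> VoiceRequest:
    """Normalize the data before it is handed to any platform."""
    return VoiceRequest(req.domain.strip().lower(), " ".join(req.text.split()))


def route(req: VoiceRequest) -> str:
    """Return which platform should serve the request."""
    cleaned = filter_request(req)
    if not cleaned.text:
        # Nothing usable was recognized, so no platform is called.
        return "rejected"
    return "self_built" if cleaned.domain in SELF_BUILT_DOMAINS else "third_party"
```

In a real project the returned label would select a service client, and the result that client returns would then be executed on the head unit.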

If there is no customized self-built service resource platform, the business process can be adjusted accordingly. This flow chart is for reference only.

Car voice divides mainly into the following functional modules. Customized voice semantic functions are excluded, and the business-specific parts have been removed accordingly.

There are two main ways to start voice: tapping the interface and voice wake-up.

When designing the voice wake-up function, we determine and record the wake-up mode at the start step. Once the voice service has started, we show prompt information and feedback on the voice input status. During recognition, the main check is whether voice input is normal. If it is, we send the request to the backend and return the recognition result. If the input is interrupted, the voice stream has to be restarted.
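The flow above (record the start mode, prompt, recognize, restart on interruption) can be sketched as follows. All names here are assumptions made for illustration: `open_stream` and `recognize` stand in for whatever the chosen voice SDK provides, and `max_restarts` is a hypothetical limit that the article does not specify.

```python
from enum import Enum
from typing import Callable, List


class WakeMode(Enum):
    CLICK = "click"  # user tapped the voice button
    VOICE = "voice"  # user spoke the wake-up word


class AudioInterrupted(Exception):
    """Raised by the (hypothetical) audio source when the stream breaks."""


def run_session(
    mode: WakeMode,
    open_stream: Callable[[], bytes],
    recognize: Callable[[bytes], str],
    max_restarts: int = 1,
) -> List[str]:
    """Run one voice session and return a log of the steps taken."""
    # Record how the session was started, then show the listening prompt.
    log = [f"wake:{mode.value}", "prompt:listening"]
    for _ in range(max_restarts + 1):
        try:
            audio = open_stream()
        except AudioInterrupted:
            # Input was interrupted: note it and try opening the stream again.
            log.append("interrupted")
            continue
        # Input was normal: request the backend and return its result.
        log.append(f"result:{recognize(audio)}")
        return log
    log.append("error:input_unavailable")
    return log
```

Recording the wake-up mode in the log also makes it possible to compare later how often each start method is used.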

The semantic richness of car voice directly affects the experience of using the voice function. If too few semantics are supported, users will find the function too simple for their scenarios, lose interest in it, and stop using it. Defining the relationship between user satisfaction and semantic completeness requires user research and analysis, combined with experience gathered in actual work, to map requirements onto the product.

Navigation scene

Music/radio scene

Telephone scene

System control scene

Vehicle control scene

Customized service scene

As shown in the figure below, these scenarios can be subdivided further (business requirements have been removed, so please do not copy it mechanically). Many more semantic scenes exist, but the core functional scenes for in-car use are covered here. Others need to be customized according to the market demands for the vehicle. Maslow's hierarchy of needs can serve as a reference: classifying scene requirements by driving scenario helps guide the semantic design strategy.
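As a rough illustration of how the scenes listed above might be told apart, the sketch below maps an utterance to a scene with plain keyword matching. The keyword lists and the `classify_scene` function are hypothetical. A production system would rely on the semantic understanding service of the chosen provider rather than on substring checks, which can misfire on words that merely contain a keyword.

```python
# Hypothetical keyword table; scenes are checked in the order listed.
SCENE_KEYWORDS = {
    "navigation": ("navigate", "route", "nearest"),
    "music_radio": ("play", "song", "radio"),
    "telephone": ("call", "dial"),
    "system_control": ("volume", "brightness", "bluetooth"),
    "vehicle_control": ("window", "air conditioner", "sunroof"),
}


def classify_scene(utterance: str) -> str:
    """Return the first scene whose keywords appear in the utterance."""
    text = utterance.lower()
    for scene, keywords in SCENE_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return scene
    # No rule matched: let the caller show help or a fallback prompt.
    return "unsupported"
```

Utterances that fall through to `"unsupported"` are a useful signal for deciding which semantics to add next.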

Help: there are two main prompt scenarios. The first is the home page shown after voice wake-up (showing voice prompts globally is not recommended). The second is the prompt information given when voice recognition fails or is waiting for input, which guides users toward using the voice function correctly.

Settings: this covers the basic voice settings, such as the wake-up switch, the wake-up word, sound source logic, and changing or configuring voice theme packs.
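One possible way to hold these settings is a small immutable record, sketched below. The field names, the default values, and the rule that a wake-up word must not be empty are all assumptions. In particular, the article does not define what the sound source logic covers, so `sound_source_mode` is only a placeholder.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VoiceSettings:
    wake_enabled: bool = True          # the wake-up switch
    wake_word: str = "hello car"       # placeholder wake-up word
    sound_source_mode: str = "driver"  # assumed value, not defined by the article
    theme_pack: str = "default"        # currently selected voice theme pack


def change_wake_word(settings: VoiceSettings, new_word: str) -> VoiceSettings:
    """Return a copy of the settings with a validated wake-up word."""
    word = new_word.strip()
    if not word:
        raise ValueError("wake-up word must not be empty")
    return replace(settings, wake_word=word)
```

Returning a new object instead of mutating the old one keeps the previous settings available if the change has to be rolled back.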

As the last step in the interaction, voice broadcast is the part of the function that users experience most directly. TTS (Text To Speech) uses AI technology to synthesize text into speech and relay it to users, giving them an intelligent, human-like interactive experience.

At present, most AI speech synthesis in industry solutions works by post-processing recorded base speech material, so it remains constrained by those recordings, and the variety, number, and quality of voice packs improve only slowly. Because the TTS function is limited by the overall capability of the service provider, product design for function integration should focus more on improving the experience of the voice dialogue itself.

Human-machine dialogue has three main goals. The first is the exchange of information. The second is that the dialogue should be pleasant for the user. The third is good guidance and error-avoidance strategies throughout the dialogue: do more for users when their need can be met, and use indirect strategies to soften the experience when it cannot.
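The guidance and error-avoidance idea can be sketched as a reply function that either confirms the action or redirects the user to something that does work. The `respond` function, its parameters, and the reply wording are hypothetical and only illustrate the strategy.

```python
from typing import Dict, List


def respond(intent: str, supported: Dict[str, str], suggestions: List[str]) -> str:
    """Build the spoken reply for a recognized intent."""
    if intent in supported:
        # The need can be met: confirm what is being done.
        return f"OK, {supported[intent]}."
    if suggestions:
        # The need cannot be met: steer the user toward supported commands.
        hints = "; ".join(suggestions[:2])
        return f"Sorry, I can't do that yet. You could try: {hints}."
    return "Sorry, I can't do that yet."
```

Limiting the reply to two suggestions keeps the broadcast short, which matters while the user is driving.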

Overall, in-car voice products are not yet very mature, and the application of artificial intelligence is still at an early stage. Using AI technology to improve service quality, using NLP technology to improve the multi-round interactive experience, and enriching platform content resources will all take more resources and considerable time.

This article has introduced the main function design and usage scenario analysis for car voice software, mainly to help you understand car voice better. Making the voice function improve user experience and satisfaction more effectively requires deeper, differentiated research and design based on each business's actual needs and the characteristics of its target users.