Technical methods of speech detection

Voice activity detection (VAD) locates the start and end points of speech in an audio signal and separates speech segments from non-speech segments (silence or noise). VAD algorithms fall roughly into three categories: threshold-based VAD, classifier-based VAD and model-based VAD.
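As an illustration of the threshold-based category, the following is a minimal sketch of an energy-threshold detector. The function names, the frame length of 160 samples, the threshold of 0.01 and the synthetic test signal are all assumptions chosen for the example, not values taken from any particular system; practical detectors usually add smoothing and an adaptive threshold.

```python
import math

def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies

def detect_voice_segments(samples, frame_len=160, threshold=0.01):
    """Return (start_sample, end_sample) pairs for runs of frames
    whose energy exceeds the (assumed, fixed) threshold."""
    segments = []
    seg_start = None
    for i, energy in enumerate(frame_energies(samples, frame_len)):
        active = energy > threshold
        if active and seg_start is None:
            seg_start = i * frame_len          # a voiced run begins
        elif not active and seg_start is not None:
            segments.append((seg_start, i * frame_len))  # the run ends
            seg_start = None
    if seg_start is not None:
        # The signal ended while still inside a voiced run.
        n_frames = len(samples) // frame_len
        segments.append((seg_start, n_frames * frame_len))
    return segments

# Synthetic signal at 8 kHz: 0.1 s silence, 0.2 s of a 200 Hz tone, 0.1 s silence.
rate = 8000
silence = [0.0] * 800
tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(1600)]
signal = silence + tone + silence

segments = detect_voice_segments(signal)
print(segments)
```

On this synthetic input the detector reports a single segment that spans the tone. A fixed threshold works here only because the "silence" is exactly zero; with real background noise the threshold would have to be estimated from the noise floor.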

Speech sound is the material shell of language: it is the external form of language and the symbol system that most directly records human thought. It is sound produced by the human vocal organs that carries social meaning. Its physical basis consists of pitch, intensity (loudness), duration and timbre, which are known as the four elements of speech sound.
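Three of these four elements can be measured directly from a sampled signal. The sketch below is only an illustration under assumed names and parameters: intensity is taken as the RMS amplitude, duration as the sample count divided by the sampling rate, and pitch is estimated with a simple autocorrelation search over an assumed range of 50 to 500 Hz. Timbre depends on the spectral shape of the sound and is not covered here.

```python
import math

def rms_intensity(samples):
    """Root-mean-square amplitude, a simple measure of intensity."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def duration_seconds(samples, rate):
    """Duration of the signal in seconds."""
    return len(samples) / rate

def estimate_pitch(samples, rate, fmin=50.0, fmax=500.0):
    """Estimate the fundamental frequency in Hz by picking the lag with
    the largest (unnormalised) autocorrelation within [fmin, fmax]."""
    min_lag = int(rate / fmax)
    max_lag = int(rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return rate / best_lag

# Synthetic test tone: 0.2 s of a 200 Hz sine at amplitude 0.5, sampled at 8 kHz.
rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(1600)]

pitch = estimate_pitch(tone, rate)
intensity = rms_intensity(tone)
length = duration_seconds(tone, rate)
print(pitch, intensity, length)
```

A pure sine wave is the easiest possible case for this estimator. Real speech has harmonics and noise, so practical pitch trackers normalise the autocorrelation and apply voicing decisions before trusting the result.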

Speech synthesis is a technology that produces artificial speech by mechanical or electronic means. Text-to-speech (TTS) technology is a form of speech synthesis: it converts text, whether generated by the computer itself or supplied from outside, into intelligible and fluent spoken output, such as spoken Chinese.

Speech sounds are the sounds that humans produce with their vocal organs, that carry meaning, and that are used in social communication. Of the three basic attributes of language (written form, sound and meaning), sound comes first, because human language originally took shape as speech. There are languages in the world without a writing system, but there is no language without speech sounds, so speech plays a decisive supporting role in language.