Basic terms and concepts of NLP-I

Words are the smallest meaningful language units that can function independently. English uses spaces as natural separators between words, whereas Chinese uses characters as its basic writing unit and has no explicit markers between words. Chinese word segmentation is therefore the foundation of, and the key to, Chinese information processing. Both languages need word segmentation, but English words can simply be split on spaces, which is comparatively easy. Because Chinese has no separators, segmentation is a far more important problem there.

Dictionary-based longest (maximum) string matching is commonly used for segmentation and is reportedly able to handle about 85% of cases, but ambiguous segmentation is harder. For example, the sentence 美国会通过对台售武法案 can be segmented as 美国/会/通过/对台/售武/法案 ("the US will pass the bill on arms sales to Taiwan") or as 美/国会/通过/对台/售武/法案 ("the US Congress passes the bill on arms sales to Taiwan").
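A minimal Python sketch of dictionary-based maximum matching follows. It assumes a tiny hypothetical dictionary (`DICTIONARY`) built only for the example sentence; real segmenters use far larger lexicons. The matcher is run in both directions, forward and backward maximum matching (two common variants), to show how the same dictionary produces the two competing segmentations.

```python
# Hypothetical toy dictionary covering only the example sentence.
DICTIONARY = {"美国", "美", "国会", "会", "通过", "对台", "售武", "法案"}
MAX_LEN = max(len(word) for word in DICTIONARY)


def forward_max_match(text, dictionary=DICTIONARY, max_len=MAX_LEN):
    """Scan left to right, always taking the longest dictionary word."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def backward_max_match(text, dictionary=DICTIONARY, max_len=MAX_LEN):
    """Scan right to left, always taking the longest dictionary word."""
    words, j = [], len(text)
    while j > 0:
        # Try the longest suffix ending at j first; fall back to one character.
        for size in range(min(max_len, j), 0, -1):
            candidate = text[j - size:j]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                j -= size
                break
    return words[::-1]  # words were collected right to left


sentence = "美国会通过对台售武法案"
print("/".join(forward_max_match(sentence)))   # 美国/会/通过/对台/售武/法案
print("/".join(backward_max_match(sentence)))  # 美/国会/通过/对台/售武/法案
```

Comparing the two directions is one simple heuristic for spotting an ambiguous span; deciding which reading is correct still needs context that the dictionary alone does not provide.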

Chinese word segmentation technology can be divided into three categories:

In machine-learning-based methods, words usually need to be tagged with their parts of speech. Parts of speech generally include verbs, nouns, adjectives, and so on. The tags represent the hidden states of the words, and the transitions between these hidden states form a state-transition sequence. For example, in 我/r 爱/v 北京/ns 天安门/ns ("I/r love/v Beijing/ns Tiananmen/ns"), r marks a pronoun, v a verb, and ns a place name (a kind of noun); r, v, and ns are the tags.
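Treating tags as hidden states linked by transitions is the view taken by a hidden Markov model (HMM). The Python sketch below decodes the example sentence with the Viterbi algorithm. It assumes a tag set limited to r, v, and ns, and all start, transition, and emission probabilities are hypothetical numbers chosen by hand, not estimated from a corpus; `parse_tagged` is a hypothetical helper that reads the word/tag annotation format.

```python
import math

# Hypothetical toy HMM parameters, chosen by hand for illustration only.
STATES = ["r", "v", "ns"]
START = {"r": 0.6, "v": 0.2, "ns": 0.2}
TRANS = {
    "r": {"r": 0.1, "v": 0.8, "ns": 0.1},
    "v": {"r": 0.3, "v": 0.1, "ns": 0.6},
    "ns": {"r": 0.1, "v": 0.4, "ns": 0.5},
}
EMIT = {
    "r": {"我": 0.9},
    "v": {"爱": 0.9},
    "ns": {"北京": 0.5, "天安门": 0.5},
}
UNSEEN = 1e-8  # floor probability for (tag, word) pairs missing from EMIT


def parse_tagged(sentence):
    """Split 'word/tag word/tag ...' into (word, tag) pairs."""
    return [tuple(tok.rsplit("/", 1)) for tok in sentence.split()]


def emit_logp(tag, word):
    return math.log(EMIT[tag].get(word, UNSEEN))


def viterbi(words):
    """Return the most probable tag (hidden-state) sequence for `words`."""
    # best[t][s]: highest log-probability of any tag path ending in s at step t
    best = [{s: math.log(START[s]) + emit_logp(s, words[0]) for s in STATES}]
    back = [{}]  # back[t][s]: predecessor state on that best path
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for s in STATES:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(TRANS[p][s])) for p in STATES),
                key=lambda item: item[1],
            )
            best[t][s] = score + emit_logp(s, words[t])
            back[t][s] = prev
    # Trace the back-pointers from the best final state.
    state = max(best[-1], key=best[-1].get)
    tags = [state]
    for t in range(len(words) - 1, 0, -1):
        state = back[t][state]
        tags.append(state)
    return tags[::-1]


pairs = parse_tagged("我/r 爱/v 北京/ns 天安门/ns")
words = [w for w, _ in pairs]
gold = [t for _, t in pairs]
print(viterbi(words))  # ['r', 'v', 'ns', 'ns']
```

Log-probabilities are used so that long sentences do not underflow; with these four words plain products would also work.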

As a generalization over words, part of speech plays an important role in tasks such as language recognition, syntactic analysis, and information extraction.