Technical field of analyzing and processing text features

Application scenarios:

1. Property insurance: automatic filling and review of insurance applications, claim forms, medical certificates, and contracts.

2. Healthcare: key information extraction from public relations releases, and review of prescriptions, medical papers, and drug instructions.

3. Retail: comparison of product descriptions, error correction for product packaging, and information extraction from transport documents.

4. Manufacturing: processing of invoices, purchase and sales orders, and transport logistics sheets, plus contract review.

Key technologies:

Text is described with the vector space model, which converts unstructured text into a structured, numerical representation.
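
As a rough illustration of the vector space model, the sketch below maps each text to a vector of term counts over a shared vocabulary and compares texts by cosine similarity. The sample documents, the whitespace tokenization, and the helper names `to_vector` and `cosine` are assumptions made for this example; they do not come from the original text.

```python
import math
from collections import Counter


def to_vector(text, vocabulary):
    """Map a text to term counts over a fixed vocabulary (one dimension per term)."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy documents, for illustration only.
docs = [
    "claim form for property insurance",
    "insurance claim form review",
    "product packaging description",
]

# The vocabulary fixes the dimensions of the vector space.
vocabulary = sorted({word for doc in docs for word in doc.lower().split()})
vectors = [to_vector(doc, vocabulary) for doc in docs]

similar = cosine(vectors[0], vectors[1])    # the two insurance texts share terms
unrelated = cosine(vectors[0], vectors[2])  # these two texts share no terms
```

Once texts are vectors, standard classification and clustering methods can operate on them directly, which is the point of the conversion from unstructured to structured form.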

Why not simply segment the text into words and use the word frequencies directly? Because the feature vectors obtained this way have very high dimensionality (one dimension per distinct word), which makes later vector processing expensive and hampers classification and clustering.
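
The dimensionality problem can be seen even on a tiny corpus. In the sketch below, the vector dimension equals the number of distinct words, so it keeps growing as documents are added. The sample documents and the function name `vocabulary_size` are hypothetical, and whitespace splitting stands in for a real word segmentation step.

```python
def vocabulary_size(docs):
    """Number of distinct words, i.e. the dimension of a raw word-frequency vector."""
    return len({word for doc in docs for word in doc.lower().split()})


# Hypothetical toy documents, for illustration only.
docs = [
    "invoice number and total amount",
    "purchase order and delivery date",
    "transport waybill with sender address",
    "contract clause on late payment",
]

# Dimension of the vector space after 1, 2, 3, and 4 documents.
sizes = [vocabulary_size(docs[:n]) for n in range(1, len(docs) + 1)]
```

On a realistic corpus the vocabulary runs to tens of thousands of words or more, which is why a smaller set of feature words is preferred.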

The mainstream approach is to represent a text by a set of feature words. These must meet four requirements: they identify the content of the text, they distinguish it from other texts, they are few in number, and they are easy to extract.

Once the feature words are selected, each should carry a weight that expresses how strongly it influences the representation, and the feature words are best sorted by weight.
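
One common way to meet these requirements is TF-IDF weighting: a word scores high when it is frequent in one text but rare across the others. The text above does not name a weighting scheme, so TF-IDF is an assumption here, as are the sample documents, the whitespace tokenization, and the function name `top_feature_words`. A minimal sketch:

```python
import math
from collections import Counter


def top_feature_words(docs, k=3):
    """Return, per document, up to k (word, weight) pairs sorted by descending TF-IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    result = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Weight = relative term frequency * log inverse document frequency.
        weights = {
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        }
        # Sort by weight (highest first); break ties alphabetically.
        ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
        result.append(ranked[:k])
    return result


# Hypothetical toy documents, for illustration only.
docs = [
    "insurance claim form claim review",
    "insurance contract review",
    "product packaging error product description",
]
features = top_feature_words(docs, k=3)
```

Keeping only the top `k` words per text keeps the representation small, and the weights give the ordering the last requirement asks for. A word that appears in every document gets a weight of zero under this formula, so it never distinguishes one text from another.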