Mining synonyms, near-synonyms, and superordinate words for search engines
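In e-commerce search, synonyms fall into several categories:
1. Brand synonyms: nokia = Nokia, Adidas = Adidas (the Chinese and English names of the same brand).
2. Product synonyms: projector ≈ projector (two different Chinese terms for the same device), phone ≈ cell phone, automobile and car.
3. Old and new words: bicycle -> bicycle (an older and a newer term for the same thing).
4. Southern and northern words: tomato -> tomato (the southern and northern regional names for the same vegetable).
5. Conventional synonyms: locker and organizer.
6. Misspellings: "yoga" and a common misspelling of it, in which one character is wrongly written with the jade radical.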
English has a counterpart handled by stemming: singular and plural forms, the base form of a verb, and its -ing form. English also has a special phenomenon where a term can be written either as two separate words or merged into one, for example keychain and key chain.
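A minimal sketch of this English normalization, assuming a hypothetical `COMPOUNDS` vocabulary of closed compounds (for example, collected from product titles) and a deliberately crude plural stripper. A production system would use a real stemmer or lemmatizer, which also covers verb forms such as -ing.

```python
# Hypothetical vocabulary of closed compounds, e.g. collected from product titles.
COMPOUNDS = {"keychain", "backpack", "hardcase"}


def naive_stem(token: str) -> str:
    """Crude plural stripper for illustration only; verb forms such as -ing need a real stemmer or lemmatizer."""
    if len(token) > 4 and token.endswith("ies"):
        return token[:-3] + "y"          # batteries -> battery
    if token.endswith(("xes", "ches", "shes")):
        return token[:-2]                # boxes -> box, watches -> watch
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]                # cases -> case; glass stays glass
    return token


def normalize(query: str) -> list[str]:
    """Lowercase, merge split compounds ("key chain" -> "keychain"), and strip plurals."""
    tokens = query.lower().split()
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            # Try joining this token with the next one; keep the join only if it is a known compound.
            merged = naive_stem(tokens[i] + tokens[i + 1])
            if merged in COMPOUNDS:
                out.append(merged)
                i += 2
                continue
        out.append(naive_stem(tokens[i]))
        i += 1
    return out
```

With this, `normalize("Key Chains")` and `normalize("keychain")` both yield `["keychain"]`, so the two spellings hit the same index term.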
Near-synonyms are even more numerous, including two phrasings of "plus-size", shorts and hot pants, and border and borderline.
Superordinate words: the superordinate word (hypernym) of "Apple phone" is "cell phone".
Antonyms: loose and slim. Query rewriting must never substitute a word with its antonym.
Looking closely, some words can replace each other in both directions, while others can only be replaced in one direction. For example, "Jay Chou" can always be rewritten as his nickname "Zhou Dong", but "Zhou Dong" can be rewritten as "Jay Chou" only under certain circumstances.
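One way to keep these distinctions in a synonym dictionary is to store rewrites as directed edges and keep antonym pairs as an explicit blocklist. The class below is a hypothetical sketch, not a description of any particular engine's data structure.

```python
from collections import defaultdict


class SynonymDict:
    """Hypothetical rewrite store: two-way synonyms, one-way rewrites, and an antonym blocklist."""

    def __init__(self):
        self._rewrites = defaultdict(set)   # word -> words it may be rewritten to
        self._antonyms = set()              # unordered antonym pairs that must never be rewritten

    def add_two_way(self, a, b):
        self._rewrites[a].add(b)
        self._rewrites[b].add(a)

    def add_one_way(self, src, dst):
        self._rewrites[src].add(dst)

    def add_antonyms(self, a, b):
        self._antonyms.add(frozenset((a, b)))

    def rewrites(self, word):
        # Defensively drop antonyms, in case a noisy mined pair slipped into the rewrite table.
        return {w for w in self._rewrites.get(word, set())
                if frozenset((word, w)) not in self._antonyms}


d = SynonymDict()
d.add_two_way("phone", "cell phone")
d.add_one_way("Jay Chou", "Zhou Dong")   # the reverse direction needs context, so it is not stored
d.add_antonyms("loose", "slim")
d.add_one_way("loose", "slim")           # a bad mined pair; the antonym guard blocks it
```

Here `d.rewrites("Jay Chou")` returns `{"Zhou Dong"}`, while `d.rewrites("Zhou Dong")` and `d.rewrites("loose")` are empty.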
Synonyms can be mined from user queries, product titles, and search-click logs. The most fundamental source is still merchants' optimization of item titles: savvy merchants stack synonyms in their titles, hoping to attract more traffic.
In the click logs, if w1 and w2 are synonyms, then searches for w1 and searches for w2 should, in theory, produce a large number of clicks on the same items x1, x2, x3, and so on.
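A minimal sketch of this co-click signal, assuming the log is available as `(query, item_id)` pairs. Jaccard overlap is one possible measure, chosen here so that very popular queries do not dominate purely by volume. A raw count of shared clicked items would follow the text more literally.

```python
def co_click_similarity(click_log, q1, q2):
    """Jaccard overlap of the item sets clicked after searching q1 and after searching q2.

    click_log: iterable of (query, item_id) pairs (an assumed log format).
    """
    items = {}
    for query, item in click_log:
        items.setdefault(query, set()).add(item)
    a, b = items.get(q1, set()), items.get(q2, set())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


log = [
    ("phone", "x1"), ("phone", "x2"), ("phone", "x3"),
    ("cell phone", "x1"), ("cell phone", "x2"), ("cell phone", "x4"),
    ("tomato", "x9"),
]
```

On this toy log, "phone" and "cell phone" share two of four clicked items (similarity 0.5), while "phone" and "tomato" share none.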
Product titles provide a large corpus, for example the two terms for "projector", and "draw-bar box" (trolley case) and "suitcase" (luggage).
Train word relatedness with statistics or word2vec to find highly related words, then count how often these words occur together in titles, i.e., the co-occurrence count of w1 and w2.
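The co-occurrence count described above can be sketched as follows. This assumes titles are already segmented into space-separated words. Chinese titles would need a word segmenter first.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(titles):
    """Count, for every unordered word pair, how many titles contain both words.

    Keys are alphabetically sorted pairs, so look up ("case", "cover"), not ("cover", "case").
    """
    counts = Counter()
    for title in titles:
        words = sorted(set(title.lower().split()))  # dedupe so a repeated word counts once per title
        for w1, w2 in combinations(words, 2):
            counts[(w1, w2)] += 1
    return counts


titles = [
    "iPhone case cover shockproof",
    "phone case cover",
    "leather case holder",
    "case cover case",
]
counts = cooccurrence_counts(titles)
```

Here `counts[("case", "cover")]` is 3 and `counts[("case", "holder")]` is 1.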
from gensim.models import Word2Vec

# Load a pre-trained 50-dimensional English word2vec model
model_path = "./data/word2vec_en_50d.model"
model = Word2Vec.load(model_path)
model.wv['computer']
Out[6]:
array([-0.48867282, -0.10507897, -0.23138586, -0.10871041,  0.1514824 ,
       -0.01487145, -0.385491  ,  0.01792672, -0.32512784, -0.9063424 ,
       -0.5428677 ,  0.6565156 ,  0.02183418,  0.07939139,  0.03485253,
        0.319492  , -0.27633888,  0.52685845, -0.0582791 , -0.4844649 ,
        0.249212  ,  0.8144138 , -0.03233343, -0.36086813,  0.34835583,
       -0.07177112,  0.0828275 ,  0.6612073 ,  0.74526566, -0.12676844,
       -0.08891173, -0.08520225, -0.04619604,  0.13580324,  0.183159  ,
        0.15528682,  0.01727525, -0.43599448, -0.2579532 , -0.23192754,
       -0.32965428,  0.09547858,  0.00419413, -0.06285212,  0.18150753,
       -0.21699691,  0.60977536, -0.06555454,  0.35746607, -0.06610812],
      dtype=float32)
In[13]:
model.wv.similarity('case','cover') # case and cover are basically synonyms when describing phone cases
Out[13]:
0.8538678
In[22]:
def get_top_sim(word):
    # Print the 10 nearest neighbours of `word` with their cosine similarities
    similar_words = model.wv.most_similar(word, topn=10)
    for w, s in similar_words:
        print(word, "=", w, s)

get_top_sim('case')
case = holder 0.8879926800727844
case = clamshell 0.887456476688385
case = tablet 0.8748524188995361
case = storage 0.8703626990318298
case = carrying 0.8672872185707092
case = hardcase 0.8580055236816406
case = carring 0.8558304309844971
case = seal 0.8552369475364685
case = cover 0.8538679480552673
case = stand 0.8476276993751526
With word2vec, we find the 10 most similar words for each original word. We then count how many times ORIGIN and SUBSTITUTE (the original and the alternative word) appear together in titles. This mining produces a large number of candidate word pairs, which become synonym candidates after manual REVIEW.
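A minimal sketch of this candidate generation, assuming the neighbour lists have already been computed (for example as `{w: model.wv.most_similar(w, topn=10) for w in words}`) and that titles are space-separated. The `min_sim` and `min_cooc` thresholds are hypothetical values to be tuned, not values from the text.

```python
def synonym_candidates(neighbors, titles, min_sim=0.85, min_cooc=2):
    """Combine word2vec neighbours with title co-occurrence to produce pairs for manual review.

    neighbors: {origin: [(substitute, similarity), ...]}, the shape returned by most_similar.
    Returns (origin, substitute, similarity, cooccurrence) tuples.
    """
    title_sets = [set(t.lower().split()) for t in titles]
    out = []
    for origin, sims in neighbors.items():
        for substitute, sim in sims:
            if sim < min_sim:
                continue
            # Number of titles in which both words appear.
            cooc = sum(1 for ws in title_sets if origin in ws and substitute in ws)
            if cooc >= min_cooc:
                out.append((origin, substitute, sim, cooc))
    # Strongest evidence first: highest co-occurrence, then highest similarity.
    out.sort(key=lambda r: (-r[3], -r[2]))
    return out


neighbors = {"case": [("holder", 0.888), ("tablet", 0.875), ("cover", 0.854), ("stand", 0.848)]}
titles = ["phone case cover", "case cover shockproof", "tablet case", "case holder", "case holder stand"]
result = synonym_candidates(neighbors, titles)
```

On this toy data, "holder" and "cover" survive, "tablet" is dropped for low co-occurrence, and "stand" is dropped for low similarity. "holder" still needs a human to reject it, which is why the review step matters.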
Extending this slightly, we can also obtain mappings from a query to its synonymous queries.
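One simple way to go from word-level pairs to query-level pairs, assumed here rather than taken from the text, is to substitute one reviewed synonym at a time into a segmented query. The resulting query pairs could then be checked with the same co-click evidence used for words.

```python
def synonym_queries(query_tokens, synonyms):
    """Yield rewritten queries that replace exactly one token with one of its synonyms.

    synonyms: {word: set_of_substitutes}, an assumed, already-reviewed word-level table.
    """
    for i, token in enumerate(query_tokens):
        for sub in sorted(synonyms.get(token, ())):  # sorted for deterministic output
            yield query_tokens[:i] + [sub] + query_tokens[i + 1:]


rewrites = list(synonym_queries(
    ["apple", "phone", "case"],
    {"phone": {"cell phone", "mobile"}, "case": {"cover"}},
))
```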
Superordinate words can be found statistically: count the product words under each product category. If product word w is among the top n most frequent product words under category word c, then w -> c is likely a superordinate-word (hypernym) relation.
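A minimal sketch of this statistic, assuming each listed item yields one `(category_word, product_word)` pair. The value of `top_n` is a hypothetical tuning parameter.

```python
from collections import Counter, defaultdict


def hypernym_candidates(items, top_n=3):
    """For each category c, the top_n most frequent product words w give candidate relations w -> c.

    items: iterable of (category_word, product_word) pairs, one per listed item (assumed format).
    """
    per_category = defaultdict(Counter)
    for category, product_word in items:
        per_category[category][product_word] += 1
    pairs = []
    for category, counter in per_category.items():
        for word, _ in counter.most_common(top_n):
            if word != category:  # a word is not its own superordinate word
                pairs.append((word, category))
    return pairs


items = (
    [("cell phone", "apple phone")] * 5
    + [("cell phone", "huawei phone")] * 3
    + [("cell phone", "phone case")] * 1
    + [("luggage", "suitcase")] * 4
)
pairs = hypernym_candidates(items, top_n=2)
```

With `top_n=2`, "apple phone" -> "cell phone" is proposed, while the rare "phone case" under "cell phone" is not.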
When maintaining the synonym lists, do not forget the manually curated list, which should be maintained through backend tools. Synonyms are then applied in three places:
1. Index time: expand the index terms of each product title with their synonyms, so the item can be retrieved no matter which synonym the user searches with.
2. Query processing: in the QueryProcess module, expand query words with synonyms and rewrite them with near-synonyms. A rewritten near-synonym gets a lower weight than the original word. Rewriting raises a further problem: if Q (segmented into w1, w2, w3) is rewritten into q1 (w1, w2) and q2 (w2, w3), how should the relevance of q1 and of q2 to Q be calculated?
3. Context: synonym rewriting of a query needs some surrounding words as context. For example, "Zhou Dong's new song" can be rewritten as "Jay Chou's new song", but "Zhou Dong's company" is not necessarily Jay Chou's company.
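The three uses above can be sketched together, assuming segmented tokens, a reviewed synonym table, and a hypothetical `context_rules` table that lists the context words a rewrite requires. `SYNONYM_WEIGHT` is an illustrative value. The text only says rewritten terms should weigh less than original ones.

```python
# Hypothetical weight for rewritten terms; it only needs to be below the original's 1.0.
SYNONYM_WEIGHT = 0.6


def expand_index_terms(title_tokens, synonyms):
    """Index-time expansion: index each title token together with its synonyms."""
    terms = set(title_tokens)
    for t in title_tokens:
        terms |= synonyms.get(t, set())
    return terms


def rewrite_query(tokens, synonyms, context_rules):
    """Query-time expansion returning {term: weight}.

    context_rules: {(word, substitute): set_of_context_words}. A rewrite that has a rule
    is applied only if at least one of its context words appears elsewhere in the query.
    """
    weights = {t: 1.0 for t in tokens}
    for t in tokens:
        others = set(tokens) - {t}
        for sub in synonyms.get(t, set()):
            required = context_rules.get((t, sub))
            if required is not None and not (required & others):
                continue  # context check failed: skip this rewrite
            # If the substitute is also an original query term, keep its higher weight.
            weights[sub] = max(weights.get(sub, 0.0), SYNONYM_WEIGHT)
    return weights


synonyms = {"Zhou Dong": {"Jay Chou"}, "phone": {"cell phone"}}
rules = {("Zhou Dong", "Jay Chou"): {"song", "album", "concert"}}
```

With these tables, `["Zhou Dong", "new", "song"]` gains "Jay Chou" at weight 0.6, while `["Zhou Dong", "company"]` is left unrewritten.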