
Knowledge Graph Completion

Foreword and background: When constructing a knowledge graph, much of the knowledge comes from documents and web pages, and the extraction process often introduces deviations. These deviations come from two sources: (1) documents contain a great deal of noise, i.e. useless information, which may stem from the extraction algorithm itself or from ambiguities inherent in the language; (2) the information in documents is limited and never covers all knowledge, especially a great deal of common-sense knowledge.

Both factors leave the knowledge graph incomplete, so knowledge graph completion is becoming increasingly important in knowledge graph construction.

Discussions of knowledge graph construction often mention only the extraction of entities and relations, from which RDF triples of entities and relations are generated.

However, obtaining triples is not enough. Beyond their attributes and relations, the entities in triples can be mapped to types associated with a hierarchy of knowledge concepts, and a single entity can have multiple types.

For example, the entity Obama has different types in different relations.

In a description of his birth, the type is Person; in a description of writing a memoir, it can be Writer; in a description of his office, it can be Politician.

Here, the concepts Person, Writer, and Politician form levels: this is a hierarchical model of concepts.

As the example shows, once an entity is identified as type Person, we still need to search the concepts below Person in the hierarchy in order to find more specific category information.
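The search for lower concepts can be sketched as a traversal of the concept hierarchy. The hierarchy below is a toy example invented for illustration, not data from any real knowledge graph:

```python
# Hypothetical concept hierarchy: each concept maps to its direct subtypes.
HIERARCHY = {
    "Person": ["Writer", "Politician"],
    "Writer": ["Novelist"],
    "Politician": [],
    "Novelist": [],
}

def lower_concepts(concept):
    """All concepts below `concept` in the hierarchy (depth-first)."""
    out = []
    for child in HIERARCHY.get(concept, []):
        out.append(child)
        out.extend(lower_concepts(child))
    return out

print(lower_concepts("Person"))  # → ['Writer', 'Novelist', 'Politician']
```

Once an entity is typed as Person, each of these lower concepts is a candidate for a more specific type.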

Ontology and schema: every entity can belong to an ontology, and an ontology has a set of patterns that characterize it uniquely. These patterns can be described by rules, so an ontology can be described by such a set of rules.

For example, Obama is an entity whose ontology can be attributed to Person, and the Person pattern includes properties such as using language and tools to transform other things. Since these patterns can be described by rules, rule-based reasoning methods grounded in description logic emerged.

Description logic is a common form of knowledge representation based on concepts and relations.

For example, you can collect entity instances about people (which can be text), extract patterns from them, and record the patterns as rules. Then, whenever a new entity instance is encountered, it only needs to be matched against the recorded rules: if it satisfies them, the instance can be classified under the concept Person; otherwise it is judged not to belong to that concept.
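A minimal sketch of this rule-based type inference, where the rules, attribute names, and the entity record are all invented for illustration:

```python
# Hypothetical rules: each concept type maps to a predicate over an
# entity's attribute dictionary.
RULES = {
    "Person": lambda e: "birth_date" in e,
    "Writer": lambda e: "birth_date" in e and "works" in e,
    "Politician": lambda e: "birth_date" in e and "office" in e,
}

def infer_types(entity):
    """Return every concept type whose rule the entity satisfies."""
    return [t for t, rule in RULES.items() if rule(entity)]

obama = {
    "birth_date": "1961-08-04",
    "works": ["Dreams from My Father"],
    "office": "President of the United States",
}

print(infer_types(obama))  # → ['Person', 'Writer', 'Politician']
```

A new instance that lacks a `birth_date` attribute would match no rule and be judged outside all three concepts.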

After the stage of rule-based reasoning grounded in description logic, machine learning research became mainstream. At this point, type prediction no longer relies only on internal clues such as rules derived from instances, but also learns from external features and clues.

For an entity e1 of unknown type, if a similar entity e2 of known type can be found, we can infer that the type of e1 should be the same as, or at least similar to, that of e2.
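This similarity-based inference can be sketched with cosine similarity over feature vectors. The vectors and type labels below are hand-made toy values, not learned features:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Known entities: invented feature vectors paired with known types.
known = {
    "Merkel":  ([0.9, 0.1, 0.8], "Politician"),
    "Rowling": ([0.1, 0.9, 0.2], "Writer"),
}

def infer_type(vec):
    """Assign the type of the most similar known entity."""
    best = max(known, key=lambda name: cosine(vec, known[name][0]))
    return known[best][1]

e1 = [0.8, 0.2, 0.7]   # unknown-type entity, closest to Merkel
print(infer_type(e1))  # → Politician
```

Real systems replace the hand-made vectors with content, link, or statistical features, which is exactly the split into the three directions described next.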

Such methods fall into three directions: content-based type inference, link-based type inference, and type inference based on statistical relational learning (such as Markov logic networks).

Embedding learning and deep learning have also been introduced into type inference. Most machine-learning-based type inference methods assume the data is noise-free, and their features still need to be manually selected and designed; introducing deep learning avoids this feature engineering. Type inference should draw on text content but also needs the support of other features such as link structure, and here embedding methods can play to their strengths.

Concretely, for a triple (S, P, O) (subject-predicate-object), the possible missing elements are (?, P, O), (S, ?, O), or (S, P, ?). When such a triple is absent from the knowledge base, we need to predict the missing entity or relation.

Note: sometimes knowledge is not missing but new, that is, a new triple appears that was unknown to the original knowledge base. It then needs to be added to the knowledge base as new knowledge, but this case is not completion in the traditional sense.

① Structured embedding representations

② Neural tensor network methods

③ Matrix factorization methods

④ Translation-based methods
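As a sketch of the translation family (④), a TransE-style model scores a triple (s, p, o) as plausible when s + p ≈ o in the embedding space; a missing tail (S, P, ?) is filled by ranking candidates. The vectors below are hand-picked toy values, not trained embeddings:

```python
import math

# Invented 2-d "embeddings" chosen so that Obama + born_in ≈ USA.
emb = {
    "Obama":   [1.0, 0.0],
    "USA":     [2.0, 1.0],
    "Hawaii":  [1.5, 1.2],
    "born_in": [1.0, 1.0],
}

def score(s, p, o):
    """Negative L2 distance ||s + p - o||; higher means more plausible."""
    return -math.sqrt(sum((emb[s][i] + emb[p][i] - emb[o][i]) ** 2
                          for i in range(2)))

def predict_tail(s, p, candidates):
    """Fill (s, p, ?) by ranking candidate tail entities."""
    return max(candidates, key=lambda o: score(s, p, o))

print(predict_tail("Obama", "born_in", ["USA", "Hawaii"]))  # → USA
```

The same scoring function ranks candidate heads for (?, P, O); in a real system the embeddings are trained so that observed triples score higher than corrupted ones.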

There are also cross-knowledge-base completion methods, completion methods based on information retrieval techniques, and common-sense knowledge completion within the knowledge base.

Several challenges remain open. (1) The sparsity of long-tail entities and relations.

There are many instances of relations involving celebrities, but few instances for any given ordinary person; ordinary people are numerous, yet each has only sparse relation instances, and this situation becomes more pronounced as the scale grows.

(2) One-to-many, many-to-one, and many-to-many relations between entities.

For large-scale data, the scale is not tens or hundreds but many orders of magnitude larger. Traditional solutions are ineffective, and even deep learning struggles with relation learning at this scale.

(3) The dynamic addition and change of triples intensifies the dynamic evolution of the KG.

New knowledge is constantly produced, and earlier knowledge may later prove wrong or need revision; all of this means the completion process itself must be revised and updated. Making knowledge graph completion techniques adapt to the dynamic evolution of the KG is increasingly important, but this problem has not yet received enough attention.

(4) The path length of relation prediction in the KG will keep growing.

The reasoning length of relation prediction is limited, but as a knowledge graph expands to large scale, the relation-path sequences between entities become longer and longer, which calls for more efficient models to handle more complex relation prediction.

References:

Shark Wang, Du Zhijuan, Meng Xiaofeng. Research progress of large-scale knowledge graph completion technology [J]. Scientia Sinica Informationis, 2020, 50(04): 551-575.