
Present situation and development of network information retrieval

Before 1990, no one could retrieve information on the Internet. All network information retrieval tools can be said to descend from Archie, invented by Alan Emtage and others in 1990, although at that time it could perform only simple FTP file retrieval. With the appearance and growth of the World Wide Web, information retrieval tools based on web pages appeared and developed rapidly. In 1995, Erik Selberg of the University of Washington invented the meta-search engine, a retrieval tool built on top of other network retrieval tools. As network technology has advanced, network information retrieval tools have matured as well. What, then, is the current situation and development trend of these retrieval tools? This article discusses that question.

1. Present situation and development trend of network information retrieval tools based on web pages

1.1 Status quo. Web pages are the most important part of the Internet and the main source from which people obtain network information. Tools that help people find what they need among huge numbers of complex web pages have therefore developed the fastest. Broadly, there are two kinds of web-page-based retrieval tools: network search engines and network classified directories. A network search engine uses automatic crawling software such as a "Web Spider" to collect web pages, then automatically indexes part or all of the text on those pages, producing summary records in a network-accessible database that people can search. Web directories work completely differently. Instead of collecting every page of every website on the network, they rely on professional editors who carefully select website homepages and file them under the appropriate categories. A directory holds far less information than a search engine, and directories use differing classification schemes that can be confusing and inconvenient. So although a directory's index quality is relatively high, far fewer people use it than use search engines.
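The search-engine workflow described above (a spider collects pages, an indexer records their text, users query the resulting database) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any engine named in this article works: the in-memory `TOY_WEB` dictionary stands in for real HTTP fetching, and `crawl`, `build_index` and `search` are hypothetical names.

```python
import re
from collections import defaultdict, deque

# Hypothetical toy "web": URL -> (page text, outgoing links). Stands in for HTTP.
TOY_WEB = {
    "http://a.example/": ("Network information retrieval tools", ["http://b.example/"]),
    "http://b.example/": ("FTP search engines index anonymous FTP files",
                          ["http://a.example/", "http://c.example/"]),
    "http://c.example/": ("Meta-search engines query other search engines", []),
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def crawl(seed, fetch, max_pages=100):
    """Breadth-first 'spider': fetch pages from seed, following unseen links."""
    seen = {seed}
    queue = deque([seed])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        page = fetch(url)          # returns (text, links) or None if unreachable
        if page is None:
            continue
        text, links = page
        pages[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Inverted index: term -> set of URLs whose text contains that term."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every query term (simple AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

For example, `search(build_index(crawl("http://a.example/", TOY_WEB.get)), "search engines")` returns the two pages that mention both terms. A web directory, by contrast, would replace the spider and indexer with hand-curated category lists.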

However, because network information is complex and network retrieval technology is limited, these tools also have obvious shortcomings. (1) As the number of web pages grows rapidly, they can no longer be classified, indexed and used effectively by hand. Internet users face a mass of unorganized information with only simple keyword search, and the number of results returned is more than users can cope with. (2) The usefulness of information is hard to evaluate. To raise their ranking, some websites repeat certain keywords many times in their pages so that well-known search engines pick them up easily, even though they may offer users nothing of value. (3) People always want the latest information, but network information changes constantly, so real-time search is almost impossible. Even a page you have just visited may be updated, expire or be deleted at any time.
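One common way to blunt the keyword repetition described in (2) is to let each additional occurrence of a term count for less. The sketch compares raw term-frequency scoring with the saturating term-frequency component of the Okapi BM25 formula (without its IDF and document-length parts). The example pages and the parameter `k1 = 1.2` are illustrative assumptions; no particular search engine is implied.

```python
def raw_tf_score(doc_terms, query_terms):
    """Naive score: every repetition of a query term adds one full point."""
    return sum(doc_terms.count(t) for t in query_terms)

def saturated_tf_score(doc_terms, query_terms, k1=1.2):
    """BM25-style saturation: each term contributes at most k1 + 1 points,
    however many times it is repeated."""
    score = 0.0
    for t in query_terms:
        tf = doc_terms.count(t)
        score += tf * (k1 + 1) / (tf + k1)
    return score

# Hypothetical pages: an honest one and a keyword-stuffed one.
honest = "mp3 search engine for mp3 files".split()
stuffed = ["mp3"] * 50 + ["casino"]
query = ["mp3", "search"]

print(raw_tf_score(honest, query), raw_tf_score(stuffed, query))
print(saturated_tf_score(honest, query), saturated_tf_score(stuffed, query))
```

Under raw counting the stuffed page wins easily; with saturation the honest page, which matches both query terms, ranks higher. Full BM25 would penalize the stuffed page further through document-length normalization.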

1.2 Development trend. Network information retrieval tools are developing mainly by refining their tools and techniques, raising the quality of retrieval services and remedying the unsatisfactory aspects of network information retrieval. This shows in the following aspects:

1.2.1 Closer cooperation among providers of web search tools. In the past, network search tool providers generally relied only on their own databases, which limited the scope of their searches. Now some well-known search engines buy databases or core technology from other companies, and some have formed partnerships with other search engines for users' convenience. For example, Yahoo now uses Google's search kernel, Netease has used Google's search kernel to enrich its search engine database, and search engines such as Silicon Valley Power, Guangzhou Windows, Sina, Sohu, Chinaren, 21cn, 263 and Tom use Baidu's search kernel.

1.2.2 Specialization of retrieval tools and deeper services. Some retrieval tools no longer chase ever larger collections and indexes, but instead focus on specialized subject areas. In the Lycos search engine directory, specialized retrieval tools such as commercial, IT, talent, financial and medical search engines have appeared one after another, and specialization has become an irreversible trend. Providers have also deepened their services. Google has launched a query service that shows which pages reference a given page, so users can check whether other websites cite the information they are looking at and better judge its quality. In August 2003, the third-generation Chinese search engine HC was released. It integrates search functions such as "extensive regional search", "powerful industry search" and "complete MP3 and Flash search", and also offers "content-related query" and "fuzzy query tailored to Chinese", which supports pinyin queries and homophone error correction.
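Pinyin queries and homophone correction can work by indexing known words under their pinyin spelling, so that a query typed in pinyin, or with a wrong character that sounds the same, lands on the same key. The character table and vocabulary below are tiny hypothetical stand-ins; a real system would need a full pronunciation dictionary, word segmentation, and handling of tones and polyphonic characters. This is an illustration only, not HC's method, which the text does not describe.

```python
# Hypothetical toy character -> pinyin table (tones dropped).
PINYIN = {"搜": "sou", "索": "suo", "锁": "suo", "引": "yin", "擎": "qing", "情": "qing"}
VOCABULARY = ["搜索", "引擎"]  # hypothetical list of known words

def to_pinyin_key(text):
    """Map text to a space-free pinyin key; ASCII input is treated as pinyin."""
    if text.isascii():
        return text.lower().replace(" ", "")
    return "".join(PINYIN.get(ch, ch) for ch in text)

# Index the vocabulary by pinyin key so homophones collide on the same key.
INDEX = {}
for word in VOCABULARY:
    INDEX.setdefault(to_pinyin_key(word), []).append(word)

def correct(query):
    """Return the query itself if known, else candidates with the same pinyin."""
    if query in VOCABULARY:
        return [query]
    return INDEX.get(to_pinyin_key(query), [])

print(correct("搜锁"))     # wrong homophone character
print(correct("sou suo"))  # pinyin input
```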

1.2.3 Intelligent retrieval tools. (1) Intelligence starts with the web spiders. Because network information changes constantly, web spiders use heuristic learning to adopt the most effective crawling strategy and to choose the best time to collect information automatically from the Internet. Web spiders can work anywhere on the network and gather as much information as possible. They should also track and monitor web pages, so that when a page is updated or deleted the database is updated promptly. Web spiders can work across platforms and handle a variety of mixed document formats. (2) Intelligence also extends to the retrieval software. The main types today are intelligent search engines, intelligent browsers and intelligent agents. These tools place great emphasis on accepting natural-language input: searchers can enter their questions as the words, phrases or even sentences they are used to, and the software analyzes the input automatically and builds a retrieval strategy from it. For example, after you enter keywords, Baidu suggests similar keywords you can choose from until you find the results you need. Google uses machine translation to convert text from one natural language into another, so users can search foreign-language pages in their native language and read the results in their native language. Abroad, Eureka, Ask and Ask Jeeves combine semantic technology with retrieval technology so that the tool understands the meaning of the search terms, giving users the most accurate retrieval service.
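The tracking and monitoring behaviour described in (1) can be illustrated with a simple adaptive revisit policy: hash each fetched page, revisit sooner when the hash changes, back off when it does not, and flag pages that can no longer be fetched for deletion. The halving/doubling rule and the interval bounds are assumptions chosen for illustration, not the heuristic learning methods the text refers to; `PageState` and `on_fetch` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Hypothetical bounds and starting value for the revisit interval, in hours.
MIN_INTERVAL_H = 1.0
MAX_INTERVAL_H = 24.0 * 30
INITIAL_INTERVAL_H = 24.0

@dataclass
class PageState:
    url: str
    content_hash: str = ""
    interval_h: float = INITIAL_INTERVAL_H

def fingerprint(body: bytes) -> str:
    """Hash the page body so changes can be detected without storing it."""
    return hashlib.sha256(body).hexdigest()

def on_fetch(state: PageState, body: Optional[bytes]) -> str:
    """Update one page's crawl state after a visit; say what the indexer should do.

    body is None when the page could not be retrieved. Treating a single failure
    as deletion is a simplification; a real spider would retry first.
    """
    if body is None:
        return "delete"                       # drop the page from the index
    new_hash = fingerprint(body)
    if new_hash != state.content_hash:
        state.content_hash = new_hash
        # Page changed: come back sooner next time.
        state.interval_h = max(MIN_INTERVAL_H, state.interval_h / 2)
        return "reindex"
    # Page unchanged: back off and revisit less often.
    state.interval_h = min(MAX_INTERVAL_H, state.interval_h * 2)
    return "unchanged"
```

Frequently changing pages converge toward the minimum interval, while static pages drift toward the maximum, which spreads a limited crawling budget where freshness matters most.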

2. Search tools based on FTP files.

2.1 Status quo. As mentioned above, the earliest search engines, the prototypes of all later ones, were FTP file search tools. The first FTP search engine was the text-based Archie. After the Web appeared, the development of FTP search engines stalled to some extent. Only with the arrival of Web-based FTP search engines did they become increasingly popular; their user numbers grew rapidly and their importance became increasingly apparent. An FTP search engine collects the directory lists offered by anonymous FTP servers and lets users query information about the files. At present the best and largest FTP file search engine in China is Skynet, which can now search 24 million files (figure taken from the Skynet homepage). In 2002 it received 400,000 visits a day, and it was also a world leader among FTP search engines. Others include the Tsinghua 9# search engine, the Xi'an Jiaotong University Siyuan search engine, the South China Kapok search engine, Network Compass, the Sirius search engine of the University of Science and Technology of China, and the "Grabbing the Net" search engine of Nanjing University of Science and Technology. Abroad there are Philes.com, AlltheWeb.com, Filesearching.com, souborak.com and ftpfind.com. Of these, ftpfind.com is the most advanced: it supports newer functions such as site snapshots and file classification, and its file database is very large.
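Collecting the directory lists of anonymous FTP servers, as described above, might look like the following sketch. It uses Python's standard `ftplib` for an anonymous login and the `LIST` command, then parses Unix `ls -l`-style lines into searchable records. Listing formats vary between servers, so the parser is an assumption that handles only the common Unix format; `FileEntry`, `fetch_listing`, `parse_list_line` and `search_files` are hypothetical names.

```python
from dataclasses import dataclass
from ftplib import FTP
from typing import List, Optional

@dataclass
class FileEntry:
    host: str
    path: str
    size: int      # bytes, as reported by the server
    is_dir: bool

def fetch_listing(host: str, directory: str = "/") -> List[str]:
    """Log in anonymously and return the raw LIST lines for one directory."""
    lines: List[str] = []
    with FTP(host, timeout=30) as ftp:
        ftp.login()                        # no arguments: anonymous login
        ftp.cwd(directory)
        ftp.retrlines("LIST", lines.append)
    return lines

def parse_list_line(host: str, directory: str, line: str) -> Optional[FileEntry]:
    """Parse one Unix-style LIST line; other formats and symlinks are not handled."""
    parts = line.split(None, 8)  # perms links owner group size mon day time name
    if len(parts) < 9:
        return None              # e.g. the "total 8" header line
    perms, size, name = parts[0], parts[4], parts[8]
    if name in (".", ".."):
        return None
    try:
        size_bytes = int(size)
    except ValueError:
        return None
    path = directory.rstrip("/") + "/" + name
    return FileEntry(host, path, size_bytes, perms.startswith("d"))

def search_files(entries: List[FileEntry], keyword: str) -> List[FileEntry]:
    """Case-insensitive substring match on file names (directories skipped)."""
    kw = keyword.lower()
    return [e for e in entries
            if not e.is_dir and kw in e.path.rsplit("/", 1)[-1].lower()]
```

A crawler would feed each directory's `fetch_listing` output to `parse_list_line` and recurse into entries whose `is_dir` flag is set, subject to the server restrictions discussed below.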

In recent years, although FTP search engine technology has developed rapidly, FTP search engines are still few compared with WWW search engines, their technology is immature, and much remains to be improved: (1) There are still relatively few FTP search engines, and a search engine's scale and quality depend on the amount of information it maintains. Anonymous FTP services worldwide are estimated to offer hundreds of millions of file entries, yet even the largest engine, Philes.com, held only 209,698,206 entries according to statistics compiled by Chen Hua and Li Xiaoming in July 2002. (2) Retrieval functions are incomplete. Search is the most important part of a search engine, yet many FTP search engines do not support even simple Boolean retrieval with "AND" and "OR", so some files in their databases cannot be found. (3) The nature of FTP servers themselves creates weaknesses for FTP search engines: some servers are open only at certain hours, some restrict IP addresses, some limit the number of logged-in users, and different servers use different connection ports. As a result, some search results cannot be accessed, which greatly reduces user satisfaction.
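The Boolean "AND" and "OR" retrieval that the text says many FTP search engines lacked is straightforward over an inverted index: AND intersects the sets of files containing each term, and OR takes their union. A minimal sketch over file names, with hypothetical function names and a simple tokenizer (an assumption) that splits names on non-alphanumeric characters:

```python
import re
from collections import defaultdict

def tokenize(filename):
    """Split a file name like 'linux-kernel-2.4.tar.gz' into lowercase terms."""
    return re.findall(r"[a-z0-9]+", filename.lower())

def build_postings(filenames):
    """Inverted index: term -> set of file ids whose name contains the term."""
    postings = defaultdict(set)
    for doc_id, name in enumerate(filenames):
        for term in tokenize(name):
            postings[term].add(doc_id)
    return postings

def search_and(postings, terms):
    """Files containing every term (set intersection)."""
    sets = [postings.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def search_or(postings, terms):
    """Files containing at least one term (set union)."""
    return set().union(*(postings.get(t.lower(), set()) for t in terms))

files = ["linux-kernel-2.4.tar.gz", "linux-howto.txt", "kernel-patch.diff"]
postings = build_postings(files)
print(search_and(postings, ["linux", "kernel"]))
print(search_or(postings, ["howto", "patch"]))
```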

2.2 Development trend. As noted above, FTP file search engine technology is not yet mature, but it is developing very rapidly. Its development shows mainly in two respects: (1) Increasingly rich retrieval functions. The Skynet FTP file search engine now supports advanced searches that restrict results by file size, upload date and network segment (such as the North China and East China networks). AlltheWeb.com has added retrieval methods (regular-expression search, exact search, browsing, case-sensitive search, etc.) and restrictions on host (edu, gov, com, etc.), file type, file size, date and more. (2) Personalized retrieval services. FTP search engine researchers have begun to pay attention to this area. The Skynet FTP search engine offers many personalization options: users can set their preferred sort order; choose whether foreign or domestic files take priority, whether foreign users should see foreign files first, and whether files on FTP or on the WWW should take priority; and choose Chinese or English. AlltheWeb.com allows even more personalized settings, such as choosing which hosts supply the results, setting the language, setting the file size to search for, choosing whether to highlight search keywords, setting the user interface language, and keyboard shortcuts.
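The advanced-search restrictions and personalized ordering described above amount to filtering and re-sorting file records. The sketch below is hypothetical: the field names, parameters and host-suffix test are assumptions, not Skynet's or AlltheWeb.com's actual interface. It combines a regular-expression match with optional size, date and host filters, plus an optional preference that lists files from chosen hosts first.

```python
import re
from dataclasses import dataclass
from datetime import date

@dataclass
class FileRecord:
    host: str
    name: str
    size: int        # bytes
    uploaded: date

def advanced_search(records, pattern, *, case_sensitive=False,
                    min_size=None, max_size=None, uploaded_after=None,
                    host_suffix=None, prefer_suffix=None):
    """Filter records by regex and optional limits; optionally rank preferred hosts first."""
    rx = re.compile(pattern, 0 if case_sensitive else re.IGNORECASE)
    hits = []
    for r in records:
        if not rx.search(r.name):
            continue
        if min_size is not None and r.size < min_size:
            continue
        if max_size is not None and r.size > max_size:
            continue
        if uploaded_after is not None and r.uploaded <= uploaded_after:
            continue
        if host_suffix is not None and not r.host.endswith(host_suffix):
            continue   # e.g. ".edu.cn" keeps only Chinese education hosts
        hits.append(r)
    if prefer_suffix is not None:
        # Stable sort: preferred hosts first, otherwise original order kept.
        hits.sort(key=lambda r: not r.host.endswith(prefer_suffix))
    return hits

# Hypothetical sample records.
RECORDS = [
    FileRecord("ftp.pku.edu.cn", "Linux-HOWTO.tar.gz", 2_000_000, date(2002, 5, 1)),
    FileRecord("ftp.example.com", "linux-kernel.tar.gz", 30_000_000, date(2003, 1, 15)),
    FileRecord("ftp.example.org", "song.mp3", 4_000_000, date(2003, 2, 1)),
]
```

For example, `advanced_search(RECORDS, r"linux.*\.tar\.gz$", max_size=10_000_000)` keeps only the smaller archive, while `prefer_suffix=".com"` moves the `.com` host's file to the top.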

3. Present situation and development trend of search tools built on other network retrieval tools.

3.1 Status quo. As online information resources expand, no single search engine, however good, can meet all of a person's retrieval needs. For literature surveys, special inquiries, news investigation and fact tracing, or finding download addresses for software and MP3 files, people need to use several search engines to compare, screen and cross-check results. To spare users the tedium of visiting each search engine in turn and typing the same search request (search string) into each one, search tools built on other search tools came into being.

At present there are two kinds of such tools: integrated search engines and meta-search engines. An integrated search engine is a network search tool that links several independent search engines on a single search interface. When searching, you can specify one search engine or have several engines search at the same time, and each engine returns its results on a separate page. In effect, it is a collection of search engines assembled with website-linking technology. Integrated search engines are simple to build and maintain, and the linked search engines can be added, removed, adjusted and updated at any time; those covering large specialized areas (such as Flash and MP3) are especially popular with particular user groups. Examples in China include Skynet and Baidu Search Overlord; abroad there are "SouFun" (/) and "Internet Swiss Army Knife" (,Yahoo!, Infoseek, Lycos and other commonly used search engines, while some large search engines such as NorthernLight and HotBot are excluded, which artificially limits the search resources available; (5) In terms of retrieval results, meta-search engines can return only a few dozen results with high "relevance", so a large number of potentially valuable results from the source search engines are ignored, which affects the comprehensiveness of the retrieval results.
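A meta-search engine sends one query to several source engines and merges their ranked lists, and the truncation to a few dozen top results criticized in (5) happens at this merge step. The sketch below uses reciprocal rank fusion (RRF), a merging rule published after the tools discussed here, so it illustrates the idea rather than what any of them implemented. The result lists are hypothetical, and URL normalization is deliberately crude.

```python
from collections import defaultdict

def normalize_url(url):
    """Crude duplicate detection: ignore a trailing slash and letter case.
    (Lowercasing the path is a simplification; paths can be case-sensitive.)"""
    return url.rstrip("/").lower()

def fuse(result_lists, k=60, top_n=10):
    """Merge ranked URL lists from several engines with reciprocal rank fusion.

    Each appearance at rank r adds 1 / (k + r); only the top_n URLs are kept,
    so everything below the cut-off is discarded.
    """
    scores = defaultdict(float)
    first_seen = {}                      # normalized key -> URL as first returned
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            key = normalize_url(url)
            scores[key] += 1.0 / (k + rank)
            first_seen.setdefault(key, url)
    ranked = sorted(scores, key=lambda u: scores[u], reverse=True)
    return [first_seen[u] for u in ranked[:top_n]]

# Hypothetical results from two source engines for the same query.
engine_a = ["http://x.example/", "http://y.example/", "http://z.example/"]
engine_b = ["http://y.example", "http://w.example/"]
print(fuse([engine_a, engine_b]))
```

A URL returned by both engines (here `y.example`) rises to the top, and lowering `top_n` shows how results past the cut-off are simply dropped.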

3.2 Development trend. Tools of this kind are developing mainly in the following respects: (1) More thorough organization of retrieval results. Tools such as Vivisimo, EZ2WWW and MetaCrawler can classify search results automatically, so users can browse the results in the traditional way or use the categories shown on the same screen to home in on what they need. EZ2WWW's advanced search covers 1,000 kinds of specialized resources and can also be used for directory search. SurfWax has a feature other meta-search engines lack: clicking the "URL button" icon to the left of each result lets you browse any page in that result, shows where the search terms occur in the file, and stores the results and files for later use. Skynet has a distinctive link-checking function that can verify within a few seconds whether the results on the current page are reachable; a green mark means the link works (so far only links beginning with http:// and ftp:// are checked). (2) Personalized search interfaces. Skynet Souba and Google both provide plug-ins for the IE browser that embed themselves in the IE toolbar once installed, so users can search without visiting the search engine's homepage. Users can set a favorite search engine as the main one or add other engines they like. Recently, Skynet Souba launched a plug-in that embeds in the Windows taskbar, so users no longer even need to open IE. Mamma lets users enable phrase search, set the search time limit and set the number of records displayed per page; it also offers a dedicated search over page titles and can send retrieval results by e-mail. MetaCrawler lets users choose which search engines to call, filter results by domain name, region or country, set a maximum search time, set how many results each page displays and how many each search engine may return, and choose how results are sorted (by relevance, domain name or source search engine), saving these as custom settings. (3) Intelligence. ProFusion can automatically convert queries to each engine's special retrieval syntax, for example converting "NEAR" into "AND" when calling Excite, InfoSeek and WebCrawler, and deleting "NOT" when calling GoTo and Yahoo. Mamma also supports converting common search syntax between different search engines. C4 supports natural-language retrieval; although it has no database of its own, it can provide online search results.
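ProFusion-style syntax conversion can be sketched as a token rewrite driven by a table of which operators each engine supports. The table below encodes only the examples given in the text. Dropping an unsupported "NOT" together with the term it negates is an assumption (keeping the term would silently invert its meaning), and `translate_query` is a hypothetical name.

```python
# Operator support per engine, encoding only the examples from the text above.
# Hypothetical simplification: real engines' syntax differed in many more ways.
ENGINE_SUPPORT = {
    "Excite": {"NEAR": False},
    "InfoSeek": {"NEAR": False},
    "WebCrawler": {"NEAR": False},
    "GoTo": {"NOT": False},
    "Yahoo": {"NOT": False},
}
BINARY_OPS = {"AND", "OR", "NEAR"}

def translate_query(query, engine):
    """Rewrite a space-separated Boolean query for one engine's syntax."""
    support = ENGINE_SUPPORT.get(engine, {})   # unknown engine: assume full support
    tokens = query.split()
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "NEAR" and not support.get("NEAR", True):
            out.append("AND")                  # weaken proximity to plain conjunction
        elif tok == "NOT" and not support.get("NOT", True):
            # Drop NOT and the term it negates (assumption), plus any binary
            # operator left dangling in front of it.
            if out and out[-1] in BINARY_OPS:
                out.pop()
            i += 2
            continue
        else:
            out.append(tok)
        i += 1
    # Trim operators stranded at either end after removals.
    while out and out[0] in BINARY_OPS:
        out.pop(0)
    while out and out[-1] in BINARY_OPS:
        out.pop()
    return " ".join(out)

print(translate_query("ftp NEAR search", "Excite"))
print(translate_query("mp3 AND NOT virus", "Yahoo"))
```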
