Metasearch engine

A metasearch engine (or aggregator) is a search tool that uses the data of other search engines to produce its own results from the Internet. It takes input from a user and simultaneously sends queries to third-party search engines. The returned results are gathered, ordered by rank, and presented to the user.

Metasearch engines have their own set of problems. The websites indexed by each search engine differ, which can draw in irrelevant content. Problems such as spamming reduce the accuracy of results. The process of fusion aims to tackle this issue and improve the engineering of a metasearch engine.

The first person to incorporate the idea of metasearching was Colorado State University's Daniel Dreilinger. He introduced SearchSavvy, which let users search up to 20 different search engines and directories at once. Although fast, the search engine was restricted to simple searches and thus was not very reliable. University of Washington student Eric Selberg released a more "updated" version called MetaCrawler. This search engine improved on SearchSavvy's accuracy by adding its own search syntax behind the scenes and matching that syntax to that of the search engines it was probing. MetaCrawler reduced the number of search engines queried to 6. Although it produced more accurate results, it still was not considered as accurate as running the query in an individual engine.

Ixquick is a search engine more recently known for its privacy policy statement. Developed and launched in 1998 by David Bodnick, it was owned by Surfboard Holding BV as of 2000. In June 2006, Ixquick began to delete private details of its users, following the same process as Scroogle. Ixquick's privacy policy includes no recording of users' IP addresses, no identifying cookies, no collection of personal data, and no sharing of personal data with third parties. It also uses a unique ranking system in which each result is rated with stars. The more stars a result has, the more search engines agreed on that result.
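The star idea can be illustrated with a minimal sketch. This is not Ixquick's actual implementation, which the text does not describe; it only assumes that one star is awarded per engine that returned a given URL. The engine names, URLs, and the functions `star_ratings` and `rank_by_stars` are hypothetical.

```python
from collections import Counter


def star_ratings(results_by_engine):
    """Map each URL to the number of engines that returned it (its stars)."""
    stars = Counter()
    for urls in results_by_engine.values():
        # A set ensures a URL repeated by one engine earns only one star from it.
        stars.update(set(urls))
    return dict(stars)


def rank_by_stars(results_by_engine):
    """Return (url, stars) pairs, most stars first."""
    stars = star_ratings(results_by_engine)
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(stars.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists, for illustration only.
sample = {
    "engine_a": ["https://example.org/a", "https://example.org/b"],
    "engine_b": ["https://example.org/b", "https://example.org/c"],
    "engine_c": ["https://example.org/b", "https://example.org/a"],
}
ranked = rank_by_stars(sample)
```

Here the page returned by all three engines sorts first, followed by the page two engines agreed on.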

Sending queries to several other search engines extends the coverage of a topic and allows more information to be found. Metasearch engines use the indexes built by other search engines, aggregating and often post-processing results in unique ways. A metasearch engine has an advantage over a single search engine because more results can be retrieved with the same amount of effort. It also saves users from having to type the same search into different engines individually to look for resources.

A metasearch engine can also hide the searcher's IP address from the search engines queried, thus providing privacy to the search. In view of this, the French government decreed in 2018 that all government searches be done using Qwant, which is believed to be a metasearch engine.

Metasearch engines cannot decode query forms or fully translate query syntax. The number of links they generate is limited, so the user does not see the complete results of a query. Most metasearch engines provide no more than ten links from any single search engine, and generally do not interact with larger search engines for results. Sponsored webpages are prioritised and are normally displayed first.

Metasearching also gives the illusion of greater coverage of the topic queried, particularly if the user is searching for popular or commonplace information. It is common to end up with multiple identical results from the queried engines. It is also harder for users to send advanced search syntax with the query, so results may not be as precise as when a user uses the advanced search interface of a specific engine. As a result, many metasearch engines support only simple searches.
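Identical results from different engines can be collapsed before display. The following is a minimal sketch, assuming that two URLs refer to the same page when they differ only in letter case of the host, a leading "www.", a trailing slash, or a fragment. These normalisation rules and the function names `normalize_url` and `merge_unique` are illustrative assumptions, not a standard.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Reduce a URL to a comparison key (assumed rules, see lead-in)."""
    parts = urlsplit(url.strip())
    netloc = parts.netloc.lower()
    if netloc.startswith("www."):
        netloc = netloc[4:]
    # Treat "/page/" and "/page" as the same path; keep "/" for an empty path.
    path = parts.path.rstrip("/") or "/"
    # The fragment is dropped because it does not change the page fetched.
    return urlunsplit((parts.scheme.lower(), netloc, path, parts.query, ""))


def merge_unique(result_lists):
    """Merge result lists in order, keeping the first copy of each page."""
    seen = set()
    merged = []
    for results in result_lists:
        for url in results:
            key = normalize_url(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged


# Hypothetical result lists, for illustration only.
merged = merge_unique([
    ["https://example.org/page", "https://example.org/other"],
    ["https://www.example.org/page/", "https://example.org/third"],
])
```

The second engine's copy of the first page is recognised as a duplicate and dropped, so three distinct pages remain.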

A metasearch engine accepts a single search request from the user. This search request is then passed on to another search engine’s database. A metasearch engine does not create a database of webpages but generates a virtual database to integrate data from multiple sources.
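The flow described above can be sketched as follows. The engines here are stand-in functions that return fixed lists; a real metasearch engine would issue network requests instead. All names (`engine_a`, `engine_b`, `failing_engine`, `metasearch`) and the timeout value are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


# Stand-in engines: each takes a query and returns a ranked list of URLs.
def engine_a(query):
    return [f"https://a.example/{query}/1", f"https://a.example/{query}/2"]


def engine_b(query):
    return [f"https://b.example/{query}/1"]


def failing_engine(query):
    raise TimeoutError("engine did not respond")


def metasearch(query, engines, timeout=5.0):
    """Send one query to every engine concurrently; collect results per engine."""
    collected = {}
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        for name, future in futures.items():
            try:
                collected[name] = future.result(timeout=timeout)
            except Exception:
                # An engine that fails or times out contributes no results.
                collected[name] = []
    return collected


results = metasearch(
    "cats", {"a": engine_a, "b": engine_b, "x": failing_engine}
)
```

The returned dictionary plays the role of the virtual database: it exists only for the lifetime of the request and holds whatever the underlying engines supplied.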

Web pages that are highly ranked on many search engines are likely to be more relevant in providing useful information. However, each search engine assigns its own ranking score to a website, and most of the time these scores are not the same. This is because search engines prioritise different criteria and methods for scoring, so a website might be ranked high on one search engine and low on another. This is a problem because metasearch engines rely heavily on the consistency of this data to generate reliable results.
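One way to combine rankings without comparing raw scores is to use only each result's position in every list. The sketch below uses reciprocal rank fusion, a common rank-based technique chosen here for illustration; the text does not say which fusion method any particular metasearch engine uses. The constant `k=60` is a conventional default and is an assumption, as are the function name and sample data.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each appearance adds 1 / (k + position) to a URL's score."""
    scores = defaultdict(float)
    for ranked_urls in rankings:
        for position, url in enumerate(ranked_urls, start=1):
            scores[url] += 1.0 / (k + position)
    # Highest fused score first; ties broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical rankings from three engines, for illustration only.
fused = reciprocal_rank_fusion([
    ["x", "y", "z"],
    ["y", "x"],
    ["y", "z", "x"],
])
```

Because only positions are used, the differing score scales of the engines never need to be reconciled. A page ranked near the top by several engines ends up ahead of one ranked high by a single engine.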

Spamdexing is the deliberate manipulation of search engine indexes. It uses a number of methods to manipulate the relevance or prominence of indexed resources in a manner unaligned with the intention of the indexing system. Spamdexing can be very distressing for users and problematic for search engines because the results returned by searches have poor precision. Over time, this makes the search engine unreliable for the user. To tackle spamdexing, search robot algorithms are made more complex and are changed almost every day.