Indexing is the process by which a search robot scans the files of a website so that the site can appear in search results for relevant queries. Yandex, one of the largest search engines today, performs this scan in its own way.
Instructions
Step 1
A site is indexed by special automatic programs called search robots. They continuously track the appearance of new sites on the World Wide Web, scanning the pages, files, and links found on each resource. A toy version of this crawl loop is sketched below.
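To make the idea concrete, here is a minimal breadth-first crawl loop in Python: the robot keeps a queue of pages, visits each one, and feeds newly discovered links back into the queue. The fetching and link-extraction steps are stubbed out, and everything here is an illustration, not Yandex's actual implementation.

```python
from collections import deque

def crawl(seed_urls, fetch, extract_links, limit=100):
    """Breadth-first walk over pages reachable from the seed URLs."""
    queue = deque(seed_urls)
    seen = set(seed_urls)
    while queue and len(seen) <= limit:
        url = queue.popleft()
        page = fetch(url)                 # download the document
        for link in extract_links(page):  # discover further pages
            if link not in seen:
                seen.add(link)            # remember it to avoid revisiting
                queue.append(link)        # queue it for a later scan

# Trivial stubs so the sketch runs as-is:
crawl(["https://example.com/"], fetch=lambda u: "", extract_links=lambda p: [])
```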
Step 2
To scan a site, the robot connects to the server and directory where the resource is hosted. When choosing a new site, the robot is guided by its availability. There is also an opinion that Yandex first scans sites in Russian-language domain zones - .ru, .рф, .su, or .ua - and only then moves on to other regions.
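An availability check like the one the text describes could look like the following sketch, which sends a lightweight HEAD request before a site is queued for crawling. It uses only the Python standard library; the URL, timeout, and success criteria are illustrative assumptions, not Yandex's actual logic.

```python
from urllib.request import Request, urlopen
from urllib.error import URLError

def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the server answers the request without an error."""
    try:
        req = Request(url, method="HEAD")  # HEAD avoids downloading the body
        with urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (URLError, TimeoutError):
        return False

print(is_reachable("https://example.com/"))
```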
Step 3
Once on the site, the robot examines its structure, first looking for files that direct the rest of the scan: sitemap.xml and robots.txt. These files can be used to control the behavior of the search robot. A sitemap (sitemap.xml) gives the robot a more accurate picture of the resource's structure, while robots.txt lets the webmaster list files that he would not like to appear in search results - for example, personal information or other unwanted data. Typical contents of both files are shown below.
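For illustration, a simple robots.txt might look like this (the paths are hypothetical; User-agent, Disallow, and Sitemap are the standard directives):

```
User-agent: Yandex
Disallow: /private/
Disallow: /admin/

User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
```

And a minimal sitemap.xml, following the standard sitemaps.org schema, lists the pages the webmaster wants the robot to know about:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
</urlset>
```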
Step 4
Having read these two files and received its instructions, the robot begins parsing the HTML code and processing the tags it finds. By default, in the absence of a robots.txt file, the search engine processes all documents stored on the server.
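A minimal sketch of honoring these rules, using Python's standard urllib.robotparser module; the user agent name and URLs are placeholders. Note that when robots.txt is missing, can_fetch() allows everything, which matches the "process all documents" default described above.

```python
from urllib import robotparser

rp = robotparser.RobotFileParser("https://example.com/robots.txt")
rp.read()  # download and parse the rules (a missing file means "allow all")

for url in ["https://example.com/", "https://example.com/private/page.html"]:
    if rp.can_fetch("ExampleBot", url):
        print("process:", url)   # go on to parse HTML and process tags
    else:
        print("skip:", url)      # excluded by the webmaster
```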
Step 5
By following the links in those documents, the robot also learns about other sites, which are queued for scanning after the current resource. The scanned files are saved as text copies, together with the site structure, on servers in Yandex's data centers.
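Link discovery can be illustrated with the standard library's html.parser: the sketch below collects the href targets of anchor tags, which a crawler could then add to its queue. The sample HTML is made up.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href targets from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

parser = LinkExtractor()
parser.feed('<a href="https://other-site.example/">a link to another site</a>')
print(parser.links)  # these URLs would be queued for scanning next
```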
Step 6
The need for re-scanning is also determined automatically by the robots. On the next indexing pass, the program compares the new scan with the stored result; if the data differs, the site's copy on the Yandex server is updated as well.
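One simple way to decide whether a stored copy is stale is to compare hashes of the old and new page text, as in this sketch. This is only an illustration of the comparison idea, not Yandex's actual change-detection algorithm.

```python
import hashlib

def fingerprint(page_text: str) -> str:
    """Hash the page so two versions can be compared cheaply."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

stored = fingerprint("<html><body>old content</body></html>")
fresh = fingerprint("<html><body>new content</body></html>")

if stored != fresh:
    print("page changed -> update the saved copy")
```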