
Thursday, July 23, 2020

What is Robots.txt File and How to Use It for SEO?

A robots.txt file is a small text file that lives in your site's root folder. It tells search engines which parts of the website to crawl and index and which parts to leave alone.

Be careful when editing or customizing it: a mistake can stop search engine bots from crawling and indexing your site, and your site will no longer be visible in the search results.

In this article, I will explain what a robots.txt file is and how to create a perfect robots.txt file for SEO.

Why Does Your Website Need a Robots.txt File?

When search engine bots arrive at a site or blog, they follow the instructions in the robots.txt file while crawling the content. If your website has no robots.txt file, the bots will crawl and index all of your site's pages, including the ones you don't want indexed.

Search engine bots look for the robots.txt file before indexing any site or webpage. When they find no instructions in a robots.txt file, they index every page and piece of content on the site.

Note: This is exactly why the robots.txt file is required. If we don't give instructions to search engine bots through this file, they index our entire site, including data you never wanted indexed.

Advantages Of Robots.txt File

  • It tells search engine bots which parts of the website to crawl and index and which parts to skip.
  • You can prevent a particular file, folder, image, PDF, etc. from being indexed in the search engine (see the sketch after this list).
  • Sometimes search engine spiders crawl your site aggressively, which hurts your site's performance. You can ease this by adding a Crawl-delay directive to your robots.txt file. Googlebot does not obey this directive, but you can set the crawl rate in Google Search Console instead. Either way, this protects your server from being overloaded.
  • You can keep entire sections of a website private.
  • You can prevent internal search results pages from appearing in SERPs.
  • You can improve your website's SEO by blocking low-quality pages.
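
To make a few of these points concrete, here is a minimal robots.txt sketch. The /search/ path, the PDF name, and the 10-second delay are placeholder values for illustration, not recommendations:

User-agent: *
# Keep internal search results pages out of the index (assuming they live under /search/)
Disallow: /search/
# Prevent a specific PDF from being indexed (hypothetical file)
Disallow: /downloads/example-brochure.pdf
# Ask bots to wait 10 seconds between requests (Googlebot ignores this directive)
Crawl-delay: 10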

Where is the Robots.txt File Located on a Website?

If you are a WordPress user, it resides in your site's root folder. If the file is not found in that location, search engine bots start indexing your entire website, because they do not look anywhere else on your site for a robots.txt file.

Don't know whether your site has a robots.txt file? Simply type this into your browser's address bar - example.com/robots.txt

A text page will open in front of you, as you can see in the screenshot below.

Screenshot of Digital Paratha's robots.txt

This is the robots.txt file of DigitalParatha. If you do not see any such text page, then you have to create a robots.txt file for your site.

Basic Format of Robots.txt File for SEO

The fundamental configuration of the robots.txt file is very simple and looks like this:

User-agent: [user-agent name]
Disallow: [URL or page you don't want to crawl]

These two lines together are considered a complete robots.txt file. However, a robots.txt file can contain multiple user-agent blocks and directives (Disallow, Allow, Crawl-delay, etc.). A combined example follows the list below.
  • User-agent: the name of the search engine crawler/bot the rules apply to. If you want to give the same instructions to all search engine bots, use the * sign after User-agent:, like this - User-agent: *
  • Disallow: prevents the listed files and directories from being crawled and indexed.
  • Allow: explicitly allows search engine bots to crawl and index the listed content.
  • Crawl-delay: how many seconds bots should wait before loading and crawling the page content.
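
Here is a short sketch of how these directives fit together; the /private/ paths and the 5-second delay are made-up example values:

User-agent: *
# Block the whole /private/ directory from being crawled
Disallow: /private/
# But allow this one file inside the blocked directory
Allow: /private/public-page.html
# Wait 5 seconds between requests (note: Googlebot ignores Crawl-delay)
Crawl-delay: 5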

Preventing All Web Spiders from Indexing Your Website


User-agent: *
Disallow: /

Placing this command in the robots.txt file stops all web crawlers/bots from crawling any part of the website.

Allowing All Web Spiders to Index All Content


User-agent: *
Disallow:

This rule in the robots.txt file allows all search engine bots to crawl every one of the pages of your website.

Blocking a Specific Folder for Specific Web Spiders


User-agent: Googlebot
Disallow: /example-subfolder/

This command only stops Google's spider from crawling /example-subfolder/. But if you want to block all spiders, then your robots.txt file will look like this:

User-agent: *
Disallow: /example-subfolder/

Preventing a Specific Page (Thank You Page) from Being Indexed


User-agent: *
Disallow: /thank-you/

Here /thank-you/ is just an example path; replace it with the URL path of the page you want to block. This will stop all spiders from crawling that page. But if you want to block a specific spider only, then you write it like this:

User-agent: Bingbot
Disallow: /thank-you/

This command will stop only Bingbot from crawling that page.

How To Add a Sitemap To the Robots.txt File, and Why Is It Important?

There are thousands of search engines in the world, and it is not practical to submit your site to every one of them. When you add your sitemap to the robots.txt file, you no longer need to submit your site to each search engine separately.

However, submitting your site to Google and Bing is still important.

What Are a Sitemap and a Robots.txt File?

A sitemap is a list of all the URLs on your website; it tells search engines about every page and post URL on your site. A sitemap does not directly improve your search ranking, but it helps search engines crawl your website more effectively.
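
For reference, a sitemap is usually an XML file that looks roughly like this sketch; the URL and date are placeholder values:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/sample-post/</loc>
    <lastmod>2020-07-23</lastmod>
  </url>
</urlset>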


The robots.txt file helps search engines understand which parts of your site to index and which not to. When search engine robots visit your site, they follow the robots.txt file and index only the parts that you want to appear in the search engine.

How To Add a Sitemap to a Robots.txt File?

First, go to the root directory of your site, select the robots.txt file, and add your sitemap URL by clicking on the Edit button.

Now your robots.txt file will look something like this.

Sitemap: http://www.example.com/sitemap.xml

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap lines can be placed anywhere in the robots.txt file; it does not matter where you keep them.

How To Add Multiple Sitemaps to the Robots.txt File?

You can add the URLs of your multiple sitemap files like this:

Sitemap: http://www.example.com/sitemap_host1.xml
Sitemap: http://www.example.com/sitemap_host2.xml

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

In this way, you can manage your sitemaps with the help of the robots.txt file.

You can leave any question or suggestion related to this article in the comments. If this article has proved helpful for you, then do not forget to share it!
