Manage robots.txt to control crawlers

Questions:

  • How can I prohibit or allow crawlers?
  • Why can't I carry out SEO analyses on my site?

Description:

Managing the robots.txt file in TYRIOS lets you control which web crawlers are allowed to access your website. By default, TYRIOS follows a preventive security strategy: all crawlers that are not on the whitelist are automatically excluded. This protects your website from unwanted traffic and access, and improves performance. If you prefer a different strategy, however, the behaviour can be adapted flexibly.

Benefits for you:

  • Security and data protection: Protect sensitive areas of your website from unwanted access and indexing.
  • Optimised performance: Reduce unnecessary data traffic and the load on your server.
  • Flexibility: Adapt the default behaviour to your individual needs.

Why TYRIOS relies on exclusion by default

Many crawlers work inefficiently and place an unnecessary load on your website. With the automatic exclusion rule, TYRIOS ensures that only desired and trustworthy crawlers (defined by the whitelist) are allowed to access your website. This strategy protects you from unwanted traffic and prevents sensitive content or pages from being accidentally indexed.

However, if you have specific requirements, the default behaviour can be changed to allow all crawlers that are not on the blacklist.
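
For illustration, robots.txt expresses such rules as blocks per user agent. In whitelist mode, a generated file might look roughly like the following sketch, in which one permitted crawler (here Googlebot, as an example) is named and all others are blocked; the exact file TYRIOS generates may differ:

    User-agent: Googlebot
    Disallow:

    User-agent: *
    Disallow: /

An empty "Disallow:" line grants access to the entire site, while "Disallow: /" under the catch-all user agent "*" blocks every other crawler.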

Solution:

Allow and prohibit crawlers

TYRIOS manages a special blacklist and whitelist that you can use to block or unblock crawlers. TYRIOS automatically initialises the most important crawlers on the whitelist. However, you can change the list at any time.

  1. As a user with extended rights, go to the customer area > System > Manage crawler

    You will see all currently defined crawlers. A crawler is defined by its so-called "user agent", the name with which it identifies itself.

  2. Click on "New" to add a new crawler Screenshot of the editor for adding a crawler to the robots.txt whitelist
  3. Enter a name of your choice for the crawler. The user agent must match the string with which the crawler identifies itself.
  4. Specify whether the crawler should be whitelisted (allowed) or blacklisted (forbidden).
  5. Click on "Save". The setting is applied immediately

To edit or delete an existing entry, use the line menu of the corresponding row.
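
Conceptually, each entry corresponds to one user-agent block in the generated robots.txt. A blacklist entry for a crawler that identifies itself as "ExampleBot" (a placeholder name for illustration) would result in something like:

    User-agent: ExampleBot
    Disallow: /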

Change default behaviour

TYRIOS blocks unknown crawlers by default. To change this default behaviour, proceed as follows:

  1. As a user with extended rights, go to the customer area > System > System configuration
  2. Select the "Robots.txt management" area
  3. Select whitelist mode or blacklist mode

    By default, TYRIOS uses the whitelist mode. This means that crawlers must be explicitly enabled.

  4. By switching to blacklist mode, you allow all crawlers that are not explicitly blacklisted. This admits unknown crawlers, but also allows competitors to analyse your site. Note that robots.txt entries are a convention rather than a technical barrier: reputable crawlers honour them, so you can still control which of these crawlers have access.
  5. Click on "Save".
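
In blacklist mode the logic is inverted: only explicitly blacklisted crawlers are blocked, and everything else is permitted. A sketch of the resulting file (again, the exact output may differ, and "ExampleBot" is a placeholder) could look like this:

    User-agent: ExampleBot
    Disallow: /

    User-agent: *
    Disallow: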

Tips and Tricks:

If you want to analyse your website with SEO tools, you must whitelist the tool's crawler; otherwise it will be blocked. The exact user-agent designation matters here. These crawlers are not allowed by default in order to protect you from unwanted competitor analyses.

To use these tools, add their user agents to the whitelist:

  • Semrush: SemrushBot
  • Sistrix: sistrix
  • Seobility: SeobilityBot
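
In whitelist mode, whitelisting these tools exempts their user agents from the catch-all block. The generated robots.txt could then contain entries along these lines (a sketch, not the literal output):

    User-agent: SemrushBot
    Disallow:

    User-agent: sistrix
    Disallow:

    User-agent: SeobilityBot
    Disallow:

    User-agent: *
    Disallow: /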
