Generate robots.txt dynamically


You can create robots.txt with any text editor; a typical file might look like this. Here, however, we create robots.txt dynamically. PHP offers several ways to write to a file; we use file_put_contents() and wrap it in a small function of our own, which can be extended and adapted as needed.

# robots.txt - https://www.tnado.com/
Sitemap: https://www.tnado.com/sitemap.xml

# If you keep separate sitemaps in language folders such as en-en or en-gb, you can list each of them
Sitemap: https://www.tnado.com/de-de/sitemap.xml
Sitemap: https://www.tnado.com/en-en/sitemap.xml

# You can address individual crawlers such as Googlebot or Bingbot; whether you restrict them is up to you
User-agent: Googlebot
Disallow: /wp-admin/

# User-agent: * applies to every crawler that has no more specific group of its own
User-agent: *
# You can exclude whole folders from crawling
Disallow: /admin/
Disallow: /images/
Disallow: /temp/
# or individual files
Disallow: /fotoalbum.html
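The Sitemap lines for the language folders follow a fixed pattern, so they can also be generated rather than typed. The following is a minimal PHP sketch; the function name sitemapLines() is hypothetical, and it assumes that every language folder contains its own sitemap.xml.

```php
<?php
// Hypothetical helper: builds one "Sitemap:" line for the main sitemap
// plus one line for each language folder (e.g. de-de, en-en).
// Assumption: every language folder has its own sitemap.xml.
function sitemapLines(string $baseUrl, array $languages): string
{
    // Make sure the base URL ends with exactly one slash
    $baseUrl = rtrim($baseUrl, '/') . '/';

    $lines = ['Sitemap: ' . $baseUrl . 'sitemap.xml'];
    foreach ($languages as $lang) {
        $lines[] = 'Sitemap: ' . $baseUrl . $lang . '/sitemap.xml';
    }

    return implode(PHP_EOL, $lines) . PHP_EOL;
}

echo sitemapLines('https://www.tnado.com', ['de-de', 'en-en']);
```

The echo prints the three Sitemap lines from the example above; the returned string can also be written into robots.txt together with the other rules.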


The robots.txt file on the web

Use robots.txt: it plays a real role on the web and can spare you unnecessary crawler traffic, even though not all bots read it or follow it. Many bots do unnecessary things on your website and thereby skew your statistics. Some ignore every rule in robots.txt, go into every folder on your site and rummage through everything. Bots work automatically; they are programmed to do certain tasks and never get tired ;).
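A well-behaved crawler reads robots.txt before fetching a page and skips every path covered by a Disallow rule in the group that applies to it. The sketch below shows a deliberately simplified version of that check in PHP; the function isAllowed() is hypothetical, handles only User-agent and Disallow lines, and ignores Allow lines and wildcards.

```php
<?php
// Simplified sketch of how a well-behaved crawler applies robots.txt rules.
// Assumptions: only User-agent and Disallow lines are evaluated, paths are
// matched by plain prefix (no Allow lines, no * or $ wildcards), and the
// crawler's name must equal a User-agent value (case-insensitive).
function isAllowed(string $robotsTxt, string $crawler, string $path): bool
{
    $groups = [];         // lower-cased user agent => disallowed path prefixes
    $currentAgents = [];  // user agents of the group currently being read
    $lastWasAgent = false;

    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line)); // drop comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);

        if ($field === 'user-agent') {
            // Consecutive User-agent lines belong to the same group
            if (!$lastWasAgent) {
                $currentAgents = [];
            }
            $agent = strtolower($value);
            $currentAgents[] = $agent;
            $groups[$agent] = $groups[$agent] ?? [];
            $lastWasAgent = true;
            continue;
        }

        $lastWasAgent = false;
        if ($field === 'disallow' && $value !== '') { // empty Disallow blocks nothing
            foreach ($currentAgents as $agent) {
                $groups[$agent][] = $value;
            }
        }
    }

    // A crawler follows only its own group; "*" is the fallback
    $rules = $groups[strtolower($crawler)] ?? $groups['*'] ?? [];
    foreach ($rules as $prefix) {
        if (strpos($path, $prefix) === 0) {
            return false;
        }
    }
    return true;
}
```

Applied to the example robots.txt at the top, isAllowed() returns true for Googlebot and /admin/: a crawler with its own User-agent group follows only that group and ignores the rules under User-agent: *. Bots that ignore robots.txt simply never run a check like this.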


Tip

If you have important files on your server, do not leave them unencrypted, and certainly not publicly accessible; a Disallow line in robots.txt does not protect them. Protect your environment as much as possible: in most cases, the reason a website is insecure is the programmer.



Note

There is no guarantee that search engines will abide by the bans in robots.txt.



Create your own function

Now to our function. It accepts a single array, $args (short for "arguments"), in which we pass the settings it needs: domain, sitemap-uri, line (any additional rules) and path (where the file is written).


// PHP CODE
/**
 * Dynamic robots.txt
 *
 * Writes robots.txt only if the file does not exist yet.
 *
 * @param array $args (domain|sitemap-uri|line|path)
 * @return boolean (true|false)
 */
function createRobotsTXT($args = array()) {

    $return = false;

    // Header comment, sitemap reference and any additional rules
    $default = '# robots.txt - ' . (isset($args['domain']) ? $args['domain'] : '') . PHP_EOL;
    $default .= 'Sitemap: ' . (isset($args['sitemap-uri']) ? $args['sitemap-uri'] : '') . PHP_EOL;
    $default .= (isset($args['line']) ? $args['line'] : '') . PHP_EOL;

    // Only create the file if a path was given and it does not exist yet
    if (isset($args['path']) && !file_exists($args['path'])) {
        // file_put_contents() returns the number of bytes written or false,
        // so compare with false to get a real boolean
        $return = file_put_contents($args['path'], $default) !== false;
    }

    return $return;
}

Finally, we call our function to create the file.


// PHP CODE
createRobotsTXT(array(
    'domain' => 'https://www.tnado.com/',
    'sitemap-uri' => 'https://www.tnado.com/sitemap.xml',
    'line' => '',
    'path' => 'robots.txt'
));

In my case, the output looks like this; the file contains exactly what we passed in.


// OUTPUT CODE
# robots.txt - https://www.tnado.com/
Sitemap: https://www.tnado.com/sitemap.xml
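createRobotsTXT() leaves an existing robots.txt untouched, so later changes to the rules are not written. If you want the file regenerated on every call, a variant could look like the sketch below. The name writeRobotsTXT() and the write-to-a-temporary-file-then-rename approach are assumptions of mine, not part of the original function; renaming within the same file system is typically atomic on Unix-like systems, so a crawler should not read a half-written file.

```php
<?php
// Hypothetical variant of createRobotsTXT() that always (re)writes the file,
// so changed rules take effect without deleting robots.txt first.
function writeRobotsTXT(array $args): bool
{
    if (empty($args['path'])) {
        return false; // nowhere to write
    }

    // Same content layout as createRobotsTXT()
    $content = '# robots.txt - ' . ($args['domain'] ?? '') . PHP_EOL
             . 'Sitemap: ' . ($args['sitemap-uri'] ?? '') . PHP_EOL
             . ($args['line'] ?? '') . PHP_EOL;

    // Write to a temporary file next to the target, then rename it over
    // the target; rename() replaces an existing file
    $tmp = $args['path'] . '.tmp';
    if (file_put_contents($tmp, $content) === false) {
        return false;
    }
    return rename($tmp, $args['path']);
}
```

Unlike the original, calling writeRobotsTXT() twice with different arguments leaves the second version on disk.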


Extended example of a robots.txt file

Another example that excludes subdirectories and individual files.


// PHP CODE
createRobotsTXT(array(
    'domain' => 'https://www.tnado.com/',
    'sitemap-uri' => 'https://www.tnado.com/sitemap.xml',
    'line' => 'User-agent: UniversalRobot/1.0
User-agent: mein-Robot
Disallow: /quellen/dtd/

User-agent: *
Disallow: /fotos/
Disallow: /temp/
Disallow: /fotoalbum.html',
    'path' => 'robots.txt'
));

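This output is longer, of course, because we passed in more rules. Note that createRobotsTXT() only writes the file if it does not exist yet, so delete the robots.txt from the first example before running this one.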


// OUTPUT CODE
# robots.txt - https://www.tnado.com/
Sitemap: https://www.tnado.com/sitemap.xml
User-agent: UniversalRobot/1.0
User-agent: mein-Robot
Disallow: /quellen/dtd/

User-agent: *
Disallow: /fotos/
Disallow: /temp/
Disallow: /fotoalbum.html
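Writing the extra rules as one multi-line string is easy to get wrong. As an alternative, the 'line' value could be assembled from arrays. This is a minimal sketch with a hypothetical helper robotsGroup(); the resulting $line can be passed as 'line' to createRobotsTXT().

```php
<?php
// Hypothetical helper: turns a list of user agents and disallowed paths
// into one robots.txt group (lines joined with PHP_EOL).
function robotsGroup(array $userAgents, array $disallow): string
{
    $lines = [];
    foreach ($userAgents as $agent) {
        $lines[] = 'User-agent: ' . $agent;
    }
    foreach ($disallow as $path) {
        $lines[] = 'Disallow: ' . $path;
    }
    return implode(PHP_EOL, $lines);
}

// Separate the groups with an empty line, as in the output above
$line = implode(PHP_EOL . PHP_EOL, [
    robotsGroup(['UniversalRobot/1.0', 'mein-Robot'], ['/quellen/dtd/']),
    robotsGroup(['*'], ['/fotos/', '/temp/', '/fotoalbum.html']),
]);

echo $line . PHP_EOL;
```

New folders or bots then only need a new array entry instead of an edit inside a long string literal.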


With Google Search Console

Google Search Console has a robots.txt Tester under Crawl -> robots.txt Tester, where you can also create and test a robots.txt. This is not necessary, though, and takes a little longer; the approaches above are faster, and in any case it takes a while until Google shows the file in Search Console.

Here you can find an article I wrote about Google Search Console.

Among other things, you can point to your sitemap in robots.txt or exclude certain bots (as long as they read the file and do not ignore it).

Learn more about robots.txt from Google.



Alternatively with .htaccess

With .htaccess you can do more and actually block access instead of merely asking bots to stay away, provided you are familiar with regular expressions.
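The .htaccess rules themselves are Apache configuration. To stay with PHP, here is the same idea (regular expressions matched against the User-Agent header) as a small script sketch. The function name isBlockedUserAgent() and the patterns are hypothetical; keep in mind that a bot can send any User-Agent it likes, so this only stops bots that identify themselves honestly.

```php
<?php
// Hypothetical sketch: refuse requests from unwanted bots by matching the
// User-Agent header against regular expressions. Unlike robots.txt, the
// server enforces this; it does not rely on the bot's cooperation.
function isBlockedUserAgent(string $userAgent, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $userAgent) === 1) {
            return true;
        }
    }
    return false;
}

// Example patterns (assumptions; adjust to the bots you see in your logs)
$blocked = ['/UniversalRobot/i', '/mein-Robot/i'];

// On the command line there is no User-Agent, so nothing is blocked
$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
if (isBlockedUserAgent($ua, $blocked)) {
    http_response_code(403); // Forbidden
    exit('Access denied');
}
```

Such a check would have to run at the top of every page, for example in a shared include file.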

tnado © 2019 | All Rights Reserved
In cooperation with Hyperly