You can also create the robots.txt with any text editor, and it might look like the example below. Here we create a dynamic robots.txt; PHP offers several ways to write to a file. We use file_put_contents()
and wrap it in a small function of our own, which can be extended and adapted as needed.
# robots.txt - https://www.tnado.com/
Sitemap: https://www.tnado.com/sitemap.xml
# If you create sitemaps in separate language folders such as de-de or en-en, you can specify those as well
Sitemap: https://www.tnado.com/de-de/sitemap.xml
Sitemap: https://www.tnado.com/en-en/sitemap.xml
# You can set rules for specific bots such as Googlebot or Bingbot, but that is up to you
User-agent: Googlebot
Disallow: /wp-admin/
# A star as the user agent means the rules apply to all bots
User-agent: *
# You can disallow access to folders
Disallow: /admin/
Disallow: /images/
Disallow: /temp/
# Or forbid access to files
Disallow: /fotoalbum.html
The robots.txt file on the web
Use the robots.txt: it plays an important role on the web and can save you unnecessary traffic, even if not all bots respect and follow it. There are many bots that cause unnecessary load on your website and thereby distort your statistics. Some ignore every rule in robots.txt, go into every folder on your web page and rummage through everything. Bots work automatically; they are programmed to do certain tasks and never get tired ;).
Tip
Do not leave important files unencrypted on your server, and certainly not freely accessible. Try to protect your environment as much as possible; in most cases, the reason a website is insecure is the programmer.
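One way to do this (a minimal sketch, assuming an Apache 2.4 server; "config.php" is a hypothetical file name) is to deny direct HTTP access to sensitive files at the server level via .htaccess:

```apacheconf
# .htaccess - deny all direct HTTP access to a sensitive file
# ("config.php" is a hypothetical example name)
<Files "config.php">
    Require all denied
</Files>
```

Unlike robots.txt, such a rule is enforced by the server itself, so it does not depend on bots behaving well.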
Note
There is no guarantee that search engines will abide by the bans in robots.txt.
Create your own function
Now we come to our function. It accepts an array that we call $args, short for arguments, in which we pass the necessary attributes: domain, sitemap URL, line, and path.
/**
 * Dynamic robots.txt
 *
 * @param array $args (domain|sitemap-uri|line|path)
 * @return boolean (true|false)
 */
function createRobotsTXT($args = array()) {

    $return = false;
    $default  = '# robots.txt - ' . (isset($args['domain']) ? $args['domain'] : '') . PHP_EOL;
    $default .= 'Sitemap: ' . (isset($args['sitemap-uri']) ? $args['sitemap-uri'] : '') . PHP_EOL;
    $default .= (isset($args['line']) ? $args['line'] : '') . PHP_EOL;

    if (isset($args['path']) && !file_exists($args['path'])) {
        // file_put_contents() returns the number of bytes written, or false on failure
        $return = (file_put_contents($args['path'], $default) !== false);
    }

    return $return;
}
Finally, we call our function to create the file.
createRobotsTXT(array(
    'domain'      => 'https://www.tnado.com/',
    'sitemap-uri' => 'https://www.tnado.com/sitemap.xml',
    'line'        => '',
    'path'        => 'robots.txt'
));
The output in this case looks like this for me; everything we passed in has been written to the file.
# robots.txt - https://www.tnado.com/
Sitemap: https://www.tnado.com/sitemap.xml
Extended example of a robots.txt file
Another example that excludes subdirectories and files directly.
createRobotsTXT(array(
    'domain'      => 'https://www.tnado.com/',
    'sitemap-uri' => 'https://www.tnado.com/sitemap.xml',
    'line'        => 'User-agent: UniversalRobot/1.0
User-agent: mein-Robot
Disallow: /quellen/dtd/

User-agent: *
Disallow: /fotos/
Disallow: /temp/
Disallow: /fotoalbum.html',
    'path'        => 'robots.txt'
));
Of course, this output contains a bit more, because we passed in more than before.
# robots.txt - https://www.tnado.com/
Sitemap: https://www.tnado.com/sitemap.xml
User-agent: UniversalRobot/1.0
User-agent: mein-Robot
Disallow: /quellen/dtd/
User-agent: *
Disallow: /fotos/
Disallow: /temp/
Disallow: /fotoalbum.html
With Google Search Console
In Google Search Console there is a tab under Crawling -> robots.txt Tester, where a robots.txt can also be created and tested, but this is not necessary and takes a little longer. The other ways are faster, and it takes a while until Google displays the file in the Search Console.
Here you can find a post of mine about the Google Search Console.
Among other things, you can specify your sitemap in robots.txt or exclude certain bots (as long as they respect the file and do not ignore it).
Learn more about robots.txt from Google.
Alternatively with the htaccess
With the .htaccess you can do more and really enforce exclusions, provided you know regular expressions and have mastered them.
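For example (a sketch, assuming Apache with mod_rewrite enabled; "BadBot" is a hypothetical user agent name), unwanted bots can be blocked by their User-Agent header:

```apacheconf
# .htaccess - block a bot by its User-Agent header
RewriteEngine On
# Match any request whose User-Agent contains "BadBot" (case-insensitive)
RewriteCond %{HTTP_USER_AGENT} BadBot [NC]
# Return 403 Forbidden instead of serving the page
RewriteRule ^ - [F,L]
```

In contrast to robots.txt, which is only a request, this rule is enforced by the server for every matching request.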