
Every day I try to learn something new and today is no different.

Hopefully you are aware of the robots.txt file that comes with your Joomla web site install, and you might even have customised it to add more power and control.

The robots.txt file acts as a signpost to Google and other search engine robots, telling them which parts of your web site to index and which to ignore.

In advanced usage it can be used to serve slightly different content to the search engines, prevent images from being included in the Google image index or even tell Google which pages of your site to include in its mobile index.

But have you checked it is actually working?

Three Laws of Robotics

  1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
  2. A robot must obey orders given to it by human beings, except where such orders would conflict with the First Law.
  3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

These laws were first written by Isaac Asimov in 1942 and have formed the basis of much scientific development in robotics ever since.

But how do they relate to Joomla, or is Brian going mad again?

The clue is in the first law.

Every Joomla install includes a robots.txt file in the root of the web site (i.e. the same directory as your configuration.php). If you have a look at it you will see that it tells Google and the other search engines and web crawlers NOT to index certain directories, which is a pretty good idea for security purposes.

Advanced users may even have edited this file to instruct Google et al. to perform other tasks, or not to perform them.

But there is one golden rule of robots.txt that I was unaware of until today, and I suspect many other people are ignorant of it as well.

robots.txt will only work if it is in the top level directory of your web site

Search engines locate your robots.txt file by stripping the path component from the URL (everything from the first single slash) and putting "/robots.txt" in its place.

For most people that's fine, but if you installed Joomla in a sub-directory, e.g. www.example.com/cms, the Joomla-supplied robots.txt file is in the wrong place and the search engines will never find it.
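To make that rule concrete, here is a minimal Python sketch of the same lookup. It is only an illustration of the stripping rule described above, not the code any search engine actually runs, and the function name robots_txt_url and the example URL are my own inventions.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would request for page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; throw away the path, query and fragment,
    # then put "/robots.txt" where the path used to be.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page on a site installed in the /cms sub-directory.
print(robots_txt_url("http://www.example.com/cms/index.php?option=com_content"))
```

Whatever page on the site you feed in, the "/cms" part is discarded, so the file is always looked for at the top level of the domain.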

Luckily it is an easy job to fix.

  • Check to see if you have a robots.txt already in the top level directory.
  • If you don't, copy the Joomla-supplied one there and then edit all the paths in the file to reflect your site.
  • If you do, then just add the details of the Joomla robots.txt to your existing one, not forgetting to update the paths.
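If you would rather not edit the paths by hand, the step can be sketched in a few lines of Python. This assumes the site lives in a sub-directory such as /cms; the helper name prefix_robots_paths and the three sample rules are made up for illustration, and your own robots.txt may well list different directories.

```python
def prefix_robots_paths(robots_text: str, subdir: str) -> str:
    """Prefix every Allow/Disallow path with subdir (e.g. "cms")."""
    prefix = "/" + subdir.strip("/")
    out = []
    for line in robots_text.splitlines():
        field, sep, value = line.partition(":")
        path = value.strip()
        is_rule = field.strip().lower() in ("allow", "disallow")
        # Only rewrite rule lines whose path starts with "/";
        # User-agent lines, comments and blank lines pass through untouched.
        if sep and is_rule and path.startswith("/"):
            line = f"{field.strip()}: {prefix}{path}"
        out.append(line)
    return "\n".join(out)


# Illustrative rules only, not necessarily what your install shipped with.
sample = """User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /tmp/"""

print(prefix_robots_paths(sample, "cms"))
```

Run it once only: it does not check whether a path already carries the prefix, so a second pass would add "/cms" twice.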

So remember to check you have placed your robots.txt in the correct directory before shouting at Google for ignoring your commands.
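One way to check the rules without waiting for a crawler is Python's standard urllib.robotparser module. The sketch below parses rules from a string rather than fetching them, so it runs offline; the helper is_allowed, the two rule sets and the example.com URLs are all assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_text: str, url: str, agent: str = "*") -> bool:
    """Report whether robots_text lets agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical admin page on a site installed in the /cms sub-directory.
admin_page = "http://www.example.com/cms/administrator/index.php"

# Rules copied to the top level with the paths left unchanged.
unedited = "User-agent: *\nDisallow: /administrator/"
# The same rules after the paths were updated for the sub-directory.
edited = "User-agent: *\nDisallow: /cms/administrator/"

print(is_allowed(unedited, admin_page))
print(is_allowed(edited, admin_page))
```

With the unedited paths the admin page is still reported as fetchable, because "/administrator/" does not match anything under "/cms/". Only the edited rule blocks it.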

A robot may not injure a human being or, through inaction, allow a human being to come to harm if you remember to charge it up.


Brian Teeman


Who is Brian?

As a co-founder of Joomla! and OpenSourceMatters Inc, I've never been known to lack an opinion or to be too afraid to express it.

Despite what some people might think I'm a shy and modest man who doesn't like to blow his own trumpet or boast about achievements.
