web crawling

1st November 2005

Google Bandwidth

I recently changed the stats package for the web sites I run, and I noticed that web crawlers are responsible for over 80% of the bandwidth my sites use. GoogleBot is the monster crawler, accounting for over 80% of that crawler traffic. The main difference between GoogleBot and the others is that GoogleBot fetches the images too, probably for Google Image Search.
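For anyone wanting to check a similar figure on their own logs, here is a minimal sketch of the arithmetic, not what my stats package actually does. It assumes the server writes Apache-style Combined Log Format, and the `CRAWLERS` table of user-agent substrings is a hypothetical, incomplete list chosen purely for illustration.

```python
import re
from collections import defaultdict

# Matches the tail of a Combined Log Format line:
#   "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical substring -> label table; extend it for your own traffic.
CRAWLERS = {"googlebot": "Googlebot", "slurp": "Yahoo Slurp", "msnbot": "msnbot"}


def bandwidth_by_client(lines):
    """Sum response bytes per crawler label; everything else is 'other'."""
    totals = defaultdict(int)
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        size = 0 if match.group("bytes") == "-" else int(match.group("bytes"))
        agent = match.group("agent").lower()
        label = next(
            (name for key, name in CRAWLERS.items() if key in agent), "other"
        )
        totals[label] += size
    return dict(totals)


def crawler_share(totals):
    """Fraction of all bytes that went to recognised crawlers."""
    total = sum(totals.values())
    if total == 0:
        return 0.0
    return (total - totals.get("other", 0)) / total
```

Feed `bandwidth_by_client` an open log file (it only needs an iterable of lines) and pass the result to `crawler_share`. Matching on user-agent substrings is crude, since anything can claim to be a crawler, but it is good enough for a rough percentage.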

With my Galleries, this is quite a load, so I have added all the Gallery pages to robots.txt to try to stop crawlers from indexing them. I have also implemented Google Sitemaps across all sites to try to get Google to back off from re-indexing unchanged content. Hopefully, these steps will bring the crawlers' share of bandwidth down to something more reasonable.
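The two steps can be sketched roughly as follows. This is an illustration under assumptions, not my actual configuration: the `/gallery/` path, the `example.com` URLs, and the helper names are all hypothetical, and the sitemap uses the current sitemaps.org namespace rather than whatever schema Google published at the time. The first helper checks a robots.txt rule with Python's standard `urllib.robotparser`; the second builds a small sitemap whose `lastmod` and `changefreq` values are only hints that a crawler may or may not honour.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every crawler out of the gallery pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /gallery/
"""


def is_allowed(robots_txt, agent, url):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def sitemap_xml(entries):
    """Build sitemap XML from (loc, lastmod, changefreq) tuples."""
    # Assumption: the sitemaps.org 0.9 namespace.
    urlset = ET.Element(
        "urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    )
    for loc, lastmod, changefreq in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
        # Tells crawlers how often the page is expected to change.
        ET.SubElement(url, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")
```

For example, `is_allowed(ROBOTS_TXT, "Googlebot", "http://example.com/gallery/photo1.html")` reports whether the rule above blocks that page, and `sitemap_xml([("http://example.com/about.html", "2005-11-01", "yearly")])` produces an entry marking a page as rarely changing.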

tags: bandwidth blog google home computing web crawling
