
Throttling web crawlers

7 votes
1 answer
4809 views
My website is being DoS'ed by Google's web spiders. Google is welcome to index my site, but sometimes it queries a tag cloud on my site faster than my web server can produce the results, making the web server run out of resources.

How can I limit access to my web server in such a way that normal visitors are not affected?

robots.txt is not an option because it would block the whole site from being indexed.

iptables -m recent is tricky, because some pages contain a lot of images or other data files and 'recent' triggers on those too (typically my RSS aggregator loading images and feeds). iptables -m limit has the same disadvantage, and on top of that I was not able to make it selective per source IP address.

How can I limit visitors that cause my server load to rise too high?

I am running apache2 on an Ubuntu server in a VirtualBox VM.
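For illustration, a typical per-source rate-limiting rule set with -m recent looks roughly like the sketch below (the port, time window and hit count are placeholder values, not the exact rules used here); the drawback mentioned above is that every new connection counts toward the limit, including image and feed requests:

    # Rough sketch of per-source rate limiting with iptables -m recent
    # (placeholder port, window and hit count).
    # Drop a source IP that opened more than 20 new connections to port 80
    # within the last 10 seconds.
    iptables -A INPUT -p tcp --dport 80 -m conntrack --ctstate NEW \
             -m recent --name http --update --seconds 10 --hitcount 20 -j DROP
    # Otherwise record/refresh the source IP in the 'http' list.
    iptables -A INPUT -p tcp --dport 80 -m conntrack --ctstate NEW \
             -m recent --name http --set
    # Drawback: a page with many images, or an RSS reader fetching feeds and
    # enclosures, trips this limit just as easily as a crawler hammering the
    # tag cloud.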
Asked by jippie (14566 rep)
Apr 27, 2012, 07:06 PM
Last activity: Dec 15, 2023, 09:15 PM