Need Linux expert: How to limit access to web site to prevent harvesting
$30-250 USD
Cancelled
Posted almost 15 years ago
Paid on delivery
We need the help of a Linux expert. We have a very large web site, with millions of pages, and we need to limit public access by restricting each IP address to 50 pages within 24 hours. In other words, if you visit our web site, you can view 50 HTML pages before we block you. We need this to prevent data harvesting by end users and by our competition. We are running the latest version of Slackware Linux. We know it can be done, but we cannot find out how. We need someone to explain how to set this up by sending us the instructions.
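Whatever mechanism ends up counting the pages, the block itself would normally be enforced at the firewall. As a hedged illustration only (the address below is a placeholder, and the availability of iptables on the Slackware box is an assumption on the editor's part), blocking and later unblocking one offender looks like:

    # Drop all traffic from one offending address (1.2.3.4 is a placeholder)
    iptables -A INPUT -s 1.2.3.4 -j DROP
    # Remove the block again once the 24 hours are up
    iptables -D INPUT -s 1.2.3.4 -j DROP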
I have deep Linux systems engineering and network protocol skills that I would be glad to put to use for you. Please check your PMB for further discussion.
I would accomplish this by writing a Perl script that scans the web server access log for GETs by IP address and accumulates the totals. If any IP address went over the limit within 24 hours, it would be blocked. Timestamps would be used so that a blocked IP address is unblocked again after 24 hours.
I could write this for a generic access log and allow it to be configured for any specific log file desired, so that if you had multiple web servers you could scan whichever access logs you needed. You might also find that some IPs should be excluded, i.e., never banned, so you might want such a feature implemented as well.
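A minimal sketch of the script this bid describes, filling in details the bid leaves open: Apache's "combined" log format, a default log path of /var/log/apache/access_log, iptables as the blocking mechanism, and a state file under /var/run. None of these specifics come from the bid itself.

    #!/usr/bin/perl
    # ratelimit.pl -- sketch: scan an access log, count GETs per IP over the
    # last 24 hours, block offenders with iptables, unblock after 24 hours.
    # Assumed details (not from the bid): Apache "combined" log format,
    # default log path /var/log/apache/access_log, iptables for blocking.
    use strict;
    use warnings;
    use Time::Local qw(timegm);

    my $log_file  = shift @ARGV || '/var/log/apache/access_log'; # configurable
    my $limit     = 50;                        # pages allowed per IP per window
    my $window    = 24 * 60 * 60;              # 24 hours, in seconds
    my $state     = '/var/run/ratelimit.blocked';   # when each IP was blocked
    my %whitelist = map { $_ => 1 } qw(127.0.0.1);  # IPs that are never banned

    my %mon = (Jan=>0, Feb=>1, Mar=>2, Apr=>3, May=>4,  Jun=>5,
               Jul=>6, Aug=>7, Sep=>8, Oct=>9, Nov=>10, Dec=>11);

    # Tally GET requests per IP that fall inside the 24-hour window.
    my %hits;
    my $now = time;
    open my $log, '<', $log_file or die "cannot open $log_file: $!";
    while (<$log>) {
        # e.g. 1.2.3.4 - - [10/Oct/2010:13:55:36 -0700] "GET /page.html HTTP/1.1" ...
        next unless m{^(\S+) \S+ \S+ \[(\d+)/(\w+)/(\d+):(\d+):(\d+):(\d+) ([+-])(\d\d)(\d\d)\] "GET };
        my ($ip, $d, $m, $y, $h, $min, $s, $sign, $oh, $om) = ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10);
        my $offset = ($sign eq '-' ? -1 : 1) * ($oh * 3600 + $om * 60);
        my $t = timegm($s, $min, $h, $d, $mon{$m}, $y) - $offset;
        $hits{$ip}++ if $now - $t <= $window;
    }
    close $log;

    # Reload previously blocked IPs; lift any block older than 24 hours.
    my %blocked;
    if (open my $st, '<', $state) {
        while (<$st>) {
            my ($ip, $when) = split;
            if ($now - $when > $window) {
                system('iptables', '-D', 'INPUT', '-s', $ip, '-j', 'DROP');
            } else {
                $blocked{$ip} = $when;
            }
        }
        close $st;
    }

    # Block every IP over the limit that is not whitelisted or already blocked.
    for my $ip (keys %hits) {
        next if $hits{$ip} <= $limit or $whitelist{$ip} or $blocked{$ip};
        system('iptables', '-A', 'INPUT', '-s', $ip, '-j', 'DROP');
        $blocked{$ip} = $now;
    }

    # Persist state for the next run.
    open my $out, '>', $state or die "cannot write $state: $!";
    print $out "$_ $blocked{$_}\n" for keys %blocked;
    close $out;

Run from cron every few minutes, this keeps the counts and blocks current; the %whitelist table at the top covers the "never banned" IPs the bid mentions, and the first argument selects which server's log to scan.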
I can also give you a configuration that limits bandwidth per client based on its total number of simultaneous connections, so that someone with a fast connection won't be able to consume all of your bandwidth.
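As a hedged sketch of what such a configuration might look like (the port and threshold below are assumptions, not part of the bid): the iptables connlimit match rejects new connections from any client that already holds more than a set number open, which indirectly caps how much bandwidth one client can pull.

    # Reject new HTTP connections from any client with more than 10 already open
    iptables -A INPUT -p tcp --syn --dport 80 -m connlimit --connlimit-above 10 -j REJECT --reject-with tcp-reset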