
Blocking AI Scrapers in Apache/nginx

February 11, 2025 Rich 2 min read

The snippets below make Apache (via .htaccess) or nginx return an HTTP 403 whenever a request's User-Agent matches a known AI scraper.

Apache

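The following command generates one BrowserMatchNocase directive for each AI scraper User-Agent listed at darkvisitors.com/agents, followed by a RequireAll block that denies any request flagged by those directives. It relies on GNU grep for the -P option. Copy the output into your .htaccess file.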

echo -e "\n\n\n# BEGIN ai-scraper block\n# $(date)\n$(curl -s https://darkvisitors.com/agents | grep -oP '(?<=<div class="name agent-name">).*?(?=</div>)' | sed 's/^/BrowserMatchNocase ^/; s/$/(.*) ai_scraper/')\n<RequireAll>\nRequire all granted\nRequire not env ai_scraper\n</RequireAll>\n# END ai-scraper\n"

.htaccess File

# BEGIN ai-scraper block
# Thu Mar  6 12:53:02 AM EST 2025
BrowserMatchNocase ^Operator(.*) ai_scraper
BrowserMatchNocase ^ChatGPT-User(.*) ai_scraper
BrowserMatchNocase ^DuckAssistBot(.*) ai_scraper
BrowserMatchNocase ^Meta-ExternalFetcher(.*) ai_scraper
BrowserMatchNocase ^AI2Bot(.*) ai_scraper
BrowserMatchNocase ^Applebot-Extended(.*) ai_scraper
BrowserMatchNocase ^Bytespider(.*) ai_scraper
BrowserMatchNocase ^CCBot(.*) ai_scraper
BrowserMatchNocase ^ClaudeBot(.*) ai_scraper
BrowserMatchNocase ^cohere-ai(.*) ai_scraper
[...SNIP...]
<RequireAll>
Require all granted
Require not env ai_scraper
</RequireAll>
# END ai-scraper
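
To make the Apache one-liner easier to follow, here is a minimal sketch of its transformation step pulled out into a function. The name agents_to_browsermatch is hypothetical, and the sample HTML is inline so the sketch runs without a download. It also escapes spaces as \s, mirroring what the nginx command does; that is an assumption about how multi-word agent names should be handled, since an unquoted space would otherwise split the directive's arguments.

```shell
# Sketch only: agents_to_browsermatch is a hypothetical helper name.
# Reads the agents page HTML on stdin and prints one BrowserMatchNocase
# directive per agent name on stdout.
agents_to_browsermatch() {
    # Extract the text inside each <div class="name agent-name">...</div>.
    grep -oP '(?<=<div class="name agent-name">).*?(?=</div>)' |
        # Escape spaces (assumption, see lead-in), then wrap each name
        # in the directive syntax used in the .htaccess block.
        sed 's/ /\\s/g; s/^/BrowserMatchNocase ^/; s/$/(.*) ai_scraper/'
}

# Example run against inline sample HTML instead of the live page.
printf '%s\n' \
    '<div class="name agent-name">ClaudeBot</div>' \
    '<div class="name agent-name">CCBot</div>' |
    agents_to_browsermatch
```

Against the live page, the same function could be fed with curl -s https://darkvisitors.com/agents, exactly as in the original command.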

nginx

echo -e "\n\n\n# BEGIN ai-scraper block\n# $(date)\nmap \$http_user_agent \$ai_scraper {\n$(curl -s https://darkvisitors.com/agents | grep -oP '(?<=<div class="name agent-name">).*?(?=</div>)' | sed 's/^/~*^/; s/ /\\s/g; s/$/ 1;/')\n}\n# END ai-scraper\n" > /etc/nginx/snippets/bad-bots.conf

bad-bots.conf

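The command above builds an nginx map that sets $ai_scraper to 1 for every matching User-Agent and writes it to /etc/nginx/snippets/bad-bots.conf. Writing to that path normally requires root.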

# BEGIN ai-scraper block
# Mon Mar 17 12:46:03 PM EDT 2025
map $http_user_agent $ai_scraper {
~*^Operator 1;
~*^ChatGPT-User 1;
~*^DuckAssistBot 1;
~*^Meta-ExternalFetcher 1;
~*^AI2Bot 1;
~*^Applebot-Extended 1;
~*^Bytespider 1;
~*^CCBot 1;
~*^ClaudeBot 1;
~*^cohere-training-data-crawler 1;
~*^Claude-Web 1;
~*^cohere-ai 1;
[..SNIP..]
}
# END ai-scraper
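
Because the nginx command redirects straight into bad-bots.conf, a failed download would replace the file with an empty map. The following is a minimal sketch of a more defensive variant, not the original method: it builds the map from a list of agent names and only replaces the target file when at least one name was found. The helper names build_nginx_map and write_map_safely are hypothetical.

```shell
# Sketch only: both function names are hypothetical helpers.

# Reads agent names (one per line) on stdin, prints an nginx map block.
build_nginx_map() {
    printf 'map $http_user_agent $ai_scraper {\n'
    # Same per-line rewrite as the original command:
    # escape spaces, prefix ~*^ (case-insensitive regex), suffix " 1;".
    sed 's/ /\\s/g; s/^/~*^/; s/$/ 1;/'
    printf '}\n'
}

# Reads agent names on stdin and writes the map to the file given as $1,
# but leaves the existing file untouched if no names were supplied.
write_map_safely() {
    target=$1
    names=$(cat)
    if [ -z "$names" ]; then
        echo "no agent names found; leaving $target untouched" >&2
        return 1
    fi
    tmp=$(mktemp) || return 1
    printf '%s\n' "$names" | build_nginx_map > "$tmp" &&
        chmod 644 "$tmp" &&
        mv "$tmp" "$target"
}

# Example run with inline names instead of the live page.
printf 'ClaudeBot\nCCBot\n' | build_nginx_map
```

In practice the names would come from the same curl and grep pipeline as above, piped into write_map_safely /etc/nginx/snippets/bad-bots.conf, and running nginx -t before reloading would confirm that the generated file parses.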

nginx.conf

Include this file inside the http block of your nginx configuration; the map directive is only valid in the http context.

http {
...
include /etc/nginx/snippets/bad-bots.conf;
...
}

server {} block

server {
    listen 80;
    server_name _;
    location / {
        if ($ai_scraper) {
            # Use 444 instead if you want nginx to close the
            # connection without sending any response.
            return 403;
        }
    }
}

Testing

You can test if the blocking is working by using curl.

curl -I -A cohere-ai techish.net

It should return a 403 error.

HTTP/1.1 403 Forbidden
Server: nginx
Date: Tue, 28 Feb 2025 13:31:08 GMT
Content-Type: text/html
Content-Length: 146
Connection: keep-alive
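
To check more than one User-Agent at a time, the curl test can be wrapped in a small loop. This is a minimal sketch, assuming a hypothetical helper named check_agents; it uses curl's -w '%{http_code}' option to print only the status code for each agent.

```shell
# Sketch only: check_agents is a hypothetical helper name.
# Usage: check_agents HOST AGENT [AGENT...]
# Prints "<status> <agent>" for each User-Agent string given.
check_agents() {
    host=$1
    shift
    for agent in "$@"; do
        # -I sends a HEAD request, -o /dev/null discards the headers,
        # and -w prints just the HTTP status code.
        status=$(curl -s -o /dev/null -I -A "$agent" -w '%{http_code}' "$host")
        printf '%s %s\n' "$status" "$agent"
    done
}

# Example call (requires network access, so it is left commented out):
# check_agents techish.net cohere-ai ClaudeBot "Mozilla/5.0"
```

If the block is active, the scraper agents should report 403, while an ordinary browser User-Agent should report whatever the site normally returns.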
