Block crawlers with Caddy

2024-01-20

A quick tip on blocking crawlers from your website with Caddy by rejecting any request whose User-Agent header contains one of their tokens.

If you have yet to install Caddy, please refer to the installation instructions.
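Go into the folder containing the Caddyfile. The caddy fmt and caddy validate commands used later read the Caddyfile from the current directory.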

cd /etc/caddy

Edit the Caddyfile.

sudo nano /etc/caddy/Caddyfile

Add a named request matcher that lists the bots' user agent tokens, along with a handler that aborts the matching requests.

@crawlers {
	header_regexp User-Agent (?i)(ChatGPT-User|cohere-ai|anthropic-ai|Bytespider|CCBot|FacebookBot|Google-Extended|GPTBot|omgili|Amazonbot|Applebot|PerplexityBot|YouBot)
}

handle @crawlers {
	abort
}
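To sanity-check the pattern before deploying it, here is a small shell sketch that tests User-Agent strings against the same alternation. It is only an approximation: Caddy evaluates header_regexp with Go's RE2 engine, while this sketch uses grep -E, with -i standing in for the (?i) flag that grep does not understand. For an alternation of literal tokens like this one, the two should agree. The helper name is_crawler_ua and the sample User-Agent strings are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: check which User-Agent strings the @crawlers pattern would catch.
# grep -E stands in for Caddy's Go RE2 engine; -i replaces the (?i) flag,
# which grep -E does not support. For an alternation of literal tokens like
# this one, both engines should agree.
CRAWLER_PATTERN='ChatGPT-User|cohere-ai|anthropic-ai|Bytespider|CCBot|FacebookBot|Google-Extended|GPTBot|omgili|Amazonbot|Applebot|PerplexityBot|YouBot'

# is_crawler_ua (hypothetical helper): exit status 0 if the string contains any token.
# The match is unanchored, so a token anywhere in the header counts.
is_crawler_ua() {
  printf '%s\n' "$1" | grep -Eiq -- "$CRAWLER_PATTERN"
}
```

Example usage, which should print blocked (the User-Agent string here is illustrative):

is_crawler_ua 'Mozilla/5.0 (compatible; GPTBot/1.0)' && echo blocked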

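The example list above contains only AI-related crawlers. For a fuller list of bot user agents, refer to Dark Visitors. Note that Google-Extended is a robots.txt control token only: Google crawls with its regular user agents, so that entry never appears in a User-Agent header and will not match anything here.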
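When working from a longer list such as the one on Dark Visitors, typing the alternation by hand gets error-prone. A minimal sketch that builds the header_regexp pattern from a list of tokens; build_crawler_regex is a hypothetical name. It backslash-escapes regex metacharacters so each token is matched literally, which Go's RE2 syntax accepts.

```bash
#!/usr/bin/env bash
# Sketch: build a case-insensitive alternation for header_regexp from tokens.
# build_crawler_regex is a hypothetical helper, not part of Caddy.
build_crawler_regex() {
  local escaped=() token
  for token in "$@"; do
    # Backslash-escape regex metacharacters so each token matches literally.
    escaped+=("$(printf '%s' "$token" | sed 's/[][\\.*^$+?(){}|]/\\&/g')")
  done
  # "${escaped[*]}" joins the array with the first character of IFS.
  local IFS='|'
  printf '(?i)(%s)\n' "${escaped[*]}"
}
```

For example, the following prints (?i)(GPTBot|CCBot), ready to paste after header_regexp User-Agent:

build_crawler_regex GPTBot CCBot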

Example full config for a static website:

my-website.com {
	@crawlers {
		header_regexp User-Agent (?i)(ChatGPT-User|cohere-ai|anthropic-ai|Bytespider|CCBot|FacebookBot|Google-Extended|GPTBot|omgili|Amazonbot|Applebot|PerplexityBot|YouBot)
	}

	handle @crawlers {
		abort
	}

	root * /var/www/my-website
	file_server
}

Format the Caddyfile.

sudo caddy fmt --overwrite

Validate the Caddyfile, and make sure there are no errors before moving on.

sudo caddy validate

Restart Caddy.

sudo systemctl restart caddy

Check that Caddy is running correctly.

sudo systemctl status caddy
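The format, validate, restart and status steps can be chained so that Caddy is only restarted when validation succeeds. A minimal sketch, assuming the config lives at /etc/caddy/Caddyfile and Caddy runs as the systemd service caddy; apply_caddy_config is a hypothetical name. It passes the path explicitly (caddy fmt accepts a path argument and caddy validate takes --config), so it does not depend on the current directory.

```bash
#!/usr/bin/env bash
# Sketch: format, validate, restart and health-check Caddy in one step.
# Each command runs only if the previous one succeeded (&&), so an invalid
# Caddyfile never reaches the restart. Path and service name are assumptions.
apply_caddy_config() {
  local config="${1:-/etc/caddy/Caddyfile}"
  sudo caddy fmt --overwrite "$config" &&
    sudo caddy validate --config "$config" &&
    sudo systemctl restart caddy &&
    systemctl is-active --quiet caddy   # exit status 0 only if the unit is active
}
```

Usage, with the default path (pass another path as the first argument if needed):

apply_caddy_config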

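(Optional) Check that requests are properly rejected, using curl or another tool. Because abort closes the connection without sending a response, curl should report an error instead of returning the page.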

curl --request GET --url https://my-website.com/ --header 'User-Agent: ChatGPT-User'
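To check several tokens at once, the request above can be wrapped in a small function. A sketch under two assumptions: the site is served at https://my-website.com/ as in the example config, and any completed HTTP response, even an error page, counts as served. probe_ua is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: request a URL with a given User-Agent and report whether any HTTP
# response came back. probe_ua is a hypothetical helper name.
probe_ua() {
  local url="$1" ua="$2" status
  # Without --fail, curl exits 0 for any HTTP response, including 4xx/5xx.
  # A non-zero exit therefore means no response was received, which is what
  # Caddy's abort produces (it also covers DNS, TLS and timeout failures).
  curl --silent --output /dev/null --max-time 10 --user-agent "$ua" "$url" \
    && status=0 || status=$?
  if [ "$status" -eq 0 ]; then
    echo "served: $ua"
  else
    echo "no response (curl exit $status): $ua"
  fi
}
```

Example usage, checking a few tokens against the example site:

for ua in ChatGPT-User GPTBot Mozilla/5.0; do probe_ua https://my-website.com/ "$ua"; done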