Obfuscating My Contact Email
Yesterday I stumbled across this great post by Spencer Mortensen, which tested different email obfuscation techniques against real spambots to see which ones actually work. It's a fascinating read, and I'd recommend checking it out if you're into that sort of thing.
The short version is that spambots scrape your HTML looking for email addresses. If your address is sitting there in plain text, they'll hoover it up. But if you encode each character as an HTML entity, the browser still renders and uses it correctly, while most bots haven't got a clue what they're looking at.
From Spencer's testing, this approach blocks around 95% of harvesters, which is good enough for me.
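To make the idea concrete, here's a minimal sketch of the encoding step on its own. The helper name encode_as_entities() and the address me@example.com are placeholders I've made up for illustration, and html_entity_decode() is only standing in for what a browser does when it parses the page.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: turn each byte of an ASCII string into a hex HTML entity.
function encode_as_entities(string $text): string
{
    // Guard the empty case so every PHP version returns an empty string.
    if ($text === '') {
        return '';
    }

    return implode('', array_map(
        fn(string $c): string => '&#x' . dechex(ord($c)) . ';',
        str_split($text)
    ));
}

$plain = 'me@example.com'; // placeholder address, not a real one
$encoded = encode_as_entities($plain);

// The markup no longer contains a literal "@" or a readable address.
echo $encoded, PHP_EOL;

// Decoding the entities, much as a browser does, gives back the original text.
echo html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8'), PHP_EOL;
```

Because str_split() and ord() work on bytes, this sketch assumes a plain ASCII address.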
Where my email appears
On this site, my contact email shows up in two places:
- The Reply by email button at the bottom of every post.
- My contact page.
Both pull from the site_email value in Pure Blog's config, so I only needed to make a couple of changes.
The reply button
The reply button lives in content/includes/post-meta.php, which is obviously a PHP file. So the fix there was straightforward - I ditched the {{ site_email }} shortcode and used PHP directly to encode the address character by character into HTML entities:
<?php
// Encode each character of the configured address as a hexadecimal HTML entity.
$_email = load_config()['site_email'] ?? '';
$_encoded = implode('', array_map(fn($c) => '&#x' . dechex(ord($c)) . ';', str_split($_email)));
?>
<a class="button reply-button"
   href="mailto:<?= $_encoded ?>?subject=Reply to: {{ post_title }}">Reply by email
</a>
Each character becomes something like &#x6b; (the letter k), which is gibberish to most bots, but the browser decodes it back into a perfectly readable address for a human. The {{ post_title }} shortcode still gets replaced normally by Pure Blog after the PHP runs, so the subject line still works as expected.
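To see why the encoded link is gibberish to a simple scraper, here's a rough sketch of a deliberately naive harvester. The find_addresses() function and its regex are my own invention for illustration, not how any particular bot works, and a smarter bot that decodes entities first would still find the address.

```php
<?php
declare(strict_types=1);

// Hypothetical, deliberately naive harvester: a regex hunting for plain-text addresses.
function find_addresses(string $html): array
{
    preg_match_all('/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/i', $html, $matches);

    return $matches[0];
}

// The same link twice: once in plain text, once entity-encoded (placeholder address).
$plain   = '<a href="mailto:me@example.com">Reply by email</a>';
$encoded = '<a href="mailto:&#x6d;&#x65;&#x40;&#x65;&#x78;&#x61;&#x6d;&#x70;&#x6c;&#x65;&#x2e;&#x63;&#x6f;&#x6d;">Reply by email</a>';

var_dump(find_addresses($plain));   // one match: me@example.com
var_dump(find_addresses($encoded)); // no matches, as there is no literal "@" to latch onto
```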
The contact page
The contact page is a normal page in Pure Blog, so it's Markdown under the hood. This means I can't drop PHP into it. Instead, I used Pure Blog's on_filter_content hook, which runs after shortcodes have already been processed. By that point, {{ site_email }} has been replaced with the plain email address, so all I needed to do was swap it for the encoded version:
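<?php
declare(strict_types=1);

// Runs on page content after Pure Blog has processed shortcodes.
function on_filter_content(string $content): string
{
    $config = load_config();
    $email = trim((string) ($config['site_email'] ?? ''));

    // Nothing to encode if no address is configured.
    if ($email === '') {
        return $content;
    }

    // Encode each character as a hexadecimal HTML entity.
    $encoded = implode('', array_map(fn($c) => '&#x' . dechex(ord($c)) . ';', str_split($email)));

    // Swap every plain-text occurrence of the address for the encoded version.
    return str_replace($email, $encoded, $content);
}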
This goes in config/hooks.php, and now any page content that passes through Pure Blog's filter_content() function will have the email automatically encoded. So if I decide to publish my site_email elsewhere, it should automagically work.
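If you want to poke at the replacement logic outside of Pure Blog, here's a stand-alone sketch of the same idea. The encode_email_in_content() function is a hypothetical stand-in for the hook body: it takes the address as a parameter instead of calling load_config(), and the a@b.co address and sample markup are made up.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-alone version of the hook body. The address is passed in
// rather than read from load_config(), so the snippet runs without Pure Blog.
function encode_email_in_content(string $content, string $email): string
{
    $email = trim($email);
    if ($email === '') {
        return $content; // nothing configured, leave the content alone
    }

    $encoded = implode('', array_map(
        fn(string $c): string => '&#x' . dechex(ord($c)) . ';',
        str_split($email)
    ));

    return str_replace($email, $encoded, $content);
}

// Sample page content with the address in both the href and the link text.
$html  = '<p>Email me at <a href="mailto:a@b.co">a@b.co</a>.</p>';
$once  = encode_email_in_content($html, 'a@b.co');
$twice = encode_email_in_content($once, 'a@b.co');

echo $once, PHP_EOL;

// A second pass changes nothing, because no plain address is left to replace.
var_dump($once === $twice);

// Decoding the entities restores the original markup.
var_dump(html_entity_decode($once) === $html);
```

The second check is the reassuring one: running the content through the filter more than once shouldn't double-encode anything.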
One more layer of protection
As well as the obfuscation, I also set up my email address as a proper alias rather than relying on a catch-all to segregate emails. That way, if spam does somehow get through, I can nuke the alias, create a new one, and update it in Pure Blog's settings page.
Is this overkill? Probably. But it was a fun little rabbit hole, and now I can feel smug about it. 🙃