Over the last few months I’ve built a couple of websites using Jekyll: the website for Oxford Hack 2019 and this website. Both of these websites contain links to email addresses, and I wanted a way to make it harder for spam bots to harvest those addresses without making them prohibitively difficult for visitors to read.
One option is to use a service like Cloudflare, but I didn’t want to spend much money on the solution.
The other option is to encode the email address in such a way that the browser renders it normally, but most web scrapers won’t recognise it as an email address. This is the approach taken by the original implementation of Markdown, but it does not come by default with Jekyll’s Kramdown.
Thankfully, Alex Chan has ported the implementation, which you can read more about here. It works amazingly for encoding email addresses.
This makes it much harder for bots to scrape, but browsers still render it as a regular email address. Perfect!
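To give a feel for what this kind of encoding does, here is a minimal Ruby sketch. It is not Alex Chan's code or the original Markdown implementation: the helper names obfuscate and mailto_link are made up for illustration, and I'm assuming a simple deterministic scheme that alternates between decimal and hexadecimal character references (the original Markdown picks the encoding at random).

```ruby
# Hypothetical helper (not from any plugin): turn every character into an
# HTML character reference, alternating decimal and hexadecimal forms.
# Browsers decode these transparently; naive scrapers looking for
# "name@host" in the page source will not find a match.
def obfuscate(text)
  text.each_char.with_index.map do |char, index|
    code = char.ord
    index.even? ? format("&#%d;", code) : format("&#x%X;", code)
  end.join
end

# Hypothetical helper: build a mailto link whose href and label are both encoded.
def mailto_link(address)
  %(<a href="#{obfuscate("mailto:" + address)}">#{obfuscate(address)}</a>)
end

puts obfuscate("a@") # prints &#97;&#x40;
```

The alternation is only there to keep the output reproducible. A randomised mix makes the pattern harder to reverse with a simple search and replace, at the cost of different output on every build.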
But I wanted slightly more than that. When I was writing the committee descriptions for the Oxford Hack site, I wanted to be able to automatically obfuscate any email addresses that happened to be in a given block of text. Thankfully, Jekyll’s Liquid templating allows us to do just that. I used the jekyll-regex-replace plugin to check whether each word in a string was an email address, passing it to create_mailto_link if it matched the pattern. I also added a check for email addresses ending in a full stop or semicolon, stripping the punctuation before creating the mailto link and adding it back after the link.
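The word-by-word logic described above can be sketched in plain Ruby. This is an illustration under assumptions, not the Liquid template from this post: EMAIL_PATTERN is a deliberately simple regex I made up, and encode_entities and link_emails are hypothetical helpers standing in for the plugin's filter.

```ruby
# Assumed, deliberately simple pattern; real addresses can be more exotic.
EMAIL_PATTERN = /\A[\w.+-]+@[\w-]+(\.[\w-]+)+\z/

# Hypothetical stand-in for the encoding filter: decimal character references.
def encode_entities(text)
  text.each_char.map { |char| format("&#%d;", char.ord) }.join
end

# Check each whitespace-separated word and link the ones that look like emails.
# Note: splitting and re-joining collapses runs of whitespace into single spaces.
def link_emails(text)
  text.split(" ").map do |word|
    # Peel off a trailing full stop or semicolon so it stays outside the link.
    trailing = word.end_with?(".", ";") ? word[-1] : ""
    candidate = trailing.empty? ? word : word[0...-1]
    next word unless candidate.match?(EMAIL_PATTERN)

    href = encode_entities("mailto:#{candidate}")
    %(<a href="#{href}">#{encode_entities(candidate)}</a>#{trailing})
  end.join(" ")
end

puts link_emails("Email bob@example.com; thanks.")
```

Words that are not email addresses pass through unchanged, including their punctuation, so "thanks." in the example stays as it is while the semicolon ends up after the closing tag of the link.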
The full code is below, and you can find a minified version here (the unminified version creates a lot of unnecessary whitespace). Replace item.description with the string you want filtered.