A harrowing tale of a spambot attack: how to use Cloudflare to defend your website against bad bots

It started with an S-O-S email from Eric Clark, Web Administrator for Operation Smile. Their contact form was getting slammed by a spambot, and they had a mess on their hands. We had to shake this bot ASAP!

Operation Smile provides cleft surgery and care for children in more than 30 countries around the world. Their work is critical for improving the health and dignity of their patients. We’ve managed the OperationSmile.org website for many years and recently rebuilt it on Drupal 9. An assault on their website can impact their ability to successfully deliver cleft palate surgeries to people who urgently need them.

This sort of thing happens all the time across the world wide web, and produces a lot of sleepless nights. Apart from disabling all the forms, thereby hindering an important function of the website, what other recourse was there? And for how long? When would the bot lose interest and move on? Would it ever move on? Eric did disable the forms, but only until we got rid of the bot. (Spoiler alert: It didn’t take long!)

Eric asked whether Operation Smile’s recent implementation of Cloudflare could help. “Yes. Yes it can.” was my response.

Cloud… what?

Cloudflare is many things, but foremost, and in the context of this situation, Cloudflare protects websites against bad bots: denial-of-service attacks, hacking attempts, credit card stuffing on donation forms and, of course, form spam. Cloudflare sits out in front of the website, analyzes and profiles all the traffic, and lets us decide whether to let it in or block it. Cloudflare has extremely powerful algorithms that determine whether traffic is generated by a human, a good bot, or a bad bot.

So instead of panicking, I sat back and confidently logged into the Cloudflare dashboard.

Block the bot!

The Cloudflare dashboard gives me all sorts of information about the traffic: region, browser, page requested, and IP address, to name a few. In a moment like this, I’m looking for an abusive IP address. The attacker was easy to spot: one IP address had by far the highest hit count. I ran that IP address through AbuseIPDB and, sure enough, it had been flagged for recent abuse. I added this IP address to the IP Access Rules form to block the attacker.
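
The triage step above can be sketched in a few lines of Python. This is only an illustration of the idea, not what the dashboard runs: it assumes the request log has been exported as a list of records with a hypothetical `client_ip` field, and the sample addresses come from reserved documentation ranges rather than from the real attack.

```python
from collections import Counter

# Hypothetical export of request records; the field names are assumptions.
SAMPLE_REQUESTS = [
    {"client_ip": "203.0.113.7", "path": "/contact"},
    {"client_ip": "203.0.113.7", "path": "/contact"},
    {"client_ip": "198.51.100.4", "path": "/"},
    {"client_ip": "203.0.113.7", "path": "/contact"},
    {"client_ip": "192.0.2.10", "path": "/donate"},
    {"client_ip": "203.0.113.7", "path": "/contact"},
    {"client_ip": "203.0.113.7", "path": "/contact"},
]


def top_talkers(requests, limit=3):
    """Return the busiest client IPs as (ip, hit_count) pairs, busiest first."""
    counts = Counter(record["client_ip"] for record in requests)
    return counts.most_common(limit)


def dominant_ip(requests, share=0.5):
    """Return the IP behind more than `share` of all requests, or None."""
    if not requests:
        return None
    ip, hits = top_talkers(requests, limit=1)[0]
    return ip if hits / len(requests) > share else None
```

Calling `dominant_ip(SAMPLE_REQUESTS)` picks out the one address responsible for most of the sample traffic. A result of `None` would hint that the traffic is spread across many addresses, which calls for a different response than a single IP rule.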

Did that work? Only one way to tell: monitor the traffic. Cloudflare’s dashboard gives me a real-time, accessible view into the access logs with the ability to see the traffic that is triggering IP Access Rules, WAF (web application firewall) rules, and other thresholds of naughty behavior. I filtered the log for the IP address and watched the bot triggering the IP Access Rule I just added. I watched Cloudflare repeatedly deny this bot’s access to the site. Satisfied that our defenses were holding, I sent Eric the “all clear.” He re-enabled the forms and kept an eye on things from his end. No further issues materialized. We’d won the day!

Total downtime for the forms was minutes, not hours. The mess the bot created was cleaned up quickly and Eric could get back to his important work for Operation Smile.

What if we accidentally block the humans?

A small but important detail: There’s understandable concern about the possibility of blocking legitimate traffic. What if we get this wrong and block humans who want to convert into donors, activists, or subscribers? That, of course, must be avoided. 

Above I said I added a rule to block the IP address of the bot. That’s not the literal truth. Instead, I instructed Cloudflare to initiate what they call a Managed Challenge. From Cloudflare’s blog:

“… the Managed Challenge option will decide to show a visual puzzle or other means of proving humanness to visitors based on the client behavior exhibited during a challenge and based on the telemetry we receive from the visitor. A customer simply tells us, “I want you (Cloudflare) to take appropriate actions to challenge this type of traffic as you see necessary.” When a visitor encounters a Managed Challenge, we first run a series of small non-interactive JavaScript challenges gathering more signals about the visitor/browser environment. This means we deploy in-browser detections and challenges at the time the request is made. Challenges are selected based on what characteristics the visitor emits and based on the initial information we have about the visitor. Those challenges include, but are not limited to, proof-of-work, proof-of-space, probing for web APIs, and various challenges for detecting browser-quirks and human behavior. They also include machine learning models that detect common features of end visitors who were able to pass a CAPTCHA before. The computational hardness of those initial challenges may vary by visitor, but is targeted to run fast. After our non-interactive challenges have been run, we evaluate the gathered signals. If by the combination of those signals we are confident that the visitor is likely human, no further action is taken, and the visitor is redirected to the destined page without any interaction required. However, in some cases, if the signal is weak, we present a visual puzzle to the visitor to prove their humanness. In the context of Managed Challenge, we’re also experimenting with other privacy-preserving means of attesting humanness, to continue reducing the portion of time that Managed Challenge uses a visual puzzle step.”

Is this amazing? Yes, yes it is! 

Frankly, it boggles my mind that we have this much security firepower out in front of the website, fully configurable by us, any time.
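
For readers who prefer scripts to dashboards, the rule described above can be sketched as the JSON body one might send to Cloudflare's IP Access Rules API. I created the rule through the dashboard form, so treat this as a sketch under assumptions: the field names (`mode`, `configuration`, `target`, `value`, `notes`) and the `managed_challenge` mode reflect my understanding of that API and should be checked against Cloudflare's current documentation. The helper name and the sample address are hypothetical.

```python
import ipaddress


def build_access_rule(ip, mode="managed_challenge", note=""):
    """Build a request body for an IP Access Rule targeting a single address.

    Field names are assumptions based on Cloudflare's IP Access Rules API.
    """
    # Validate early: raises ValueError if `ip` is not a valid address.
    address = ipaddress.ip_address(ip)
    # Assumed target names: "ip" for IPv4 addresses, "ip6" for IPv6.
    target = "ip" if address.version == 4 else "ip6"
    return {
        "mode": mode,
        "configuration": {"target": target, "value": str(address)},
        "notes": note,
    }


# Example: challenge, rather than hard-block, a placeholder address.
rule = build_access_rule("203.0.113.7", note="contact form spambot")
```

Choosing `managed_challenge` over a hard block is the design point here: if the address turns out to be shared by real people, they can still get through by passing the challenge.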


Further considerations

There are other weapons in the Cloudflare arsenal that I could have used, including:

  • Increase the Security Level on the site as a whole.
  • Put the site into “Under Attack Mode.”

Both methods would have increased the sensitivity and response of the firewall, making it more aggressive about blocking traffic that looks odd. I might have used one of these tools had I discovered the attack was distributed, meaning there were just too many IP addresses to swat one at a time. Even if I had been able to block a range of IP addresses in the moment, a distributed attack would have told me that it was coming from a sophisticated operation, which probably would have come at us immediately with another set of IP addresses, and we’d be right back in the stew.

I just want to take a moment to note this extraordinary workflow. We were able to fend off a bot attack immediately and on our own, without having to open a ticket with a hosting provider and wait for them to do (or not do) something about it. The whole incident was over in a matter of minutes, not hours. Functionality was restored, damage was mitigated, success!

Note that Drupal core has a mechanism for blocking IP addresses. I could have used that, but it has a key drawback: Nefarious traffic would be hitting the website and sucking up server resources. Drupal would have to bootstrap just to deny access. If not for having Cloudflare out in front of this website, I would have used this method, but it is less than ideal.

Let’s avoid this whole song and dance next time

At the time of this spambot attack, Cloudflare hadn’t been installed for long, and we hadn’t had a chance to observe and tune the WAF. Sure, we had added a handful of rules blocking traffic that’s probing for exploits, but the WAF rule that I added that initiates a Managed Challenge based on the Threat Score was configured conservatively. I didn’t want to tighten things down right away and cause false positives. The Threat Score had been set to 24, which is considered low or non-intrusive, and tells Cloudflare to only issue a challenge to the most threatening visitors. That Threat Score threshold has been updated in response to this incident. It is now set to 14, considered medium, and now a challenge will be issued to moderate and threatening visitors. We can tighten it down further if the need arises.
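
To make the effect of that threshold change concrete, here is a minimal sketch. It assumes that a higher Threat Score means a more threatening visitor and that the rule fires when the score is strictly greater than the threshold; both are my reading of how the setting behaves, and the function names and sample scores are made up for illustration.

```python
OLD_THRESHOLD = 24  # "low": challenge only the most threatening visitors
NEW_THRESHOLD = 14  # "medium": challenge moderate and threatening visitors


def should_challenge(threat_score, threshold):
    """Return True if a visitor with this score would be challenged.

    Assumes the rule fires when the score is strictly above the threshold.
    """
    return threat_score > threshold


def newly_challenged(scores):
    """Return the scores that pass the old rule but are challenged by the new one."""
    return [
        score
        for score in scores
        if should_challenge(score, NEW_THRESHOLD)
        and not should_challenge(score, OLD_THRESHOLD)
    ]
```

Under these assumptions, only visitors scoring from 15 through 24 are affected by the change: everyone above 24 was already being challenged, and everyone at 14 or below still passes untouched.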

Given that this attack was specifically directed at forms, there’s another tool we can use. And yes, we were already protecting the forms with a CAPTCHA (Google’s reCAPTCHA, in fact), but apparently that wasn’t good enough. Cloudflare has a new product that will protect forms on the web with the same technology that powers the Managed Challenge we’re using for our IP Access Rules and WAF Rules. It’s called Turnstile and we can’t wait to install it. OperationSmile.org is built on Drupal, and there’s a Turnstile module available that will make integration a snap.
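
On a Drupal site the module handles the details, but it may help to see roughly what any Turnstile integration does on the server: the form submission carries a token, and the server asks Cloudflare's siteverify endpoint whether that token is valid. The sketch below uses only Python's standard library. The helper names are hypothetical, the secret key and token are values you would supply, and this is not the Drupal module's implementation.

```python
import json
import urllib.parse
import urllib.request

SITEVERIFY_URL = "https://challenges.cloudflare.com/turnstile/v0/siteverify"


def build_verify_request(secret, token, remote_ip=None):
    """Build the POST request that asks Cloudflare to validate a Turnstile token."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional: the visitor's IP address
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(SITEVERIFY_URL, data=body, method="POST")


def is_human(result):
    """Interpret the parsed JSON reply; anything but success=True is rejected."""
    return result.get("success") is True


def verify(secret, token, remote_ip=None, timeout=5):
    """Send the verification request and return True only for a valid token."""
    request = build_verify_request(secret, token, remote_ip)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return is_human(json.load(response))
```

A form handler would call `verify()` with the site's secret key and the token submitted alongside the form, and discard the submission when it returns `False`. Failing closed, so that a missing or malformed reply counts as "not human", is a deliberate choice for a form that is under attack.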

Bot mitigation is the new reality

This is what it’s like managing a website these days. Even when you’re not dealing with a specific and noticeable attack like a spambot attack or DDoS attack, bots are constantly scanning your website looking for exploits. And while they may not find any, why leave this to chance when powerful, self-service tools exist to protect your mission-critical website? Also consider that the cost of hosting your website is proportional to the amount of data you serve. It’s better to stop that unwanted traffic before it gets to your website’s server, so you can spend those dollars elsewhere.

Remember five years ago when your website switched from HTTP to HTTPS to make the connection more secure? I think we’ll look back at this time as another jump forward for website security, when all websites transitioned to sitting behind a WAF. This move is even more urgent if your website hosts donation forms or contains personally identifiable information in the database.

“With the help of Stephen and everyone at Capellic we have been able to push back against the threat of spambots. I am confident in the features and abilities of Cloudflare to respond and protect our online properties to enable Operation Smile to help more children around the world.”

Eric Clark, Web Administrator for Operation Smile

How do we get there?

“This must cost a fortune!” 

Nope, you can get all the aforementioned security and functionality for free from Cloudflare. Yes, there are paid tiers of service for extra goodies, but we don’t always recommend those out of the gate. And if such a tier is deemed necessary, Project Galileo might be able to reduce the cost.

Why free? Cloudflare’s machine learning requires a ton of data. 

“… protecting more sites means we get better data about the types of attacks on our network so we can offer better security and protection for all.” [source]

You’ll need to point your domain’s nameservers at Cloudflare so that traffic is routed through it, and Cloudflare will need to be configured to send that traffic to your web host. Cloudflare configuration will also need to be monitored and tuned over time to deal with the ever-evolving reality of bad actors on the web.

And if you are not already behind Cloudflare and want to start somewhere with spambot protection, implementing Turnstile is a great first step.