DrupalCon Lightning Talk: Our Evolving Strategy for Taming Performance Nightmares on Drupal Faceted Search Pages
Below is a walkthrough of the deck I presented in the Pantheon booth at DrupalCon Chicago 2026. I hope you find it useful!
Stephen Musgrave talking about AI bot abuse at DrupalCon Chicago 2026
Hello, this is a walkthrough of the presentation I gave in the Pantheon booth at DrupalCon Chicago. I've made some slight tweaks to it based upon the fact that I'm now presenting it after DrupalCon.
AI Bot Abuse: Our Evolving Strategy for Taming Performance Nightmares on Drupal Faceted Search Pages.
My profile can vote, but can't yet have a beer.
Capellic builds and maintains Drupal websites for nonprofits. We do content strategy, information architecture, design, build, and continuous improvement.
Performance and accessibility are incredibly important to us. I live in Austin. I love to run and take a dip in Barton Springs Pool any chance I get.
So why was I presenting in the Pantheon booth? Because of Capellic's strategic partnership with them. We've been partners for nearly 10 years. Pantheon allows us to focus on delivering value at the application level. And they handle all the amazingly complex technologies that enable that. Trust has been at the center of our relationship from the start. And multi-devs, of course.
The incident. This is what it sounds like.
Content editors can't do their work. In fact, all authenticated users are affected. Even anonymous users are affected when they request pages that aren't already cached at the edge. Any request that requires a trip back to the origin to build the page from the database will be affected. Website traffic is through the roof. And if you're not on Pantheon, you may be facing an overage fee for exceeding your bandwidth allocation.
And this is what it looks like.
This is what the incident looked like on ASPCAPro.org, a website that we built and managed. On the left, you see the visits graph.
The baseline to the left, even before the spikes, is elevated. It's just not noticeable here. And these spikes are pretty insane: nearly 1.8 million visitors in one day. Over here on the right, we have the cache hit ratio. Let's be generous and say it's a 25% cache hit ratio. This should be much higher. This is a content website with a couple of search pages. We will get back to this later.
So we need to identify the problem. What's going on? What's the cause of all this traffic? Why is our website crashing?
It does look like a distributed denial of service attack, but I doubt it. We had a cache hit ratio problem, but where was it? I requested a top uncached pages report from Pantheon support. All the URLs in the report included facet query strings. And then it hit me. Bots are crawling every possible combination of facets. The bots are requesting different combinations of the same results over and over.
There are millions, if not billions, of combinations, depending on the number of facet widgets and the number of items within each facet.
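To get a feel for that scale, here is a back-of-the-envelope sketch in Python. It assumes every facet item can be toggled independently (multi-select), so a facet with n items has 2^n possible states. The facet names and item counts are made up for illustration and are not taken from any of the websites discussed here.

```python
from math import prod


def facet_url_combinations(items_per_facet):
    """Count distinct facet selections when every item can be toggled on or off.

    Assumption: facets are multi-select, so a facet with n items has 2**n
    possible states (including "nothing selected").
    """
    return prod(2 ** n for n in items_per_facet.values())


# Hypothetical search page: four facets with a handful of items each.
example_facets = {"topic": 12, "audience": 6, "content_type": 8, "year": 10}

total = facet_url_combinations(example_facets)
print(f"{total:,} crawlable combinations")  # 2**36 = 68,719,476,736
```

Even this modest, hypothetical page lands in the tens of billions, which is why a crawler that follows every facet link never runs out of uncached URLs to request.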
Why aren't they just visiting sitemap.xml? And they don't pay any attention to the nofollow attribute. And they don't care to obey the robots.txt file. It appears that the AI vendors are so desperate to win the AI gold rush that they haven't bothered to tune their crawlers. And they clearly don't care about being good stewards on our website. Do I know which AI crawlers? No, I do not. But what I do know is that facets are rendered as links and bots love links. Can facets be implemented in a way to avoid traversal? Let's put a pin in that for now. We have a fire to put out.
I just want to acknowledge that your problem might be something else.
You may not have facets on your website. The root cause could be one of these issues on the slide. Keep an eye out for the YouTube recording of the session "AI crawlers are crushing your website: Here's what you can do about it." It was a session at DrupalCon. The slides are already available and I will include a link to them at the end of this video. Also, spending time in New Relic to profile the expensive requests is time well spent. New Relic comes free with your Pantheon subscription.
Let's put this fire out. What can you do in the short term?
Blocking the traffic before it hits the web server is ideal, and a web application firewall (WAF) does just that. It blocks traffic. Do you have AGCDN plus WAF from Pantheon? If you do, open a ticket and work with support to profile the abusive traffic and block it. Careful, you could end up blocking legitimate human traffic or bot traffic that you do, in fact, want. If you don't have it, you might consider adding this service.
And how about Cloudflare? You can serve a managed challenge to all requests that include the facet query string key. If you don't already have Cloudflare, then I'm hesitant to say it's a viable short-term solution. It requires coordination and changes to DNS.
How about a contributed module? All that I've evaluated rely on the request count of a single IP. The AI bot abuse I've witnessed is a distributed crawl, meaning many IPs are used, so any one IP looks like normal traffic. This approach also requires a deploy, and it still allows the traffic to hit the web server.
There was some breaking news during the "AI crawlers are crushing your website" session. I didn't get to attend the entire session, but John Brandenburg sent me a note to say that there is a 2.x version of the bot blocker module in the works. It now uses a threat profile approach, similar to what the WAFs do. Sounds like a big and positive step forward for those websites that don't already have a WAF at the ready.
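To see why counting requests per IP struggles against a distributed crawl, here is a small simulation in Python. It is not the logic of any particular module. The IP addresses, URL shapes, and the limit of 100 requests per window are all hypothetical values chosen for illustration.

```python
from collections import Counter


def blocked_ips(requests, limit):
    """Return the IPs whose request count exceeds a per-IP limit."""
    counts = Counter(ip for ip, _path in requests)
    return {ip for ip, n in counts.items() if n > limit}


# Hypothetical traffic: one noisy client versus a crawl spread over 2,000 IPs.
single_source = [("10.9.9.9", f"/search?f[0]=topic:{i}") for i in range(10_000)]
distributed = [
    (f"10.0.{i // 250}.{i % 250}", f"/search?f[0]=topic:{i}-{j}")
    for i in range(2_000)
    for j in range(5)
]

LIMIT = 100  # illustrative threshold: requests per IP per window

print(len(blocked_ips(single_source, LIMIT)))  # 1: the noisy client is caught
print(len(blocked_ips(distributed, LIMIT)))    # 0: no single IP trips the limit
print(len(distributed))                        # 10000 requests still reach the origin
```

Both traffic patterns send the same number of requests, but only the single-source pattern is visible to a per-IP counter. That is the gap a threat profile approach is meant to close.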
A frank note about the above approaches. These are reactive workflows, which are highly disruptive. They really aren't durable long-term solutions for repelling facet traversal.
What we really want is to be out in front, on the front foot, not in reaction mode or building shields into the app. But let's keep in mind that we're talking about immediate relief here. Let's not get too picky.
Remember that ASPCAPro.org visits chart I showed you earlier? March 19th, 2025, was the first full day after Pantheon applied a WAF rule to block the abusive bot. This flatline is gorgeous. And here's the corresponding chart for cache hit ratio. Over 80% sustained. Now, this is a content website with very few authenticated users and a low search volume. So, 80% isn't really good enough in my book. We'll come back to this later.
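Why isn't 80% good enough? A quick sketch of the arithmetic helps. The daily request count below is a hypothetical round number, not a figure from the charts. The point is how the origin's share of the work changes with the ratio.

```python
def origin_requests(total_requests, cache_hit_ratio):
    """Requests that miss the edge cache and must be built by the origin."""
    return round(total_requests * (1 - cache_hit_ratio))


TOTAL = 1_000_000  # hypothetical daily request count, not a figure from the charts

for ratio in (0.25, 0.80, 0.97):
    print(f"{ratio:.0%} hit ratio -> {origin_requests(TOTAL, ratio):,} origin requests")
```

Under these assumptions, the origin builds 750,000 pages a day at 25%, 200,000 at 80%, and 30,000 at 97%. Moving from 80% to the high 90s cuts origin work by a further factor of six or so, which is the difference between "better" and "healthy."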
Alright, hopefully you've been able to take measures for a short term fix. Let's talk about fixing this for the long term.
There's good news. The latest major version of the Facets module supports the ability to render facets as form elements instead of links. Training bots don't submit forms. Not that I've seen. I won't pretend that the facet refactor is an easy lift.
Many websites are minimally maintained, and news of this refactor will be an unwelcome surprise to the budget and allocation plan. But it's worth it because it works. And then you're spared the chaos of future incidents that never materialize. Let's take a look.
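As a rough illustration of why form elements help, here is a Python sketch of a naive crawler that only follows `<a href>` links. The two HTML snippets are hypothetical, simplified stand-ins for what a facet block might render. They are not the Facets module's actual markup.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values the way a naive link-following crawler would."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawlable_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical markup: facets rendered as links.
FACETS_AS_LINKS = """
<ul>
  <li><a href="/search?f[0]=topic:cats">Cats</a></li>
  <li><a href="/search?f[0]=topic:dogs">Dogs</a></li>
</ul>
"""

# Hypothetical markup: the same facets rendered as form elements.
FACETS_AS_FORM = """
<form action="/search" method="get">
  <label><input type="checkbox" name="topic[]" value="cats"> Cats</label>
  <label><input type="checkbox" name="topic[]" value="dogs"> Dogs</label>
  <button type="submit">Apply</button>
</form>
"""

print(crawlable_links(FACETS_AS_LINKS))  # two facet URLs to follow
print(crawlable_links(FACETS_AS_FORM))   # [] because there is nothing to follow
```

A crawler that submits forms would still get through, so this is a sketch of the behavior I've observed rather than a guarantee.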
We did our first refactor for Easterseals.com last summer. You can see on the chart that it was a bumpy ride until July 22nd. July 22nd was the first full day of the refactored facets deployed. Just look at the steady line that we had there on out.
And here's the corresponding cache hit ratio chart. Really bad before the deploy, amazing after. This is where I want to see the cache hit ratio for content websites. In the high 90s. Here's looking in the rear view mirror for 12 months. Steady traffic after the deploy. And an exceptional cache hit ratio after the deploy.
Our second refactor was for ASPCAPro.org. What we showed earlier was how applying a WAF rule for this website helped us mitigate the traffic in the short term. Let's start with the red. Red is the active incident. Site visits hit 2 million in one day. Yellow is after we deployed the facet refactor. Looks great. Green is even better. This is after Pantheon applied code in the Advanced Global CDN WAF to redirect any requests hitting the old facet URL format. Yes, we could have done this in Drupal, but that involves a deploy, and it still allows traffic to hit the web server. So, wait a minute, you're thinking. We just deployed the facet refactor. Why are requests to the old facet format still rendering? Why isn't a 404 being returned? We're not sure yet. We're still looking into why. Let's go back to the yellow. In comparing the yellow to the green, we can draw a conclusion. Facet traversal doesn't always appear as abusive with a massive spike in traffic. Facet traversal can be "ambient."
And here's the corresponding cache hit ratio chart. The yellow section was after the deployment of the refactored facets, and the cache hit ratio was still bad. This is what caused us to take a second look at the requests. The redirect code was applied, and the result is what we are seeing in the green section. The cache hit ratio is now where I want it, in the high nineties.
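The redirect itself runs at the edge, and I don't have that code to share. As a minimal sketch of the idea in Python, the function below strips legacy facet parameters and returns the URL to redirect to. It assumes the old facet links use query keys of the form `f[0]`, `f[1]`, and so on, and that the redirect target is the same path without those keys. Both are assumptions for illustration, and `redirect_target` is a hypothetical name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def redirect_target(url, facet_key="f"):
    """Return the URL to redirect to, or None if no legacy facet params are present.

    Assumption: legacy facet links carry query keys such as f[0], f[1], ...
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if not k.startswith(f"{facet_key}[")]
    if len(kept) == len(pairs):
        return None  # nothing to strip, serve the request as usual
    return urlunsplit(parts._replace(query=urlencode(kept)))


legacy = "https://example.org/search?keys=vaccines&f[0]=topic:cats&f[1]=type:webinar"
print(redirect_target(legacy))
print(redirect_target("https://example.org/search?keys=vaccines"))
```

Collapsing every legacy combination onto one URL means the edge can cache that single response instead of passing each variation back to the origin.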
Next steps.
Here's what you should do. If you're on Pantheon, look at the metrics page on the dashboard for your website to see if you have a cache hit ratio problem. If you do, then stick with this runbook. Identify the short-term fix. Review the slide deck for the session "AI crawlers are crushing your website: Here's what you can do about it." I'll give you the link in a moment. Read my article. There are a lot of details there that I can't cover in this deck. And I'll provide a link to that in a moment as well. Make plans to refactor facets and then get to work.
I just want to point out, over on the right, that these are very healthy metrics. In fact, on the pages served graph, the purple line shows something I haven't presented yet. You see those valleys? Those are weekends, and that's typical of human behavior on the internet. People tend to be active on the web during the week, and traffic goes down on the weekends. So we can conclude that the traffic on this website is largely human.
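The weekend-valley check can be turned into a rough number. Here is a minimal Python sketch that compares average weekend traffic to average weekday traffic. The daily counts are invented for illustration, and the ratio is a heuristic of my own framing, not a metric that Pantheon reports.

```python
from datetime import date, timedelta
from statistics import mean


def weekend_dip(daily_pages):
    """Ratio of average weekend traffic to average weekday traffic.

    Values well below 1.0 suggest a human, workweek-driven rhythm; values
    near 1.0 suggest traffic that does not care what day it is.
    """
    weekdays = [n for day, n in daily_pages.items() if day.weekday() < 5]
    weekends = [n for day, n in daily_pages.items() if day.weekday() >= 5]
    return mean(weekends) / mean(weekdays)


# Hypothetical two weeks of pages-served counts, starting on a Monday.
start = date(2026, 3, 2)
human_like = {
    start + timedelta(days=i): (4_000 if i % 7 >= 5 else 10_000) for i in range(14)
}
bot_like = {start + timedelta(days=i): 10_000 for i in range(14)}

print(round(weekend_dip(human_like), 2))  # 0.4
print(round(weekend_dip(bot_like), 2))    # 1.0
```

A flat line across weekends is not proof of bots, but it is a cheap signal that the traffic deserves a closer look.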
So good luck and let me know how it goes.
You can send me an email at [email protected] if you've got further questions, or if you have a use case that's slightly different. I did hear from many of you at DrupalCon about your particular use cases. And there are potentially some gaps in my own knowledge here.
Here's a link and QR code to my article, AI Bot Abuse: Our Evolving Strategy for Taming Performance Nightmares on Drupal Faceted Search Pages. There you can find a lot more details, including how to set up a WAF rule to issue a managed challenge.
But wait, there's more. "AI crawlers are crushing your website: Here's what you can do about it." That slide deck is available here. And again, I would urge you to keep an eye on the Drupal Association YouTube channel for the video presentation of this. There you'll be able to pick up all the side comments and anecdotes.
And if you want to learn more from Capellic on the topic of performance, you can scan this QR code or visit this link.
And again, good luck and let me know how it goes. Thank you.
We help nonprofits develop digital strategies, build digital experiences, and support digital teams. With integrity, diligence, expertise, and humor, we bring meaningful change, lighten the load, and ensure our partners’ success. Our team of senior strategists, creatives, and technologists approach our work with a passion for social good. We have exclusively served nonprofits and government agencies since our founding in 2012.