Metafilter Slowness/Unavailability April 28, 2025 6:39 PM

Multiple users have reported slowness or unavailability. This was originally reported in the last Site Update Thread but that is now closed. Submitting this thread as a request for an update and a place for further discussion.

juliebug reported timeouts back on April 9th and asked for followups on the 11th and 17th. Brandon responded on April 17th saying there had been database timeout issues. Additional reports of slowness came in on the 23rd and 24th and 26th. Brandon reported on the 26th that AI bots were scraping the site but that frimble blocked them and things should be back up to speed.

But if you read the rest of that thread it is clear folks are still experiencing issues, myself included. Each page load is taking about 60 seconds. This has been consistent for at least the last 48 hours for me.

This seems bad! Any update would be great.
posted by SpiffyRob to Uptime at 6:39 PM (139 comments total) 7 users marked this as a favorite

This is embarrassing.
posted by phunniemee at 6:43 PM on April 28 [4 favorites]


It took 45 seconds from the time I hit post comment for that comment to post ^
posted by phunniemee at 6:44 PM on April 28 [2 favorites]


Each page load is taking between 25 and 45 seconds for me also.
posted by kate blank at 6:47 PM on April 28 [2 favorites]


Mod note: Hey y’all just pushing this through so people can talk about it, no updates at the moment. I’ve been off for several days, will look more into this issue tomorrow.

Apologies for the continued problems.
posted by Brandon Blatcher (staff) at 6:50 PM on April 28 [1 favorite]


I'm also experiencing slowness.
posted by Winnie the Proust at 6:58 PM on April 28 [1 favorite]


Sincerely appreciate you chiming in even though you don't have any more info, Brandon.

I accept that we've reached a place where it's rare to hear from any mods other than Brandon. That expectation has been set and it is what it is. But: Speaking only for myself it would have been great to hear from someone about this in Brandon's absence. Given that he'll need to be brought up to speed anyway maybe we could cut out the middle man and whatever mod is taking the lead on this could give us an update directly?
posted by SpiffyRob at 7:01 PM on April 28 [5 favorites]


It's hard to believe that a Web site serving thousands of users has had this level of inaccessibility for several weeks, that nobody in charge seems to be worried about it, and that whoever is in charge is clearly not up to the task of fixing it. But here we are.

It feels to me like the slow loading gets a bit worse every day and, today, it is truly terrible. I'm no technician, but surely there are logs and things that would show a half-competent person what's going on? If there is nobody at the helm that can do this, at least share data with the community and there will no doubt be someone (or several someones) leaping in to diagnose the issue in a matter of minutes. It's just ridiculous to hide and pretend nothing is going wrong, then Brandon ends up getting thrown under the bus (not yet, but it's coming) because he's the only one willing to engage.
posted by dg at 7:04 PM on April 28 [1 favorite]


seems intermittent the last couple hours.
posted by clavdivs at 7:05 PM on April 28


In case useful: late Saturday night I was even seeing a "We're Sorry — A Server Error Occurred / MetaFilter is over capacity at the moment" message (which adds "The details of this error have been sent to a site administrator"), and earlier today that was popping up again, but only on the FanFare front page.

(Editing to add: there was at least a 10-second delay each time -- sometimes significantly more -- while the page tried to load before showing the error.)
posted by nobody at 7:13 PM on April 28 [1 favorite]


Surely being on AWS it should be pretty simple to increase resources as a stopgap measure.

If there is a lot of crawling going on are closed threads, at least, if not also older threads, being cached to relieve database load?
posted by ssg at 7:30 PM on April 28


For what it's worth, I check MeFi multiple times a day on different machines/browsers (Chrome and Firefox) and I haven't had any slowness issues at all until about two hours ago on my Chromebook. I'm now seeing page load delays of about 20-30 seconds.
posted by pdb at 7:33 PM on April 28 [3 favorites]


I am going to reiterate my suggestion from the other thread: Cloudflare is $20/month and if this is being caused by bad traffic (which i think is likely) it would likely be an extremely effective mitigation.
posted by adrienneleigh at 8:19 PM on April 28 [13 favorites]


It's been bad for at least a week or two.
posted by lookoutbelow at 8:44 PM on April 28


When I reload a thread now, there's a 50% chance that it'll time out and fail to load. If it does load, it takes 15–30 seconds to load enough to display, and another 15–60 seconds to finish loading everything and run the JS that jumps down to whatever comment I'd clicked on the timestamp for (to add the anchor to the URL, so I can get back to where I was in a thread) and adds the indicator triangle in the left margin. Earlier this morning, I just couldn't load any threads and gave up; same thing over the weekend.
posted by JiBB at 8:50 PM on April 28 [1 favorite]


47 seconds when entering blue
about same to grey
but ran a test
no lag now.
posted by clavdivs at 9:22 PM on April 28


no lag
end test
edit at 420
end test
posted by clavdivs at 9:23 PM on April 28 [2 favorites]


Today the site timed out when I went to it the first time.
posted by bootlegpop at 9:23 PM on April 28


I’ve had repeated timeout issues this week
posted by sixswitch at 9:24 PM on April 28 [1 favorite]


I'll check the AE-35 unit.
posted by clavdivs at 9:31 PM on April 28 [7 favorites]


There’s no earthly way of knowing what bots are going. There’s no knowing where data is flowing or which log-in is a
slowing

is it coding, is it my end
is it mobile or the server

see some going,
let's blame the mods
for showing
some reaction
to external reason
full well knowing....
the cellist stopped bowing
posted by clavdivs at 9:47 PM on April 28 [7 favorites]


I was having some slowness on my phone this weekend but I was travelling and assumed it was a function of crappy wifi.
posted by gentlyepigrams at 9:57 PM on April 28 [2 favorites]


As I commented on the site update post, this exact situation was happening in January 2024 and with the same apparent cause, AI scraper bots.

Why wasn't Cloudflare (or similar) put into place at that point? This is a no brainer, folks. You can't just keep smacking the bots down one by one when they crop up.

Or are the finances in such a dire state that the site can't even afford a measly $20 a month?
posted by fight or flight at 12:24 AM on April 29 [5 favorites]


slowness [g/wiki]

speed ¿♟️?
posted by HearHere at 12:55 AM on April 29 [1 favorite]


Ironically timed out while trying to load this thread.
posted by sciencegeek at 2:40 AM on April 29


I started having issues yesterday, before that I hadn't noticed any slowdown.
posted by Kattullus at 3:35 AM on April 29


It’s been intermittent for me over the last couple of days, but concerning nonetheless. At times MeFi has been unusable. It definitely smells like bots swamping the servers.

I belong to an extremely small forum, and we’ve been offline for the past couple of weeks due to a sudden and unrelenting swarm of bots overwhelming the servers. We tend to get loudly political there (and not in a way that is complimentary to the current US regime) and the more conspiratorial of us have posited (without any proof, of course) that someone is targeting sites that tend to be critical of the current regime, swamping their hosts’ servers with the goal of making the sites unusable.
posted by Thorzdad at 3:58 AM on April 29 [2 favorites]


Maybe we could turn off images so pages would load faster?
posted by snofoam at 4:10 AM on April 29 [19 favorites]


Last night (April 28), I couldn't usefully access the site. I was getting the front page *very* occasionally, but mostly just getting browser error pages, and https://downforeveryoneorjustme.com/ reported Metafilter was down.

I use PIA VPN and I tried switching between several regions—thinking it might be my VPN—before I checked.
posted by Wilbefort at 4:18 AM on April 29


As part of trying to track down the UTC morning database problems, I've had uptime-monitor.io and statuscake running for a while. They're both free services and pretty limited, but uptime-monitor reports long periods of downtime (defined as "failed to respond to a HTTP HEAD request within 5 seconds") and statuscake reports response times of 20+ seconds today and over the weekend (click metafilter.com and set the graph to 1w).
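For anyone curious what that kind of check amounts to, here is a minimal sketch in Python. It assumes the monitor sends a single HEAD request and counts a connection failure, or anything slower than 5 seconds, as "down", and that any HTTP response at all counts as a response. The names `classify` and `probe` are made up for illustration; this is not how uptime-monitor.io is actually implemented.

```python
import time
import urllib.error
import urllib.request

TIMEOUT_SECONDS = 5.0  # the threshold quoted above; everything else is an assumption


def classify(responded: bool, elapsed: float, timeout: float = TIMEOUT_SECONDS) -> str:
    """Return "up" only if a response arrived within the timeout."""
    if responded and elapsed <= timeout:
        return "up"
    return "down"


def probe(url: str, timeout: float = TIMEOUT_SECONDS) -> str:
    """Send one HTTP HEAD request to `url` and classify the outcome."""
    request = urllib.request.Request(url, method="HEAD")
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            responded = True
    except urllib.error.HTTPError:
        # Assumption: the server answered, even if with an error status.
        responded = True
    except OSError:
        # Covers URLError, refused connections, and socket timeouts.
        responded = False
    return classify(responded, time.monotonic() - start, timeout)
```

Calling `probe("https://www.metafilter.com/")` once a minute and logging the result would give roughly the kind of up/down history described above.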

Also, +1 for Cloudflare - surprised the site doesn't already have this running. The new site will need it too.
posted by Klipspringer at 4:23 AM on April 29 [2 favorites]


Right now I’m getting “Metafilter is currently down. We know about it and are working on it.” on Status, despite the site being…up.

And all weekend while the site was running terribly Status just said everything was hunky dory; the only update I could find was inexplicably on Bluesky rather than on the Status subsite.
posted by bcwinters at 4:38 AM on April 29 [3 favorites]


Slow all weekend for me as well but it became quite zippy on the main site and subsites just before I posted this comment. Woot!
posted by ashbury at 5:40 AM on April 29


Been very slow (30-60 sec each page load) all week but seems better today. I thought we were being punished for complaining about the death of net neutrality
posted by toodleydoodley at 6:05 AM on April 29


The slowdown is annoying and not good for retaining users but I am always slightly surprised when people jump on the bandwagon recommending Cloudflare.

Cloudflare works by proxying all content through their servers. Do we want everything typed into Metafilter to go through a 3rd party? They also have a habit of incorrectly blocking legitimate but slightly unusual traffic - no more anonymously browsing AskMe with cookies turned off.

I will admit that sometimes Cloudflare can be a necessary evil but deploying it deserves more consideration.
posted by AndrewStephens at 6:19 AM on April 29 [6 favorites]


I forgot to add that, depending on the bots, Cloudflare might not actually fix the problem. I had the free tier of Cloudflare enabled on my site for a few months when I started to get hit a few years ago. It cut down on the bots by a huge amount but a few really persistent ones still got through no matter how I tried to configure the firewall.

In the end I decided it wasn't worth the bother.
posted by AndrewStephens at 6:36 AM on April 29


The site has been back to normal responsiveness for me today. It would be helpful to know if the staff did something to make this happen, or whether it's just the tide of bots flowing out temporarily.
posted by Winnie the Proust at 7:17 AM on April 29 [3 favorites]


Things are loading as usual for me, which is great. Would love to know what observations were made by staff in recent days and what steps, if any, were taken.
posted by SpiffyRob at 8:48 AM on April 29 [5 favorites]


As of this morning, for me, the MeFi sites are back to loading as fast as I would normally expect them to on both Chrome and Firefox (Windows 11).
posted by pdb at 9:10 AM on April 29 [2 favorites]


edit at 420
end test

Say no more.

posted by y2karl at 9:19 AM on April 29 [3 favorites]


Clearly the slowdown gave everyone the chance to read other sites...I haven't seen the front page this busy with new posts in a while.
posted by mittens at 12:14 PM on April 29 [4 favorites]


> Cloudflare works by proxying all content through their servers. Do we want everything typed into Metafilter to go through a 3rd party? They also have a habit of incorrectly blocking legitimate but slightly unusual traffic - no more anonymously browsing AskMe with cookies turned off.

Everything already goes through multiple third parties, called the internet, and everything we type on here is exposed publicly, and scraped, stored in search engines, and also incorporated into LLMs. MeMail could be excluded, but I wouldn't treat MeMail like secure messaging, personally. Seems like a net win to introduce a Content Delivery Network to take load off the server(s).
posted by dis_integration at 12:19 PM on April 29 [9 favorites]


two bells and all is well
posted by clavdivs at 2:19 PM on April 29


This has been discussed before, but old content should become static, either on the back end entirely or through caching servers with a long expiration time. Rebuilding content from 2002 again and again is untenable between the budget of the site, nature of the content, and nature of scraping bots. A CDN is going to be of limited use unless the bots are requesting the same things again and again from the same IPs, but this sounds like they're marching through old content. And Akamai is not cheap.

If it's not been done already, looking at the slow query log on the SQL server might uncover some poorly written or indexed queries. At my previous employer, I solved a database performance load issue vaguely akin to this by adding an index to a horribly inefficient query, dropping it from 30 seconds each run to something measured in ms.
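To illustrate the index point, here is a small self-contained sketch using Python's built-in `sqlite3` module as a stand-in. The `comments` table, its columns, and the index name are hypothetical, and MetaFilter's actual database engine and schema are not known here. It only shows how a query plan changes from a full table scan to an index lookup once the filtered column is indexed.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, thread_id INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO comments (thread_id, body) VALUES (?, ?)",
    [(i % 1000, f"comment {i}") for i in range(5000)],
)

QUERY = "SELECT id, body FROM comments WHERE thread_id = ?"


def query_plan(connection: sqlite3.Connection) -> str:
    """Return SQLite's description of how it would execute QUERY."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    # The last column of each row is the human-readable plan step.
    return "; ".join(row[-1] for row in rows)


plan_before = query_plan(conn)  # no index yet: every row gets scanned
conn.execute("CREATE INDEX idx_comments_thread_id ON comments (thread_id)")
plan_after = query_plan(conn)   # the planner can now search via the index

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan describes a scan of the whole table and the second names the new index.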
posted by Candleman at 2:34 PM on April 29 [3 favorites]


Mod note: SpiffyRob: "Would love to know what observations were made by staff in recent days and what steps, if any, were taken."

Frimble took the site offline for a few minutes around 6am ET to do some maintenance. They also cleaned up some SQL queries that were running slow. Things have been running a lot more smoothly all day as a result.
posted by Brandon Blatcher (staff) at 3:10 PM on April 29 [2 favorites]


That... does not have a ton of explanatory power.

What changed to slow things down? Was this related to bot/AI activity, or did these SQL queries decide to start acting up out of nowhere?
posted by sagc at 3:17 PM on April 29 [3 favorites]


Thanks Brandon, appreciate the update.

To everyone else who works for or runs MetaFilter: The site was close to unusable for many people for at least 48 hours. We reported this. We heard nothing from any of you. Were y’all not using the site at all in the period or did it just happen to not be an issue for any of you? Or did you notice and just not think it was a big deal and nothing needed to be communicated? I understand we’re in transition but it’s hard to have something like this happen and not be left feeling like Brandon is the only one who actually gives a shit about MetaFilter on a day to day basis
posted by SpiffyRob at 3:24 PM on April 29 [17 favorites]


I have always assumed that the staff were active participants in Metafilter; that they posted and commented and read Metatalk. That was obviously the case in Matthowie's day and Cortex's. But is it true now? I don't know. It certainly does look like the staff haven't been reading Metatalk recently, and maybe they haven't been using the site at all. If that's true, it's a very odd place for Metafilter to be.
posted by Winnie the Proust at 4:44 PM on April 29 [5 favorites]


If I were a staff member, I would find it very hard to engage with the site as a user, given how some users treat staff.
posted by sixswitch at 5:23 PM on April 29 [21 favorites]


Right, but at a certain point the job is to engage with users, particularly around critical site updates like downtime or unexpected behavior.
posted by bluloo at 5:59 PM on April 29 [12 favorites]


Wasn't there a post recently about how we were saving a significant amount per month in AWS fees... Maybe a little too much?
posted by TheJoyOfRighteousViolence at 6:36 PM on April 29


FB slows down and goes wonky all the time. How about you take your complaining attitude to the Zuck man instead of the poor volunteer mods here?
posted by Melismata at 6:38 PM on April 29


What volunteer mods?

That misconception explains so much about your posts here!
posted by sagc at 6:39 PM on April 29 [34 favorites]


Mod note: Just to confirm, so there's no confusion, members of the moderation team are paid.
posted by Brandon Blatcher (staff) at 6:41 PM on April 29 [5 favorites]


Melismata: "How about you take your complaining attitude to the Zuck man instead of the poor volunteer mods here?"

HAHAHAHAHAHAHAHAHAHAHAHA OH MY GOD
posted by phunniemee at 7:00 PM on April 29 [26 favorites]


FB slows down and goes wonky all the time. How about you take your complaining attitude to the Zuck man instead of the poor volunteer mods here
💖🎀Aw thanks helpful dear heart!🎀💖
posted by bunton at 7:23 PM on April 29 [2 favorites]


On a tangent, I realized there is one day left in the month, but I don't see any site update for April.
posted by NotLost at 7:55 PM on April 29 [3 favorites]


Being a moderator here is a paying job but I'm not sure any salary would be worth it, lol.
posted by kittens for breakfast at 8:08 PM on April 29 [7 favorites]


A CDN is going to be of limited use unless the bots are requesting the same things again and again from the same IPs, but this sounds like they're marching through old content.

I thought this was more about Cloudflare’s hopefully-superior ability to detect and block bots by virtue of their position as a middleman for everyone than it is their value as a straight-up CDN?
posted by atoxyl at 8:35 PM on April 29 [2 favorites]


atoxyl: "I thought this was more about Cloudflare’s hopefully-superior ability to detect and block bots by virtue of their position as a middleman for everyone than it is their value as a straight-up CDN?"

Correct. You can turn on a lot of fairly sophisticated filters, and Cloudflare's heuristics are better than just about anyone else's at this point. They're absolutely not catching 100% of shitty LLM scrapers, and they're constantly working on improving that, but they catch more than just about anyone else. (And you also have the ability to turn on various levels of "are you a person?" challenge.)

Look, i don't love Cloudflare as a company. Their CEO is one of those Free Speech Bros who's happy to do business with Nazis, among other things. But given the absence of a dedicated team of really, really good sysop/admin people who can help deploy and tune something like AWS' WAF, they're absolutely the best available option for Metafilter.
posted by adrienneleigh at 9:00 PM on April 29 [8 favorites]


it seems like the slowness is a little better now but it still kinda sucks
posted by metafluff at 10:21 PM on April 29


it seems like the slowness is a little better now but it still kinda sucks

Site performance has mostly been fine for me today (two or three people describing their subjective experience is MetaFilter’s APM dashboard) after several days where it was often borderline unusable. But I do think having a plan in mind other than “hope it blows over” would be a good idea in the future, since sketchy scraper bots are a fact of life these days.
posted by atoxyl at 10:29 PM on April 29 [4 favorites]


I also saw some slowness and outages, but it's been OK for the last couple of days. Presumably that's just a temporary respite though, given that scrapers are crippling small sites all over the Internet.

I'd say put in Cloudflare now, and work out some kind of hand-knitted artisanal solution later if we want one.

I'm also a bit concerned about what the cost is of serving up so many pages to scrapers. Are we going to be hit with a massive hosting bill because of it?
posted by TheophileEscargot at 12:51 AM on April 30 [1 favorite]


Wasn't there a post recently about how we were saving a significant amount per month in AWS fees... Maybe a little too much?

I think that saving came from changing the way the automatic backups work. It should not affect anything about the way the site itself operates.
posted by TheophileEscargot at 1:01 AM on April 30 [2 favorites]


It would be really good to have an authoritative statement from a technical member of staff on:
- what caused the site to run slowly for 48-96 hours
- what was the resolution
- what steps are being taken to prevent recurrence
- whether the community's suggested actions (cloudflare, caching old content) will be acted on. if so, on what timeline. if not, why not.
- where we should expect to find status updates during downtime - bluesky? the status page?
posted by Klipspringer at 1:31 AM on April 30 [16 favorites]


Also, we're seeing the traditional 9-10am UTC round of timeouts and connection failures (recorded by uptime-monitor.io). An update on this too would be appreciated.
posted by Klipspringer at 2:29 AM on April 30 [4 favorites]


I have to be honest, Klipspringer - as a user, I would never READ a statement on "why we were slow and what the mods did to fix it" and anything else you said, because a) I wouldn't understand it anyway, since I do not know jack shit about computer programming, and b) it DID get fixed.

I appreciate that several people want to know why, but to be frank, several people's comments are coming across as Monday-morning quarterbacking. I mean, sometimes shit just breaks, and people fixed it, so it works again. And I just don't think the fact that it didn't get fixed faster is as big a problem as others seem to feel.

If not being able to load Metafilter as quickly is the biggest problem you have, you are fortunate indeed.
posted by EmpressCallipygos at 4:16 AM on April 30 [7 favorites]


I would read and understand such a statement.

Also, it's not Monday morning quarterbacking if the game - of being a website - is still ongoing.

Finally, who said it was their biggest problem? Nobody that I can see.
posted by sagc at 4:23 AM on April 30 [8 favorites]


EmpressCallipygos: "If not being able to load Metafilter as quickly is the biggest problem you have, you are fortunate indeed."

Oh please. This is an online community. Not being able to load the online community is actually a pretty serious problem to the online community.

I am also a person who wouldn't understand the technical details about what happened or what was done to resolve it, but now that this is allegedly a community-owned website, the people who have this information have a duty to share that information with the community so that the knowledge of the community can be leveraged to solve it.

It doesn't require any technical knowledge whatsoever to put out a statement like "we are aware the site is experiencing timeouts and slowness and we are working on a solution" or "frimble did a whatsit this morning at 6am, please email us at blah blah blah if you're still having a connection issue." Literally ANY proactive communication WHATSOEVER would be BETTER. Instead our asses got ghosted for days, all of us talking about timeouts in the March update thread got ghosted for days (weeks), and even when things started to get better we only got info why when Spiffy specifically asked. Absurd.

And like others have already said above, what this indicates to me more than anything else is that the mods do not use Metafilter and do not care. I suppose it's possible that the 6 people in the whole world who didn't experience any MeFi slow time happen to work for MeFi, but that seems real convenient. So what happened there. Y'all just don't log on? You notice it's slow and don't say anything? You toss a note in the Slack for Brandon to deal with later? Who knows! No one talks to us so we gotta fanfic out the scenarios over here ourselves, and from my Ao3 they all look fucking pathetic.

So please, gently, if you don't care about Metafilter enough to wonder why the paid staff doesn't seem to care about it, maybe go back to whatever else you were doing and leave Metatalk alone today. If watching people be concerned about Metafilter is the biggest stress you have, you are fortunate indeed.

and while I'm at it where's the hecking financial statements jesus christ.
posted by phunniemee at 4:30 AM on April 30 [24 favorites]


In the world of software it's considered good practice to run a retrospective on any major outage, once it's fixed, to establish exactly the things that Klipspringer mentions - what were the contributing factors to the outage, and are there things that can be learned from it?

Since Metafilter is meant to be community led these days, it seems highly appropriate that knowledgeable users help by raising awareness of good practices where it's likely to be useful. It makes complete sense for people who aren't familiar with software development not to want to get involved, but that's not a reason why people who are familiar should refrain from helping out.
posted by quacks like a duck at 4:31 AM on April 30 [18 favorites]


I appreciate that several people want to know why, but to be frank, several people's comments are coming across as Monday-morning quarterbacking.

I realize Metafilter isn't like, a company, and expecting a high level of professionalism out of it is kind of fruitless, but I did want to say that the quarterbacking and nagging is actually very useful in a professional environment, as it can guide communication policies around downtimes, can nudge management into asking for more reflection on the root causes, and forces a certain amount of transparency that makes customers feel listened to.

In my company, when we have big downtimes, my team is usually the first to know, because customers call us (and call and call) asking about it. The speed and number of the calls give you a sense of just how bad the outage is. People get honestly nervous around outages. They wonder if they broke something themselves. They're in this anxious state of not-knowing, and if you're on my team, part of your job is to make them feel better about that--it was our fault, don't worry, you didn't break it.

And the engineers would just like to fix it and go back to watching cat videos or whatever they do when a server's not on fire, but the really nosy customers demand a root cause analysis, and that's helpful. Nobody's like celebrating that now we've got to have a call with the customer where you explain stuff you don't really understand yourself ('well you see...there's a...um...database?'), but the function of that conversation is to strengthen the strained relationship, because in the moment you went down, the customer began thinking about other options, other vendors. So fessing up is really an important part of my job.

And again, I'm not someone with any technical expertise. My job is just to talk to people. But I think that's a really important job when there's a problem, and I think that's all people are asking for here, someone to talk them through the downtime and explain what happened.

From a business perspective...well, Metafilter is a small site, and there are a billion other places people could be. Clear communication around downtimes could be seen as a user retention effort.
posted by mittens at 4:46 AM on April 30 [28 favorites]


b) it DID get fixed.

Did it? The site was down/slow again for me this morning, as it has been every morning for a little while. "We played whack-a-mole with a few bots" isn't actually a fix.

If not being able to load Metafilter as quickly is the biggest problem you have, you are fortunate indeed.

Respectfully, you have the power and the freedom to close the tab and/or remove this post from your recent activity at any time, if you'd like to go and do something else.
posted by fight or flight at 5:40 AM on April 30 [6 favorites]


MetaFilter is a privilege, not a right. Be kind.

Mods, I see you and I know you are doing your best under difficult circumstances.
posted by chmmr at 6:30 AM on April 30 [9 favorites]


I am the kind of sysadmin/AWS/Unix nerd who would appreciate -- and understand -- a summary. Not an official Root Cause Analysis or After-Action Report or anything, but like...three sentences?

I mean, "changed some long-running SQL queries" is a legit description of an actual fix, especially if there were new indexes created or the syntax was improved or a cache was refreshed or something. But show me a "before" and "after" number (run queue length? performance time? something) that makes it clear what happened, and I will be much happier.

Thanks, mods, for the work you do.
posted by wenestvedt at 6:45 AM on April 30 [3 favorites]


MetaFilter: A privilege, not a right.
posted by snofoam at 8:45 AM on April 30 [2 favorites]


If not being able to load Metafilter as quickly is the biggest problem you have, you are fortunate indeed.

It certainly isn't the biggest problem I have in my life, but it is probably one of the top 5 problems Metafilter is having right now. I haven't alerted my neighbors or told random people on the street about it, but it seems to be worth discussing here on the area of the site created for discussing the site.
posted by snofoam at 8:50 AM on April 30 [9 favorites]


snofoam: "MetaFilter: A privilege, not a right."

Literally soooo true for so many things! I can't wait to start using this response in Ask! It works for everything if you really think about it! Incredible!
posted by phunniemee at 8:51 AM on April 30 [3 favorites]


Like I'm sorry, are you concerned about something? Wondering how it works? Need help? My goodness, your elite entitlement is showing.
posted by phunniemee at 8:52 AM on April 30 [7 favorites]


I appreciate that several people want to know why, but to be frank, several people's comments are coming across as Monday-morning quarterbacking. I mean, sometimes shit just breaks, and people fixed it, so it works again.

I wouldn’t want to demand that MeFi devs devote a lot of time to writing up detailed postmortems, given the skeleton crew, but the bot discussion literally started with a staff member identifying a known problem but saying they are unsure about permanent solutions, which led to people with relevant technical backgrounds suggesting solutions, which is more or less the point of sharing the technical details in this context.

I mean, again, I don’t want to prescribe more process here necessarily because god knows there’s not broadly a shortage of that, but I don’t think someone who admits they don’t understand and don’t care about the details is particularly in a position to judge whether the people who do are “Monday morning quarterbacking.”
posted by atoxyl at 9:36 AM on April 30 [11 favorites]


Based on the statements thus far it seems a little ambiguous whether:

a.) the site was overwhelmed by bots: rough to deal with in the short term because it’s not directly in the site’s control, may have blown over, but probably not the last time, known hazard of existing online with paid off-the-shelf defenses so should do a cost-benefit analysis at least

b.) there was a DB/query issue: probably should have been caught sooner if that was the root cause, but conceivably more of a concrete fix or at least a short-term resolution with a road to a permanent fix.

Or (a) leading to (b) etc.
posted by atoxyl at 9:53 AM on April 30 [3 favorites]


More information from frimble:
- the site was running slowly due to a combination of scraper bots making up most of the traffic for several days and certain SQL queries being unoptimized such that if the site is under heavy load they could take minutes to complete rather than under a second

- The resolution so far has been blocking scraper bots, monitoring and re-blocking them as they change servers and rewriting the SQL queries in question to work consistently in under a second

- Further steps are finding the same error in other queries and looking into getting cloudflare or other caching working.

- I can’t give a date but having caching up for non-logged-in users is my priority beyond monitoring and checking the site, looking at early May

- Status Updates should go to BlueSky: the status page is primarily for intentional downtime.
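As a rough illustration of what "caching for non-logged-in users" can mean in practice, here is a minimal Python sketch. It assumes the site can tell logged-in from anonymous requests by the presence of a session cookie, and it picks a `Cache-Control` header accordingly so that a CDN or caching proxy only stores anonymous page views. The cookie name and the five-minute lifetime are hypothetical, and this is not a description of frimble's actual plan.

```python
from typing import Mapping

SESSION_COOKIE = "session_id"  # hypothetical cookie name
ANON_MAX_AGE = 300             # hypothetical lifetime: 5 minutes, in seconds


def cache_control(cookies: Mapping[str, str]) -> str:
    """Choose a Cache-Control header value for a page response."""
    if cookies.get(SESSION_COOKIE):
        # Logged-in pages are personalized, so shared caches must not store them.
        return "private, no-store"
    # Anonymous visitors (including most scrapers) can all be served one cached copy.
    return f"public, max-age={ANON_MAX_AGE}"
```

With something like this in place, repeated anonymous requests for the same thread within the lifetime would be answered by the cache rather than rebuilt from the database each time.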
posted by Rhaomi at 11:25 AM on April 30 [21 favorites]
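
For readers wondering what "caching for non-logged-in users" could look like, here is a minimal sketch in Python. It is not how MetaFilter's ColdFusion stack or Cloudflare actually does it; the class name `AnonymousPageCache`, the `render` callback and the TTL value are all hypothetical. The only idea it illustrates is that logged-out visitors (which includes most bots) get a stored copy for a short time, while logged-in members always get a fresh page.

```python
import time


class AnonymousPageCache:
    """Tiny in-memory TTL cache for pages served to logged-out visitors.

    Hypothetical illustration only; a real deployment would more likely
    use a CDN or reverse proxy in front of the application server.
    """

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._store = {}  # path -> (expires_at, html)

    def get_page(self, path, logged_in, render):
        # Logged-in members bypass the cache so personalised content stays fresh.
        if logged_in:
            return render(path)

        now = self.clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no database work at all

        # Missing or expired: render once and remember the result.
        html = render(path)
        self._store[path] = (now + self.ttl_seconds, html)
        return html
```

With something like this in front of the expensive queries, repeated anonymous requests for the same page within the TTL cost one render instead of many.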


- Status Updates should go to BlueSky: the status page is primarily for intentional downtime.

Put that on the status.metafilter.com page.
posted by Diskeater at 11:32 AM on April 30 [15 favorites]


Thanks Rhaomi, sincerely appreciate the update. I hope the calls for more information were taken in the spirit I (and I believe others) intended.

Personally, if Metafilter were working but a little janky I'd expect to see info about it in MeTa. I will keep an eye on BlueSky going forward if this happens again but I've always appreciated the ability to have a conversation with the community when things like this happen. Even if it's not possible until we build voting software from scratch I hope that someday we can return to a time when communication around things like this can be prioritized.
posted by SpiffyRob at 11:42 AM on April 30 [3 favorites]


On reflection that last bit was flip and not productive. My apologies.
posted by SpiffyRob at 11:47 AM on April 30


Thanks frimble and Rhaomi, that's a good update.
posted by Klipspringer at 12:49 PM on April 30 [4 favorites]


Thanks, Rhaomi!

(So (a) -> (b) it is)
posted by atoxyl at 2:47 PM on April 30 [1 favorite]


And yeah I think the one thing that was missing from a communication standpoint was posting an “official” MeTa acknowledging the issue. Inevitably that’s secondary to other channels for announcing an outage because in a sufficiently serious outage there is no MeTa but it was weird that initial staff communications on-site were confined to the bowels of a different thread. I know I had to dig around on a site that wasn’t working very well to find whether people were discussing the issue!
posted by atoxyl at 2:57 PM on April 30 [6 favorites]


Thanks for the update Rhaomi :-) While some may want more detailed technical information, what you provided made sense to this non-technical person and I appreciate getting a plain-English explanation.
posted by dg at 4:03 PM on April 30 [3 favorites]


Excellent summary -- thanks! It's whack-a-mole to block those bots: they're a scourge.

(Also, I don't use a TV Bluesky, but at least updates will be somewhere public.)
posted by wenestvedt at 6:28 PM on April 30


Twice now in 24 hours I've tried to post a comment only to get an ERR_EMPTY_RESPONSE page.
posted by mittens at 2:15 AM on May 1


Site seems sluggish this morning, I'm sad to say.
posted by kbanas at 4:52 AM on May 2


Mod note: One duplicate comment removed.
posted by Brandon Blatcher (staff) at 4:54 AM on May 2


kbanas: "Site seems sluggish this morning, I'm sad to say."

Yes same.
posted by phunniemee at 5:22 AM on May 2 [1 favorite]


Definitely sluggish.
posted by ssg at 6:24 AM on May 2


Ditto ditto.
posted by PussKillian at 8:16 AM on May 2


Today things have been loading fast enough for me but I've been unable to comment on the blue. Curious if this will work.
posted by trig at 2:32 AM on May 4


Okay, tried the blue again now and it worked. Had been getting no connection errors for about five minutes before that.
posted by trig at 2:35 AM on May 4


Perhaps of interest, or possibly irrelevant. I live in and access the site from Vienna, Austria and have not experienced any slowness, lag, or other technical issue, and certainly not recently. Not sure why this is so.
posted by 15L06 at 12:58 PM on May 4 [2 favorites]


If bot swarms are flooding the server, perhaps check out Anubis. I don't know if it can only require proof of work from agents claiming to be browsers, but it's being used in lots of large organizations.
posted by Pronoiac at 9:53 PM on May 4 [1 favorite]
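
For anyone unfamiliar with "proof of work" in this context, the general idea can be sketched as below. This is a generic hash-based puzzle written in Python for illustration, not Anubis's actual protocol or code; the function names, the `challenge:nonce` format and the difficulty value are assumptions. The point is the asymmetry: the client has to try many hashes, while the server checks the answer with one.

```python
import hashlib
from itertools import count


def is_valid(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash to check that the client did the work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce whose hash has the required zero prefix.

    Each extra hex digit of difficulty multiplies the expected work by 16.
    """
    for nonce in count():
        if is_valid(challenge, nonce, difficulty):
            return nonce


# Example: a browser would run solve() once per challenge,
# then send the nonce back for the server to verify.
nonce = solve("example-challenge", 3)
print(nonce, is_valid("example-challenge", nonce, 3))
```

A single human visitor barely notices the cost, but a scraper making millions of requests has to pay it on each challenge.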


I’ve had noticeable sluggishness this morning.
posted by nat at 10:00 AM on May 6 [1 favorite]


running very slow - east coast USA
posted by bowmaniac at 10:02 AM on May 6 [1 favorite]


The site is almost unusably slow for me right now. Downforeveryoneorjustme.com says that the site is down (which it obviously isn’t, but that means it’s not my internet connection or device causing the problem).
posted by maleficent at 10:21 AM on May 6 [2 favorites]


Very slow, Pacific Northwest.
posted by neuromodulator at 10:43 AM on May 6


can't open the front page, been slow all day, west coast of Europe
posted by chavenet at 11:12 AM on May 6


Very slow, from the border of MA, NY and VT.
posted by BlahLaLa at 11:13 AM on May 6


neuromodulator: "Very slow, Pacific Northwest."

Seconded, PNW (Firefox/Chrome, Win11)
posted by pdb at 11:14 AM on May 6


Still slow.
posted by ssg at 11:40 AM on May 6


Slowdown started at ~0929 pacific, and is continuing. See here: https://snipboard.io/xYtHiL.jpg
posted by Frayed Knot at 11:41 AM on May 6 [1 favorite]


Was fine this morning, now unusably slow.
posted by Winnie the Proust at 11:58 AM on May 6 [1 favorite]


I hate to suggest this, but could those "extraneous services" we cut from AWS a few months ago have caused this? I just understand the site to be a coldfusion kluge.
posted by frecklefaerie at 12:25 PM on May 6


As of now it seems to be back to normal responsiveness for me at least.
posted by pdb at 12:49 PM on May 6


frecklefaerie: "I hate to suggest this, but could those "extraneous services" we cut from AWS a few months ago have caused this? I just understand the site to be a coldfusion kluge."

Fair question, but no. All we did is delete old backups.
posted by Frayed Knot at 1:03 PM on May 6 [3 favorites]


It seems normal for me, now.
posted by neuromodulator at 1:33 PM on May 6


While we're reporting site issues, is anyone else seeing Google results for the front page anytime since April 1? Like, with a custom search for site:www.metafilter.com, I can see plenty of results from previous months this year, and I can see some results for site:ask.metafilter.com and plenty of results for site:metatalk.metafilter.com.

But if I set the custom time range to April 1 to May 1, there's nothing for the Blue. Also, the results for Ask include some /favorited/ pages that at a glance seem disallowed by its robots.txt, so maybe there's something wrong there too.
posted by Wobbuffet at 1:48 PM on May 6
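
One way to check whether a given URL is disallowed by a robots.txt file is Python's standard `urllib.robotparser`. The sketch below uses a made-up rule set (`EXAMPLE_RULES`) and made-up URLs, since the site's real robots.txt isn't quoted in this thread; swap in the actual file contents to test a real case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; not the site's actual robots.txt.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /favorited/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may crawl `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(EXAMPLE_RULES, "Googlebot", "https://ask.metafilter.com/favorited/2/12345"))
print(allowed(EXAMPLE_RULES, "Googlebot", "https://ask.metafilter.com/12345/some-question"))
```

Note that this only answers whether crawling is permitted by the rules; it says nothing about what a search engine has chosen to index.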


Wobbuffet, I'm not seeing specifically that things from before April 1 are showing up, but definitely seeing that results from Metafilter are missing in Google search, including recent pages and older pages. It's not just if you use site:metafilter.com, it's in any Google search results. Distinctive phrases searched in quotes that appear on the site get no results in Google.

Maybe something was broken in the efforts to block AI data scraping? Or maybe the timeouts have caused Google to give up? In either case, it should be easy for whoever is responsible to check Google Search Console to see what's up.
posted by ssg at 2:24 PM on May 6 [2 favorites]


I'm getting some slow down today, enough that I came to MeTa to see what was up (usually I avoid MeTa). Just FYI.
posted by Acey at 2:44 PM on May 6 [1 favorite]


Site's very slow the past half hour (from Australia, if it helps)
posted by coriolisdave at 3:13 PM on May 6


Yes, slow for me in the Bay Area. Came here for the same reason.
posted by oneirodynia at 4:00 PM on May 6


Slow in Maine. Even if Brandon is off work, could we get any kind of update from the board or staff about whether the fixes that have been suggested might be implemented, or are we content to just let the site die this way?
posted by donnagirl at 4:44 PM on May 6 [4 favorites]


Running slow here in eastern NC
posted by Roger Pittman at 4:50 PM on May 6


Again, unless the bot traffic is only targeting recent content, the solution to this is caching the legacy content that no longer has a reason to be dynamically built for every request. It's a pretty simple thing to spin up a caching server.
posted by Candleman at 5:06 PM on May 6 [2 favorites]


The ongoing slowness has been a real bummer and made me use Metafilter less. Sorry to complain but it's been a real problem for two weeks now?

"Fun" fact: one of the first things I did when I started working at Google was help mathowie figure out how to protect Metafilter from being crushed by Googlebot. I think the answer at the time was some combination of caching and doing If-Modified-Since better. It's depressing that twenty-five years later everyone on the Internet is still struggling with the same basic problems.
posted by Nelson at 5:23 PM on May 6 [5 favorites]
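
For context, "doing If-Modified-Since better" refers to HTTP conditional requests: a well-behaved crawler sends the date of its stored copy, and the server answers 304 Not Modified with no body if nothing has changed. A minimal sketch of the server-side decision in Python follows; the function name `conditional_get` is hypothetical and this is not a description of how MetaFilter handled it.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_get(last_modified: datetime, if_modified_since: Optional[str]):
    """Return (status, headers) for a page last changed at `last_modified`.

    `last_modified` must be timezone-aware and in UTC.
    """
    # HTTP dates only have one-second resolution.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall through to a full response
        # Only compare when the parsed date carries a timezone.
        if since is not None and since.tzinfo is not None and last_modified <= since:
            return 304, headers  # client copy is current: send no body

    return 200, headers  # send the full page
```

A 304 still costs a request, but it can skip rendering the page and sending the body, which is where most of the work is.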


Mod note: Frimble notified of site slowness.
posted by Brandon Blatcher (staff) at 5:40 PM on May 6


5 may, 6 may - from nyc

nyt/guardian, take 0.5-1s to load entire webpage on my connection
metafilter (all subsites), takes ~10s+ to connect
posted by lalochezia at 5:42 PM on May 6


There's varying ways to do it & fixing the direct issue is surely a higher priority, but if at some point there's interest in setting up automated synthetic monitoring/alerting through SpeedCurve or the like, I'd be happy to help set that up & talk with whoever's interested about details. (no affiliation with SpeedCurve, I just rather like their stuff & their pricing has a pretty low floor.)

Also happy to help on the general topic of web-performance. Current Backend/DB stuff aside, Mefi missing the React/client-side-JS wave means this's far less of an issue; but I figure I should make the general offer as well.
posted by CrystalDave at 6:00 PM on May 6 [1 favorite]


Frimble notified of site slowness.

We need to do better than 8 hours to even notify the person responsible when the site is very slow, especially when this is an ongoing issue that's been on and off for quite a while.

Is there no monitoring set up? It would be trivial for frimble to be getting automatic notifications when load times cross a certain threshold for more than a few minutes.
posted by ssg at 6:36 PM on May 6 [3 favorites]
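
The rule described above ("load times cross a certain threshold for more than a few minutes") can be sketched in a few lines of Python. The function name, the sample format and the numbers are assumptions for illustration; a hosted monitoring service would normally handle this, but the logic is roughly:

```python
def sustained_breach(samples, threshold_s, min_duration_s):
    """Return the timestamp at which an alert would fire, or None.

    `samples` is an iterable of (timestamp_seconds, load_time_seconds),
    oldest first. An alert fires only once load time has stayed above
    `threshold_s` for at least `min_duration_s`, so a single slow
    request does not page anyone.
    """
    breach_started = None
    for timestamp, load_time in samples:
        if load_time > threshold_s:
            if breach_started is None:
                breach_started = timestamp  # first slow sample of this run
            if timestamp - breach_started >= min_duration_s:
                return timestamp
        else:
            breach_started = None  # recovered: reset the run
    return None


# One probe per minute; alert if pages take over 5 s for 5 minutes straight.
probes = [(0, 0.8), (60, 12), (120, 15), (180, 11), (240, 30), (300, 25), (360, 40)]
print(sustained_breach(probes, threshold_s=5, min_duration_s=300))
```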


And here I thought it was my crappy Samsung phone. Why I oughta...
posted by y2karl at 7:17 PM on May 6


ssg: "Frimble notified of site slowness.

We need to do better than 8 hours to even notify the person responsible when the site is very slow, especially when this is an ongoing issue that's been on and off for quite a while.

Is there no monitoring set up? It would be trivial for frimble to be getting automatic notifications when load times cross a certain threshold for more than a few minutes."

I 100% agree with all of this. Maybe a bunch of somebodies should be notified?
posted by ashbury at 7:20 PM on May 6 [3 favorites]


And tada! Back up to regular speed again, at least for me. Will there be an explanation at some point?
posted by ashbury at 7:21 PM on May 6 [2 favorites]


I wonder if the problem might have to do with time of day? It seems to be so widespread, but yet never affects me, in New Mexico.
posted by NotLost at 8:55 PM on May 6


It’s fixed for other people? Not back to normal for me at all. Slow to load when it doesn’t time out.
posted by janell at 9:44 PM on May 6 [1 favorite]


Very slow for me still.
posted by rosiroo at 10:00 PM on May 6


I think you've got to go for Cloudflare or something equivalent at this point. Fiddling around with stuff is always going to look like something worked when the bot/crawler traffic happens to have a lull, but then break again when the traffic comes back. The only long term solution is to filter the traffic before it hits the servers.
posted by TheophileEscargot at 12:10 AM on May 7 [2 favorites]


Brandon Blatcher: "Mod note: Frimble notified of site slowness."

So, content to let it die this way.
posted by donnagirl at 3:03 AM on May 7 [3 favorites]


1. The MeFi ColdFusion error page says "we've been notified of this error". Are error notifications not being sent to frimble?

2. uptime-monitor.io connects to the site from three different regions: US (us-west-2), Europe (eu-west-1) and Asia (ap-southeast-2). You can see here that when the site is down, it's down for users everywhere. Geographical check-ins are basically a red herring imo.
posted by Klipspringer at 4:16 AM on May 7 [2 favorites]


What is the interim board doing to ensure the continued viability of the community’s primary asset and gathering space? Why can’t anyone tell us where all the money is going and why it is not being used to keep the website at a base level of functionality for the community?
posted by ohneat at 6:38 AM on May 7 [5 favorites]


We've been messaging with frimble, who is the person best equipped to deal with this.

The primary issue now appears to be automated scrapers "getting lost" in the tag pages. This makes a lot of sense:

- tags are required on every post
- tags are free-form; there are well over 100,000 (many used just once)
- the listings for each tag are dynamically generated on demand
- there are listings not only for each tag, but for both any combination of two tags and any combination of tag(s) and a specific poster, which increases the surface area by orders of magnitude

These haven't been a problem in the past (it is all text after all), but combine greedy, ill-behaved bots with effectively infinite content and you have a recipe for melted servers.

Frimble is working on a new round of mitigations (new bot blocklists, SQL adjustments to reduce CPU load, Cloudflare configuration); if we're hit again, they'll disable the tag pages entirely until we can increase our level of Cloudflare protection.
posted by Rhaomi at 9:44 AM on May 7 [52 favorites]
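
To put rough numbers on "orders of magnitude", here is a back-of-the-envelope calculation in Python. The 100,000 tag count is the lower bound given above; the poster count is a made-up round number, and the sketch assumes a tag pair counts once regardless of order, which may not match how the URLs are actually built.

```python
from math import comb


def tag_listing_urls(num_tags, num_posters):
    """Count distinct listing pages under the assumptions stated above."""
    single = num_tags               # one listing per tag
    pairs = comb(num_tags, 2)       # any combination of two tags
    # tag(s) + a specific poster: every single tag or tag pair, per poster
    with_poster = (single + pairs) * num_posters
    return single, pairs, with_poster


single, pairs, with_poster = tag_listing_urls(100_000, 1_000)  # 1,000 posters is hypothetical
print(f"{single:,} single-tag pages")
print(f"{pairs:,} two-tag pages")
print(f"{with_poster:,} tag(s)+poster pages")
```

Even if most of those combinations return empty listings, each one is still a dynamically generated page that a crawler following links can request.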


Thank you Rhaomi.
posted by 15L06 at 11:33 AM on May 7 [3 favorites]


Seconding Anubis... I want to see if it will help out the bot situation that I'm dealing with at work, but figured this is worth a read for MeFi too:

The Day Anubis Saved Our Websites From a DDoS Attack
About three weeks ago, I started receiving monitoring notifications indicating an increased load on the MariaDB server. This in itself is nothing too unusual. It usually means nothing but a sudden influx of new visitors, and in most cases, it is just a link being shared somewhere or a single IP trying to annoy us.

The notifications popped up and disappeared as quickly as they appeared. I started to look into the log files of our web server, and I didn’t notice anything too unusual, maybe a bit more background noise. This went on for a couple of days without seriously impacting our server or accessibility–it was a tad slower than usual.

And then the website went down.

We use a stack consisting of Apache2, PHP-FPM, and MariaDB to host the web applications. The server logs revealed that everything was saturated. Apache2 refused to accept new connections, the PHP-FPM pools were completely filled, and MariaDB also had no connections left.
...
The main problem is time. The URLs accessed in the attack are the most expensive ones the wiki offers since they heavily depend on the database and are highly dynamic, requiring some processing time in PHP. This is the worst-case scenario since it throws the server into a death spiral.

First, the database starts to lag or even refuse new connections. This, combined with the steadily increasing server load, leads to slower PHP execution. Eventually, all resources in the PHP-FPM pools are used up, and since Apache2 doesn’t get a reply from PHP-FPM in time, it waits until it runs out of free connections. At this point, the website dies. Restarting the stack immediately solves the problem for a couple of minutes at best until the server starves again.

To bring the website back up, I cranked up the configuration of our stack to insane values, risking that the server would eventually run out of memory.

I needed a proper solution, something that takes the load away from the web application stack.
posted by rambling wanderlust at 1:39 PM on May 7 [1 favorite]


Rhaomi: "We've been messaging with frimble, who is the person best equipped to deal with this.

The primary issue now appears to be automated scrapers "getting lost" in the tag pages. This makes a lot of sense:

Frimble is working on a new round of mitigations (new bot blocklists, SQL adjustments to reduce CPU load, Cloudflare configuration); if we're hit again, they'll disable the tag pages entirely until we can increase our level of Cloudflare protection."


Thanks Rhaomi.

If it's a quick fix, another option might be to move those tag pages so they're only visible when logged-in.
posted by coriolisdave at 3:58 PM on May 7 [9 favorites]


another option might be to move those tag pages so they're only visible when logged-in

but what if the automatic scraper has a really good storyyyyy
posted by phunniemee at 4:28 PM on May 7 [6 favorites]

