An Nginx Regression, a Node Version Drift, and a Very Quiet Tuesday Morning


Both Sites Down, No Obvious Reason

On June 9, 2026, after a routine reboot of the Linode VPS that hosts my two Ghost CMS publications, both sites became intermittently unreachable. The front-end would occasionally load but without consistent behavior. The Ghost admin panel — the /ghost/ dashboard where you write and manage content — loaded to a blank white page with nothing but a "Ghost" tab title and an empty canvas. SSH sessions to the server were dropping after one or two minutes of idle time.

Nothing had been changed. No config edits, no Ghost updates, no manual package installations. The server had simply been rebooted, and when it came back, the sites didn't.

The natural instinct — and I had it — was to restore from backup. A week earlier, these same sites had been through a genuine security incident (a separate story, involving a Ghost CMS vulnerability and a full stack rebuild). Everything was freshly patched, upgraded, and verified. The sites had been running perfectly for days. Now they weren't, and the temptation to roll the clock back was strong.

That would have been exactly the wrong move. The problem had nothing to do with my sites, my configuration, or my data. It was a broken package update that Ubuntu's own security pipeline had delivered silently, and the fix was a single apt upgrade command. But it took two hours of methodical diagnosis to get there, because the symptoms pointed in five different directions at once, and none of them pointed at the actual cause until the very end.

The Symptoms: A Misleading Constellation

The trouble with this failure is that it presented as several apparently unrelated problems simultaneously, each of which had its own plausible explanation that turned out to be wrong.

The first symptom was the Ghost admin panel loading to a blank white page. The browser's developer console showed four JavaScript files failing to load — the core bundles that make up Ghost's single-page admin application. Without them, the admin shell renders its HTML wrapper and then has nothing to execute, producing the blank page. The front-end theme, which uses completely separate assets, was unaffected by this specific failure — posts and pages rendered (when they rendered at all) with full styling.

The second symptom was intermittent site unreachability. Sometimes a page would load; sometimes the browser would report "This site can't be reached" or ERR_HTTP2_PROTOCOL_ERROR. The inconsistency made it look like a network or DNS issue rather than a server-side crash.

The third symptom was SSH sessions dying after brief idle periods. Established connections would drop with no warning after a minute or two of inactivity. This reinforced the "something is wrong with the network" hypothesis.

The natural diagnostic sequence — check DNS, check the firewall, check Ghost's configuration, check the database — produced confusing results. DNS resolved correctly. The firewall rules were unchanged. Ghost's own ghost doctor utility passed every check. The database was healthy. And curl from the server to its own sites sometimes returned clean 200 responses, which made the intermittent browser failures look like a client-side caching problem.

I spent time chasing a DNS theory (the server's resolver was pointed at Linode's nameservers, which I temporarily swapped to Cloudflare and Google public DNS), a browser-cache theory (switching to incognito windows, hard-refreshing), and a Ghost configuration theory (checking the url setting, the active theme, the database connection). None of these was the cause, and none of the fixes stuck, because the real problem was underneath all of them.

The Actual Cause: Nginx Workers Crashing on Every Request

The breakthrough came from reading nginx's error log rather than Ghost's:

worker process 4280 exited on signal 11 (core dumped)
worker process 4298 exited on signal 11 (core dumped)
worker process 4303 exited on signal 11 (core dumped)

Signal 11 is SIGSEGV — a segmentation fault. The nginx master process was alive and listening on ports 80 and 443 (which is why systemctl status nginx reported "active (running)" and why the listening-socket check looked healthy), but every worker process it spawned was crashing immediately on receiving a request. The master would respawn a new worker, that worker would crash on the next request, and the cycle would repeat every two seconds. The log was a wall of identical crash lines, each with a new PID and the same signal 11.

This is why the symptoms were so inconsistent. A curl that happened to land in the brief window between a worker spawning and crashing might get a response. A browser holding an HTTP/2 connection open would see the worker die mid-stream and report a protocol error. An SSH session (which doesn't go through nginx) would work fine until some other idle-timeout mechanism killed it — a red herring that made it look related when it wasn't.

The critical question was: why were the workers segfaulting? Nginx 1.24.0 on Ubuntu is stable, well-tested software. A segfault in a worker is not a configuration error — it's a code-level bug, either in nginx itself or in a dynamically-loaded module.

The Root Cause: A Buggy Security Patch

Checking the apt package history revealed the answer:

Upgrade: nginx:amd64 (1.24.0-2ubuntu7.9, 1.24.0-2ubuntu7.10),
nginx-extras:amd64 (1.24.0-2ubuntu7.9, 1.24.0-2ubuntu7.10), ...

Ubuntu's unattended-upgrades system had automatically installed nginx version 1.24.0-2ubuntu7.10 — a security patch — sometime before the reboot. This .10 build contained a regression that caused worker processes to segfault under certain conditions. The symptoms would have appeared the moment the server restarted nginx with the new binary, which happened automatically on reboot.

The fix was already in the queue. Running apt list --upgradable showed that version .11 of every nginx package was available — a point-fix for the regression:

nginx/noble-security 1.24.0-2ubuntu7.11 amd64 [upgradable from: 1.24.0-2ubuntu7.10]

A single command resolved the crash:

sudo apt upgrade -y nginx nginx-common nginx-extras libnginx-mod-*
sudo systemctl restart nginx

After the upgrade, the error log stopped showing signal-11 crashes, and both sites immediately began serving correctly. The maintenance pages (which had been configured as a fallback) returned proper HTTP 503 responses, and swapping back to the real Ghost vhosts produced clean 200s across the board.

The total time from the .11 package being available in the repository to the fix being applied was however long it took to type the command. The total time spent diagnosing before reaching that command was approximately two hours.

The Second Problem: Ghost Admin's Blank Page

With nginx stable and serving, the front-end sites loaded correctly — but the Ghost admin panel still showed a blank white page. The browser console showed the same four JavaScript bundles failing to load, now returning HTTP 404 instead of silently dying in a protocol error.

This turned out to be a separate, also-automated issue. Ghost's ghost doctor utility flagged it:

Warning: Ghost is running with node v22.22.2. Your current node version is v22.22.3.

Ubuntu's unattended-upgrades had also bumped Node.js from 22.22.2 to 22.22.3 — a minor patch update. Ghost's systemd service had been launched at boot time using the .2 binary, but the admin assets on disk had been built (or expected to be served) against the .3 environment. The JavaScript bundle filenames in Ghost Admin include content hashes — index-DbIP2g_C.js, vendor-df57a0...js — and when the Node version under which Ghost runs doesn't match the version against which the assets were built, the hashes can diverge. Ghost's HTML requests files with one set of hashes; the files on disk have different hashes; the result is 404s on every admin asset, and a blank page.

The fix was to realign the running Ghost process with the current Node version and rebuild the admin assets:

ghost update --force

This reinstalled Ghost's current version against the running Node, recompiled the admin bundles with correct hashes, and after a systemctl restart, the admin panel loaded normally.

It is worth noting explicitly: ghost update --force and the subsequent restart both failed when run through Ghost's own CLI, which reported "Systemd process manager has not been set up or is corrupted." This is a known ghost-cli quirk where the CLI expects to manage the systemd unit itself but can't due to permission constraints on the invoking user. The workaround is to drive systemd directly — sudo systemctl restart ghost_sitename — bypassing ghost-cli's service management entirely. The CLI's error message is alarming but cosmetically misleading; systemd itself is fine.

The Compounding Effect

What made this incident frustrating — and what makes it worth writing about rather than just fixing and forgetting — is the way two unrelated automated updates compounded into a single bewildering outage.

The nginx segfault made everything look broken at the network level. Because the workers were crash-looping, responses were intermittent, error messages varied by timing, and the instinct was to look at DNS, firewalls, and proxy configuration. The Ghost admin blank-page, which was caused by an entirely different mechanism (Node version drift), was invisible behind the nginx crash — you can't diagnose a 404 on a JavaScript bundle when the web server itself is dying before it can serve the 404. And the SSH idle-drops, which turned out to be unrelated to both (likely a NAT idle-timeout in the network path, fixable with client-side keepalives), added a third symptom that seemed to confirm "something is deeply wrong with this server" when in fact the server was fine and two packages were stale.

Each problem had a simple, fast fix. The nginx crash: one apt upgrade. The admin blank page: one ghost update --force and a service restart. The SSH drops: a ServerAliveInterval line in the client's SSH config. Total fix time, once each cause was identified: under five minutes. Total diagnostic time to get there: over two hours, most of it spent on theories that the evidence didn't ultimately support.

The lesson is not that the diagnosis was done poorly. The lesson is that automated security updates — which are, on balance, the right default for any internet-facing server — can introduce regressions, and when two regressions land simultaneously and interact, the symptom space becomes combinatorially misleading. You end up chasing a phantom "root cause" that unifies all the symptoms, when in reality there are two (or three) independent causes, each trivial, none obvious while the others are active.

Preventive Advice for Self-Hosters

If you run a self-hosted Ghost site behind nginx on Ubuntu with unattended-upgrades enabled — which is a common and broadly sensible configuration — here is what this experience suggests you should add to your operational practice.

Check for failed services after every reboot. A single command — systemctl --failed — shows any service that didn't come back cleanly. If nginx or Ghost appears in that list, you know immediately that the outage is service-level, not network-level, and you skip the DNS and firewall rabbit holes. Make this the first thing you check, before opening a browser.

Read the nginx error log before the Ghost log. When a site is unreachable or behaving erratically, tail /var/log/nginx/error.log is faster and more informative than ghost log. Ghost can only tell you about its own application errors; nginx tells you whether requests are even reaching Ghost. A wall of signal 11 lines is unambiguous in a way that "site won't load" in a browser never is.

Check for and apply pending updates immediately after a reboot. Run apt list --upgradable and look for nginx, Node, or any package your stack depends on. A buggy .10 with a fixed .11 already in the queue is a solved problem sitting in your package manager waiting to be applied. The time between "the fix exists in the repo" and "the fix is installed on my server" is the window where you suffer a known, already-resolved issue.

Pin or monitor your Node version relative to Ghost. Ghost is sensitive to the exact Node minor version it was started under. If unattended-upgrades bumps Node (e.g., 22.22.2 to 22.22.3), the running Ghost process doesn't pick up the change until it's restarted — and when it does restart (on reboot), the version mismatch can break admin asset serving. The mitigation is either to pin Node's minor version so it doesn't drift without your knowledge, or to restart Ghost after any Node update, or simply to be aware that a post-reboot Ghost admin failure is almost certainly a Node-version misalignment and ghost update --force plus a service restart is the fix.

Add SSH client keepalives. In ~/.ssh/config, set ServerAliveInterval 30 and ServerAliveCountMax 4 for any host you manage. This prevents idle SSH sessions from being reaped by intermediate NAT devices, firewalls, or VPN tunnels, and it removes "SSH keeps dying" from the symptom list of future incidents — which is valuable because SSH instability is a red herring that makes every other problem look more serious than it is.

Resist the urge to restore from backup when the site was working yesterday. If a site was healthy before a reboot and broken after, the cause is almost certainly something that changed during or after the reboot — a package update, a service that didn't restart, a mount that didn't come up. A backup restore is a sledgehammer that rolls back everything, including things that were working, to fix what is often a one-line issue. Diagnose first; restore only when the data itself is compromised, not when a service is misbehaving.

Conclusion

Both of today's failures — the nginx worker segfaults and the Ghost admin asset mismatch — were caused by automated security updates doing exactly what they are configured to do: applying patches without human intervention. In both cases, the patches introduced regressions (one in the nginx binary, one in the Node runtime alignment), and in both cases, the fix was already available or trivially achievable. The server, the data, the configuration, and the application were all healthy throughout. The only things that were broken were two packages, each off by a single point-release.

The two hours of diagnosis before reaching that conclusion were not wasted — they systematically eliminated the serious possibilities (compromise, data corruption, configuration drift) and narrowed the field to the mundane truth. But they were two hours that a ten-second check (apt list --upgradable | grep nginx) would have made unnecessary. The unglamorous takeaway: after a reboot, check your packages and your services before you check anything else. The most likely cause of a post-reboot failure is the most recent change, and on an unattended-upgrades system, the most recent change is whatever your package manager did while you weren't looking.


Jonathan Brown | Border Cyber Group bordercybergroup.com | Support independent security reporting

If you find our work helpful... Buy us a coffee!: https://bordercybergroup.com/#/portal/support