running my own cloud on a clamshell macbook

  • MacBook Air M-series
  • Tailscale
  • Cloudflare Tunnel
  • hono
  • SQLite
  • sharp
  • launchd

this site runs on the same laptop I write on. zero cloud bill. all the architectural decisions you usually outsource to a vendor — mine to make.

the premise

a 13-inch MacBook Air, lid closed, sitting on a shelf. that’s the cloud.

it serves this website, a photo backend with an admin panel, a few scratch services, and whatever I need next week. lives on a domestic ISP. the network knows where to find it because cloudflare’s edge holds a tunnel open back to the laptop.

I didn’t pick this setup because it’s clever. I picked it because I had the laptop, and I wanted to feel the architectural decisions cloud providers normally make for me. you can’t reason about availability properly until you’ve taken responsibility for it. so I did.

what runs on it

three processes, all bound to 127.0.0.1:

  • backend — a hono server on :8787 talking to a SQLite file on the local SSD. serves photo metadata, EXIF, processing logs, and the inspect API the frontend’s “see how this image was made” panel reads.
  • edge gateway — another hono server on :8090. routes by Host header. on the public hostname it serves the static frontend bundle and proxies read-only API calls. on the admin hostname it serves the admin SPA and proxies mutations after injecting the bearer token.
  • cloudflared — a managed tunnel daemon. holds four outbound TLS connections to cloudflare’s edge. cloudflare’s anycast network advertises my domains, and any request matching one of them gets pushed down a tunnel to :8090.
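
the routing rules for that tunnel amount to "send both hostnames to :8090". expressed as a locally-managed config.yml they look roughly like this (the uuid and paths are placeholders; a dashboard-managed tunnel keeps the same rules in cloudflare's UI instead):

```yaml
# rough shape of the tunnel routing; uuid and credentials path are placeholders
tunnel: <tunnel-uuid>
credentials-file: /Users/me/.cloudflared/<tunnel-uuid>.json

ingress:
  - hostname: raditss.work
    service: http://localhost:8090
  - hostname: adm.raditss.work
    service: http://localhost:8090
  # anything that doesn't match a hostname gets rejected at the edge
  - service: http_status:404
```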

the bearer token never leaves the laptop. the database never accepts a non-loopback connection. the whole “perimeter” is one cloudflare account and a tunnel daemon.
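
a minimal sketch of what "bound to loopback" means for the backend, assuming hono with @hono/node-server and better-sqlite3. the path, table, and route here are made up; the point is the hostname on the last line:

```ts
import { Hono } from "hono";
import { serve } from "@hono/node-server";
import Database from "better-sqlite3";

// the SQLite file lives on the local SSD; path and schema are illustrative
const db = new Database("/Users/me/data/photos.db");
const app = new Hono();

app.get("/v1/photos/:id", (c) => {
  const row = db
    .prepare("SELECT * FROM photos WHERE id = ?")
    .get(c.req.param("id")) as Record<string, unknown> | undefined;
  return row ? c.json(row) : c.notFound();
});

// 127.0.0.1 is the whole security boundary for this process:
// nothing on the LAN or the internet can open a connection to :8787 directly
serve({ fetch: app.fetch, port: 8787, hostname: "127.0.0.1" });
```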

the network architecture

two surfaces, two posture profiles.

public surface — raditss.work. read-only. unauthenticated. anything that lands here comes from cloudflare’s edge, hits my edge gateway, and either gets a static asset or a GET /v1/... proxied to the backend. mutations on this hostname are simply not routed; the gateway has no codepath to map a public POST to an admin endpoint.

admin surface — adm.raditss.work. gated by cloudflare access with google sso, restricted to my email. before any HTTP write touches my own infrastructure, cloudflare has already verified an oauth session with google. the admin SPA never holds the bearer token; it makes relative /api/* calls and the gateway injects auth on the local hop.

private surface — tailscale. when I want to check logs, restart a service, or ssh into the host, I do it over the tailscale mesh. nothing private is ever exposed to the public internet, even gated. the philosophy is that public endpoints answer one question (read-only “show me this post”), private endpoints answer everything else (write, debug, deploy).
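
the first two surfaces are just routing in the edge gateway. a sketch of that host split, assuming hono on both sides; the hostnames are the real ones, but the env var name, route shapes, and header handling are simplified:

```ts
import { Hono } from "hono";
import { serve } from "@hono/node-server";

const BACKEND = "http://127.0.0.1:8787";
const ADMIN_TOKEN = process.env.ADMIN_TOKEN ?? ""; // only ever read inside this process

// public hostname: read-only. only GETs under /v1 have a route at all,
// so there is no codepath that could forward a public POST anywhere.
const publicApp = new Hono();
publicApp.get("/v1/*", (c) => {
  const url = new URL(c.req.url);
  return fetch(`${BACKEND}${url.pathname}${url.search}`);
});
// (static frontend bundle serving elided)

// admin hostname: cloudflare access has already vetted the caller,
// so the gateway's job is to add the bearer token on the local hop.
const adminApp = new Hono();
adminApp.all("/api/*", async (c) => {
  const url = new URL(c.req.url);
  return fetch(`${BACKEND}${url.pathname.replace(/^\/api/, "")}${url.search}`, {
    method: c.req.method,
    headers: { authorization: `Bearer ${ADMIN_TOKEN}` },
    body: ["GET", "HEAD"].includes(c.req.method) ? undefined : await c.req.arrayBuffer(),
  });
});
// (admin SPA serving elided)

// route by Host header: two hostnames, two postures
const gateway = new Hono();
gateway.all("*", (c) => {
  const host = c.req.header("host") ?? "";
  return (host.startsWith("adm.") ? adminApp : publicApp).fetch(c.req.raw);
});

serve({ fetch: gateway.fetch, port: 8090, hostname: "127.0.0.1" });
```

the public app having no POST routes at all is the "no codepath" claim above made literal.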

why a laptop, specifically

three reasons, in order of how much they actually matter.

it’s already there. a laptop I bought to write code on is, hardware-wise, a perfectly capable web server. ARM is fast. the SSD is fast. 16 GB of RAM is plenty for what runs on it. paying a monthly bill to rent equivalent compute would be paying for a thing I already own.

the constraint is honest. cloud providers sell you the fiction that scaling is a slider. for personal-scale work it isn’t — you don’t need 12 regions, you need one box that doesn’t fall over. running on a single machine forces every architectural choice to be about what actually matters: keep the database small, keep the hot path fast, treat memory as a real resource. those are the same lessons that scale up.

failure is observable. when the cloud breaks for me, I read someone’s status page. when this breaks, I look at the laptop. that’s a feedback loop you can’t buy.

what’s not nice

the honest list. everything I tell you is “fine in practice” actually broke at least once before I figured out the workaround.

  • wifi flakes. ISP outages are real. cloudflared reconnects within seconds, but during the gap the site is down. I’m okay with that posture for a personal site; I would not be okay with it for a customer-facing product.
  • macOS updates. apple loves to schedule a midnight reboot. launchd gets the services back up afterwards, but I’ve spent a morning debugging why “everything’s fine” turned into “everything is 502.” now I disable automatic updates on this machine and apply them on a schedule I control.
  • launchd’s privacy sandbox. the first attempt at running services under launchd died with operation not permitted because launchd-spawned bash can’t traverse ~/Documents without explicit Full Disk Access. obvious in retrospect, opaque the first time.
  • better-sqlite3 ABI rebuilds. node version bumps are not ABI-stable for native modules. switching from node 20 to 22 meant rebuilding the binding before anything would start. boring, repetitive, and a footgun if you forget.
  • the fan at 2am. image processing uses sharp, which uses libvips, which is happy to use every core. I had to cap concurrency in the develop pipeline so the laptop wouldn’t audibly think at me through the wall.
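
the fix for that last one is small. sharp exposes a global cap on libvips' worker threads, so the develop pipeline now starts with something like this (the numbers are whatever keeps the fans quiet, not a recommendation):

```ts
import sharp from "sharp";

// cap libvips' worker threads globally so a batch run can't grab every core
sharp.concurrency(2);

// illustrative develop step: auto-orient, downscale, re-encode
async function develop(src: string, out: string): Promise<void> {
  await sharp(src)
    .rotate()                                            // honor EXIF orientation
    .resize({ width: 2048, withoutEnlargement: true })
    .webp({ quality: 82 })
    .toFile(out);
}
```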

every one of these has a cloud-shaped solution. paying for it would have meant never learning what the actual problem was.

what I learned

a few things, more useful than I expected.

posture matters more than firewalls. binding to loopback, choosing what each hostname is allowed to do, and putting authentication at the network edge (cloudflare access) gives you a security model you can reason about end-to-end. you don’t need a 50-line firewall config when the surface area is “one tunnel, three hostnames, two methods.”

single-machine forces shared-nothing thinking. on a real distributed system you have to reason about consistency between replicas. on this one I have to reason about consistency between processes — backend, gateway, and tunnel are independently restartable, and the system has to behave when any one of them blinks. that’s the same architectural muscle, just with a smaller blast radius for getting it wrong.

the cloud abstracts away the interesting parts. the part of “running a server” I find most useful to know — what fails, when, why, in what order — is the part rented compute hides from you. once you’ve watched your own laptop come back from a power loss and replay its launchd state, you have a different relationship with the word “uptime.”

my QA brain helps here. I spent years writing tests for systems I didn’t operate. running my own infrastructure is the inverse: operating a system whose tests I also wrote. the failure modes I care about now are concrete, not abstract — “what happens if cloudflared restarts mid-request” instead of “what happens at scale.” it makes me a better architect because the tradeoffs are no longer hypothetical.

why this isn’t the cloud

cloud is “rent someone else’s computer with a good SLA.” this is “use the computer I already own with no SLA and full observability.”

for a project I’m responsible for, where downtime costs money or reputation, I’d absolutely use a real provider. for a personal site, where downtime costs me a few minutes of “huh, my site’s down again,” the tradeoff inverts. I get a $0/mo bill, real architectural decisions to write about, and direct control over every layer from the kernel up.

it’s the most useful infrastructure I’ve ever owned, and it cost nothing extra to build.