Dan Neff

Building Personal AI Infrastructure

Tags: AI, infrastructure, LLM, personal-ai, PAI

At 6:01 yesterday morning a message landed in a Google Chat space I share with no one. It opened with a callout of VIP emails that needed attention, then six newsletter items worth reading, each flagged with a one-line summary, and thirty more archived without my having to look at them. It ended with the one calendar conflict between my personal and work schedules that I needed to sort out.

I was settling into my Caltrain commute when I read it. I didn't ask for the briefing. I didn't write the prompt that morning. The system did the work because it knew, last week, that this is what I wanted.

That's the difference between an AI chatbot and personal AI infrastructure: a chatbot doesn't move until you ask, but infrastructure runs jobs you set up weeks ago, while you're asleep.

It has a name, Aubrey, a phone number that answers as him, a job description, and opinions about my calendar. It runs as eight always-on services on a small Linux box in my garage. It talks to me through Google Chat, phone calls, and the terminal. It also actively monitors a kanban board for changes, so I can add tasks and flag them as active, and Aubrey will tell me in Chat how each one was resolved or what is blocking it. The whole stack has a monthly cost of $20 + access to a Claude subscription. Note this isn't a DEDICATED subscription; it's running off the spare capacity in my regular account.

This post is what it actually takes to build one: the architecture, the decisions, and the cost structure that makes it sustainable.

Why a system, not a chatbot

Most AI tools are stateless. Every conversation starts cold. They don't know which projects you're running, which decisions you made last quarter, or which problem you've been trying to solve since Tuesday. They're powerful and amnesiac at the same time.

For knowledge work this is the wrong shape. The win isn't generating answers faster; it's having a collaborator who remembers what's already been decided, what you're trying to protect, and what counts as done. That takes persistent context plus real integrations plus the freedom to act on a schedule. It's been a huge productivity unlock and it feels like I've only scratched the surface.

The framework is PAI (Personal AI Infrastructure), but I call mine Aubrey, running on a server called Pequod (I have this whole sailing theme going; blame The Count of Monte Cristo).

What's actually running

Right now, on this machine, the following services run continuously under systemd:

  • pai-chat keeps a logged-in Claude session connected to my private Google Chat space. Anything I post there reaches Aubrey within twelve seconds. Anything he wants to say to me lands there.
  • pai-fizzy-poller watches my kanban (Fizzy, a small kanban-board-as-a-service I use). It long-polls an SQS queue; when I move a card to "In Progress," it picks up the resulting webhook and runs the task, provided the card has clean acceptance criteria. Autonomous execution, bounded by guardrails: no destructive operations, no work without acceptance criteria, no acting on its own cards.
  • pai-daily-briefing runs at 5:00 AM PT, summarizes overnight email, marks newsletters, and posts the result to Chat.
  • pai-board-health runs at 6:00 PM and tells me if any task has gone stale, any blocker has been sitting too long, or any card is missing the information it needs to move forward.
  • pai-telos-review runs Sunday evenings and walks through my long-term goals against the week that just happened. It's the rhythm I'd never sustain on my own.
  • pai-newsletter-digest runs Fridays, picks out the Substack pieces worth my time, and archives the rest.
  • pai-vip-monitor checks every four hours for mail from people whose silence I shouldn't tolerate. Stays quiet when there's nothing to surface. That discipline matters. The alternative is alerting fatigue.
  • pai-transcript-processor ingests voicemails from the Twilio number every fifteen minutes. ElevenLabs handles the voice synthesis. Claude Sonnet handles the reasoning during the call. If Aubrey ends up on the phone, this is how that conversation eventually shows up as a task in Fizzy.

Behind all of those sits a Node.js daemon PAI calls Pulse. It runs voice notifications, the dashboard, the hook system, and the cron-style scheduler. When I wanted a status board I could load on my phone, I added it to Pulse. When I wanted a probe that catches a specific silent-failure mode the Chat channel can fall into, I added that too.

The whole thing is roughly two thousand lines of TypeScript and YAML, plus the Claude Code platform underneath. Most of the surface area is the prompt files, the skill definitions, and the memory layout. The code is small. The doctrine, which lives in markdown the model reads at every session start, is what makes this thing actually work.

The architecture, in one picture

Personal AI Infrastructure architecture: input channels (Google Chat, Twilio, Terminal) feed a central Claude Code session containing Skills, Hooks, Algorithm, and Memory; the session fans out to the Pulse daemon, AWS Lambda + SQS, and Gmail service-account delegation, with pai-fizzy-poller pulling work from SQS for autonomous execution.

Eight systemd services run continuously: pai-chat, pai-fizzy-poller, pai-daily-briefing, pai-board-health, pai-telos-review, pai-newsletter-digest, pai-vip-monitor, pai-transcript-processor.

The architecture, in plain prose

The shape that emerged after a year of iteration:

  1. Identity files describe me. What I do for a living, who's in my family, what I value, how I write. The model loads them at every session start. No re-explaining context.
  2. Skills are composable units of behavior. "Polish this writing against my style." "Audit my Fizzy board." "Run the Algorithm on a hard problem." Each is a small markdown file with a clear trigger condition.
  3. Hooks fire on lifecycle events. Session start, every prompt, after every tool call, before compaction. They load context, capture state, enforce safety rules, and route notifications.
  4. The Algorithm is the doctrine the model follows on substantive work. Articulate the ideal state as testable criteria. Observe, think, plan, build, execute, verify, learn. Prove each criterion was met. It turns "do a thing" into "do a thing and show your work."
  5. Memory is structured. Knowledge notes by domain, work-session artifacts, learning reflections, observability logs. The model retrieves what it needs. I read what's interesting. The compounding is real.
  6. Channels are how the system reaches me when I'm not at the keyboard. Google Chat is the durable text channel. Twilio plus ElevenLabs is the voice channel. Pulse notifications cover the desktop.

When you stack these together you don't have a chatbot anymore. You have something closer to a consultant who never sleeps, who has my voice in their head, and who knows exactly which questions to escalate to me.

What it costs

This is the part most people get wrong about personal AI, so it deserves its own section.

Claude tokens are subsidized. I pay for a Max subscription at $100/mo. Flat monthly fee, unmetered use at the working tier I need. The marginal cost of an additional Algorithm run, skill invocation, or autonomous Fizzy task is effectively zero. Same economic shape as a gym membership. Once you're paying for the door, you may as well walk in. (The "why I won't pay per-token" decision and the $300 lesson that hardened it are covered under the third of the three decisions below.)

AWS is hobbyist-cheap. A Lambda receives Fizzy webhooks and writes them to an SQS queue. The poller drains the queue. Three-month average from my billing console: under $1/mo typical, well within free tier most months. The infrastructure is real. IAM users, KMS-encrypted secrets, CloudWatch logs, a dedicated SQS queue in us-west-2. It costs essentially nothing because I'm not at scale, and scale is exactly the dimension I don't need to optimize for.

The rest is small line items, current monthly averages:

Service            | Cost    | Notes
-------------------|---------|------
AWS                | $10/mo  | budget, including web hosting and DNS, which I'd pay for regardless
Anthropic Claude   | $20/mo  | fractional share of a Max subscription I already pay for; marginal cost of running PAI on top of it is effectively zero
ElevenLabs Creator | $22/mo  | 250 min/mo conversational AI included, $0.10/min overage
Twilio             | $1-3/mo | phone number + per-minute call charges
Domain (neff.cc)   | $45/yr  | annual renewal
VPS                | $0      | runs at home
Total stack        | ~$50/mo |

That's one decent Steam game or a weekend lunch date. For a system that drafts my week, polices my kanban, and answers my phone, the math is silly.

When it breaks

Anthropomorphism aside, it's JUST a Node.js service with a handful of integrations. And brother, you learn quickly that not all interfaces are the same. In March the Google Chat channel went silent in a way that took me an hour to even see. The systemd service reported active. The Bun channel process kept polling Google Chat over HTTPS. From every outside view the integration was healthy. But inside the headless tmux session where the Claude CLI runs, claude.ai had logged me out. The status bar read "Not logged in. Run /login." I only knew because I asked Aubrey a question that morning and never got a reply.

Diagnosing the problem was a pocket-sized SRE adventure. It turned out I also had an observability problem. Three layers reported fine and one layer reported not fine, and the one layer was the one that mattered. The standard playbook offered no help. Restart the service: still active. Check the logs: still polling. Verify the credentials file: still valid. None of those signals reach into the tmux pane the model is staring at. I had to attach to the session manually to read the status bar.

The fix lives in Pulse now. A health probe runs every ten minutes, captures the bottom of the tmux pane, and looks for the string "Not logged in". If it sees that string, it fires a desktop banner and a voice alert. Total code: about thirty lines. The lesson is the part I'd write on a whiteboard. Real production discipline lives in the failure modes you've named, not the ones you've theorized. If you can't write the probe, you don't yet understand the failure mode.
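A minimal sketch of what a probe like this could look like. This is not the actual Pulse code: the function names, the five-line window, and the alert wording are assumptions made for illustration. The capture step is passed in as a function so the matching logic can be exercised without a live tmux session.

```typescript
// Hypothetical sketch of a logged-out probe; names are illustrative, not Pulse's.
const LOGGED_OUT_MARKER = "Not logged in";

type PaneCapture = () => string;
type AlertSink = (message: string) => void;

// Return the last `count` non-empty lines of the pane, where the status bar sits.
function bottomLines(paneText: string, count: number): string[] {
  return paneText
    .split("\n")
    .map((line) => line.trimEnd())
    .filter((line) => line.length > 0)
    .slice(-count);
}

// True when the bottom of the pane contains the logged-out status string.
function paneShowsLoggedOut(paneText: string, count = 5): boolean {
  return bottomLines(paneText, count).some((line) =>
    line.includes(LOGGED_OUT_MARKER)
  );
}

// One probe cycle: capture, match, alert only on a hit. Returns whether it alerted.
function runProbe(capture: PaneCapture, alert: AlertSink): boolean {
  if (!paneShowsLoggedOut(capture())) return false;
  alert("Claude session is logged out: attach to tmux and run /login");
  return true;
}
```

In a real deployment, `capture` would presumably wrap something like `tmux capture-pane -p -t <session>`, and `alert` would fan out to the desktop banner and the voice channel. Restricting the match to the bottom few lines is a design choice in this sketch: it avoids firing on an old scrollback line after the session has been logged back in.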

Three decisions I'd defend, for now

1. Markdown as persistent state — not a database.

Everything the model reads at runtime is plain markdown with light frontmatter. Identity, skills, hooks, memory notes, the Algorithm doctrine. No SQLite. No Postgres. No proprietary store.

The case for a database is obvious: structured queries, indexes, schema enforcement, the ability to retrieve "all knowledge notes tagged X created after Y" in one call. I considered it. What pushed me the other way was debuggability under stress. When the model does something I don't expect, I want to read the exact bytes it read. Markdown plus grep plus git diff gets me there in seconds. A database adds a layer between me and the truth, and that layer fails at exactly the moments I need to trust it. For a personal system where I'm the operator and the developer, the operational simplicity wins. If I were running this for a team, the answer might flip.
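To make "plain markdown with light frontmatter" concrete, here is a rough sketch of reading such a file. It assumes the frontmatter is a leading block fenced by `---` lines containing flat `key: value` pairs; the post does not specify the exact layout, and the field names in the usage example are hypothetical.

```typescript
// Hypothetical sketch: split a markdown note into flat frontmatter and body.
// Assumes a leading block fenced by "---" lines with "key: value" pairs.
interface ParsedNote {
  meta: Record<string, string>;
  body: string;
}

function parseNote(text: string): ParsedNote {
  const lines = text.split("\n");
  // No opening fence: treat the whole file as body.
  if (lines[0]?.trim() !== "---") return { meta: {}, body: text };

  // Find the closing fence; without one, fall back to body-only.
  const closing = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (closing === -1) return { meta: {}, body: text };

  const meta: Record<string, string> = {};
  for (const line of lines.slice(1, closing)) {
    const colon = line.indexOf(":");
    if (colon <= 0) continue; // skip blank or malformed lines
    meta[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return { meta, body: lines.slice(closing + 1).join("\n") };
}

// Usage with a made-up note; "domain" and "tags" are illustrative fields.
const exampleNote = parseNote(
  "---\ndomain: infrastructure\ntags: tmux, auth\n---\n# Logged-out probe\nCheck the pane."
);
```

The point of keeping the format this simple is the one made above: the same file can be read with `grep` and compared with `git diff`, with no layer in between.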

2. Autonomous execution is gated by acceptance criteria — not by the model's judgment.

When a Fizzy card moves to In Progress, the poller runs the task autonomously, but only if the card has clean acceptance criteria, no destructive-keyword triggers, and all DEPENDS ON cards are resolved. The model never decides "yes this seems doable, I'll start." The structure decides.

The alternative was letting the model evaluate task suitability. Read the card, judge whether it has enough information, proceed if confident. That approach scales worse than it sounds. Language models are good at producing plausible work. They are less good at recognizing the absence of necessary information. Forcing me to write acceptance criteria upfront pushes the judgment work to where I'm best positioned to do it. The guardrails turn "should I do this autonomously?" from a model-judgment call into a structural check, and that's how I keep the autonomous layer trustworthy.
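The structural check described here could be sketched roughly as follows. The card shape, the keyword list, and the function name are assumptions for illustration; the post names the rules but not the implementation.

```typescript
// Hypothetical card shape; the real Fizzy payload is not shown in the post.
interface Card {
  id: string;
  description: string;
  acceptanceCriteria: string[];
  dependsOn: string[]; // ids of cards listed under DEPENDS ON
  createdByAgent: boolean;
}

type GateResult = { run: true } | { run: false; reason: string };

// Illustrative keyword list only; the real trigger list is not given.
const DESTRUCTIVE_KEYWORDS = ["delete", "drop", "rm -rf", "force push"];

// Decide from structure alone whether a card may run autonomously.
function gateCard(card: Card, resolvedIds: ReadonlySet<string>): GateResult {
  if (card.createdByAgent) {
    return { run: false, reason: "agent-authored card" };
  }
  const criteria = card.acceptanceCriteria.filter((c) => c.trim().length > 0);
  if (criteria.length === 0) {
    return { run: false, reason: "no acceptance criteria" };
  }
  const text = [card.description, ...criteria].join("\n").toLowerCase();
  const hit = DESTRUCTIVE_KEYWORDS.find((keyword) => text.includes(keyword));
  if (hit !== undefined) {
    return { run: false, reason: `destructive keyword: ${hit}` };
  }
  const open = card.dependsOn.filter((id) => !resolvedIds.has(id));
  if (open.length > 0) {
    return { run: false, reason: `unresolved dependencies: ${open.join(", ")}` };
  }
  return { run: true };
}
```

Nothing in this function asks the model anything. Substring matching on keywords is crude and will produce false positives, but a false positive here only costs a manual run, which is the cheap direction to be wrong in.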

3. Subscription tokens only — no per-token API billing.

All AI work in PAI runs through my Claude Max subscription. The daemon strips ANTHROPIC_API_KEY from its environment at process startup. Any tool that wants to spend tokens uses the same authenticated session that's already paid for.
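As a rough illustration of the stripping step, a helper along these lines would do it. The function and constant names are hypothetical; only the ANTHROPIC_API_KEY variable name comes from the text above.

```typescript
// Hypothetical helper: return a copy of an environment without blocked keys.
const BLOCKED_ENV_KEYS: readonly string[] = ["ANTHROPIC_API_KEY"];

type EnvMap = Record<string, string | undefined>;

function scrubEnv(
  env: EnvMap,
  blocked: readonly string[] = BLOCKED_ENV_KEYS
): EnvMap {
  const clean: EnvMap = {};
  for (const key of Object.keys(env)) {
    // Copy everything except the keys that would enable per-token billing.
    if (!blocked.includes(key)) clean[key] = env[key];
  }
  return clean;
}
```

A daemon could pass the scrubbed copy to every child process it spawns, or in Node.js simply run `delete process.env.ANTHROPIC_API_KEY` once at startup. Either way, a tool that silently falls back to a key when it finds one has nothing to find.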

The argument for the API key path is flexibility. Certain SDKs, certain MCP servers, certain agent frameworks expect a key. I had to give up a small number of those to enforce this rule. What I got back is predictability. The cost of "Aubrey did one more thing today" is exactly the same as the cost of "Aubrey did nothing today." Both are flat. That changes the math on every automation decision: anything that pays back even a small amount of attention is worth building, because building it doesn't add to next month's bill. I learned this lesson the expensive way: $198 in Opus tokens and $72 in various other models appeared on a February invoice, because the various cron tasks and triggers were chewing through $25+/day even at idle.

What I've learned

A few things I'd tell my past self.

The hard part isn't the model. It's the calibration. Teaching the system when to talk and when to stay quiet is more work than wiring the integrations. The "silent when clean" discipline was the single biggest UX win. No notification unless there's actually a signal. It took deliberate work to get there. The default impulse is to alert on everything; that path leads to a system you stop trusting.
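One way to picture the "silent when clean" rule in code is below. This is a sketch under assumptions: the `Signal` shape, the function names, and the message wording are invented for illustration.

```typescript
// Hypothetical signal shape; field names are illustrative.
interface Signal {
  source: string;
  summary: string;
}

// Build the message to post, or null when there is nothing worth saying.
function renderAlert(signals: Signal[]): string | null {
  if (signals.length === 0) return null; // clean run: stay silent
  const lines = signals.map((s) => `- [${s.source}] ${s.summary}`);
  return [`${signals.length} item(s) need attention:`, ...lines].join("\n");
}

// Post only when renderAlert produced something. Returns whether it posted.
function notifyIfSignal(
  signals: Signal[],
  post: (message: string) => void
): boolean {
  const message = renderAlert(signals);
  if (message === null) return false;
  post(message);
  return true;
}
```

Making silence the explicit `null` branch, instead of posting an "all clear" line, is the whole design: a job that found nothing produces no message at all.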

Doctrine compounds. The Algorithm is the single thing I've iterated on most. Seven phases for substantive work, with verification doctrine I tightened every time I caught the model cutting a corner. The current version is the cleanest yet. None of this would have been possible without the model itself being able to read and reason about its own doctrine.

Reproduce before you fix. UI bug? Open the browser before reading code. Failing API? curl -i before grepping the handler. Deploy broken? Read the actual deploy log. The reproduce-first rule is doctrine for me now, because every shortcut I've taken around it cost me an hour.

The system rewards specificity. "Help me with email" is a chatbot prompt. "Summarize the last 24 hours of inbox, archive anything that looks like a LinkedIn invitation, flag anything from this list of people, post the result to Chat at 5 AM" is infrastructure. Specificity is the cost of admission.

It's still mine. Every memory file, every skill, every hook is something I wrote or signed off on. The model can suggest changes, but the doctrine, the identity, and the goals are mine. That ownership is non-negotiable. The minute you can't read the file the model is reading, the system stops being yours.

What you'd build first

If you wanted to start, this is the order I'd recommend. None of these steps requires the entire stack above.

  1. Get Claude Code and pay for Max. Tokens at flat cost is the unlock. The cost math doesn't work any other way for the volume of work this enables.
  2. Write a CLAUDE.md. Half a page is enough. Your role, the projects you're running, the tools you prefer, the things you don't want to be asked twice. Claude Code reads it at every session start. You will not believe how much less context-setting you do once this exists.
  3. Add one persistent memory. A MEMORY.md index plus a couple of subject files. Force the model to keep notes the way you'd keep notes. The compounding starts here.
  4. Pick one integration that matters. Mine was Fizzy because the kanban friction was killing me. Yours might be email, calendar, Slack, or your house's heating system. Wire it through Claude Code as a tool. One integration in, you'll see the shape.
  5. Schedule one job. A morning briefing. A weekly review. A "what did I commit to and not deliver this week" check. Cron is fine to start. Pulse is fine when you outgrow cron.

That's the first month. By the second month you'll know what to build next, because you'll know which manual habit cost you the most time.

What I haven't solved

Four things I haven't figured out:

  1. Long-term memory consolidation. The knowledge archive grows. Pruning is manual. Aubrey can find what's there but can't yet decide what to forget. I have a knowledge-harvester script that pulls candidate notes from completed work, but the curation step is still mine.
  2. Repeat-signal judgment. The system flags new emails well. It still surfaces the same calendar conflict three days in a row if I haven't resolved it. Suppressing the recurring signal without dropping new ones is harder than it sounds.
  3. Cross-version state continuity. Claude Code ships new model versions. Some skills behave subtly differently between releases. I don't have an automated regression check yet. The cost of catching a drift is currently "I notice the output feels off."
  4. Multi-machine sync. Pequod runs the daemons. My laptop has its own Claude Code session. They share git-tracked memory but not live state. If I'm on Caltrain and Aubrey notices something on Pequod, I won't see it until I'm back at the keyboard. Working on it.

The point of the system is that I now have specific things to fix instead of vague unease.

Why I'm telling you this

Two reasons.

This is more doable than it looks. I'm not an ML researcher. I'm a generalist engineer with a strong back-end and infrastructure background, and a stubborn streak about owning the systems I depend on. The tooling is good enough now that one person can build something like this in their evenings. The cost is small, and the productivity improvement is real. If you've been on the fence about whether to invest, just do it. Hopefully the above gives you confidence that it's not out of your reach technically.

For folks who don't know me through work, hopefully this is a relatable view into it. I build systems that solve real problems and stay cheap to run. I rely on system knowledge instead of magic, because magic doesn't survive contact with production. The ~$300 monthly invoice is part of the story for the same reason: give thanks for the lessons and the scar tissue.

It's running right now, on fifty bucks a month, doing things I'd never get to on my own.


Dan Neff is a Senior Principal Cloud Architect at Adobe with twenty-plus years building back-end and infrastructure systems. He writes from Hollister, California about engineering practice, civic virtue, and what AI looks like when it stops being a chatbot.