What I Learned Writing an MCP Server

Reading yet another MCP server article, I had an Inigo Montoya moment: "you keep using that word 'server'. I do not think it means what you think it means." In my experience, a server is some separate process being accessed via a socket or some protocol. In some cases, that's exactly what was going on. However, there were enough low-code MCP servers being shared that people should be clamoring to install Docker or getting tripped up on port blocking. Yet no one was complaining. If these really were separate infrastructure servers, Kubernetes would be a topic of discussion and there's no way a low-code designer/PM is happily embracing orchestration. And where was the security kerfuffle?

I realized "MCP Server" probably didn't mean what my infrastructure-biased brain thought it did. Either I needed to ask someone smarter than I was (not hard to do) or roll up my sleeves. Asking someone was harder than I thought. When looking for a custom MCP server, my co-workers largely did one of three things: have their AI harness search for one, vibe-code one, or leverage the EasyMCP framework. All three options meant arm-grabbing someone for coffee probably wouldn't satisfy my curiosity.

What MCP actually is

To start, MCP is a protocol that defines a middleware shim between your AI client and anything else. An MCP server gives the AI a set of functions along with the arguments required to call them; when the AI calls one, the server returns the results. Boom. So simple.
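Here is a minimal sketch of that exchange in plain Python, deliberately not using any MCP SDK. The tool name search_highlights, its argument, and the canned data are hypothetical; the method names tools/list and tools/call and the inputSchema field follow the MCP specification as I understand it.

```python
import json

# Hypothetical in-memory data standing in for a real backend.
HIGHLIGHTS = [
    {"book": "Book A", "text": "Menus describe; kitchens decide."},
    {"book": "Book B", "text": "Servers are sometimes just scripts."},
]


def search_highlights(query: str) -> list[dict]:
    """Return highlights whose text contains the query (case-insensitive)."""
    q = query.lower()
    return [h for h in HIGHLIGHTS if q in h["text"].lower()]


# The "menu": names, descriptions, and argument schemas, plus the handler
# that does the work. Only the first three are ever shown to the client.
TOOLS = {
    "search_highlights": {
        "description": "Search saved highlights by keyword.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
        "handler": search_highlights,
    }
}


def handle(request: dict) -> dict:
    """Answer a JSON-RPC style request for tools/list or tools/call."""
    method = request["method"]
    if method == "tools/list":
        result = {"tools": [
            {"name": name,
             "description": tool["description"],
             "inputSchema": tool["inputSchema"]}
            for name, tool in TOOLS.items()
        ]}
    elif method == "tools/call":
        params = request["params"]
        tool = TOOLS[params["name"]]
        output = tool["handler"](**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": json.dumps(output)}]}
    else:
        return {"jsonrpc": "2.0", "id": request.get("id"),
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}
```

The client asks for the list once, hands the names and schemas to the model, and later sends a tools/call with whatever arguments the model picked. Everything else is plumbing.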

MCP servers come in two transports. With stdio, the client launches the server as a local process and everything between client and server happens on the local system. Streamable HTTP makes network calls using HTTP POST and GET requests, and the server has the option of streaming content back. Either the client needs to go across the network to reach the server or it doesn't.
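To show how little is behind the stdio transport, here is a rough sketch: one JSON message per line in on stdin, one JSON reply per line out on stdout. The respond function is a hypothetical stand-in that only answers ping; a real server would route to its tools instead.

```python
import io
import json
import sys


def respond(message: dict) -> dict:
    """Hypothetical stand-in for real request handling: answers only 'ping'."""
    if message.get("method") == "ping":
        return {"jsonrpc": "2.0", "id": message["id"], "result": {}}
    return {"jsonrpc": "2.0", "id": message["id"],
            "error": {"code": -32601, "message": "Method not found"}}


def serve(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Read one JSON message per line and write one JSON reply per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        if "id" not in message:
            # Notifications carry no id and get no reply.
            continue
        stdout.write(json.dumps(respond(message)) + "\n")
        stdout.flush()


# Simulate a client by feeding lines through in-memory streams.
fake_in = io.StringIO('{"jsonrpc": "2.0", "id": 1, "method": "ping"}\n')
fake_out = io.StringIO()
serve(fake_in, fake_out)
```

No socket, no port, no Docker. The "server" is a script the client starts and talks to over pipes, which is why nobody was complaining about firewalls.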

These servers are a lot like getting handed a restaurant menu. Remember that an LLM is a natural language interface: it's just sitting on the client side looking at what it could order and whether that makes sense based on the chat/prompt. So there's an MCP server that proffers a cocktail menu, you look through the descriptions and say you want an Old-fashioned. The request gets passed along to the server and at some point you're handed a fistful of bourbon on ice.

There is no formal contract between the menu-side and the kitchen-side.

For example, you could create a vegetarian menu, cook everything in bacon fat and the tears of small children, and serve it up without irony (or auditing). In this metaphor, I'd definitely complain to the manager. That's an option when using a hosted API or some gateway service. In my case, it's like I'm cooking Thanksgiving dinner at home and the recipe produced turkey sushi or something else: there's no manager to complain to.
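To make the gap concrete: a tool's description is free text that the model reads, while the handler is ordinary code that the description never constrains. A tiny sketch with hypothetical names:

```python
# Records what the "kitchen" really did, for illustration only.
audit_log: list[str] = []


def vegetarian_special(dish: str) -> str:
    # Nothing in the protocol checks this body against the description below.
    audit_log.append(f"cooked {dish} in bacon fat")
    return f"{dish}, served"


TOOL = {
    "name": "vegetarian_special",
    "description": "Prepares a strictly vegetarian dish.",  # the menu side
    "handler": vegetarian_special,                          # the kitchen side
}
```

The model only ever sees the name and description. Whether the handler honors them is a matter of trust in whoever wrote the server.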

Why I picked Readwise

I wanted to write a Readwise MCP server because Readwise holds thousands of highlights I've made in books and articles. Also, so many of the integrations I've been working on extend the capabilities of PAI, and I wanted to start curating all the media and content I've gathered. The Readwise API itself is a very standard interface, with CRUD semantics for books, highlights, and tags. This first pass is a read-only MCP server so I can't muck things up too badly, but ideally I could turn Readwise into something similar to Ryan Holiday's notecard system so that I could quickly pull a set of quotes on a given topic or quickly pull content for a presentation. Previously, I tried to extract all the content to a local knowledge base, but keeping it updated from Readwise proved difficult and it was physically pinned to whatever laptop or PC I happened to be working on. A permanent cloud-based resource means I can support multiple agents (e.g., a phone-based agent) and get the same response.

Having a simple one-click installation with .mcpb tells me Anthropic is moving away from a generic harness plugin experience. One benefit would be encrypted MCP -> server communication with highly compressed protocols or gRPC-like streaming/sync options. Who knows?

It also says human credential management is where the real friction is. What it means to forgo humans and have a wholly agentic identity for authn/authz will take me down another rabbit hole. That's a question for another post.

Progressive disclosure

Back to menus: I may show up at a diner at 11am, choose between breakfast and lunch, then choose a breakfast plate, and finally look through the side dishes and toast options to round things out.

Reading through the MCP server specification, I chose to implement Progressive Disclosure. The server exposes a small number of base options and, once the agent figures out which option makes the most sense, a second set of options is provided.
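One way to get this shape with nothing but ordinary tools is a pair of navigation functions: one lists the categories, the other describes the commands inside a single category. This is a sketch under my own assumptions; the names list_categories and describe_category and the catalog contents are hypothetical, not something the specification prescribes.

```python
# Hypothetical second-level catalog, grouped by area of focus.
CATALOG = {
    "books": {
        "list_books": "List books, optionally filtered by title.",
        "get_book": "Fetch one book by id.",
    },
    "highlights": {
        "list_highlights": "List highlights for a book.",
        "search_highlights": "Search highlights by keyword.",
    },
    "tags": {
        "list_tags": "List all tags.",
    },
}


def list_categories() -> list[str]:
    """First level: the short list the agent chooses from."""
    return sorted(CATALOG)


def describe_category(category: str) -> dict[str, str]:
    """Second level: the commands for one category, fetched on demand."""
    if category not in CATALOG:
        raise ValueError(
            f"unknown category {category!r}; try one of {list_categories()}"
        )
    return dict(CATALOG[category])
```

The agent picks "highlights" the way a diner picks breakfast, and only then sees the plates on offer.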

One thing that's attractive about this: Constant versus Linear prompt growth. This is pretty straightforward — given a large number of API options, breaking them down into common areas of focus means I'm not loading the full command list unnecessarily. For a toy example with 20 or so commands, this isn't meaningful but the savings compound as the surface grows. The token cost of the extra round-trip versus the saved schemas is empirical and I have not measured it; so far the benefit is entirely structural.
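The structural claim can be shown with a toy calculation. The sketch below counts characters of schema JSON advertised before the first tool call, using uniformly sized, made-up schemas. Characters are only a crude proxy for tokens, and this ignores the cost of the extra round-trip, so it illustrates the shape of the growth and is not a measurement.

```python
import json


def schema(name: str) -> dict:
    """Build a hypothetical tool schema; all names here are made up."""
    return {
        "name": name,
        "description": f"Does the {name} thing.",
        "inputSchema": {
            "type": "object",
            "properties": {"id": {"type": "string"}},
        },
    }


def upfront_chars(n_commands: int, progressive: bool) -> int:
    """Characters of schema JSON advertised before the first tool call."""
    if progressive:
        # Only two navigation tools are advertised, however many commands exist.
        tools = [schema("list_categories"), schema("describe_category")]
    else:
        # Flat catalog: every command's schema is loaded up front.
        tools = [schema(f"command_{i:03d}") for i in range(n_commands)]
    return len(json.dumps(tools))
```

Calling upfront_chars with progressive=True returns the same number for 20 commands as for 200, while the flat version grows by a fixed amount per added command. The category contents still have to be paid for later, one category at a time.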

Bundles composed of multiple MCP servers can expose their progressive catalogs in parallel.

To be fair, progressive disclosure would suck for a domain with no formal taxonomy. A diner menu is pretty simple, but at a place like Chipotle, Pokebowl, or Panda Express, where you're composing your meal on the fly, seeing all the options up front is kind of the point.

The credentials problem

The different scoping for credentials is nagging me. This feels like it should be a "least privilege" design, where the Readwise API token is only available when making an API call. That means adding it to a Claude config or .env file is a faux pas: at that point, the LLM could respond to a random prompt by calling the MCP server whenever it wants. Even if the LLM never sees the raw token, the server holds it in its environment for the entire session, and the LLM can invoke that server on any prompt. From an exposure standpoint, that's the same as the LLM having the token directly: it has a one-call alias to it.

How an MCP server is distributed is also up in the air, based on the harness you're using. On the "menu" side there's generally a config file that makes the server discoverable by the LLM. On the "kitchen" side, you need to share credentials securely. There doesn't seem to be universal agreement on how to do this. I created an .mcpb file, which is a cross-OS installer for the Claude application. What attracted me was leveraging the native OS keychain solutions rather than assuming the secrets were in an environment file. A coworker says he likes to implement a "just-in-time" (JIT) wrapper where credentials aren't available until the MCP server is invoked. The sweet spot seems to be passing your prompt to a credential vendor who will decide whether access is allowed for the action you're taking. However, it makes more sense to pass the entire request/response flow through a proxy with credentials rather than pass back credentials and hope they're not cached or used for a task that wasn't expressed in the request.
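A rough sketch of what a just-in-time wrapper might look like, assuming the token is fetched through an injected function and lives only inside one call. The names just_in_time, fetch_token, and send are hypothetical, and the "Token ..." header format is my assumption about the upstream API. Python cannot truly scrub a string from memory, so this narrows when the token is reachable; it does not make it unrecoverable.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def just_in_time(fetch_token: Callable[[], str]) -> Iterator[dict]:
    """Fetch the token on entry and empty the headers on exit."""
    # Assumed header format; check the upstream API before relying on it.
    headers = {"Authorization": f"Token {fetch_token()}"}
    try:
        yield headers
    finally:
        # Anyone still holding a reference to this dict now sees it empty.
        headers.clear()


def list_highlights(fetch_token: Callable[[], str],
                    send: Callable[[str, dict], object]) -> object:
    """One tool call: the credential exists only inside the with block."""
    with just_in_time(fetch_token) as headers:
        return send("/highlights/", headers)
```

In practice fetch_token could wrap an OS keychain lookup, for example the keyring package's get_password, and send would wrap a real HTTP client. Both are injected here so the sketch runs without either.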

It seems like a more sane approach would be to store the credentials in a hosted proxy rather than pass them to the local server at all. I guess I'm pushing for a hosted MCP agent with requisite compute overhead. I could put the requests on a queue and check the queue for work. Is it worth the complexity? SQS + Lambda is pretty cheap, so I'll probably play with this in the next MCP implementation.
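As a sketch of the proxy idea, assuming a small allowlist of method and path pairs: the proxy holds the token, attaches it to permitted requests, and returns only the status and body. CredentialProxy and its upstream callable are hypothetical names, and the queue (SQS, Lambda, or anything else) is left out entirely.

```python
from typing import Callable


class CredentialProxy:
    """Holds the token and forwards only allowlisted requests."""

    def __init__(self, token: str, allowed: set[tuple[str, str]],
                 upstream: Callable[[str, str, dict], dict]) -> None:
        self._token = token
        self._allowed = allowed
        self._upstream = upstream

    def forward(self, method: str, path: str) -> dict:
        """Forward one request, or refuse it without touching the upstream."""
        if (method, path) not in self._allowed:
            return {"status": 403, "body": "action not permitted"}
        # Assumed header format; the token is attached only on this hop.
        headers = {"Authorization": f"Token {self._token}"}
        response = self._upstream(method, path, headers)
        # Return only status and body so the credential never travels back.
        return {"status": response["status"], "body": response["body"]}
```

With this shape the local MCP server never holds the token at all. It asks the proxy for an action, and a read-only allowlist is enforced in one place instead of in every script.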

What I'd change next time

If I had to author a second MCP server this weekend, there are two changes I'd make: one for security, and one for interface design.

First, keep the key/token out of the LLM process by design. I might put only an encrypted copy of the token in the environment and have the server script decrypt it at the moment of use, so the secret is keyed to specific scripts. This would solve multiple problems around out-of-band use of a key and ensure it was only used with the MCP server.

Second, the Readwise interface is a pure CRUD implementation, but I'm likely to implement a Declarative set of commands (e.g., given a specific theme across highlights and books, which themes are adjacent? Give me a formal citation for a given highlight. What authors cover a given theme the most often?). I'd split "imperative" CRUD functions from "declarative" high-value functions. Given consistent usage and some time to stabilize, I could move the CRUD functions to be internal-only and only allow the LLM access to these bespoke functions.

What bothers me is that frontier LLMs are smart enough to compose new functionality, so when do I hide the building blocks and when do I leave them exposed? Perhaps this is a toggle in the MCP configuration.
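If it did become a toggle, it might look something like this sketch. The flag name expose_building_blocks, the Tool record, and the tool names are all hypothetical, and the handlers are placeholders.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    kind: str  # "crud" (building block) or "declarative" (high-value)
    handler: Callable[..., object]


# Placeholder handlers; a real server would call the backing API here.
REGISTRY = [
    Tool("list_highlights", "crud", lambda: []),
    Tool("get_book", "crud", lambda: {}),
    Tool("adjacent_themes", "declarative", lambda: []),
    Tool("cite_highlight", "declarative", lambda: ""),
]


def exposed_tools(config: dict) -> list[str]:
    """Names advertised to the model; CRUD tools stay in REGISTRY either way."""
    show_crud = config.get("expose_building_blocks", False)
    return [t.name for t in REGISTRY if t.kind == "declarative" or show_crud]
```

With the flag off, the model sees only the bespoke functions while the server can still compose them from the CRUD tools internally. Flipping it on hands the building blocks back for the cases where the model composes something better than I planned for.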

What this changes about how I work

Before the MCP server, integrating Readwise into AI workflows meant pulling the highlights down, then running something over them locally. With this server, the corpus stays in Readwise and I can get to the content on demand in just two tool calls. This shifted my work from curating an export to curating in Readwise itself.

Expand the Readwise MCP server enough and it may incrementally become an application, but it's great to have an easily edited middleware tool to start with.

Maintaining the highlights in place, and increasing the value or context where they live, rather than running an extract-transform job after every new book, is a big shift in my thinking. The next domain I'd give this treatment is a geo tracker so I can become more aware of cool events, restaurants, or historic sites just around the corner.