March 5, 2026

Your LLM is not a load-bearing wall

The extraction pipeline shipped a week ago with a hand-rolled OpenAI-compatible client. Naturally, it only worked with Ollama. It might sound weird, but that was intentional: local-first, no requirement to send the details of your house to our AI overlords.

After a couple of weeks of using various models locally, I learned (perhaps the hard way) that local models just cannot hold a candle to the frontier models offered by the large AI companies.

To that end, I wanted two things:

Support for frontier models, like those of OpenAI and Anthropic.
To avoid conversations like “Yeah, but do you support $MY_FAVORITE_API”.

Enter any-llm-go.

Someone else’s problem

any-llm-go is a Go library from Mozilla that wraps the official SDKs for Ollama, OpenAI, Anthropic, Gemini, Groq, Mistral, DeepSeek, OpenRouter, llama.cpp, and Llamafile behind one interface. I deleted my client and replaced it with about forty lines of setup (#566). Linking all the provider SDKs grew the binary from 27 MB to 47 MB, which is fine because the alternative was me writing auth code for ten APIs.

Forty-seven megabytes is absolutely enormous for a terminal application, but who in their right mind wants to deal with ten REST APIs that are all doing basically the same thing?

Apparently the answer to that question is: the people doing the lord’s work over at Mozilla.

Back to the tech.
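For anyone wondering what "many SDKs behind one interface" buys you, here is a minimal sketch of the pattern. To be clear, this is not any-llm-go's API: `Provider`, `echoProvider`, `registry`, and `newProvider` are all made-up names, and the backends are fakes so the example runs without a network.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Provider is a hypothetical stand-in for "one interface over many backends".
// It is NOT the real any-llm-go interface.
type Provider interface {
	Complete(ctx context.Context, model, prompt string) (string, error)
}

// echoProvider is a fake backend so the sketch runs offline.
type echoProvider struct{ name string }

func (p echoProvider) Complete(_ context.Context, model, prompt string) (string, error) {
	return fmt.Sprintf("[%s/%s] %s", p.name, model, prompt), nil
}

// registry maps the provider name from config to a constructor.
var registry = map[string]func(apiKey string) (Provider, error){
	"ollama": func(string) (Provider, error) {
		return echoProvider{name: "ollama"}, nil
	},
	"anthropic": func(apiKey string) (Provider, error) {
		if apiKey == "" {
			return nil, errors.New("anthropic: api_key is required")
		}
		return echoProvider{name: "anthropic"}, nil
	},
}

// newProvider looks up a backend by name. Callers only ever see Provider,
// never the SDK behind it.
func newProvider(name, apiKey string) (Provider, error) {
	ctor, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown provider %q", name)
	}
	return ctor(apiKey)
}

func main() {
	p, err := newProvider("ollama", "")
	if err != nil {
		panic(err)
	}
	out, err := p.Complete(context.Background(), "qwen3", "hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

The point of the pattern is that adding an eleventh provider means one more registry entry, not another hand-rolled client.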

The provider details are configured in your micasa config:

[llm]
provider = "anthropic"
model = "claude-sonnet-4-5-latest"
api_key = "sk-..."

Local Ollama still works with zero config. Nothing changed for the default setup.

Two pipelines, two models

Extraction reads invoices and proposes database fields. Chat answers natural-language questions about your data. These want different things – extraction wants a small model that’s fast and precise with JSON; chat wants something that can actually reason about whether your roof maintenance is overdue.

They used to share a model. Now they don’t (#575):

[llm]
provider = "ollama"
model = "qwen3"

[llm.extraction]
provider = "anthropic"
model = "claude-haiku-4-5-latest"
api_key = "sk-..."

Chat runs locally, extraction runs on Anthropic. Or both local. Or both cloud. [llm] is the default; [llm.chat] and [llm.extraction] override whatever you want per-pipeline.
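One way to picture the override rule is a field-by-field merge, sketched below. The `LLMConfig` struct and `resolve` function are hypothetical, and the assumption that an empty field falls back to the `[llm]` value (rather than the whole table being replaced) is mine, not something the config format guarantees.

```go
package main

import "fmt"

// LLMConfig mirrors the three keys shown in the TOML above.
// The struct itself is hypothetical, not micasa's actual type.
type LLMConfig struct {
	Provider string
	Model    string
	APIKey   string
}

// resolve returns base with every non-empty field of override applied on top.
// Assumption: empty fields in a per-pipeline table inherit from [llm].
func resolve(base, override LLMConfig) LLMConfig {
	out := base
	if override.Provider != "" {
		out.Provider = override.Provider
	}
	if override.Model != "" {
		out.Model = override.Model
	}
	if override.APIKey != "" {
		out.APIKey = override.APIKey
	}
	return out
}

func main() {
	base := LLMConfig{Provider: "ollama", Model: "qwen3"} // [llm]
	extraction := LLMConfig{                              // [llm.extraction]
		Provider: "anthropic",
		Model:    "claude-haiku-4-5-latest",
		APIKey:   "sk-...",
	}

	// No [llm.chat] table, so chat inherits everything from [llm].
	fmt.Printf("chat:       %+v\n", resolve(base, LLMConfig{}))
	fmt.Printf("extraction: %+v\n", resolve(base, extraction))
}
```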

Picking models at runtime

Pressing r on a completed extraction step now opens a fuzzy model picker instead of immediately rerunning (#560). Type to filter, arrows to navigate, enter to select. If the selected model isn’t available locally, it gets pulled first.

This matters because extraction is trial-and-error. A clean PDF with selectable text works fine with a 3B model. A photo of a receipt from a parking lot that you took while drunk might need something bigger. Switching without leaving the overlay keeps the loop tight.
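The "type to filter" part can be as simple as a case-insensitive subsequence match. The sketch below is one common way to do fuzzy filtering, not necessarily what the picker uses; `fuzzyMatch`, `filterModels`, and the sample model list are illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// fuzzyMatch reports whether every rune of query appears in candidate,
// in order, ignoring case. An empty query matches everything.
func fuzzyMatch(query, candidate string) bool {
	q := []rune(strings.ToLower(query))
	if len(q) == 0 {
		return true
	}
	i := 0
	for _, r := range strings.ToLower(candidate) {
		if r == q[i] {
			i++
			if i == len(q) {
				return true
			}
		}
	}
	return false
}

// filterModels keeps the models that match query, preserving input order.
func filterModels(query string, models []string) []string {
	var out []string
	for _, m := range models {
		if fuzzyMatch(query, m) {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	// Sample data only; a real picker would list what the provider offers.
	models := []string{"qwen3", "llama3.2:3b", "claude-haiku-4-5-latest"}
	fmt.Println(filterModels("l3", models))
}
```

A real picker would usually also rank matches (for example, preferring contiguous runs), which this sketch skips.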

Extraction in the background

OCR on a multi-page scan takes a while. ctrl+b now backgrounds a running extraction (#559). The status bar shows a spinner while jobs run and a count when they finish. ctrl+b again foregrounds the latest result for review. Nothing auto-accepts – you always look before it writes.
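The general shape of this in Go is a goroutine per job plus a small tracker the status bar can read. This is a sketch of that idea under my own assumptions, not the actual implementation: `tracker`, `result`, and their methods are hypothetical names, and the extraction function is a stub.

```go
package main

import (
	"fmt"
	"sync"
)

// result is a hypothetical extraction outcome waiting for review.
type result struct {
	doc    string
	fields map[string]string
}

// tracker counts running jobs and holds finished results until reviewed.
type tracker struct {
	mu      sync.Mutex
	wg      sync.WaitGroup
	running int
	done    []result
}

// start runs extract in a goroutine so the UI loop is never blocked.
func (t *tracker) start(doc string, extract func(string) map[string]string) {
	t.mu.Lock()
	t.running++
	t.mu.Unlock()

	t.wg.Add(1)
	go func() {
		defer t.wg.Done()
		fields := extract(doc)
		t.mu.Lock()
		defer t.mu.Unlock()
		t.running--
		t.done = append(t.done, result{doc: doc, fields: fields})
	}()
}

// status returns what a status bar needs: jobs in flight and results waiting.
func (t *tracker) status() (running, finished int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.running, len(t.done)
}

// latest pops the most recently finished result for review.
// Nothing is written anywhere; the caller decides what to accept.
func (t *tracker) latest() (result, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if len(t.done) == 0 {
		return result{}, false
	}
	r := t.done[len(t.done)-1]
	t.done = t.done[:len(t.done)-1]
	return r, true
}

func main() {
	var jobs tracker
	jobs.start("invoice.pdf", func(string) map[string]string {
		return map[string]string{"vendor": "Acme Roofing"} // stubbed extraction
	})
	jobs.wg.Wait() // a real UI would poll status() instead of blocking

	running, finished := jobs.status()
	fmt.Println("running:", running, "finished:", finished)
	if r, ok := jobs.latest(); ok {
		fmt.Println("review:", r.doc, r.fields)
	}
}
```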

Other things since last week

Locale-aware currency – EUR gets comma decimals and period grouping (1.234,56), GBP gets the pound sign, JPY drops decimal places. Auto-detected from your system locale or set via MICASA_CURRENCY. (#467)
Imperial/metric toggle – U switches between square feet and square meters. Defaults to metric unless your locale is US, Liberia, or Myanmar. (#555)
Resolved incidents – resolving an incident now sets a proper resolved status. D permanently deletes resolved incidents with confirmation. (#588)
config --dump – prints fully resolved config as annotated TOML with env var hints. API keys are omitted to prevent us from doing dumb things like pasting secrets into an AI. (#597)
Extraction timeout – configurable LLM timeout (default 2 minutes) so a hung model doesn’t lock the overlay. (#604)
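For the timeout item, the standard Go tool is `context.WithTimeout`. The sketch below shows the general technique rather than micasa's code: `callWithTimeout`, `hung`, `quick`, and `isTimeout` are hypothetical helpers, and only the two-minute default comes from the list above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

const (
	defaultTimeout = 2 * time.Minute       // the default stated above
	demoTimeout    = 50 * time.Millisecond // short, so the demo finishes quickly
)

// callWithTimeout runs call under a deadline and returns as soon as the call
// finishes or the deadline passes, whichever comes first.
func callWithTimeout(timeout time.Duration, call func(context.Context) (string, error)) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	type reply struct {
		text string
		err  error
	}
	ch := make(chan reply, 1) // buffered so a late call can still send and exit
	go func() {
		text, err := call(ctx)
		ch <- reply{text, err}
	}()

	select {
	case r := <-ch:
		return r.text, r.err
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// hung simulates a model that never answers until its context is cancelled.
func hung(ctx context.Context) (string, error) {
	<-ctx.Done()
	return "", ctx.Err()
}

// quick simulates a model that answers immediately.
func quick(context.Context) (string, error) { return "ok", nil }

// isTimeout reports whether err came from the deadline expiring.
func isTimeout(err error) bool { return errors.Is(err, context.DeadlineExceeded) }

func main() {
	_, err := callWithTimeout(demoTimeout, hung)
	fmt.Println("hung model timed out:", isTimeout(err))

	text, err := callWithTimeout(defaultTimeout, quick)
	fmt.Println("quick model:", text, err)
}
```

The buffered channel matters: if the deadline fires first, the goroutine can still deliver its late reply and exit instead of leaking.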

Under the hood: the 56-field Model God-struct got demoted to demi-god status during some code surgery with generics in the data layer (-248 lines of code, +339 lines of tests), and eight new static analysis tools run in pre-commit. Static analysis seems to keep the bot army away from some of its dumber tendencies, so let’s load up on it.

Try it

go run github.com/cpcloud/micasa/cmd/micasa@latest --demo

Or with a cloud provider:

export MICASA_LLM_PROVIDER=anthropic
export MICASA_LLM_MODEL=claude-haiku-4-5-latest
export MICASA_LLM_API_KEY=sk-...
go run github.com/cpcloud/micasa/cmd/micasa@latest --demo

Binaries on the releases page.