Journal · Yugal Kithany

Replacing the Pipeline with Native Hooks next step

Both stages of the pipeline work independently now. Stage 1 converts PDFs into structured JSON. Stage 2 takes that JSON and inserts it into Grist. The problem is the handoff between them is still manual—a file lands in receipts/output/ and nothing happens until someone invokes Claude Code to process it.

The goal isn't to add new capabilities. Every individual component already works. What remains is removing that manual handoff and letting Claude Code coordinate the full workflow on its own—from receipt arrival to database insertion.

The plan is to replace the Python watcher with Claude Code's native hook system. Hook #1 watches receipts/raw/: when a PDF appears, markitdown MCP converts it to text, Claude Code processes the result, and output lands in receipts/output/. Whether that output is structured JSON or raw receipt text is still an open question—Hook #1 hasn't been tested yet, so keeping it flexible makes sense.

Hook #2 watches receipts/output/: when a new file appears, Claude Code loads the receipt-pipeline and grist-automation skills, applies validation rules, and calls the MCP tools to write to Grist. The two on_*.py hook files replace run_pipeline.py. Everything else stays the same.

The interesting thing is how much the project has shifted: it started as a receipt parser, which already exists. What I'm building now is the underlying orchestration—wiring tools, skills, and MCP servers into an (non?)interactive workflow.

Claude Code Becomes the Orchestrator after API key

Getting API access changed the architecture. Instead of trying to make one model handle everything, I split responsibilities: Qwen stayed on extraction, and Claude Code took over orchestration. The realization from the extraction work held—different problems need different tools. Qwen is fast and cheap at pulling structured fields from semi-structured text. Claude is better at reasoning about schema integrity, handling edge cases, and deciding which MCP tools to call in what order.

The key insight was about skills: skills aren't code, they're documentation. The grist-automation skill is structured markdown describing the schema, available tools, and expected workflows. Claude Code reads it before acting. Adding a new validation rule means editing a document, not deploying code.

The first successful session looked like this. A Costco receipt pasted directly in—Claude Code invokes add_receipt() and add_line_item() through the MCP server and populates finances.grist automatically:

claude — grist-automation

❯ /tools ✓ grist-automation connected ✓ markitdown connected ❯ [paste Costco receipt — 04/09/2026] Processing receipt from Costco... ⏺ grist - add_receipt (MCP) {Trip_ID: 15, Date: 2026-04-09, Store: "Costco Austin #681", Card: "XXXX5727", Tax: 0.82, Manual_Total: 48.06, Order_Number: "6818250111"} ✓ Response: { "success": true, "receipt_row_id": 15, "products_created": 3, "line_items_created": 5 } Done. Receipt #15 added — Costco Austin #681 — $48.06 — 5 items

Worth noting: this still required manually pasting the receipt and invoking Claude Code. Stage 2 works—it just isn't automatic yet.

Conversational queries over the database work the same way. Asking "how much have I spent on eggs and basil pesto?" produces a query_sql MCP call, a result set, and an analysis inline:

claude — grist-automation

❯ How much have I spent on eggs and basil pesto? ⏺ grist - query_sql (MCP) SELECT r.Date, p.Product, li.Price, li.Quantity, r.Store FROM Line_items li JOIN Products p ON li.Product_ID = p.id JOIN Receipts r ON li.Receipt_ID = r.id WHERE p.Product LIKE '%EGG%' OR p.Product LIKE '%PESTO%' ORDER BY r.Date ✓ 6 rows 2025-12-04 Organics Grade A Cage Free Eggs $7.53 HEB 2025-12-14 Basil Pesto $9.99 Costco 2025-12-14 Organic Large Eggs $7.69 Costco 2025-12-31 Signature Eggs $4.30 Costco 2026-04-09 KS CAGE FREE $3.79 Costco Austin #681 2026-04-09 Basil Pesto $9.99 Costco Austin #681 Basil Pesto: stable at $9.99 across 2 purchases Eggs: $7.53 → $3.79 — roughly 50% drop over 4 months

This is exactly what the database schema was designed to answer. The schema came first; these queries are why.

Building the Local Extraction Pipeline before API key

With the database structure settled, I shifted to the ingestion side. The first working version was intentionally simple: a Python watchdog on receipts/raw/ that woke up whenever a PDF appeared. Markitdown converted the PDF to markdown, which was then sent to a locally hosted Qwen model running through LM Studio at localhost:8080. The model returned structured JSON—date, store, line items, totals, card info. The JSON went to receipts/output/; the original PDF was archived. Everything ran locally. No cloud APIs, no external services.

The prompt was kept tight: extract these fields, return only valid JSON, temperature 0.3 for determinism, and 500 tokens max. Another model I tried was google_gemma-3-12b-it-qat. Although it successfully loaded on my 16GB M2 Mac mini with an 8,192 context window, Claude Code required a 32k context window to pass its initial codebase payload. When I attempted to configure the 12B model with 32k context, the system crashed under the extreme memory pressure. Ultimately, Qwen 7B at those tight settings gave the best balance between speed and extraction quality. Larger models either increased latency or completely overwhelmed the hardware without improving the structured output noticeably.

The first inference took 1m 41s. LM Studio showed the progress crawl—5%, 10%, slowly climbing—because the model was loading into memory before doing anything. That's just Qwen 7B cold-starting on Mac hardware; subsequent calls were faster once the model stayed warm in VRAM. It wasn't a connection problem. That's what local inference costs on first run.

That experience clarified something important: the local model was excellent at extraction but wasn't a good fit for orchestration. Extraction is a structured parsing task—given text, produce JSON. Orchestration involves deciding which tools to call, validating schema integrity, and recovering from errors. Trying to use the same model for both was the wrong frame. They needed different tools.

Designing the Database First before everything

Before touching any LLMs or automation, I started with the destination: the database. The goal wasn't simply to parse receipts—it was to build something I could eventually query conversationally through Claude Code, OpenCode, or any future MCP-compatible client.

I settled on a three-table relational structure in Grist. The separation felt like extra work at first, but it's what makes the interesting questions answerable: how much have I spent on eggs this year? Which store has the cheapest paneer? How has the price of basil pesto changed over time?

Click table names to open schemas

Designing the schema before writing any extraction code turned out to be the most useful decision in the project. Once the relationships were clear, everything else became an ingestion problem rather than a data-modeling problem.

Around the same time I started experimenting with MCP servers and Claude Code skills. The long-term vision was conversational access to receipt history—an assistant that understands the schema, knows which tools are available, and can answer questions about spending without me writing SQL by hand. The grist-automation MCP server was the foundation for that. Even before the extraction pipeline existed, I already knew the database would be accessed through MCP tools rather than a traditional application interface.