Magpie
Magpie is the central cataloguing hub of my self-hosted platform — it receives resource events from every producer service, persists their metadata, and fans them out to downstream consumers over Kafka.
Beyond routing, it provides two annotation systems: user-defined Tags (freeform, human-applied labels) and machine-generated Labels (structured key=value pairs with confidence scores). Anything that enters the platform gets catalogued here first.
Why I Built It
As I added more services (Lynx for links, Owl for books and papers, Weevil for reading data), each one needed to notify the rest of the platform whenever it created a new resource. Rather than wiring every producer directly to every consumer, I built Magpie as a single intake point: producers publish one event to Magpie, and Magpie re-publishes it on the shared Kafka bus for any downstream service that wants it.
On top of that routing layer I added two annotation systems I kept wanting in other tools: user-defined tags and machine-generated labels.
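The intake-and-fan-out idea described above can be sketched in a few lines of Go. This is an illustration only, not Magpie's actual code: the `Store` and `Publisher` interfaces, the `Intake` type, the in-memory stand-ins, and the topic name `resources` are all hypothetical, and the `Resource` field types are assumptions based on the fields this README lists.

```go
package main

import "fmt"

// Resource carries the fields the README lists; the Go types are assumptions.
type Resource struct {
	UUID          string
	OriginService string
	EntityType    string
	SourcePath    string
	Name          string
}

// Store and Publisher are hypothetical interfaces standing in for
// PostgreSQL persistence and the Kafka bus respectively.
type Store interface {
	Save(r Resource) error
}

type Publisher interface {
	Publish(topic string, r Resource) error
}

// Intake is the single entry point: it persists a resource, then
// re-publishes it once so consumers never talk to producers directly.
type Intake struct {
	store Store
	bus   Publisher
	topic string // hypothetical topic name
}

// Handle persists first and publishes second, so a consumer never sees
// an event for a resource that was not stored.
func (in *Intake) Handle(r Resource) error {
	if err := in.store.Save(r); err != nil {
		return fmt.Errorf("persist %s: %w", r.UUID, err)
	}
	if err := in.bus.Publish(in.topic, r); err != nil {
		return fmt.Errorf("publish %s: %w", r.UUID, err)
	}
	return nil
}

// memStore and memBus are in-memory stand-ins used only for this sketch.
type memStore struct{ saved []Resource }

func (m *memStore) Save(r Resource) error {
	m.saved = append(m.saved, r)
	return nil
}

type memBus struct{ sent map[string][]Resource }

func (b *memBus) Publish(topic string, r Resource) error {
	b.sent[topic] = append(b.sent[topic], r)
	return nil
}

func main() {
	store := &memStore{}
	bus := &memBus{sent: map[string][]Resource{}}
	in := &Intake{store: store, bus: bus, topic: "resources"}

	err := in.Handle(Resource{
		UUID:          "00000000-0000-0000-0000-000000000001",
		OriginService: "lynx",
		EntityType:    "link",
		SourcePath:    "/links/1",
		Name:          "Example link",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("stored:", len(store.saved), "published:", len(bus.sent["resources"]))
}
```

With this shape, adding a new producer means calling one intake, and adding a new consumer means subscribing to one topic; neither side needs to know the other exists.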
What It Does
- Receives Resource objects from upstream services (UUID, origin service, entity type, source path, name) and persists their metadata in PostgreSQL.
- Re-publishes each resource to the Kafka bus so downstream services (vector search, full-text indexers, graph databases) can subscribe without coupling to the producer.
- Forwards resources to grey-seal so they are automatically available in the RAG knowledge base.
- Manages Tags: named, coloured labels that users manually apply to resources.
- Manages Labels: immutable, machine-generated key=value pairs with a confidence score, namespace, and source provenance string.
- Manages ResourceTag and ResourceLabel join records.
- Exposes all five domains (Resource, Tag, Label, ResourceTag, ResourceLabel) via a ConnectRPC API and a WebAssembly browser UI.
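To make the difference between Tags and Labels concrete, here is a rough Go sketch of how the annotation types listed above might be modelled. It is not taken from Magpie's source: the field names, the string IDs, the `NewLabel` constructor, and the assumption that confidence lies in the range 0 to 1 are all hypothetical. Immutability is approximated here by keeping the Label fields unexported.

```go
package main

import (
	"errors"
	"fmt"
)

// Tag is a named, coloured label that a user applies by hand.
// The ID field and the colour format are assumptions.
type Tag struct {
	ID     string
	Name   string
	Colour string // e.g. "#ff8800"
}

// Label is a machine-generated key=value pair. Its fields are unexported
// so a value cannot be modified after construction.
type Label struct {
	namespace  string
	key        string
	value      string
	confidence float64
	source     string // provenance string, e.g. the generating model
}

// NewLabel validates its input and returns a Label.
// The [0, 1] confidence range is an assumption for this sketch.
func NewLabel(namespace, key, value string, confidence float64, source string) (Label, error) {
	if key == "" {
		return Label{}, errors.New("label key must not be empty")
	}
	// Written as a negated range check so NaN is rejected too.
	if !(confidence >= 0 && confidence <= 1) {
		return Label{}, fmt.Errorf("confidence %v is outside [0, 1]", confidence)
	}
	return Label{
		namespace:  namespace,
		key:        key,
		value:      value,
		confidence: confidence,
		source:     source,
	}, nil
}

// Confidence exposes the score read-only.
func (l Label) Confidence() float64 { return l.confidence }

// String renders the label as namespace/key=value with its score and source.
func (l Label) String() string {
	return fmt.Sprintf("%s/%s=%s (%.2f, from %s)",
		l.namespace, l.key, l.value, l.confidence, l.source)
}

// ResourceTag and ResourceLabel are join records linking a resource
// to a tag or a label by ID (the ID types are assumptions).
type ResourceTag struct {
	ResourceID string
	TagID      string
}

type ResourceLabel struct {
	ResourceID string
	LabelID    string
}

func main() {
	tag := Tag{ID: "t1", Name: "to-read", Colour: "#ff8800"}
	link := ResourceTag{ResourceID: "r1", TagID: tag.ID}

	label, err := NewLabel("topics", "language", "go", 0.93, "classifier-v1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(tag.Name, link.ResourceID, label)
}
```

The split mirrors the list above: a Tag is mutable and user-owned, while a Label is created once by a machine and carries enough provenance (namespace, confidence, source) to be filtered or audited later.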
Tech Stack
- Backend: Go, ConnectRPC, PostgreSQL
- Messaging: Kafka (Redpanda in local development)
- Frontend: WebAssembly SPA written in Go using go-app
For a deeper look at how the pieces fit together, see ARCH.md.