🗂️

Magpie

Go, Kafka, PostgreSQL

Magpie is the central cataloguing hub of my self-hosted platform — it receives resource events from every producer service, persists their metadata, and fans them out to downstream consumers over Kafka.

Rather than wiring every service directly to every other, I built Magpie as a single intake point. On top of that routing layer it provides two annotation systems: user-defined Tags (freeform, human-applied labels) and machine-generated Labels (structured key=value pairs with confidence scores). Anything that enters the platform gets catalogued here first.

Why I Built It

As I added more services (Lynx for links, Owl for books and papers, Weevil for reading data), each one needed to notify the rest of the platform that a new thing existed. Rather than wiring every producer directly to every consumer, I built Magpie as a single intake point: producers publish one event to Magpie, and Magpie re-publishes it on the shared Kafka bus for whatever downstream services care.
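The intake pattern can be sketched roughly as follows. This is a minimal illustration, not Magpie's actual code: the `ResourceEvent`, `Store`, `Publisher`, and `Intake` names, the `resources` topic, and the persist-then-publish ordering are all assumptions, and the in-memory types stand in for PostgreSQL and Kafka.

```go
package main

import (
	"errors"
	"fmt"
)

// ResourceEvent is a hypothetical event shape; the field names are assumptions.
type ResourceEvent struct {
	ID     string // resource UUID
	Origin string // producing service, e.g. "lynx"
	Name   string
}

// Store abstracts metadata persistence (PostgreSQL in the real service).
type Store interface {
	Save(ev ResourceEvent) error
}

// Publisher abstracts the message bus (a Kafka client in the real service).
type Publisher interface {
	Publish(topic string, ev ResourceEvent) error
}

// Intake is the single entry point: producers hand it one event,
// and it persists the event and then republishes it for any subscriber.
type Intake struct {
	store Store
	bus   Publisher
	topic string
}

// Handle saves the event first and publishes only if the save succeeded
// (an assumed ordering, chosen so consumers never see an uncatalogued resource).
func (in *Intake) Handle(ev ResourceEvent) error {
	if ev.ID == "" {
		return errors.New("resource event has no ID")
	}
	if err := in.store.Save(ev); err != nil {
		return fmt.Errorf("persist %s: %w", ev.ID, err)
	}
	if err := in.bus.Publish(in.topic, ev); err != nil {
		return fmt.Errorf("publish %s: %w", ev.ID, err)
	}
	return nil
}

// memStore and memBus are in-memory stand-ins used only for this sketch.
type memStore struct{ saved []ResourceEvent }

func (s *memStore) Save(ev ResourceEvent) error {
	s.saved = append(s.saved, ev)
	return nil
}

type memBus struct{ sent map[string][]ResourceEvent }

func (b *memBus) Publish(topic string, ev ResourceEvent) error {
	b.sent[topic] = append(b.sent[topic], ev)
	return nil
}

func main() {
	store := &memStore{}
	bus := &memBus{sent: map[string][]ResourceEvent{}}
	in := &Intake{store: store, bus: bus, topic: "resources"}

	err := in.Handle(ResourceEvent{ID: "a1", Origin: "lynx", Name: "example link"})
	fmt.Println(err, len(store.saved), len(bus.sent["resources"]))
}
```

Because producers depend only on the intake and consumers depend only on the topic, adding a new downstream service in this sketch means adding a subscriber, with no change to any producer.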

On top of that routing layer I added two annotation systems I kept wanting in other tools: user-defined tags and machine-generated labels.

What It Does

  • Receives Resource objects (UUID, origin service, entity type, source path, name) from upstream services and persists their metadata in PostgreSQL.
  • Re-publishes each resource to the Kafka bus so downstream services — vector search, full-text indexers, graph databases — can subscribe without coupling to the producer.
  • Forwards resources to grey-seal so they are automatically available in the RAG knowledge base.
  • Manages Tags — named, coloured labels that users manually apply to resources.
  • Manages Labels — immutable, machine-generated key=value pairs with a confidence score, namespace, and source provenance string.
  • Manages ResourceTag and ResourceLabel join records, which link each resource to its tags and labels.
  • Exposes all five domains via a ConnectRPC API and a WebAssembly browser UI.
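To make the five domains concrete, here is one way they might be modelled in Go. This is a sketch under stated assumptions rather than Magpie's real schema: the field names and types, the `[0, 1]` confidence range, the `NewLabel` constructor, and the `namespace/key=value` rendering are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Resource carries the fields listed above; names and types are assumptions.
type Resource struct {
	ID         string // UUID
	Origin     string // origin service
	EntityType string
	SourcePath string
	Name       string
}

// Tag is a named, coloured label that a user applies by hand.
type Tag struct {
	Name   string
	Colour string // colour format is an assumption, e.g. "#ffcc00"
}

// Label is a machine-generated key=value pair with provenance.
type Label struct {
	Namespace  string
	Key        string
	Value      string
	Confidence float64 // assumed to lie in [0, 1]
	Source     string  // provenance string, e.g. the generating model
}

// ResourceTag and ResourceLabel are join records linking a resource
// to a tag or a label.
type ResourceTag struct {
	ResourceID string
	TagName    string
}

type ResourceLabel struct {
	ResourceID string
	Label      Label
}

// NewLabel is a hypothetical constructor that validates its input once,
// so callers can treat the returned value as fixed.
func NewLabel(ns, key, value string, confidence float64, source string) (Label, error) {
	if key == "" {
		return Label{}, errors.New("label key must not be empty")
	}
	// Written as a negated range check so that NaN is rejected too.
	if !(confidence >= 0 && confidence <= 1) {
		return Label{}, fmt.Errorf("confidence %v outside [0, 1]", confidence)
	}
	return Label{
		Namespace:  ns,
		Key:        key,
		Value:      value,
		Confidence: confidence,
		Source:     source,
	}, nil
}

// String renders the label as namespace/key=value (an assumed format).
func (l Label) String() string {
	return fmt.Sprintf("%s/%s=%s", l.Namespace, l.Key, l.Value)
}

func main() {
	l, err := NewLabel("topic", "language", "go", 0.92, "classifier-v1")
	if err != nil {
		fmt.Println("invalid label:", err)
		return
	}
	link := ResourceLabel{ResourceID: "a1", Label: l}
	fmt.Println(link.ResourceID, link.Label)
}
```

Keeping tags and labels as separate types reflects the split described above: tags are edited by people, while labels are written once by a machine and carry their confidence and source with them.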

Tech Stack

  • Backend: Go, ConnectRPC, PostgreSQL
  • Messaging: Kafka (Redpanda in local development)
  • Frontend: WebAssembly SPA written in Go using go-app

For a deeper look at how the pieces fit together, see ARCH.md.

Documentation