# Grey Seal
Grey Seal is a Retrieval-Augmented Generation (RAG) chat backend — a service for having grounded conversations with a knowledge base I actually own and control.
## Why I Built It
I wanted to be able to ask questions across the documents, links, and notes stored in my self-hosted platform and get answers that cite real sources rather than hallucinated ones. Commercial RAG products exist, but they require sending your data to a third party. Grey Seal runs entirely on my own infrastructure: the LLM runs locally via Ollama, vector search is handled by Shrike, and the data never leaves the network.
It also gave me a concrete project to learn how RAG pipelines work in practice — chunking, embedding, retrieval, and prompt construction — beyond toy examples.
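The chunking step can be sketched with a simple overlapping word window. This is illustrative only — the chunk size, overlap, and splitting strategy here are placeholder choices, not Grey Seal's actual parameters:

```go
package main

import (
	"fmt"
	"strings"
)

// chunkWords splits text into overlapping word-window chunks.
// size is the number of words per chunk and overlap is the number
// of words shared between consecutive chunks, so adjacent chunks
// keep enough shared context for retrieval to match either one.
func chunkWords(text string, size, overlap int) []string {
	words := strings.Fields(text)
	var chunks []string
	step := size - overlap
	for start := 0; start < len(words); start += step {
		end := start + size
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, strings.Join(words[start:end], " "))
		if end == len(words) {
			break
		}
	}
	return chunks
}

func main() {
	text := "one two three four five six seven eight"
	for i, c := range chunkWords(text, 4, 2) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```

Each chunk would then be embedded (via Ollama) and indexed in Shrike for semantic retrieval.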
## What It Does
- Manages conversations — persistent chat sessions with a full message history.
- Manages roles — named system prompts that can be assigned to a conversation to specialise its behaviour.
- Manages resources — documents scoped to a conversation that constrain retrieval to a relevant subset of the knowledge base.
- Answers user queries by retrieving semantically relevant chunks from Shrike and injecting them into the LLM prompt context.
- Streams responses back to the client via a Connect-RPC server-streaming `ChatRPC`.
- Records per-message feedback (−1 / 0 / 1) for quality tracking.
- Provides a CLI (`ingest`) for submitting URLs or raw text to the knowledge base.
## Tech Stack
- Backend: Go, ConnectRPC, PostgreSQL
- LLM inference: Ollama (`deepseek-r1` by default)
- Vector search: Shrike (which in turn uses Qdrant + Ollama embeddings)
- Messaging: Kafka (Redpanda in local development)
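A minimal sketch of the supporting services for local development, using the public upstream images and their default ports — this is illustrative, not Grey Seal's actual deployment config:

```yaml
# docker-compose sketch: infrastructure dependencies only
# (Grey Seal itself and Shrike would run alongside these).
services:
  ollama:
    image: ollama/ollama        # local LLM inference + embeddings
    ports: ["11434:11434"]
  qdrant:
    image: qdrant/qdrant        # vector store backing Shrike
    ports: ["6333:6333"]
  redpanda:
    image: redpandadata/redpanda  # Kafka-compatible broker
    command: ["redpanda", "start", "--smp", "1"]
    ports: ["9092:9092"]
  postgres:
    image: postgres:16          # conversations, roles, feedback
    environment:
      POSTGRES_PASSWORD: example
    ports: ["5432:5432"]
```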
For a deeper look at how the pieces fit together, see ARCH.md.