Shrike
Architecture
System Context
```mermaid
C4Context
    title System Context – Shrike within joel.holmes.haus
    Boundary(platform, "joel.holmes.haus Platform") {
        System(ui, "joel.holmes.haus", "Go-app WASM admin SPA – submits search queries")
        System(shrike, "Shrike", "Full-text and semantic search indexing service")
        System(lynx, "Lynx", "Web archiving – publishes TextExtractedEvent after enrichment")
        System(greyseal, "Grey Seal", "RAG conversations – queries Shrike for context retrieval")
        System(magpie, "Magpie", "Resource index – publishes Resource events for indexing")
    }
    SystemDb(postgres, "PostgreSQL", "Index records and full-text document store")
    SystemDb(qdrant, "Qdrant", "Vector store – cosine similarity search (768-dim)")
    System_Ext(ollama, "Ollama", "Local LLM – nomic-embed-text embeddings")
    SystemQueue(kafka, "Kafka", "TextExtractedEvent · EnvelopeEvent · ImportRequest · magpie.v1.Resource")
    Rel(ui, shrike, "ConnectRPC search queries")
    Rel(greyseal, shrike, "ConnectRPC hybrid search")
    Rel(lynx, kafka, "Publishes TextExtractedEvent")
    Rel(magpie, kafka, "Publishes magpie.v1.Resource")
    Rel(shrike, postgres, "Reads / writes index records")
    Rel(shrike, qdrant, "Vector upsert / search")
    Rel(shrike, ollama, "Embed text chunks")
    Rel(shrike, kafka, "Consumes 4 topics")
```
Container Diagram
```mermaid
C4Container
    title Shrike – Internal Containers
    Boundary(shrike, "Shrike") {
        Container(api, "cmd/api", "Go / ConnectRPC h2c :9000", "IndexRecordService (CRUD) · SearchService (semantic + keyword + reindex)")
        Container(worker, "cmd/worker", "Go / Kafka", "4 consumers: TextExtracted · Envelope · Import · Resource")
        Container(indexSvc, "index_record.Service", "Go", "CRUD for index records")
        Container(chunker, "lib/chunker", "Go", "512-word overlapping windows, 64-word overlap")
        Container(embedder, "OllamaEmbedder", "Go / HTTP", "nomic-embed-text via Ollama REST API")
        Container(vectorStore, "QdrantStore", "Go / HTTP", "Upsert · Search · DeleteByEntity on shrike_context collection")
        ContainerDb(indexRepo, "IndexRecordRepo + DocumentRepo", "PostgreSQL / squirrel", "indexrecords (tsvector GIN) · documents tables")
    }
    SystemDb(postgres, "PostgreSQL", "")
    SystemDb(qdrant, "Qdrant", "shrike_context collection, 768-dim cosine")
    System_Ext(ollama, "Ollama", "http://ollama:11434")
    SystemDb(minio, "MinIO / S3", "Blob source for ImportConsumer")
    SystemQueue(kafka, "Kafka", "")
    Rel(api, indexSvc, "Delegates CRUD")
    Rel(api, chunker, "Reindex path")
    Rel(api, embedder, "Embed query")
    Rel(api, vectorStore, "Search / DeleteByEntity")
    Rel(api, indexRepo, "Keyword search (tsvector)")
    Rel(worker, indexSvc, "Upsert index records")
    Rel(worker, chunker, "Chunk full text")
    Rel(worker, embedder, "Embed chunks")
    Rel(worker, vectorStore, "Upsert points")
    Rel(worker, minio, "Download blobs (ImportConsumer)")
    Rel(worker, kafka, "Consumes 4 topics")
    Rel(indexRepo, postgres, "SQL")
    Rel(embedder, ollama, "POST /api/embed")
    Rel(vectorStore, qdrant, "HTTP REST")
```
Process Decomposition
Shrike is deployed as two containerised binaries (`api` and `worker`) plus an optional browser UI (`ui`). All three share the same Go module and library code under `lib/`.
API Server (cmd/api/main.go)
Starts an HTTP/2 server (h2c, port 9000) with:
- `IndexRecordService` – ConnectRPC handler wrapping `indexRecordService` (CRUD)
- `SearchService` – ConnectRPC handler with direct access to `IndexRecordRepo`, `DocumentRepo`, `OllamaEmbedder`, and `QdrantStore`
- Logging middleware and CORS middleware (all origins allowed)
- `/health` endpoint
Runs database migrations on startup (via goose, embedded SQL).
Worker (cmd/worker/main.go)
Long-running process that starts four Kafka consumers as goroutines and blocks until SIGINT/SIGTERM:
- `TextExtractedConsumer` (group `shrike-text-indexer`) – deserialises `TextExtractedEvent`, upserts `IndexRecord`, stores full text in `DocumentRepo`, and vectorises (chunk → embed → Qdrant upsert).
- `EnvelopeConsumer` (group `shrike-envelope`) – deserialises `EntityEnvelope`, skips non-`deleted` events, deletes the corresponding `IndexRecord`.
- `ImportConsumer` (group `shrike-import`) – deserialises `ImportRequest`, downloads text from S3/MinIO via `BlobStore`, stores text, upserts `IndexRecord`, and vectorises.
- `ResourceConsumer` (group `shrike-magpie-resource`) – deserialises a magpie `Resource` proto, upserts `IndexRecord`, stores minimal text, and vectorises.
The worker calls vectorStore.EnsureCollection at startup to create the Qdrant collection shrike_context if it does not exist.
Library Layout (lib/)
```text
lib/
  chunker/           – text chunking (overlapping word windows)
  embedding/         – Embedder interface + OllamaEmbedder implementation
  repo/              – Postgres: Conn, IndexRecordRepo, DocumentRepo, embedded migrations
  schemas/           – generated protobuf Go code (shrike/v1 entities + services)
  shrike/
    index_record/    – IndexRecordService interface + implementation, consumer logic
    import_request/  – ImportConsumer
    resource/        – ResourceConsumer
  storage/           – BlobStore (S3-compatible via gocloud.dev)
  ui/                – go-app WebAssembly UI
  vector/            – Store interface + QdrantStore implementation
```
Data Flow: Ingest via TextExtractedEvent
```mermaid
graph TD
    Upstream["Upstream service"] -->|"TextExtractedEvent"| Kafka[("Kafka")]
    Kafka --> Consumer["Worker: TextExtractedConsumer"]
    Consumer -->|"1. Upsert fullText"| DocRepo["DocumentRepo\n(documents table)"]
    Consumer -->|"2. Create IndexRecord"| IdxRepo["IndexRecordRepo\n(indexrecords table)"]
    Consumer -->|"3. Chunk text\n512 words, 64 overlap"| Chunker["lib/chunker"]
    Chunker -->|"4. Embed each chunk"| Ollama["OllamaEmbedder\nHTTP → Ollama"]
    Ollama -->|"5. Upsert vector points"| Qdrant[("QdrantStore\nshrike_context")]
    DocRepo --> PG[("PostgreSQL")]
    IdxRepo --> PG
```
Data Flow: Search Request
```mermaid
graph TD
    Client -->|"SearchService.Search\nquery · mode · entity_uuids"| API["cmd/api"]
    API -->|"semantic path (default)"| Embed["OllamaEmbedder.Embed(query)"]
    Embed -->|"vector search + MatchAny filter"| QSearch["QdrantStore.Search"]
    QSearch --> SR1["SearchResult{snippet=chunk_text}"]
    API -->|"keyword path\nmode=keyword or fallback"| KWSearch["IndexRecordRepo.SearchKeyword\ntsvector GIN"]
    KWSearch --> SR2["SearchResult"]
```
Data Flow: Reindex
```mermaid
graph TD
    Client -->|"SearchService.Reindex(uuid)"| API["cmd/api"]
    API -->|"1. GetFullText"| DocRepo["DocumentRepo (Postgres)"]
    API -->|"2. DeleteByEntity"| Qdrant[("QdrantStore\nfilter by entity_uuid payload")]
    DocRepo -->|"fullText"| Chunker["3. chunker.Chunk"]
    Chunker -->|"chunks"| Ollama["OllamaEmbedder.Embed"]
    Ollama -->|"4. Upsert new points"| Qdrant
```
External Dependencies
| System | Role | Default address |
|---|---|---|
| Postgres | IndexRecord metadata, full text, tsvector keyword index | db:5432 |
| Qdrant | Vector storage, cosine similarity search | http://qdrant:6333 |
| Ollama | Text embedding (nomic-embed-text, 768 dims) | http://ollama:11434 |
| Kafka / Redpanda | Event bus (four consumer groups) | redpanda-0:9092 |
| S3 / MinIO | Blob storage for imported text files | S3_ENDPOINT env var |
Chunking Strategy
`lib/chunker.Chunk(text, size=512, overlap=64)` splits text into overlapping word-level windows:

- Step size = 512 − 64 = 448 words
- Each window is up to 512 words, with a 64-word overlap from the previous chunk
- Qdrant point IDs are deterministic: `uuid.NewSHA1(NameSpaceURL, "<entity_uuid>:<chunk_index>")`
Qdrant Collection
- Collection name: `shrike_context`
- Vector size: 768 (matches `nomic-embed-text` output)
- Distance metric: Cosine
- Created/ensured by the worker at startup
Database Migrations
Three goose migrations embedded in lib/repo/migrations/:
- `00000001_index_record.up.sql` – creates `indexrecords` table
- `00000002_tsvector.up.sql` – adds `search_vector` generated column + GIN index
- `00000003_documents.up.sql` – creates `documents` table with FK to `indexrecords`
Migrations run automatically on API server startup. The worker skips them.
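The tsvector migration's likely shape can be sketched as follows; the `indexrecords` table and `search_vector` column come from this document, while the source columns (`title`, `description`) and the `english` text-search configuration are assumptions:

```sql
-- Hypothetical reconstruction of 00000002_tsvector.up.sql:
-- a stored generated column plus a GIN index for keyword search.
ALTER TABLE indexrecords
    ADD COLUMN search_vector tsvector
    GENERATED ALWAYS AS (
        to_tsvector('english',
            coalesce(title, '') || ' ' || coalesce(description, ''))
    ) STORED;

CREATE INDEX idx_indexrecords_search_vector
    ON indexrecords USING GIN (search_vector);
```

A stored generated column keeps the vector in sync on every write, so `SearchKeyword` only has to match against `search_vector` with a tsquery.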