Owl
Owl is my personal digital library — a self-hosted service for storing and managing PDF/EPUB books and academic papers without depending on a third-party platform.
I wanted a single place to keep my reading material that I actually control. For academic papers especially, the existing options are either siloed into specific platforms or require tedious manual bookkeeping. Owl runs on my own infrastructure, stores files in S3-compatible object storage, and keeps metadata in a PostgreSQL database I own. Paper ingestion is handled by a Temporal workflow that resolves URLs, downloads files, deduplicates by checksum and DOI, and persists everything durably in the background.
Why I Built It
I wanted a single place to keep my reading material that I actually control. Commercial alternatives for books are fine, but for academic papers the existing options are either siloed into specific platforms or require manual bookkeeping across folders and browser tabs. Owl runs on my own infrastructure and stores everything in a database and object storage bucket I own.
It also gave me a concrete project for exploring Temporal as a durable workflow orchestrator: paper ingestion is exactly the kind of multi-step, failure-prone pipeline that benefits from a proper workflow engine.
What It Does
At its core, Owl is a catalogue for two types of content:
- Books — PDF/EPUB files with metadata: title, authors, publisher, ISBN, format, page count, language, tags, and notes. Files are stored in S3-compatible object storage (MinIO); metadata lives in PostgreSQL.
- Papers — Academic documents (arXiv, DOI, direct URL) with rich metadata: abstract, DOI, arXiv ID, publication venue, source URL. Submission returns immediately with a workflow ID; a Temporal pipeline resolves the URL, downloads the file, deduplicates by checksum and DOI, uploads to object storage, and persists the record.
Both domains are accessible through a ConnectRPC API and a Cobra-based CLI.
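The deduplication step in the paper pipeline can be illustrated with a small, self-contained sketch. It is not Owl's actual code: the `Index` type, its in-memory maps (standing in for PostgreSQL lookups), the choice of SHA-256 as the checksum, and the DOI normalization rules are all assumptions made for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// Checksum returns the hex-encoded SHA-256 digest of a file's bytes.
// SHA-256 is an assumption; the README only says "checksum".
func Checksum(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// NormalizeDOI lowercases a DOI and strips common resolver prefixes so
// that equivalent spellings of the same DOI compare equal.
func NormalizeDOI(doi string) string {
	d := strings.ToLower(strings.TrimSpace(doi))
	for _, p := range []string{"https://doi.org/", "http://doi.org/", "doi:"} {
		d = strings.TrimPrefix(d, p)
	}
	return d
}

// Index is a hypothetical in-memory stand-in for the database lookups
// a real service would perform.
type Index struct {
	byChecksum map[string]string // checksum -> paper ID
	byDOI      map[string]string // normalized DOI -> paper ID
}

// NewIndex returns an empty Index.
func NewIndex() *Index {
	return &Index{byChecksum: map[string]string{}, byDOI: map[string]string{}}
}

// Add records a stored paper under its checksum and, if present, its DOI.
func (ix *Index) Add(id string, data []byte, doi string) {
	ix.byChecksum[Checksum(data)] = id
	if d := NormalizeDOI(doi); d != "" {
		ix.byDOI[d] = id
	}
}

// Find reports the ID of an already-stored paper that matches either the
// file checksum or the DOI of the candidate.
func (ix *Index) Find(data []byte, doi string) (string, bool) {
	if id, ok := ix.byChecksum[Checksum(data)]; ok {
		return id, true
	}
	if d := NormalizeDOI(doi); d != "" {
		if id, ok := ix.byDOI[d]; ok {
			return id, true
		}
	}
	return "", false
}

func main() {
	ix := NewIndex()
	ix.Add("paper-1", []byte("%PDF-1.7 example"), "10.1000/XYZ123")

	// Different bytes, same DOI written as a resolver URL: still a duplicate.
	id, dup := ix.Find([]byte("different bytes"), "https://doi.org/10.1000/xyz123")
	fmt.Println(id, dup) // prints "paper-1 true"
}
```

Checking the checksum first catches byte-identical re-submissions even when no DOI is known, while the DOI check catches the same paper downloaded from a different host or in a different rendering.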
Design Philosophy
Owl is intentionally single-user and self-hosted. There is no authentication layer and no ambition to scale beyond a personal instance. The Temporal-backed ingestion pipeline is more infrastructure than strictly necessary for the load — it’s part of the learning exercise.
Tech Stack
- Backend: Go, ConnectRPC, PostgreSQL
- Workflow orchestration: Temporal
- Messaging: Kafka (Redpanda in local development)
- Object storage: MinIO (S3-compatible)
- Metadata sources: arXiv Atom API, CrossRef REST API
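As a rough illustration of how a submitted reference might be routed to one of the two metadata sources, the sketch below classifies an input as an arXiv ID, a DOI, or a plain URL and builds the corresponding lookup URL. The `Classify` and `MetadataURL` functions, the `SourceKind` type, and the regular expressions are hypothetical and simplified; only the arXiv `export.arxiv.org/api/query` endpoint and the CrossRef `api.crossref.org/works/` endpoint are taken from those services' public APIs.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// SourceKind names the kind of reference a user submitted.
type SourceKind string

const (
	SourceArXiv SourceKind = "arxiv"
	SourceDOI   SourceKind = "doi"
	SourceURL   SourceKind = "url"
)

var (
	// New-style arXiv IDs such as 2101.00001 or 2101.00001v2.
	arxivID = regexp.MustCompile(`^\d{4}\.\d{4,5}(v\d+)?$`)
	// DOIs: "10.", a registrant code, a slash, then a suffix.
	doiRe = regexp.MustCompile(`^10\.\d{4,9}/\S+$`)
)

// Classify returns the kind of the input and its bare identifier
// (or the trimmed input itself for a plain URL).
func Classify(input string) (SourceKind, string) {
	s := strings.TrimSpace(input)

	id := strings.TrimPrefix(s, "https://arxiv.org/abs/")
	id = strings.TrimPrefix(id, "arXiv:")
	if arxivID.MatchString(id) {
		return SourceArXiv, id
	}

	doi := strings.TrimPrefix(s, "https://doi.org/")
	doi = strings.TrimPrefix(doi, "doi:")
	if doiRe.MatchString(doi) {
		return SourceDOI, doi
	}

	return SourceURL, s
}

// MetadataURL builds the lookup URL for the matching metadata source.
// It returns "" for plain URLs, which have no metadata API to query.
func MetadataURL(kind SourceKind, id string) string {
	switch kind {
	case SourceArXiv:
		q := url.Values{"id_list": {id}}
		return "http://export.arxiv.org/api/query?" + q.Encode()
	case SourceDOI:
		// The DOI contains a slash, so it is escaped as one path segment.
		return "https://api.crossref.org/works/" + url.PathEscape(id)
	default:
		return ""
	}
}

func main() {
	for _, in := range []string{
		"https://arxiv.org/abs/2101.00001v2",
		"doi:10.1000/xyz123",
		"https://example.org/paper.pdf",
	} {
		kind, id := Classify(in)
		fmt.Printf("%-6s %q -> %q\n", kind, id, MetadataURL(kind, id))
	}
}
```

The arXiv endpoint answers with an Atom feed and CrossRef with JSON, so a real resolver would need a separate parser for each response format.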
For a deeper look at how the pieces fit together, see ARCH.md.