Librarius — Jeffrey Epstein Released Files Investigation Platform

Platform Preview

See Librarius in action

A complete desktop application for investigating the Epstein files with AI-powered search, entity extraction, and multimodal analysis.

Documents Browser

Full-text viewer with entity extraction, filtering by labels/people/organizations, and inline entity highlighting.

RAG & Data Management

Knowledge base dashboard with statistics, multi-KB support, index status monitoring, and one-click indexing.

Entity Explorer

Browse People, Organizations, Locations, Dates with A-Z navigation, search, and document linking.

Image Gallery

Visual evidence browser with AI captions, semantic search, and "Find Similar" vector search.

Audio Transcription

Audio player with waveform visualization, playback controls, and speech-to-text transcription.

Model Management

Configure AI models with device selection (CPU/MPS/CUDA), variants, and benchmark tools.

Get Started

How It Works

Four simple steps to unlock millions of pages of evidence with AI-powered investigation.

1

Download

Download any of the 12 Epstein archive datasets directly from the DOJ, Internet Archive, or via torrent. SHA256 verification ensures file integrity. Over 250GB of documents, audio, and images available.

2

Index

Our pipeline processes everything automatically: OCR for scanned documents (PaddleOCR), audio transcription (Whisper), image captioning (vision models), and named entity extraction for people, organizations, dates, and locations.

3

Explore

Browse documents with intelligent filtering, view images with AI-generated captions, play audio with synced transcripts, and navigate the network of extracted entities across the entire archive.

4

Investigate

Ask sophisticated questions powered by SOTA BGE-M3 hybrid retrieval (dense + sparse + BM25) with cross-encoder reranking. Every AI answer is grounded in source documents with clickable citations.

Core Features

Everything you need to investigate the released files

A complete toolkit for investigating the publicly released Jeffrey Epstein court documents: OCR processing, hybrid search, entity extraction, multimodal processing, and evidence-grounded AI analysis.

🔍

Hybrid Search

Find evidence across court filings, depositions, and correspondence. BGE-M3 dense + sparse + BM25 with RRF fusion and cross-encoder reranking.

Verbatim Quotes Sources Rerank

👤

Entity Extraction

Automatically extract people, organizations, locations, and dates mentioned across the entire archive with spaCy NER.

857 People 1.3K Orgs 323 Locations

💬

RAG Chat

Ask questions about the archive. Get evidence-grounded answers with citations back to source documents. Multi-provider: Ollama, OpenAI, Claude.

Evidence-Based Citations Streaming

📄

Document OCR

Process scanned FBI documents with PaddleOCR + PyMuPDF fallback. Auto-classification: Depositions, Court Filings, Flight Logs, and more.

245 Docs 7.9K Chunks

📷

Image Analysis

Browse 216+ images with AI-generated captions. Semantic search across descriptions and "Find Similar" visual similarity search.

216 Images Captions Similar

🎧

Audio Transcription

Browse audio files with waveform visualization. Whisper-based speech-to-text transcription with playback sync.

Whisper Waveform Sync

⚙

Model Management

Configure embeddings, reranking, OCR, and transcription models. Device selection: CPU, MPS (Apple Silicon), CUDA.

MPS/CUDA Benchmark

🔒

Privacy-First

Everything runs locally. No cloud uploads, no tracking. Use local Ollama models for complete isolation.

Local Private Ollama

245

Documents

7.9K

Chunks

216

Images

857

People

1.3K

Organizations

323

Locations

610

Dates

Evidence Search

Find evidence across thousands of pages

Search depositions, court filings, flight logs, and passenger manifests with state-of-the-art hybrid retrieval. BGE-M3 captures semantic meaning while BM25 catches exact names and dates. Cross-encoder reranking surfaces the most relevant evidence.

Search across all document types in the archive
Find specific names, dates, and locations
Cross-encoder reranking for relevance
Verbatim quotes with document and page citations

Entity Extraction

Map every name in the files

Automatically extract named entities from all documents. Browse by entity type (People, Organizations, Locations, Dates), filter alphabetically, and see mention counts across the entire archive. Build a complete picture of who was involved, where, and when.

857 people extracted from documents
1,300+ organizations and agencies
323 locations: properties, cities, addresses
610 dates and time periods

Image Analysis

Search visual evidence with AI

Browse 216+ images from the archive with AI-generated captions. Search across image descriptions with natural language queries, or use "Find Similar" to discover visually related images through vector similarity search.

AI-generated captions for every image
Semantic search across image descriptions
Find Similar: visual similarity search
Filter by knowledge base source

Audio Transcription

Transcribe and search audio files

Browse audio files from the archive with waveform visualization and playback controls. Use Whisper-based speech-to-text to transcribe recordings and make them searchable alongside your documents.

Waveform visualization with seek support
Playback controls: play, pause, skip, speed
Whisper speech-to-text transcription
Transcription status tracking

Model Management

Configure your AI pipeline

Full control over the AI models powering your investigation. Select devices per model (CPU, MPS for Apple Silicon, CUDA for NVIDIA), choose model variants, benchmark performance, and manage downloads.

BGE-M3 embeddings with MPS/CUDA support
BGE-reranker-v2-m3 cross-encoder
PaddleOCR for document processing
Benchmark tools to compare performance

Technical Architecture

Built for serious document investigation

A production-ready platform for investigating the Jeffrey Epstein released files. Python/FastAPI backend, Flutter desktop UI, and enterprise-grade search technology.

FastAPI Backend

High-performance Python API with async support, automatic OpenAPI docs, and type-safe endpoints.

Flutter Desktop

Native desktop app for macOS, Windows, and Linux. Material 3 design with responsive layouts.

BGE-M3 Embeddings

State-of-the-art multilingual embeddings with dense and learned sparse representations.

SQLite + JSON

Lightweight storage with SQLite for entities and JSON for document cards. No external database required.

Local & Private

All processing on your machine. No cloud required. Your documents never leave your control.

Multi-Provider LLM

Connect to Ollama for local inference, or use OpenAI and Claude APIs for cloud-powered analysis.

Supported Models

Default model catalog

Model variants exposed in the app Model Management screen and `/models` API.

Model	Category	Default Variant	Available Variants	Runtime
`chat.codex`	Chat (Cloud)	`gpt-5.2-codex`	`gpt-5.2-codex`, `gpt-5-codex-mini`	API
`chat.claude`	Chat (Cloud)	`claude-sonnet-4-20250514`	`claude-sonnet-4-20250514`, `claude-opus-4-20250514`	API
`chat.ollama`	Chat (Local)	`llama3.2`	`llama3.2`, `qwen2.5:7b`	Local CPU/GPU
`ocr.vision`	OCR (Native)	`vision-ocr`	`vision-ocr`	Local ANE/CPU
`embed.hash`	Embedding (Native)	`hash-384`	`hash-384`	Local CPU

Start investigating the Epstein files

Analyze the publicly released Jeffrey Epstein court documents. Open source, local-first, privacy-respecting.

Download for macOS Learn More

Buy License (Polar) Buy License (LemonSqueezy)

7-day in-app trial, then buy on web or directly from the app Pro tab.

macOS · Windows · Linux · Python · Flutter

The codebase is cross-platform, but we currently provide macOS binaries only. License: Source code is licensed under Business Source License 1.1 (BSL-1.1), and binary distributions are licensed under the Librarius Binary Distribution License. See LICENSE, BINARY-LICENSE.txt, and the website License page.

⚠

Alpha Release

This is an early alpha version intended for testing and development. Features may be incomplete, unstable, or change significantly before the stable release. Please report any issues on GitHub.

✎

Unsigned Build

This macOS build is not signed by Apple. Remove the quarantine attribute in Terminal before first launch:

# If installed to /Applications (system-wide):
xattr -d com.apple.quarantine /Applications/Librarius.app

# If installed to ~/Applications (user-only):
xattr -d com.apple.quarantine ~/Applications/Librarius.app

Why? macOS quarantines downloaded apps. For unsigned apps, Gatekeeper may block execution until this flag is removed.

Official Sources

DOJ Epstein Dataset Downloads

Download the complete archive directly from the U.S. Department of Justice. Over 250 GB of court documents, depositions, and evidence files.

Dataset	Size	Source	Download
DataSet 1	2.1 GB	DOJ	Download ZIP
DataSet 2	44.1 GB	DOJ	Download ZIP
DataSet 3	23.3 GB	DOJ	Download ZIP
DataSet 4	14.3 GB	DOJ	Download ZIP
DataSet 5	32.1 GB	DOJ	Download ZIP
DataSet 6	38.2 GB	DOJ	Download ZIP
DataSet 7	45.3 GB	DOJ	Download ZIP
DataSet 8	45.6 GB	DOJ	Download ZIP
DataSet 9	~10 GB	Torrent Only	See GitHub
DataSet 10	~10 GB	Torrent Only	See GitHub
DataSet 11	~10 GB	Torrent Only	See GitHub
DataSet 12	5.3 GB	DOJ	Download ZIP

DataSets 9-11 were removed from DOJ servers and are only available via community torrents. See the GitHub repository for torrent links and verification hashes.

Investigate theEpstein files

See Librarius in action

Documents Browser

RAG & Data Management

Entity Explorer

Image Gallery

Audio Transcription

Model Management

How It Works

Download

Index

Explore

Investigate

Everything you need to investigate the released files

Hybrid Search

Entity Extraction

RAG Chat

Document OCR

Image Analysis

Audio Transcription

Model Management

Privacy-First

Find evidence across thousands of pages

Map every name in the files

Search visual evidence with AI

Transcribe and search audio files

Configure your AI pipeline

Built for serious document investigation

FastAPI Backend

Flutter Desktop

BGE-M3 Embeddings

SQLite + JSON

Local & Private

Multi-Provider LLM

Default model catalog

Start investigating the Epstein files

Alpha Release

Unsigned Build

DOJ Epstein Dataset Downloads

Investigate the
Epstein files