Unlock the complete DOJ Epstein archive with AI-powered investigation. Download directly from government sources, process with state-of-the-art OCR and NER, and investigate with hybrid semantic search and evidence-grounded AI analysis.
A complete desktop application for investigating the Epstein files with AI-powered search, entity extraction, and multimodal analysis.
Full-text viewer with entity extraction, filtering by labels/people/organizations, and inline entity highlighting.
Knowledge base dashboard with statistics, multi-KB support, index status monitoring, and one-click indexing.
Browse People, Organizations, Locations, and Dates with A-Z navigation, search, and document linking.
Visual evidence browser with AI captions, semantic search, and "Find Similar" vector search.
Audio player with waveform visualization, playback controls, and speech-to-text transcription.
Configure AI models with device selection (CPU/MPS/CUDA), variants, and benchmark tools.
Four simple steps to unlock millions of pages of evidence with AI-powered investigation.
Download any of the 12 Epstein archive datasets directly from the DOJ, the Internet Archive, or via torrent. SHA-256 verification ensures file integrity. Over 250 GB of documents, audio, and images available.
Our pipeline processes everything automatically: OCR for scanned documents (PaddleOCR), audio transcription (Whisper), image captioning (vision models), and named entity extraction for people, organizations, dates, and locations.
Browse documents with intelligent filtering, view images with AI-generated captions, play audio with synced transcripts, and navigate the network of extracted entities across the entire archive.
Ask sophisticated questions powered by state-of-the-art hybrid retrieval (BGE-M3 dense and sparse embeddings plus BM25) with cross-encoder reranking. Every AI answer is grounded in source documents with clickable citations.
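The hybrid retrieval in step four has to merge several ranked lists into one. The feature list names RRF (reciprocal rank fusion) for this, so here is a minimal sketch of that technique. The function name `rrf_fuse`, the document IDs, and the constant `k=60` (a commonly used default) are illustrative assumptions, not the application's actual code or settings.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Merge ranked lists of document IDs with reciprocal rank fusion.

    Each document scores 1 / (k + rank) for every list it appears in
    (rank starts at 1), and the scores are summed across lists.
    k=60 is a common default, assumed here for illustration.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for stability.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from three retrievers.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_a", "doc_d"]
bm25_hits = ["doc_b", "doc_c", "doc_a"]

fused = rrf_fuse([dense_hits, sparse_hits, bm25_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_c', 'doc_d']
```

Because RRF uses only rank positions, it needs no score normalization across retrievers. A cross-encoder reranker would then rescore the top of the fused list.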
A complete toolkit for investigating the publicly released Jeffrey Epstein court documents: OCR processing, hybrid search, entity extraction, multimodal processing, and evidence-grounded AI analysis.
Find evidence across court filings, depositions, and correspondence. Combines BGE-M3 dense and sparse retrieval with BM25, merged by reciprocal rank fusion (RRF) and refined by cross-encoder reranking.
Automatically extract people, organizations, locations, and dates mentioned across the entire archive with spaCy NER.
Ask questions about the archive. Get evidence-grounded answers with citations back to source documents. Multi-provider: Ollama, OpenAI, Claude.
Process scanned FBI documents with PaddleOCR, with PyMuPDF as a fallback. Auto-classification: Depositions, Court Filings, Flight Logs, and more.
Browse 216+ images with AI-generated captions. Semantic search across descriptions and "Find Similar" visual similarity search.
Browse audio files with waveform visualization. Whisper-based speech-to-text transcription with playback sync.
Configure embeddings, reranking, OCR, and transcription models. Device selection: CPU, MPS (Apple Silicon), CUDA.
Everything runs locally. No cloud uploads, no tracking. Use local Ollama models for complete isolation.
Search depositions, court filings, flight logs, and passenger manifests with state-of-the-art hybrid retrieval. BGE-M3 captures semantic meaning while BM25 catches exact names and dates. Cross-encoder reranking surfaces the most relevant evidence.
Automatically extract named entities from all documents. Browse by entity type (People, Organizations, Locations, Dates), filter alphabetically, and see mention counts across the entire archive. Build a complete picture of who was involved, where, and when.
Browse 216+ images from the archive with AI-generated captions. Search across image descriptions with natural language queries, or use "Find Similar" to discover visually related images through vector similarity search.
Browse audio files from the archive with waveform visualization and playback controls. Use Whisper-based speech-to-text to transcribe recordings and make them searchable alongside your documents.
Full control over the AI models powering your investigation. Select devices per model (CPU, MPS for Apple Silicon, CUDA for NVIDIA), choose model variants, benchmark performance, and manage downloads.
A production-ready platform for investigating the released Jeffrey Epstein files. Python/FastAPI backend, Flutter desktop UI, and enterprise-grade search technology.
High-performance Python API with async support, automatic OpenAPI docs, and type-safe endpoints.
Native desktop app for macOS, Windows, and Linux. Material 3 design with responsive layouts.
State-of-the-art multilingual embeddings with dense and learned sparse representations.
Lightweight storage with SQLite for entities and JSON for document cards. No external database required.
All processing on your machine. No cloud required. Your documents never leave your control.
Connect to Ollama for local inference, or use OpenAI and Claude APIs for cloud-powered analysis.
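The storage approach described above (SQLite for entities, JSON for document cards) can be sketched with the Python standard library alone. The table name `entity_mentions`, its columns, the card fields, and the sample values below are hypothetical, chosen only to illustrate the idea; the application's real schema may differ.

```python
import json
import sqlite3


def open_entity_store(db_path=":memory:"):
    """Open (or create) a SQLite store for entity mentions. Schema is hypothetical."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS entity_mentions (
               doc_id TEXT NOT NULL,
               label  TEXT NOT NULL,
               text   TEXT NOT NULL
           )"""
    )
    return conn


def add_mentions(conn, doc_id, mentions):
    """Insert (label, text) pairs extracted from one document."""
    conn.executemany(
        "INSERT INTO entity_mentions (doc_id, label, text) VALUES (?, ?, ?)",
        [(doc_id, label, text) for label, text in mentions],
    )
    conn.commit()


def mention_counts(conn, label):
    """Return (entity text, mention count) rows for one entity type, most frequent first."""
    cursor = conn.execute(
        "SELECT text, COUNT(*) FROM entity_mentions "
        "WHERE label = ? GROUP BY text ORDER BY COUNT(*) DESC, text",
        (label,),
    )
    return cursor.fetchall()


def document_card(doc_id, title, mentions):
    """Serialize a document card as JSON (field names are illustrative)."""
    return json.dumps(
        {"doc_id": doc_id, "title": title, "entities": [list(m) for m in mentions]},
        sort_keys=True,
    )


conn = open_entity_store()
add_mentions(conn, "doc-001", [("ORG", "Example Org"), ("GPE", "Example City")])
add_mentions(conn, "doc-002", [("ORG", "Example Org")])
print(mention_counts(conn, "ORG"))  # [('Example Org', 2)]
```

Keeping entities in a single SQLite file and cards as plain JSON means the whole knowledge base can be copied or backed up as ordinary files, with no database server to run.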
Analyze the publicly released Jeffrey Epstein court documents. Open source, local-first, privacy-respecting.
Download the complete archive directly from the U.S. Department of Justice. Over 250 GB of court documents, depositions, and evidence files.
Datasets 9-11 were removed from DOJ servers and are only available via community torrents. See the GitHub repository for torrent links and verification hashes.
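Checking a download against a published verification hash can be done with the Python standard library. This is a minimal sketch, not the application's own verifier: the helper names `sha256_of_file` and `verify_download` are hypothetical, and the expected hash is whatever value the repository publishes for the file in question.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks.

    Chunked reading keeps memory use flat even for multi-gigabyte archives.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Compare a file's digest with a published hash, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A mismatch means the file is incomplete or was altered in transit, and it should be downloaded again before processing.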