Architecture
Ebla uses a hybrid architecture that combines server-authoritative storage with peer-to-peer acceleration. This page explains the core components and how they work together.
System Overview
┌───────────────────────────────────────────────────┐
│                    EBLA SERVER                    │
│  ┌─────────────────────────────────────────────┐  │
│  │              HTTP/2 + WebSocket             │  │
│  │  REST API │ Sync │ P2P Coord │ Admin │ WS   │  │
│  ├─────────────────────────────────────────────┤  │
│  │                Core Services                │  │
│  │  Auth │ Sync Engine │ Block Store │ Teams   │  │
│  │  P2P Tracker │ Knowledge Layer │ GC/Backup  │  │
│  └─────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────┘
          │                        │
          ▼                        ▼
   ┌────────────┐        ┌────────────────┐
   │ PostgreSQL │        │ Object Storage │
   │ (metadata) │        │ (S3/GCS/disk)  │
   └────────────┘        └────────────────┘
                        │
                        │ HTTP/WS
       ┌────────────────┼────────────────┐
       │                │                │
       ▼                ▼                ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ EBLA CLIENT  │ │ EBLA CLIENT  │ │ EBLA CLIENT  │
│ ┌──────────┐ │ │ ┌──────────┐ │ │ ┌──────────┐ │
│ │ Daemon   │ │ │ │ Daemon   │ │ │ │ Daemon   │ │
│ │ Watcher  │ │ │ │ Watcher  │ │ │ │ Watcher  │ │
│ │ P2P      ◄─┼─┼─┤ P2P      ◄─┼─┼─┤ P2P      │ │
│ │ Cache    │ │ │ │ Cache    │ │ │ │ Cache    │ │
│ └──────────┘ │ │ └──────────┘ │ │ └──────────┘ │
└──────────────┘ └──────────────┘ └──────────────┘
       │                │                │
       └────────────────┴────────────────┘
            P2P (LAN / NAT traversal)
Design Principles
Server-Authoritative
The server is the single source of truth. All commits must be validated and stored by the server before they're considered durable. This eliminates split-brain scenarios and ensures data consistency.
P2P-Accelerated
While the server is authoritative, actual data transfer can happen directly between clients. LAN transfers are preferred, then NAT-traversed peers, then server fallback. This reduces bandwidth costs and improves speed.
Block-Level Deduplication
Files are split into content-addressed blocks (SHA-256). Identical blocks are stored once, even across different files or libraries. This saves storage and bandwidth.
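A minimal sketch of how content addressing deduplicates shared data. It uses a fixed-size split standing in for Ebla's content-defined chunking, and the `chunk_blocks` name and 100-byte demo block size are illustrative:

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4MB upper bound, per the block spec

def chunk_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Split a byte stream into blocks and address each by SHA-256.
    Identical blocks collapse onto the same dictionary key."""
    blocks = {}
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = "sha256:" + hashlib.sha256(block).hexdigest()
        blocks[digest] = block
    return blocks

# Two "files" sharing a tail: the shared block is stored only once.
file_a = b"A" * 100 + b"SHARED" * 10
file_b = b"B" * 100 + b"SHARED" * 10
store = {}
store.update(chunk_blocks(file_a, block_size=100))
store.update(chunk_blocks(file_b, block_size=100))
# store now holds 3 unique blocks, not 4
```

The same property holds across files, libraries, and users, because the block hash alone is the storage key.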
Immutable Commits
All changes are recorded as immutable commits, similar to Git. This enables version history, point-in-time recovery, and conflict detection.
Server Components
API Layer
The server exposes a RESTful API over HTTP/2 with the following route groups:
| Route Prefix | Purpose | Auth |
|---|---|---|
| /api/v1/auth | Authentication (login, register, device codes) | Public/JWT |
| /api/v1/libraries | Library CRUD and sync operations | JWT |
| /api/v1/blocks | Block upload, download, existence checks | JWT |
| /api/v1/teams | Team management and invitations | JWT |
| /api/v1/p2p | Peer registration and discovery | JWT |
| /api/v1/admin | Server administration (GC, backups) | Admin JWT |
| /api/v1/ws | WebSocket for real-time notifications | JWT |
| /app | Client web UI (file browser) | Session |
| /admin | Admin web UI (dashboard) | Admin Session |
Sync Engine
The sync engine handles all file synchronization logic:
- Commit Processing: Validates incoming commits, checks block availability, updates library HEAD
- Conflict Detection: Detects when multiple clients modify the same file
- Merge Resolution: Automatic three-way merge for text files, configurable strategies for others
- Incremental Commits: Supports staging commits that finalize when all blocks are durable
Block Storage
Blocks are the fundamental unit of storage in Ebla. Each block is:
- Content-addressed by SHA-256 hash
- Variable size (up to 4MB, with content-defined chunking)
- Compressed on write (LZ4 or Zstandard)
- Reference-counted for garbage collection
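Reference-counted garbage collection can be sketched as follows; the `BlockStore` class and its method names are illustrative, not Ebla's actual API:

```python
class BlockStore:
    """Toy content-addressed store with per-block reference counts."""

    def __init__(self):
        self.blocks = {}      # hash -> block data
        self.ref_counts = {}  # hash -> number of file references

    def put(self, digest: str, data: bytes):
        """Store the block if new; either way, record one more reference."""
        if digest not in self.blocks:
            self.blocks[digest] = data
            self.ref_counts[digest] = 0
        self.ref_counts[digest] += 1

    def release(self, digest: str):
        """A file stopped referencing this block (delete / modify)."""
        self.ref_counts[digest] -= 1

    def gc(self) -> int:
        """Sweep blocks whose reference count dropped to zero."""
        dead = [h for h, n in self.ref_counts.items() if n <= 0]
        for h in dead:
            del self.blocks[h]
            del self.ref_counts[h]
        return len(dead)

store = BlockStore()
store.put("blk_1", b"...")
store.put("blk_1", b"...")   # a second file reuses the block -> ref_count = 2
store.put("blk_2", b"...")
store.release("blk_2")       # last reference to blk_2 goes away
reclaimed = store.gc()       # only blk_2 is swept
```

In the real system the counts live in PostgreSQL and GC runs as an admin/background task, but the invariant is the same: a block is deletable only when no file in any library references it.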
Supported storage backends:
| Backend | Config Value | Use Case |
|---|---|---|
| Filesystem | filesystem | Development, single-server deployments |
| AWS S3 | s3 | Production, scalable cloud storage |
| MinIO | minio | Self-hosted S3-compatible |
| Google Cloud Storage | gcs | GCP deployments |
| Azure Blob Storage | azure | Azure deployments |
| Tiered Storage | tiered | Multi-tier (hot/warm/cold/archive) |
Knowledge Layer
The knowledge layer enables AI-powered search:
- Parsers: Extract text from PDF, Markdown, Office documents, code files
- Embeddings: Generate vector embeddings using OpenAI, Anthropic, or Ollama
- Vector Search: Semantic similarity search using pgvector
- RAG Pipeline: Retrieval-augmented generation for Q&A with citations
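The ranking step behind vector search reduces to cosine similarity over embedding vectors (pgvector does this server-side at scale). A toy sketch with hand-written 3-dimensional "embeddings" standing in for real model output, which has hundreds of dimensions:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical per-document embeddings (real ones come from the
# configured provider: OpenAI, Anthropic, or Ollama).
index = {
    "report.pdf":  [0.9, 0.1, 0.0],
    "notes.md":    [0.1, 0.9, 0.1],
    "budget.xlsx": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.0]  # embedding of the user's search phrase
ranked = sorted(index, key=lambda doc: cosine(query, index[doc]), reverse=True)
```

The RAG pipeline then feeds the top-ranked chunks to the model as context, which is what makes citations possible.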
WebSocket Hub
Real-time notifications for:
- Commit Notifications: Instant sync triggers when files change
- Library Subscriptions: Per-library change feeds
- Tiered Storage Events: Commit finalization and expiration
Client Components
Sync Daemon
The daemon runs continuously in the background:
┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│   Discover   │───▶│    Chunk     │───▶│   Persist    │───▶│    Check     │───▶│    Upload    │
│  (1 worker)  │    │ (2-4 workers)│    │  (1 writer)  │    │  (batched)   │    │(4-16 workers)│
└──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘
        │                   │                   │                   │                   │
        └───────────────────┴───────────────────┴───────────────────┴───────────────────┘
                              Bounded channels (backpressure)
Key features:
- Streaming Pipeline: Files flow through stages without sequential barriers
- Concurrent Workers: CPU-bound chunking (2-4), network-bound uploads (4-16)
- Micro-batch Existence Checks: Rolling batches of 2000 hashes or 200ms timeout
- Resumable Sessions: Upload sessions persist to SQLite for crash recovery
- Incremental Commits: Files appear on server as blocks upload
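The micro-batch behavior (flush at 2000 hashes or after 200ms, whichever comes first) can be sketched with a bounded queue between pipeline stages; the function and parameter names here are hypothetical, not the daemon's real internals:

```python
import queue
import threading
import time

BATCH_SIZE = 2000     # flush once this many hashes accumulate...
BATCH_TIMEOUT = 0.2   # ...or once 200ms elapse, whichever comes first

def batch_existence_checks(hashes: "queue.Queue", check_fn, stop: "threading.Event"):
    """Drain block hashes from the pipeline and invoke check_fn with
    rolling micro-batches, flushing on size or on timeout."""
    batch, deadline = [], time.monotonic() + BATCH_TIMEOUT
    while not (stop.is_set() and hashes.empty()):
        wait = max(0.0, deadline - time.monotonic())
        try:
            batch.append(hashes.get(timeout=wait or 0.01))
        except queue.Empty:
            pass  # timed out waiting; fall through to the flush check
        if batch and (len(batch) >= BATCH_SIZE or time.monotonic() >= deadline):
            check_fn(batch)  # e.g. one POST /blocks/check for the whole batch
            batch, deadline = [], time.monotonic() + BATCH_TIMEOUT
    if batch:
        check_fn(batch)  # final partial batch on shutdown
```

Batching this way keeps round-trips low for large syncs without letting a trickle of small files stall behind a size threshold.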
File Watcher
The watcher detects local file changes using OS-native APIs:
- Linux: inotify
- macOS: FSEvents
- Windows: ReadDirectoryChangesW
Features:
- Debounced events (coalesce rapid changes)
- Ignore pattern matching (.eblaignore)
- Recursive directory watching
- Handle renames and moves efficiently
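Debouncing can be sketched as suppressing any event that is followed by another event on the same path within the coalescing window. The real watcher works on a live OS event stream rather than a pre-collected list, so this is only the core idea:

```python
def debounce(events, window=0.1):
    """Coalesce bursts: an event survives only if no later event on the
    same path arrives within `window` seconds of it. `events` is a list
    of (timestamp, path, kind) tuples sorted by timestamp."""
    emitted = []
    for i, (ts, path, kind) in enumerate(events):
        follow_up = any(p == path and ts < t <= ts + window
                        for t, p, _ in events[i + 1:])
        if not follow_up:
            emitted.append((path, kind))
    return emitted

# An editor saving a.txt fires three rapid modify events; only the last
# one (plus the unrelated b.txt event) reaches the sync pipeline.
burst = [(0.00, "a.txt", "modify"),
         (0.02, "a.txt", "modify"),
         (0.05, "a.txt", "modify"),
         (0.30, "b.txt", "create")]
coalesced = debounce(burst)
```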
Hash Cache
Multi-level cache to avoid re-hashing unchanged files:
| Level | Check | When Used | Speed |
|---|---|---|---|
| 1 | Size + ModTime | File unchanged | Instant |
| 2 | Size + Inode | File touched but same content | Instant |
| 3 | Size + First Block Hash | Quick content validation | ~25x faster |
| 4 | Full File Hash | Content actually changed | Baseline |
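A sketch of consulting the levels cheapest-first. The exact predicates the daemon uses are not documented here, so the `cached` record fields and the `FIRST_BLOCK` size are assumptions:

```python
import hashlib
import os

FIRST_BLOCK = 4 * 1024 * 1024  # assumed first-block size for the level-3 check

def needs_rehash(path: str, cached: dict):
    """Walk the cache levels cheapest-first.
    Returns (level_used, content_changed)."""
    st = os.stat(path)
    # Level 1: size + mtime unchanged -> trust the cached hash.
    if (st.st_size, st.st_mtime_ns) == (cached["size"], cached["mtime_ns"]):
        return 1, False
    # Level 2: size + inode unchanged -> file was touched, same content.
    if (st.st_size, st.st_ino) == (cached["size"], cached["inode"]):
        return 2, False
    # Level 3: hash only the first block (~25x faster than the full file).
    with open(path, "rb") as f:
        first = hashlib.sha256(f.read(FIRST_BLOCK)).hexdigest()
    if st.st_size == cached["size"] and first == cached["first_block_hash"]:
        return 3, False
    # Level 4: content actually changed; a full re-hash is required.
    return 4, True
```

The point of the ordering is that levels 1 and 2 need only a `stat()` call, so the common case (nothing changed) costs no I/O on file contents at all.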
P2P Engine
The P2P engine handles peer discovery and block transfer:
- mDNS Discovery: Automatic LAN peer discovery
- Server Coordination: Cross-network peer discovery via server
- NAT Traversal: STUN/TURN/ICE for connections across NATs
- Block Transfer: Direct TCP/UDP transfer between peers
- Block Cache: Cache frequently-accessed blocks locally
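Source selection for a block download (LAN first, then NAT-traversed peers, then the server) reduces to ranking candidates; the peer record fields below are hypothetical:

```python
# Lower rank wins; this mirrors the stated preference order.
PREFERENCE = {"lan": 0, "nat": 1, "server": 2}

def pick_source(candidates: list) -> dict:
    """Prefer LAN peers, then NAT-traversed peers, then the server,
    breaking ties by measured latency."""
    return min(candidates,
               key=lambda c: (PREFERENCE[c["kind"]], c["latency_ms"]))

peers = [
    {"addr": "server.example", "kind": "server", "latency_ms": 40},
    {"addr": "192.168.1.7",    "kind": "lan",    "latency_ms": 2},
    {"addr": "203.0.113.5",    "kind": "nat",    "latency_ms": 35},
]
best = pick_source(peers)  # the LAN peer wins
```

The server always remains a valid fallback, so a transfer never fails just because no peer is reachable.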
Local Database
SQLite database for local state:
- Sync configurations (library → path mappings)
- Hash cache entries
- Upload session state
- Local commit journal
- Cached server commits
Data Model
Libraries
A library is the top-level container for files:
{
  "id": "lib_abc123",
  "name": "My Documents",
  "owner_id": "usr_xyz",
  "team_id": null,                 // null for personal libraries
  "head_commit_id": "commit_789",
  "created_at": "2026-01-01T00:00:00Z"
}
Commits
Every change is recorded as an immutable commit:
{
  "id": "commit_789",
  "library_id": "lib_abc123",
  "parent_ids": ["commit_788"],    // Can have multiple parents (merge)
  "author_id": "usr_xyz",
  "device_id": "dev_123",
  "message": "Added report.pdf",
  "files": [
    {
      "path": "reports/2026/Q1.pdf",
      "action": "add",             // add, modify, delete, rename
      "hash": "sha256:abc...",
      "size": 1048576,
      "blocks": ["blk_1", "blk_2", "blk_3"]
    }
  ],
  "created_at": "2026-01-15T10:30:00Z"
}
Blocks
Content-addressed storage units:
{
  "hash": "sha256:a1b2c3d4e5f6...",
  "size": 4194304,                 // 4MB max
  "compressed_size": 3145728,      // After compression
  "tier": "hot",                   // Storage tier
  "ref_count": 3,                  // Number of files using this block
  "created_at": "2026-01-15T10:30:00Z"
}
Sync Protocol
Push Flow (Client → Server)
- Scan: Client scans local folder, computes file hashes
- Diff: Compare with last known server state, identify changes
- Chunk: Split changed files into blocks
- Check: POST /blocks/check asks which blocks the server already has
- Upload: POST /blocks/upload uploads the missing blocks
- Commit: POST /libraries/:id/sync/commit creates the commit with the file list
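The push steps above can be exercised end-to-end against an in-memory stand-in for the server. `FakeServer` mimics the three endpoints named in the flow and is purely illustrative:

```python
import hashlib

class FakeServer:
    """In-memory stand-in for the check/upload/commit endpoints."""

    def __init__(self):
        self.blocks, self.commits = {}, []

    def check(self, hashes):               # stands in for POST /blocks/check
        return [h for h in hashes if h not in self.blocks]

    def upload(self, digest, data):        # stands in for POST /blocks/upload
        self.blocks[digest] = data

    def commit(self, files):               # stands in for POST .../sync/commit
        self.commits.append(files)

def push(server, path, data, block_size=4 * 1024 * 1024):
    """Chunk -> check -> upload missing -> commit. Returns blocks uploaded."""
    blocks = {}
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        blocks["sha256:" + hashlib.sha256(block).hexdigest()] = block
    missing = server.check(list(blocks))   # only transfer what the server lacks
    for digest in missing:
        server.upload(digest, blocks[digest])
    server.commit([{"path": path, "action": "add", "blocks": list(blocks)}])
    return len(missing)

srv = FakeServer()
push(srv, "a.txt", b"hello")
uploaded_again = push(srv, "copy.txt", b"hello")  # dedup: nothing re-uploaded
```

Because the commit still lists the block hashes, the second file syncs with zero block transfer, which is the bandwidth win deduplication delivers.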
Pull Flow (Server → Client)
- Check: GET /libraries/:id/sync gets changes since the last sync
- Fetch Commits: Download new commits and file lists
- Identify Blocks: Determine which blocks are needed
- Download: GET blocks from P2P peers or server
- Assemble: Reconstruct files from blocks
- Apply: Write files to local filesystem
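The download and assemble steps can be sketched as dictionary lookups with a P2P-first fallback; `fetch_block` and the manifest shape here are illustrative:

```python
def fetch_block(digest, peer_cache, server_store):
    """Try P2P peers / the local block cache first, fall back to the server."""
    if digest in peer_cache:
        return peer_cache[digest]
    return server_store[digest]

def assemble(manifest, peer_cache, server_store):
    """Reconstruct a file's bytes by concatenating its blocks in order."""
    return b"".join(fetch_block(d, peer_cache, server_store)
                    for d in manifest["blocks"])

manifest = {"path": "a.txt", "blocks": ["b1", "b2"]}
peer_cache = {"b1": b"hello "}                       # b1 came from a LAN peer
server_store = {"b1": b"hello ", "b2": b"world"}     # b2 falls back to server
data = assemble(manifest, peer_cache, server_store)
```

Block order in the commit's file entry is what makes assembly deterministic regardless of which source each block came from.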
Conflict Resolution
When multiple clients modify the same file:
- Server detects concurrent modifications (divergent commit history)
- Conflict is recorded in the commit
- Resolution strategy is applied:
  - LWW: Last writer wins (by timestamp)
  - Three-Way Merge: For text files, merge changes automatically
  - Ours/Theirs: Keep one version
  - Manual: User must resolve
- Merge commit is created with multiple parents
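The LWW strategy is the simplest to illustrate; the version records and the `mtime` field used as the timestamp here are hypothetical:

```python
def resolve_lww(ours: dict, theirs: dict) -> dict:
    """Last writer wins: pick the version with the newer timestamp;
    ties go to the local version."""
    return ours if ours["mtime"] >= theirs["mtime"] else theirs

ours   = {"hash": "sha256:aaa", "mtime": 1700000100}
theirs = {"hash": "sha256:bbb", "mtime": 1700000200}
winner = resolve_lww(ours, theirs)  # the remote version is newer
```

Whatever strategy is used, the outcome is recorded as a merge commit with both conflicting commits as parents, so the losing version remains recoverable from history.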
Security Model
Authentication
- JWT Tokens: Stateless authentication with configurable expiry
- Device Codes: OAuth-style device authorization flow for CLI
- Session Cookies: Secure cookies for web UI
- Password Hashing: bcrypt with cost factor 12
Authorization
- Library ACLs: Per-user and per-role permissions
- Team Roles: Owner, Admin, Member, Viewer
- Permission Levels: Admin, Write, Read, None
Data Protection
- TLS: All connections encrypted in transit
- At-Rest Encryption: Depends on storage backend (S3 SSE, GCS CMEK, etc.)
- Block Content: Not visible to P2P peers (only hashes)
Scalability
Library Scaling
For libraries with many files (100K-10M):
- Materialized Views: Pre-computed file indexes for fast browsing
- Background Workers: Refresh views incrementally on commit
- Automatic Threshold: Switch to materialized mode above 10K files
Storage Tiering
Optimize cost and performance with tiered storage:
- Hot Tier: Fast NVMe for active blocks
- Warm Tier: SSD or standard S3 for recent data
- Cold Tier: Infrequent access storage
- Archive Tier: Glacier or equivalent for long-term retention
Horizontal Scaling
For high-availability deployments:
- Stateless Servers: Run multiple server instances behind a load balancer
- PostgreSQL: Primary with read replicas, or managed PostgreSQL
- Object Storage: Inherently distributed (S3, GCS)
- WebSocket Affinity: Sticky sessions or Redis pub/sub for cross-instance notifications