Architecture
Ebla uses a hybrid architecture that combines server-authoritative storage with peer-to-peer acceleration. This page explains the core components and how they work together.
System Overview
┌───────────────────────────────────────────────────┐
│                    EBLA SERVER                    │
│  ┌─────────────────────────────────────────────┐  │
│  │              HTTP/2 + WebSocket             │  │
│  │  REST API │ Sync │ P2P Coord │ Admin │ WS   │  │
│  ├─────────────────────────────────────────────┤  │
│  │                Core Services                │  │
│  │  Auth │ Sync Engine │ Block Store │ Teams   │  │
│  │  P2P Tracker │ Knowledge Layer │ GC/Backup  │  │
│  └─────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────┘
          │                        │
          ▼                        ▼
   ┌────────────┐        ┌────────────────┐
   │ PostgreSQL │        │ Object Storage │
   │ (metadata) │        │ (S3/GCS/disk)  │
   └────────────┘        └────────────────┘
                        │
                        │ HTTP/WS
       ┌────────────────┼────────────────┐
       │                │                │
       ▼                ▼                ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ EBLA CLIENT  │ │ EBLA CLIENT  │ │ EBLA CLIENT  │
│ ┌──────────┐ │ │ ┌──────────┐ │ │ ┌──────────┐ │
│ │ Daemon   │ │ │ │ Daemon   │ │ │ │ Daemon   │ │
│ │ Watcher  │ │ │ │ Watcher  │ │ │ │ Watcher  │ │
│ │ P2P      ◄─┼─┼─┤ P2P      ◄─┼─┼─┤ P2P      │ │
│ │ Cache    │ │ │ │ Cache    │ │ │ │ Cache    │ │
│ └──────────┘ │ │ └──────────┘ │ │ └──────────┘ │
└──────────────┘ └──────────────┘ └──────────────┘
       │                │                │
       └────────────────┴────────────────┘
            P2P (LAN / NAT traversal)
Design Principles
Server-Authoritative
The server is the single source of truth. All commits must be validated and stored by the server before they're considered durable. This eliminates split-brain scenarios and ensures data consistency.
P2P-Accelerated
While the server is authoritative, actual data transfer can happen directly between clients. LAN transfers are preferred, then NAT-traversed peers, then server fallback. This reduces bandwidth costs and improves speed.
Block-Level Deduplication
Files are split into content-addressed blocks (SHA-256). Identical blocks are stored once, even across different files or libraries. This saves storage and bandwidth.
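A minimal sketch of how content addressing deduplicates shared data. It uses a fixed-size split standing in for Ebla's content-defined chunking, and the `chunk_blocks` name and 100-byte demo block size are illustrative:

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4MB upper bound, per the block spec

def chunk_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Split a byte stream into blocks and address each by SHA-256.
    Identical blocks collapse onto the same dictionary key."""
    blocks = {}
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = "sha256:" + hashlib.sha256(block).hexdigest()
        blocks[digest] = block
    return blocks

# Two "files" sharing a tail: the shared block is stored only once.
file_a = b"A" * 100 + b"SHARED" * 10
file_b = b"B" * 100 + b"SHARED" * 10
store = {}
store.update(chunk_blocks(file_a, block_size=100))
store.update(chunk_blocks(file_b, block_size=100))
# store now holds 3 unique blocks, not 4
```

The same property holds across files, libraries, and users, because the block hash alone is the storage key.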
Immutable Commits
All changes are recorded as immutable commits, similar to Git. This enables version history, point-in-time recovery, and conflict detection.
Server Components
API Layer
The server exposes a RESTful API over HTTP/2 with the following route groups:
| Route Prefix | Purpose | Auth |
|---|---|---|
| /api/v1/auth | Authentication (login, register, device codes) | Public/JWT |
| /api/v1/libraries | Library CRUD and sync operations | JWT |
| /api/v1/blocks | Block upload, download, existence checks | JWT |
| /api/v1/teams | Team management and invitations | JWT |
| /api/v1/p2p | Peer registration and discovery | JWT |
| /api/v1/admin | Server administration (GC, backups) | Admin JWT |
| /api/v1/ws | WebSocket for real-time notifications | JWT |
| /app | Client web UI (file browser) | Session |
| /admin | Admin web UI (dashboard) | Admin Session |
Sync Engine
The sync engine handles all file synchronization logic:
- Commit Processing: Validates incoming commits, checks block availability, updates library HEAD
- Conflict Detection: Detects when multiple clients modify the same file
- Merge Resolution: Automatic three-way merge for text files, configurable strategies for others
- Incremental Commits: Supports staging commits that finalize when all blocks are durable
Block Storage
Blocks are the fundamental unit of storage in Ebla. Each block is:
- Content-addressed by SHA-256 hash
- Variable size (up to 4MB, with content-defined chunking)
- Compressed on write (LZ4 or Zstandard)
- Reference-counted for garbage collection
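Reference-counted garbage collection can be sketched as follows; the `BlockStore` class and its method names are illustrative, not Ebla's actual API:

```python
class BlockStore:
    """Toy content-addressed store with per-block reference counts."""

    def __init__(self):
        self.blocks = {}      # hash -> block data
        self.ref_counts = {}  # hash -> number of file references

    def put(self, digest: str, data: bytes):
        """Store the block if new; either way, record one more reference."""
        if digest not in self.blocks:
            self.blocks[digest] = data
            self.ref_counts[digest] = 0
        self.ref_counts[digest] += 1

    def release(self, digest: str):
        """A file stopped referencing this block (delete / modify)."""
        self.ref_counts[digest] -= 1

    def gc(self) -> int:
        """Sweep blocks whose reference count dropped to zero."""
        dead = [h for h, n in self.ref_counts.items() if n <= 0]
        for h in dead:
            del self.blocks[h]
            del self.ref_counts[h]
        return len(dead)

store = BlockStore()
store.put("blk_1", b"...")
store.put("blk_1", b"...")   # a second file reuses the block -> ref_count = 2
store.put("blk_2", b"...")
store.release("blk_2")       # last reference to blk_2 goes away
reclaimed = store.gc()       # only blk_2 is swept
```

In the real system the counts live in PostgreSQL and GC runs as an admin/background task, but the invariant is the same: a block is deletable only when no file in any library references it.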
Supported storage backends:
| Backend | Config Value | Use Case |
|---|---|---|
| Filesystem | filesystem | Development, single-server deployments |
| AWS S3 | s3 | Production, scalable cloud storage |
| MinIO | minio | Self-hosted S3-compatible |
| Google Cloud Storage | gcs | GCP deployments |
| Azure Blob Storage | azure | Azure deployments |
| Tiered Storage | tiered | Multi-tier (hot/warm/cold/archive) |
Knowledge Layer
The knowledge layer enables AI-powered search:
- Parsers: Extract text from PDF, Markdown, Office documents, code files
- Embeddings: Generate vector embeddings using OpenAI, Anthropic, or Ollama
- Vector Search: Semantic similarity search using pgvector
- RAG Pipeline: Retrieval-augmented generation for Q&A with citations
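The ranking step behind vector search reduces to cosine similarity over embedding vectors (pgvector does this server-side at scale). A toy sketch with hand-written 3-dimensional "embeddings" standing in for real model output, which has hundreds of dimensions:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical per-document embeddings (real ones come from the
# configured provider: OpenAI, Anthropic, or Ollama).
index = {
    "report.pdf":  [0.9, 0.1, 0.0],
    "notes.md":    [0.1, 0.9, 0.1],
    "budget.xlsx": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.0]  # embedding of the user's search phrase
ranked = sorted(index, key=lambda doc: cosine(query, index[doc]), reverse=True)
```

The RAG pipeline then feeds the top-ranked chunks to the model as context, which is what makes citations possible.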
WebSocket Hub
Real-time notifications for:
- Commit Notifications: Instant sync triggers when files change
- Library Subscriptions: Per-library change feeds
- Tiered Storage Events: Commit finalization and expiration
Client Components
Sync Daemon
The daemon runs continuously in the background:
┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│   Discover   │───▶│    Chunk     │───▶│   Persist    │───▶│    Check     │───▶│    Upload    │
│  (1 worker)  │    │ (2-4 workers)│    │  (1 writer)  │    │  (batched)   │    │(4-16 workers)│
└──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘
        │                   │                   │                   │                   │
        └───────────────────┴───────────────────┴───────────────────┴───────────────────┘
                              Bounded channels (backpressure)
Key features:
- Streaming Pipeline: Files flow through stages without sequential barriers
- Concurrent Workers: CPU-bound chunking (2-4), network-bound uploads (4-16)
- Micro-batch Existence Checks: Rolling batches of 2000 hashes or 200ms timeout
- Resumable Sessions: Upload sessions persist to SQLite for crash recovery
- Incremental Commits: Files appear on server as blocks upload
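The micro-batch behavior (flush at 2000 hashes or after 200ms, whichever comes first) can be sketched with a bounded queue between pipeline stages; the function and parameter names here are hypothetical, not the daemon's real internals:

```python
import queue
import threading
import time

BATCH_SIZE = 2000     # flush once this many hashes accumulate...
BATCH_TIMEOUT = 0.2   # ...or once 200ms elapse, whichever comes first

def batch_existence_checks(hashes: "queue.Queue", check_fn, stop: "threading.Event"):
    """Drain block hashes from the pipeline and invoke check_fn with
    rolling micro-batches, flushing on size or on timeout."""
    batch, deadline = [], time.monotonic() + BATCH_TIMEOUT
    while not (stop.is_set() and hashes.empty()):
        wait = max(0.0, deadline - time.monotonic())
        try:
            batch.append(hashes.get(timeout=wait or 0.01))
        except queue.Empty:
            pass  # timed out waiting; fall through to the flush check
        if batch and (len(batch) >= BATCH_SIZE or time.monotonic() >= deadline):
            check_fn(batch)  # e.g. one POST /blocks/check for the whole batch
            batch, deadline = [], time.monotonic() + BATCH_TIMEOUT
    if batch:
        check_fn(batch)  # final partial batch on shutdown
```

Batching this way keeps round-trips low for large syncs without letting a trickle of small files stall behind a size threshold.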
File Watcher
The watcher detects local file changes using OS-native APIs:
- Linux: inotify
- macOS: FSEvents
- Windows: ReadDirectoryChangesW
Features:
- Debounced events (coalesce rapid changes)
- Ignore pattern matching (.eblaignore)
- Recursive directory watching
- Handle renames and moves efficiently
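Debouncing can be sketched as suppressing any event that is followed by another event on the same path within the coalescing window. The real watcher works on a live OS event stream rather than a pre-collected list, so this is only the core idea:

```python
def debounce(events, window=0.1):
    """Coalesce bursts: an event survives only if no later event on the
    same path arrives within `window` seconds of it. `events` is a list
    of (timestamp, path, kind) tuples sorted by timestamp."""
    emitted = []
    for i, (ts, path, kind) in enumerate(events):
        follow_up = any(p == path and ts < t <= ts + window
                        for t, p, _ in events[i + 1:])
        if not follow_up:
            emitted.append((path, kind))
    return emitted

# An editor saving a.txt fires three rapid modify events; only the last
# one (plus the unrelated b.txt event) reaches the sync pipeline.
burst = [(0.00, "a.txt", "modify"),
         (0.02, "a.txt", "modify"),
         (0.05, "a.txt", "modify"),
         (0.30, "b.txt", "create")]
coalesced = debounce(burst)
```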
Hash Cache
Multi-level cache to avoid re-hashing unchanged files:
| Level | Check | When Used | Speed |
|---|---|---|---|
| 1 | Size + ModTime | File unchanged | Instant |
| 2 | Size + Inode | File touched but same content | Instant |
| 3 | Size + First Block Hash | Quick content validation | ~25x faster |
| 4 | Full File Hash | Content actually changed | Baseline |
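A sketch of consulting the levels cheapest-first. The exact predicates the daemon uses are not documented here, so the `cached` record fields and the `FIRST_BLOCK` size are assumptions:

```python
import hashlib
import os

FIRST_BLOCK = 4 * 1024 * 1024  # assumed first-block size for the level-3 check

def needs_rehash(path: str, cached: dict):
    """Walk the cache levels cheapest-first.
    Returns (level_used, content_changed)."""
    st = os.stat(path)
    # Level 1: size + mtime unchanged -> trust the cached hash.
    if (st.st_size, st.st_mtime_ns) == (cached["size"], cached["mtime_ns"]):
        return 1, False
    # Level 2: size + inode unchanged -> file was touched, same content.
    if (st.st_size, st.st_ino) == (cached["size"], cached["inode"]):
        return 2, False
    # Level 3: hash only the first block (~25x faster than the full file).
    with open(path, "rb") as f:
        first = hashlib.sha256(f.read(FIRST_BLOCK)).hexdigest()
    if st.st_size == cached["size"] and first == cached["first_block_hash"]:
        return 3, False
    # Level 4: content actually changed; a full re-hash is required.
    return 4, True
```

The point of the ordering is that levels 1 and 2 need only a `stat()` call, so the common case (nothing changed) costs no I/O on file contents at all.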
P2P Engine
The P2P engine handles peer discovery and block transfer:
- mDNS Discovery: Automatic LAN peer discovery
- Server Coordination: Cross-network peer discovery via server
- NAT Traversal: STUN/TURN/ICE for connections across NATs
- Block Transfer: Direct TCP/UDP transfer between peers
- Block Cache: Cache frequently-accessed blocks locally
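Source selection for a block download (LAN first, then NAT-traversed peers, then the server) reduces to ranking candidates; the peer record fields below are hypothetical:

```python
# Lower rank wins; this mirrors the stated preference order.
PREFERENCE = {"lan": 0, "nat": 1, "server": 2}

def pick_source(candidates: list) -> dict:
    """Prefer LAN peers, then NAT-traversed peers, then the server,
    breaking ties by measured latency."""
    return min(candidates,
               key=lambda c: (PREFERENCE[c["kind"]], c["latency_ms"]))

peers = [
    {"addr": "server.example", "kind": "server", "latency_ms": 40},
    {"addr": "192.168.1.7",    "kind": "lan",    "latency_ms": 2},
    {"addr": "203.0.113.5",    "kind": "nat",    "latency_ms": 35},
]
best = pick_source(peers)  # the LAN peer wins
```

The server always remains a valid fallback, so a transfer never fails just because no peer is reachable.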
Local Database
SQLite database for local state:
- Sync configurations (library → path mappings)
- Hash cache entries
- Upload session state
- Local commit journal
- Cached server commits
Data Model
Libraries
A library is the top-level container for files:
{
  "id": "lib_abc123",
  "name": "My Documents",
  "owner_id": "usr_xyz",
  "team_id": null,                 // null for personal libraries
  "head_commit_id": "commit_789",
  "created_at": "2026-01-01T00:00:00Z"
}
Commits
Every change is recorded as an immutable commit:
{
  "id": "commit_789",
  "library_id": "lib_abc123",
  "parent_ids": ["commit_788"],    // Can have multiple parents (merge)
  "author_id": "usr_xyz",
  "device_id": "dev_123",
  "message": "Added report.pdf",
  "files": [
    {
      "path": "reports/2026/Q1.pdf",
      "action": "add",             // add, modify, delete, rename
      "hash": "sha256:abc...",
      "size": 1048576,
      "blocks": ["blk_1", "blk_2", "blk_3"]
    }
  ],
  "created_at": "2026-01-15T10:30:00Z"
}
Blocks
Content-addressed storage units:
{
  "hash": "sha256:a1b2c3d4e5f6...",
  "size": 4194304,                 // 4MB max
  "compressed_size": 3145728,      // After compression
  "tier": "hot",                   // Storage tier
  "ref_count": 3,                  // Number of files using this block
  "created_at": "2026-01-15T10:30:00Z"
}
Sync Protocol
Push Flow (Client → Server)
- Scan: Client scans local folder, computes file hashes
- Diff: Compare with last known server state, identify changes
- Chunk: Split changed files into blocks
- Check: POST /blocks/check asks which blocks the server already has
- Upload: POST /blocks/upload uploads the missing blocks
- Commit: POST /libraries/:id/sync/commit creates the commit with the file list
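The push steps above can be exercised end-to-end against an in-memory stand-in for the server. `FakeServer` mimics the three endpoints named in the flow and is purely illustrative:

```python
import hashlib

class FakeServer:
    """In-memory stand-in for the check/upload/commit endpoints."""

    def __init__(self):
        self.blocks, self.commits = {}, []

    def check(self, hashes):               # stands in for POST /blocks/check
        return [h for h in hashes if h not in self.blocks]

    def upload(self, digest, data):        # stands in for POST /blocks/upload
        self.blocks[digest] = data

    def commit(self, files):               # stands in for POST .../sync/commit
        self.commits.append(files)

def push(server, path, data, block_size=4 * 1024 * 1024):
    """Chunk -> check -> upload missing -> commit. Returns blocks uploaded."""
    blocks = {}
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        blocks["sha256:" + hashlib.sha256(block).hexdigest()] = block
    missing = server.check(list(blocks))   # only transfer what the server lacks
    for digest in missing:
        server.upload(digest, blocks[digest])
    server.commit([{"path": path, "action": "add", "blocks": list(blocks)}])
    return len(missing)

srv = FakeServer()
push(srv, "a.txt", b"hello")
uploaded_again = push(srv, "copy.txt", b"hello")  # dedup: nothing re-uploaded
```

Because the commit still lists the block hashes, the second file syncs with zero block transfer, which is the bandwidth win deduplication delivers.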
Pull Flow (Server → Client)
- Check: GET /libraries/:id/sync gets changes since the last sync
- Fetch Commits: Download new commits and file lists
- Identify Blocks: Determine which blocks are needed
- Download: GET blocks from P2P peers or server
- Assemble: Reconstruct files from blocks
- Apply: Write files to local filesystem
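The download and assemble steps can be sketched as dictionary lookups with a P2P-first fallback; `fetch_block` and the manifest shape here are illustrative:

```python
def fetch_block(digest, peer_cache, server_store):
    """Try P2P peers / the local block cache first, fall back to the server."""
    if digest in peer_cache:
        return peer_cache[digest]
    return server_store[digest]

def assemble(manifest, peer_cache, server_store):
    """Reconstruct a file's bytes by concatenating its blocks in order."""
    return b"".join(fetch_block(d, peer_cache, server_store)
                    for d in manifest["blocks"])

manifest = {"path": "a.txt", "blocks": ["b1", "b2"]}
peer_cache = {"b1": b"hello "}                       # b1 came from a LAN peer
server_store = {"b1": b"hello ", "b2": b"world"}     # b2 falls back to server
data = assemble(manifest, peer_cache, server_store)
```

Block order in the commit's file entry is what makes assembly deterministic regardless of which source each block came from.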
Conflict Resolution
When multiple clients modify the same file:
- Server detects concurrent modifications (divergent commit history)
- Conflict is recorded in the commit
- Resolution strategy is applied:
  - LWW: Last writer wins (by timestamp)
  - Three-Way Merge: For text files, merge changes automatically
  - Ours/Theirs: Keep one version
  - Manual: User must resolve
- Merge commit is created with multiple parents
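The LWW strategy is the simplest to illustrate; the version records and the `mtime` field used as the timestamp here are hypothetical:

```python
def resolve_lww(ours: dict, theirs: dict) -> dict:
    """Last writer wins: pick the version with the newer timestamp;
    ties go to the local version."""
    return ours if ours["mtime"] >= theirs["mtime"] else theirs

ours   = {"hash": "sha256:aaa", "mtime": 1700000100}
theirs = {"hash": "sha256:bbb", "mtime": 1700000200}
winner = resolve_lww(ours, theirs)  # the remote version is newer
```

Whatever strategy is used, the outcome is recorded as a merge commit with both conflicting commits as parents, so the losing version remains recoverable from history.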
Security Model
Authentication
- JWT Tokens: Stateless authentication with configurable expiry
- Device Codes: OAuth-style device authorization flow for CLI
- Session Cookies: Secure cookies for web UI
- Password Hashing: bcrypt with cost factor 12
Authorization
- Library ACLs: Per-user and per-role permissions
- Team Roles: Owner, Admin, Member, Viewer
- Permission Levels: Admin, Write, Read, None
Data Protection
- TLS: All connections encrypted in transit
- At-Rest Encryption: Depends on storage backend (S3 SSE, GCS CMEK, etc.)
- Block Content: Not visible to P2P peers (only hashes)
Scalability
Library Scaling
For libraries with many files (100K-10M):
- Materialized Views: Pre-computed file indexes for fast browsing
- Background Workers: Refresh views incrementally on commit
- Automatic Threshold: Switch to materialized mode above 10K files
Storage Tiering
Optimize cost and performance with tiered storage:
- Hot Tier: Fast NVMe for active blocks
- Warm Tier: SSD or standard S3 for recent data
- Cold Tier: Infrequent access storage
- Archive Tier: Glacier or equivalent for long-term retention
Horizontal Scaling
For high-availability deployments:
- Stateless Servers: Run multiple server instances behind a load balancer
- PostgreSQL: Primary with read replicas, or managed PostgreSQL
- Object Storage: Inherently distributed (S3, GCS)
- WebSocket Affinity: Sticky sessions or Redis pub/sub for cross-instance notifications