# Mini-PaaS Deployment System

## Overview

A pull-based deployment system that allows deploying Nix-built services without full NixOS rebuilds. Services are defined in a manifest, pulled from an S3 binary cache, and managed as systemd units with Caddy for reverse proxying.

## Problem Statement

Current deployment (`push.sh` + full NixOS rebuild) is slow and heavyweight:

- Every service change requires rebuilding the entire NixOS configuration
- Adding a new service requires modifying Biz.nix and doing a full rebuild
- Deploy time from "code ready" to "running in prod" is too long

## Goals

1. **Fast deploys**: Update a single service in <5 minutes without touching others
2. **Independent services**: Deploy services without a NixOS rebuild
3. **Add services dynamically**: New services via manifest, no NixOS changes needed
4. **Maintain NixOS for base OS**: Keep NixOS for infra (Postgres, SSH, firewall)
5. **Clear scale-up path**: Single host now, easy migration to Nomad later

## Key Design Decisions

1. **Nix closures, not Docker**: Deploy Nix store paths directly, not containers. Simpler, no Docker daemon needed. Use systemd hardening for isolation.
2. **Pull-based, not push-based**: The target host polls S3 for manifest changes every 5 minutes. No SSH needed for deploys; just update the manifest.
3. **Caddy, not nginx**: Caddy has an admin API for dynamic route updates and automatic HTTPS. No config file regeneration needed.
4. **Separation of concerns**:
   - `bild`: Build tool, adds `--cache` flag to sign and push closures
   - `push.sh`: Deploy orchestrator, handles both NixOS and service deploys
   - `deployer`: Runs on the target, polls the manifest, manages services
5. **Out-of-band secrets**: Secrets stored in `/var/lib/biz-secrets/*.env`; the manifest only references paths. No secrets in S3.
6. **Nix profiles for rollback**: Each service gets a Nix profile, enabling `nix-env --rollback`.
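The pull-based model (decision 2) reduces each poll to a small convergence step: compare the remote manifest against local state and act only on services whose store path changed. A minimal sketch of that diffing logic — the function name and the shape of `local_state` are illustrative, not the actual deployer API:

```python
def diff_services(manifest: dict, local_state: dict) -> list[dict]:
    """Return manifest services whose store path differs from local state.

    `manifest` follows the v1 schema in this document: a "services" list
    where each entry has a "name" and an "artifact.storePath".
    `local_state` maps service name -> last deployed store path.
    """
    changed = []
    for service in manifest.get("services", []):
        name = service["name"]
        store_path = service["artifact"]["storePath"]
        # A new service (not in local_state) also counts as changed.
        if local_state.get(name) != store_path:
            changed.append(service)
    return changed
```

On each tick the deployer would call this and, for an empty result, do nothing — the host is already converged, and no units are restarted.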
## Relevant Existing Files

- `Omni/Bild.hs` - Build tool, modify to add `--cache` flag
- `Omni/Bild.nix` - Nix build library, has `bild.run` for building packages
- `Omni/Ide/push.sh` - Current deploy script, enhance for service deploys
- `Biz.nix` - Current NixOS config for the biz host
- `Biz/Packages.nix` - Builds all Biz packages
- `Biz/PodcastItLater/Web.nix` - Example NixOS service module (to be replaced)
- `Biz/PodcastItLater/Web.py` - Example Python service (deploy target)
- `Omni/Os/Base.nix` - Base NixOS config, add S3 substituter here

## Architecture

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                                 DEV MACHINE                                 │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │ push.sh                                                             │    │
│  │                                                                     │    │
│  │ if target.nix:   (NixOS deploy - existing behavior)                 │    │
│  │   bild                                                              │    │
│  │   nix copy --to ssh://host                                          │    │
│  │   ssh host switch-to-configuration                                  │    │
│  │                                                                     │    │
│  │ else:            (Service deploy - new behavior)                    │    │
│  │   bild --cache  ──▶  sign + push closure to S3                      │    │
│  │   update manifest.json in S3 with new storePath                     │    │
│  │   (deployer on target will pick up changes)                         │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                             │
│  Separation of concerns:                                                    │
│  - bild: Build + sign + push to S3 cache (--cache flag)                     │
│  - push.sh: Orchestrates deploy, updates manifest, handles both modes       │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                    DO SPACES (S3 BINARY CACHE) - PRIVATE                    │
│                                                                             │
│  /nar/*.nar.xz    ← Compressed Nix store paths                              │
│  /*.narinfo       ← Metadata + signatures                                   │
│  /nix-cache-info  ← Cache metadata                                          │
│  /manifest.json   ← Current deployment state                                │
│  /manifests/      ← Historical manifests for rollback                       │
│      manifest-<timestamp>.json                                              │
│                                                                             │
│  Authentication: AWS credentials (Spaces access key)                        │
│  - Dev machine: write access for pushing                                    │
│  - Target host: read access for pulling                                     │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │  poll every 5 min
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              TARGET HOST (biz)                              │
│                                                                             │
│  ┌──────────────────────────────────────────────────────────────────────┐   │
│  │                             biz-deployer                             │   │
│  │         (Python systemd service, runs every 5 min via timer)         │   │
│  │                                                                      │   │
│  │ 1. Fetch manifest.json from S3                                       │   │
│  │ 2. Compare to local state                                            │   │
│  │ 3. For changed services:                                             │   │
│  │    - nix copy --from s3://...                                        │   │
│  │    - Generate systemd unit file                                      │   │
│  │    - Create GC root                                                  │   │
│  │    - systemctl daemon-reload && restart                              │   │
│  │ 4. Update Caddy routes via API                                       │   │
│  │ 5. Save local state                                                  │   │
│  └──────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│  Directories:                                                               │
│  - /var/lib/biz-deployer/services/*.service  (generated units)              │
│  - /var/lib/biz-deployer/state.json          (local state)                  │
│  - /var/lib/biz-secrets/*.env                (secret env files)             │
│  - /nix/var/nix/gcroots/biz/*                (GC roots)                     │
│                                                                             │
│  NixOS manages:                                                             │
│  - Base OS, SSH, firewall                                                   │
│  - Caddy with admin API enabled                                             │
│  - PostgreSQL, Redis (infra services)                                       │
│  - biz-deployer service itself                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Components

### 1. S3 Binary Cache (DO Spaces)

**Bucket**: `omni-nix-cache` (private)

**Region**: `nyc3` (or nearest)

**Credentials**:
- Dev machine: `~/.aws/credentials` with `[digitalocean]` profile
- Target host: `/root/.aws/credentials` with the same profile

**Signing key**:
- Generate: `nix-store --generate-binary-cache-key omni-cache cache-priv-key.pem cache-pub-key.pem`
- Private key: `~/.config/nix/cache-priv-key.pem` (dev machine only)
- Public key: Added to target's `nix.settings.trusted-public-keys`

**S3 URL format**:
```
s3://omni-nix-cache?profile=digitalocean&scheme=https&endpoint=nyc3.digitaloceanspaces.com
```

### 2. Manifest Schema (v1)
```json
{
  "version": 1,
  "generation": "2025-01-15T12:34:56Z",
  "services": [
    {
      "name": "podcastitlater-web",
      "artifact": {
        "type": "nix-closure",
        "storePath": "/nix/store/abc123-podcastitlater-web-1.2.3"
      },
      "hosts": ["biz"],
      "exec": {
        "command": "podcastitlater-web",
        "user": "pil-web",
        "group": "pil"
      },
      "env": {
        "PORT": "8000",
        "AREA": "Live",
        "DATA_DIR": "/var/podcastitlater",
        "BASE_URL": "https://podcastitlater.com"
      },
      "envFile": "/var/lib/biz-secrets/podcastitlater-web.env",
      "http": {
        "domain": "podcastitlater.com",
        "path": "/",
        "internalPort": 8000
      },
      "systemd": {
        "after": ["network-online.target", "postgresql.service"],
        "requires": [],
        "restart": "on-failure",
        "restartSec": 5
      },
      "hardening": {
        "dynamicUser": false,
        "privateTmp": true,
        "protectSystem": "strict",
        "protectHome": true
      },
      "revision": "abc123def"
    }
  ]
}
```

### 3. Deployer Service (Omni/Deploy/Deployer.py)

A Python service that:

- Polls the manifest from S3
- Pulls Nix closures
- Generates systemd units
- Updates Caddy via its API
- Manages GC roots
- Tracks local state

### 4. NixOS Module (Omni/Deploy/Deployer.nix)

Configures:

- The biz-deployer systemd service + timer
- Caddy with the admin API enabled
- S3 substituter configuration
- Required directories and permissions

### 5. Bild Integration (Omni/Bild.hs)

A new `--cache` flag that:

1. Builds the target
2. Signs the closure with the cache key (using the NIX_CACHE_KEY env var)
3. Pushes to the S3 cache
4. Outputs the store path for push.sh to use

Does NOT update the manifest - that is push.sh's responsibility.

### 6. Push.sh Enhancement (Omni/Ide/push.sh)

Detect the deploy mode from the target extension:

- `.nix` → NixOS deploy (existing behavior)
- `.py`, `.hs`, etc. → Service deploy (new behavior)

For service deploys:

1. Call `bild --cache`
2. Capture the store path from bild output
3. Fetch the current manifest.json from S3
4. Archive the current manifest to manifests/manifest-<timestamp>.json
5. Update the manifest with the new storePath for this service
6. Upload the new manifest.json to S3
7. Deployer on target picks up the change within 5 minutes
## Migration Path

### Phase 1: Infrastructure Setup

1. Create DO Spaces bucket
2. Generate signing keys
3. Configure S3 substituter on target
4. Deploy base deployer service (empty manifest)

### Phase 2: Migrate First Service

1. Choose a non-critical service (e.g., podcastitlater-worker)
2. Add to manifest with a different port
3. Verify via staging route
4. Flip Caddy to the new service
5. Disable the old NixOS-managed service

### Phase 3: Migrate Remaining Services

- Repeat Phase 2 for each service
- Order: worker → web → storybook

### Phase 4: Cleanup

- Remove service-specific NixOS modules
- Simplify Biz.nix to base OS only

## Rollback Strategy

1. Each deploy archives the current manifest to `/manifests/manifest-<timestamp>.json`
2. Rollback = copy the old manifest back to `manifest.json`
3. Deployer sees the new generation and converges to the old state
4. GC roots keep old closures alive (last 5 versions per service)

## Scale-up Path

| Stage | Hosts | Changes |
|-------|-------|---------|
| Current | 1 | Full architecture as described |
| 2-3 hosts | 2-3 | Add `hosts` filtering; each host runs a deployer |
| 4+ hosts | 4+ | Consider Nomad with nix-nomad for job definitions |

## Security Considerations

- S3 bucket is private (authenticated reads/writes)
- Signing key never leaves the dev machine
- Secrets stored out-of-band in `/var/lib/biz-secrets/`
- systemd hardening for service isolation
- Deployer validates the manifest schema before applying

## File Locations

```
Omni/
  Deploy/
    PLAN.md        # This document
    Deployer.py    # Main deployer service
    Deployer.nix   # NixOS module
    Manifest.py    # Manifest schema/validation
    Systemd.py     # Unit file generation
    Caddy.py       # Caddy API integration
    S3.py          # S3 operations (for deployer)
  Bild.hs          # Add --cache flag for sign+push
  Ide/
    push.sh        # Enhanced: NixOS deploy OR service deploy + manifest update
```
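The unit files the deployer writes under `/var/lib/biz-deployer/services/` can be derived mechanically from a manifest entry. A sketch of the idea behind `Systemd.py`, assuming the v1 schema above — the function name and the exact option mapping are illustrative, not the finished module:

```python
def render_unit(service: dict) -> str:
    """Render a systemd unit file for one v1-manifest service entry."""
    exec_ = service["exec"]
    sysd = service.get("systemd", {})
    hard = service.get("hardening", {})
    lines = [
        "[Unit]",
        f"Description={service['name']} (managed by biz-deployer)",
    ]
    for dep in sysd.get("after", []):
        lines.append(f"After={dep}")
    lines += [
        "",
        "[Service]",
        # Run the binary straight out of the deployed Nix closure.
        f"ExecStart={service['artifact']['storePath']}/bin/{exec_['command']}",
        f"User={exec_['user']}",
        f"Group={exec_['group']}",
        f"Restart={sysd.get('restart', 'on-failure')}",
        f"RestartSec={sysd.get('restartSec', 5)}",
    ]
    for key, value in service.get("env", {}).items():
        lines.append(f'Environment="{key}={value}"')
    if service.get("envFile"):
        # Secrets stay out-of-band; the unit only references the path.
        lines.append(f"EnvironmentFile={service['envFile']}")
    if hard.get("privateTmp"):
        lines.append("PrivateTmp=true")
    if hard.get("protectSystem"):
        lines.append(f"ProtectSystem={hard['protectSystem']}")
    if hard.get("protectHome"):
        lines.append("ProtectHome=true")
    lines += ["", "[Install]", "WantedBy=multi-user.target", ""]
    return "\n".join(lines)
```

Because the output is a pure function of the manifest entry, the deployer can diff the rendered text against the file on disk and skip `systemctl daemon-reload` when nothing changed.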