Infrastructure monitoring platform for managing Oracle Cloud VMs without SSH. Three components: Go agent on each VM, control plane API for polling and storage, Next.js dashboard for operations.
Needed to manage multiple VMs running MySpendo, Weather Insight, and other apps. SSH workflow was tedious: check status, read logs, edit .env, restart. Multiply that across two VMs with 3-4 apps each. Built this to do all of that from the browser.

Browser hits Next.js API routes that proxy to control plane. API key stays server-side. Never in browser bundle.
Control plane polls agents every 30 seconds over the private Oracle VCN. Agents listen on port 9000 on their private VCN address. Only reachable from the control plane's private IP. No public exposure.
Agents are static Go binaries. Zero runtime dependencies. Read systemd status, Docker containers, journald logs, CPU/memory from /proc, and parse .env files.
Control plane stores history in PostgreSQL. Fires webhook alerts on crashes. Proxies requests to agents. Tracks 30-day uptime per app.
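A minimal sketch of the .env parsing step the agent performs, assuming plain KEY=VALUE lines with # comments, an optional export prefix, and optional matching quotes. parseEnv is a hypothetical name, not the agent's actual function, and the real parser may handle more cases.

```go
package main

import (
	"bufio"
	"strings"
)

// parseEnv turns .env text into a key/value map.
// Blank lines, # comments, and lines without "=" are skipped.
func parseEnv(content string) map[string]string {
	vars := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// Tolerate shell-style "export KEY=VALUE".
		line = strings.TrimPrefix(line, "export ")
		key, val, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		key = strings.TrimSpace(key)
		val = strings.TrimSpace(val)
		// Strip one pair of matching single or double quotes.
		if len(val) >= 2 && (val[0] == '"' || val[0] == '\'') && val[len(val)-1] == val[0] {
			val = val[1 : len(val)-1]
		}
		vars[key] = val
	}
	return vars
}
```

Returning a map keeps the sketch short. An editor that must preserve comments and key order on write-back would need to keep the original lines as well.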
Problem
Log streaming uses Server-Sent Events fed by journalctl -f. SSE needs a long-lived connection to a real agent. Demo mode has no agents, so there is no journal to stream from.
Solution
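Return 204 No Content from the stream route in demo mode. EventSource treats any non-200 response as a failed connection: it fires an error event and does not reconnect. Log viewer detects the failure and falls back to HTTP polling (5s interval). Polling hits the /logs route, which returns demo data. No demo-specific client code needed.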
Impact
Live demo runs with no agents or VMs behind it. Log viewer renders the same whether fed by SSE or polling. Polling only adds up to 5s of delay.
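The two response shapes described above, sketched in Go. This is an illustration under assumptions: logsHandler, its demo flag, and the unit parameter are hypothetical, and in this project the demo guard may well sit in the Next.js proxy route rather than in Go code. The journalctl flags used (-f, -u, -o cat) are standard.

```go
package main

import (
	"bufio"
	"io"
	"net/http"
	"os/exec"
)

// formatSSE wraps one log line in the Server-Sent Events wire format.
func formatSSE(line string) string {
	return "data: " + line + "\n\n"
}

// streamLines forwards each line from src to w as an SSE frame,
// flushing after every frame so the browser sees it immediately.
func streamLines(w io.Writer, src io.Reader, flush func()) error {
	sc := bufio.NewScanner(src)
	for sc.Scan() {
		if _, err := io.WriteString(w, formatSSE(sc.Text())); err != nil {
			return err
		}
		flush()
	}
	return sc.Err()
}

// logsHandler is a hypothetical handler. In demo mode it answers 204,
// which makes the browser's EventSource fail and stop reconnecting.
func logsHandler(demo bool, unit string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if demo {
			w.WriteHeader(http.StatusNoContent)
			return
		}
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		// The request context kills journalctl when the client disconnects.
		cmd := exec.CommandContext(r.Context(), "journalctl", "-f", "-u", unit, "-o", "cat")
		out, err := cmd.StdoutPipe()
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		if err := cmd.Start(); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		defer cmd.Wait() // reap the process after all reads are done

		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")
		_ = streamLines(w, out, flusher.Flush)
	}
}
```

Keeping the frame formatting and the copy loop separate from the handler makes both testable without a running journal.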
Problem
Editing .env on running services is risky. Partial write or crash mid-update corrupts config. App breaks on next restart.
Solution
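Write sequence: backup original → write to .env.tmp → mv .env.tmp .env. The mv is atomic because .env.tmp sits in the same directory as .env, so it is a single rename syscall on one filesystem. Readers see either the old file or the new one, never a partial write. A crash before the rename leaves the original .env untouched.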
Impact
No partial-write window. Config changes are safe on running services. Backup always available for rollback if a change breaks the app.
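The write sequence above, as a minimal Go sketch. writeEnvAtomic, writeAndSync, and roundTripDemo are hypothetical names. The .bak suffix, the 0600 file mode, and the fsync before rename are assumptions added for illustration; the write-up specifies only backup, .env.tmp, and rename.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// writeEnvAtomic replaces the file at path via backup -> temp file -> rename.
func writeEnvAtomic(path string, content []byte) error {
	// Step 1: back up the original, if there is one.
	orig, err := os.ReadFile(path)
	switch {
	case err == nil:
		if err := os.WriteFile(path+".bak", orig, 0o600); err != nil {
			return fmt.Errorf("backup: %w", err)
		}
	case !errors.Is(err, os.ErrNotExist):
		return fmt.Errorf("read original: %w", err)
	}

	// Step 2: write the new content next to the target, so the rename
	// below never crosses a filesystem boundary.
	tmp := path + ".tmp"
	if err := writeAndSync(tmp, content); err != nil {
		os.Remove(tmp)
		return fmt.Errorf("write temp: %w", err)
	}

	// Step 3: atomically swap the temp file into place.
	if err := os.Rename(tmp, path); err != nil {
		os.Remove(tmp)
		return fmt.Errorf("rename: %w", err)
	}
	return nil
}

// writeAndSync writes data and fsyncs it before closing (assumed step).
func writeAndSync(name string, data []byte) error {
	f, err := os.OpenFile(name, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		return err
	}
	if err := f.Sync(); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// roundTripDemo runs one update in a throwaway directory and returns
// the resulting .env and .env.bak contents.
func roundTripDemo() (string, string, error) {
	dir, err := os.MkdirTemp("", "envdemo")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, ".env")
	if err := os.WriteFile(path, []byte("PORT=8080\n"), 0o600); err != nil {
		return "", "", err
	}
	if err := writeEnvAtomic(path, []byte("PORT=9090\n")); err != nil {
		return "", "", err
	}
	cur, err := os.ReadFile(path)
	if err != nil {
		return "", "", err
	}
	bak, err := os.ReadFile(path + ".bak")
	if err != nil {
		return "", "", err
	}
	return string(cur), string(bak), nil
}
```

If any step before the rename fails, the temp file is removed and the original .env is never touched.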
Problem
Poller fetched status from agent, called UpdateStatus (writes DB), then fetched the app to check for changes. By then the old status was already overwritten, so old and new could not be compared. That comparison is needed for alerts and uptime history.
Solution
Fetch app BEFORE UpdateStatus and capture oldStatus. After the update, compare oldStatus with newStatus. Fire webhook only on a transition. Write status_history row (close old, open new).
Impact
Alerts fire once per status change. Uptime history accurate. No duplicate notifications.
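The reordered poll step can be sketched as follows. Only UpdateStatus, oldStatus, and status_history come from the write-up; Store, memStore, pollOnce, and the notify callback are hypothetical, and the in-memory store stands in for PostgreSQL.

```go
package main

// Store is a hypothetical stand-in for the control plane's database layer.
type Store interface {
	GetStatus(appID string) (string, error)
	UpdateStatus(appID, status string) error
	RecordTransition(appID, from, to string) error
}

// pollOnce applies one status reading from an agent.
// notify (the webhook) runs only when the status actually changed.
func pollOnce(s Store, appID, newStatus string, notify func(appID, from, to string)) error {
	// Read the old status BEFORE writing, otherwise it is lost.
	oldStatus, err := s.GetStatus(appID)
	if err != nil {
		return err
	}
	if err := s.UpdateStatus(appID, newStatus); err != nil {
		return err
	}
	if oldStatus == newStatus {
		return nil // no transition: no history row, no alert
	}
	// Transition: close the old status_history row and open a new one.
	if err := s.RecordTransition(appID, oldStatus, newStatus); err != nil {
		return err
	}
	notify(appID, oldStatus, newStatus)
	return nil
}

// memStore is an in-memory Store used only for illustration.
type memStore struct {
	status  map[string]string
	history []string
}

func (m *memStore) GetStatus(appID string) (string, error) {
	return m.status[appID], nil
}

func (m *memStore) UpdateStatus(appID, status string) error {
	m.status[appID] = status
	return nil
}

func (m *memStore) RecordTransition(appID, from, to string) error {
	m.history = append(m.history, appID+": "+from+" -> "+to)
	return nil
}
```

Polling the same status twice produces one history entry and one notification, which is the "fires once per status change" behavior.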
Go Static Binaries: Zero runtime dependencies. Just scp and run. No Node, Python, or JVM. Learned atomic file operations (rename, temp files, backups).
SSE vs WebSocket: SSE is simpler for one-way streams. journalctl -f output maps directly onto an event stream, one log line per event. Always need a fallback (HTTP polling) for edge cases.
Next.js API Proxy Pattern: Keeps API keys server-side. Never in browser bundle. Makes demo mode easy to guard in one place.
Private Networking: Agents don't need public IPs. Oracle VCN handles private routing. Control plane uses 10.0.0.x addresses. Only control plane needs public endpoint with TLS.
JSONB Flexibility: App config stored as JSONB. Added new features (deploy_dir, auto_restart) without migrations. Just new JSON keys. Struct fields and JSON keys stay in sync by convention, not by schema.
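A small sketch of why new JSONB keys need no migration on the Go side, assuming the config is decoded with encoding/json. deploy_dir and auto_restart are the keys named above; AppConfig and decodeConfig are hypothetical names.

```go
package main

import "encoding/json"

// AppConfig mirrors the JSONB document. Keys missing from an older row
// simply leave the matching field at its zero value.
type AppConfig struct {
	DeployDir   string `json:"deploy_dir,omitempty"`
	AutoRestart bool   `json:"auto_restart,omitempty"`
}

// decodeConfig parses one JSONB value as read from the database.
// Unknown keys are ignored by encoding/json by default.
func decodeConfig(raw []byte) (AppConfig, error) {
	var cfg AppConfig
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}
```

The cost is that a mistyped key fails silently: it decodes to the zero value instead of raising an error.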
What I'd Do Differently: Integration tests for poller and agent handlers. Job queue like Asynq instead of goroutines. Prometheus metrics. Mobile app instead of mobile web.