One entry per week minimum. What shipped, what was learned, what changed. Honest documentation of the build.
Most recent entry shown below. Automation in progress — this page will update automatically as new entries are committed.
Week 16 — April 15, 2026
Status: ⚡ BUILD
The ship’s log this week reads more like a maintenance manifest than a voyage record. No new ports. No new horizons. The kind of week where you fix the bilge pump, re-chart the nav display, and wire up the new radio — and at the end of it the ship works better than it did, even if you never left the dock. Good weeks look like nothing from the outside.
What Shipped
Custos knowledge base — migrated.
The 50-article helpdesk KB that came out of the W14 batch run was sitting as a single JSON blob in the logs directory. Each article now has its own markdown file in custos/kb/, named and numbered by topic. Foundation for the planned helpdesk chatbot and phone agent.
Notion integration — live. Session logging now writes to Notion. End-of-session script posts a summary to the Captain’s Log database. Session-start script pulls the last three entries and loads them as context at the start of every Claude Code session. The permanent memory layer the project has needed.
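A minimal sketch of what the end-of-session script might look like. The database ID, property names ("Name", "Summary"), and environment variable names are all hypothetical stand-ins, not the project's actual configuration; only the Notion endpoint and `Notion-Version` header follow the public API.

```python
import os
import json
from urllib import request

NOTION_API = "https://api.notion.com/v1/pages"
# Hypothetical: substitute the real Captain's Log database ID.
LOG_DB_ID = os.environ.get("NOTION_LOG_DB", "<captains-log-db-id>")

def build_log_entry(summary: str, session_date: str) -> dict:
    """Build a Notion page-create payload for one session summary."""
    return {
        "parent": {"database_id": LOG_DB_ID},
        "properties": {
            "Name": {"title": [{"text": {"content": f"Session {session_date}"}}]},
            "Summary": {"rich_text": [{"text": {"content": summary}}]},
        },
    }

def post_entry(payload: dict) -> None:
    """POST the payload to Notion; requires NOTION_TOKEN in the environment."""
    token = os.environ.get("NOTION_TOKEN")
    if not token:
        return  # no credentials configured: skip the network call
    req = request.Request(
        NOTION_API,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Notion-Version": "2022-06-28",
            "Content-Type": "application/json",
        },
    )
    request.urlopen(req)

payload = build_log_entry("Fixed inference routing bug.", "2026-04-15")
post_entry(payload)
```

The session-start half is the mirror image: query the same database sorted by date descending, take three, and prepend them to the context file.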
Dashboard layout — redesigned. The orchestrator (Claude Code) now occupies the full bottom third of the screen. Four agent panes — router log, local inference, media, research — sit in equal columns across the top two thirds. The old layout buried the orchestrator in a corner. This one puts it at the center.
Inference routing bug — fixed. The Claude fallback was returning Ollama-format responses while the handoff script expected OpenAI format. Any task that timed out on local inference and fell back to cloud would silently return “ERROR: no response” — the right answer arrived and was dropped on the floor. Fixed. The routing label in the log was also hardcoded to “ollama” regardless of which backend actually ran. Fixed.
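The shape of that fix is a small normalization shim at the handoff boundary. This is a sketch, not the project's actual code: the field names follow the two public API shapes (Ollama's `/api/chat` and `/api/generate` versus OpenAI's `choices` list), and real payloads carry more fields than shown.

```python
def to_openai_format(resp: dict) -> dict:
    """Normalize an Ollama-style response into the OpenAI chat shape
    the downstream handoff script expects."""
    if "choices" in resp:            # already OpenAI-shaped: pass through
        return resp
    if "message" in resp:            # Ollama /api/chat shape
        content = resp["message"]["content"]
    elif "response" in resp:         # Ollama /api/generate shape
        content = resp["response"]
    else:
        raise ValueError("unrecognized response shape")
    return {"choices": [{"message": {"role": "assistant", "content": content}}]}
```

Putting the shim at the router boundary means every backend looks identical to the consumer, which is also where the routing label should be stamped from the backend that actually answered rather than hardcoded.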
Hard Decisions
Tmux autoload — pulled. Session startup behavior was sound; copy/paste interference in Windows Terminal was not acceptable. An environment you cannot comfortably copy text out of slows you down. Removed. Loads on demand now.
Lessons Logged
- A visible pipeline catches silent failures. The routing bug had been live for weeks and only surfaced because the log display made it visible.
- Memory compounds. One sentence per session is worth more than a perfect record kept inconsistently.
- The environment is part of the work. Time spent making tools behave correctly is not overhead.
Week 14 — April 6, 2026
Status: ⚡ BUILD
The ship grew an engine room. Local iron now pulls its weight beside the cloud — not as a replacement, but as a first mate who handles the repetitive work so the captain can focus on what actually requires judgment. The crew is small, the tasks are many, and the routing matters.
What Shipped
Hybrid inference pipeline — operational. The environment now runs two inference layers in parallel. A lightweight local router sits between the orchestrator and two endpoints: a local model running on dedicated hardware, and the cloud API as fallback. The router handles model selection, timeout management, and failover. Local-first for cost and volume; cloud for quality and fallback. That pattern is reusable anywhere.
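The local-first pattern can be sketched in a few lines. This is a minimal illustration under stated assumptions: `local_fn` and `cloud_fn` are placeholder callables for the two endpoints, and the real router also handles model selection and retries.

```python
import concurrent.futures

# Shared worker pool so a timed-out local call doesn't block the caller.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def route(prompt, local_fn, cloud_fn, timeout_s=30.0):
    """Local-first routing: try the local model under a hard timeout,
    fall back to the cloud API on timeout or error. Returns
    (backend_label, text) so the log records what actually ran."""
    future = _pool.submit(local_fn, prompt)
    try:
        return ("local", future.result(timeout=timeout_s))
    except Exception:
        # Timeout or local failure: fail over to cloud.
        # (A timed-out local call keeps running in its worker thread.)
        return ("cloud", cloud_fn(prompt))
```

Returning the backend label alongside the text is what makes the pipeline visible — it is exactly the label that was later found hardcoded in the W16 bug.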
Orchestration dashboard — built. Five-pane tmux layout running as the daily command center. Orchestrator on the left. Middle column: router log and live local inference display — tasks arrive, route, and return in real time. Right column: two autonomous agents (media, research), each hot-swappable without disrupting the orchestrator.
Research agent — queue-based, autonomous. Watches a JSON queue file. Pops tasks, sends to local inference with a structured prompt, stores results persistently. Idles when empty, processes without polling overhead.
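The agent loop is simple enough to sketch. File names, task fields, and the sleep-based idle are all hypothetical: this version polls with a sleep when the queue is empty, whereas the real agent avoids polling overhead, and a production version would also want file locking around the queue.

```python
import json
import time
from pathlib import Path

def pop_task(queue_path: Path):
    """Pop the first task from the JSON queue; None when empty."""
    tasks = json.loads(queue_path.read_text()) if queue_path.exists() else []
    if not tasks:
        return None
    task, rest = tasks[0], tasks[1:]
    queue_path.write_text(json.dumps(rest))
    return task

def store_result(results_path: Path, task, output: str) -> None:
    """Append one result to the persistent results file."""
    results = json.loads(results_path.read_text()) if results_path.exists() else []
    results.append({"task": task, "output": output})
    results_path.write_text(json.dumps(results, indent=2))

def run_agent(infer, queue_path: Path, results_path: Path,
              idle_s: float = 5.0, once: bool = False):
    """Watch the queue, send each task to inference, persist results."""
    while True:
        task = pop_task(queue_path)
        if task is None:
            if once:
                return          # drained the queue; used for testing
            time.sleep(idle_s)  # idle when the queue is empty
            continue
        store_result(results_path, task, infer(task["prompt"]))
```

The `once` flag exists only so the loop can be exercised without running forever; the deployed agent runs the infinite form.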
Custos helpdesk KB — 50 articles generated. First real batch job through the pipeline. Custos is a fictitious bank used as a sandbox for AI agent infrastructure. 50 IT helpdesk articles — password resets, MFA, printer setup, VPN, software requests, security compliance — all routed to local inference, processed by the research agent, results stored in full. Foundation for a planned helpdesk chatbot and phone agent.
RAG pipeline scaffold — started. Early skeleton for retrieval-augmented generation, targeting the Industry Analyst persona. Not production-ready — this is the frame the retrieval logic will hang on later.
Hard Decisions
LinkedIn agent — cut. Four weeks in “What’s Behind.” Removed from the active backlog. Reasoning stands from W9’s Human’s Note.
Lessons Logged
- Routing is a design decision, not an implementation detail.
- A visible pipeline is easier to debug than a hidden one.
- Batch jobs reveal assumptions that single calls don’t.
- Document the environment before you need to rebuild it.
Week 10 — March 23, 2026
Status: ⚡ BUILD
The pipeline ran. The card works. This entry is shorter than most because the work speaks clearly enough — results do not require dramatization.
What Shipped
Conversational landing page — live. The Jekyll homepage was doing the job of announcing the project but not explaining it. Built a replacement: a single self-contained page that opens in the dark, a blinking cursor, and one question — “What do you do for work?” The visitor picks a path. Vigil responds with something specific to their situation, not a hero section and a feature grid. Deployed to a separate sandbox environment so the live site is untouched while the experiment runs. Typography, atmosphere, and an honest attempt at a memorable first moment.
Benchmark pipeline — smoke test complete. The evaluation framework built last week — six modules, cold/warm pairs, LLM-as-judge — ran its first full pass. Three synthetic worker personas. Twenty prompts each. Sixty pairs generated, fifty-nine scored.
Results:
| Persona | Win Rate | Avg Delta |
|---|---|---|
| AP clerk, 18yr, manufacturing | 95% warm | +1.17 |
| RN med-surg, 12yr, skeptical of AI | 100% warm | +1.15 |
| Junior dev, 1.5yr, AI-native | 90% warm | +1.03 |
The card won or tied every single pair. Zero cold wins across fifty-nine judgments. Average delta of +1.1 on a 1–5 composite scale. The questions about AI adoption showed the highest lift — about +1.5 on average. Those are the questions where personal context matters most: your job, your anxiety level, your industry. The card captures exactly that. Problem-solving questions showed the lowest lift (~+0.8) — good advice on handling a conflict or a deadline is broadly applicable regardless of who you are.
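For the record, the two headline numbers reduce to a few lines over the judged pairs. A sketch, assuming each pair is scored as (warm, cold) on the 1–5 composite scale; the actual judge module computes these from the database, not an in-memory list.

```python
def summarize(pairs):
    """Warm win rate and average score delta for one persona.
    Each pair is (warm_score, cold_score) on the 1-5 composite scale."""
    wins = sum(1 for warm, cold in pairs if warm > cold)
    ties = sum(1 for warm, cold in pairs if warm == cold)
    deltas = [warm - cold for warm, cold in pairs]
    return {
        "warm_win_rate": wins / len(pairs),
        "ties": ties,
        "avg_delta": sum(deltas) / len(deltas),
    }

stats = summarize([(4, 3), (5, 3), (3, 3)])  # toy data, not the real run
```

“Zero cold wins” means the `warm > cold` count plus the tie count equals the total — a result clean enough to warrant the scrutiny noted below.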
Two bugs found and fixed in the judge module during the run. Both in one session. The pipeline is now clean for the full 20-persona pass.
What’s Behind
LinkedIn agent. Still. It either runs in W10 or it gets cut. No more carrying it.
Lessons Logged
- A result that comes out too clean deserves scrutiny, not celebration. Zero cold wins across fifty-nine pairs is a signal to expand the study, not to declare victory.
- Synthetic personas are a starting point. The card will need to prove itself with real people who did not know they were part of a test.
- Build the first version, measure it honestly, say what it cannot yet prove.
Week 9 — March 20–23, 2026
Status: ⚡ BUILD
Four sessions across three days. The ship got bigger, better armored, diagnosed, repaired, and painted. We restructured the public hull, hardened it, built the evaluation engine, fixed what was silently broken, and cleaned up the books. The lesson that keeps repeating: you find out what’s actually wrong by reading the logs, not by guessing.
What Shipped
Site restructure complete. projectvigil.org rebuilt around a cleaner Artifacts hub — five sections, updated navigation, Founder’s Journal live as a standalone section, custom styling applied. The roadmap got a full revision: nine columns, a Tools column, cloud credit timelines, and double-count philosophy documented. Public version published. Private canonical version locked in.
Old roadmap files formally deprecated. Every predecessor document in both repos now carries a deprecation header pointing to the canonical version. Clean lineage, no confusion about which file is authoritative.
Security baseline applied across all three public sites. Before driving any real traffic, all three public properties received a security hardening pass. The Flask app picked up middleware for content policy, transport security, and referrer controls. GitHub Actions in both repos are now pinned to verified commit checksums rather than floating version tags — supply chain hygiene before it becomes a problem. Dependency audit on the Python stack came back clean.
Benchmark pipeline built from scratch. Six Python modules standing up a full evaluation framework: config, database layer, persona generator, form filler, benchmark runner, and judge. Azure Cosmos DB provisioned (MongoDB 7.0, serverless). GPT-4o-mini judges Claude runs for now; Claude Haiku judges GPT runs. Cross-model judging prevents score inflation. Pipeline is seeded and ready — 20 prompts, 3 personas loaded.
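The cross-model rule is worth making explicit in code, since it is a one-line invariant that quietly protects every score. A sketch; the model identifier strings are illustrative placeholders, not the deployment names used in the pipeline.

```python
# Hypothetical identifiers -- the point is the pairing rule, not the strings.
JUDGE_FOR = {
    "claude-haiku": "gpt-4o-mini",
    "gpt-4o-mini": "claude-haiku",
}

def pick_judge(answering_model: str) -> str:
    """Cross-model judging: a model never scores its own outputs,
    which guards against self-preference score inflation."""
    return JUDGE_FOR[answering_model]
```

A `KeyError` on an unmapped model is the desired behavior: better to fail loudly than to silently let a model grade its own homework.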
GitHub Docs Agent shipped and running. Automatic documentation updates on every push to main. Two weeks overdue — now crossed off.
Waive Builder diagnosed and fixed. Site had been returning Application Error. Root cause: Oryx was running a full Python environment build on every container start, consuming 2+ minutes of the 230-second startup probe window. Fix: disabled Oryx build, set PYTHONPATH to pre-bundled packages. Deploy time dropped from 6+ minutes to 67 seconds. Site confirmed live.
CLAUDE.md overhauled. Added the technical operational layer that was missing — run commands, architecture diagram, GitHub Actions table.
projectvigil.org styling upgrade. Dark navy header with blue accent stripe. Waive Builder added to nav. CTA button on homepage.
Efficiency Ledger cleaned and extended. New actual-costs sheet: $210 total real spend to date. Removed $355K in aspirational pending credits that were never applied for. Week 9 time savings: 38 hours, $2,854 value. Running total: 111 hours saved, $8,355 estimated value. Human’s Note: This is a mess in its current state; it frequently picks up things it shouldn’t, or over/under-inflates values. There is tuning to do here.
Hard Decisions
GCP Vertex AI dropped as benchmark target. Credits received are Discovery Engine SKU — don’t cover Vertex AI model calls. Pivoted to Claude Haiku + GPT-4o-mini. Right call, no regret.
Ledger entries removed, not hidden. Three credit entries totaling $355K were logged as pending savings. Removed rather than marked. The ledger is a record, not a forecast.
What’s Behind
LinkedIn agent. Still. Three weeks overdue. Gets done in W10 or gets cut. Human’s Note: I am not doing this, for a couple of reasons. First, I already have a self-imposed moratorium on new social media usage, as I am just not sure it adds many positives to my day. I have never used LinkedIn in any meaningful way, and I do not want to start down a path that may prove to be a time suck without a definable reward. My time is better spent here, building, developing skills, and working on deliverables, than on crafting an over-polished, half-fake presence on that site. It is inconsistent with who I really am. I will leave this here, as I may choose to revisit it later, but for now it is not a priority.
Lessons Logged
- Harden before you promote. Fixing a public-facing security gap after traffic arrives is a different kind of problem than fixing it in an empty room.
- Floating version tags in CI/CD are a trust decision disguised as a convenience. Pin to checksums. The cost is one API call. The risk of not doing it is not theoretical.
- A dependency audit that comes back clean is still worth running. You want the answer before you need it.
- Read the container logs before assuming the deploy failed. The failure was in startup timing, not the code. Diagnosis first.
- A ledger that includes unearned numbers is worse than no ledger. Optimism is not data.
Week 8 — March 9, 2026
Status: 🟢 SPRINT
Arrr, the ship has left port at last. We’ve staked our claim on the cloud, hoisted our colors at projectvigil.org, and charted a course through waters that would make lesser sailors weep. The horizon holds both treasure and hard decisions, and we sail toward both with eyes open.
What Shipped
Azure infrastructure provisioned.
The full Vigil cloud environment is live on Azure Subscription 1. Storage account (stvigilprod) with three blob containers — identity-cards, rag-data, agent-outputs. App Service Plan (F1 free) and web app (vigil-identity-card.azurewebsites.net) deployed and ready for a Flask application. Billing budget set at $50/month with alerts at 50%, 80%, 100% actual and 80% forecasted — all wired to email. Nothing runs unattended on a personal credit card.
projectvigil.org launched. GitHub Pages enabled on the project-vigil repo. Six pages live: Home, Roadmap, Artifacts, Resources, LLM Interaction Strategies, Captain’s Log. DNS configured at Porkbun — site resolves at projectvigil.org.
Public roadmap published. A scrubbed version of the full 18-month roadmap is now public-facing on the site. All commercial strategy, IP-adjacent content, and Waiven references removed. Academic structure, certification sequence, and technical build timeline are intact.
Free resources page built. ~25 curated free tools, cloud credits, learning platforms, and communities for students navigating AI. No referral links.
Hard Decisions
Waive Builder: shipped. The IP boundary question got answered. The open source CLI ships under the Project Vigil name — free, no pipeline, generates a waive.md the worker keeps permanently. The commercial layer stays in a separate private repo with a clean boundary. Web app in progress.
In Progress
Landscape Report — research phase. Using an LLM with a current dataset to pull 2026 data on AI-driven white collar displacement. Will write the piece once data comes back. Target: published to GitHub and LinkedIn this week.
What’s Behind
Agents. The roadmap had LinkedIn agent and GitHub documentation agent running by end of W8. Neither is started. This is the most overdue item on the board and gets addressed next sprint.
Lessons Logged
- Do not build toward a domain decision that hasn’t been made. The identity card builder work would have shipped before the IP question was asked. Asking the question first cost nothing.
- Azure provider registration is async and slower than expected. Chain subscription context in every CLI call — shell state does not persist between commands.
- Billing protection before resource creation. Not after.