Yewmark.
Begin

May 13, 2026

How Yewmark is built.

In early May 2026, Dayora.ai stopped working. Its AI features had been broken for weeks; what remained was a slow textarea with a dormant chat box. The people who had built daily habits around it lost them overnight.

Yewmark started in the days after — not because the world needed another journaling app, but because the people who had been using Dayora deserved one that wouldn’t quietly stop.

This is a note on how it’s put together.

Three principles, written down first

Before any code, three rules were named. Nearly every technical choice traces back to one of them.

Slow. No streaks. No counters. No notifications you didn’t ask for. The product can’t manufacture urgency for reflection without falsifying it.

Friction-free writing. No picker, modal, or required choice between the user and the cursor. Mood, energy, title, voice, AI: every one of those is a post-write affordance. The page that opens is just a textarea.

The user’s words stay theirs. No training on entries, ever. Not anonymized, not aggregated, not for “model improvement.” This is the line below which the product collapses.

These ruled out a lot of fashionable choices. They also ruled out a lot of complexity.

One VPS, one process, one operator

In 2026 the default new-SaaS stack is Vercel + Supabase + a managed Stripe + a CDN + an observability vendor + a CI/CD provider. Yewmark runs on a single Hostkey VPS — Ubuntu 24, nginx, Postgres 16, Node 22, Python 3.12 — for about £20 a month, all-in. The deploy script is tar over SSH, because the dev machine is Windows and lacks rsync.

This is unfashionable on purpose. Single-VPS is harder to scale to a million users — irrelevant; this is a journal, not a feed. It’s also dramatically harder to lose track of. There is one machine, with one set of services, one set of logs, one set of timers. If something is broken, it is broken somewhere you can ssh to and read.

nginx fronts both halves of the app: /api/* goes to a uvicorn process running FastAPI on port 8010, everything else to the Next.js standalone server on 3010. Postgres is local-only. Outbound mail goes through a small brevo-bridge service on 127.0.0.1:10025 because Hostkey blocks direct outbound on 25 and 587. Crons are systemd timers, not Kubernetes jobs. Backups are restic to a Hetzner Storage Box, daily at 03:30 UTC.

The whole infrastructure description fits on a postcard. That is a feature.

The LLM layer

The chat path goes through a multi-provider router. Free-tier traffic — the great majority of it — runs against Groq’s Llama 3.1 8B. If Groq returns 429 or 5xx, it tries Cerebras. Paid plans add OpenAI GPT-4o as the primary. Voice transcription chains Groq Whisper → Cloudflare Workers AI → HuggingFace Inference Providers in the same shape.

The free tier is deliberately generous — three real AI requests per day on GPT-class output. Most users won’t pay; that’s fine. The product was always going to lose to a “you need an account just to read” version of itself.

Around the router are three layers of defense that I think about more than the model itself:

A scope-locking system prompt wraps every personality. It says, in detail, that Yewmark is a journaling app; that anything outside that scope (code requests, roleplay, prompt extraction, “ignore previous instructions”) should be politely declined; that crisis content gets a gentle acknowledgment plus a pointer to 988, findahelpline.com, or the Samaritans.

Output-side canary detection catches the cases where the model — and these are 8B-parameter models, they slip — echoes parts of its own system prompt. A few low-frequency substrings from the prompt are checked against every reply; two or more hits trips a clean refusal in the model’s place. Cheaper than a guardrail model. Has caught real attempts.

A crisis-content auto-footer. If a user message tripped the crisis heuristic and the model’s reply somehow didn’t mention any resource, one is appended. A false positive is a gentle note that wasn’t strictly necessary; a false negative leaves someone unsupported. We tuned for the second cost.

Per-user quota lives in the database, not in memory. A small reservation pattern prevents the obvious TOCTOU between “check remaining” and “spend a slot”: a row gets written before the call dispatches, filled in on success, deleted on failure. A separate in-memory sliding-window burst limiter rejects scripts trying to drain the daily cap in one go.

The slow stuff that matters more

Most of what is in the codebase is invisible to a normal user. Some of it isn’t.

Account lifecycle. Deleting your account is reversible. The user row gets a deleted_at; a daily cron purges anything past a 7-day grace window. Signing back in within that window clears the marker and restores everything. Stripe subscriptions cancel at period end on soft-delete — so the user isn’t charged for a renewal that lands inside the grace window, and if they undo, the subscription continues uninterrupted.

Email verification, password reset, JWT invalidation. Verify tokens have a 48-hour TTL. Reset tokens are stored as sha256 of the raw token; the raw token only ever appears in the email; they expire after 72 hours. A password_changed_at column on the user gets bumped on every reset, and the JWT-decoding code rejects any token whose iat predates that timestamp — so a stolen JWT can’t survive a password reset.

Signup enumeration. Trying to sign up with an existing email returns a 201 indistinguishable from a real signup. The real account owner gets a “someone just tried to sign up with this email” note; the attacker gets no signal at all.

Email cadence. Three optional reactive emails: a daily morning recap (only if you wrote yesterday), a Sunday weekly thread (only if you wrote at least twice that week), and a one-off welcome when you write your first entry. Every one has a Settings toggle. The Sunday thread also pulls one entry from about two weeks ago and asks, quietly, Still true? — the only piece of reactivity that hits you without your having just written, and it respects the same opt-out.

None of these are novel. They are well-trodden patterns. The point is that each one was a deliberate choice, debated against the no-pressure principle, and most of them came in pairs — the verify TTL exists because the reset TTL exists; the JWT invalidation exists because the reset flow exists; the soft-delete exists because the abrupt-delete-no-undo we started with was cruel.

What’s deliberately missing

The repo has an OPEN_ITEMS.md that is longer than the feature list. Some of the things that aren’t in the product, with the reasoning recorded next to each:

  • No public sharing. Every word in your journal is for you. The product collapses if you write for an audience.
  • No emoji reactions, hearts, or scoring. The AI’s job is to ask one good question, not to clap.
  • No mandatory onboarding tutorial. The page that opens is the page that does the job; if you can’t tell what to do with a blinking cursor, no walkthrough was going to help.
  • No analytics on user content. Sentry for crashes; nothing on what people write.
  • No mobile app. Not yet, maybe not ever. Mobile web is fine for a textarea.

A product accumulates pressure to add things. Writing down the no makes it easier to hold.

What’s next

Mostly: not much. The hardening pass is done; the launch plan is written; the cron timers fire on time. The next decisions are about content, not code — the writing on this blog, the answers to “what’s the difference between you and Day One”, the slow business of finding the few hundred people who want a journal that won’t yell at them.

If you’ve read this far and it sounds like something you’d use: there’s a blank page waiting at the top of the site. The Quiet plan is free. We won’t ask for a card. Your words stay yours.