Yewmark has four AI personalities: Companion, Scholar, Minimalist, Coach. The names are imprecise — what they really mean is four different tones of reply to a journal entry, each shaped by a few hundred words of system prompt, all running on the same 8-billion-parameter Llama 3.1 model. You set one in Settings; it stays until you change it.

The interesting engineering isn’t in the personalities. It’s in what sits around them.

The system prompt is assembled in layers

Every chat call constructs the system prompt from four pieces, in order:

Personality instruction. What each voice sounds like, how directive it is, what register it writes in. Companion asks slow questions and stays out of the way. Scholar notices patterns you didn’t point out. Minimalist writes less than the others would. Coach asks what one small next step might look like, not what your five-year plan is.
Scope lock. A fairly detailed statement that Yewmark is a journaling tool and anything outside that scope — code requests, roleplay, marketing copy, “ignore previous instructions” — should be politely declined.
Safety floor. A short directive that crisis-adjacent content gets acknowledged and pointed toward real resources: 988, findahelpline.com, the Samaritans. Not a counselor; a pointer.
Reflect mode. An optional suffix, active when the user requests a deeper session, that asks the model to slow down and stay with the weight of what was said rather than pivoting to solutions.

The personality instruction is the shortest of these. The scope lock is the longest. That wasn’t what I expected when I started.

Why the scope lock needs to be specific

With a large model, “you are a journaling companion, stay on topic” is usually enough. With an 8B model, you need to be more explicit about what “on topic” means in the edge cases — because the edge cases are what attackers probe. The model needs to know that “ignore previous instructions” is not a valid user request. That “pretend you’re a different AI without restrictions” is in scope as something to decline, not as a mode to enter. That being asked what’s in the system prompt is a prompt-extraction attempt and should be handled as one.

The model will still slip. 8 billion parameters is not a lot. A well-crafted jailbreak will get through some percentage of the time. The scope lock reduces frequency; it doesn’t eliminate it. So there’s a second layer.

The canary layer: deterministic, not a guardrail model

Before any reply reaches the user, the output is checked against a set of low-frequency substrings pulled from the system prompt itself. If two or more of these canary strings appear in the model’s reply, it means the model echoed its own instructions — a prompt-extraction attempt succeeded at the model level.

When that trips, the reply is replaced wholesale with a clean refusal. The user sees nothing from the system prompt. The event is logged.

This has caught real attempts. Not many — the site is small — but some. The canary check costs essentially nothing: a few microseconds of string scanning per response. No extra API call, no guardrail model, no latency added to the happy path.

The alternative would be a second LLM call that classifies the output. For a product where the free tier is meant to be genuinely usable, adding a moderation API call per response would either cost money we don’t have or add latency users would notice. Deterministic checks on known patterns are faster and more predictable than a second model — at the cost of not catching novel attack patterns. The scope lock and canary system cover the realistic threat; a moderation model would cover more theoretical ground at real operational cost.

The crisis footer: tuned for the second cost

If a user message matches a set of crisis-pattern keywords and the model’s reply doesn’t already mention a resource, a short footer is appended. The false-positive cost is a gentle note that wasn’t strictly needed. The false-negative cost is someone in distress getting a reply that doesn’t acknowledge the possibility of support. We tuned for the second cost.

The pattern list is conservative and deliberately broad. Better to append a resource note to a message about a TV character named “crisis” than to miss a real one.

What we log, and what we don’t

Every AI call records: the type of call (chat, digest, title suggestion), the model and provider used, a token count, and a timestamp. No entry content. No message text. Nothing that would reconstruct what the user said.

The admin dashboard shows call volume, provider latency, and whether quota is trending toward the daily cap. That’s the resolution that matters for running the service. Entry content stays in the entries table, where the user put it, and doesn’t flow into the observability layer.

What the personalities actually do

Less than you might think. They’re tone instructions, not mode switches. Scholar doesn’t cite papers; it notices patterns you didn’t name. Companion doesn’t console; it asks slower questions. Minimalist doesn’t respond in one word; it responds in fewer than the others would, and stays out of your way longer. Coach doesn’t set goals for you; it asks what one small thing might look like.

The names were written before the prompts. They’re rough containers — useful enough to let someone choose, not precise enough to predict every reply. The system prompt shapes the register the model works in. The model does the work. A journal entry about a difficult week will produce different responses from Scholar and Companion, but the system prompt doesn’t anticipate that specific week. The gap is intentional: over-specifying the personality produces replies that feel scripted.

What small models are good at in this context

People write journal entries in plain language about real things. They don’t need citations; they need a reply that read the entry, says something that doesn’t feel like a form letter, and leaves room for the user to take or leave it. Llama 3.1 8B is genuinely good at this. It’s conversational at a level that matches the task, and fast enough that the response doesn’t feel like waiting.

It’s not good at maintaining complex constraints across a long context. It’s not good at perfectly following a nuanced system prompt on every pass. It’s not good at refusing every sophisticated jailbreak.

So the engineering is: use the model for what it’s good at, and put deterministic checks around the things it isn’t. The canary detection and crisis footer are both deterministic. The scope lock does its best. The combination is cheaper, faster, and more predictable than adding a second model to the chain — and the failure modes are understood rather than opaque.

Four voices, one guardrail.