Prompting Well

What a prompt actually is

You already know that a large language model is, at heart, a next-token predictor: it reads the text so far and produces a probability over what comes next. Prompting is simply the craft of arranging that text so the most likely continuation is the answer you want. The model's billions of weights do not change when you prompt — they were fixed at the end of training. The only thing you control is the input, and the prompt is that input.

This reframes the whole activity. You are not *teaching* the model anything in the lasting sense — nothing it learns from your prompt survives the conversation. You are *steering* a system that already absorbed an enormous amount during instruction tuning and pretraining, pointing its existing abilities at your specific task. Good prompting is less like programming and more like giving precise directions to a brilliant, literal-minded colleague who has read almost everything but knows nothing about *your* situation.

In-context learning: learning without training

The reason prompting works at all is a striking property called in-context learning: a large pretrained model can pick up a new pattern from examples placed in the prompt, with no weight updates, no backpropagation, no gradient descent. Show it three labelled examples of a task and it will often continue the fourth correctly. This was not explicitly designed in; it surfaced once models grew large enough, and it is one of the few genuinely emergent abilities worth the name.

Be honest about what this is, though. In-context learning is not the model rewiring itself; it is the model *recognising* a pattern it can already represent and conditioning its next-token guesses on it. That distinction explains the limits: examples help most when the task is something the model has seen relatives of during training, and they cannot reliably install a brand-new skill, teach fresh facts, or override deep tendencies. When people speak of 'emergence' as if abilities appear by magic past some threshold, treat it with care — much of the sharpness is an artefact of how we *measure* a task, not a sudden switch flipping inside the network.

Zero-shot, few-shot, and choosing your examples

The simplest prompt is zero-shot: you just describe the task and ask. Thanks to instruction tuning, modern models handle a huge range of zero-shot requests well. When they wobble — inconsistent formatting, ambiguous categories, an unusual style — you reach for few-shot prompting: include a handful of input–output examples right in the prompt. This is few-shot learning done purely through context, and it is often the fastest, cheapest fix before you ever consider fine-tuning.

Classify the sentiment as positive / negative / neutral.

Review: "Battery dies in an hour."          -> negative
Review: "Setup took five minutes, love it."  -> positive
Review: "It is a phone. It makes calls."      -> neutral

Review: "Screen is gorgeous but it overheats." ->

A three-shot prompt. The examples teach the label set, the exact output format (one word after an arrow), and how to handle mixed signals — none of which the instruction alone pinned down.

Examples carry more than you might think. They quietly fix the output format, demonstrate edge cases, and set the tone — often more reliably than a paragraph of instructions. A few craft notes: cover the *variety* of inputs you expect rather than three near-identical easy ones; keep the format of every example byte-for-byte consistent, because the model imitates surface form eagerly; and watch ordering — for some tasks the label of the last example nudges the answer. More is not always better: two sharp examples usually beat ten sloppy ones, and every example spends part of your context window.

The system prompt: setting the stage

Most chat models split the input into roles, and the first one is the system prompt: a standing instruction that frames *who the model is being* and *what rules hold* for the whole conversation, before the user ever speaks. This is where persona, tone, output format, scope, and refusals belong — 'You are a careful tax assistant; cite the rule you used; if unsure, say so; never invent figures.' Because it sits at the top and the model was trained to weight it heavily, it shapes everything that follows.

But keep your expectations calibrated. The system prompt is a strong *preference*, not an unbreakable contract. A long or adversarial conversation can erode it, and a clever user input can sometimes talk the model past its rules. Treat it as the most reliable layer you have for steering behaviour, not as a security boundary — anything that truly must not happen needs guardrails *outside* the model, not just a stern sentence inside it.

Writing instructions a model can follow

Clarity beats cleverness. The model has no access to your intentions, only your words, so the single biggest lever in prompt engineering is removing ambiguity. Say exactly what you want, in what format, with what constraints, and what to do in the awkward cases. Vague asks ('summarise this') produce vague, unpredictable output; specific asks ('summarise this in three bullet points, each under fifteen words, for a non-technical reader') produce something you can actually rely on.

State the role and goal first, then the task — so the model reads its job before the details.
Prefer positive instructions ('respond in JSON with keys a, b') over negative ones ('do not ramble'); say what to do, not just what to avoid.
Separate instructions from data with clear delimiters (headings, triple quotes, XML-like tags) so the model never confuses one for the other.
Give it an out: tell it what to say when the answer is not in the provided material, so it declines instead of inventing.
Test on real, varied inputs and iterate; a prompt that nails one example often breaks on the second.

For tasks that need reasoning, a small structural change helps a lot: asking the model to work through its thinking before answering — chain-of-thought prompting — gives it room to compose intermediate steps, which markedly improves arithmetic, logic, and multi-step problems. The honest caveat is that the written-out reasoning is a *generated narrative*, not a faithful trace of internal computation; it can be plausible and still wrong, so check the answer, not just the story. We will go deeper on reasoning techniques in the next guide.

Limits, decoding, and good habits

No prompt overcomes the model's two hard ceilings. The first is the context window: everything — system prompt, examples, your data, the running conversation — must fit in a fixed budget of tokens, and what falls outside it simply does not exist for the model. The second is hallucination: a model will produce fluent, confident text even when it has no grounding, because fluency is what it was trained to optimise. Prompting can reduce both (be concise, ask it to cite sources, tell it to say 'I don't know'), but it cannot abolish them — that is what retrieval and grounding, covered later in this rung, are for.

One more knob sits beside the prompt: decoding. Settings like temperature control how randomly the model samples its next token. Low temperature makes output focused and repeatable — what you want for classification or extraction; higher temperature makes it more varied — useful for brainstorming or creative drafts. It is not part of the prompt text, but it shapes results just as much, and a 'flaky' prompt is sometimes really just a temperature set too high for the job.

Finally, treat prompts the way you treat code. Version them, test them on a fixed set of inputs, and change one thing at a time so you can tell what actually helped. Prompting is a real skill with real leverage — but it is the first tool to reach for, not the last. When a task genuinely needs fresh facts, reach for retrieval; when it needs a behaviour the base model just will not learn from examples, that is the signal to consider fine-tuning, which later guides in this rung weigh against prompting honestly.