The Prompt Engineer

Prompt engineering techniques, system prompt patterns, and LLM benchmarks — practical guides for developers who talk to machines for a living.

Why Your Prompt Works 80% of the Time

adaptive-prompting · instance-adaptive · prompt-optimization · zero-shot-cot · production-llm · prompt-routing

You spent three days on that system prompt. Ran it through eval suites, tuned the wording, squeezed out every last percentage point. Hit 87% accuracy on your test set. Shipped it. And then the support…

Model Routing Is the Prompt Trick Nobody Talks About

prompt-routing · model-selection · cost-optimization · llm-routing · production-llm · finerouter

Most prompt engineering advice assumes you've already picked a model. You tune the wording, adjust the temperature, add few-shot examples — all to coax better output from one fixed endpoint. But…

You Don't Have to Beg for JSON Anymore

structured-output · constrained-decoding · json-schema · production-llm · benchmarks · openai

I spent three months in 2024 building retry logic for a pipeline that extracted product data from GPT-4. The model returned valid JSON about 94% of the time — sounds fine until you do the math on 50,0…

Cache-Shaped Prompts

prompt-caching · prompt-structure · cost-optimization · agentic-systems · anthropic · openai

Someone analyzed 3,007 Claude Code sessions and found a ratio that broke my brain: for every fresh token sent to the API, 525 tokens were served from cache. The total? 12.2 billion cached tokens…

A Penny Per Jailbreak

prompt-fuzzing · jailbreak · llm-security · red-teaming · guardrails · ai-safety

It costs roughly one cent to jailbreak GPT-4o. Not with some hand-crafted prompt that took a red team weeks to develop — with an automated fuzzer that runs in about 60 seconds and succeeds 99% of the time…

Portable Prompts Are a Lie

prompt-portability · model-drifting · cross-model · prompt-optimization · promptbridge · multi-model

I spent two days last month migrating a production extraction pipeline from GPT-4o to Claude. The prompts were clean. They'd been through three rounds of eval tuning. Every edge case was handled…

Your AI Safety Judge Has a Markdown Problem

prompt-injection · ai-safety · guardrails · llm-security · red-teaming

Turns out the thing that breaks your AI safety filter isn't some elaborate multi-turn social engineering attack. It's a newline character. Maybe a markdown header. Perhaps a humble list marker…

Your Prompt Is Fine. Your Context Is Rotting.

context-window · context-rot · prompt-optimization · multi-turn · lost-in-the-middle · benchmarks

You've been debugging your prompt for an hour. You've tried different phrasings, added examples, restructured the whole thing. The model still gives garbage. Here's a thought: maybe the…

Stop Telling Your Model to Think Step by Step

reasoning-models · chain-of-thought · prompt-anti-patterns · openai · anthropic · context-engineering

The single most repeated piece of prompt engineering advice from 2023 is now actively degrading your outputs. "Think step by step." Wei et al.'s 2022 chain-of-thought paper showed it…