Guides

Practical tips for getting more out of LLMs while spending less.

Designing for Prompt Cache Hits: How to Save 90% on LLM Input Tokens

Prompt cache reads cost a tenth as much as regular input tokens. Learn how to structure your prompts to maximise cache hit rates and slash your LLM costs.

Last updated June 14, 2026

LLM Token Optimization Strategies: The Complete Guide for 2026

A comprehensive guide to LLM token optimization. Learn the strategies that actually reduce costs — from context engineering to model routing to prompt caching.

Last updated June 14, 2026

How to Measure and Monitor LLM Token Usage (Before You Can Optimise It)

You can't optimise what you can't measure. Learn how to track LLM token usage with built-in tools, cost APIs, and monitoring patterns that reveal where your tokens actually go.

Last updated June 14, 2026

5 Ways to Reduce Your LLM API Costs Today

Practical, immediately actionable strategies to cut your LLM token spend without sacrificing output quality.

Last updated June 14, 2026

How to Reduce OpenAI and Claude API Token Costs: A Developer's Guide

Practical techniques to reduce your OpenAI and Claude API costs. Covers pricing tiers, prompt caching, structured outputs, model routing, and the API features that save money.

Last updated June 14, 2026

Cut MCP and Tool Overhead to Save Thousands of LLM Tokens Per Request

Tool definitions and MCP servers can add 55K–134K tokens of overhead before any work starts. Learn how on-demand tool loading can cut that by 85%.

Last updated June 14, 2026

Token-Efficient Prompting Patterns: Chain of Draft, Output Formats, and Prompt Compression

Modern prompting techniques that dramatically reduce token usage. Chain of Draft can cut reasoning tokens by up to 92%, and output format choices can halve your token count. Here's how.

Last updated June 14, 2026

Claude Code: How to Get More Done With Fewer Tokens

Specific techniques for using Claude Code more efficiently — better prompts, smarter context management, and workflow tips.

Last updated June 14, 2026

Context Engineering: Why Reducing LLM Token Usage Isn't About Shorter Prompts

The biggest source of wasted LLM tokens isn't your prompt — it's your context. Learn how session management, just-in-time retrieval, and repo memory cut token usage dramatically.

Last updated June 14, 2026