Prompt Caching: Cut API Input Costs by up to 90%

What is prompt caching?

Prompt caching lets the provider cache the parts of a request that do not change between calls, such as the system prompt and shared context. The result: input-token costs for the cached portion drop by up to 90%.

How it works

First request: send the full system prompt → the provider caches it
Subsequent requests: send the same system prompt plus the new message → the unchanged part is read from the cache
Savings: about 90% on input tokens for the cached portion
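
The steps above can be turned into a rough cost model. This is a minimal sketch under simplifying assumptions: it counts only system-prompt tokens, treats cached tokens as 90% cheaper, and ignores output tokens, user messages, cache-write surcharges, and cache expiry. The function names and the `price_per_mtok` parameter are hypothetical, not part of any API.

```python
def input_cost(n_requests: int, system_tokens: int, price_per_mtok: float,
               cached_discount: float = 0.90) -> tuple[float, float]:
    """Return (cost_without_cache, cost_with_cache) for the system prompt only.

    price_per_mtok is an assumed price in USD per one million input tokens.
    """
    if n_requests < 1:
        raise ValueError("n_requests must be at least 1")
    per_request = system_tokens / 1_000_000 * price_per_mtok
    without_cache = n_requests * per_request
    # The first request pays full price; every later request pays the
    # discounted rate for the cached system prompt.
    with_cache = per_request + (n_requests - 1) * per_request * (1 - cached_discount)
    return without_cache, with_cache


def savings(n_requests: int, system_tokens: int, price_per_mtok: float) -> float:
    """Fraction of system-prompt input cost saved by caching."""
    without_cache, with_cache = input_cost(n_requests, system_tokens, price_per_mtok)
    return 1 - with_cache / without_cache


for n in (1, 10, 100, 1000):
    print(n, f"{savings(n, 5_000, 3.00):.1%}")
```

In this model the saving is 0.9 × (n − 1) / n, so it approaches 90% as the number of requests grows but never quite reaches it, because the first request is always billed in full.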

Python example

import openai

client = openai.OpenAI(
    base_url="https://api.izziapi.com/v1",
    api_key="izzi-xxx-your-key"
)

# Long system prompt (cached after the first request)
system_prompt = """You are a Python expert.
[... 2000 tokens system context ...]"""

# Request 1: the system prompt is sent in full and cached
response = client.chat.completions.create(
    model="claude-sonnet-4.5",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Fix this bug"}
    ]
)

# Request 2+: the same system prompt is already cached → its input tokens cost about 90% less
response = client.chat.completions.create(
    model="claude-sonnet-4.5",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Now add tests"}
    ]
)
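
Caching of this kind generally depends on the repeated part of the prompt being identical from one request to the next, so an accidental edit (an extra space, an interpolated timestamp) can silently turn cache hits into misses. A minimal sketch of a guard against that, using only the standard library; `prompt_fingerprint` and `build_messages` are hypothetical helpers, not part of the OpenAI SDK or of Izzi API:

```python
import hashlib


def prompt_fingerprint(prompt: str) -> str:
    """Short SHA-256 fingerprint of the exact prompt text."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]


def build_messages(system_prompt: str, user_text: str,
                   expected_fingerprint: str) -> list[dict]:
    """Build a messages list, refusing to proceed if the system prompt drifted."""
    actual = prompt_fingerprint(system_prompt)
    if actual != expected_fingerprint:
        raise ValueError(
            f"system prompt changed: {actual} != {expected_fingerprint}"
        )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]


# Record the fingerprint once, next to the prompt definition.
SYSTEM_PROMPT = "You are a Python expert."
FINGERPRINT = prompt_fingerprint(SYSTEM_PROMPT)

messages = build_messages(SYSTEM_PROMPT, "Fix this bug", FINGERPRINT)
```

The resulting `messages` list can be passed to `client.chat.completions.create(...)` exactly as in the example above; the check only adds a local safeguard and does not change what is sent.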

Cost comparison

Scenario	Without cache	With cache	Savings
10 requests, 2K system tokens	$0.066	$0.010	85%
100 requests, 5K system tokens	$1.65	$0.18	89%
1000 requests, 10K system tokens	$33.00	$3.60	89%
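
The Savings column can be recomputed from the two dollar columns as 1 − (with cache ÷ without cache). The short sketch below does that, and also derives the input price the "Without cache" column implies, assuming "2K" means 2,000 tokens and that the figures cover system-prompt tokens only. All variable names are illustrative.

```python
# (requests, system tokens per request, USD without cache, USD with cache),
# copied from the table above.
rows = [
    (10, 2_000, 0.066, 0.010),
    (100, 5_000, 1.65, 0.18),
    (1000, 10_000, 33.00, 3.60),
]


def savings_percent(without_cache: float, with_cache: float) -> int:
    """Savings as a whole-number percentage."""
    return round((1 - with_cache / without_cache) * 100)


def implied_price_per_mtok(requests: int, tokens: int, without_cache: float) -> float:
    """USD per one million input tokens implied by the uncached cost."""
    return round(without_cache / (requests * tokens) * 1_000_000, 2)


savings = [savings_percent(a, b) for _, _, a, b in rows]
prices = [implied_price_per_mtok(n, t, a) for n, t, a, _ in rows]

print(savings)  # [85, 89, 89]
print(prices)   # [3.3, 3.3, 3.3]
```

All three rows imply the same uncached rate, about $3.30 per million input tokens, and the recomputed percentages match the Savings column.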

Conclusion

Prompt caching is among the simplest ways to reduce AI API costs. Izzi API supports automatic caching: you only need to keep the system prompt identical across requests.