Up to 90% cost reduction with the new disk-based caching technology!
DeepSeek API Introduces Disk-Based Context Caching
Original: DeepSeek API introduces Context Caching on Disk, cutting prices by an order of magnitude | DeepSeek API Docs
Importance: 新機能が多くのユーザーに影響を与えるため。
Summary
The DeepSeek API introduces disk-based context caching, allowing for the detection of duplicate user inputs to reduce service latency and cut costs by up to 90%. Users can expect quicker responses even with long inputs. This feature is available to all users automatically, requiring no special code changes.
Key Points
- Introduction of disk-based context caching
- Detects duplicate inputs to avoid recomputation
- Potential cost reduction of up to 90%
- Quick responses even with long inputs
- Automatically available to all users, no code changes needed
View developer notes (APIs, breaking changes, migration)
The newly implemented disk-based context caching technology in the DeepSeek API detects duplicate user inputs and reuses cached parts to significantly reduce latency. Users can leverage cache hits for requests with identical prefixes, potentially saving up to 90% on costs. New fields in the API response allow monitoring of cache performance.
Source: https://api-docs.deepseek.com/news/news0802
Outlet: DeepSeek Release Notes
This article is an AI-generated summary (OpenAI GPT-4o-mini) of publicly available information from Anthropic, OpenAI, Google, Meta, Mistral, DeepSeek, Sakana, and other vendors. The original source URL is always provided in accordance with fair-use citation requirements. Summaries are AI-generated and may contain mistranslations or misinterpretations. Always verify details with the original source.