10:55
DeepSeek releases visual-primitive reasoning method to enhance complex multimodal reasoning capabilities.
April 30, 2026
CoinFeed News
CoinFeed reported on April 30 that DeepSeek has proposed a "Visual Primitives" method, which addresses the reference-gap problem in multimodal tasks by embedding basic visual units, such as points and bounding boxes, directly into the reasoning chain. The method is built on the DeepSeek-V4-Flash architecture and keeps image-token consumption low through compressed key-value caching. On counting and spatial-reasoning benchmarks, its performance is comparable to GPT-5.4, Claude-Sonnet-4.6, and Gemini-3-Flash (on certain dimensions only). The team said it will open-source some of the benchmarks and data, and that the model weights will be released after integration.
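To illustrate the core idea of interleaving visual primitives with text in a reasoning chain, here is a minimal, hypothetical sketch. The report does not describe DeepSeek's actual token format or API; the `Point`/`Box` types and the `<point …>`/`<box …>` serialization below are illustrative assumptions only.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical primitive types: the report only mentions "points and boxes";
# these class names and the token syntax are illustrative, not DeepSeek's.
@dataclass
class Point:
    x: float
    y: float
    def to_token(self) -> str:
        # Normalized image coordinates, an assumption for this sketch.
        return f"<point x={self.x:.2f} y={self.y:.2f}>"

@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float
    def to_token(self) -> str:
        return f"<box {self.x1:.2f},{self.y1:.2f},{self.x2:.2f},{self.y2:.2f}>"

Primitive = Union[Point, Box]

def build_reasoning_chain(steps: List[Union[str, Primitive]]) -> str:
    """Serialize a reasoning chain that interleaves text with visual primitives,
    so each claim is grounded in a concrete image location."""
    return " ".join(s if isinstance(s, str) else s.to_token() for s in steps)

# Example: a counting task where each counted object is grounded by a point.
chain = build_reasoning_chain([
    "I see three apples:",
    Point(0.21, 0.35), Point(0.48, 0.40), Point(0.73, 0.33),
    "therefore the answer is 3.",
])
print(chain)
```

Grounding each step in an explicit coordinate is what would close the reference gap in tasks like counting, where plain-text chains of thought can lose track of which object is which.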