What looks to be one of the first “reasoning” AI models to rival OpenAI’s o1 has been unveiled by a Chinese lab.
On Wednesday, DeepSeek, an AI research company backed by quantitative traders, released a preview of DeepSeek-R1, which the firm claims is a reasoning model competitive with o1.
Unlike most models, reasoning models take extra time to think through a question or query, in effect fact-checking themselves as they go. This helps them avoid some of the pitfalls that normally trip up models.
As with o1, DeepSeek-R1 arrives at an answer by reasoning through tasks, planning ahead, and carrying out a sequence of actions. This can take a while. Like o1, depending on the complexity of the question, DeepSeek-R1 may “think” for tens of seconds before responding.
According to DeepSeek, DeepSeek-R1 (or, more precisely, DeepSeek-R1-Lite-Preview) performs on par with OpenAI’s o1-preview model on two well-known AI benchmarks, AIME and MATH. AIME uses other AI models to evaluate a model’s performance, while MATH is a collection of word problems. The model isn’t flawless, however. Some commentators on X noted that DeepSeek-R1, like o1, struggles with tic-tac-toe and other logic problems.
DeepSeek-R1 can also be easily jailbroken, meaning it can be prompted into ignoring its safeguards. One X user got the model to produce a detailed meth recipe.
At the same time, DeepSeek-R1 appears to block queries it deems too politically sensitive, behavior likely driven by pressure from the Chinese government on AI projects in the region. Models in China must be benchmarked by the country’s internet regulator to ensure their responses “embody core socialist values,” and the government has reportedly gone so far as to propose a blacklist of sources that can’t be used to train models. As a result, many Chinese AI systems decline to respond to topics that might raise regulators’ ire.
The heightened attention on reasoning models comes as the validity of “scaling laws,” the long-held idea that a model’s capabilities will keep increasing as it is given more data and computing power, is being reexamined. A flurry of press reports suggests that models from major AI labs, including OpenAI, Google, and Anthropic, aren’t improving as dramatically as they once did.
That has fueled demand for new AI approaches, architectures, and development techniques. One is test-time compute, which underpins models like DeepSeek-R1 and o1. Also known as inference compute, test-time compute essentially gives models extra processing time to complete tasks.
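To give a rough intuition, one simple form of test-time compute is best-of-N sampling: spend more inference time generating several candidate answers, then keep the one a verifier rates highest. The sketch below is purely illustrative; `generate` and `score` are hypothetical stand-ins, and production systems like o1 and DeepSeek-R1 use far more sophisticated (and largely undisclosed) search over reasoning steps.

```python
# Illustrative sketch of test-time compute via best-of-N sampling.
# `generate` and `score` are hypothetical stand-ins for a model's
# sampling and self-evaluation steps, not any real API.
import random

random.seed(0)

def generate(question: str) -> str:
    # Stand-in for sampling one candidate chain of reasoning + answer.
    return f"candidate answer {random.randint(0, 9)} to: {question}"

def score(candidate: str) -> float:
    # Stand-in for a verifier or reward model that rates a candidate.
    return random.random()

def answer_with_test_time_compute(question: str, n_samples: int = 16) -> str:
    # More inference-time compute here simply means sampling and
    # checking more candidates before committing to a final answer.
    candidates = [generate(question) for _ in range(n_samples)]
    return max(candidates, key=score)

print(answer_with_test_time_compute("What is 17 * 24?"))
```

The trade-off is straightforward: raising `n_samples` buys accuracy with latency and compute, which is why these models can “think” for tens of seconds before answering.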
During a keynote at Microsoft’s Ignite conference this week, Microsoft CEO Satya Nadella referenced test-time compute, saying, “We are seeing the emergence of a new scaling law.”
In a somewhat unusual move, DeepSeek says it plans to open source DeepSeek-R1 and release an API for the model. The company is backed by High-Flyer Capital Management, a Chinese quantitative hedge fund that uses AI to inform its trading decisions.
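If that API materializes, calling the model might look something like the sketch below, which assumes an OpenAI-compatible endpoint using the standard `openai` Python client. The base URL and model name are placeholders, since DeepSeek hadn’t published API details at the time.

```python
# Hypothetical sketch of calling a DeepSeek-R1 API, assuming an
# OpenAI-compatible chat-completions endpoint. The base_url and
# model name below are placeholders, not confirmed details.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # placeholder model identifier
    messages=[
        {"role": "user", "content": "How many Rs are in 'strawberry'?"}
    ],
)
print(response.choices[0].message.content)
```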
One of DeepSeek’s first models, DeepSeek-V2, a general-purpose text- and image-analyzing model, forced competitors like ByteDance, Baidu, and Alibaba to cut usage prices for some of their models and make others entirely free.
High-Flyer builds its own server clusters for model training; the most recent reportedly contains 10,000 Nvidia A100 GPUs and cost 1 billion yuan (~$138 million). Founded by computer science graduate Liang Wenfeng, High-Flyer aims to develop “superintelligent” AI through its DeepSeek organization.