More Chinese enterprises are building lightweight large language models after US-based technology company OpenAI released its text-to-video model, Sora, last month, raising the stakes in the global AI race.
A lightweight model, sometimes called a smaller large model, is essentially one with fewer parameters, which means its capacity to process and generate text is more limited than that of a large model.
To put it simply, compact models are like compact cars, while large models are like luxury sport utility vehicles.
The Chinese AI startup ModelBest Inc. garnered significant attention in the industry in February when it unveiled its most recent lightweight large model.
Called MiniCPM-2B, the model has 2 billion parameters, far fewer than the 1.7 trillion parameters that OpenAI’s massive GPT-4.0 reportedly has.
By comparison, Phi-2, a small language model released in December by US tech giant Microsoft, has 2.7 billion parameters and is capable of language understanding and common-sense reasoning.
According to ModelBest CEO Li Dahai, the new model performs close to Mistral-7B, from French AI company Mistral, on publicly available general benchmarks, and does better in Chinese, mathematics, and coding. Li said its overall performance exceeds that of some peer large models with around 10 billion parameters.
“Both large and smaller large models have their advantages, depending on the specific requirements of a task and their constraints, but Chinese companies may find a way forward by leveraging small models amid the AI boom,” Li added.
In an earlier interview, Zhou Hongyi, the founder and chairman of 360 Security Technology and a participant in the current two sessions of the 14th National Committee of the Chinese People’s Political Consultative Conference, said that developing a universal large model that outperforms GPT-4.0 may be difficult for now.
According to him, GPT-4.0 currently “knows everything, but it is not specialized.”
“If we can excel in a particular business domain by training a model with unique business data and integrating it with many business tools within that sector, such a model will not only have intelligence, but also possess unique knowledge, even hands and feet,” he explained.
According to Li, there will be significant commercial value for such a lightweight model if it can be implemented in various industries.
“If the model is compressed, it will require fewer calculations to operate, which also means less powerful processors and less time to complete responses,” Li explained.
“With the popularity of such end-side models, the inference cost of more electronic devices, such as mobile phones, will further decrease in the future,” he stated.
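The parameter counts quoted earlier in the article give a rough sense of why smaller models are cheaper to run. The back-of-envelope sketch below is illustrative only and does not come from ModelBest. It assumes that the weights dominate memory use, that a response costs roughly 2 floating-point operations per parameter for each generated token, and that "compressed" means storing each weight in 4 bits instead of 16. The function names are hypothetical, and the GPT-4.0 figure is the reported number, not a confirmed one.

```python
# Back-of-envelope sketch using the parameter counts quoted in the article.
# Assumptions (not from the article): weights dominate memory, and a forward
# pass costs about 2 floating-point operations per parameter per token.

MODELS = {
    "MiniCPM-2B": 2e9,
    "Phi-2": 2.7e9,
    "GPT-4.0 (reported)": 1.7e12,
}


def weight_memory_gb(params: float, bits_per_weight: int) -> float:
    """Memory needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def flops_per_token(params: float) -> float:
    """Rough compute for one generated token (rule of thumb: 2 x parameters)."""
    return 2 * params


def summarize() -> list[tuple[str, float, float, float]]:
    """Return (name, GB at 16-bit, GB at 4-bit, FLOPs per token) for each model."""
    return [
        (name, weight_memory_gb(p, 16), weight_memory_gb(p, 4), flops_per_token(p))
        for name, p in MODELS.items()
    ]


if __name__ == "__main__":
    for name, gb16, gb4, flops in summarize():
        print(f"{name:<20} {gb16:>8.1f} GB @16-bit {gb4:>8.1f} GB @4-bit "
              f"{flops:.1e} FLOPs/token")
```

Under these assumptions, a 2-billion-parameter model needs about 4 GB for its weights at 16 bits and about 1 GB at 4 bits, which is within reach of a mobile phone, while a 1.7-trillion-parameter model would need thousands of gigabytes and roughly 850 times the compute per token.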