Technology

Gen AI without the dangers

Published

November 28, 2023

Komal

It’s understandable that generative AI tools such as ChatGPT, Stable Diffusion, and DreamStudio are making headlines. The results are striking and improving at a geometric rate. Intelligent assistants are already transforming search and information analysis, code creation, network security, and article writing.

Gen AI will play a critical role in how businesses run and deliver IT services, and in how business users complete their tasks. The possibilities are endless, but so are the pitfalls. Developing and deploying AI successfully can be costly and risky. Furthermore, the workloads associated with Gen AI and the large language models (LLMs) that drive it are extremely computationally demanding and energy-intensive. Dr. Sajjad Moazeni of the University of Washington estimates that training an LLM with 175 billion or more parameters consumes as much energy as 1,000 US households use in a year, though exact figures are unknown. Answering more than 100 million generative AI queries a day consumes about one gigawatt-hour of electricity daily, roughly the daily energy use of 33,000 US households.
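
As a rough check on those figures, the arithmetic can be worked through directly. The sketch below uses only the numbers quoted above (one gigawatt-hour per day, 100 million queries, 33,000 households); the variable names are illustrative, and the results are back-of-the-envelope values, not measurements.

```python
# Back-of-the-envelope check using only the figures quoted in the article.
DAILY_ENERGY_WH = 1_000_000_000      # 1 gigawatt-hour expressed in watt-hours
DAILY_QUERIES = 100_000_000          # "over 100 million" queries per day
HOUSEHOLDS = 33_000                  # US households with the same daily use

# Energy attributed to a single query, in watt-hours.
wh_per_query = DAILY_ENERGY_WH / DAILY_QUERIES

# Daily consumption per household implied by the comparison, in kilowatt-hours.
kwh_per_household_per_day = DAILY_ENERGY_WH / 1_000 / HOUSEHOLDS

print(f"Energy per query: about {wh_per_query:.0f} Wh")
print(f"Implied household use: about {kwh_per_household_per_day:.1f} kWh per day")
```

At exactly 100 million queries this works out to about 10 Wh per query, and the household comparison implies roughly 30 kWh per household per day.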

How even hyperscalers can afford that much electricity is beyond me. It’s too expensive for the typical business. How can CIOs provide reliable, accurate AI without incurring the energy expenses and environmental impact of a small city?

Six pointers for implementing Gen AI economically and with less risk

Retraining generative AI to perform particular tasks is essential to its applicability in business settings. Expert models produced by retraining are smaller, more accurate, and require less processing power. So, in order to train their own AI models, does every business need to establish a specialized AI development team and a supercomputer? Not at all.

Here are six strategies to create and implement AI without spending a lot of money on expensive hardware or highly skilled personnel.

Start with a foundation model rather than reinventing the wheel

A company could invest in building custom models for its own use cases, but the cost of data scientists, HPC specialists, and supercomputing infrastructure is out of reach for all but the largest hyperscalers, businesses, and government organizations.

Rather, begin with a foundation model that features a robust application portfolio and an active developer ecosystem. You could use an open-source model like Meta’s Llama 2, or a proprietary model like OpenAI’s ChatGPT. Hugging Face and other communities provide a vast array of open-source models and applications.

Align the model with the intended use

Models can be broadly applicable and computationally demanding, such as GPT, or more narrowly focused, like Med-BERT (an open-source LLM for medical literature). The time it takes to create a viable prototype can be shortened and months of training can be avoided by choosing the appropriate model early in the project.

However, exercise caution. Any model may reflect biases in its training data, and generative AI models can fabricate answers and state falsehoods outright. For maximum trustworthiness, look for models trained on clean, transparent data with well-defined governance and explainable decision-making.

Retrain to produce more accurate, smaller models

Retraining foundation models on specific datasets offers several advantages. As the model becomes more accurate in a narrower domain, it sheds parameters the application doesn’t need. For example, retraining an LLM on financial information would trade a general skill like songwriting for the ability to help a customer with a mortgage application.

With a more compact design, the new banking assistant could still deliver excellent, highly accurate service while running on standard (existing) hardware.

Make use of your current infrastructure

Standing up a supercomputer with 10,000 GPUs is beyond the reach of most businesses. Fortunately, most practical AI training, retraining, and inference can be done without large GPU arrays.

Training up to 10 billion parameters: contemporary CPUs with integrated AI acceleration can handle training loads in this range at competitive price/performance points. For better performance and lower costs, train overnight when data center demand is low.
Retraining up to 10 billion parameters: modern CPUs can retrain models in this range in minutes, with no GPU required.
Inferencing from millions to fewer than 20 billion parameters: smaller models can run on standalone edge devices with integrated CPUs. For models with fewer than 20 billion parameters, such as Llama 2, CPUs can respond as quickly and accurately as GPUs.
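
The thresholds above can be turned into a quick sizing check. The following sketch is an illustration only: the function names `weights_footprint_gb` and `cpu_is_candidate` are hypothetical, the 10 and 20 billion limits are simply this article's rules of thumb, and the footprint estimate counts only the model weights (parameters times bytes per parameter), ignoring activations, optimizer state, and runtime overhead.

```python
# Rule-of-thumb sizing helper; thresholds come from the article, not benchmarks.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

# Upper parameter limits the article gives for CPU-based work.
CPU_PARAM_LIMITS = {
    "training": 10_000_000_000,
    "retraining": 10_000_000_000,
    "inference": 20_000_000_000,
}


def weights_footprint_gb(num_params: int, dtype: str = "fp32") -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def cpu_is_candidate(workload: str, num_params: int) -> bool:
    """Return True if the article's guidance suggests a CPU may be enough."""
    limit = CPU_PARAM_LIMITS[workload]
    # Training and retraining are "up to" the limit; inference is "fewer than".
    if workload == "inference":
        return num_params < limit
    return num_params <= limit


if __name__ == "__main__":
    params = 7_000_000_000  # a hypothetical 7-billion-parameter model
    for dtype in BYTES_PER_PARAM:
        print(dtype, f"{weights_footprint_gb(params, dtype):.0f} GB")
    print("CPU inference candidate:", cpu_is_candidate("inference", params))
```

The weights-only estimate also shows why lower-precision formats matter: the same parameter count needs a quarter of the memory at one byte per parameter compared with four.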

Execute inference with consideration for hardware

Applications for inference can be fine-tuned and optimized for improved performance on particular hardware configurations and features. Similar to training a model, optimizing one for a given application means striking a balance between processing efficiency, model size, and accuracy.

One way to make inference four times faster while maintaining accuracy is to quantize a 32-bit floating-point model down to 8-bit integers (INT8). Tools such as the Intel® Distribution of OpenVINO™ toolkit manage the optimization and build hardware-aware inference engines that take advantage of host accelerators such as integrated GPUs, Intel® Advanced Matrix Extensions (Intel® AMX), and Intel® Advanced Vector Extensions 512 (Intel® AVX-512).
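
To make the idea of reducing 32-bit values to INT8 concrete, here is a minimal sketch of symmetric quantization in plain Python. It illustrates the general technique only, not how OpenVINO or any particular toolkit implements it; the function names and sample weights are made up for the example, and production tools also calibrate on real data.

```python
# Minimal symmetric INT8 quantization of a list of floating-point weights.
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the INT8 codes."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.82, -1.27, 0.003, 0.5, -0.64]  # made-up sample values
    codes, scale = quantize_int8(weights)
    restored = dequantize_int8(codes, scale)
    worst_error = max(abs(w - r) for w, r in zip(weights, restored))
    print("INT8 codes:", codes)
    print("Scale:", scale)
    print("Largest round-trip error:", worst_error)
```

Each weight is stored in one byte instead of four, and the round-trip error of any value stays within about half of the scale factor, which is the trade between model size and accuracy described above.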

Monitor cloud utilization

Offering AI services through cloud-based AI applications and APIs is a fast, reliable, and scalable route. Customers and business users alike benefit from always-on AI from a service provider, but costs can rise unexpectedly: if your AI service proves popular, everyone will use it.

Many businesses that began their AI journeys entirely in the cloud are moving workloads that can run well elsewhere back to their on-premises and co-located infrastructure. For cloud-native enterprises with little or no on-premises infrastructure, pay-as-you-go infrastructure-as-a-service is becoming a competitive alternative to rising cloud costs.
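
A simple break-even estimate can show when usage-based pricing overtakes a fixed-cost alternative. The sketch below is hypothetical throughout: the per-request price, the fixed monthly cost, and the function name `break_even_requests` are placeholders chosen for illustration, not figures from any provider.

```python
# Hypothetical break-even estimate: usage-priced API versus fixed-cost hosting.
import math


def break_even_requests(price_per_request: float, fixed_monthly_cost: float) -> int:
    """Smallest monthly request count at which the API costs at least as much."""
    if price_per_request <= 0:
        raise ValueError("price_per_request must be positive")
    return math.ceil(fixed_monthly_cost / price_per_request)


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    price = 0.002      # dollars per request on a usage-priced API
    fixed = 3_000.0    # dollars per month for self-managed infrastructure
    threshold = break_even_requests(price, fixed)
    print(f"Fixed-cost hosting breaks even at about {threshold:,} requests per month")
```

Tracking actual monthly request counts against a threshold like this is one way to notice early that a popular service has outgrown its pricing model.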

You have choices when it comes to Gen AI. Generative AI is surrounded by so much hype and mystery that it can seem like a cutting-edge technology available only to the wealthiest companies. In reality, hundreds of high-performance models, including LLMs for generative AI, run accurately and efficiently on a typical CPU-based data center or cloud instance. Enterprise-grade tools for experimenting with, prototyping, and deploying generative AI are maturing rapidly in both open-source and proprietary communities.

By utilizing all of their resources, astute CIOs can leverage AI that transforms businesses without incurring the expenses and hazards associated with in-house development.

Technology

Microsoft Expands Copilot Voice and Think Deeper

Published

February 25, 2025

Archana Suryawanshi

Microsoft is taking a major step forward by offering unlimited access to Copilot Voice and Think Deeper, marking two years since the AI-powered Copilot was first integrated into Bing search. This update comes shortly after the tech giant revamped its Copilot Pro subscription and bundled advanced AI features into Microsoft 365.

What’s Changing?

Microsoft remains committed to its $20 per month Copilot Pro plan, ensuring that subscribers continue to enjoy premium benefits. According to the company, Copilot Pro users will receive:

Preferred access to the latest AI models during peak hours.
Early access to experimental AI features, with more updates expected soon.
Extended use of Copilot within popular Microsoft 365 apps like Word, Excel, and PowerPoint.

The Impact on Users

This move signals Microsoft’s dedication to enhancing AI-driven productivity tools. By expanding access to Copilot’s powerful features, users can expect improved efficiency, smarter assistance, and seamless integration across Microsoft’s ecosystem.

As AI technology continues to evolve, Microsoft is positioning itself at the forefront of innovation, ensuring both casual users and professionals can leverage the best AI tools available.

Stay tuned for further updates as Microsoft rolls out more enhancements to its AI offerings.

Technology

Google Launches Free AI Coding Tool for Individual Developers

Published

February 25, 2025

Archana Suryawanshi

Google has introduced a free version of Gemini Code Assist, its AI-powered coding assistant, for individual developers worldwide. The tool, previously available only to enterprise users, is now in public preview, making advanced AI-assisted coding accessible to students, freelancers, hobbyists, and startups.

More Features, Fewer Limits

Unlike competing tools such as GitHub Copilot, which limits free users to 2,000 code completions per month, Google is offering up to 180,000 code completions per month, a significantly higher cap designed to accommodate even the most active developers.
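
To put the two caps in perspective, the comparison can be worked out with a few lines of arithmetic. This sketch assumes both caps apply per month and uses a 30-day month for the daily figures; the variable names are illustrative.

```python
# Compare the two free-tier caps quoted in the article.
GEMINI_FREE_CAP = 180_000    # completions, as quoted for Gemini Code Assist
COPILOT_FREE_CAP = 2_000     # completions per month, as quoted for GitHub Copilot
DAYS_PER_MONTH = 30          # assumption: a 30-day month

# How many times larger the higher cap is.
ratio = GEMINI_FREE_CAP / COPILOT_FREE_CAP

# Average completions available per day under each cap.
gemini_per_day = GEMINI_FREE_CAP / DAYS_PER_MONTH
copilot_per_day = COPILOT_FREE_CAP / DAYS_PER_MONTH

print(f"Cap ratio: {ratio:.0f}x")
print(f"Per day: {gemini_per_day:.0f} versus {copilot_per_day:.0f} completions")
```

Under those assumptions the higher cap is 90 times larger, or about 6,000 completions a day compared with roughly 67.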

“Now anyone can easily learn, generate code snippets, debug, and modify applications without switching between multiple windows,” said Ryan J. Salva, Google’s senior director of product management.

AI-Powered Coding Assistance

Gemini Code Assist for individuals is powered by Google’s Gemini 2.0 AI model and offers:
Auto-completion of code while typing
Generation of entire code blocks based on prompts
Debugging assistance via an interactive chatbot

The tool integrates with popular developer environments like Visual Studio Code, GitHub, and JetBrains, supporting a wide range of programming languages. Developers can use natural language prompts, such as:
“Create an HTML form with fields for name, email, and message, plus a submit button.”

With support for 38 programming languages and a 128,000-token context window for processing complex prompts, Gemini Code Assist provides a robust AI-driven coding experience.

Enterprise Features Still Require a Subscription

While the free tier is generous, advanced features like productivity analytics, Google Cloud integrations, and custom AI tuning remain exclusive to paid Standard and Enterprise plans.

With this move, Google aims to compete more aggressively in the AI coding assistant market, offering developers a powerful and far less restrictive alternative to existing tools.

Technology

Elon Musk Unveils Grok-3: A Game-Changing AI Chatbot to Rival ChatGPT

Published

February 19, 2025

Archana Suryawanshi

Elon Musk’s artificial intelligence company xAI has unveiled its latest chatbot, Grok-3, which aims to compete with leading AI models such as OpenAI’s ChatGPT and China’s DeepSeek. Grok-3 is now available to Premium+ subscribers on Musk’s social media platform X (formerly Twitter), through xAI’s mobile app, and through the new SuperGrok subscription tier on Grok.com.

Advanced capabilities and performance

Grok-3 was built with ten times the computing power of its predecessor, Grok-2. According to initial tests, Grok-3 outperforms models from OpenAI, Google, and DeepSeek, particularly in math, science, and coding. The chatbot offers advanced reasoning capabilities that break complex questions down into manageable tasks. Users can interact with Grok-3 in two modes: “Think,” which performs step-by-step reasoning, and “Big Brain,” which is designed for more difficult tasks.

Strategic Investments and Infrastructure

To support the development of Grok-3, xAI has made major investments in its supercomputer cluster, Colossus, which is currently the largest in the world. This infrastructure underscores the company’s commitment to advancing AI technology and maintaining a competitive edge in the industry.

New Offerings and Future Plans

Along with Grok-3, xAI has also introduced a logic-based chatbot called DeepSearch, designed to enhance research, brainstorming, and data analysis tasks. This tool aims to provide users with more insightful and relevant information. Looking to the future, xAI plans to release Grok-2 as an open-source model, encouraging community participation and further development. Additionally, upcoming improvements for Grok-3 include a synthesized voice feature, which aims to improve user interaction and accessibility.

Market position and competition

The launch of Grok-3 positions xAI as a major competitor in the AI chatbot market, directly challenging established models from OpenAI and emerging competitors such as DeepSeek. While Grok-3’s performance claims have yet to be independently verified, early indications suggest it could have a significant impact on the AI landscape. xAI is actively seeking $10 billion in investment from major companies, demonstrating its strong belief in its technological advancements and market potential.