S&P Global, a leading provider of financial intelligence, quietly announced the debut of S&P AI Benchmarks by Kensho on Wednesday. The new offering aims to set a standard for assessing how large language models (LLMs) perform in complex financial and quantitative applications.
Created by Kensho, S&P Global’s AI-focused division, the benchmarking tool evaluates an LLM’s proficiency at tasks such as quantitative reasoning, extracting data from financial documents, and demonstrating domain-specific knowledge. Results are published on a leaderboard, giving a transparent view of each model’s capabilities.
In an interview with VentureBeat, Bhavesh Dayalji, CEO of Kensho and Chief AI Officer at S&P Global, said, “S&P AI Benchmarks combines Kensho’s cutting-edge AI research and engineering with S&P Global’s leading financial intelligence capabilities. We hope that the solution spurs more innovation in the FinAI space and becomes the industry standard for understanding how LLMs perform on complex financial reasoning.”
The introduction of S&P AI Benchmarks is timely, as more financial services organizations explore generative AI and LLMs to improve efficiency and gain a competitive edge. Until now, the absence of established benchmarks has made it difficult for organizations to evaluate which models are best suited to their particular use cases.
Encouraging Innovation and Informed Decision-Making
“Benchmark solutions like ours are critical to helping institutions and professionals across our industry determine which LLMs they should be using for their specific use cases,” Dayalji said. “We think that S&P AI Benchmarks will spur innovation by assisting financial professionals in determining where each model excels and where it can bring the greatest benefit.”
The S&P AI Benchmarks methodology was developed and validated by a diverse team of engineers, researchers, academics, and financial professionals from across S&P Global’s divisions. The evaluation set comprises 600 questions that rigorously test an LLM’s performance across three key categories: quantitative reasoning, data extraction from financial documents, and domain-specific knowledge.
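S&P Global has not published its scoring details, but as a rough illustration of how a categorized evaluation set like this might be tallied for a leaderboard, here is a minimal sketch. The question records and the query_llm() stub are hypothetical and are not part of the actual S&P AI Benchmarks.

```python
from collections import defaultdict

# Hypothetical illustration only: the actual S&P AI Benchmarks scoring
# methodology is not described in the article. The question records and
# query_llm() below are invented for demonstration.

# Each question is tagged with one of the three evaluation categories.
questions = [
    {"category": "quantitative_reasoning", "prompt": "...", "answer": "..."},
    {"category": "data_extraction", "prompt": "...", "answer": "..."},
    {"category": "domain_knowledge", "prompt": "...", "answer": "..."},
    # ... up to 600 questions in the full evaluation set
]

def query_llm(prompt: str) -> str:
    """Placeholder for a call to the model under evaluation."""
    raise NotImplementedError

def evaluate(questions):
    """Return per-category accuracy for the model under evaluation."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for q in questions:
        total[q["category"]] += 1
        if query_llm(q["prompt"]).strip() == q["answer"].strip():
            correct[q["category"]] += 1
    # Per-category accuracy, e.g. for display on a leaderboard.
    return {cat: correct[cat] / total[cat] for cat in total}
```

In practice, a grader for quantitative or extraction questions would likely need more forgiving answer matching than the exact string comparison used here; the sketch only shows the overall shape of a per-category tally.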
A Milestone for AI Adoption in Finance
According to industry analysts, the introduction of S&P AI Benchmarks may mark a critical turning point in the financial sector’s adoption of AI. As increasingly sophisticated AI reshapes the industry, companies will need a trustworthy, transparent benchmarking tool to help them decide which models to use. S&P Global’s solution may spur innovation in the FinAI space and hasten the responsible adoption of LLMs.
S&P Global believes S&P AI Benchmarks will play a key role in shaping how AI develops in the financial services industry. “Our goal is for LLMs to become more efficient and better tailored to the demands of all the industries we serve, and our solutions will help us get there,” Dayalji said. “In order for us to keep improving our framework, we encourage participation from all model providers.”
As the financial industry navigates the rapidly evolving landscape of AI and generative AI, tools like S&P AI Benchmarks by Kensho, which help organizations harness these technologies while ensuring accuracy, transparency, and responsible deployment, are poised to become indispensable guides.