A WSU-led study found that while AI models like ChatGPT perform well on multiple-choice financial licensing exams, they struggle with complex, specific tasks. The findings suggest AI is better suited as an assistance tool than as a replacement for finance professionals.
As artificial intelligence systems like ChatGPT increasingly permeate various industries, a recent study led by Washington State University highlights significant limitations in AI’s capability to replace human financial professionals.
The study, published in the Financial Analysts Journal, analyzed over 10,000 responses to financial exam questions from the AI language models Bard, Llama and ChatGPT.
The researchers — Donald (DJ) Fairhurst, an associate professor of finance at WSU’s Carson College of Business, and Daniel Greene, the Bill Short Associate Professor of Finance at Clemson’s Wilbur O. and Ann Powers College of Business — sought not just the right answers but also detailed explanations to gauge the AI’s comprehension and reasoning compared to human experts.
“Passing certification exams is not enough. We really need to dig deeper to get to what these models can really do,” Fairhurst said in a news release.
Although ChatGPT, particularly the paid GPT-4 version, produced the most accurate and human-like answers, it still fell short in more specialized scenarios.
“For broad concepts where there have been good explanations on the internet for a long time, ChatGPT can do a very good job at synthesizing those concepts. If it’s a specific, idiosyncratic issue, it’s really going to struggle,” Fairhurst added.
The study utilized questions from various licensing exams, including the Series 6, 7, 65 and 66, which mirror the real-world tasks performed by financial professionals. The AI models showed high accuracy in areas like securities transactions and market trend monitoring. However, they struggled with more complex issues, such as determining clients’ insurance coverage and tax status.
Fairhurst and Greene also experimented with fine-tuning ChatGPT 3.5 by providing it with examples of correct responses and explanations, which significantly improved its accuracy, rivaling that of ChatGPT 4.0.
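The article does not describe the researchers' exact fine-tuning setup, but supplying a model with examples of correct responses and explanations is the standard pattern for OpenAI's fine-tuning workflow. As a rough, hypothetical sketch of what preparing such training data might look like (the exam question, file name, and system prompt below are illustrative, not from the study's dataset):

```python
import json

# Hypothetical training example pairing an exam-style question with a
# correct answer and explanation, in OpenAI's chat fine-tuning format.
training_examples = [
    {
        "messages": [
            {"role": "system",
             "content": "You are a financial licensing exam tutor."},
            {"role": "user",
             "content": "Which act governs the initial registration "
                        "of new securities offerings?"},
            {"role": "assistant",
             "content": "The Securities Act of 1933. It requires issuers to "
                        "register new offerings and disclose material "
                        "information to investors."},
        ]
    },
]

# Write one JSON object per line (JSONL), the file format the
# fine-tuning API expects for training data.
with open("exam_finetune.jsonl", "w") as f:
    for example in training_examples:
        f.write(json.dumps(example) + "\n")

# Uploading the file and launching a fine-tuning job would then look
# roughly like this (requires an API key; shown here for orientation only):
#   client = openai.OpenAI()
#   file = client.files.create(file=open("exam_finetune.jsonl", "rb"),
#                              purpose="fine-tune")
#   client.fine_tuning.jobs.create(training_file=file.id,
#                                  model="gpt-3.5-turbo")
```

In practice, the quality of the explanations in the assistant turns matters as much as the answers themselves, since the study graded the models on reasoning, not just correctness.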
The research continues as the team, including WSU doctoral student Adam Bozman, explores other financial tasks such as evaluating potential merger deals. Given that ChatGPT’s training data only goes up to September 2021, they are testing it against known outcomes of deals made after that date. Early results suggest that the AI model underperforms in this domain, reaffirming the notion that AI is not yet ready for complex financial decision-making.
These findings suggest that AI, while a powerful tool, is best used to assist experienced financial professionals rather than replace them.
“It’s far too early to be worried about ChatGPT taking finance jobs completely,” added Fairhurst.
However, the rise of AI could transform the structure of entry-level roles in investment banks.
“The practice of bringing a bunch of people on as junior analysts, letting them compete and keeping the winners – that becomes a lot more costly. So it may mean a downturn in those types of jobs, but it’s not because ChatGPT is better than the analysts, it’s because we’ve been asking junior analysts to do tasks that are more menial,” Fairhurst added.
As AI continues to evolve, the conversation around its role in the financial sector will undoubtedly persist. This study marks a critical step in understanding both the potential and the limitations of AI technologies like ChatGPT in specialized professions.