BenchmarkJuly 3, 20264 min read

Bridgewater's AI Breakthrough: Custom Model Outperforms GPT and Claude in Finance Tests

A custom AI model developed by Bridgewater and Thinking Machines Lab has achieved an impressive 85% accuracy in finance tests, surpassing leading commercial models like GPT and Claude. This breakthrough demonstrates the potential for companies to create powerful AI solutions using their own data, without relying on large providers.

In a significant development, a fine-tuned open-source AI model has been shown to outperform leading commercial models in evaluating financial documents. The model, known as Qwen3-235B, was developed by Bridgewater and Thinking Machines Lab, and has achieved an impressive 85% accuracy in tests. This is a substantial improvement over the 50% accuracy achieved by variants of Gemini, Claude, and GPT, even with expert-written instructions and a three-tier rating system. The Qwen3-235B model is also 14 times cheaper to operate than its commercial counterparts, making it a highly attractive option for companies looking to automate financial document analysis.

The development of the Qwen3-235B model is a significant milestone in the field of AI research, as it demonstrates the potential for companies to create powerful AI solutions using their own data. By fine-tuning an open-source model with proprietary examples and expert judgment, Bridgewater and Thinking Machines Lab have been able to create a model that is highly effective in evaluating financial documents. This approach has several advantages over relying on large providers, including greater control over the model's development and deployment, as well as the ability to tailor the model to specific business needs.

The Qwen3-235B model was tested on a range of tasks, including deciding whether a financial article is relevant to an executive, and whether a central bank document signals the direction of future rate changes. The model's performance was evaluated using a range of metrics, including accuracy, precision, and recall. The results show that the Qwen3-235B model is highly effective in evaluating financial documents, and is able to outperform leading commercial models in a range of tasks.

The implications of this breakthrough are significant, both for companies and for the broader AI research community. For companies, the development of the Qwen3-235B model demonstrates the potential for creating powerful AI solutions using their own data, without relying on large providers. This approach can help companies to reduce costs, improve efficiency, and gain a competitive advantage in their respective markets. For the broader AI research community, the Qwen3-235B model represents a significant advance in the field of AI research, and demonstrates the potential for fine-tuning open-source models to achieve state-of-the-art performance.

In historical context, the development of the Qwen3-235B model represents a significant improvement over previous versions of AI models. Earlier models, such as GPT and Claude, were limited by their reliance on publicly available data, and were unable to achieve the same level of performance as the Qwen3-235B model. The Qwen3-235B model's use of proprietary examples and expert judgment has allowed it to achieve a higher level of accuracy and effectiveness, and demonstrates the potential for companies to create highly effective AI solutions using their own data.

The competitive context of the Qwen3-235B model is also significant, as it represents a challenge to the dominance of large providers in the AI market. Companies such as OpenAI and Google have traditionally been the leaders in the development of AI models, but the Qwen3-235B model demonstrates the potential for smaller companies and research organizations to create highly effective AI solutions. This could lead to a more competitive market for AI models, with a greater range of options available to companies and developers.

In practical terms, the Qwen3-235B model has the potential to make a significant impact on the way that companies evaluate financial documents. By automating the process of evaluating financial documents, companies can reduce costs, improve efficiency, and gain a competitive advantage in their respective markets. The model's high level of accuracy and effectiveness also makes it a highly attractive option for companies looking to reduce the risk of human error, and to improve the overall quality of their financial analysis.

Overall, the development of the Qwen3-235B model is a significant breakthrough in the field of AI research, and demonstrates the potential for companies to create powerful AI solutions using their own data. As the AI market continues to evolve, it is likely that we will see more companies and research organizations developing custom AI models, and the Qwen3-235B model represents an important milestone in this process. For AI model users and developers, this breakthrough matters because it shows that custom models can outperform commercial ones, and that the key to success lies in fine-tuning and proprietary data.

Models Mentioned

Anthropic: Claude Opus 4.7 (Fast)

GPT-4.1

Browse Models Compare All News

Bridgewater's AI Breakthrough: Custom Model Outperforms GPT and Claude in Finance Tests

Models Mentioned

AI-Powered Students See 24% Drop in Exam Scores After Two Years

Explore