LLM crypto trading contest finds LLMs can’t trade crypto

Four out of six large language models (LLMs) ended in losses after competing in the 'Alpha Arena' crypto trading competition, with OpenAI's ChatGPT losing 63% of its funds. The competition, created by Nof1, had the LLMs trade crypto under the same constraints for just over two weeks. ChatGPT, Google's Gemini, xAI's Grok, and Anthropic's Claude Sonnet all closed below their starting $10,000, while only Alibaba's QWEN3 MAX and High-Flyer's DeepSeek turned profits, of $2,232 and $489 respectively. The models were held back by trading costs and by the limited prompts used to test them. Organizer Jay Azhang highlighted the models' 'consistent biases' and intends to hold a refined competition in the future.

Nov 4

Source: protos.com

Introduction to the Alpha Arena Crypto Trading Competition

Four out of six major large language models (LLMs) ended up in the red in a crypto trading competition called the “Alpha Arena.” Organized by Nof1, the competition had popular LLMs trade cryptocurrencies from the same set of prompts for just over two weeks. OpenAI’s ChatGPT posted the largest loss, shedding 63% of its funds.

Performance Breakdown

ChatGPT, Google’s Gemini, xAI’s Grok, and Anthropic’s Claude Sonnet each finished the competition below their $10,000 starting balance. Specifically:

ChatGPT lost $6,267
Gemini lost $5,671
Grok lost $4,531
Claude Sonnet lost $3,081

ChatGPT, Gemini, and Grok were notably more inclined to take short positions, whereas Claude Sonnet “rarely” shorted. Only two models turned a profit: High-Flyer’s DeepSeek made $489, and Alibaba’s QWEN3 MAX led with $2,232.
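
The dollar figures above translate directly into percentage returns on the $10,000 starting balance. The following is a minimal Python sketch of that arithmetic, using only the numbers reported here; the variable and function names are illustrative and do not come from Nof1.

```python
STARTING_BALANCE = 10_000  # each model's starting funds, in USD

# Reported profit or loss per model, in USD (negative means a loss).
pnl_usd = {
    "ChatGPT": -6_267,
    "Gemini": -5_671,
    "Grok": -4_531,
    "Claude Sonnet": -3_081,
    "DeepSeek": 489,
    "QWEN3 MAX": 2_232,
}


def percent_return(pnl, start=STARTING_BALANCE):
    """Return profit or loss as a percentage of the starting balance."""
    return 100 * pnl / start


returns = {model: percent_return(pnl) for model, pnl in pnl_usd.items()}

# Print the models from worst to best result.
for model, pct in sorted(returns.items(), key=lambda kv: kv[1]):
    print(f"{model:14s} {pct:+.2f}%")
```

ChatGPT's $6,267 loss works out to roughly 62.7% of its starting balance, which rounds to the 63% figure quoted for it.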

Trading Statistics and Costs

Trading behavior varied sharply among the LLMs:

Gemini made 238 trades, the most of any model.
Claude Sonnet, by contrast, made only 38 trades.

Trading costs were a significant drag on results across the board. QWEN3 MAX incurred $1,654 in trading fees, while Gemini paid $1,331, deepening its losses. The 'win rate' of every LLM fell between 25% and 30%. Nof1 remarked that in the early rounds, profit and loss was heavily affected by over-trading and trading fees.
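
To put those numbers in perspective, the Python sketch below does two pieces of back-of-the-envelope arithmetic. The first divides Gemini's reported fees by its reported trade count. The second applies the standard break-even relation for a strategy with win rate p: setting p × (average win) equal to (1 - p) × (average loss) gives a required win-to-loss ratio of (1 - p) / p, before fees. The function names are illustrative, and the break-even calculation is a general rule of thumb, not something Nof1 published.

```python
def fee_per_trade(total_fees_usd, trade_count):
    """Average fee paid per trade, in USD."""
    if trade_count <= 0:
        raise ValueError("trade_count must be positive")
    return total_fees_usd / trade_count


def breakeven_payoff_ratio(win_rate):
    """Average win / average loss needed to break even, before fees."""
    if not 0 < win_rate < 1:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return (1 - win_rate) / win_rate


# Gemini is the only model for which this article gives both figures.
gemini_fee = fee_per_trade(1_331, 238)
print(f"Gemini: about ${gemini_fee:.2f} in fees per trade")

# Reported win rates ranged between 25% and 30%.
ratio_at_25 = breakeven_payoff_ratio(0.25)
ratio_at_30 = breakeven_payoff_ratio(0.30)
print(f"Break-even win/loss ratio at 25%: {ratio_at_25:.2f}")
print(f"Break-even win/loss ratio at 30%: {ratio_at_30:.2f}")
```

Under these assumptions, a model winning only a quarter of its trades needs its average winner to be about three times the size of its average loser just to break even, and fees push that bar higher still.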

High Points and Persistent Challenges

On October 27, both QWEN3 MAX and DeepSeek reached their peak, doubling their initial investments. Claude and Grok also moved briefly into positive territory. ChatGPT and Gemini, however, remained in the red for nearly the entire competition. These patterns reflected consistent biases in how each LLM approached trading.

Insights and Future Improvements

The competition's organizer, Jay Azhang, explained the inherent challenges the LLMs faced. He stated that LLMs struggle with numerical time series data, which was the primary context provided for the trading activities. The competition also imposed tight constraints: a limited asset universe, strict rules, and a finite action space.

Despite these challenges, Azhang remains optimistic about improving future iterations of the competition. He mentioned plans to incorporate better prompts and more statistical rigor to give the models a fairer chance to demonstrate their capabilities.

Conclusion

While the Alpha Arena competition highlighted significant limitations in current LLMs for crypto trading, it also showcased their potential and unique “investing personalities.” Nof1 plans to organize another iteration of the competition with improved methodologies, aiming to push the boundaries of what AI-powered trading can achieve in the financial world.
