
Ahmed Al-Dahle, vice president of Meta's (formerly Facebook) generative AI division, recently issued a statement refuting allegations that Meta had manipulated benchmark tests to inflate the apparent performance of its Llama 4 AI model while hiding its limitations. At the same time, he acknowledged user complaints that Llama 4 has not delivered the promised output quality.
What did Meta’s AI chief say?
Al-Dahle posted on X (Twitter), saying,
“We are seeing a lot of people getting great results from this model, but there are also reports of mixed quality in some services. Since we released the model as soon as it was ready, it may take a few days for all public implementations to be fixed.”
He also clarified that the Meta team is still working on bug fixes and asked users for patience in the meantime.
What is the ‘training on test set’ allegation?
Some critics claimed that Meta had “trained” its AI model “on a test set”, thereby falsely showing an improvement in benchmark performance. To this, Al-Dahle clarified,
“This claim is absolutely false, we would never do this.”
Training on the test set means the model was trained on the same data it was later evaluated on. This inflates the model’s scores beyond its true capability, which is considered unfair practice.
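The effect can be sketched with a toy example (hypothetical illustration, not Meta’s code or data): a “model” that simply memorizes its training examples scores perfectly when the test set leaks into training, while a properly held-out test set exposes its real limits.

```python
# Hypothetical illustration of "training on the test set" (data leakage).
# The "model" here is just a lookup table that memorizes (input, label) pairs.

def train(examples):
    """Return a lookup-table 'model' that memorizes its training data."""
    return dict(examples)

def accuracy(model, examples, default_label=0):
    """Fraction of examples the model labels correctly; unseen inputs get a default guess."""
    hits = sum(1 for x, y in examples if model.get(x, default_label) == y)
    return hits / len(examples)

# Toy dataset: inputs 0..9 labeled odd (1) or even (0).
data = [(x, x % 2) for x in range(10)]
train_set, test_set = data[:6], data[6:]

fair_model = train(train_set)                # proper train/test split
leaky_model = train(train_set + test_set)    # test set leaked into training

print(accuracy(fair_model, test_set))   # 0.5 — unseen inputs fall back to a guess
print(accuracy(leaky_model, test_set))  # 1.0 — it has memorized the answers
```

The leaky model’s perfect score says nothing about how it handles new inputs, which is exactly why benchmarking on leaked data is considered deceptive.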
How did the controversy start?
The matter heated up when a former Meta employee wrote an online post claiming that he left Meta over the company’s “grey benchmarking practices” (questionable testing methods). Although the post was unverified, it raised questions in the AI community.
Did the Maverick model make false claims?
Meta recently claimed its Maverick model outperformed OpenAI’s GPT-4o and came close to Google’s Gemini 2.5 Pro. But when testers checked it, the results did not match Meta’s claims.
It was later discovered that the model Meta submitted to LMArena (an AI leaderboard) differed from the public release. Meta described it as “Llama-4-Maverick-03-26-Experimental”, a version “optimized for chat”.
Conclusion: Can Meta be trusted?
- There is excitement about the capabilities of Meta’s AI models, but the lack of transparency raises questions.
- The company says it is working on bug fixes, and better results can be expected soon.
- The AI community will now keep a close eye on Meta’s benchmarking methods.
What is Meta’s Llama 4 AI model?
Llama 4 is a generative AI model created by Meta (formerly Facebook) used for text generation, chatbots, and other NLP tasks. It is a competitor to models like OpenAI’s GPT-4 and Google Gemini.
What are the allegations against Meta?
The allegation is that Meta deliberately “trained on the test set” to show better performance in benchmark tests of Llama 4, which is considered an unethical AI practice.
What is “training on the test set”?
It means that the AI model is trained on the same data it is later tested on. This makes the model’s scores look artificially high, overstating its actual capability.
How did Meta respond to these allegations?
Ahmed Al-Dahle, the head of Meta’s generative AI department, dismissed the allegations and said,
“This claim is completely false, we would never do this.”
He also admitted that some users are having issues with the output quality, and the company is still fixing bugs.
What was the controversy over the Maverick model?
Meta claimed that the Maverick model was better than GPT-4o and close to Gemini 2.5 Pro. But when users tested the public version, its performance did not match those claims. It was later revealed that Meta had benchmarked a different, experimental version.
Is Meta’s transparency being questioned?
Yes. A post by a former Meta employee mentioning “grey benchmarking practices” has raised doubts about transparency in the AI community.
Can Meta’s AI models be trusted?
Meta has delivered technically powerful models so far, but recent events have raised transparency and trust issues. The company says it is working on improvements.
Will Llama 4 perform better in the future?
According to Meta, the model was released as soon as it was ready. Now, they are working on bug fixes and fine-tuning. Performance improvements are expected in the coming days.