The News Integrity in AI Toolkit: The challenges of evaluating LLM factuality

June 16, 2026

On the afternoon of the second day of the 14th Point Conference, the Backstage of Dom mladih hosted a workshop titled “The News Integrity in AI Toolkit: The challenges of evaluating LLM factuality.”

The workshop was led by Hicham Yezza, Principal Data Scientist for Responsible AI at the BBC, who works on evaluating AI tools and models and their impact on the organisation’s editorial and public service values.

The workshop opened with a review of the BBC research report “News Integrity in AI Assistants”, published in 2025. This was followed by an overview of the toolkit developed from the report’s findings and a Q&A session.

The research involved 22 public service media organisations from 18 countries, which assessed how ChatGPT, Copilot, Gemini and Perplexity answered questions about news and current events in 14 languages. They evaluated more than 3,000 responses and found that half had at least one significant issue, a third showed serious sourcing problems and a fifth contained major accuracy issues. The toolkit was created to offer practical solutions to the problems the research identified and to set out what a good AI assistant response should look like.

Yezza explained that a good AI assistant answer rests on four components: accuracy, context, a clear distinction between fact and opinion, and clear and accurate sourcing. The toolkit also explains the operational failures that can occur in AI assistant responses. It is intended for tech companies, media organisations, the research community and the general public, and it can serve both for monitoring and evaluating AI assistants and for media literacy education. Yezza noted that the toolkit is still an evolving resource.

After the detailed presentation, the workshop concluded with questions from the audience and a lively discussion of possible uses of the toolkit and its further development.

Author: Nerma Šehović / Photo: Almin Tabak

(point.zastone.ba)