New paper accepted at LREC 2022: Generating textual explanations for ML models performance


This week the thirteenth Language Resources and Evaluation Conference (LREC 2022) is being hosted in Marseille, France. LREC aims to provide an overview of the state of the art, new R&D directions and emerging trends, and to facilitate the exchange of information regarding language resources (LRs), their applications, and the requirements coming from e-science and e-society.

Covering scientific and technological issues as well as policy and organisational ones, this major conference for researchers, industry and funding agencies provides Caspian with the opportunity to contribute our own learnings to the products, services and applications that result from progress in language sciences and technologies.

Our latest paper ‘Generating Textual Explanations for Machine Learning Models Performance: A Table-to-Text Task’ has been accepted at LREC 2022. With Machine Learning and Multimodality among the hot topics of the conference, this acceptance is recognition of the extensive research that underpins the advanced automation technology solutions Caspian develops to support the anti-financial crime efforts of financial services organisations.

The paper proposes a new natural language generation (NLG) task as an alternative to the numerical tables that are widely used to report the classification performance of machine learning (ML) models against a set of evaluation metrics. It was authored by Amir Enshaei of Caspian and Newcastle University alongside Isaac Ampomah, James Burton and Noura Al Moubayed, all of Durham University, as part of our Knowledge Transfer Partnership with the university.

The paper recognises that domain knowledge is required to fully understand and interpret the information presented in numerical tables, which puts non-experts at a disadvantage. The newly proposed NLG task instead trains neural models to generate textual explanations that analytically describe the classification performance of ML models based on the metric scores reported in the tables. This provides a better understanding of how the models perform, and we believe the task to be the first of its kind to focus on explaining ML model performance in this way.
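To illustrate the kind of table-to-text mapping the task targets, here is a minimal rule-based sketch of our own devising (the paper itself trains neural models, and the function, metric names and example scores below are purely hypothetical): it takes a performance table mapping model names to metric scores and verbalises it as a short explanation.

```python
# Toy illustration of the table-to-text idea: turn a classification
# performance table (model -> metric scores) into a textual explanation.
# This rule-based sketch is illustrative only; the paper's approach
# trains neural generation models instead.

def explain_performance(table: dict[str, dict[str, float]]) -> str:
    """Verbalise a classification-performance table as plain text."""
    sentences = []
    for model, metrics in table.items():
        parts = ", ".join(f"an {name} of {score:.2f}"
                          for name, score in metrics.items())
        sentences.append(f"{model} achieves {parts}.")
    # Highlight the best model by F1 score, if that metric is reported.
    f1_scores = {m: s["F1"] for m, s in table.items() if "F1" in s}
    if f1_scores:
        best = max(f1_scores, key=f1_scores.get)
        sentences.append(
            f"Overall, {best} obtains the highest F1 score "
            f"({f1_scores[best]:.2f}).")
    return " ".join(sentences)

# Hypothetical performance table, as might appear in a paper.
table = {
    "Logistic Regression": {"Accuracy": 0.81, "F1": 0.78},
    "BERT": {"Accuracy": 0.89, "F1": 0.88},
}
print(explain_performance(table))
```

A template-based verbaliser like this quickly becomes brittle as tables vary in metrics and structure, which is precisely why the paper frames the problem as a learned table-to-text generation task.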

The textual explanations are written by computer science experts and checked manually to ensure that they accurately verbalise the information in the corresponding performance table. This could prove an invaluable resource that helps other researchers in the field to apply existing models and introduce new ones.

The full paper is published here.