Data: a mental burden for companies

“Artificial intelligence is a tool that you feed, but its food – i.e. fresh data – can have a restructuring effect on the business.”

Mehdi Labassi
Chief Technology Officer, Carrefour
SKEMA ALUMNI

Bernardo Pagnoncelli
Research Professor of Analytics and Data Science, SKEMA Business School

We live in the data age, and hardly a month goes by without news of a new AI tool or a new data science application. Organisations are increasingly using these advances to integrate data products such as analytics solutions, machine learning models and even large language models (LLMs) into their core operations. An essential but often overlooked aspect of managing data products is the need for ongoing, and therefore considerable, maintenance after deployment. Although their code remains unchanged and they undergo extensive testing and validation during the development phase, their performance tends to dwindle over time. A recent MIT study published in Scientific Reports, a Nature Portfolio journal (Daniel Vela et al., 2022), reveals that this erosion affects 91% of the machine learning models evaluated. Several factors can influence this decline, known in the scientific literature as “deterioration” or “ageing”.

CATCHING UP WITH THE PRESENT

“Data drift”, for example, involves variations in the statistical properties of input data due to changes in user behaviour, shifting market trends and other external factors. An e-commerce recommendation system can lose relevance as consumer preferences change with each season. Likewise, “concept drift” occurs when the relationship between the input features and the output variable changes, as can be seen in fraud detection models that fail to recognise new types of fraudulent activity. Obsolete knowledge also contributes to the ageing of data products. Models that do not incorporate the latest information or trends can lag behind. For example, the health recommendations provided by LLMs can become less accurate if not updated with the latest medical guidelines. Furthermore, the quality of results and metrics can drop significantly if the quality of input data deteriorates. For instance, the accuracy of a weather forecasting model can be reduced if the reliability of data from the sensors it relies on is compromised.
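To make the idea of data drift more concrete, here is a minimal sketch of one common way to quantify it: the Population Stability Index (PSI), which compares how a single numeric input is distributed in a reference sample (for instance, the training data) and in a recent sample. The function name, the choice of ten equal-width bins and the small floor applied to empty bins are illustrative assumptions, not something prescribed by the article or by the study cited above.

```python
import math
from typing import Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
) -> float:
    """Compare two samples of one numeric feature; 0.0 means identical binned distributions."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has no spread")
    width = (hi - lo) / bins  # equal-width bins derived from the reference sample

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range fall into the first or last bin.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # A small floor (an assumption of this sketch) avoids log(0) on empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))
```

A widely used rule of thumb, which is a convention rather than a hard rule, reads a PSI below 0.1 as stable and a PSI above 0.2 as a shift worth investigating. Note that a check of this kind only looks at the inputs; detecting concept drift generally also requires comparing predictions with actual outcomes once they become available.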

Various proactive strategies can be adopted to safeguard the performance and accuracy of data products once they are deployed in production. Continuous monitoring and alerts play a fundamental role in tracking metrics and identifying anomalies. Models need to be recalibrated and updated regularly to incorporate new data and stay on point. Effective data quality management involves rigorous validation to ensure the integrity of the input data. The use of data observability practices and tools can help address these problems, and change management is also essential, through data versioning and data contracts. Above all, this maintenance work requires proper planning and budgeting. The financial commitment to a data product goes well beyond its initial development and requires a change in financial governance to support ongoing operations.
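The input validation and data contracts mentioned above can be illustrated with a short sketch. It assumes a deliberately simple, hypothetical contract format (an expected type, an optional value range and a tolerated share of missing values per field) and checks each incoming batch against it before the data reaches a model. The names FieldRule and validate_batch are invented for this example; dedicated data observability tools offer far richer checks.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class FieldRule:
    """One entry of a hypothetical data contract."""
    dtype: type
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    max_null_rate: float = 0.0  # tolerated share of missing values


def validate_batch(rows: List[Dict[str, Any]], contract: Dict[str, FieldRule]) -> List[str]:
    """Return human-readable violations; an empty list means the batch passes."""
    total = len(rows)
    if total == 0:
        return ["batch is empty"]
    violations: List[str] = []
    for name, rule in contract.items():
        values = [row.get(name) for row in rows]
        null_rate = sum(v is None for v in values) / total
        if null_rate > rule.max_null_rate:
            violations.append(
                f"{name}: null rate {null_rate:.0%} above allowed {rule.max_null_rate:.0%}"
            )
        present = [v for v in values if v is not None]
        wrong_type = [v for v in present if not isinstance(v, rule.dtype)]
        if wrong_type:
            violations.append(
                f"{name}: {len(wrong_type)} value(s) not of type {rule.dtype.__name__}"
            )
            continue  # range checks are not meaningful on wrongly typed values
        if rule.min_value is not None and any(v < rule.min_value for v in present):
            violations.append(f"{name}: value below minimum {rule.min_value}")
        if rule.max_value is not None and any(v > rule.max_value for v in present):
            violations.append(f"{name}: value above maximum {rule.max_value}")
    return violations
```

For the weather example, a contract might state that temperature_c is a float between -60 and 60 with at most 5% missing readings. A batch in which a faulty sensor reports 999.0 would then produce a violation that can trigger an alert instead of silently degrading the forecast.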

As data products become central to the business strategy of many organisations, it is imperative to move away from the “fire and forget” approach typical of traditional software deployment. A proactive approach to understanding and mitigating data product ageing is crucial to ensuring their long-term relevance and effectiveness.