AI is surrounded by hype, but can it fulfil expectations? Rebooting AI is a new book that highlights fundamental problems with the technology. Here we look at what those problems mean for the use of AI in financial crime and compliance.
Rebooting AI, by Professors Gary Marcus and Ernest Davis of New York University, has caused controversy by questioning whether Artificial Intelligence can live up to the hype, arguing that we should not expect it to match human performance on complex tasks. Their challenge has reached the pages of the New York Times and The Economist, and a video of their debate with AI pioneer Yoshua Bengio has been viewed 40,000 times.
Rebooting AI is subtitled “building artificial intelligence we can trust”, and the main target of its criticism is Machine Learning (ML): training a system to make decisions by using large sets of examples. A machine learner can be trained to spot breast cancers, for example, by giving it scans of healthy tissue and examples of tumours spotted by clinicians, until it can reliably spot tumours in previously unseen examples. If such systems are successful, they could provide better expert care to millions more people at much lower cost.
ML is now used by about one third of financial institutions in their fight against financial crime – including transaction monitoring, customer due diligence, KYC, and correspondent banking – to spot patterns of transactions or corporate structures that are suggestive of criminal activity.
Marcus and Davis call out three main problems with ML. But do these systematic problems apply to Financial Crime and Compliance (FCC) teams? And what are the solutions?
What the FCC community need to know about machine learning problems.
Problem 1: ML is Data Hungry
The performance of an ML solution depends on the number, range, and quality of examples it has to learn from. DeepMind’s breast cancer screening system, for example, required 30,000 scans, carefully labelled by multiple expert clinicians, to reach human performance levels.
Banks have access to large volumes of customer transactions, but data is still a major problem.
The first problem is that we have plenty of examples of low-risk behaviour, but few of high-risk activity. Criminals, naturally, are reluctant to advertise themselves, but machine learners need examples of both types to learn from. High-quality labelling depends on a lot of careful work by scarce subject matter experts. Anti-Money Laundering (AML) solutions are also often based on outdated risk typologies that don’t include important risk classes – such as human trafficking – for which we have few proven examples of transaction histories to learn from. At Caspian, we address this problem by working with independent expert investigators to evaluate and identify thousands of cases of possible risk. We also use systems, such as autoencoders, to characterise what ‘normal’ low-risk behaviour looks like, so we can spot and investigate atypical behaviours more efficiently.
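To make the idea of learning what ‘normal’ looks like more concrete, here is a minimal sketch in Python. It is not Caspian’s implementation, and it is not a full autoencoder: it uses a linear stand-in (projecting each case onto the main direction of variation in the low-risk data, then rebuilding it), which is roughly what a linear autoencoder with a single hidden unit learns. The function names and the two-feature toy data are hypothetical. Cases that the model cannot rebuild well are the atypical ones worth a closer look.

```python
import math

def fit_normal_profile(rows, iterations=50):
    """Learn the mean and main direction of variation of known low-risk cases."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Power iteration: repeatedly apply the (unnormalised) covariance matrix
    # to a vector until it settles on the top principal direction.
    direction = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        new = [0.0] * d
        for r in centred:
            dot = sum(a * b for a, b in zip(r, direction))
            for j in range(d):
                new[j] += dot * r[j]
        norm = math.sqrt(sum(v * v for v in new))
        if norm == 0.0:
            break
        direction = [v / norm for v in new]
    return mean, direction

def reconstruction_error(row, mean, direction):
    """Encode a case to one number, decode it again, and measure what was lost."""
    centred = [x - m for x, m in zip(row, mean)]
    code = sum(c * w for c, w in zip(centred, direction))   # encode
    rebuilt = [code * w for w in direction]                 # decode
    return math.sqrt(sum((c - r) ** 2 for c, r in zip(centred, rebuilt)))

# Hypothetical low-risk cases: (monthly income, monthly outgoings), arbitrary units.
normal_cases = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0), (5.0, 10.0)]
mean, direction = fit_normal_profile(normal_cases)

typical_error = reconstruction_error((2.5, 5.0), mean, direction)  # follows the usual pattern
odd_error = reconstruction_error((3.0, 0.0), mean, direction)      # breaks the usual pattern
```

The typical case is rebuilt almost perfectly, while the odd case leaves a noticeably larger error. In practice the threshold separating the two would have to be set and reviewed by subject matter experts.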
The Caspian team are also testing other techniques, such as active learning and data programming, that help us learn more quickly, identify which examples would improve performance the most, and learn more from FCC subject matter experts about how they make their decisions.
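One common form of active learning is uncertainty sampling: ask the experts to label the cases the model is least sure about, because those labels are likely to teach it the most. The sketch below assumes a model that outputs a risk score between 0 and 1; the case identifiers and scores are made up for illustration, and this is not a description of how any particular product works.

```python
def most_uncertain(scores, k):
    """Return the ids of the k cases whose risk score is closest to 0.5."""
    return sorted(scores, key=lambda case_id: abs(scores[case_id] - 0.5))[:k]

# Hypothetical model scores for unlabelled cases (0 = clearly low risk, 1 = clearly high risk).
unlabelled_scores = {"A": 0.02, "B": 0.48, "C": 0.91, "D": 0.60, "E": 0.35}

# The two cases to send to an investigator for labelling next.
to_label = most_uncertain(unlabelled_scores, k=2)
```

Cases such as A and C, where the model is already confident, are left alone, so scarce expert time is spent where it changes the model most.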
However, in too many cases, banks don’t have access to the data they need to detect money laundering effectively. Challenges include the use of secrecy locations, lack of data sharing between banks, disparate data formats and the lack of transparent corporate registries in many countries. The situation is improving as the FATF guidelines on information sharing are gradually implemented and more company registries come online, but we have a long way to go.
In the meantime, the Caspian team continue to learn from the great work of our colleagues at initiatives such as the Organized Crime and Corruption Reporting Project and OpenCorporates, whilst making extensive use of data from other publicly available sources such as the Panama Papers.
Problem 2: ML is Brittle and Shallow
Despite the name, ‘deep learning’ systems are not particularly deep. They classify examples by relying on surface appearances, and struggle to use contextual information to make sensible decisions. Even small changes to inputs can cause unforeseen changes to outputs, sometimes with weird results. For example, a group at MIT produced images of turtles that are mistaken for rifles by an apparently reliable computer vision system (we said it was weird!).
This matters for FCC because it means that even simple changes to inputs can successfully hide criminal activity. We are already familiar with some of these tactics: criminals slice transfers into transactions that fall below alert thresholds, or mis-spell names to hide their identity across multiple sources. But these are a problem for all AML processes, and not just ML-based systems. It is unlikely that criminals would have sufficient access to banks’ systems to deliberately design hacks targeted at particular algorithms.
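Both tactics can be illustrated with a few lines of Python. The sketch below is deliberately simple and uses only the standard library: one function checks whether a run of sub-threshold transfers adds up to the threshold within a short window, and the other compares names by similarity instead of exact spelling. The function names, the 10,000 threshold, the 7-day window and the 0.85 similarity cutoff are all illustrative assumptions, not regulatory or vendor values.

```python
from datetime import date, timedelta
from difflib import SequenceMatcher

def possible_structuring(transfers, threshold, window_days):
    """True if transfers that are each below the threshold add up to it within the window.

    transfers is a list of (date, amount) pairs.
    """
    small = sorted(t for t in transfers if t[1] < threshold)
    window = timedelta(days=window_days)
    for i, (start, _) in enumerate(small):
        total = sum(amount for day, amount in small[i:] if day - start <= window)
        if total >= threshold:
            return True
    return False

def names_match(a, b, cutoff=0.85):
    """True if two names are similar enough to be the same party despite mis-spelling."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= cutoff

# Three transfers in three days, each below 10,000 but 10,500 in total.
sliced = [(date(2020, 1, 1), 4000), (date(2020, 1, 2), 3500), (date(2020, 1, 3), 3000)]
# Two small transfers two months apart.
spread_out = [(date(2020, 1, 1), 4000), (date(2020, 3, 1), 4000)]

sliced_flag = possible_structuring(sliced, threshold=10000, window_days=7)
spread_flag = possible_structuring(spread_out, threshold=10000, window_days=7)
same_person = names_match("Jonathan Smith", "Jonathon Smith")
```

Checks like these catch the obvious variations, but they are themselves surface rules, which is why the choice of window, threshold and cutoff still needs expert judgement.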
The bigger problem caused by the shallowness of ML systems is the nature of these mistakes, rather than their volume. Regulators don’t expect systems to be perfect, but they do expect them to be reasonable – the Federal Reserve guidance on model risk management describes this as ‘conceptual soundness’. Mistaking a turtle for a rifle is the kind of mistake a human would never make, because we recognise the image based on overall ‘top-down’ characteristics, rather than fine-grained surface effects.
It is highly likely that ML-based AML systems will make the kind of mistakes that humans would not.
At Caspian, we address this problem by training our systems on deeper expert knowledge of the problem domain. We have spent 18 months working with Tier 1 global banks and our independent financial crime experts, observing, capturing and testing how they gather and evaluate evidence, make risk decisions and then explain those decisions. This enables us to identify which features of cases and alerts are significant – such as the affordability or frequency of payments – and train our systems on those specific aspects.
Problem 3: ML is Opaque
Marcus and Davis’ final criticism is of one type of machine learner in particular: neural networks. These can be very effective learners, but are ‘black boxes’: it is hard to understand why they made particular decisions.
This matters for FCC because it makes what the Federal Reserve guidance calls “effective challenge” impossible. If an institution is to be held accountable for the decisions of its processes – human or machine – then it needs to be able to justify those decisions. If it decides that a person or activity is high risk or safe, then we need to know why.
Knowing how a system makes its decisions is vital for making informed judgements about its limitations, the assumptions it makes, and the kind of cases it will get wrong. It is also necessary for identifying cases of discriminatory bias, and for incorporating expert human understanding into the solutions. Techniques such as LIME (Local Interpretable Model-agnostic Explanations) or LRP (Layer-wise Relevance Propagation) help us reverse engineer how neural networks work, and to understand which factors were important in making decisions, but they require extra investment.
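LIME and LRP are too involved to reproduce here, but the underlying question, namely which inputs drove this decision, can be illustrated with a much simpler perturbation check. The sketch below is a stand-in for those techniques, not an implementation of them: it swaps each feature of a case for a ‘typical’ baseline value in turn and records how much the score moves. The scoring function, feature names and values are invented for the example.

```python
def feature_contributions(score_fn, case, baseline):
    """For each feature, how much the score drops when it is reset to its baseline value."""
    full_score = score_fn(case)
    contributions = {}
    for name in case:
        perturbed = dict(case)
        perturbed[name] = baseline[name]
        contributions[name] = full_score - score_fn(perturbed)
    return contributions

def toy_risk_score(case):
    """Hypothetical black-box model; in practice this would be the trained system."""
    return 3 * case["payments_per_week"] + 2 * case["amount_to_income_ratio"]

# An alert under review, and the values a typical low-risk customer would have.
alert = {"payments_per_week": 4, "amount_to_income_ratio": 1}
typical = {"payments_per_week": 1, "amount_to_income_ratio": 1}

explanation = feature_contributions(toy_risk_score, alert, typical)
```

Here the whole difference from a typical customer comes from payment frequency, which is the kind of statement an investigator or model review board can challenge. Real explanation methods also have to handle features that interact, which this one-at-a-time check does not.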
We take a pragmatic approach in building our Financial Investigation Platform by starting with expert human understanding of the risks. We model this using a combination of explicit rules, machine learning and unsupervised techniques. But we also ensure that every component and decision can be explained in human-legible terms to provide specific value for FCC teams.
AI and ML are powerful tools but are not magic bullets. Marcus and Davis highlight the limitations, and offer pointers to the solutions. As with Cyber Security, AI needs a comprehensive battery of monitoring and enhancement capabilities that are continually applied.
And we have found that this approach works in FCC. Our solutions are trusted by Tier 1 banks to investigate transaction histories and individual entities in depth, at scale. QA picks up fewer errors from our automated solution than from human analysts, and the solution also passes the strictest scrutiny of model review boards and regulators.