Deep learning and its interpretability in retail banking
21 November 2018
As deep learning enters the mainstream in financial services, interpretability is becoming an important issue; introducing models with hidden layers and complex behaviour makes it harder to trace outputs back to their inputs. This presents a problem when it comes to justifying whether a model is fair and reliable, or whether it contains hidden biases and errors. Ahead of our technical publication in this area, we discuss some practical approaches for interrogating complex models and revealing their structure. As with complex engineering structures, this kind of holistic analysis can begin to reveal possible points of failure.
Credit risk assessment for retail banking and deep learning
The retail banking industry is primarily concerned with the provision of financial services to private individuals. From the bank’s perspective, the services on offer range from almost credit-risk-free – such as current accounts – to potentially high-risk products such as credit cards, loans and mortgages.
One of the most popular techniques used to assess and manage the credit risk of potential applicants is the so-called credit scorecard, usually obtained via techniques such as logistic regression.
A typical scorecard takes a number of inputs from the credit applicant, such as income and usage history of previous credit products. Each of these inputs is transformed into a score, according to certain rules. The final step is to obtain an aggregate score by summing the individual input scores – if the applicant’s score is found to be above a certain threshold, representative of the lender’s risk appetite, the application is accepted.
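The mechanics described above can be sketched in a few lines of Python. The input factors, point bands and cut-off below are purely illustrative assumptions, not taken from any real lender’s scorecard:

```python
# Hypothetical scorecard: each input is banded into a points value,
# and the applicant's total is compared against a cut-off.
SCORECARD = {
    "annual_income": [(0, 10), (20_000, 35), (50_000, 60)],   # (band lower bound, points)
    "months_on_book": [(0, 5), (12, 20), (36, 40)],
    "missed_payments": [(0, 50), (1, 20), (3, 0)],
}

CUTOFF = 100  # threshold representing the lender's risk appetite (illustrative)

def band_points(bands, value):
    """Return the points for the highest band whose lower bound <= value."""
    points = bands[0][1]
    for lower, pts in bands:
        if value >= lower:
            points = pts
    return points

def score_applicant(applicant):
    """Transform each raw input into its score according to the banding rules."""
    return {factor: band_points(SCORECARD[factor], value)
            for factor, value in applicant.items()}

applicant = {"annual_income": 28_000, "months_on_book": 18, "missed_payments": 1}
scores = score_applicant(applicant)
total = sum(scores.values())
accepted = total >= CUTOFF
```

The per-factor scores also give the intuitive, visual breakdown mentioned above: each factor’s contribution to the total is directly readable.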
The development of the scorecard’s rules and threshold is only possible via analysis of relatively large cohorts of previous customers’ data and debt repayment behaviour.
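When the underlying model is a logistic regression, one common convention for turning its fitted log-odds into scorecard points is “points to double the odds” (PDO) scaling. The base score, base odds and PDO values below are illustrative assumptions, not a prescription:

```python
import math

# Illustrative scaling constants: a score of 600 corresponds to odds of
# 30:1 (good:bad), and every additional 20 points doubles the odds.
PDO = 20.0
BASE_SCORE, BASE_ODDS = 600.0, 30.0

FACTOR = PDO / math.log(2)
OFFSET = BASE_SCORE - FACTOR * math.log(BASE_ODDS)

def probability_to_score(p_good):
    """Map a fitted model's probability of repayment to a scorecard score."""
    odds = p_good / (1.0 - p_good)
    return OFFSET + FACTOR * math.log(odds)
```

Under this scaling, an applicant whose modelled odds of repaying are 30:1 scores exactly 600, and doubling those odds adds exactly 20 points.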
Credit scorecards have proven very useful in this context as they make effective use of relatively coarse data (e.g. account statement and/or income data, which could be annual or semi-annual) and provide an intuitive, visual explanation of which factors contributed the most towards a certain score.
At the same time, it is possible to provide feedback to a rejected applicant regarding the most likely reasons for the rejection; individual scores that are particularly weak are likely to correspond to the factors that are implicitly categorising the applicant as high-risk.
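As a minimal sketch of this kind of feedback, the hypothetical per-factor scores below are ranked by how far each falls short of the best attainable points on that factor, with the largest shortfalls flagged as the most likely reasons for rejection:

```python
# Hypothetical per-factor scores for a rejected applicant, alongside the
# maximum points attainable on each factor in an illustrative scorecard.
scores = {"annual_income": 35, "months_on_book": 20, "missed_payments": 20}
max_points = {"annual_income": 60, "months_on_book": 40, "missed_payments": 50}

def rejection_reasons(scores, max_points, top_n=2):
    """Rank factors by their shortfall from the best attainable points."""
    shortfall = {factor: max_points[factor] - scores[factor] for factor in scores}
    return sorted(shortfall, key=shortfall.get, reverse=True)[:top_n]

reasons = rejection_reasons(scores, max_points)
```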
In recent years the retail banking industry has started to see a significant increase in the volume of data available for analysis. There are several sources of this data: social networks, mobile application usage statistics, and even government-backed data frameworks, such as Open Banking (www.openbanking.org.uk) – which was rolled out in the UK in January 2018. This latter initiative allows private individuals and SMEs to share their transaction-level account information on a voluntary basis, enabling participating banks and lending institutions (especially smaller ones that may not have enough data to conduct risk analyses) to offer better-tailored credit products to whoever decides to share their data.
This increase in data volume presents both advantages and challenges. On the one hand, a large dataset is key to accurate statistical analysis – generally speaking, more data is desirable. On the other hand, once a dataset grows beyond a certain size it may require a paradigm shift in the tools and techniques used to analyse it.
With regard to the credit scorecard approach mentioned previously, an effective scorecard takes a limited number of inputs, and this can be a serious limitation when analysing highly granular datasets. A possible solution is to aggregate the data upstream to make standardised scoring viable, at the cost of potential information loss in the process.
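As a minimal sketch of this trade-off, the hypothetical transaction records below (of the kind that might be shared via Open Banking) are aggregated into monthly net flows – a coarse feature a standard scorecard could consume, at the cost of discarding the timing and composition of the individual payments:

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction-level records: (date, signed amount).
transactions = [
    (date(2018, 1, 5), -120.0), (date(2018, 1, 20), 1500.0),
    (date(2018, 2, 3), -300.0), (date(2018, 2, 21), 1500.0),
]

# Aggregate upstream to monthly net flow per (year, month).
monthly_net = defaultdict(float)
for d, amount in transactions:
    monthly_net[(d.year, d.month)] += amount
```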
Where this is the case, more sophisticated deep learning (DL) models can be utilised to take advantage of their specific capabilities in analysing certain types of granular data. In the past decade there has been a surge in remarkable results attained by DL algorithms – especially neural networks (NNs) – in various pattern recognition applications. These vary from image classification and voice recognition to forecasting of interest rate dynamics.
Managing the interpretability problem
Explainability does not require a new category of model, but rather that outputs from existing models are sufficient to provide understanding – both at a technical level for the developer and at a practical level for the end user. For existing models, the ability to explain a model might be mapped against the accuracy of its outputs with respect to the “ground truth” (i.e. the phenomenon we wish to model) and used to generate appropriate metrics for model quality.
When measuring the performance of different models, DL might score highly for predictive accuracy but low for explainability, whereas decision trees score lower on accuracy but are comparatively easy to explain.
Many existing systems within large organisations are already at a level of complexity beyond the immediate ability of a single person to understand. With the addition of AI, this challenge will only grow.
Even in a relatively explainable model, the outputs may still be too complicated to explain easily and therefore it will be necessary to provide user interfaces and effective data representation techniques to allow additional insight. This will require new skills and creative techniques to illuminate the biases and decision logic embedded within these models. This will range from summarised reporting dashboards, right down to deep explanation and traceability of the decision process.
Ultimately, regulators will have to provide guidance on the fair use of data, allowing organisations to ensure their models are fit for purpose...