

Predicting mortgage loan delinquency status with neural networks

Abstract. Monitoring loan performance and identifying high-risk consumers early helps prevent loan defaults and is of interest to many banks and investors. For banks especially, timely monitoring supports regulatory compliance and helps them quantify their risk adequately, calculate their capital accurately and set aside proper reserves. With this goal in mind, four multi-classification models have been built that take as input the characteristics of a loan at inception, together with information about the first 12 monthly payments, and predict the status of payments over the following 12-month period.

Ksenia Ponomareva, Paul Epstein and David Knight 14 February 2019

Introduction and motivation

There is an extensive literature on the application of machine learning algorithms to credit scoring, where a model determines the relationship between default and loan characteristics. Once built, this model is used to predict the probability of the debt being repaid by the obligor. Reviews and comparisons of these types of models are provided in [1]-[3], with [4] focusing on assessing mortgage risk using a deep net and [5] using a convolutional neural network.

As can be seen from these papers, most publications on consumer lending cover binary classification, where a loan either defaults or does not. Sometimes severe delinquency, defined as having payments delayed by six months or more, is used as an alternative to default. There are several reasons for considering a multi-classification problem instead. Firstly, a lot of information is lost if only default and non-default states are taken into account. For example, a loan with no missed payments has a different creditworthiness profile to a loan where payments were late by only one month, but on several occasions during the initial term of the loan. Slightly late payments (between 30 and 180 days late) can be a sign of the obligor's declining financial health. However, since such a loan has not technically defaulted, a binary classification will mark it as "good", and it will remain unnoticed until the delinquency status deteriorates significantly. Differentiating between the frequency of late payments and the magnitude of the delay would allow more accurate tailoring of credit terms and conditions, and would enable early-warning signal detection, since borrowers are likely to transition through different credit profiles throughout the life of the loan.
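The information loss described above can be illustrated with a minimal Python sketch. It labels a 12-month payment history twice: once with a binary "good"/"bad" flag, and once with a finer status. The 180-day threshold follows the definition of severe delinquency given in the text. The four class names and the rule that separates occasional from repeated lateness are hypothetical, chosen only for illustration, and are not the classes used in this paper.

```python
SEVERE_DAYS = 180  # severe delinquency: payments delayed by six months or more
LATE_DAYS = 30     # a payment counts as late from 30 days past due


def binary_label(days_past_due):
    """Binary view: 'bad' only if some payment is severely delinquent."""
    return "bad" if max(days_past_due) >= SEVERE_DAYS else "good"


def multiclass_label(days_past_due):
    """Hypothetical four-way status using both delay size and frequency."""
    worst = max(days_past_due)
    late_count = sum(1 for d in days_past_due if d >= LATE_DAYS)
    if worst >= SEVERE_DAYS:
        return "severely_delinquent"
    if late_count == 0:
        return "current"
    if late_count == 1:
        return "occasionally_late"
    return "repeatedly_late"


# Days past due for each of 12 monthly payments (made-up histories).
never_late = [0] * 12
often_late = [0, 30, 0, 30, 0, 0, 30, 0, 0, 0, 30, 0]
deteriorating = [0, 0, 30, 60, 90, 120, 150, 180, 180, 180, 180, 180]

for name, history in [("never_late", never_late),
                      ("often_late", often_late),
                      ("deteriorating", deteriorating)]:
    print(name, binary_label(history), multiclass_label(history))
```

Under these assumptions the binary label is "good" for both the loan that was never late and the loan that was late four times, while the finer labelling separates them.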

The aim of this paper is to investigate the technical feasibility of predicting several classes, or statuses, of near-term loan delinquency and payment behaviour by considering a number of multi-classification models. Residential mortgage loan data has been used in this paper, but the model architectures described here can be applied to other types of loans as well, provided a suitable dataset is available for training.

The rest of the paper is organised as follows. Section 2 provides details on the construction of the final dataset and feature analysis. Section 3 specifies model architectures and provides high-level implementation details. Section 4 provides summary and analysis of results. Conclusions are drawn in Section 5.

(To continue reading please open the PDF link below)