Introduction and motivation
There is an extensive literature on the application of machine learning algorithms to credit scoring, where a model learns the relationship between loan characteristics and default. Once built, the model is used to predict the probability that the obligor will repay the debt. Reviews and comparisons of these types of models are provided in -, with  focusing on assessing mortgage risk using a deep net and  using a convolutional neural network.
As can be seen from these papers, most publications on consumer lending applications cover binary classification, where a loan either defaults or does not. Sometimes severe delinquency, defined as payments delayed by six months or more, is used as an alternative to default. There are several reasons for considering a multi-classification problem instead. Firstly, a lot of information is lost if only default and non-default states are taken into account. For example, a loan with no missed payments has a different creditworthiness profile to a loan whose payments were late by only one month, but on several occasions during the initial term of the loan. Slightly late payments (between 30 and 180 days) can be a sign of the declining financial health of the obligor. However, since the loan has not technically defaulted, binary classification marks it as "good", and it remains unnoticed until the delinquency status deteriorates significantly. Distinguishing both the frequency and the magnitude of payment delays would allow more accurate tailoring of credit terms and conditions, and would enable early-warning signal detection, since borrowers are likely to transition through different credit profiles throughout the life of the loan.
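The information loss described above can be illustrated with a minimal sketch. The function names, class names, and cut-offs below are assumptions made for illustration only: they borrow the 30-day and 180-day figures mentioned in the text (treating six months as roughly 180 days) and are not the class definitions used later in the paper.

```python
# Illustrative only: thresholds and class names are assumptions, not the paper's definitions.

def binary_label(days_past_due: int) -> str:
    """Binary scheme: only severe delinquency (about six months or more) counts as 'bad'."""
    return "bad" if days_past_due >= 180 else "good"


def multiclass_label(days_past_due: int) -> str:
    """Hypothetical three-class scheme that separates slightly late loans from current ones."""
    if days_past_due < 30:
        return "current"
    if days_past_due < 180:
        return "late"
    return "severe"


if __name__ == "__main__":
    # Compare the two labelling schemes on a few example delays (in days).
    for days in (0, 45, 150, 200):
        print(days, binary_label(days), multiclass_label(days))
```

Under these assumed cut-offs, loans that are 0, 45 and 150 days past due all receive the same binary label, while the three-class scheme separates the first from the other two.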
The aim of this paper is to investigate the technical feasibility of predicting several classes, or statuses, of near-term loan delinquency and payment behaviour by considering a number of multi-classification models. Residential mortgage loan data is used in this paper, but the model architectures described here can be applied to other types of loans as well, provided a suitable dataset is available for training.
The rest of the paper is organised as follows. Section 2 provides details on the construction of the final dataset and feature analysis. Section 3 specifies the model architectures and provides high-level implementation details. Section 4 provides a summary and analysis of the results. Conclusions are drawn in Section 5.