Valuatum’s system generates bankruptcy risk estimates instantly and accurately
Valuatum uses state-of-the-art machine learning algorithms for maximum accuracy
The model assigns weights to explanatory variables dynamically, improving performance compared to traditional models
We break the bankruptcy risk into components to increase transparency and users’ understanding
BASICS OF VALUATUM MACHINE LEARNING MODEL
Machine learning models are the cornerstone of bankruptcy risk estimation in the Valuatum system. Valuatum has studied several machine learning algorithms and chose the most accurate one, a gradient boosting model, to estimate bankruptcy risk. Our model is developed with the XGBoost (eXtreme Gradient Boosting) library. Information about the other machine learning algorithms we have studied is collected here.
Historical financial data fetched from Valuatum database or inputted manually
Future financial estimates generated automatically or inputted manually
Valuatum uses a gradient boosting model for maximum accuracy in bankruptcy risk estimation
The model is trained with data from nearly 200 000 Finnish companies
The model estimates the bankruptcy risk using dozens of variables including industry-specific bankruptcy risk
Based on the input data, the model classifies the case company and calculates its bankruptcy risk
Credit rating and interest rate can be derived automatically from risk values
Bankruptcy risk is broken into components to see which factors have the most significant effect on the risk
Choosing the input variables for the bankruptcy risk model
Model inputs are the data fed into the model for bankruptcy risk estimation. They consist mainly of data derived from financial statements, as well as some industry-specific data. Valuatum’s gradient boosting model uses approximately 30 different explanatory variables, including an industry-specific bankruptcy risk calculated from actual Finnish bankruptcy data. Most of the company-specific variables can be divided into four main categories, each of which measures a different characteristic of a company.
A company’s solvency is one of the most important factors to consider when estimating bankruptcy risk. While it is usually optimal for companies to take on some debt, low solvency weakens a company’s ability to survive external or internal shocks, thereby increasing its bankruptcy risk. We capture the impact of solvency on bankruptcy risk through, for example, the equity ratio and gearing.
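To make the two solvency measures concrete, here is a minimal sketch of commonly used definitions. The field names are hypothetical, and the article does not specify Valuatum's exact formulas. One common Finnish-style definition subtracts advances received from total assets in the equity ratio, and gearing is net interest-bearing debt relative to equity.

```python
from dataclasses import dataclass


@dataclass
class BalanceSheet:
    """Hypothetical balance-sheet fields; names are illustrative only."""
    total_assets: float
    equity: float
    interest_bearing_debt: float
    cash: float
    advances_received: float = 0.0


def equity_ratio_pct(bs: BalanceSheet) -> float:
    """Equity ratio (%): equity / (total assets - advances received)."""
    return 100.0 * bs.equity / (bs.total_assets - bs.advances_received)


def gearing_pct(bs: BalanceSheet) -> float:
    """Gearing (%): (interest-bearing debt - cash) / equity.

    Not meaningful when equity is zero or negative; a real pipeline
    would have to handle that case separately."""
    return 100.0 * (bs.interest_bearing_debt - bs.cash) / bs.equity


bs = BalanceSheet(total_assets=1000.0, equity=400.0, interest_bearing_debt=450.0, cash=50.0)
print(equity_ratio_pct(bs))  # 40.0
print(gearing_pct(bs))       # 100.0
```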
While profitability at a single point in time does not say much about a company, its trend over time is a fairly good indicator of the validity of the business model. Companies with upward-trending profitability are more likely to survive in the long run than their counterparts, even if their absolute profitability is currently negative. Figures such as ROA-%, ROI-% and net earnings-% are used to estimate profitability in the bankruptcy risk estimation.
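A profitability trend can be summarized, for instance, as the least-squares slope of a ratio over consecutive years. The sketch below uses a simplified ROA definition (net income / total assets) and a plain linear slope. Both are assumptions for illustration; the article does not say how Valuatum encodes trends. The example shows a company whose ROA is still negative but improving.

```python
from typing import Sequence


def roa_pct(net_income: float, total_assets: float) -> float:
    """Simplified ROA (%): net income / total assets (definitions vary)."""
    return 100.0 * net_income / total_assets


def linear_trend(values: Sequence[float]) -> float:
    """Least-squares slope of an evenly spaced series (change per period)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods to estimate a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


net_income = [-40.0, -10.0, 20.0]        # synthetic, three consecutive years
total_assets = [1000.0, 1000.0, 1000.0]
roa_series = [roa_pct(ni, ta) for ni, ta in zip(net_income, total_assets)]
print(roa_series)                # [-4.0, -1.0, 2.0]
print(linear_trend(roa_series))  # 3.0 percentage points per year
```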
Liquidity is perhaps the most obvious characteristic to consider when estimating the creditworthiness of a company, since it gives direct information on the company’s ability to meet its obligations. However, as we will explain later, poor liquidity does not automatically translate into poor creditworthiness, just as good liquidity does not always mean that the company will be able to pay its debts. The impact of liquidity is captured via several key figures, such as the quick ratio and the current ratio.
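For reference, the two liquidity ratios in their textbook form are sketched below. Definitions vary between data providers (some adjust the denominator, for example), so treat these as assumptions rather than Valuatum's exact formulas.

```python
def current_ratio(current_assets: float, current_liabilities: float) -> float:
    """Current assets per unit of current liabilities."""
    return current_assets / current_liabilities


def quick_ratio(current_assets: float, inventories: float, current_liabilities: float) -> float:
    """Like the current ratio, but excludes inventories, which are slower to turn into cash."""
    return (current_assets - inventories) / current_liabilities


print(current_ratio(300.0, 200.0))       # 1.5
print(quick_ratio(300.0, 120.0, 200.0))  # 0.9
```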
Some industries are riskier than others. This may be due to, among other things, higher exposure to external shocks or a more competitive environment leading to lower profit margins. Therefore, bankruptcy risk varies a lot between industries, and it is important to factor this in when estimating the bankruptcy risk of a specific company. A good example of a high-risk industry is construction, while companies offering legal services, for example, are generally much less likely to go bankrupt. We currently use industry risks calculated from several years of Finnish bankruptcy data and are planning to include data from other countries as well. If an industry does not have enough data available, we use the average bankruptcy risk instead.
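The industry-risk idea, including the fallback to the average when an industry has too little data, can be sketched as a simple bankruptcy rate per industry. The data are synthetic, and the minimum sample size `min_companies` is a hypothetical parameter. The codes "F" and "M69" imitate NACE codes for construction and legal/accounting services, purely for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def industry_risks(
    observations: Iterable[Tuple[str, bool]], min_companies: int = 100
) -> Tuple[Dict[str, float], float]:
    """Return (bankruptcy rate per industry, overall rate).

    Industries with fewer than `min_companies` observations get the overall rate."""
    counts = defaultdict(lambda: [0, 0])  # industry -> [bankruptcies, companies]
    for industry, went_bankrupt in observations:
        counts[industry][0] += int(went_bankrupt)
        counts[industry][1] += 1
    total_bankrupt = sum(b for b, _ in counts.values())
    total = sum(n for _, n in counts.values())
    overall = total_bankrupt / total
    risks = {
        industry: (b / n if n >= min_companies else overall)
        for industry, (b, n) in counts.items()
    }
    return risks, overall


def lookup_industry_risk(industry: str, risks: Dict[str, float], overall: float) -> float:
    """Industries never seen in the data also fall back to the overall rate."""
    return risks.get(industry, overall)


# Synthetic data: 8/200 construction firms and 1/200 legal firms went bankrupt;
# industry "Z" has only 10 companies, too few for its own estimate.
obs = ([("F", i < 8) for i in range(200)]
       + [("M69", i < 1) for i in range(200)]
       + [("Z", i < 3) for i in range(10)])
risks, overall = industry_risks(obs)
print(risks["F"], risks["M69"])  # 0.04 0.005
```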
Size matters. Large size typically indicates past success and future potential. Moreover, larger companies are likely to have better resources to help them survive external shocks. As with profitability, the trend, in this case the growth rate, may be even more important than the absolute value. Company size is estimated using, for example, total assets and net sales, while the growth rate is captured via net sales growth %.
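Net sales growth % is a year-over-year ratio. A minimal sketch, assuming consecutive yearly figures, follows. Growth from a zero or negative base is undefined, so the sketch rejects it rather than guessing.

```python
from typing import List, Sequence


def yearly_growth_pct(net_sales: Sequence[float]) -> List[float]:
    """Year-over-year net sales growth (%) for consecutive years."""
    growth = []
    for previous, current in zip(net_sales, net_sales[1:]):
        if previous <= 0:
            raise ValueError("growth % is undefined for non-positive base-year sales")
        growth.append(100.0 * (current - previous) / previous)
    return growth


print(yearly_growth_pct([800.0, 1000.0, 1200.0]))  # [25.0, 20.0]
```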
Bankruptcy risk model development
Developing a machine learning model for bankruptcy risk estimation is a two-part process: first we train the model with massive amounts of data, and then we validate its performance by comparing the calculated risk values to real-life observations.
Training the model
Once the input variables have been chosen, a dataset with hundreds of thousands of data points from different companies is provided to the machine learning algorithm. The dataset contains all relevant explanatory variables, as well as information on whether each company went bankrupt in the following years. The algorithm then crunches the data and tries to find links between the input values and the bankruptcy information. Once the model is complete, we can estimate any single company’s bankruptcy probability by giving the corresponding input data to the model.
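The "went bankrupt in the following years" target has to be attached to each financial statement before training. Below is a minimal sketch of one way to do this. It assumes the two-year horizon the article mentions for the model output; the function name and the exact rule are hypothetical.

```python
from typing import Dict, Iterable, Optional

HORIZON_YEARS = 2  # the model output is a probability of bankruptcy within two years


def make_labels(statement_years: Iterable[int], bankruptcy_year: Optional[int],
                horizon: int = HORIZON_YEARS) -> Dict[int, int]:
    """Label each financial-statement year 1 if the company went bankrupt
    within `horizon` years after it, else 0 (hypothetical labelling rule)."""
    return {
        year: int(bankruptcy_year is not None and year < bankruptcy_year <= year + horizon)
        for year in statement_years
    }


print(make_labels([2010, 2011, 2012], bankruptcy_year=2013))  # {2010: 0, 2011: 1, 2012: 1}
print(make_labels([2010, 2011], bankruptcy_year=None))         # {2010: 0, 2011: 0}
```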
It is important to note, however, that the machine learning algorithm does not produce a single fixed-weight equation like, e.g., logistic regression does. Instead, it is able to adjust the weights of individual parameters on a case-by-case basis, depending on the values of other parameters. For example, the impact of liquidity is much higher for companies experiencing losses than for those that are highly profitable.
Gradient boosting is a machine learning technique for regression and classification problems that has recently dominated applied machine learning competitions. Models built using gradient boosting are ensemble models, meaning that several models are built and combined at the end to form the final prediction. In gradient boosting, these models are built sequentially, so that each new model learns from the mistakes of the previous ones. This feature is called boosting. Each model in a boosting process is called a weak model. While each individual model may indeed be weak, the idea is that together they form a strong and accurate model.
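To show the sequential "learn from the previous mistakes" loop concretely, here is a minimal from-scratch sketch of gradient boosting for a bankrupt / not-bankrupt target with log-loss. It is a simplified illustration, not XGBoost and not Valuatum's model. Each weak model is a single-split tree (a stump) fitted to the residuals `y - p`, which are the negative gradient of the log-loss. The leaf values are plain gradient averages, whereas XGBoost uses second-order information, regularization and deeper trees (its `max_depth` defaults to 6). Because each stump looks at only one variable, this toy ensemble cannot capture interactions such as the liquidity/profitability one; deeper trees are what make that possible. The data are synthetic.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Fit a depth-1 regression tree (one split) to the residuals by least squares.

    Returns (feature_index, threshold, left_value, right_value);
    rows with row[feature_index] <= threshold go left."""
    mean_all = sum(residuals) / len(residuals)
    best = (float("inf"), 0, float("inf"), mean_all, mean_all)  # fallback: no split
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if sse < best[0]:
                best = (sse, j, threshold, lv, rv)
    return best[1:]


def predict_stump(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


class TinyGradientBoosting:
    """First-order gradient boosting for binary log-loss with decision stumps."""

    def __init__(self, n_rounds=100, learning_rate=0.1):
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate

    def fit(self, X, y):
        p = sum(y) / len(y)
        if not 0.0 < p < 1.0:
            raise ValueError("training data must contain both classes")
        self.base_score = math.log(p / (1.0 - p))  # start from the prior log-odds
        self.stumps = []
        F = [self.base_score] * len(y)  # current log-odds for each training row
        for _ in range(self.n_rounds):
            # The "mistakes" of the ensemble so far: y - p for every row.
            residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
            stump = fit_stump(X, residuals)
            self.stumps.append(stump)
            F = [fi + self.learning_rate * predict_stump(stump, row) for fi, row in zip(F, X)]
        return self

    def predict_proba(self, row):
        z = self.base_score + sum(self.learning_rate * predict_stump(s, row) for s in self.stumps)
        return sigmoid(z)


# Synthetic toy data: [quick_ratio, net_earnings_pct]; 1 = went bankrupt.
X_train = [[0.2, -5.0], [0.3, -2.0], [0.4, -8.0],
           [1.5, 3.0], [2.0, 5.0], [1.8, -1.0], [0.5, 6.0], [1.2, 8.0]]
y_train = [1, 1, 1, 0, 0, 0, 0, 0]
model = TinyGradientBoosting().fit(X_train, y_train)
print(round(model.predict_proba([0.25, -3.0]), 3), round(model.predict_proba([1.6, 4.0]), 3))
```

The small learning rate shrinks each stump's contribution, so many weak models are needed. This is the usual trade-off between training time and overfitting in boosting.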
To develop our bankruptcy prediction model, Valuatum uses XGBoost (eXtreme Gradient Boosting), a gradient boosting algorithm that uses decision trees as weak models. As a result, we get a number of decision trees whose outputs are added together. The biggest advantages of XGBoost over other gradient boosting methods are execution speed and model performance. XGBoost is also well suited for classification problems such as bankruptcy risk prediction.
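With a logistic objective (XGBoost's `binary:logistic`), the added-together tree outputs form a log-odds score, which the sigmoid function turns into a probability. The sketch below uses two hand-written depth-2 trees. The variable names, leaf values and base margin are all made up, and a trained model contains far more trees. The example also shows the case-by-case weighting described earlier: moving the quick ratio from 0.3 to 1.2 changes the log-odds by 1.5 for a loss-making company but only by 0.2 for a profitable one.

```python
import math


# Hypothetical depth-2 trees; leaf values are made-up log-odds contributions.
def tree_1(x):
    if x["net_earnings_pct"] < 0:  # loss-making: liquidity matters a lot
        return 1.2 if x["quick_ratio"] < 0.5 else -0.3
    # profitable: liquidity changes the output much less
    return -0.4 if x["quick_ratio"] < 0.5 else -0.6


def tree_2(x):
    return 0.8 if x["equity_ratio_pct"] < 10 else -0.2


BASE_MARGIN = -3.0  # hypothetical starting log-odds
TREES = (tree_1, tree_2)


def log_odds(x):
    """Sum the outputs of all trees on top of the base margin."""
    return BASE_MARGIN + sum(tree(x) for tree in TREES)


def bankruptcy_probability(x):
    """The sigmoid link turns the summed log-odds into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds(x)))


loss_low_liq = {"net_earnings_pct": -5.0, "quick_ratio": 0.3, "equity_ratio_pct": 25.0}
loss_high_liq = {**loss_low_liq, "quick_ratio": 1.2}
print(round(bankruptcy_probability(loss_low_liq), 3))   # 0.119
print(round(bankruptcy_probability(loss_high_liq), 3))  # 0.029
```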
Validating the performance
Once model training is complete, we test the accuracy of the model with real-life financial and bankruptcy data. Valuatum’s model was originally trained with Finnish financial statement and bankruptcy data from the years 2010-2012. We then validate the model by giving it financial data from another year, e.g., 2014, which the model uses to calculate the bankruptcy risk for the following years. After that, we can compare the calculated risk levels to the actual bankruptcies that happened during that period. Additionally, we can validate the model’s performance quantitatively using, for example, Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves. You can read more about model validation in our whitepaper.
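The two quantitative checks mentioned above can be sketched from their definitions. The area under the ROC curve equals the probability that a randomly chosen bankrupt company receives a higher risk score than a randomly chosen surviving one, with ties counting half. Precision and recall are computed at a chosen risk threshold. The data are synthetic. The pairwise loop is O(n²) and only suits small examples; libraries such as scikit-learn provide efficient `roc_auc_score` and `precision_recall_curve`.

```python
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via its pairwise (Mann-Whitney) interpretation."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("need both bankrupt and non-bankrupt companies")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def precision_recall_at(y_true: Sequence[int], scores: Sequence[float],
                        threshold: float) -> Tuple[float, float]:
    """Precision and recall when companies with score >= threshold are flagged."""
    tp = sum(1 for s, t in zip(scores, y_true) if s >= threshold and t == 1)
    fp = sum(1 for s, t in zip(scores, y_true) if s >= threshold and t == 0)
    fn = sum(1 for s, t in zip(scores, y_true) if s < threshold and t == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention when nothing is flagged; tools differ
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Synthetic example: 1 = went bankrupt within the horizon, scores = predicted risk.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.35, 0.4, 0.1, 0.8]
print(roc_auc(y_true, scores))                   # 0.8888888888888888
print(precision_recall_at(y_true, scores, 0.5))  # (1.0, 0.6666666666666666)
```

Evaluating `precision_recall_at` over a sweep of thresholds traces out the PR curve.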
After a careful training process, we have been able to achieve accuracy levels close to or above those reported in academic research.
Bankruptcy risk model output
The model calculates the probability that the company will face bankruptcy within the next two years. Bankruptcy risk is calculated for each year individually, making it possible for users to observe how the bankruptcy risk has developed in the past. The risk value is also calculated for future periods, allowing the user to see how their financial estimates affect the company’s bankruptcy risk.
Bankruptcy risk and classification
The most important output figure is naturally the bankruptcy risk, which gives the probability of a specific company going bankrupt over the following two years. However, due to the economic cycle and other external factors, the bankruptcy risk percentage alone may not always be the best indicator of actual risk. Therefore, we also show, both graphically and numerically, each company’s risk relative to all other companies.
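One simple way to express a company's risk relative to all other companies is a percentile rank: the share of companies in the comparison population with a lower risk, counting ties as half. This is a hypothetical sketch; the article does not say how Valuatum computes its relative figure.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def relative_risk_percentile(risk: float, all_risks: Sequence[float]) -> float:
    """Share (%) of companies with a lower risk than `risk`; ties count as half."""
    ordered = sorted(all_risks)
    below = bisect_left(ordered, risk)
    equal = bisect_right(ordered, risk) - below
    return 100.0 * (below + 0.5 * equal) / len(ordered)


population = [0.5, 1.0, 2.0, 4.0, 8.0]  # synthetic two-year risks (%)
print(relative_risk_percentile(2.0, population))  # 50.0
print(relative_risk_percentile(8.0, population))  # 90.0
```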
Based on the bankruptcy risk, we can create customized classification models according to customers’ specifications. For example, we can divide companies into credit rating classes, such as AAA, AA, A, BBB, etc., or give them a numeric rating based on their bankruptcy risk, either relative or absolute. It is also possible to automatically determine a suitable interest rate based on bankruptcy risk. All these classification models can also include other variables in addition to bankruptcy risk.
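A minimal sketch of both ideas follows. All class boundaries and pricing parameters are hypothetical, since the article leaves them to customer specifications. The rating is a lookup of the two-year risk against upper bounds. The interest rate uses one common pricing approach: a base rate plus a margin plus the annual expected loss (annual bankruptcy probability times a loss-given-default assumption). The annual probability is derived from the two-year figure by assuming the same risk in both years.

```python
import math
from bisect import bisect_right

# Hypothetical class boundaries: a company gets the first class whose upper
# bound (two-year bankruptcy probability) its risk falls below.
RATING_BOUNDS = [0.001, 0.0025, 0.005, 0.01, 0.02, 0.05, 0.10]
RATING_CLASSES = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC", "C"]


def credit_rating(two_year_risk: float) -> str:
    return RATING_CLASSES[bisect_right(RATING_BOUNDS, two_year_risk)]


def interest_rate(two_year_risk: float, base_rate: float = 0.03, margin: float = 0.01,
                  loss_given_default: float = 0.6) -> float:
    """Hypothetical pricing: base rate + margin + annual expected loss."""
    # (1 - annual_pd) ** 2 == 1 - two_year_risk under a constant yearly risk
    annual_pd = 1.0 - math.sqrt(1.0 - two_year_risk)
    return base_rate + margin + annual_pd * loss_given_default


print(credit_rating(0.015))             # BB
print(round(interest_rate(0.0199), 4))  # 0.046
```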
Bankruptcy risk components
One of the downsides of machine learning algorithms is that they often act as a black box, meaning that it is not apparent to the user what actually happens inside the model. With logistic regression models, for example, we always know the impact of each explanatory variable on the final risk value, since the weights are constant. With machine learning algorithms, however, the weights change on a case-by-case basis, so things are not as simple. We are nevertheless able to break the bankruptcy risk into components using a game theory concept called the Shapley value.
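The Shapley value assigns each variable its average marginal contribution over all orders in which variables could be "switched on". In the sketch below, a variable that is switched off takes its value from a reference company. The contributions always add up to the difference between the company's score and the reference score. This brute-force version is exponential in the number of variables and only illustrates the definition. The toy model, variable names and reference values are hypothetical, and we do not know which Shapley implementation Valuatum uses; XGBoost itself can return per-feature contributions via `Booster.predict(..., pred_contribs=True)`. In the toy model, weak liquidity and losses only raise the score together, and the Shapley value splits that joint effect equally between the two variables.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(f: Callable[[List[float]], float], x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x. A variable absent from a coalition takes
    its baseline value. Cost grows as 2**n, so this suits only small n."""
    n = len(x)

    def value(coalition):
        return f([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_log_odds(z):
    """Hypothetical model with an interaction between liquidity and profitability."""
    quick_ratio, net_earnings_pct, equity_ratio_pct = z
    score = -3.0 - 0.03 * equity_ratio_pct
    if net_earnings_pct < 0 and quick_ratio < 0.5:
        score += 2.0
    return score


company = [0.3, -4.0, 20.0]    # [quick_ratio, net_earnings_pct, equity_ratio_pct]
reference = [1.0, 5.0, 30.0]   # hypothetical "typical" company
phi = shapley_values(toy_log_odds, company, reference)
print([round(v, 3) for v in phi])  # [1.0, 1.0, 0.3]
```

Here the components are in log-odds units. The same decomposition could be applied on the probability scale instead, and the two views are not interchangeable.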
Benefits compared to traditional methods
The biggest advantage of machine learning models over traditional bankruptcy risk estimation methods, such as logistic regression models, is their ability to adapt to different scenarios. While logistic regression models always assign the same static weight to each variable included in the model, machine learning algorithms are able to recognize the most important factors in each situation and assign weights accordingly.
For example, when estimating the risk of a company in dire financial straits, liquidity is given more emphasis. On the other hand, for highly profitable and financially sound companies, liquidity is less important. But even then, our models do not blindly assign a good rating based on seemingly good financial ratios. Rather, they dig deeper and often pick up on things that static models would miss. A good example of this is a company that is profitable and has relatively low gearing, but has a long receivables turnover time and whose sales receivables account for a large portion of total assets.
While static models would likely give this company a good rating based on, e.g., its good profitability and gearing, a machine learning algorithm is able to recognize that a high receivables-to-assets ratio and a long receivables turnover time might be signs of worthless receivables, and that the seemingly good financial situation (both profitability and solvency) might therefore be misleading. Modern machine learning methodology would assign this kind of case a poor credit rating despite good profitability and low gearing, while traditional logistic regression models would probably assign it a good credit rating unless the financial figures were radically corrected by hand.
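The two warning signs in this example are ordinary ratios, sketched below. The `receivables_warning` function and its thresholds are a hypothetical fixed rule of thumb, included only to make the ratios concrete. It is not how the model works. A boosted tree model learns such thresholds from training data, along with how they combine with profitability and gearing.

```python
def receivables_turnover_days(sales_receivables: float, net_sales: float, days: int = 365) -> float:
    """Average collection time in days: receivables / net sales * days."""
    return days * sales_receivables / net_sales


def receivables_to_assets_pct(sales_receivables: float, total_assets: float) -> float:
    """Share of total assets tied up in sales receivables (%)."""
    return 100.0 * sales_receivables / total_assets


def receivables_warning(days: float, share_pct: float,
                        max_days: float = 90.0, max_share_pct: float = 40.0) -> bool:
    """Hypothetical fixed rule of thumb with made-up thresholds, NOT the model's logic."""
    return days > max_days and share_pct > max_share_pct


days = receivables_turnover_days(500.0, 1000.0)
share = receivables_to_assets_pct(500.0, 800.0)
print(days, share, receivables_warning(days, share))  # 182.5 62.5 True
```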