Quantifying one of the largest problems in banking.
Unless you have an extremely unusual financial grounding, making a transaction of any significant worth means taking out a loan. Banks are the institutions most often looked to for these funds, but those suffering in extreme cases of financial instability are pushed toward risky, sometimes dangerous alternatives.
Many in these situations find themselves completely unable to pay these loans back in a timely manner, and interest causes the balance to grow uncontrollably.
Far too many financial and loan-giving institutions set standardized interest rates based simply on general demographic information and the amount funded. For the most optimized results, demographic data should be set slightly aside in favor of valuable quantitative information. While this may seem counterintuitive (quantifying a process that should be nearly completely merit-based), it has been proven time and time again that computational analysis often defeats human intuition.
It seems quite simple. There is an obvious problem, so there should be an obvious solution, right? Wrong.
The issue with problems like these is that every situation is different. It is incredibly hard to build an overarching optimization model from a single data point. But combine one generalized data point with others, and accuracy rises tremendously. With this understanding in mind, I created a model that is fundamentally reliant upon this principle.
The Model Used in this Specific Circumstance
Although alternative models could be used, they are in some sense unnecessary here, since the data does not need to be heavily altered before analysis. In this case, the job can be done with a simple reinforcement learning architecture or linear regression. I chose the latter.
It seemed to be an obvious choice in this instance given the makeup of the dataset.
Before I dive into my model in particular, I would like to tell you about the fundamentals of linear regression.
What is it?
Well, recall the concept of a line of best fit from your middle school math class; at a very basic level, linear regression models do exactly this. If your memory is currently failing you, I'll remind you of the fundamentals of this elementary concept.
Nearly every graph is simply a set of data points plotted against the axes that make up that visual model. A line of best fit is the line that comes closest to all of the points at once (formally, the line that minimizes the total squared distance to them). This line, conveniently named, gives the best prediction for where data points would fall at all of the missing values of the graph.
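To make that concrete, here is a minimal sketch of fitting a line of best fit with NumPy, using made-up (x, y) points that follow a roughly linear trend:

```python
import numpy as np

# Hypothetical data points with a roughly linear trend (y is about 2x).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares fit of a degree-1 polynomial: y ≈ slope * x + intercept.
slope, intercept = np.polyfit(x, y, 1)

# The fitted line can then predict y at an x the graph is missing.
predicted = slope * 6.0 + intercept
```

The fitted slope lands near 2, and the line then fills in a prediction for x = 6, which was not in the original data.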
The ability to separate and define data is powerful. The fitted line can be used as a line of separation, splitting combinations of data apart and defining similar combinations as distinct groups.
In conventional graphing models, there is a y for every x: two values per point, and a very easily visualized separation. This model, however, is not limited to two dimensions. Instead of having simply calories on the x-axis and sodium on the y, imagine three additional features per point. The information value given by the line of best fit rises tremendously. This five-dimensional data separation is nearly impossible for a human to visualize, let alone create. For computers, however, this task becomes orders of magnitude simpler.
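A higher-dimensional fit looks the same in code, just with more columns. The sketch below uses NumPy's least-squares solver on made-up rows of five features each; the targets are generated from a known linear rule so the fit can be sanity-checked:

```python
import numpy as np

# Hypothetical rows of five features each (e.g. calories, sodium, and
# three more nutritional measurements per item).
X = np.array([
    [1.0, 2.0, 0.5, 3.0, 1.0],
    [2.0, 1.0, 1.8, 2.0, 0.0],
    [3.0, 4.0, 2.2, 1.0, 1.0],
    [4.0, 3.0, 3.9, 0.0, 0.0],
    [5.0, 5.0, 4.1, 2.0, 1.0],
    [6.0, 4.0, 5.0, 1.0, 0.0],
])
# Targets built from a known linear rule plus a constant, purely for
# illustration.
y = X @ np.array([1.0, 0.5, -1.0, 2.0, 0.0]) + 3.0

# Append a column of ones so the solver also fits the intercept.
A = np.hstack([X, np.ones((X.shape[0], 1))])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
```

The computer handles five dimensions exactly as it handles two; only the shape of the arrays changes.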
The applications of this modeling technique have been, and will continue to be, numerous.
Take the example of college admissions. Imagine that a collection of data points is amassed from each student who has applied to X University in the past five years. Their grades, standardized test scores, number of extracurriculars, merit score, and final admissions decision are all collected and plotted for analysis. Using the method of linear regression, the data would ultimately be separated such that the computer could determine whether or not a certain combination of the above data points would result in acceptance or rejection.
If made open source, imagine the benefit it would bring to high school students in their stressful college admissions process.
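The admissions example above can be sketched directly: fit a regression to the 0/1 admissions decisions and threshold the output at 0.5 to separate the two groups. All records and labels below are made up for illustration:

```python
import numpy as np

# Hypothetical applicant records:
# [GPA, test score percentile, extracurriculars, merit score]
X = np.array([
    [3.9, 0.95, 5, 8],
    [3.8, 0.90, 4, 7],
    [3.7, 0.88, 6, 9],
    [2.9, 0.55, 1, 3],
    [3.0, 0.60, 2, 4],
    [2.5, 0.40, 0, 2],
])
# Final decisions: 1 = accepted, 0 = rejected (made-up labels).
admitted = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])

# Fit a linear regression to the decisions (ones column = intercept).
A = np.hstack([X, np.ones((X.shape[0], 1))])
coeffs, *_ = np.linalg.lstsq(A, admitted, rcond=None)

def predict(applicant):
    """Regression output above 0.5 falls on the 'accepted' side of the line."""
    return (np.append(applicant, 1.0) @ coeffs) >= 0.5
```

The fitted line acts as the separator: combinations of data points resembling past acceptances land above the threshold, and those resembling rejections land below it.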
Using this process, the computer builds an expert-grade, intuitive understanding of data across a variety of circumstances.
Optimizing Interest Rates Using the Open Source Lending Club Dataset
As mentioned above, it is difficult to quantify this practically merit-based process, but it becomes possible with one universal positive data point (loans paid back in full) supplemented by additional data points for further accuracy.
To do this, I used the Lending Club dataset available on Kaggle. I extracted the most fundamental numeric data points (loan amount, interest rate, installment, and annual income) so as to cancel subjectivity in valuing fields like the borrower's employment title. I then created a new CSV using exclusively these columns. Going forward, it may be beneficial to assign values to the various zip codes on the map to allow for quantitative analysis of diversity and economic struggle in certain regions.
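The extraction step can be sketched with pandas. The tiny DataFrame below stands in for the Kaggle file (which would normally be loaded with pd.read_csv); the column names follow the Lending Club schema but should be verified against the actual download:

```python
import pandas as pd

# Stand-in for the Kaggle Lending Club export; in practice this would be
# raw = pd.read_csv("loan.csv"). Column names assume the Lending Club schema.
raw = pd.DataFrame({
    "loan_amnt": [10000, 25000, 5000],
    "int_rate": ["13.56%", "10.99%", "18.25%"],
    "installment": [339.31, 820.28, 181.43],
    "annual_inc": [55000.0, 92000.0, 31000.0],
    "emp_title": ["Teacher", "Engineer", "Cashier"],  # subjective; dropped
})

# Keep only the fundamental numeric columns and write the new CSV.
numeric_cols = ["loan_amnt", "int_rate", "installment", "annual_inc"]
subset = raw[numeric_cols]
subset.to_csv("loans_numeric.csv", index=False)
```

Subjective fields such as the employment title never make it into the new file, so nothing downstream depends on a human's valuation of them.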
Once the new CSV was in place, it was necessary to perform an object-to-float conversion. While all the inputs were numeric, some of the columns were still technically stored as objects, inhibiting the execution of the model. In order to perform linear regression, all data points must be in a true numeric form. This, of course, keeps the data unbiased, which cannot be said when value is determined through personal analysis.
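A minimal sketch of that conversion, assuming the Lending Club convention that interest rates arrive as strings like "13.56%":

```python
import pandas as pd

# Columns read back from a CSV can arrive as object dtype even when the
# values are numbers; int_rate in particular is a string with a % sign.
df = pd.DataFrame({
    "loan_amnt": ["10000", "25000", "5000"],      # object dtype
    "int_rate": ["13.56%", "10.99%", "18.25%"],   # object dtype
})

# Convert each object column to a true numeric dtype.
df["loan_amnt"] = pd.to_numeric(df["loan_amnt"])
df["int_rate"] = pd.to_numeric(df["int_rate"].str.rstrip("%"))
```

After this step every column is a genuine numeric type, and the regression can run without type errors.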
Impact of the Technology
With the ability to calculate an optimized interest rate, customers of banks and financial institutions can finally fact-check and fight for themselves. No longer must we rely upon a centralized power to decide the value of our dollar. The people, not institutions, are now able to decipher what an optimized interest rate looks like in their specific circumstance. This is loan giving reimagined. This is the future of banking.