Jeff Hulett

Resolving Lending Bias - a proposal to improve credit decisions with more accurate credit data

Updated: Oct 16, 2023


“…. most of what goes wrong in systems goes wrong because of biased, late, or missing information.”

- Donella Meadows, a Harvard and MIT-trained scientist and systems researcher


In the world of lending, algorithms drive much of the credit decisioning. This is particularly true when lending to individuals, as with mortgage and consumer lending products. While policy experts set policy cutoffs, credit algorithms perform much of the credit assessment. The most widely used algorithm is the classic FICO credit score.

This article is focused on resolving cultural biases that may exist within traditional credit data and the systems that render and manage credit data. We take a systems-level approach. Our belief is that the path to significant and lasting change runs through system rules and goals. [i] Our implicit system goal is that all people, regardless of past inequities, should receive an unbiased credit assessment.

 

Jeff Hulett and several collaborators authored this article. Jeff’s career includes Financial Services and Consulting related leadership roles in the decision sciences, behavioral modeling, loan operations management, and risk and compliance. Jeff has advanced degrees in Mathematics, Finance, and Economics. Jeff’s journey in the algorithm-enabled financial services world is summarized in the appendix.

 

A color-blind algorithm


There is good news from a lending bias standpoint. Since the algorithm is "color-blind," it will not provide a decision recommendation other than the one rendered by the scorecard. (Please see note [ii] for definitions of "protected class" and "color-blind.") The scorecard is created via a statistical process that fits past characteristic data about borrowers to those borrowers' credit outcomes. Thus, the scorecard is predictive of an outcome like default or delinquency on a loan. By design, the scorecard is the algorithm's instructions for calculating a credit score from our individual credit data.

The data driving the scorecard comes from large, historical U.S. credit data sets provided by one or more of the primary credit bureaus. (Please see note [iii] - aka "CRA" or "Credit Reporting Agencies.") The data is professionally maintained and governed via regulations like the Fair Credit Reporting Act (FCRA). Notwithstanding periodic data breaches, the data from the three largest credit bureaus is widely regarded as one of the highest-quality and deepest data repositories in the world. On its face, credit bureau data seems "color-blind" because the bureaus are limited by Fair Lending regulation to warehousing only attributes associated with credit risk. For example, typical credit attributes, also known as "tradeline data," include:

  • the number of times delinquent,

  • the number of credit inquiries,

  • date opened / closed,

  • balance,

  • payment, etc. [iv]

No data explicitly identifies the borrower's race or other legally protected class information. [v]
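To make the scorecard mechanics concrete, here is a minimal sketch of how a statistical model fit on historical tradeline attributes can be converted into a points-style score. The attribute choices, sample data, and point scaling are hypothetical illustrations, not the actual FICO methodology.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical tradeline attributes for past borrowers (illustrative data only).
# Columns: times delinquent, credit inquiries, months since oldest tradeline, balance-to-limit ratio
X = np.array([
    [0, 1, 120, 0.20],
    [3, 6,  24, 0.95],
    [1, 2,  60, 0.50],
    [0, 0, 200, 0.10],
    [4, 8,  12, 0.99],
    [2, 3,  36, 0.70],
])
# Observed outcome: 1 = serious delinquency / default, 0 = paid as agreed
y = np.array([0, 1, 0, 0, 1, 1])

# Fit the statistical model relating past borrower characteristics to credit outcomes
model = LogisticRegression().fit(X, y)

def score(applicant, base=600, scale=50):
    """Convert the model's log-odds of a bad outcome into a points-style score.
    Higher score means lower predicted risk. The base/scale values are arbitrary."""
    log_odds_bad = model.decision_function([applicant])[0]
    return base - scale * log_odds_bad

new_applicant = [1, 2, 48, 0.40]       # a new applicant's tradeline attributes
print(round(score(new_applicant)))     # the "credit score" the algorithm would report
```

The point is simply that the scorecard is a fixed set of instructions: given the same tradeline inputs, it returns the same score regardless of who the applicant is.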


The use of more sophisticated credit algorithms is increasing. Credit modeling companies like Fair Isaac Corporation (maker of the FICO Score), and banks with a permissible purpose to access the credit data, regularly test Artificial Intelligence and Machine Learning-based algorithms to improve the credit risk-separating ability of the existing algorithms. Banks have been slow to move away from the more standard, transparent, linear-based modeling techniques, such as the classic FICO score. Among other reasons, some banks are slow to change credit scores because of:

  • Fair Lending laws,

  • Consumer notification requirements under laws like the Equal Credit Opportunity Act (ECOA) [vi], and

  • Inertia driven by the fear of changing the credit score; there is comfort with the credit performance of the existing score.

But, competitive pressures are driving banks to use more sophisticated models to improve the algorithm’s credit risk-separating ability. [vii]


In summary:

1) The algorithms, by their nature, are color-blind and regularly used to make individual credit assessments;

2) While there is inertia and other factors slowing the improvement of credit scores, there is also competitive pressure to use even more sophisticated, but less transparent algorithms to further improve the quality of credit assessments.


It seems mostly good news with a few speedbumps for improving credit assessment algorithms. Unfortunately, there is a fly in the ointment that enables bias and reduces credit assessment accuracy.

 

Biased data


While the algorithms are color-blind, there is a significant body of literature suggesting the data used to train the algorithms may be systemically (aka structurally) biased. [viii] So, how is it possible for an algorithm to be color-blind but for the data to be biased? Ultimately, algorithms are just a tool. A very sophisticated tool, but still just a tool. Data, on the other hand, is a representation of past reality. For example, we know the U.S. has a history of racism. This is certainly not a secret, and it persists at some level today. [ix] Since the data used for credit algorithms comes from our recent past, and our recent past contains racism, it follows that the data itself may be systemically biased by race or other protected-class characteristics.


Please note: Systemic bias is more challenging to perceive because it exists at the system level and not necessarily at the individual level. So it is quite possible that the individual lending participants are not biased, but the system in which they participate contains systemic bias as part of the system's rules, habits, and outcomes. That is, the act of a participant following the existing system rules may reinforce systemic bias.


Systemic bias may persist even though banks and non-bank lenders follow Fair Lending, HMDA (Home Mortgage Disclosure Act), FCRA, and related laws intended to alleviate discrimination. The operative questions relate to the direction of the causality arrow connected to poor credit behavior:


1) Did the poor credit behavior of a protected class citizen cause their low credit score?

That is:

Did A → B ? (where A is the poor credit behavior and B is the low credit score)


- OR -


2) Was the poor credit behavior of a protected class citizen caused by systemic bias and a lack of opportunity?

That is:

Was A ← C ? (where C is systemic bias and a lack of opportunity)


The A → B causal direction is a customary belief many people hold but do not rigorously test. In the banking context, it is assumed people make active decisions about whether or not they pay their bills. This is an important assumption because the degree to which people pay bills is directly correlated to the bank's profitability. Credit loss associated with not paying a bill is a bank expense that reduces profitability and may increase credit provisioning and required credit capital.
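As a hedged illustration of why repayment behavior flows straight through to profitability, the standard expected-loss decomposition used in credit provisioning can be sketched as follows; the numbers are invented for illustration.

```python
# Illustrative only: the standard expected-loss decomposition used in credit provisioning.
# Expected loss = PD (probability of default) x LGD (loss given default) x EAD (exposure at default)
pd_rate = 0.04     # probability of default implied by the credit assessment
lgd = 0.60         # share of the exposure lost if the borrower defaults
ead = 15_000       # outstanding balance at default, in dollars

expected_loss = pd_rate * lgd * ead
print(expected_loss)   # 360.0 -> a provision expense that directly reduces bank profit
```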


Causality is essential. The validity of credit algorithms and associated credit assessments depends upon the “A→B” causal direction from statement 1.


Increasingly, systemic bias, statistical model bias, and other factors are calling into question the traditional "loan delinquency is a choice" assumption. There is evidence some people may not have as much choice in making payments as we traditionally believe. In the book Scarcity, the authors argue that scarcity, whether of money, time, or other resources, causes a short-sightedness ("tunneling") that creates a sort of self-fulfilling prophecy in the form of negative feedback loops. These negative feedback loops may cause a failure to meet scarcity-based commitments. Based on the book's thesis, think about how you feel when:

  1. Your work is so busy you feel like you will never get your work done. You may feel tired and short-term focused. You may just want to get to the end of today.

  2. You have been working for 14 hours and you have one more item on your list. If it is not life-threatening, you may want to put it off until tomorrow. You may feel tired and irritable.

That feeling is similar to someone without enough money to make a payment. Except, for the poor, this feeling never goes away.


Then, the questions remain:

  • To what degree does systemic bias exist in the data used for credit algorithms?

  • What if the causality arrow goes in both directions? What are its impacts on the credit assessment?

  • How do we know the degree to which bias impacts credit decisions?

  • Should lenders use different data or algorithms?

  • If the large, government-based buyers and insurers of mortgages* all require credit algorithms exposed to systemic bias, how much influence do lenders even have to use different data or algorithms? (* Collectively, these organizations are the Government-Sponsored Enterprises and Federal Loan Insurers - GSE / FLI - like Freddie Mac, Fannie Mae, Ginnie Mae, FHA, VA, and USDA.)

The following proposals address these questions.

 

Proposals to address Lending Data Bias


Proposal 1: Create a bias-free level data playing field


We may address most of these questions by forming a simple hypothesis, testing that hypothesis, and utilizing the existing credit backstop from the U.S. government.


The simple hypothesis is this: white people, especially white men, on average have historically received more credit opportunities than members of protected classes. This bias-based advantage, while possibly smaller now, still exists today. This proposal proactively tests this hypothesis and provides a solution based on the test outcome.


The proposal is to redevelop the predominant credit-granting algorithm, the classic FICO score. It will be independently redeveloped using two distinct data sets segmented by Fair Housing Act-defined protected classes. One data set will contain no protected classes ("NPC" - only white men) and the other will contain one or more protected classes ("PC" - people of color, women, etc.). The data will be sourced from the credit bureaus, HMDA [x] data, and census data. In effect, there will now be two comparable FICO scores: a "non-protected class (NPC) FICO" and a "protected class (PC) FICO." Once the new FICO scores are complete, apply both to a common performance validation data set. If the NPC and PC FICO scores are identical in terms of separating risk and the credit assessment decision, then there is no bias. Back to the hypothesis: a "no bias" result invalidates the hypothesis and this proposal becomes unnecessary. However, if the result shows lower credit granting and/or higher credit loss attributable to the protected class FICO, then there is systemic bias.
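A minimal sketch of the proposed test follows, assuming the NPC and PC development samples and a shared validation set have already been assembled. The column names, logistic-regression scorecard, AUC-based risk-separation measure, and tolerance threshold are illustrative assumptions, not a prescribed redevelopment methodology.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical attribute names; "bad" = 1 for default / serious delinquency
FEATURES = ["times_delinquent", "inquiries", "months_on_file", "utilization"]

def develop_scorecard(development_sample: pd.DataFrame) -> LogisticRegression:
    """Fit a scorecard-style model on one development segment (NPC or PC)."""
    return LogisticRegression().fit(development_sample[FEATURES], development_sample["bad"])

def compare_risk_separation(npc_sample, pc_sample, validation_sample, tolerance=0.01):
    """Redevelop the score on each segment, then compare risk separation (AUC here)
    on the same validation data set."""
    npc_model = develop_scorecard(npc_sample)
    pc_model = develop_scorecard(pc_sample)
    npc_auc = roc_auc_score(validation_sample["bad"],
                            npc_model.predict_proba(validation_sample[FEATURES])[:, 1])
    pc_auc = roc_auc_score(validation_sample["bad"],
                           pc_model.predict_proba(validation_sample[FEATURES])[:, 1])
    # Effectively identical separation -> the bias hypothesis is not supported;
    # a material gap -> evidence of systemic bias in the underlying data.
    return {"npc_auc": npc_auc, "pc_auc": pc_auc,
            "bias_signal": abs(npc_auc - pc_auc) > tolerance}
```

A fuller comparison would also examine approval rates and credit losses at the policy cutoffs, as the proposal describes.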


A potential criticism of this approach is that the hypothesis test rests on a population subject to selection bias. That is, the NPC modeling segment is based on the "banked" population currently found in the CRAs. It is possible the NPC segment found in the current CRA data is not fully representative of the full NPC population found in the United States. The concern may or may not be true. However, if the two CRA-based populations are found to be different, then there is certainly bias in the current approach. If the differences are insignificant, that still does not rule out bias relative to the broader NPC population.


As a matter of policy, the GSE/FLIs could compel all lenders to utilize only the protected class FICO scorecard for all credit decisions, regardless of the applicant's protected class status. This would facilitate a "de-biased" decision because the scorecard attributes are tuned to those who have not received a social advantage. In effect, it would level the credit-granting playing field with credit algorithms trained on data sets associated with the same (lower) social advantage.


The costs of credit losses associated with these programs are generally borne by the borrowers when the loans are originated. This is accomplished by mortgage insurance premiums (MIP) or an interest rate pass-through associated with loan guarantee fees (G-Fees). As such, there should be little effect on loan salability in the secondary markets. Also, this approach will accelerate the Federal Government's and the GSE / FLIs' social and fairness mandates.


Similarly, banks could also use the protected class FICO for their portfolio credit decisions. This gets a little trickier because the bank may end up with higher credit losses than if it used the traditional credit score. This could be managed similarly to how the FDIC loss share mechanism is used for banks that purchase failed banks. [xi] The banks could make a claim to the U.S. Treasury for the higher credit loss and associated management costs by determining which loss-producing loans would originally have been declined under the traditional score. This approach holds the banks harmless for a social program to eliminate lending data bias.
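Here is a minimal sketch of that hold-harmless calculation, assuming the bank retains both scores and cutoffs for each loan; the field names and figures are hypothetical.

```python
def claimable_losses(defaulted_loans, traditional_cutoff):
    """Sketch of the hold-harmless calculation: a loss is claimable only if the
    loan would have been declined under the traditional score but was approved
    under the protected-class (PC) score. Field names are hypothetical."""
    claims = []
    for loan in defaulted_loans:
        declined_traditionally = loan["traditional_score"] < traditional_cutoff
        approved_under_pc = loan["pc_score"] >= loan["pc_cutoff"]
        if declined_traditionally and approved_under_pc:
            claims.append(loan["credit_loss"] + loan["management_cost"])
    return sum(claims)

# Example: one defaulted loan that only the protected-class score would have approved
loans = [{"traditional_score": 640, "pc_score": 700, "pc_cutoff": 660,
          "credit_loss": 12_000, "management_cost": 800}]
print(claimable_losses(loans, traditional_cutoff=660))   # 12800
```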


Proposal 2: Reduce data bias with non-traditional credit data


Then, the question becomes:

"If there is systemic bias resident in the credit data, how do we fix it going forward?"

This proposal seeks to increase and improve the data available for credit algorithms. The incremental data is intended to be appropriately representative of all U.S. citizens, not just those traditionally "banked." The idea is to utilize valid, but non-traditional, payment sources typically used by people who are not active participants in the traditional lending system. These U.S. citizens are known as the "unbanked." Please note: When it comes to credit data, the "banked" and "unbanked" descriptions are a bit confusing. From an institutional standpoint, "non-bank" lenders often do provide payment information to the CRAs on behalf of their clients. To eliminate potential confusion between an unbanked consumer and a non-bank institution, we describe credit data in terms of "CRA" or "Non-CRA." This suggests:

  • CRA - payment-taking institutions already submitting customer data to the CRAs, like lenders.

  • Non-CRA - payment-taking institutions NOT submitting customer data to the CRAs, like many payment apps, utilities, rent, and deposit accounts.

The idea is that bias will be reduced by including a fuller representation of individual payment behaviors in the CRA data repositories. Also, an important part of the proposal is to leverage the existing CRA data repository infrastructure. This will ensure data quality, including customer dispute mechanisms. To be clear, the intent is NOT to provide credit to people with a poor credit history. The intent of the additional data is to equalize the understanding of how payment behavior relates to delinquency and default, across both protected and unprotected classes. Regardless of the source of payment data, anyone not making their payments will likely have a worse credit score.

A larger percentage of protected class members make payments outside the CRA-reported bank payment system. "Thin files" lead to no or low credit scores, yet it is possible a thin-file borrower is making satisfactory payments outside the CRA reporting system.


Using non-traditional data for algorithm development is not a new idea. What's new is focusing on system incentive alignment and existing CRA infrastructure, instead of "pushing on the string" of misaligned incentives and lower-quality data. [xii] - Please see this note for a discussion concerning the importance of incentives.


The idea is to include verified, FCRA-compliant non-bank payment data in the CRAs and on consumer bureau reports as separate tradelines. The incentive includes providing non-bank payment companies access to all CRA data and CRA-enabled information products in exchange for the inclusion of verified non-bank payment data, subject to FCRA-compliant dispute validation. This will enable non-bank payment data to be included in the CRA data process and in CRA-enabled data products (like the FICO Score).


We are assuming non-bank payment data providers have an economic incentive to exchange verified non-bank data for access to the full CRA library of data, including traditional bank data and non-bank payment data. CRAs have a high-quality, secure, and scalable process to include verified, FCRA-compliant non-bank payment data. FCRA may either be interpreted or amended to expand “permissible purpose” requirements to include non-bank payment providers. New permissible purpose rules will enable non-bank payment companies to utilize credit data to make credit-related decisions and to evaluate current customers' credit risk and credit product usage likelihood.


This could be done by:

  • expanding the Metro 2 [xiii] input file to include non-bank payment data,

  • expanding dispute processing to include non-bank payment companies,

  • developing training and related CRA information through non-bank payment industry groups, and

  • including non-bank payment data in CRA-enabled information products (like the FICO Score).
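For illustration only, a simplified record structure shows how a non-bank payment tradeline could be furnished alongside bank tradelines and carry a dispute flag. These field names are hypothetical and do not reflect the actual Metro 2 layout.

```python
from dataclasses import dataclass

@dataclass
class FurnishedTradeline:
    """Hypothetical, simplified tradeline record a furnisher might submit to a CRA.
    Field names are illustrative and do NOT reflect the actual Metro 2 layout."""
    consumer_id: str
    furnisher_type: str        # "bank", "nonbank_payment", "utility", "rent", ...
    account_opened: str        # ISO date the account or payment relationship opened
    scheduled_payment: float
    current_balance: float
    times_30_days_late: int
    in_dispute: bool = False   # supports FCRA-compliant dispute processing

# A rent payment history furnished as its own tradeline alongside bank tradelines
rent_tradeline = FurnishedTradeline(
    consumer_id="C-001", furnisher_type="rent", account_opened="2021-06-01",
    scheduled_payment=1200.0, current_balance=0.0, times_30_days_late=0)
print(rent_tradeline)
```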

 

Proposal Integration

Think of proposal 1 as an initial step. It serves as an approach for reducing the current impact of bias in the data used for algorithm training. It also helps us understand the degree to which systemic bias is resident in the data.


Also, proposal 1 will serve as a feedback mechanism on the effectiveness of proposal 2. That is, as proposal 2 is implemented, we will know how well it is working by how close the proposal 1 bias hypothesis comes to being invalidated.

Proposal 2 is more of a long-term fix, to ensure all borrower classes, protected or otherwise, have exposure to well-trained algorithms. These credit decision-rendering algorithms should be trained with high-quality, professionally managed, and unbiased data.

I think of proposal 1 in terms of the need to have dials and gauges to know if we are on the right track. Next is a plane analogy:

If we were flying west from New York to Tokyo, setting the goal, having a good plane with plenty of gas, and flying in a westerly direction toward Tokyo are important.

(Proposal 2)


However, having the proper gauges and dashboard to know if the plane is too close to the ground, a mountain, or a storm is critical to the success of the flight and the safety of all those on board.

(Proposal 1)

 

Conclusion

Credit data is at the heart of reducing lending bias. If we wish to reduce bias in lending, getting the data right is critical to enabling algorithmic success.


An algorithm is just a tool. Data is a representation of past reality. Don’t blame the tool for our past mistakes…. fix the mistakes as found in our past reality.

By definition, the job of the algorithm is to “discriminate” or separate data by some dependent variable. The question becomes that of the data itself.


Was there bias in the formation of the data set? Is past “bad” (racial) discrimination hidden within the “good” (credit) discrimination data?

  • If we get the data correct, the algos will do their job with a more accurate credit decision.

  • If we get the data wrong, the algos will still do their job but with a less accurate decision.

It is easy to confuse the precision of an algo with the data-impacted accuracy of a credit decision….sadly they are not always aligned. We are proposing to align the precision of the credit scoring algorithms with the accuracy of our data...and provide the means to resolve our past mistakes.


Appendix


About Jeff Hulett

“I started in banking in the late 1980s; my first banking experience was as a loan officer in a bank branch. We were located in the predominantly black neighborhoods of the Northside area of Richmond, Virginia. Ironically, the branch was located near Jefferson Davis Highway - Jefferson Davis was the president of the Confederate States of America during the U.S. Civil War, and his unambiguous political mandate was to continue the enslavement of black people. The highway has since been renamed.


I made loans to many folks from the local community. It was very old school. People came to the branch to apply in person. I pulled credit reports on fax-like paper. Our credit decisions were purely judgmental. I was fortunate to be mentored by Mr. Edwards, a 30+ year banking veteran. He taught me about the “5 C’s” of credit and related traditional lending assessment techniques. He also taught me about the unique lending needs of our neighborhood. He was well respected in our Richmond community. Mr. Edwards was tough but fair.


Since then, I have been part of a massive automation and decision sciences-led banking revolution. My roles evolved: I went on to develop statistical models, implement algorithm-enabled decision platforms, lead enterprise algorithm-integrated risk management, and lead bank lending and risk management divisions. I have also worked for large banking and decision sciences-related consulting and software firms. I feel fortunate to have participated in the Artificial Intelligence/Machine Learning revolution from the very beginning. I also appreciate my foundational lending experiences in Richmond.


From a social standpoint, I have come to appreciate our focus on credit algorithms is both helpful and high-risk. Helpful, because a credit algo only judges the current applicant based on their ability to repay a loan. It also drives down costs and increases credit availability. High risk because racism and other “-isms” still exist and related bias may be subtly present in the data used to train our models. As algorithms become more powerful, the importance of getting the data right increases substantially. Systemic bias is difficult to eliminate and must be managed proactively and systematically.”


Notes


[i] The late Harvard and MIT-trained scientist and systems researcher Donella Meadows said:

“…. most of what goes wrong in systems goes wrong because of biased, late, or missing information.”

[ii] By "color-blind," I'm referring to the lending protections afforded “protected classes” as defined by the Fair Housing Act. Protected classes include Race, Color, National Origin, Religion, Sex, Familial Status, and Disability.



[iv] For a nice primer on credit bureau data and usage, please see the Consumer Financial Protection Bureau’s (CFPB) whitepaper Key Dimensions and Processes in the U.S. Credit Reporting System


[v] Fair Lending laws make it unlawful for creditors to commit "Disparate Treatment" of protected classes.


[vi] The ECOA, as implemented by Regulation B, makes it unlawful for “any creditor to discriminate against any applicant concerning any aspect of a credit transaction (1) on the basis of race, color, religion, national origin, sex or marital status, or age (provided the applicant has the capacity to contract); (2) because all or part of the applicant’s income derives from any public assistance program; or (3) because the applicant has in good faith exercised any right under the Consumer Credit Protection Act.”

As an on-point example for this article, the ECOA requires lenders to provide applicants an "Adverse Action Notice" in the event they are declined for credit. This is a notice that explains the reasons for the loan decline decision. In the linear, classic FICO score world, decline reasons are straightforward: the weighting of the FICO scorecard itself will render a prioritized, individual decline reason. In the non-linear Machine Learning/Artificial Intelligence world, decline reasons are not easy to provide. By the very nature of these newer algorithms, explainability is not a natural by-product of the machine learning process. Some workaround techniques have been proposed, like Shapley values, but consistent decline reasons continue to be a challenge.
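As a hedged sketch of the Shapley-value workaround, the snippet below uses the open-source shap library to rank which attributes pushed a single applicant's prediction toward decline. The model, the synthetic data, and the mapping of attributes to reason codes are illustrative assumptions, not a compliant adverse-action implementation.

```python
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical training data: four tradeline attributes and a bad/good outcome
feature_names = ["times_delinquent", "inquiries", "months_on_file", "utilization"]
rng = np.random.default_rng(0)
X = rng.random((500, 4))
y = (X[:, 0] + X[:, 3] + rng.normal(0, 0.2, 500) > 1.0).astype(int)

model = GradientBoostingClassifier().fit(X, y)

# Shapley values attribute one applicant's (log-odds) prediction to each attribute,
# giving a candidate ranking of decline reasons for a declined applicant.
explainer = shap.TreeExplainer(model)
applicant = X[:1]
contributions = explainer.shap_values(applicant)[0]

# Attributes pushing the prediction hardest toward "bad" are candidate decline reasons
reasons = sorted(zip(feature_names, contributions), key=lambda t: t[1], reverse=True)
print(reasons[:2])
```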


[vii] Another reason why more sophisticated credit assessment techniques have not accelerated as quickly has to do with the robustness of the historical modeling approach. The FICO models, sold by Fair Isaac Corporation, have been in use for about 25 years. As such, much of the nonlinear, dynamic signals that are captured today with nonlinear-based automated Machine Learning techniques have already been captured within the linear-based traditional FICO models. This was done with "Machine Teaching" via years of nonlinear variable transformation-based human learning and model updating. Please see this article for more information: Can Machine Learning Build a Better FICO Score?
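To illustrate the "machine teaching" idea, here is a minimal sketch of a hand-crafted nonlinear variable transformation (coarse binning into points) feeding an otherwise linear points-addition score; the bins and point values are invented for illustration.

```python
def utilization_points(utilization: float) -> int:
    """Hand-chosen, nonlinear binning of a raw attribute into scorecard points.
    Bins and point values are invented for illustration."""
    if utilization < 0.10:
        return 55
    elif utilization < 0.30:
        return 50
    elif utilization < 0.70:
        return 30
    else:
        return 5

def inquiry_points(inquiries: int) -> int:
    """Another hand-crafted transformation: diminishing points as inquiries rise."""
    return {0: 40, 1: 35, 2: 25}.get(inquiries, 10)

# The final score stays a simple, transparent sum of points, but the binning above
# is where years of human "machine teaching" encoded the nonlinear signal.
score = 500 + utilization_points(0.85) + inquiry_points(3)
print(score)   # 515
```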


[viii] Structural / Systemic racism and credit data-related sources include:


The following is an analogy narrative I find helpful to understand systemic bias:

Imagine there are 2 people with similar fitness goals, they want to be fit and are willing to work out, eat healthily, and generally live a healthy lifestyle. They were both born with similar bodies and their bodies respond similarly to fitness. The first person, Sheila, has access to excellent equipment, information about healthy workouts, and an encouraging community. The second person, Liam, has little access to equipment, little information about healthy workouts, and a health-indifferent community.


While Sheila and Liam share similar desires and physical proclivity regarding health (endogenous factors), the healthy road will be much tougher for Liam. In fact, on average, people that have the same structural environment (exogenous factors) as Liam tend to lead a less healthy life and with more health problems. THIS IS AN EXAMPLE OF SYSTEMIC BIAS. In this example, it is a function of opportunity availability that drives healthy outcomes, not the individual characteristics of Sheila or Liam. The question is one of "how" more than "what." We now know the "what" - or that systemic bias exists and is particularly exacerbated when combined with AI. The harder question is "how." How do we go about fixing the systemic bias challenge? That is what this article proposes.


"Throughout this country’s history, the hallmarks of American democracy – opportunity, freedom, and prosperity – have been largely reserved for white people through the intentional exclusion and oppression of people of color."

[x] The Home Mortgage Disclosure Act (HMDA) requires lenders to collect loan-level protected class data.


[xi] The FDIC primarily uses the Loss Share mechanism during times of economic crisis, like the bank failures following the 2007-08 Financial Crisis.


[xii] Why are aligned incentives important?

In our view, aligned incentives are at the heart of the success of the current CRA-based system. Aligned incentives enable the CRAs to successfully a) collect banking data, b) utilize that data to build information products like scoring models (like the FICO score used to rate consumer creditworthiness), and c) provide "permissible purpose" (as defined by FCRA) information to banks for credit decisions, portfolio monitoring, and marketing.

Aligned incentives mean the banks are motivated to provide data to the CRAs because they generally get more out of it (the ability to evaluate creditworthiness) than the cost of validating and submitting the data (operational cost and competitive cost). Generally, it would not make sense for a bank to go it alone. Meaning, for a bank to access the large library of CRA credit data and CRA-enabled information products, they must provide existing customer data as a trade.


[xiii] If a company furnishes consumer credit account data regularly to credit reporting agencies, they have duties under the Fair Credit Reporting Act (FCRA) to correct and update consumer credit history information.


To assist data furnishers (such as banks, credit unions, consumer credit card companies, retailers, and auto finance companies) in this process, the credit reporting industry has adopted a standard electronic data reporting format called the Metro 2® Format.
