Synthetic Profiles and Increased Potential for Fraud in a Credit Approval Process


Background/Objective

Research Question: Can synthetic profiles evolve over time to consistently beat a credit approval process? To answer this question, we examine the credit approval process, focusing specifically on false negatives (missed or undetected risk) in credit approval, using simulations. We study a system in which the credit approval mechanism may rely on outdated data, while applicant attributes change naturally (e.g., age) or circumstantially (e.g., a change in employment status) and are adjusted intelligently by a fraud-generating system to improve the odds of credit approval. Our results indicate that static systems, such as the one depicted in this study, are susceptible to fraud. Our study also shows that, using valid US demographic and financial information, code can be built to attach a plausible profile to a fraudulent SSN in a way that increases the number of false negatives. We further provide recommendations to address these areas of concern.
Methods: In our context, “fraud” is the act of impersonating a person, using a valid SSN (Social Security Number) and PII (Personally Identifiable Information), to apply for credit and to misuse the granted funds. Our study uses simulation methods to study the vulnerability of credit approval mechanisms to fraud. We use data from various credible sources to generate the synthetic profiles used in the simulation.
Conclusions: Our research shows that credit approval processes need to become dynamic and interactive; a standard online approval process would not be sufficient to prevent fraud when AI is misused to generate synthetic profiles. We also show how synthetic application profiles may be created via code to beat such systems over time.

Keywords

SSN – Social Security Number
SP – Synthetic Profiles
ABM – Agent-Based Model
DoB – Date of Birth
PII – Personally Identifiable Information

Introduction

Background and Context / Literature Review

SSN information is requested for multiple services that every individual in the USA requires at some point in life, including applying for a driver’s license and registering as a new patient at a doctor’s office or hospital. Data breaches at these organizations make SSN information readily available, and also put other PII into the hands of criminal organizations.

Credit fraud using stolen personal information such as SSN details1, DoB and address is not new. One estimate suggests that deliberate fraud accounts for probably less than half of the 20M names attached to multiple SSNs, but that data is old (from 2010). It is therefore reasonable to assume that the number of people attempting deliberate fraud by misusing available SSNs has grown considerably since.

The credit application process in the USA is built on the framework of SSN and PII, which is one reason credit fraud is prevalent and costs financial institutions and consumers billions of dollars in losses each year.2

Coupled with the ready availability of such data and information hacked from organizations, including the credit agencies themselves, criminals may misuse AI to synthetically manufacture profiles that are then used to apply for credit, i.e., credit application fraud.

With the advent of AI and more capable programming tools, there is a high potential to create “synthetic” application profiles, i.e., to mimic a normal creditworthy applicant by programmatically matching pieces of personal information. The issue is very serious because AI-driven SP creation can produce many profiles from multiple combinations of the same or slightly altered data. Add to this the fact that SSN information can be obtained easily and fraudulently3, and a synthetic fraud profile can be assembled quite easily and then used to request new credit.

Our concern is with this additional capability: we question whether existing credit approval mechanisms based on personal information are able to identify such fraudulently built profiles.

Our approach is based on the following theoretical questions.

Question #1: As a criminal organization collects random PII, including SSN information, and keeps feeding that information into an AI system with the single goal of creating an SP that may beat credit-approval/denial mechanisms, we ask the following:

  1. Is the AI able to build those profiles by assimilating the diverse information?
  2. Are those profiles good enough, i.e. human-like, to beat existing credit-approval/denial mechanisms?

Question #2: Since in our study we do not have access to such an AI model or to real-world PII, can we build a data model that also builds SPs on request?

  1. To simulate an SP with accuracy, can we use commonly available human data points (demographic information, employment data) to statistically (and randomly) create those profiles?
  2. What are the basic data points needed to feed into our SP generating mechanism?

Question #3: Since each credit-approval agency would devise its own approval/denial mechanism, can we create one within the study?

  1. Can we use it to process our SPs?
  2. Are we able to predict the susceptibility of this approval mechanism when SPs are submitted to it for processing?

Our data science processing answers “Yes” to questions 2 and 3, which leads us to the reasonable conclusion that in the real world there is a strong possibility that an AI-based system would be able to do the same, even better, and therefore beat existing credit-approval checks.

Hypothesis: A static credit approval process can be systematically defeated by synthetic profiles that evolve over time.

Problem Statement and Rationale

Our study bases its dataset on one fixed parameter to identify an individual: a unique SSN value that does not change. Using that identifier, we simulate a person’s profile (age, address, employment data, etc.) and then change those parameters over time through randomized adjustments in our code. As these profiles keep changing, our study passes them through our fixed credit-approval/denial flow.

As an example, keeping the SSN constant, our code shifts a profile’s employment status from unemployed to employed (outside manual control, randomized through our code model) and reapplies for credit. Our question is whether, as these parameters keep shifting, the credit-approval/denial process keeps up. In other words, our research question is whether the credit-approval/denial process is enough to prevent additional fraud when profiles keep adjusting and reapplying.

We have not found studies that take in valid human demographic data points, feed them into an intelligent system that creates a simulated profile and keeps perfecting it, and then apply those ever-improving profiles to existing credit-approval/denial systems to observe the results. That is exactly the process we have attempted in our study: a system that can be improved over time, with the singular aim of committing credit fraud using the SPs it provides.

Therefore, the simulation process lives within the code model and will continue to improve as the code is improved with better statistics on demographic, employment and marital attributes. The simulation process is automated and completely free of manual influence once the model has been created.

The risk is that available PII may be passed to an AI system with a request to generate an SP. An AI system, with its vastly greater capacity to assimilate data, would be able to generate these SPs much more accurately; existing credit-approval/denial mechanisms therefore need to be revisited to improve their internal approval/denial processes and/or add guardrails to the process.

Data Sources: In this study we have used data sources (refer to the Appendix) from multiple agencies to build the SPs. Our code is fed these aggregate data points, all of which are published, and uses them to build the SPs. The SP-building process is randomized within the code: it first creates an SSN, then assigns it human characteristics based on valid demographic data. As time passes, it keeps the SSN constant but transforms the other parameters using its coded logic. Anyone wishing to impose their own view of how a person’s profile should change over time may alter and improve this logic.

Why are SSNs susceptible to fraudulent uses?   

The Social Security Administration (SSA) issued about 3.58M SSNs in 2023 on the basis of birth records in the USA that year4. In total, the SSA issues about 5.5M SSNs per year5; the difference of roughly 1.9M can reasonably be assumed to be SSNs issued to immigrants through legal application processes.

When people die, many of their deaths go unreported; the SSA only has records of deceased individuals whose deaths have been reported to it, either by family members or by funeral homes. There is no active mechanism at the SSA to identify and remove SSNs from circulation other than reliance on reported deaths6. The number reported is around 1.3M per year7, although the actual number of deaths in a year is near 3.27M8. This leaves a gap of around 1.97M SSNs each year that potentially remain in circulation in the US and may be misused.
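The arithmetic behind the figures in the two paragraphs above can be checked in a few lines. The values are the per-year figures quoted in the text, in millions; treating every unreported death as an SSN that stays in circulation is the same simplifying assumption the text makes, not an independently established count.

```python
# Per-year figures quoted in the text, in millions.
SSNS_ISSUED_TOTAL = 5.5      # all SSNs issued per year
SSNS_ISSUED_AT_BIRTH = 3.58  # issued on the basis of 2023 birth records
DEATHS_ACTUAL = 3.27         # approximate deaths per year
DEATHS_REPORTED = 1.3        # deaths reported to the SSA per year

# Issuance not explained by births (attributed in the text to legal immigration).
non_birth_issuance = round(SSNS_ISSUED_TOTAL - SSNS_ISSUED_AT_BIRTH, 2)

# Deaths not reported to the SSA; under the text's assumption, each one
# leaves an SSN that may remain in circulation.
unreported_deaths = round(DEATHS_ACTUAL - DEATHS_REPORTED, 2)
unreported_share = unreported_deaths / DEATHS_ACTUAL

print(f"Non-birth issuance: {non_birth_issuance}M per year")
print(f"Unreported deaths: {unreported_deaths}M per year "
      f"({unreported_share:.0%} of all deaths)")
```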

There are multiple well-understood reasons why a large number of SSNs remain in circulation when they should not, and this potentially feeds the ability of criminals or organizations to use these SSNs to commit credit application fraud. One of the prime reasons for non-reporting involves individuals who may have committed a crime in their lives9 and have outstanding warrants in their names.

Another group consists of individuals who have been reported missing10. Many become homeless and may readily sell their identities, including their SSN data. Whitey Bulger, a crime boss in the Boston area, fled and went into hiding using the identities of homeless individuals, evading the authorities for 16 years. This illustrates how easily SSNs and associated information can be obtained3.

Finally, the cost of tracking down an identity, along with legal restrictions, results in SSNs remaining available long after they should have been revoked. This gap is seriously misused in the USA by illegal immigrants. Enforcement is lax during the hiring of illegal immigrants through the I-9 form, including the verification of SSN data purchased from various criminal enterprises, since the SSA cannot enforce penalties. The Internal Revenue Service (IRS) does have enforcement powers, but the penalties imposed on unscrupulous hiring agencies are a pittance compared with the cost of chasing down an illegal SSN user11.

Overall, given that a high volume of additional information is available on the dark web1213, it is understandable why one in seven SSNs are re-used; and, absent stronger legal powers, primarily during I-9 checks, SSNs will remain widely available for purchase on the black market.

In addition, the ease of creating a fraudulent profile with one credit agency and misusing that profile with another is also a concern14. “Creating a synthetic identity just requires an attacker to apply twice and with two different lenders. The first application will fail due to lack of credit history, but the record-check itself will start a credit file with a credit reporting agency. Then, the attacker’s second credit card application with a different issuer will typically succeed due to the existence of a credit file.”15

Why does credit application fraud occur so easily?

The issue lies in the process followed by credit-granting agencies: they rely on static PII, which is the same static information criminals use to create synthetic profiles in the first place. In pursuit of profit, even tax firms that are supposed to store their clients’ information critically and securely end up selling customer information to others to bolster their profit margins16. These are extremely serious violations that again result in data being transferred across firms and therefore being open to misuse17.

SP-based credit applications may, over time, be able to beat these systems regularly, making credit application fraud a major issue. We coded an SP generation method and simulated a standard credit approval system based on personal information to show that credit fraud using SSNs and other personal information could increase dramatically over the years. Our hope is that credit approval methods are upgraded to be more dynamic and interactive, so that intelligently generated SP-based attacks can be prevented.

Significance and Purpose    

While credit fraud is not new, the incorporation of AI into fraud schemes is the bigger risk. Our study points out that static SSN and PII information may not be enough in the coming years to identify an individual accurately in the credit approval process. The costs of these fraud-based losses will ultimately be passed on to customers, while the corporations take a hit to their reputations.

Objectives

We intended to show that synthetic application profiles generated to defraud credit approval systems can be created easily and that existing credit approval methods would fail over time to catch these fraudulent profiles.

In the first part, we simulate an SSN fraud process to understand how SSN fraud may grow in the coming years. We note that any simulation only approximates the real world and cannot accurately predict the future, which is not our focus. In this study, we are interested in understanding the dynamics and highlighting the problems that might occur if the systems do not improve.

In the second part of this study, we work backwards to consider how a fraudster might use advanced algorithms such as AI and machine learning to identify SSNs. Going through this exercise not only motivates us to think critically about the problem but also gives us insight into how such scenarios can be prudently avoided by thinking ahead and developing systems and technology to counteract such activities.

Scope and Limitations         

All of our data processing and final data output are produced through code. We have not used any additional surveys or data mock-ups. The code is available here: https://github.com/DShreya1/DSCreditFraud/

The study is also based on accepted human behavioral characteristics and datasets that have been observed and reported by multiple reputable organizations including, but not limited to, US Government websites, NBC News, Cornell Law School and the FBI.181939102021.

Human characteristics and behavior are treated as valid constants in our ABM. Some data, such as zip codes and SSNs, have been generated for our study.

We have also used statistics that may change over time, such as marriage rates and the number of deaths not reported to the Social Security Administration; those parameters need to be adjusted as the statistics change.

Finally, our credit approval process is a simplified one since we do not have information on exact checks performed by each approval agency.

Additionally, because statistics and demographic data change over time, the constants used in our study may already differ slightly from current values.

The following assumptions were made in this study:

Assumption 1: Our study processes only online applications for approval, i.e., there is no physical check of the individual applying for credit.

Our study creates a code-generated human profile, i.e., the “Synthetic Profile”, and applies for credit (financial approval) using that profile. Our goal is to simulate the chances (probability) of getting approved for credit using these fraudulently generated personas. We hypothesize that as the code keeps adjusting the profiles over time, the static approval flow (code-generated for our study) will fail to keep up.

The standard approval flow in our study mimics a regular credit application check, with either an approval or a denial as the result. In this procedure, an individual provides PII including name, DoB, address, citizenship/residency status, SSN, income sources, employment details, housing details and permission to apply for new credit. The credit-approval/denial mechanism then processes the data to approve or deny the application.
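To make the point concrete, the application described above can be sketched as a plain record. The class, field and function names below are hypothetical, chosen for illustration; they are not the study's code or any agency's schema. The sketch shows that every input is static data supplied by the applicant, and that a completeness pre-screen of this kind says nothing about whether the data belongs to a real person.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class CreditApplication:
    """Hypothetical record of the PII an online credit application collects."""
    name: str
    date_of_birth: date
    address: str
    residency_status: str          # citizenship / residency status
    ssn: str
    income_sources: list[str]
    employment: str
    housing: str                   # e.g. "own" or "rent"
    consent_to_credit_check: bool  # permission to apply for new credit


def is_complete(app: CreditApplication) -> bool:
    """Hypothetical pre-screen: every field filled in and consent given.

    Nothing here verifies that the data belongs to a real person.
    """
    for f in fields(app):
        value = getattr(app, f.name)
        if value is None or value == "" or value == []:
            return False
    return app.consent_to_credit_check


app = CreditApplication(
    name="Jane Example",
    date_of_birth=date(1990, 5, 17),
    address="1 Placeholder St",
    residency_status="citizen",
    ssn="000-00-0000",  # placeholder; area number 000 is never issued
    income_sources=["salary"],
    employment="employed",
    housing="rent",
    consent_to_credit_check=True,
)
print(is_complete(app))
```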

Synthetic profiles accommodate dynamically adjusted features such as zip code changes or employment status to reflect the real-life changes in such data.

Assumption 2: This credit-approval/denial process reflects our understanding of how credit approval agencies may use the submitted profile information. The internal workings of the approval/denial process followed by each agency are not publicly available.

As we process these profiles through our approval mechanism, our study identifies a clear pattern of rising fraud probability over time as these synthetic profiles re-adjust and re-apply to the fixed credit approval process, and keep beating it over time using better simulated fraud profiles.

Assumption 3: This credit approval process, outlined later in our study, is built through code and does not adjust to the profile changes. This non-adjusting mechanism is the risk we identify from our study.

Through our experiments, we show a clear pattern of failure in the credit-approval/denial checks as the profiles evolve while the credit approval procedures remain unchanged and become outdated. Our study therefore points to a major concern for regulators and credit approvers to act on: they need to review and harden their approval mechanisms against this threat.

Assumption 4: Personally Identifiable Information details are readily available, and our model is built on those data points. It is not easy for a human to parse billions of data points to identify a single individual, but an AI system will have that capability. It will be able to tie disparate data into a profile that accurately matches a real individual. Such profiles would be much more sophisticated than the ones we create via code, so credit approval would be even more susceptible to fraud.

Theoretical Framework       

Discussed in “Background and Context” earlier in the document.

Methodology Overview

In this paper, we focus on SSN-based fraud that happens because of lags in information updates, due either to changes in the information related to the person or to delays in reporting deaths (3-A). For example, when a person moves to a new address, the information registered with their SSN becomes outdated. As a result, a fraudster with access to the old information paired with the SSN may use it to obtain credit approval. Note that we use credit approval only as an example; a fraudster can use such information for various purposes, such as rental applications or car loans, masquerading as a different person.

We first studied the concept of SPs, their fraudulent use and examples. We then delved deeper into the mechanisms used to perpetrate fraud, and finally focused specifically on credit application fraud using SPs.

Our next task was to generate realistic synthetic applicant profiles, for which we studied the available data extensively. These data relate to human behavior and accumulated statistics on that behavior, for example a typical homeowner’s average age, how often people change their residential address, marriage and divorce rates, and employment rates.

After this study, we coded our SP generation method. We generated a large enough number of profiles that, together, they accurately reflect population characteristics such as age groups, marriage probability, homeowner/renter probability, and zip code and location.
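A minimal sketch of this kind of generation, assuming simple independent draws: an age group is chosen by weight, an age within it uniformly, and employment, marriage and homeownership are independent yes/no draws. All distribution values and names below are hypothetical placeholders, not the study's Appendix constants, and each profile carries a sequential placeholder identifier rather than an SSN.

```python
import random
from dataclasses import dataclass

# Hypothetical placeholder distributions (NOT the study's Appendix constants).
AGE_GROUPS = [(18, 24), (25, 34), (35, 44), (45, 54), (55, 64), (65, 90)]
AGE_WEIGHTS = [0.12, 0.18, 0.17, 0.16, 0.17, 0.20]
P_EMPLOYED = 0.60
P_MARRIED = 0.50
N_ZIPCODES = 50


def p_home_owner(age: int) -> float:
    """Placeholder: homeownership becomes more likely with age."""
    return 0.35 if age < 35 else 0.70


@dataclass
class SyntheticProfile:
    profile_id: str  # stands in for the study's generated SSN
    birth_year: int
    age: int
    zipcode: str
    employed: bool
    married: bool
    home_owner: bool


def generate_profile(rng: random.Random, year: int, idx: int) -> SyntheticProfile:
    """Draw one profile from the placeholder distributions above."""
    lo, hi = rng.choices(AGE_GROUPS, weights=AGE_WEIGHTS)[0]
    age = rng.randint(lo, hi)
    return SyntheticProfile(
        profile_id=f"SP-{idx:05d}",
        birth_year=year - age,
        age=age,
        zipcode=f"ZIP-{rng.randint(1, N_ZIPCODES)}",
        employed=rng.random() < P_EMPLOYED,
        married=rng.random() < P_MARRIED,
        home_owner=rng.random() < p_home_owner(age),
    )


rng = random.Random(42)  # fixed seed so runs are reproducible
profiles = [generate_profile(rng, 2020, i) for i in range(10_000)]
married_share = sum(p.married for p in profiles) / len(profiles)
print(f"{len(profiles)} profiles, {married_share:.1%} married")
```

With enough profiles, the sample proportions converge to the input probabilities, which is what lets a generated population mirror the published distributions it was drawn from.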

We also studied the credit approval process extensively, specifically the inputs needed for approval in real life, and coded a typical credit approval system. Next, we ran the simulated profiles through the credit approval system to identify missed cases of fraud as well as correctly identified fraud.

Finally, we extended the study over many years, keeping the credit approval system as-is. The profiles updated themselves through age progression and then adopted behaviors typical for those ages. This ability of the profiles to self-adjust was critical, since it showed that a static credit-approval system would let in more fraud over time.

The design approach to building an SP to mimic a human applicant profile was as follows.

  1. Gather population data such as age group distribution ranges, marriage rates, employment rates from various published datasets. Please review “constants” in the Appendix and Reference numbers for each. These are the exact human patterns selectively used within this study.
  2. These data sources have been collected through demographic studies (outside of our study; please check the references for each data point used). In our study, these characteristics, when selected randomly through code, were the basis for defining a human-like persona, i.e., our SP. Each SP is therefore a random “human” with characteristics chosen from the available sets of demographic and financial data.
  3. In our study, we have not used any samples. Instead, we took aggregate human data points, determined the distribution of each (e.g., the age distribution of the population), and instructed our code to randomly build our samples, i.e., our SPs.
  4. For the credit approval mechanism, we built code (section “Procedure”, page 9, point 5) that reflects our assumption of how a standard credit approval/denial is performed. This method is open to further strengthening based on community feedback. Our model (‘query_ssn’ method, https://github.com/DShreya1/DSCreditFraud/blob/main/DShreya-CreditFraudSynthetic.ipynb) is therefore a work in progress and currently performs the following processing:
    1. Each SP has profile data, which is its true setup information. Those are:
      1. ssn, birth_year, age, zipcode, employment, changeed_employer, moved_home, home_owner, prior_credit_history and marriage
      2. It also knows internally whether it generated this data in a fraudulent manner, through the field “fraud”
      3. Each such object instance is initialized through its __init__ method
    2. Each year, the SP randomly updates its data, possibly making itself fraudulent, through a random setting in the method update_fraud that is based on real-world research data, i.e., a typical credit fraud rate
      1. The study therefore knows which profiles have been modified to be fraudulent and can track how well the credit approval process adjusts
      2. The code can therefore be adjusted (the constants used, the design of a “profile” with more relevant parameters, or the credit check with more real-world checks) to keep it up to date with the latest changes.
    3. It then uses a function called check_info, which compares its valid internal data with the yearly updated data to check whether they match. For example, if its zipcode matches the zipcode one year later and its employment matches the employment data one year later, etc., it indicates that credit approval should proceed.
      1. However, in step 4.a.ii, the SP, i.e. “self” instance, has updated itself to a fraudulent profile
      2. The question is whether this simplistic comparison of ssn, birth_year, age, zipcode, employment, changeed_employer, moved_home, home_owner, prior_credit_history and marriage is enough to detect fraud: if the system has updated a profile to fraud="YES" and readjusted its internal information to mimic a typical person’s demographic, financial and personal progression, is “check_info” strong enough to detect those changes?
      3. Our goal is to strengthen “check_info” so that typical changes in human parameters do not form the basis of the credit check. Our study shows that if such static checks continue to be used, a system such as ours may learn typical patterns of human movement, readjust itself and therefore beat a credit check process.
    4. This change (i.e., the system updating each of its starting profiles once a year with fraud="YES" or "NO") should be caught by a credit approval mechanism; that is, the mechanism should be specific enough that simply attaching valid, available human characteristics to an SSN that is not in use (or not reported to the government as not in use) does not make the profile pass as a real person. Our concern is that an AI system will go much deeper and collect very specific information about individuals to build these fraudulent profiles.
    5. Our study creates a baseline profile per instance, then updates it randomly each calendar year. The baseline and the changes are drawn from available data such as the human age distribution, employment chances, the marriage rate, etc. The study does not use AI to pick and build selective profiles. Instead, it takes the rate at which SSNs are available for fraudulent use (one out of seven), creates a human-like profile with such an SSN, and then uses the “fraud” profile to apply for credit.
    6. This is a limitation of our study: we do not use human data gleaned by an AI system to create a fraudulent profile, but build the profile from readily available characteristics instead. In a real-world scenario, a fraudster could use an AI system to collect the true characteristics of someone who has died, find an SSN that has not been reported as fraudulent, tie the two data sets into a meaningful profile, fill any missing elements with human demographic/movement/financial/employment data, and then submit a credit application.
    7. There is no cryptographic approach in the study.
    8. We have not found a study that takes the potential fraudulent use of SSNs (the one-in-seven possibility), uses code to create a human-like profile with such an SSN, and applies for credit.
    9. Credit systems have to evolve to counter this threat; static data checks alone are not sufficient. For example, in-person application may be one answer. Searching the dark web for all SSNs that are not in use could be another. Holding a fraudulent application’s SSN in a “lock” mode in a central system could be one more. Our study shows that profiles may be created with available SSNs in a meaningful way to beat credit approval if additional checks are missing. Real-world fraud detection is agency specific, i.e., specific to each credit approver. We simply point out that typical application processes with static datasets are more prone to AI-based fraud, which can glean social media data and apply typical human changes to build meaningful profiles that a credit-approval mechanism would consider valid. The study does not solve the problem; the approaches above are offered only as suggestions.
    10. The study uses true human change patterns (e.g., job change probability and marriage probability, along with the one-in-seven SSN misuse scope) to build profiles. No further assumptions are made about these rates: they are US-specific published figures and standardized, comparable metrics for the US population.
    11. The study asks how an AI system could potentially misuse the one-in-seven floating SSN values, which a credit system, unless informed via a government feedback loop, will accept as belonging to a valid individual applicant. Other potential approaches to credit fraud are not considered.
  5. In the study, we asked our code to create 10,000 SPs, pass them to the credit check model, record the approval/denial rate, and re-adjust the SPs each year to align with human changes (e.g., job loss probability, home ownership probability). The 10,000 altered profiles were then submitted to the credit approval model again. Our study was therefore focused on a system that builds intelligence over time: knowing how human profiles change, it readjusts itself and then requests credit with a “better” profile the next year.
    1. Our scope is simply to improve the SPs over time, reapply and see whether they can beat the static credit approval model
    2. Our goal is to show that static credit approval models, based on standard parameters such as employment data, SSN information and age, may not be enough, since the applicant (our SP) is intelligently changing its characteristics
  6. The study has the following limitations:
    1. It only considers those data points that are requested in a credit application, i.e. SSN, DoB, employment and marriage details.
    2. The code uses generalized human datasets to create an SP. In a real-world situation, an AI will be able to collect deeper and richer information about a person and use the generalized datasets only for the parts it is unable to obtain.
    3. The credit check algorithm is simple: it compares the person’s known profile with one that has been aged by a year, matching the SSN and age as constants, and determines whether this SP is creditworthy. In a real-world scenario, each agency would use its own algorithm to approve/deny credit. The author intends to study these algorithms further and create one that can prevent AI-based SP fraud, keeping in mind that static data and easily gleaned social profiles should never be used in credit checks.
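The yearly loop described in this list can be sketched as follows. This is a simplified illustration, not the code in the repository: age_one_year and static_check are hypothetical stand-ins loosely analogous to update_fraud and check_info, all probabilities are placeholders, and the check deliberately looks only at the stable fields (identifier, birth year, plausible age), mirroring the simple SSN-and-age comparison described in the last item. Because the fraud flag is hidden ground truth that the check never reads, every flagged profile that keeps its stable fields consistent is approved, i.e. it becomes a false negative.

```python
import random
from dataclasses import dataclass, replace

# Placeholder annual probabilities (NOT the study's constants).
P_MOVE = 0.10               # chance of moving to a new zip code
P_EMPLOYMENT_CHANGE = 0.15  # chance of switching employed/unemployed
P_BECOME_FRAUD = 0.05       # chance a genuine profile is turned fraudulent
N_ZIPCODES = 50


@dataclass(frozen=True)
class Profile:
    profile_id: str  # stands in for the study's generated SSN
    birth_year: int
    zipcode: str
    employed: bool
    fraud: bool      # ground truth, known only to the simulation


def age_one_year(p: Profile, rng: random.Random) -> Profile:
    """One year of plausible life changes; identifier and birth year stay fixed."""
    zipcode = f"ZIP-{rng.randint(1, N_ZIPCODES)}" if rng.random() < P_MOVE else p.zipcode
    employed = (not p.employed) if rng.random() < P_EMPLOYMENT_CHANGE else p.employed
    fraud = p.fraud or rng.random() < P_BECOME_FRAUD  # once fraudulent, stays so
    return replace(p, zipcode=zipcode, employed=employed, fraud=fraud)


def static_check(on_file: Profile, submitted: Profile, year: int) -> bool:
    """Static rule: stable fields must match and the implied age must be 18-100."""
    age = year - submitted.birth_year
    return (submitted.profile_id == on_file.profile_id
            and submitted.birth_year == on_file.birth_year
            and 18 <= age <= 100)


def simulate(n: int = 10_000, start: int = 2020, years: int = 15, seed: int = 0):
    """Return one (year, approved_valid, approved_fraud, denied) row per year."""
    rng = random.Random(seed)
    on_file = [
        Profile(
            profile_id=f"SP-{i:05d}",
            birth_year=rng.randint(1940, 2005),
            zipcode=f"ZIP-{rng.randint(1, N_ZIPCODES)}",
            employed=rng.random() < 0.6,
            fraud=rng.random() < P_BECOME_FRAUD,
        )
        for i in range(n)
    ]
    current = list(on_file)  # the on-file records are never updated (static model)
    rows = []
    for year in range(start, start + years):
        approved_valid = approved_fraud = denied = 0
        for record, applicant in zip(on_file, current):
            if not static_check(record, applicant, year):
                denied += 1
            elif applicant.fraud:
                approved_fraud += 1  # false negative: fraud slipped through
            else:
                approved_valid += 1
        rows.append((year, approved_valid, approved_fraud, denied))
        current = [age_one_year(p, rng) for p in current]
    return rows


for year, ok, missed, denied in simulate():
    print(year, f"approved valid={ok}", f"approved fraud={missed}", f"denied={denied}")
```

Under these placeholder settings, approved fraud can only grow from year to year, since fraudulent profiles accumulate and nothing in the check reacts to them. A check that also compared the changing fields against stale on-file data would instead start rejecting genuine applicants who had moved or changed jobs.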

Results

Table 1: Sample profiles generated by our study; each profile is an SP. The first record is a person with SSN # 246251444, who is age 20 as of 2020 (birth year 2001), currently lives in zip code ZIP-18, is employed, is married and is a homeowner.

When this SP requested credit approval, the credit approval system “Approved” the application. Because we know how each SP was created, we know whether it was mimicking a fraudster or a genuine person; the “Really Fraud” value is known only to the study.

Figure 2: Confirms that the age groups considered in our study are unbiased.
Table 3: We started processing with the first set of SPs in 2020 and allowed the system to adjust the SPs over 15 years. In Table 3, the left table shows that initially the credit approval mechanism is able to distinguish genuine applications, but over time, as shown in the right table, it starts to approve fraudulent profiles in larger numbers.
Table 4: As explained in our study, each SP keeps adjusting its data to improve its chances and reapplies to the credit approval model each year.

In 2020, the credit approval mechanism received 3,097 requests, of which it approved 2,796 and denied 301. In that year, the first in our setup, the overwhelming majority (2,669) of those approvals were SPs that were “valid”, i.e., not fraudulently established.

However, by 2034, the fifteenth simulated year, the SPs had been able to adjust further, and in that year the model approved 3,428 fraudulent profiles. Table 4 depicts these changes.

Table 5: Shows that over time, the credit approval model is unable to keep up with the ever-changing SP data.

In 2020, it approved only 391 fraudulent SPs, but as the SPs kept adjusting, by 2034 the approval model ended up approving 3,102 fraudulent SPs.

Our study points out that SPs can readjust and reapply using the same SSN and age while modifying other data (marriage, homeownership, employment) to beat the approval mechanism over time.

Figure 6: Shows that over time the credit approval model, which has been kept static, approves more fraudulent applicant profiles. This is because the model does not adjust to the SP model, which keeps learning human changes based on demographics, income, marriage, etc. Our study shows that basing approvals on age and SSN consistency may not be enough, since an intelligent model may readjust to the point where it appears genuine.

Based on the output of our simulation, Figure 6 makes clear that, as we keep manipulating profiles and applying them to a credit approval process over time, the chance of fraudulent approval increases. The chance of rejecting a valid profile also rises. We conclude that when synthetic profiles are submitted to credit approval systems that are not modified to account for these changes, the threat of credit application fraud grows over time.

The study also suggests approaches for adapting credit approval processes to the growing SP threat.

Discussion

Synthetic Profiles are a serious threat to our financial systems: AI tools may be used to continuously read demographic data and combine it with profile information available on the dark web to create convincing profiles, which are then submitted to credit approval systems. As these systems age without updates, they will fail to recognize the threat and will approve fraudulent applications, resulting in serious monetary losses for many businesses. The synthetic profile generation defined in our ABM may also be extended to other fields and may have applications in various other industry or research scenarios.

In summary, we focused on one particular way in which SSN-based fraud is committed. It is based on specific cases that have come to light in which fraudsters mimicked a genuine individual using an SSN and other personal information to defraud creditors. We studied the process in depth by simulating the mechanism itself. This gave us insight into how a fraudster or their network can use the available information about SSNs and leverage advanced technologies such as AI to scale up their fraudulent activities.

Finally, we discuss ideas with policy implications for counteracting such activities, specifically:

– What factors contribute most to fraudulent approvals?

– What real-world countermeasures can prevent such fraud?

Recommendations: Proactive measures to reduce fraudulent activities

Data is the key to developing any AI system: the more data one can accurately create, the better the AI system will be. We have described one way in which a fraudster can use generated data to commit fraud. It is therefore natural to counteract such attempts proactively, both as individuals and as government agencies.

Our study is based on the premise that any available information, such as an SSN, is susceptible to fraudulent use. Because AI systems keep updating their internal datasets with demographic and personal information, they can tie those pieces together into a meaningful, human-like profile attached to an available SSN.

In a real-world scenario, such an AI system would build a profile from its knowledge of an individual, identify an available SSN, fill in the gaps with demographic and financial information typical of a zip code, and produce an SP that beats a credit fraud check.

We suggest the following ways to overcome this problem.

[Policy] Raise awareness among users to keep their data private and not share it anywhere. This is hard to achieve, since our research shows that data is shared and stolen even from agencies entrusted with safekeeping it [12].

[Technological] Build better systems to keep information up to date and avoid lags. In the specific case of SSN misuse, however, a lag is currently unavoidable, because it is up to private parties to report deaths to the SSA [6]. This lag needs to be eliminated.
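The effect of the reporting lag can be made concrete: between a death and its report to the SSA, a registry lookup still shows the SSN as active. The following is a minimal sketch with hypothetical `DeathRecord`, `shows_deceased`, and `exposure_days` names; it does not model any real SSA interface or data feed.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DeathRecord:
    """Hypothetical registry entry for an SSN holder who has died."""
    died_on: date
    reported_on: Optional[date] = None  # None: no private party has reported the death yet


def shows_deceased(record: Optional[DeathRecord], check_date: date) -> bool:
    """True if a registry lookup on check_date would show the SSN holder as deceased."""
    return (
        record is not None
        and record.reported_on is not None
        and record.reported_on <= check_date
    )


def exposure_days(record: DeathRecord, as_of: date) -> int:
    """Days during which the SSN belonged to a deceased person but still looked active."""
    end = record.reported_on if record.reported_on is not None else as_of
    return max(0, (end - record.died_on).days)


# Example: death on 10 Jan 2024, reported on 1 Mar 2024
rec = DeathRecord(died_on=date(2024, 1, 10), reported_on=date(2024, 3, 1))
print(shows_deceased(rec, date(2024, 2, 1)))        # False: the check falls inside the lag
print(exposure_days(rec, as_of=date(2024, 6, 1)))   # 51
```

Any application checked inside that 51-day window would pass a deceased-holder lookup, which is why the recommendation targets the lag itself rather than the lookup.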

[Policy / Technological] Fund research into technologies and methods that let individuals have their personal information verified without explicitly disclosing it. Additionally, fund cryptography research into methods for validating information effectively, as well as research into better AI systems for detecting likely fraud.

[Process] If a credit applicant profile associated with an SSN is flagged for attempted fraud, share the data with other credit approval agencies. Subsequently, unless cleared through personal verification (video / in-person), keep that SSN locked.
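The [Process] recommendation amounts to a small shared protocol: flag an SSN, lock it for every participating agency, and unlock it only after personal verification. A minimal sketch, assuming a hypothetical `SharedFraudRegistry` that all participating agencies can query; the class, its methods, and the use of opaque tokens in place of raw SSNs are our assumptions, not an existing system.

```python
from dataclasses import dataclass, field

# The two verification channels named in the recommendation
ACCEPTED_VERIFICATION = {"video", "in-person"}


@dataclass
class SharedFraudRegistry:
    """Hypothetical registry shared by participating credit approval agencies.

    Keys are opaque SSN tokens; how those tokens are derived is out of scope here."""
    locked: set = field(default_factory=set)
    reports: dict = field(default_factory=dict)

    def flag_attempt(self, ssn_token: str, agency: str) -> None:
        """Record a flagged fraud attempt and lock the SSN for every participant."""
        self.reports.setdefault(ssn_token, []).append(agency)
        self.locked.add(ssn_token)

    def may_process(self, ssn_token: str) -> bool:
        """Agencies call this before approving any application tied to the SSN."""
        return ssn_token not in self.locked

    def clear(self, ssn_token: str, verification: str) -> bool:
        """Unlock only after personal verification through an accepted channel."""
        if verification not in ACCEPTED_VERIFICATION:
            return False
        self.locked.discard(ssn_token)
        return True


registry = SharedFraudRegistry()
registry.flag_attempt("token-001", "Agency A")
print(registry.may_process("token-001"))       # False: locked for all agencies
print(registry.clear("token-001", "email"))    # False: not an accepted channel
print(registry.clear("token-001", "video"))    # True
print(registry.may_process("token-001"))       # True
```

Keeping the lock in one shared place means a denial at one agency immediately blocks the same SSN everywhere, which is what breaks the "reapply until approved" loop the simulation exploits.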

[Government / Private partnership] Keep credit-processing laws updated so that a consistently exploited approval pattern is broken as soon as it is observed in a private fraud case. If fraud patterns are continuously shared with government agencies, and shared in a tightly restricted way with other corporations in the same domain, an industry standard could emerge that is continuously understood and updated to beat fraud.

Further research is required into:

  1. The exact credit processing mechanisms in each agency and how agencies notify each other of fraud attempts
  2. SSNs used for fraud: mechanisms for reporting them to government agencies to validate whether they are associated with deceased individuals (or with those who may have left the country)
  3. Feeding research-generated, strengthened profiles into a research AI model that can collect social media data, build the missing parts of a profile, and work with a credit approval agency to test its approval mechanism
  4. Repeating step 3 in a loop to strengthen profiles and adjust credit approvals

Methods

This section explains the code. The approach and reasoning behind it were covered in the “Methodology Overview” earlier.

Research Design         

We used an experimental research design to create simulated applicant profiles and study their effect when applied to a credit approval process that is held constant.

Participants or Sample         

The sample data for this study mimic human behavioral and demographic data. We simulated SPs from real, publicly available US population data; see the References section for all data sources. SPs were generated programmatically and at random from this demographic dataset so that any bias introduced into the profiles is minimal.

Data Collection 

Demographic data are freely available on the Internet from reputable sources181939102021. No additional tool or method was used. Because this study simulates human applicant profiles, the available demographic information was the sole basis for randomly generating the SPs used in the study.

Variables and Measurements        

We investigate four measurements when 10,000 SPs are applied to our credit approval system over a period of 15 years:

  1. Approval graph of fraudulent SPs (false negatives): main concern of the study
  2. Denial graph of valid SPs (false positives): secondary concern of the study
  3. Approval graph of valid SPs
  4. Denial graph of fraudulent SPs
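Each yearly decision falls into exactly one of the four measurements, so they can be tallied as a per-year confusion matrix. A minimal sketch; the `(year, is_fraud, approved)` record layout and the `tally_measurements` name are our assumptions, not the notebook's.

```python
from collections import Counter, defaultdict


def tally_measurements(records):
    """Count the four measurements per year.

    records: iterable of (year, is_fraud, approved) tuples, where is_fraud is the
    ground truth known only to the simulation and approved is the model's decision.
    """
    by_year = defaultdict(Counter)
    for year, is_fraud, approved in records:
        if is_fraud and approved:
            key = "fraud_approved"   # 1. false negative: main concern
        elif not is_fraud and not approved:
            key = "valid_denied"     # 2. false positive: secondary concern
        elif approved:
            key = "valid_approved"   # 3. valid SP approved
        else:
            key = "fraud_denied"     # 4. fraudulent SP denied
        by_year[year][key] += 1
    return dict(by_year)


sample = [(2020, False, True), (2020, True, False), (2021, True, True), (2021, False, False)]
for year, counts in sorted(tally_measurements(sample).items()):
    print(year, dict(counts))
```

Plotting each key's yearly count then gives the four graphs; a `Counter` returns 0 for categories that did not occur in a given year, so every series has a value for every year.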

NOTE: The SPs, the study period, and the credit approval process were all configured and generated in code, and the SPs were applied to the approval system in code as well. The code is written in Python and uses various libraries for data processing and plotting.

Procedure 

Refer to the code to match the procedure description with the corresponding sections. Repository: https://github.com/DShreya1/DSCreditFraud/ Main code: https://github.com/DShreya1/DSCreditFraud/blob/main/DShreya-CreditFraudSynthetic.ipynb

Section 1: Imported libraries

Section 2: Study constants – see Appendix

Section 3: The SP agent. We defined an SP class, SSNGenerator, that randomly generates human profiles, all of which are valid at the start (see Appendix).

Section 4: We generate a maximum of 10,000 profiles for the study.

Section 5: We defined our credit approval model, query_SSN.

Section 6: We evolved our SPs over 15 years by applying three rules: age increase, changes to personal details, and a random chance that a criminal picks up a profile to use for fraud.
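Sections 5 and 6 can be sketched together as a yearly loop: every profile is scored by a static rule, then evolved with the three rules. This is a minimal sketch under stated assumptions, not the notebook's code. The `Profile` fields, the scoring rule in `static_query` (a stand-in for query_SSN), the per-year probabilities (the one-in-seven odds applied once per year), and the way a denied fraudulent profile edits its fields are all hypothetical.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Profile:
    age: int
    employed: bool
    married: bool
    homeowner: bool
    is_fraud: bool = False  # ground truth ("Really Fraud"); the approval rule never sees it


def static_query(p: Profile) -> bool:
    """Hypothetical stand-in for query_SSN: a fixed rule that is never updated."""
    score = 2 * p.employed + p.homeowner + p.married
    return p.age >= 18 and score >= 3


def simulate(profiles, start_year=2020, years=15, fraud_prob=1 / 7, change_prob=0.1, seed=0):
    """Apply every profile once a year, then evolve it with the three rules.

    Returns (year, is_fraud, approved) rows. All probabilities are illustrative."""
    rng = random.Random(seed)
    rows = []
    for year in range(start_year, start_year + years):
        decisions = [static_query(p) for p in profiles]
        rows.extend((year, p.is_fraud, ok) for p, ok in zip(profiles, decisions))
        for p, ok in zip(profiles, decisions):
            p.age += 1                                   # rule 1: age increase
            if rng.random() < change_prob:               # rule 2: personal details change
                p.employed = not p.employed
            if rng.random() < change_prob:
                p.married = not p.married
            if not p.is_fraud and rng.random() < fraud_prob:
                p.is_fraud = True                        # rule 3: a criminal picks up the profile
            if p.is_fraud and not ok:
                # A denied fraudulent profile keeps its SSN and age but edits other fields
                if not p.employed:
                    p.employed = True
                elif not p.homeowner:
                    p.homeowner = True
                else:
                    p.married = True
    return rows


init = random.Random(42)
profiles = [
    Profile(age=init.randint(18, 80), employed=init.random() < 0.626,
            married=init.random() < 0.48, homeowner=init.random() < 0.66)
    for _ in range(1_000)
]
rows = simulate(profiles)
fraud_approved = Counter(year for year, fraud, ok in rows if fraud and ok)
print(fraud_approved[2020], fraud_approved[2034])  # 2020 is 0: every profile starts valid
```

Because `static_query` never changes while fraudulent profiles keep editing the fields it scores, fraudulent approvals accumulate in later years, the qualitative pattern described for Figure 6; the exact counts depend on the placeholder probabilities and seed.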

Section 7: The complete simulated dataset, including the SP information and the credit approval decision.

See Table 1.

Section 8: We validated that our age distribution still matches that of the general population.

See Figure 2.

Sections 9 and 10: These sections are commented out; they were used to check that the overall dataset still matches demographic patterns. Uncommenting them may help readers understand how the data change as they move through the flow.

Sections 11 and 12: These sections tally the credit system's approval results against fraudulent and valid profiles.

See Table 3.

Sections 13 through 21: These sections process the data and compute the four measurements under investigation, which are then plotted in the final graph.

See Table 4.

See Table 5.

Section 22: This section shows that, with SPs that adjust over time, fraudulent credit approvals increase if the credit approval process is not modified to account for AI and simulation.

See Figure 6.

Data Analysis    

Data analysis used the data generated by our code, together with our own analysis code, to check how fraud increases and how a static credit approval process fails to adjust. The underlying rules of a human profile were applied, and all profiles were constructed within their bounds. Over the years, we randomly introduced fraud and life changes into the profiles through the update_fraud, update_personal_details, and age_one_year methods, and continuously checked the profiles against the credit approval method, query_SSN. Finally, we plotted these changes over time to obtain the four measurements under investigation.

Ethical Considerations        

N/A

Appendix    

Study constants

Statistics and demographic data change over time; the constants used in our study may therefore differ slightly from current values.

Research shows that one in seven SSNs is attached to multiple names, creating enormous potential for fraud. Because the SSN is a mandatory input for a credit application, it was the main basis of our approval model, along with the other inputs needed for credit approval.

We collected data on the typical inputs needed for a credit check, including zip code, employment, marital status, and home ownership.

To simulate an SP, demographic data was collected. This included:

  • Employment rate (16 years and older) – 62.6%
  • Unemployment rate (16 years and older) – 4% [22]
  • Credit application (25 years and older) – 73% [20]
  • Never use credit – 10% [23]
  • Marriage rate (15 years and older) – 48% married at any given time [24]
  • Divorce rate – 47.76% [25]
  • Home ownership probability – 66% [26]
  • Homeowner average age – 56 years [27]
  • Age distribution by age group in the general population [21]
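Drawing an initial profile from these constants can be sketched as independent Bernoulli draws after sampling an age. A minimal sketch: treating the rates as independent, the 18+ gate on home ownership, the `sample_profile` name, and the age-group weights in the example are all our assumptions; the weights are placeholders, not the census figures.

```python
import random

# Rates from the list above, each applied independently (a simplifying assumption)
EMPLOYMENT_RATE = 0.626        # 16 years and older
MARRIED_RATE = 0.48            # 15 years and older, married at any given time
HOME_OWNERSHIP_RATE = 0.66
NEVER_USE_CREDIT_RATE = 0.10


def sample_profile(age_groups, rng):
    """Draw one synthetic profile.

    age_groups: list of (min_age, max_age, weight); weights should come from the
    census age distribution."""
    weights = [w for _, _, w in age_groups]
    lo, hi, _ = rng.choices(age_groups, weights=weights, k=1)[0]
    age = rng.randint(lo, hi)
    return {
        "age": age,
        "employed": age >= 16 and rng.random() < EMPLOYMENT_RATE,
        "married": age >= 15 and rng.random() < MARRIED_RATE,
        "homeowner": age >= 18 and rng.random() < HOME_OWNERSHIP_RATE,  # 18+ gate is our assumption
        "uses_credit": rng.random() >= NEVER_USE_CREDIT_RATE,
        "is_fraud": False,  # every SP is valid at the start of the study
    }


rng = random.Random(7)
placeholder_groups = [(18, 34, 0.30), (35, 54, 0.35), (55, 85, 0.35)]  # NOT census figures
profiles = [sample_profile(placeholder_groups, rng) for _ in range(10_000)]
print(sum(p["employed"] for p in profiles) / len(profiles))  # close to 0.626
```

Seeding a single `random.Random` instance makes the generated population reproducible, which helps when comparing runs against a fixed approval model.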

The SP Agent

In addition to the profile generator, we built the following methods:

  • First, update_fraud randomly marks a profile as fraudulent, using the one-in-seven odds
  • Second, age_one_year ages an SP by one year and also adjusts its characteristics based on the demographic dataset
  • Third, update_personal_details applies potential changes to a user’s profile
  • Fourth, check_info, which is used by our credit check simulator
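The four methods can be sketched on a single agent class. The method names follow the list above, but the class name `SyntheticProfile`, its fields, and every method body are our guesses for illustration, not the repository's SSNGenerator code; `change_prob` in particular is a hypothetical parameter.

```python
import random
from dataclasses import dataclass, field

FRAUD_ODDS = 1 / 7          # the one-in-seven figure cited above
EMPLOYMENT_RATE = 0.626     # from the study constants


@dataclass
class SyntheticProfile:
    """Hypothetical SP agent; method names follow the text, bodies are illustrative."""
    ssn_token: str          # opaque placeholder, not a real SSN
    age: int
    employed: bool
    married: bool
    homeowner: bool
    is_fraud: bool = False  # ground truth, hidden from the credit check
    rng: random.Random = field(default_factory=random.Random, repr=False, compare=False)

    def update_fraud(self) -> None:
        """Mark the profile fraudulent with one-in-seven odds; it never reverts."""
        if not self.is_fraud and self.rng.random() < FRAUD_ODDS:
            self.is_fraud = True

    def age_one_year(self) -> None:
        """Age by one year; a profile turning 16 gets a chance of employment."""
        self.age += 1
        if self.age == 16:
            self.employed = self.rng.random() < EMPLOYMENT_RATE

    def update_personal_details(self, change_prob: float = 0.1) -> None:
        """Randomly flip marital and employment status (change_prob is an assumption)."""
        if self.rng.random() < change_prob:
            self.married = not self.married
        if self.age >= 16 and self.rng.random() < change_prob:
            self.employed = not self.employed

    def check_info(self) -> dict:
        """Fields visible to the credit check simulator; is_fraud is deliberately excluded."""
        return {
            "ssn_token": self.ssn_token,
            "age": self.age,
            "employed": self.employed,
            "married": self.married,
            "homeowner": self.homeowner,
        }


shared = random.Random(3)
sp = SyntheticProfile("token-001", age=15, employed=False, married=False,
                      homeowner=False, rng=shared)
sp.age_one_year()
print(sp.check_info())
```

Keeping is_fraud out of check_info mirrors the study design, in which the “Really Fraud” label is known only to the simulation and never to the approval model.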

References

  1. “Odds someone else has your SSN? One in 7” By Bob Sullivan, Columnist, NBC News, Dec. 3, 2010 https://www.nbcnews.com/technolog/odds-someone-else-has-your-SSN-one-7-6c10406347
  2. Credit Fraud Losses https://www.ftc.gov/news-events/news/press-releases/2024/02/nationwide-fraud-losses-top-10-billion-2023-ftc-steps-efforts-protect-public
  3. Boston Mobster James “Whitey” Bulger, an example to wealthy San Franciscans on reaching out to the Homeless By Allen Jones, Jul 14, 2019 https://calclemency.medium.com/boston-mobster-james-whitey-bulger-an-example-to-wealthy-san-franciscans-on-reaching-out-to-the-bf43428adb23
  4. “Social Security number holders” – Social Security Administration website https://www.ssa.gov/oact/babynames/numberUSbirths.html
  5. “FAQs” – Social Security Administration website https://www.ssa.gov/history/hfaq.html#:~:text=Even%20though%20we%20have%20issued,changes%20in%20the%20numbering%20system
  6. “How to notify the Social Security Administration of a death” https://www.protective.com/learn/how-to-notify-the-social-security-administration-of-a-death
  7. “Social Security’s Death Records” – Social Security Administration website https://oig.ssa.gov/congressional-testimony/2012-02-02-newsroom-congressional-testimony-hearing-social-securitys-death-records/#:~:text=Each%20DMF%20record%20usually%20includes,1.3%20million%20records%20each%20year
  8. www.statista.com “Number of deaths in the United States from 1990 to 2022” https://www.statista.com/statistics/195920/number-of-deaths-in-the-united-states-since-1990/#:~:text=In%202022%2C%20about%203.27%20million,and%20from%202.85%20in%202019
  9. FBI publication, ucr.fbi.gov https://ucr.fbi.gov/nics/reports/active-records-in-the-nics-index-by-state
  10. Law Enforcement Resources: “2023 NCIC Missing Person and Unidentified Person Statistics” https://le.fbi.gov/file-repository/2023-ncic-missing-person-and-unidentified-person-statistics.pdf/view
  11. “How do Undocumented Immigrants Pay Federal Taxes? An Explainer” By Hunter Hallman, Mar 28, 2018 https://bipartisanpolicy.org/blog/how-do-undocumented-immigrants-pay-federal-taxes-an-explainer/
  12. “AT&T Addresses Recent Data Set Released on the Dark Web” March 30, 2024 https://about.att.com/story/2024/addressing-data-set-released-on-dark-web.html
  13. “Equifax Data Breach Settlement” February 2024 https://www.ftc.gov/enforcement/refunds/equifax-data-breach-settlement
  14. “credit card fraud” from Cornell Law School https://www.law.cornell.edu/wex/credit_card_fraud
  15. “Credit Application Fraud Prevention” By www.f5.com https://www.f5.com/go/solution/credit-application-fraud
  16. “FTC Warns Tax Preparation Companies About Misuse of Consumer Data” September 18, 2023 https://www.ftc.gov/news-events/news/press-releases/2023/09/ftc-warns-tax-preparation-companies-about-misuse-consumer-data
  17. “3 tax prep firms shared ‘extraordinarily sensitive’ data about taxpayers with Meta, lawmakers say” By Fatima Hussein https://apnews.com/article/irs-taxpayer-tax-preparation-meta-congress-9315cfca7a0942ab89f765d183fbf822
  18. “Odds someone else has your SSN? One in 7” By Bob Sullivan, Columnist, NBC News, Dec. 3, 2010 https://www.nbcnews.com/technolog/odds-someone-else-has-your-SSN-one-7-6c10406347
  19. “Social Security number holders” – Social Security Administration website https://www.ssa.gov/oact/babynames/numberUSbirths.html
  20. “How Many Americans Use Credit Cards?” https://www.forbes.com/advisor/credit-cards/credit-card-statistics/
  21. Government Census Data https://www.census.gov/popclock/data_tables.php?component=pyramid
  22. United States Unemployment Rate https://tradingeconomics.com/united-states/unemployment-rate
  23. “Credit Card vs. Cash Statistics” https://www.forbes.com/advisor/credit-cards/credit-card-statistics/
  24. “How does marriage vary by state?” July 25, 2023 https://usafacts.org/articles/how-does-marriage-vary-by-state/
  25. “Divorce rate” https://www.wf-lawyers.com/divorce-statistics-and-facts/#:~:text=Almost%2050%20percent%20of%20all,8
  26. “2024 Homeownership Rates in the U.S. by State and City” https://www.propertyshark.com/info/us-homeownership-rates-by-state-and-city/
  27. “Repeat buyers” https://www.axios.com/2023/11/20/american-housing-market-older-homeowners-2023
