> To measure adverse impact, we apply the EEOC’s “four-fifths rule,” which flags a position when one group is recommended at less than 80% of the rate of the most-recommended group
That seems like a nonsensical way to measure racial discrimination. What could justify it?
Have you googled this? The EEOC is a federal agency, and they've published on this topic quite extensively. The four fifths rule is used to define if there is a "substantially different selection rate". It does not measure racial discrimination. It measures selection rate.
It indicates there may be adverse impact to one group. It specifically is not used to resolve racial discrimination.
It's purely a signal for "we should consider asking more questions, because this appears unusual". That's what your quote says too, it "flags" a low recommendation -- it's indicating further study and investigation is likely warranted.
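For anyone who wants the arithmetic spelled out, here is a minimal sketch of the four-fifths check as described in the quote: compute each group's selection rate, divide by the highest rate, and flag anything under 0.8. The group names and counts are made up for illustration, and `four_fifths_flags` is a hypothetical helper, not anything from the EEOC or the study.

```python
def selection_rates(applicants, selected):
    """Return each group's selection rate (selected / applicants)."""
    return {g: selected[g] / applicants[g] for g in applicants}


def four_fifths_flags(applicants, selected, threshold=0.8):
    """Return {group: (impact_ratio, flagged)} relative to the highest-rate group."""
    rates = selection_rates(applicants, selected)
    top = max(rates.values())
    return {g: (r / top, r / top < threshold) for g, r in rates.items()}


# Hypothetical counts, for illustration only.
applicants = {"group_a": 200, "group_b": 200}
selected = {"group_a": 100, "group_b": 70}

result = four_fifths_flags(applicants, selected)
for group, (ratio, flagged) in result.items():
    print(f"{group}: impact ratio {ratio:.2f}, flagged={flagged}")
```

With these numbers group_b is selected at 35% against 50% for group_a, an impact ratio of 0.70, so it gets flagged. The flag says nothing about why the gap exists, which is the point being made above.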
I guess it measures whether there's more than a one-standard-deviation gap between the highest and lowest? Assuming that's twenty percent here.
It sounds like how you'd get that kind of metric, at least.
This is an application of the disparate impact doctrine. Even facially neutral policies are considered suspect if they produce results that correlate against protected groups, irrespective of intent.
This doctrine is the basis for much of employment law. It is a significant reason why employers haven't administered IQ tests (or equivalents) to screen candidates since ~the 90s.
A common objection to the doctrine is that it leads to unfalsifiable discrimination claims, which is why it seems nonsensical to you.
Importantly, the rule is not used to resolve racial discrimination claims. It's purely meant as the first test to evaluate whether a deeper dive is warranted. Fast, first pass data analysis tools are very useful for spotting unintended consequences.
You are selectively adhering to the letter of the law, when the effects are already well known and studied. One is not obligated to ignore literature, nor abstain from doing a simple extrapolation from the incentives placed on the table.
There is a large body of literature concerning the question "does disparate-impact enforcement cause employers to alter hiring behavior in ways unrelated to actual productivity or discrimination?" and the answer is largely "yes". As you suggested elsewhere in this discussion, Google may be useful.
>What could justify it?
The assumption that applicants from all races are on average equally qualified for every position. Whole subfields of modern academia are based on that assumption.
It's a starting point to flag.
Here's some analysis that took me 2 seconds of googling to find for you since you're clearly so curious: https://www.prevuehr.com/resources/insights/adverse-impact-a...
Thanks. I read the article:
> Since the 80% test does not involve probability distributions to determine whether the disparity is a “beyond chance” occurrence, it is usually not regarded as a definitive test for adverse impact. Instead, other statistically significance tests, such as the standard deviation analysis, may be used for this purpose.
But then my question recurs: isn’t this a ridiculous way to measure discrimination? It’s assuming that the only thing that differs between the different ethnic applicant pools is their ethnicity, which is essentially never going to be true.
It's not used to measure discrimination. It's used to identify outcomes that appear to be potentially discriminatory. You have to do the legwork afterwards.
Like. If I am evaluating a developer on lines of code written, I am a bad manager. But if an engineer has 40% fewer lines of code than the team median, it's absolutely ok for me to go, "Interesting. What's the story there? Are they slower or is there some other factor?"
Same idea -- this is purely a fast, first pass metric that can quickly assess if something warrants a deeper evaluation.
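To make the difference between the 80% test and a significance test concrete, here is a rough sketch. It assumes the "standard deviation analysis" mentioned in the quoted article can be approximated by a pooled two-proportion z statistic (that is my assumption, not something the article spells out), and the counts are invented.

```python
import math


def impact_ratio(n1, x1, n2, x2):
    """Ratio of the lower selection rate to the higher one."""
    r1, r2 = x1 / n1, x2 / n2
    return min(r1, r2) / max(r1, r2)


def two_proportion_z(n1, x1, n2, x2):
    """Pooled two-proportion z statistic for the difference in selection rates."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Hypothetical pools: (applicants_1, selected_1, applicants_2, selected_2).
small = (20, 10, 20, 7)
large = (2000, 1000, 2000, 700)

for name, pool in (("small", small), ("large", large)):
    print(name, round(impact_ratio(*pool), 2), round(two_proportion_z(*pool), 2))
```

Both pools have the same impact ratio (0.70), so both trip the four-fifths flag, but only the large pool clears the usual two-standard-deviation bar. That is the sense in which the 80% test is a screen rather than a definitive test.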
How would you like me to define "starting point" in a way that you believe you'll be able to understand?
If you are trying to say "more data needed, headline misleading" you should say that instead of misrepresenting the 4/5ths rule. Also the word "can" implies uncertainty of conclusion. This isn't ridiculous, the authors point out that this is the first large scale study of this topic. Nothing has been "proven" here, it's showing that this warrants further investigation and attention.
Do you read many academic papers? Because you seem to be having a rough go here.
The European Union passed The Artificial Intelligence Act, which classifies:
High-risk – AI applications that are expected to pose significant threats to health, safety, or the fundamental rights of persons. Notably, AI systems used in health, education, recruitment, critical infrastructure management, law enforcement or justice. They are subject to quality, transparency, human oversight and safety obligations
That's pretty common-sense legislation to me.
The AI “safety” industry is lobbying for federal preemption so that states won’t have the power to enact these types of sensible regulations.
Did I miss the part of the article where they break down how they determined race? Is the algorithm blind to race? It looks like they specifically looked at 83k people applying to ~100 companies which notably were Fortune 500 companies. Could there simply be candidate discrepancies here? Hard for me to follow the full methodology but it doesn't necessarily seem either malicious or that well structured. Don't you need to have a control group of applicants who are similar on paper? To allege DISCRIMINATION is quite bold.
Definitely open to opposing or critical views
Yes. You missed it. They are using a test dataset of 83k resumes generated in 2022 for this paper and comparing it as a baseline against their observational data: https://www.nber.org/papers/w29053
The dataset is constructed, deliberately, to hold candidate performance constant and vary the names of candidates to appear to be associated with a specific race.
Ayres, I., Banaji, M. and Jolls, C. (2015), Race effects on eBay. The RAND Journal of Economics, 46: 891-917. https://doi.org/10.1111/1756-2171.12115
"Cards held by African-American sellers sold for approximately 20% ($0.90) less than cards held by Caucasian sellers, and the race effect was more pronounced in sales of minority player cards."
I truly don't doubt it's possible for the AI to be 'racist'.
>If the AI had recommended Black and Asian candidates at the same rate as it recommended the most-favored group (typically white applicants), 40,000 more of their applications would have advanced to the next stage of hiring.
I don't think this is the right benchmark here, or at least, it would be very interesting if the actual outcome, offer or rejected, was considered at the end.
You are misreading this sentence. This sentence is saying: "Using a constructed dataset of resumes, whose only difference was a name change, we would anticipate a system evaluating on qualifications to produce an equal distribution of candidates across names. Our observed result was highly unequal, and that warrants further investigation."
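As a sketch of how a counterfactual figure like the quoted "40,000 more applications" could be computed: take the highest recommendation rate, apply it to every group's applicant count, and subtract what was actually recommended. This is only my reading of the sentence, not the paper's stated method, and the counts below are hypothetical.

```python
def additional_advances(applicants, recommended):
    """Extra recommendations each group would get at the top group's rate."""
    rates = {g: recommended[g] / applicants[g] for g in applicants}
    top = max(rates.values())
    return {g: top * applicants[g] - recommended[g] for g in applicants}


# Hypothetical counts, for illustration only.
applicants = {"group_a": 1000, "group_b": 500}
recommended = {"group_a": 400, "group_b": 150}

extra = additional_advances(applicants, recommended)
print(extra)
```

Here group_a is the most-recommended group at 40%, so it gains nothing, while group_b would gain about 50 recommendations. Note that this benchmark only looks at the recommendation stage, not at offers or rejections further down the pipeline.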
2 days ago there was another interesting article on the effects of AI in hiring[1]
I guess this one just compounds.
[1] https://news.ycombinator.com/item?id=48620142
Would be very interested to see how this affects post-50 workers. That's a protected class and I would imagine an ambulance chasing lawyer would be excited for a class action lawsuit.
Some job application websites I've seen actually have a yes or no option to consent to AI review that they claim is to simply assist HR and not actually screen you. I always select no. There is no way that selecting yes would ever be in my interest. I'm sorry, I'm going to force a real human to look at my stuff if I still can.
My fear is that pressing "no" on stuff like that is going to become an auto-rejection in the vast majority of cases
It won't be rejected. Your resume will be meticulously placed into a human review queue pending the allocation of someone to look at the contents. Meanwhile the position will be filled, and so serving no purpose the review queue will be emptied.
Oddly enough, being rejected by process versus being rejected by a person doesn't actually make me feel any better about the coming future
:)
It's probably not going to be an auto-rejection, it's just going to sit in a queue that looks like this
The paper is here: https://arxiv.org/pdf/2605.27371
They find "disparate impact" of pymetrics across racial groups, but it doesn't seem like they controlled for anything.
I don’t think AI screening is effective. But this study is just disparate impact.
You don't need a complicated study to find out; do it yourself, for science. Take a resume and make a few different versions, keeping the content the same but changing the layout (education at the top in one, at the bottom in another, etc.). Use different names to signal different backgrounds (you can extend this to schools and gender too), and send them all to the same employers. You will see wonders!!
I tried it before, and the discrimination is there: one resume would get rejected quickly, and a few days later the same company would invite another resume for a screening call. I tried this both before and after the AI hype, and the results weren't that different, btw. That was tested with US and Canadian employers only.
This study only looks at one specific vendor algorithm (a job assessment given by a company called pymetrics).
LLMs are trained on the Internet, which isn't exactly known for its race-agnostic opinions.
This is something I've been working on exposing to AI labs through my startup LatentEvals[1], and found similar results in other industries from lending to insurance claims.
Happy to share some sample reports if anyone is interested!
1. https://www.latentevals.com/
Don't have much to add beyond being grateful for everyone working to call this out, with a hope some lawsuits drop and our SCOTUS doesn't decide racial bias in AI is fine because we can't prove the AI is racist in its heart.
I’m sure (really sure) there are real problems with AI and bias, but this is a weird study that isn’t looking at resumes or anything, it’s looking at how candidates did in some weird psychometric tests.
> Using our large dataset of real hiring AI recommendations, we test our hypothesis. We find that people who submit multiple applications to positions screened by the same algorithmic hiring vendor are more likely to be rejected from every position to which they apply than would be true if the companies made decisions statistically independently from one another.
I would be surprised if the results were different.
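A back-of-the-envelope sketch of why that result is unsurprising. If each position rejects an applicant with probability p and the decisions are independent, the chance of being rejected from all k positions is p^k. If every position reuses the same score and cutoff (an extreme assumption for illustration; the vendor's actual setup is not described here), one rejection implies all of them, so the chance stays at p.

```python
def p_rejected_everywhere_independent(p_reject, k):
    """Probability of k rejections when each decision is independent."""
    return p_reject ** k


def p_rejected_everywhere_shared(p_reject, k):
    """Probability of k rejections when all k positions reuse one score and cutoff."""
    return p_reject


# Hypothetical numbers: 70% rejection rate, 3 applications.
p, k = 0.7, 3
p_independent = p_rejected_everywhere_independent(p, k)
p_shared = p_rejected_everywhere_shared(p, k)
print(f"independent: {p_independent:.3f}, shared score: {p_shared:.3f}")
```

With these numbers the independent case gives about 0.343 and the shared-score case gives 0.7. Any real system sits somewhere between the two, and the quoted finding is that the observed rate is above the independent baseline.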
Could the AI actually see the race of the applicants? Or was it just discriminating on the basis of some factor it found that was correlated with race, like SAT scores?
It rejected Asians more because of their higher SAT scores? If it’s not directly based on applicants disclosing their ethnicity then probably something more obvious like names.
Name. Other factors were controlled.
Where was this listed in the study? I can't find this anywhere in either the linked page or the Github https://algorithmichiring.github.io/
I'm going to assume that people aren't allowed to put "don't send me black applicants" into their process even if they do see race in the application as that's entirely illegal.
The paper's conclusion, that we need to study this more, is showing the authors likely believe this to be a byproduct of inherent/invisible bias.
I'm struggling to figure out what they're trying to say here in the linked (and very anemic) paper:
> 30% of Black applicants apply to at least one position that demonstrates adverse impact against Black applicants.
The whole thing reads like a tautology.
You are reading a paper without understanding the language of the paper. Adverse Impact has a specific meaning, and in this case it's specifically meaning that Black candidates were selected only four fifths as often as white candidates when their qualifications were identical. The study is only suggesting that further investigation is warranted.
> To put this in perspective: If the AI had recommended Black and Asian candidates at the same rate as it recommended the most-favored group (typically white applicants)
Some people just can't help but put their biases on display at every opportunity, even when it comes to the most minute details.
Nothing in this has any bias in it? Which words are you suggesting are biased? This study measured constructed resumes where only names were changed, and observed the rate at which each group was favored (the percentage of resumes that passed). One group must be "most favored" because that's how math works. It's the group whose percentage was the highest. The resumes were fictional and equivalent across race; only the names were changed.
Where do you think this sentence shows bias?
The phrase "most-favored" means, "most recommended by the AI relative to the field".
What did you think this sentence meant?
It's fucking crazy that people are using these systems for important tasks like hiring. They have zero understanding of how these systems work. And LLMs are absolutely not designed to do those sorts of jobs; they're designed to be chatbots and to fool a human conversing with them into thinking they are responding intelligently. Of course they're gonna be useless at other tasks.
(I assume they're just using a big LLM for this. It doesn't say; it just says "AI", and when they say "AI" like that they usually mean an LLM. A custom-trained hiring ML system would be better.)
Isn't HR basically just an LLM with ears and teeth?
We can't take blanket percentages as a reason for racial bias. Were they all equally qualified?
Too many of these studies only focus on percentages and the end result is unqualified candidates getting hired from minority groups at the expense of qualified ones.
Please read the study, or at least the comments here, before jumping to a conclusion. Yes, they used constructed resumes, so the qualifications were exactly the same. And no, literally no one is suggesting this proves racial discrimination. It's applying the four-fifths rule, a fast, coarse evaluation used to identify whether it's worth investigating further for conclusive evidence of racial discrimination.
The authors are saying it's worth doing more research, because in a controlled data set the results appear unbalanced.