Paper Summaries
25_Fall_209S
Gender
Race

September 18, 2025 | 6 minute read

An Image of Society: Gender and Racial Representation and Impact in Image Search Results for Occupations

by Danaë Metaxa, Michelle A. Gan, Su Goh, Jeff Hancock and James A. Landay

What I read

In this paper, the authors examine how Google Image Search represents gender and race in results for occupational queries, and what impact those results have on users. They conclude that women and people of color are significantly underrepresented, both as a share of total search results and relative to their actual representation in workforce data, and that this skew shapes the experience of searching.

The authors first describe the ubiquitous nature of search algorithms, and Google's dominant share of search use in particular. Algorithm audits have been used to study how these algorithms behave in practice. Kay et al.'s 2015 study, which is referenced extensively throughout the paper, showed underrepresentation of women in Google Image Search, particularly relative to women's rates of participation in specific careers. The researchers indicate that their study uses a similar methodology to understand how Google Image Search performs five years later.

They describe related work on "algorithmic fairness," citing Crawford, who identified a sub-category of research concerned with "representational harm," in which a system perpetuates or amplifies existing social inequalities. The authors note that existing research has shown how quickly images shape perception, which in turn has real-world implications. Search engines, then, both draw from and change cultural narratives. There is a gap in the research, however, on how "algorithms in practice" affect user behavior and belief. Again, the algorithm audit is presented as a tool for studying this, and it is both the focus and the method of the study described in depth later.

Through two studies, the authors explored five research questions:

Do search results represent workforce data?

Has the representation of women in search results improved in the last half-decade?

Does increasing the prevalence of women and people of color in search results change people's estimates of the actual representation in those fields?

Does this impact a sense of belonging in an occupation?

What is the impact of all of this on the experience of searching?

A description of the study methodology is skipped here.

The first study found evidence of under-representation of both women and people of color in two respects: in the overall set of image search results, and in the proportion of results relative to workforce data. This had not changed since Kay et al.'s findings in 2015.
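To make the second comparison concrete: the audit asks whether the share of, say, women among the top image results for an occupation matches that occupation's workforce composition. The paper's exact metric and data are not reproduced here; the following is a minimal sketch of that kind of comparison in Python, with entirely illustrative labels and workforce figures (not drawn from the paper or from any official statistics).

```python
# Minimal sketch of an audit-style comparison (not the paper's exact method):
# compare the share of women among hand-labeled top image results for an
# occupation against that occupation's workforce share. All values below are
# illustrative placeholders, not data from the paper or from labor statistics.

search_labels = {
    # occupation -> gender labels hand-assigned to its top image results
    "civil engineer": ["man", "man", "woman", "man", "man", "man", "woman", "man"],
    "nurse": ["woman", "woman", "woman", "man", "woman", "woman", "woman", "woman"],
}

workforce_share_women = {
    # occupation -> assumed fraction of women in that occupation's workforce
    "civil engineer": 0.17,
    "nurse": 0.87,
}

for occupation, labels in search_labels.items():
    search_share = labels.count("woman") / len(labels)
    gap = search_share - workforce_share_women[occupation]
    print(f"{occupation}: {search_share:.0%} women in results, "
          f"{workforce_share_women[occupation]:.0%} in workforce, gap {gap:+.0%}")
```

A negative gap for an occupation would indicate under-representation in search relative to the workforce, which is the pattern the first study reports.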

The second study found that increasing women's representation in search results raises people's estimates of women's representation in the corresponding occupations and affects searchers' interest in those occupations, and that participants' own gender and racial identities have the largest impact on their search experience.

The authors discuss their findings. They note that, while one might see a straightforward way to translate these findings into real-world outcomes, other research has indicated that changing algorithms may contribute to "occupational feminization," in which pay and status markers decline as the prevalence of women in a particular job increases. Additionally, because the study's results are embedded in a much larger context of "white male privilege," algorithm changes may do little to produce social change without larger simultaneous shifts. The authors note limitations of their study, particularly the grouping of gender and race into binary categories for the sake of analysis.

The authors conclude by summarizing their findings. They state:

"We find evidence of marginalized people's underrepresentation in search relative to prevalence in the workforce; we also compare with work by Kay et al., but do not find evidence that gender representation in search has improved in the aftermath of social movements like Black Lives Matter or greater scrutiny to algorithmic bias."

And

"We find that varying these levels of representation changes people's perception of occupations, and also changes people's interest in the occupation, perception of its inclusivity, and perception they would be valued—and that these effects vary (often even more substantially) by participants' own gender and racial identities."

Brief Critical Discussion

This paper highlights an important part of how society and culture are shaped: the understated but large-scale presence of utilities like Search, and the way use of these utilities creeps into everyday life and shapes perception. The sheer scale of Google's use, evidenced by the authors' claim of 90% market penetration, shows how far-reaching any effects of this tool might be. And because the use of a utility seems innocuous, these effects are understudied (as the authors also point out). Given the explosive and tense political climate around these topics, it is valuable to offer data-driven findings as a more thoughtful, rational way of participating in discussions of cultural values and change.

Studying this is also a huge task, and the method the authors selected is a reasonable way to work with something so large, complex, and opaque. As with any study, the method needs to be reductive enough to be operationally manageable; the authors make some important tradeoffs, including the use of static mockups of search results rather than real or simulated searching, and Mechanical Turk labor rather than rich qualitative observation of use.

Another limitation of the study, likely driven by operational logistics, stems from the use of images as the study's input. The authors selected Google Image Search to understand race- and gender-based bias in search algorithms. Images are powerful; like words, they are open to interpretation, but unlike words, they can be absorbed, processed, and judged in seconds. Additionally, it's likely that, in a world of short videos, many searchers, given a choice, will select visual media rather than text. But algorithms for image search results work differently from those for other forms of media, because they are "fed" and "trained" in different ways. Some of these differences are in semantic context, based on indicators like the surrounding source material on the crawled page. Some are elective, such as how accessibility features like alt text are prioritized. Many are increasingly based on LLM-trained models.

All of this complicates acting on the study's findings. What should an algorithm designer focused explicitly on Google Image Search do as a result of reading this work? Is there even such a thing as an "algorithm designer," given how decisions are made at large companies like Google? It's likely that there is a Product Manager over Search, another over Image Search, another over Image Search Algorithms, some sort of cross-functional team across all algorithms, and so on.

The authors also may have inadvertently contributed to the problems in discussions of race and gender difference through their binary grouping of data. In order to create an experimental context that is actually tractable, they briefly reinforce the presumptive and conservative view that there are two genders and that race is one thing or another. A more liberal reader with deep-seated concerns about this framing may find it hard to value the results, and may disregard the paper entirely based on this operational decision.

However, it's unfair to judge this paper solely on its lack of immediate "usefulness," or solely on any potential inadvertent harm, given the enormous and intertwined nature of a system like Google and the equally intertwined nature of a "system" like American culture. Its existence may be useful enough on its own: it helps elevate topics like this as worth studying, and shifts the priorities of researchers and research funding.