Does artificial intelligence reproduce inequality?

Hidden dangers behind training sets and face recognition systems

In contemporary society, artificial intelligence, algorithms, and big data have become deeply embedded in our daily lives. As Just and Latzer (2017) pointed out, algorithmic selection not only affects the behavior of internet users but also reshapes their ideas and ways of thinking. As people grow more reliant on digital technology, the influence of algorithms grows with it: through algorithms we can search for information more conveniently and make decisions more efficiently. Because AI collects data and processes information faster than humans can, some professional analysts believe that replacing humans with machines in decision-making will yield better returns and improve society (Andrejevic, 2019). Analysts at the information consulting firm Gartner, for instance, argue that predictive modeling, algorithmic systems, and computer simulations are far superior to human judgment, because the more information available during a judgment, the more accurate the result, and machines strip out emotional views and make purely rational judgments (Andrejevic, 2019). However, there are dissenting views. Crawford (2021) argues that artificial intelligence is not an objective, universal, or neutral computing technology. Instead, it is made from the earth's minerals and energy, the labor of human bodies, and the massive amounts of data that human society generates every day. It is embedded in society, politics, culture, and economics as a power structure combining infrastructure, capital, and labor, and it invariably reproduces and expands existing structural inequalities (Crawford, 2021). This blog takes ImageNet and facial recognition technologies as examples to analyze the risks behind them and the laws that currently govern, or are emerging to govern, them.

First, we need to understand why artificial intelligence (AI) needs data, why data bias arises, and where this data comes from. With the rise of the notion that “data is the new oil”, users’ personal data has become a “consumable resource”. Treating data as something abstract and intangible, like a “natural resource”, has to some extent stripped users of control and ownership over their own data (Crawford, 2021, p.113). Such data may appear anonymous, but in reality it can reveal a wealth of highly personal information, severely invading privacy. Furthermore, when data is treated as a “raw material” like oil, the AI industry tends to strip it of its specific historical contexts, narrative contexts, and human characteristics, with serious consequences.

Supervised machine learning systems designed for object or facial recognition are trained on large datasets of labeled images (Denton, 2021). On the software side, the algorithm statistically analyzes these images and builds a model that distinguishes between categories (Crawford, 2021). Training sets are therefore the foundation on which learning machines are built, and they are crucial to understanding the social issues surrounding AI, because they shape and constrain how AI systems recognize and interpret the world.
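To make the training-set dependence concrete, here is a minimal, illustrative sketch of supervised classification. It is not ImageNet's actual pipeline (which uses deep neural networks); it is a toy nearest-centroid classifier over hypothetical four-pixel "images", showing the basic shape of labeled data in, model out, and why a system can only ever reproduce the categories its dataset contains.

```python
# Toy supervised classifier: each "image" is a flat list of pixel
# intensities paired with a human-assigned label. Training averages
# the pixels per category; prediction picks the nearest category mean.
# Note: a category absent from the training set can never be predicted,
# and a skewed training set skews every prediction - the dataset IS the model.

def train(examples):
    """examples: list of (pixels, label). Returns {label: centroid}."""
    sums, counts = {}, {}
    for pixels, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(pixels)
            counts[label] = 0
        sums[label] = [s + p for s, p in zip(sums[label], pixels)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}

def predict(model, pixels):
    """Assign pixels to the category whose centroid is closest."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda lab: sq_dist(model[lab], pixels))

# Hypothetical training set: "bright" vs "dark" 4-pixel images.
training_set = [
    ([0.9, 0.8, 0.9, 0.7], "bright"),
    ([0.8, 0.9, 0.7, 0.9], "bright"),
    ([0.1, 0.2, 0.1, 0.0], "dark"),
    ([0.2, 0.1, 0.0, 0.1], "dark"),
]
model = train(training_set)
print(predict(model, [0.85, 0.8, 0.8, 0.9]))  # bright
```

Even in this tiny sketch, every prediction is downstream of the choices baked into `training_set`: which examples were gathered and which labels a human attached to them.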

Machine learning systems have spread rapidly into important social domains, but many examples show that their failures for certain populations stem directly from those populations being under-represented, misrepresented, or entirely absent in the training datasets. This blog will first analyze the creation of the ImageNet dataset and its importance in the field of visual object recognition.


ImageNet is a manually annotated dataset of over 14 million images, divided into approximately 20,000 categories (Denton, 2021). Its classification structure is derived from WordNet, a large English lexical database organized into a hierarchy of semantic relationships (Denton, 2021). Despite some setbacks in its creation, the dataset became possible thanks to the emergence of technologies such as digital image sharing and web search engines (Crawford, 2021). ImageNet organizes hundreds of thousands of English words into a massive ontology, mapping the relationship between concepts and the visual world by associating each concept with accompanying images (Denton, 2021). ImageNet is considered one of the most important training sets in AI history and a key driver of the deep learning revolution (Birhane, 2021). However, classification makes some things visible while rendering others invisible (Birhane, 2021).
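A WordNet-style hierarchy of this kind can be sketched as nested concepts, with images attached at the leaves. The categories below are hypothetical stand-ins, not actual ImageNet synsets; the point is that every image's meaning is fixed by where its label sits in a pre-built tree of concepts.

```python
# Hypothetical miniature of a WordNet-style concept hierarchy:
# each key is a concept, each value is its sub-concepts.
hierarchy = {
    "animal": {
        "dog": {"retriever": {}, "terrier": {}},
        "cat": {},
    },
    "artifact": {
        "vehicle": {"car": {}, "bicycle": {}},
    },
}

def path_to(tree, target, trail=()):
    """Return the chain of parent concepts leading to `target`, or None."""
    for concept, children in tree.items():
        if concept == target:
            return trail + (concept,)
        found = path_to(children, target, trail + (concept,))
        if found:
            return found
    return None

print(path_to(hierarchy, "retriever"))  # ('animal', 'dog', 'retriever')
```

Placing a labeled image at a leaf like `retriever` implicitly commits it to every ancestor concept above it, which is exactly why the choice of hierarchy, and who gets to design it, carries so much weight.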

ImageNet classifies people into categories of race, occupation, gender, economic status, behavior, personality, and even morality (Crawford, 2021). Gender, for example, is divided into male and female, a binary and cisgender framing. This simplified view fails to capture the complexity of gender, does not address transgender identity, and does not even indicate whether the gender labels refer to gender identity or biological sex (Buolamwini, 2018). This exclusion of transgender people causes them real harm (Birhane, 2021).

Furthermore, as shown in the image, the classification in ImageNet includes a large number of racist and sexist categories.

Meanwhile, the images in ImageNet's “person” category were collected through image search engines, meaning users’ selfies and other photos were taken without their knowledge and categorized without anyone knowing why a given photo was placed in a given category. As the pictures show, many labels are absurd: a woman’s selfie labeled hermaphrodite, a picture of a beach labeled kleptomaniac, a pregnant woman holding her belly labeled snob (Crawford, 2021). This practice of labeling people according to their physical characteristics has also raised concerns about the pseudoscientific ideology of physiognomy (Birhane, 2021).

Visual intelligence requires not only low-level pattern recognition but also the relation of visual patterns to meaning (Denton, 2021). The object recognition that ImageNet proposes lacks narrativity and confines the relationship between images and concepts to an unmediated, opaque mode (Denton, 2021). Buolamwini (2018) argues that algorithmic fairness rests on differing contextual assumptions and accuracy optimizations. Unrepresentative data harms women, ethnic and racial minorities, and other marginalized individuals and communities (Birhane, 2021). The categorization, collection, and labeling of data involve social, cultural, and political choices. Beneath the data's seemingly fair appearance, power dynamics are at play; with technology as a façade, these power structures are hidden and then reproduced through the use of artificial intelligence (Crawford, 2021, p.121).

Facial recognition technologies

Law enforcement agencies are adopting new technologies, considered more accurate and efficient, to better ensure social and citizen safety. However, the rapid development of facial recognition technologies (FRT) confronts law enforcement with complex moral choices between citizens’ privacy rights and public safety (Almeida, 2022). Because the technology inevitably infringes on citizens’ privacy to some extent, questions of accountability and of acceptable limits on privacy are crucial for balancing the rights and responsibilities surrounding FRT (Schuetz, 2021). Accountability systems establish the obligation to explain, justify, and take responsibility for actions, and Almeida (2022) argues that the state has a responsibility to account for the choices made about these technologies’ impact in specific situations.

Schuetz (2021) argues that machine learning systems exhibit various forms of bias, and algorithmic bias is also evident in FRT. Almeida (2022) suggests that FRT trained predominantly on white male faces will be biased and error-prone when applied to non-white and female faces. The use of biased FRT by law enforcement may therefore have negative consequences for communities of color (Schuetz, 2021), leading to further discrimination against specific communities and eroding trust between citizens and the law.

Currently, there is no specific legislation on FRT in the EU and UK, but other legislation governs the management of FRT, such as the General Data Protection Regulation (GDPR) (Almeida, 2022). The regulation requires that “privacy by design” and “privacy by default” systems be built into any personal data processing, processing must have a clear legal basis, and must be fair and transparent (Almeida, 2022, p.380). More stringent controls are required when processing special categories of personal data, particularly biometric data (Almeida, 2022). Additionally, the balance between law enforcement powers and personal data rights is a point of contention, and data protection impact assessments (DPIAs) are a way to demonstrate and implement privacy by design. A Data Protection Officer (DPO) acts as a whistle-blower and reports data protection issues to the relevant national supervisory authority, establishing a system for oversight, complaints, and investigations (Almeida, 2022).

The United States lacks a data protection agency: no body actively represents and protects citizens’ interests while wielding the legal and regulatory powers of the state (Almeida, 2022). FRT is widely used in US law enforcement, but if it is abused, ordinary citizens have extremely limited means to hold its users accountable (Schuetz, 2021). “US legislation requires state authorities to be accountable for their policies and actions when they receive freedom of information requests, but this does not affect private companies, which are not held accountable in the same way” (Almeida, 2022, p.384). Additionally, without oversight from a data protection agency, the use of FRT lacks transparency (Almeida, 2022). Since no agency represents citizens and enforces decisions, any conflict over FRT and related personal data must be litigated in court, a lengthy and costly process (Almeida, 2022, p.384). To safeguard their privacy, people frequently need the legal assistance of non-profit organisations, and those unable to obtain it may have no way to hold FRT operators or providers responsible (Almeida, 2022).

California has strengthened its existing privacy laws with the California Privacy Act and has begun introducing facial recognition legislation to regulate FRT use in law enforcement, although some cities have already banned the technology (Schuetz, 2021). IBM and Amazon have announced suspensions of FRT sales, while Google has faced criticism after dismissing ethical AI researchers. Both private businesses and government agencies face problems of transparency and accountability in their use of FRT (Almeida, 2022). Hearings have called for stronger accountability and transparency, though their final outcomes are still pending. Current law does not adequately protect against FRT's social and ethical impacts. Almeida (2022) suggests that transparency and challenge mechanisms must be recorded and explained at every stage of FRT development and use. Furthermore, broader ethical considerations, including equality, diversity, inclusivity, and more universal human rights issues, must be taken into account in the development and rollout of FRT (Almeida, 2022). “Global regulatory bodies need to have the authority to actively investigate all aspects of FRT development and deployment in specific cases and have the power to intervene, stop, and fine inappropriate FRT development and deployment” (Almeida, 2022, p.386).

Based on the above, AI has to some extent reinforced historical inequities under the cover of “technological neutrality”. Artificial intelligence affects the lives of countless people, yet ethical scrutiny is lacking, social implications are inadequately considered, and problems, when they arise, are treated as merely technical issues. This technocentric framing has produced a growing number of problems. The growth of the AI industry has concentrated power in the hands of a few while causing great harm to the marginalized. Only recently have policymakers begun attempting to address the social problems caused by AI through regulation. But this is not enough: Crawford (2021) proposes a deeper democratic dialogue, incorporating calls for personal data protection, labor rights, climate justice, and racial equality, to build democratic accountability over the AI industry through collective action.


References

Almeida, D., Shmarko, K., & Lomas, E. (2022). The ethics of facial recognition technologies, surveillance, and accountability in an age of artificial intelligence: A comparative analysis of US, EU, and UK regulatory frameworks. AI and Ethics, 2(3), 377–387.

Andrejevic, M. (2019). Automated Media. Routledge.

Birhane, A., & Prabhu, V. U. (2021). Large image datasets: A pyrrhic win for computer vision? 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), 1536–1546.

Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. Proceedings of the 1st Conference on Fairness, Accountability and Transparency, PMLR 81, 77–91.

Crawford, K. (2021). Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence. Yale University Press.

Crawford, K., & Paglen, T. (2021). Excavating AI: The politics of images in machine learning training sets. AI & Society.

Denton, E., Hanna, A., Amironesei, R., Smart, A., & Nicole, H. (2021). On the genealogy of machine learning datasets: A critical history of ImageNet. Big Data & Society, 8(2).

Just, N., & Latzer, M. (2017). Governance by algorithms: Reality construction by algorithmic selection on the Internet. Media, Culture & Society, 39(2), 238–258.

Schuetz, P. N. (2021). Fly in the face of bias: Algorithmic bias in law enforcement's facial recognition technology and the need for an adaptive legal framework. Law & Inequality, 39(1), 221–.
