Exploring AI Bias in Security-Related Decision-Making
Keywords: AI bias, large language models, racial discrimination, predictive justice, intersectional bias, algorithmic critical analysis

Abstract
The increasing penetration of large language models (LLMs) into security-related domains raises the question of whether these models harbor biases that could affect sensitive safety and security decisions. Answering that question requires a systematic investigation of how such models respond to different security scenarios. This study assesses bias in the responses of the Grok language model to security scenarios, focusing on how the model rates the threat potential of different demographic groups (by race, gender, age, and appearance). The study followed a multi-dimensional analytical process: 15 security-related questions were submitted to the Grok model, and its responses were analyzed both quantitatively and qualitatively. The analysis measured threat classification rates across demographic groups, examined semantic connotations and inference patterns, and evaluated the model's consistency with its own stated principles of fairness. Threat classification rates differed significantly across demographic groups: Arab/Middle Eastern, Black, young, and male individuals were rated as potential threats at markedly higher rates (51.4%, 43.7%, 48.3%, and 47.6%, respectively) than White, elderly, and female individuals (36.2%, 27.5%, and 31.9%, respectively). The qualitative analysis also revealed persistent contradictions between the model's stated principles and its actual practice, including selective use of statistics and differing interpretive frameworks applied to different demographic groups. Intersectional bias was particularly pronounced: a "young Arab male in traditional clothing" was classified as a potential threat at a rate of 62.7%.
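To make the quantitative comparison concrete, the sketch below shows one way a difference in threat classification rates between two demographic groups could be tested for statistical significance. It is a minimal illustration, not the study's actual analysis pipeline: the group labels and response counts are hypothetical placeholders chosen only to approximate the rates reported above (51.4% vs. 36.2%), and the chi-square test of independence is a standard choice rather than the specific test used in the paper.

```python
# Minimal sketch: chi-square test of independence comparing how often the
# model labels two demographic groups as a "potential threat".
# NOTE: the counts below are hypothetical placeholders; the paper reports
# only the resulting rates (~51.4% vs. ~36.2%), not the underlying counts.
from scipy.stats import chi2_contingency

# (threat classifications, non-threat classifications) per group
arab_middle_eastern = (181, 171)   # 181 / 352 ≈ 51.4%
white = (127, 224)                 # 127 / 351 ≈ 36.2%

# 2x2 contingency table: rows = groups, columns = threat / non-threat
table = [list(arab_middle_eastern), list(white)]
chi2, p_value, dof, expected = chi2_contingency(table)

for name, (threat, non_threat) in [("Arab/Middle Eastern", arab_middle_eastern),
                                   ("White", white)]:
    print(f"{name}: {threat / (threat + non_threat):.1%} classified as threat")
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
```

The same pattern extends to intersectional comparisons (e.g., a larger contingency table with one row per demographic description), provided the per-group response counts are available.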