Exploring AI Bias in Security-Related Decision-Making
Keywords: AI bias, large language models, racial discrimination, predictive justice, intersectional bias, algorithmic critical analysis

Abstract
The increasing penetration of large language models (LLMs) into security-related domains raises the question of whether these models harbor biases that could affect sensitive safety and security decisions. Answering that question requires a systematic investigation of how such models respond to different security scenarios. This study assesses bias in the responses of the Grok language model to security scenarios, focusing on how the model rates the threat potential of different demographic groups (by race, gender, age, and appearance). The study followed a multi-dimensional analytical process: 15 security-related questions were submitted to the Grok model, and its responses were analyzed both quantitatively and qualitatively. The analysis measured threat classification rates across demographic groups, examined semantic connotations and inference patterns, and evaluated the model's consistency with its own stated principles of fairness. Threat classification rates differed significantly across demographic groups: Arab/Middle Eastern, Black, young, and male individuals were rated as potential threats at markedly higher rates (51.4%, 43.7%, 48.3%, and 47.6%, respectively) than White, elderly, and female individuals (36.2%, 27.5%, and 31.9%, respectively). The qualitative analysis also revealed persistent contradictions between the model's stated principles and its actual practice, including selective use of statistics and differing interpretive frameworks applied to different demographic groups. Intersectional bias was particularly pronounced: a "young Arab male in traditional clothing" was classified as a potential threat at a rate of 62.7%.
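To make the quantitative comparison concrete, the sketch below shows one way a difference in threat classification rates between two demographic groups could be tested for statistical significance. It is a minimal illustration, not the study's actual analysis pipeline: the group labels and response counts are hypothetical placeholders chosen only to approximate the rates reported above (51.4% vs. 36.2%), and the chi-square test of independence is a standard choice rather than the specific test used in the paper.

```python
# Minimal sketch: chi-square test of independence comparing how often the
# model labels two demographic groups as a "potential threat".
# NOTE: the counts below are hypothetical placeholders; the paper reports
# only the resulting rates (~51.4% vs. ~36.2%), not the underlying counts.
from scipy.stats import chi2_contingency

# (threat classifications, non-threat classifications) per group
arab_middle_eastern = (181, 171)   # 181 / 352 ≈ 51.4%
white = (127, 224)                 # 127 / 351 ≈ 36.2%

# 2x2 contingency table: rows = groups, columns = threat / non-threat
table = [list(arab_middle_eastern), list(white)]
chi2, p_value, dof, expected = chi2_contingency(table)

for name, (threat, non_threat) in [("Arab/Middle Eastern", arab_middle_eastern),
                                   ("White", white)]:
    print(f"{name}: {threat / (threat + non_threat):.1%} classified as threat")
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
```

The same pattern extends to intersectional comparisons (e.g., a larger contingency table with one row per demographic description), provided the per-group response counts are available.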