
Research Question

Hate speech detection is relatively well explored in the machine learning literature. There is broad agreement that the issue is inherently ambiguous - what is hate speech, who gets to define it, and for what (legal?) purpose do we need the definition - and that this ambiguity significantly complicates any attempt to have machines detect it. The annotation of hate speech datasets tends to be marked by substantial disagreement, depending heavily on who is asked to classify the material, be it online communities, anti-discrimination experts, or legal professionals (Waseem, 2016). Automated detection systems struggle to separate hate speech from merely offensive language under various conditions (Davidson et al., 2017), and so do humans (Sue et al., 2007). Importantly, imbalances in the training data used by hate speech detection algorithms will naturally be reflected in corresponding biases in their output (Dixon et al., 2018). Considering specifically the study and dataset by Mathew et al. (2021), created to train BERT for hate speech classification, this project therefore asks the question:

How does dataset annotation influence hate speech classification and what does this imply for automated content moderation?
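
To make the annotation problem concrete before turning to the model, the sketch below (Python) shows majority-vote label aggregation, the scheme HateXplain-style datasets use to turn several annotators’ judgements into a single ‘gold’ label; the posts, votes, and the aggregate helper are hypothetical illustrations, not the actual data or pipeline of Mathew et al.

```python
from collections import Counter

# Hypothetical annotator votes, three per post, HateXplain-style.
posts = {
    "post_1": ["hatespeech", "hatespeech", "offensive"],
    "post_2": ["offensive", "normal", "normal"],
    "post_3": ["hatespeech", "offensive", "normal"],  # full disagreement
}

def aggregate(votes):
    """Return the majority label, or None when no label has a majority."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None

for post_id, votes in posts.items():
    print(post_id, votes, "->", aggregate(votes))
# post_1 ['hatespeech', 'hatespeech', 'offensive'] -> hatespeech
# post_2 ['offensive', 'normal', 'normal'] -> normal
# post_3 ['hatespeech', 'offensive', 'normal'] -> None
```

Here post_3 receives no gold label at all, and a differently composed annotator pool could just as easily flip the verdict on the other two; whatever survives this aggregation step is what the model will later treat as ground truth.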

BERT is adapted to hate speech classification through supervised learning: the model’s predictions are compared against the ‘correct’ labels provided in the dataset, and its parameters are successively adjusted to reduce the mismatch. What the dataset conceives of as the ‘correct’ answer is therefore crucial to the classifications the model ultimately makes, because “supervised machine learning […] is only as good as the quality of the data” (Geiger et al., 2021).
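
As a concrete illustration, the following is a minimal sketch of one such fine-tuning step using the Hugging Face transformers library with bert-base-uncased; the texts, labels, three-class scheme, and hyperparameters are illustrative assumptions, not the actual setup of Mathew et al.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Minimal sketch of one supervised fine-tuning step, assuming three
# HateXplain-style classes; the texts and labels below are hypothetical.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3  # e.g. hatespeech / offensive / normal
)

texts = ["an example post", "another example post"]
labels = torch.tensor([0, 2])  # the dataset's 'correct' answers

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)  # loss = cross-entropy vs. gold labels
outputs.loss.backward()  # gradients push predictions toward the annotations
optimizer.step()         # whatever the annotators decided becomes the target
optimizer.zero_grad()
```

Nothing in this loop questions the labels themselves: if the annotation process was skewed, the gradient updates faithfully reproduce that skew in the trained classifier.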

Citations

Davidson, T., Warmsley, D., Macy, M. and Weber, I., 2017, May. Automated hate speech detection and the problem of offensive language. In Proceedings of the International AAAI Conference on Web and Social Media (Vol. 11, No. 1, pp. 512-515).

Dixon, L., Li, J., Sorensen, J., Thain, N. and Vasserman, L., 2018. Measuring and mitigating unintended bias in text classification. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society (pp. 67-73).

Geiger, R.S., Cope, D., Ip, J., Lotosh, M., Shah, A., Weng, J., Tang, R., 2021. “Garbage In, Garbage Out” Revisited: What Do Machine Learning Application Papers Report About Human-Labeled Training Data? Quantitative Science Studies 2, 795–827. https://doi.org/10.1162/qss_a_00144

Mathew, B., Saha, P., Yimam, S.M., Biemann, C., Goyal, P. and Mukherjee, A., 2021. HateXplain: A benchmark dataset for explainable hate speech detection. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35, No. 17, pp. 14867-14875).

Sue, D.W., Capodilupo, C.M., Torino, G.C., Bucceri, J.M., Holder, A., Nadal, K.L. and Esquilin, M., 2007. Racial microaggressions in everyday life: Implications for clinical practice. American Psychologist, 62(4), pp. 271-286.

Waseem, Z., 2016, November. Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter. In Proceedings of the First Workshop on NLP and Computational Social Science (pp. 138-142).