
What is hate speech?

A primary distinction necessary for this investigation is a working definition of hate speech itself. Hate speech is generally understood as any form of expression that attacks, employs pejorative language toward, or discriminates against a person or group on the basis of protected characteristics such as religion, race, ethnicity, or gender (United Nations, 2024). Within online spaces, this phenomenon is amplified by what Suler (2004) termed the “Online Disinhibition Effect”: factors such as dissociative anonymity, invisibility, and asynchronicity lead individuals to behave more intensely, and with less self-restraint, than they would in person (as cited in Gan et al., 2024). This disinhibition facilitates the normalization of extremist thinking and group polarization, making the online environment a unique catalyst for antisocial behavior (Gan et al., 2024).

A central tension in our analysis, one that also complicated our categorization of the data, arises from the differing legal traditions of the human annotators and of the platforms under consideration. Most of our annotators come from a European background, where hate speech is more strictly defined, often subject to criminal penalties, and rooted in a legal tradition that protects fundamental rights such as human dignity and social welfare. In contrast, the platforms whose content we analyze (such as X) are primarily governed by U.S. legal and social standards. Under the First Amendment of the U.S. Constitution, the “marketplace of ideas” theory, first articulated by Supreme Court Justice Oliver Wendell Holmes Jr., himself a foundational advocate of free speech, protects much of the speech that the rest of the world regulates. Unless speech reaches the threshold of “incitement to imminent lawless action” or constitutes a genuine threat to personal or public safety, it is generally constitutionally protected in the United States. This creates a significant gap, as “the vast majority of non-American laws prohibiting the incitement to racial hatred would be unconstitutional in the United States, as would be the overwhelming proportion of actual legal actions brought under those laws” (Caplan, 2015, p. 30).

While the law sets a general ‘floor’ - the basic standard for legality - platform governance through Terms of Service (TOS) often looks very different. Platforms oscillate between laissez-faire and strict moderation depending on ownership, political climate, regulatory pressure, and legal shifts (such as the EU’s Digital Services Act). Under its current leadership, X, for example, has pivoted toward a “free speech absolutist” approach that generally allows speech falling just short of direct incitement, thereby aligning more closely with U.S. First Amendment standards than with EU regulatory standards. Gab is even more notable for its hands-off approach; its dataset, which is often used in AI training, reflects an amalgamation of legal and psychological definitions centered on “hate-based rhetoric,” which includes violent or dehumanizing language (Kennedy et al., 2022).

For the purpose of our decoding study, we acknowledge the constraints imposed by these factors and have settled on a working definition centered on incitement. Although pejorative language is common online, we categorize speech as “hate” only when it crosses the threshold from offensive expression to the incitement of hostility or violence against a protected group. The following examples were used to prepare annotators to categorize the unique IDs:

  • Calling a black man a n*gger is offensive speech.
  • Calling to hang all black men is hate speech.
  • Calling women wh*res is offensive speech.
  • Calling for the rape of all women is hate speech.

Citations

Caplan, L. (2015) ‘The embattled 1st Amendment’, The American Scholar, 84(2), pp. 18–30. Available at: https://www.jstor.org/stable/26755616

Cortiz, D. and Zubiaga, A. (2021) ‘Ethical and technical challenges of AI in tackling hate speech’, The International Review of Information Ethics, 29. Available at: https://doi.org/10.29173/irie416

Gan, W., Chen, Z., Wu, Z., Huang, X. and Wang, F. (2024) ‘Aggression in online gaming: the role of online disinhibition, social dominance orientation, moral disengagement and gender traits among Chinese university students’, Frontiers in Public Health, 12. Available at: https://doi.org/10.3389/fpubh.2024.1459696

Kennedy, B., Atari, M., Davani, A.M., Yeh, L., Omrani, A., Kim, Y. et al. (2022) ‘Introducing the Gab Hate Corpus: defining and applying hate-based rhetoric to social media posts at scale’, Language Resources and Evaluation, 56, pp. 79–108. Available at: https://doi.org/10.1007/s10579-021-09569-x

Suler, J. (2004) ‘The online disinhibition effect’, CyberPsychology & Behavior, 7(3), pp. 321–326. Available at: https://www.researchgate.net/publication/8451443_The_Online_Disinhibition_Effect

United Nations (2024) Report – A conceptual analysis of the overlaps and differences between hate speech, misinformation and disinformation. Available at: https://peacekeeping.un.org/sites/default/files/report_-a_conceptual_analysis_of_the_overlaps_and_differences_between_hate_speech_misinformation_and_disinformation_june_2024_qrupdate.pdf