Hate and hate speech on the internet – how AI can help recognise dangers early

By Johannes Pretsch and Georg Rau on 23/02/2022 | Updated on 23/02/2022

Hate and hate speech on the internet are noticeably increasing. At the same time, the number of politically motivated acts of violence is on the rise. Are there connections between hate speech online and acts of violence? And what possibilities are there to uncover potential connections?

With the help of artificial intelligence, sources of hatred can be identified, and networks that encourage politically motivated acts of violence can be uncovered.

Taking action

The following describes how freely accessible social media data can be evaluated and sources of hate and hate speech identified. Content published and distributed on social networks consists of data, which can be collected and evaluated and which serves as the basis for our analysis. On the one hand, the collected data can be used to extract metadata that provides additional information: who wrote a post? When was the post written? Where was the message sent from? Who is the sender in contact with?
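As a minimal sketch of this metadata step, extraction from a raw post record could look like the following. The field names (`author`, `created`, `geo`, `mentions`) are illustrative assumptions, not the schema of any real platform's API:

```python
from datetime import datetime, timezone

def extract_metadata(post: dict) -> dict:
    """Pull the who/when/where/with-whom metadata out of a raw post record."""
    return {
        "author": post.get("author"),                           # who wrote the post?
        "written_at": datetime.fromtimestamp(post["created"],   # when was it written?
                                             tz=timezone.utc),
        "location": post.get("geo"),                            # where was it sent from?
        "contacts": post.get("mentions", []),                   # who is the sender in contact with?
    }

# Hypothetical post record for illustration
post = {"author": "user_42", "created": 1645574400, "geo": "Berlin",
        "mentions": ["user_7", "user_13"]}
meta = extract_metadata(post)
```

In practice, each platform exposes these fields differently, so a collection pipeline would normalise them into one shared schema like the dictionary above before further analysis.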

But the content of the posts can also be analysed. Natural language processing methods can be used to identify information such as places or people mentioned in texts. Content-related topics can also be recorded and assigned with the help of natural language processing. In this way, collected texts are automatically classified into subject areas and, if necessary, manually refined. This can be used, for example, to identify whether a post contains calls for violence, whether meetings are being organised or whether messages are being distributed by other groups.
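A very simple stand-in for this classification step might look as follows. Real systems use trained NLP models rather than keyword lists; the topics and keywords here are purely illustrative:

```python
# Illustrative keyword-based tagger; a production system would use trained NLP models.
TOPIC_KEYWORDS = {
    "call_for_violence": {"attack", "fight", "strike back"},
    "meeting_organisation": {"meet", "gather", "rally"},
    "forwarded_message": {"forwarded", "share this", "spread"},
}

def tag_post(text: str) -> set:
    """Assign a post to subject areas based on simple keyword matching."""
    lowered = text.lower()
    return {topic for topic, keywords in TOPIC_KEYWORDS.items()
            if any(kw in lowered for kw in keywords)}

tags = tag_post("Let's meet on Saturday and spread the word")
# tags -> {"meeting_organisation", "forwarded_message"}
```

The automated tags can then be reviewed and manually refined, as described above, before they feed into further analysis.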

Images and videos also provide valuable information that can be evaluated using computer vision methods, which enable people, objects, and texts in pictures and videos to be recognised.

Furthermore, it makes sense to include a time dimension in the analysis of the data: how is a user's follower count developing, and how frequently are posts created or forwarded to other groups? How does the content change over time? The search for influencers in groups, and an assessment of how dangerous they are, can be based on the collected and processed data.
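One small example of such a time dimension is counting posts per calendar week, which makes spikes in posting frequency visible. This sketch (the timestamps are hypothetical) uses ISO week numbers:

```python
from collections import Counter
from datetime import datetime, timezone

def posts_per_week(timestamps):
    """Count posts per ISO calendar week to surface changes in posting frequency."""
    weeks = Counter()
    for ts in timestamps:
        year, week, _ = datetime.fromtimestamp(ts, tz=timezone.utc).isocalendar()
        weeks[(year, week)] += 1
    return weeks

# Hypothetical post timestamps: two in ISO week 7 of 2022, one in week 8
weekly = posts_per_week([1644796800, 1644883200, 1645574400])
```

The same pattern extends to follower counts or forwarding frequency: bucket the events by time window, then look for sudden changes between windows.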

In order for the information to be evaluated effectively, it should be linked to existing information. Existing data from police systems plays an important role here, telling us in which networks violent criminals, dangerous individuals, and people already known to the police are active. Who do these people follow, and in which networks are they mostly active? In this way, central figures in the networks can be identified, figures who are often followed by conspicuous individuals.
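A toy illustration of this linking step is below. The follower edges, the set of people already known to the police, and the threshold of two known followers are all hypothetical choices made for the example:

```python
from collections import defaultdict

def follow_centrality(follows):
    """Count how many accounts follow each user; highly followed accounts
    are candidate central figures in the network."""
    followers = defaultdict(int)
    for follower, followed in follows:
        followers[followed] += 1
    return dict(followers)

def central_figures(follows, known_persons, min_known_followers=2):
    """Central figures followed by at least `min_known_followers` people
    already known to the police (illustrative threshold)."""
    counts = defaultdict(int)
    for follower, followed in follows:
        if follower in known_persons:
            counts[followed] += 1
    return {user for user, c in counts.items() if c >= min_known_followers}

# Hypothetical follower edges (follower, followed) and known persons
follows = [("anna", "hub"), ("ben", "hub"), ("cora", "hub"), ("anna", "side")]
central = central_figures(follows, known_persons={"anna", "ben"})
```

Real investigations would run such centrality measures over much larger graphs, but the principle is the same: combine the open-source network with the existing police data to rank who sits at the centre.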

Checking the content of the posts provides added value when searching for dangerous influencers on the internet. This allows calls for violence in individual posts or in posts by connected people to be identified and evaluated. In addition, endangered people in public life can be discovered and protected if necessary.

So, what should be considered in order to provide an efficient basis for investigations in this environment?

  • The amount of data is always increasing. Huge amounts of data are available, especially in the context of social media and the internet. This data cannot be efficiently evaluated manually. Therefore, a high degree of automation is important to support the investigators’ work.
  • Open-source intelligence (OSINT) data should be connected to existing inventory data. This ensures that existing information is not lost and that added value is created, for example by revealing previously hidden connections between people or groups.
  • In order to professionally prepare the data for investigators, entities should be created according to the POLE method. Here, people, objects, locations, and events are extracted from the data as entities in order to provide a reliable basis for investigations.
  • Automated extraction of information from unstructured data such as text, images, and videos helps investigators to search through and evaluate a large amount of data in a targeted manner.
  • Networks are of great importance for investigations, because they show the connections between all of this information. In networks, connections are identified that are not obvious at first glance. Network analyses are also carried out to interpret the data and can be used to determine the roles of entities in groups.
  • An intuitive interface should support investigators in their work.
  • In order to cope with the large amounts of data, it helps to use hybrid analytics. Various methods are combined here to create a score that is used to determine risk factors. For example, machine learning models are used to identify and summarise conspicuous behaviour of social media groups. Alerts are generated based on the risk factors found: these alerts are displayed in the investigation interface and reported to the responsible investigators as information. This helps investigators to prioritise and filter large amounts of available data. Investigators must also be able to create rules for alerts. An interface should be available that allows generated alerts to be processed, checked, and tracked.
  • In order to ensure agile and smooth cooperation, it is advantageous if analysts and investigators work on the same platform and with the same database. This makes it easier to communicate the findings of the analysts to the investigators and the decision-making processes for the automated examination are based on the experiences of the investigators and the findings of the analysts.
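The hybrid analytics idea in the list above, combining several methods into one risk score and alerting above a threshold, can be sketched as follows. The signal names, weights, and threshold are illustrative assumptions for the example, not any vendor's actual model:

```python
# Hypothetical hybrid score: combine a machine-learning model output with
# rule-based signals into one risk score, then raise an alert above a threshold.
WEIGHTS = {
    "ml_violence_score": 0.5,        # output of an ML classifier, normalised to [0, 1]
    "known_offender_contacts": 0.3,  # rule-based signal from police inventory data
    "post_frequency_spike": 0.2,     # rule-based signal from the time-series analysis
}
ALERT_THRESHOLD = 0.6

def risk_score(signals):
    """Weighted combination of normalised signals (each in [0, 1])."""
    return sum(WEIGHTS[name] * value for name, value in signals.items())

def make_alert(entity, signals):
    """Return an alert record if the combined score crosses the threshold."""
    score = risk_score(signals)
    if score >= ALERT_THRESHOLD:
        return {"entity": entity, "score": round(score, 2), "signals": signals}
    return None  # below threshold: nothing is surfaced to investigators

alert = make_alert("group_17", {"ml_violence_score": 0.9,
                                "known_offender_contacts": 0.6,
                                "post_frequency_spike": 0.4})
```

In a real platform, investigators would be able to adjust the rule-based signals and thresholds themselves, and the generated alerts would flow into the interface for processing, checking, and tracking, as described above.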

Recommendations

A platform for online hate and hate speech analysis and investigation should provide the following:

  • Customisable without redevelopment
  • Privacy
  • Transparency
  • Traceability
  • Advanced search and filter options
  • Network analysis
  • Easy integration of expertise
  • An interface tailored to the discovery process
  • Ease of deployment

Conclusion

Data analysis, assisted by AI, can help to identify hate speech online and counter the acts of violence it promotes.

Based on a hybrid analytics approach, alerts can be generated that give investigators the opportunity to focus their work, to gain a holistic view of the data via an interface, and to uncover previously hidden relationships through networks. SAS can support the implementation of the described requirements.

The methods described here help to identify violent groups and individuals, as well as those at risk of radicalisation and potential victims who are mentioned in posts, enabling the relevant authorities to place them under protection where appropriate.

Find out more about open-source intelligence and social media monitoring.

Authors

Johannes Pretsch
Senior associate systems engineer analytics, SAS

Johannes has been with SAS since 2014, first as a dual student in business informatics in cooperation with DHBW Mannheim, then as a working student during his master’s degree in business informatics, with a focus on data science and consulting, at HWG Ludwigshafen. After completing his master’s degree, he was a participant in the SAS Customer Advisory Academy in the USA until the end of 2019. Since then, he has worked as an associate systems engineer in the pre-sales team for the public sector.

Georg Rau
Account advisor, national security, SAS

Georg Rau is an account advisor at SAS Germany supporting organisations in the area of national security. Georg has over 25 years of professional experience in analytical consulting, business strategy, and delivering innovative solutions. He has supported customers in different sectors, but his focus has always been on the public sector. Georg holds a degree in computer science and business administration. Prior to SAS, he worked for a European aircraft manufacturer.

About Partner Content

This content is brought to you by a Global Government Forum Knowledge Partner.