Using New Technology to Address Harmful Content on Facebook

Published November 20, 2020, 3:00 PM

by Jonathan Castillo

Every day, 1.82 billion people access Facebook. They come from all around the world, speak thousands of different languages and dialects, and post billions of pieces of content – from photos and videos to text.

Over a decade ago, when Facebook started reviewing content, it relied on users to report posts they thought were inappropriate. Reports were then sent to Facebook’s Global Operations team, which manually reviewed them and removed posts that had indeed broken the platform’s policies.

As Facebook’s user base grew, this approach was no longer sustainable. Facebook therefore restructured how it moderates content on its platform around three teams, each with a vital role.

  1. Content Policy: This team writes Facebook’s Community Standards, the rules that outline what is and isn’t allowed on Facebook. It includes people with expertise in topics like terrorism, child safety and human rights, drawn from fields as diverse as academia, law, law enforcement, and government. 
  2. Community Integrity: This team is responsible for building the technology that helps enforce Facebook’s Community Standards at scale.
  3. Global Operations: This team enforces the Community Standards through human review. Facebook has around 15,000 content reviewers who review content in over 50 languages. The team is based across more than 20 sites globally, covering every major time zone, which means it can review reports 24/7.

Facebook itself is a technology, used by people around the world to communicate. So it is only natural that, as a piece (albeit a massive piece) of technology, Facebook keeps improving – specifically in areas where it can detect harmful content without relying on reports alone.

Some of the other ways technology works to moderate content include:

  • Recognizing duplicate reports from Facebook users, so that if 1,000 people report a piece of content, Facebook does not have 1,000 people reviewing the same piece of content.
  • Routing reports quickly to reviewers who have the right subject-matter or language expertise.
  • Identifying nude and pornographic photos and videos that have previously been removed for violating the Community Standards, and automatically removing this content if it appears elsewhere on Facebook or Instagram. Facebook also uses this same image-matching technology to prevent the upload of child exploitation material and non-consensual intimate images.
  • Stopping the spread of certain types of content that very clearly violate the Community Standards, such as commercial spam or links to commercial pornography sites.
  • Preventing the upload of terrorist content through content matching, which allows Facebook to identify copies of known bad material and assess whether it is likely to violate its policies.
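
The content-matching idea above can be sketched in a few lines. This is a deliberately simplified toy: real systems (such as PhotoDNA-style matchers) use perceptual hashes that survive re-encoding and cropping, whereas this sketch uses exact SHA-256 fingerprints purely for illustration; the class and method names are hypothetical.

```python
import hashlib

class ContentMatcher:
    """Toy sketch of matching uploads against known-violating content."""

    def __init__(self):
        self.known_violations = set()

    def _fingerprint(self, content: bytes) -> str:
        # Real systems use perceptual hashing; SHA-256 here is a stand-in.
        return hashlib.sha256(content).hexdigest()

    def register_violation(self, content: bytes) -> None:
        # Called once a reviewer confirms the content violates policy.
        self.known_violations.add(self._fingerprint(content))

    def blocks_upload(self, content: bytes) -> bool:
        # Called at upload time: copies of known bad material are rejected.
        return self._fingerprint(content) in self.known_violations

matcher = ContentMatcher()
matcher.register_violation(b"previously removed image bytes")
print(matcher.blocks_upload(b"previously removed image bytes"))  # True
print(matcher.blocks_upload(b"a brand-new photo"))               # False
```

Because matching happens at upload time, a known piece of bad material never becomes visible and never needs a fresh report or review.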

The vast majority of the content Facebook removes is now detected proactively.

Between April and June this year, 99.6% of fake accounts, 99.8% of spam, 99.5% of violent and graphic content, 98.5% of terrorist content, and 99.3% of child nudity and sexual exploitation content – and 95% of all the content Facebook removed – was identified and removed by its technology, without anyone needing to report it first.

While Facebook has always prioritized the most harmful types of content for review (such as suicide, child exploitation or terrorism), other types of content were traditionally sent for human review in chronological order, with user reports prioritized over content flagged proactively by Facebook’s technology.

However, thanks to advances in technology in recent years, Facebook is now able to prioritize the content that needs reviewing by weighing several different factors:

  • Virality: Potentially violating content that is being shared quickly will be given greater weight than content that is getting no shares or views.
  • Severity: Content related to real-world harm, such as suicide and self-injury or child exploitation, will be prioritized over less harmful types of content such as spam. 
  • Likelihood of violating: Content with signals indicating it may be similar to content that previously violated Facebook’s policies will be prioritized over content without such signals.

Prioritizing content in this way, regardless of when it was shared on Facebook or whether it was reported by a user or detected by technology, allows reviewers to get to the highest-severity content first.
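
One way to picture this scheme is a priority queue whose score combines the three factors. The article names the factors but not how they are weighted or combined, so the severity table, the multiplicative scoring formula, and all numbers below are illustrative assumptions, not Facebook's actual formula.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical severity weights -- real values are unknown.
SEVERITY = {"child_safety": 3.0, "terrorism": 3.0, "suicide": 3.0,
            "graphic_violence": 2.0, "spam": 0.5}

def review_priority(shares_per_hour: float, violation_type: str,
                    violation_likelihood: float) -> float:
    """Combine virality, severity, and likelihood into one score (toy formula)."""
    return shares_per_hour * SEVERITY.get(violation_type, 1.0) * violation_likelihood

@dataclass(order=True)
class QueuedPost:
    neg_priority: float                      # negated: heapq pops smallest first
    post_id: str = field(compare=False)

queue = []
heapq.heappush(queue, QueuedPost(-review_priority(100, "spam", 0.9), "a"))
heapq.heappush(queue, QueuedPost(-review_priority(50, "terrorism", 0.8), "b"))
heapq.heappush(queue, QueuedPost(-review_priority(5, "spam", 0.3), "c"))

# Highest-priority content is reviewed first, regardless of arrival order.
order = [heapq.heappop(queue).post_id for _ in range(len(queue))]
print(order)  # ['b', 'a', 'c']
```

Note how the moderately viral terrorism post ("b") outranks the much more viral spam post ("a"): severity dominates, matching the article's claim that real-world harm comes first.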

It also means the reviewers in the Global Operations team spend more time on complex content issues where judgment is required, and less time on lower-severity reports that technology is capable of handling. 

Although technology plays an increasing role in how Facebook moderates content, the company still uses a combination of technology, reports from the community, and human review to identify and assess content against its Community Standards. This is because technology has traditionally not been as good as people at understanding context.

When a person from the Global Operations team looks at a post on Facebook, they look at the post in its entirety, including the picture, accompanying text, and comments, to determine if it violates the Community Standards. 

Until recently, most of the technology Facebook used to moderate content looked at each part of a post separately, along two dimensions: content type and violation type. One classifier would look at the photo for violations of the nudity policy, and another would look for evidence of violence. A separate set of classifiers might look at the text of the post, or the comments. This made it challenging to understand the full context of the post.

To get a more holistic understanding of content, Facebook created technology called Whole Post Integrity Embeddings, or WPIE. In simple terms, this technology looks at a post in its entirety, whether it contains images, video, or text. It also looks for various policy violations simultaneously using one classifier, instead of multiple classifiers for different content and violation types. 
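
The architectural idea can be sketched as follows. This is only a toy in the spirit of WPIE: the real system is a large multimodal neural network, whereas here tiny hand-made vectors and weights stand in for learned embeddings and parameters, and the function names are hypothetical.

```python
import math

VIOLATION_TYPES = ["nudity", "violence", "spam"]

def fuse(image_emb, text_emb, comment_emb):
    # One joint "whole post" representation, instead of sending each
    # part of the post to a separate classifier.
    return image_emb + text_emb + comment_emb  # list concatenation

def classify_post(post_emb, weights):
    # A single classifier scores every violation type simultaneously
    # from the same fused embedding (one sigmoid score per label).
    return {
        vtype: 1 / (1 + math.exp(-sum(w * x for w, x in zip(weights[vtype], post_emb))))
        for vtype in VIOLATION_TYPES
    }

# Hypothetical 2-d embeddings per modality -> one 6-d fused vector.
post = fuse([0.8, 0.1], [0.2, 0.7], [0.0, 0.3])
weights = {
    "nudity":   [2.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "violence": [0.0, 2.0, 0.0, 0.0, 0.0, 0.0],
    "spam":     [0.0, 0.0, 1.0, 1.0, 1.0, 1.0],
}
scores = classify_post(post, weights)
print(scores)
```

The key design point is the shared fused embedding: because every violation head sees the image, text, and comments together, context from one modality (say, threatening text) can inform the score for another (an otherwise ambiguous photo).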

Traditionally when training technology to moderate content in different languages, Facebook would need to train one classifier per language. With thousands of languages and dialects spoken around the world, and millions of content examples required to train a classifier, this is not a scalable approach.

XLM-R is a new technology Facebook developed that can understand text in multiple languages. A model is trained in one language and then used with other languages, without the need for additional training data or content examples in those languages. 

With people on Facebook posting content in more than 160 languages, XLM-R represents an important step toward Facebook’s vision of being able to moderate content globally. It helps Facebook transition toward a one-classifier-for-many-languages approach, as opposed to one classifier per language.

This is particularly important for less common languages where there may not be large volumes of data available to train the algorithm.
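
The cross-lingual trick can be illustrated with a toy example. XLM-R's real power comes from a shared multilingual representation learned by a transformer; this sketch fakes that with a tiny hand-made lookup table in which words from different languages that mean the same thing map to the same vector, so a classifier "trained" only on English transfers to Spanish for free. All words, vectors, and labels are invented for illustration.

```python
# Words in different languages share one vector in this toy "embedding".
SHARED_EMBEDDING = {
    "buy": (1.0, 0.0),   "compra": (1.0, 0.0),   # same meaning, same vector
    "now": (0.9, 0.1),   "ahora": (0.9, 0.1),
    "hello": (0.0, 1.0), "hola": (0.0, 1.0),
    "friend": (0.1, 0.9), "amigo": (0.1, 0.9),
}

def embed(text):
    # Average the word vectors to embed a whole piece of text.
    vecs = [SHARED_EMBEDDING[w] for w in text.split()]
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(2))

def train_centroid(labeled_examples):
    # "Train" a nearest-centroid classifier on English examples only.
    centroids = {}
    for label in {lbl for _, lbl in labeled_examples}:
        members = [embed(t) for t, lbl in labeled_examples if lbl == label]
        centroids[label] = tuple(sum(v[i] for v in members) / len(members)
                                 for i in range(2))
    return centroids

def classify(text, centroids):
    v = embed(text)
    return min(centroids,
               key=lambda lbl: sum((v[i] - centroids[lbl][i]) ** 2 for i in range(2)))

centroids = train_centroid([("buy now", "spam"), ("hello friend", "ok")])
# The same classifier now works on Spanish with zero Spanish training data:
print(classify("compra ahora", centroids))  # spam
print(classify("hola amigo", centroids))    # ok
```

Because the representation is shared across languages, labeled data in one high-resource language effectively labels the same concepts in low-resource languages too, which is exactly why this approach matters for less common languages.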

Below are some key terms used when talking about this technology.

Artificial intelligence (AI): A broad term that refers to the science of making intelligent machines. AI makes it possible for machines to learn from experience and perform specific tasks (such as content moderation).

Machine learning (ML): A type of AI that allows machines to learn from data, rather than being explicitly programmed for each task. 

Algorithm: A process or set of instructions followed by a machine to solve a problem.

Classifier: A classifier is the end product after the machine learning has been completed. The classifier runs on an ongoing basis to ‘classify’ (or sort) data into different categories. 

Supervised learning: A type of machine learning where a human trains the algorithm using large volumes of data. This data is labeled to help the machine learn, and once trained, the system can then apply these labels to new data, e.g. categorizing emails as spam or non-spam.

Unsupervised learning: A type of machine learning where the algorithm tries to identify patterns in data, looking for similarities that can be used to categorize that data. The algorithm isn’t trained in advance to pick out specific types of data; it simply looks for data that can be grouped by its similarities, e.g. Google News grouping together stories on similar topics each day.

How it all fits together: Machine learning involves training algorithms with labeled data to produce a classifier, which can then automatically classify (or sort) new data on an ongoing basis.
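
That whole workflow – labeled data in, a trained classifier out, new data sorted on an ongoing basis – fits in a short sketch. This mirrors the glossary's spam/non-spam email example; a real spam filter would use far more data and a stronger model, and every email and word count here is invented.

```python
from collections import defaultdict

def train(labeled_emails):
    # Supervised learning: count how often each word appears per label.
    word_counts = defaultdict(lambda: defaultdict(int))
    for text, label in labeled_emails:
        for word in text.lower().split():
            word_counts[label][word] += 1
    return word_counts  # this table is the trained classifier's state

def classify(text, word_counts):
    # The classifier then runs on an ongoing basis, sorting new data
    # into categories by which label's vocabulary it matches best.
    scores = {label: sum(counts[w] for w in text.lower().split())
              for label, counts in word_counts.items()}
    return max(scores, key=scores.get)

model = train([
    ("win free money now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting agenda attached", "not_spam"),
    ("lunch plans for friday", "not_spam"),
])
print(classify("free money prize", model))       # spam
print(classify("friday meeting agenda", model))  # not_spam
```

The labeled examples play the role of the human-provided training data, `train` is the machine-learning step, and `classify` is the resulting classifier applied to emails it has never seen.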