Sentiment and emotion lexicons

Overview of the tool

The NRC Sentiment and Emotion Lexicons are a collection of seven lexicons, including the widely used Word-Emotion Association Lexicon. The lexicons were developed with a wide range of applications in mind and can be used in contexts such as sentiment analysis, product marketing, consumer behaviour analysis, and even political campaign analysis. Each lexicon is a list of words and their associations with categories of interest such as emotions (joy, sadness, fear, etc.), sentiment (positive and negative), or colour (red, blue, black, etc.). All of the lexicons include entries for English words and can be used to analyze English texts. Also provided are automatic translations of the entries in the NRC Word-Emotion Association Lexicon into 40 other languages, including French, Arabic, Chinese, and Spanish.

Benefits to users

We call upon computers and algorithms to assist us in sifting through enormous amounts of data and in understanding its content, for example: "What is being said about a certain target entity?" (Common target entities include a company, product, policy, person, or country.) Lately, we are going further and asking questions such as "Is something good or bad being said about the target entity?" and "Is the speaker happy with, angry at, or fearful of the target?" Leveraging advanced expertise in Text Analytics, the NRC has developed the Sentiment and Emotion Lexicons, which enable the creation of software for automatic sentiment analysis and provide a deeper understanding of the sentiments and emotions expressed in text.

The lexicons have many uses, including:

  • Improving customer relation models
  • Identifying what evokes strong emotions in people
  • Tracking sentiment towards politicians, movies, and products
  • Detecting happiness and well-being
  • Improving automatic dialogue and tutoring systems
  • Detecting how emotional words and metaphors are used to persuade and constrain
  • Developing affect-sensitive characters in computer games.

Targeted audience

The Sentiment and Emotion Lexicons are aimed at different types of audiences, such as:

  • Industry interested in creating commercial products
  • Researchers (Computational Linguists, Linguists, Psychologists, Data Analysts, etc.) interested in studying emotions
  • Government and other research institutions interested in creating products for public welfare.

Technical tool description

The following NRC Sentiment and Emotion Lexicons are included in this distribution:

Lexicon: Word-Emotion Association (a.k.a. NRC Emotion Lexicon)
Description: Lists associations of words with eight emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). Created by manual annotation on a crowdsourcing platform. Automatic translations available in 40 languages.
Number of terms: >14,000 unigrams (words); ~25,000 word senses
Association scores: Binary (associated or not)

Lexicon: Hashtag Emotion
Description: Lists associations of words with eight emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) generated automatically from tweets using emotion-word hashtags such as #happy and #anger.
Number of terms: >16,000 unigrams (words)
Association scores: Real-valued

Lexicon: Hashtag Sentiment
Description: Lists associations of words with positive (negative) sentiment. Generated automatically from tweets with sentiment-bearing hashtagged words such as #amazing and #terrible.
Number of terms: >54,000 unigrams; >316,000 bigrams; >308,000 pairs
Association scores: Real-valued

Lexicon: Hashtag Affirmative Context Sentiment and Hashtag Negated Context Sentiment
Description: Lists associations of words with positive (negative) sentiment in affirmative or negated contexts. Generated automatically from tweets with sentiment-bearing hashtagged words such as #amazing and #terrible.
Number of terms:
  Affirmative contexts: >36,000 unigrams; >159,000 bigrams
  Negated contexts: >7,000 unigrams; >23,000 bigrams
Association scores: Real-valued

Lexicon: Emoticon (a.k.a. Sentiment140)
Description: Lists associations of words with positive (negative) sentiment. Generated automatically from tweets with emoticons such as :) and :(.
Number of terms: >62,000 unigrams; >677,000 bigrams; >480,000 pairs
Association scores: Real-valued

Lexicon: Emoticon Affirmative Context and Emoticon Negated Context
Description: Lists associations of words with positive (negative) sentiment in affirmative or negated contexts. Generated automatically from tweets with emoticons such as :) and :(.
Number of terms:
  Affirmative contexts: >45,000 unigrams; >240,000 bigrams
  Negated contexts: >9,000 unigrams; >34,000 bigrams
Association scores: Real-valued

Lexicon: Word-Colour Association
Description: Lists associations of words with colours. Created by manual annotation on a crowdsourcing platform.
Number of terms: >14,000 unigrams (words); ~25,000 word senses
Association scores: Binary (associated or not)
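The practical difference between binary and real-valued association scores can be illustrated with a minimal Python sketch. The entries below are toy values invented for illustration, not actual lexicon contents, and the function names `count_emotions` and `sum_sentiment` are hypothetical. Counting and summing are only one simple way to use such scores.

```python
from collections import Counter

# Toy entries for illustration only; NOT actual lexicon contents.
# Binary lexicon: a word is either associated with a category or not.
BINARY_EMOTIONS = {
    "storm": {"fear", "negative"},
    "gift": {"joy", "positive", "surprise"},
}
# Real-valued lexicon: each word carries a signed association score.
REAL_VALUED_SENTIMENT = {
    "great": 1.5,
    "awful": -2.0,
}


def count_emotions(tokens, lexicon):
    """Count how many tokens are associated with each category."""
    counts = Counter()
    for tok in tokens:
        # Words missing from the lexicon contribute nothing.
        counts.update(lexicon.get(tok, ()))
    return counts


def sum_sentiment(tokens, lexicon):
    """Sum real-valued scores; words missing from the lexicon count as 0."""
    return sum(lexicon.get(tok, 0.0) for tok in tokens)


tokens = "the gift arrived before the storm".split()
print(count_emotions(tokens, BINARY_EMOTIONS))
print(sum_sentiment("great food awful service".split(), REAL_VALUED_SENTIMENT))
```

With a binary lexicon, the natural output is a count per category; with a real-valued lexicon, the scores can be aggregated into a single number whose sign and magnitude indicate the overall polarity.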

Relevant Research Articles (available in English only)

System requirements

The lexicons are distributed as plain-text files that provide emotion or sentiment scores for a set of words. These files can be viewed with any text editor on any platform.
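Because the lexicons are plain text, they can also be read programmatically. The sketch below is a hedged Python example that assumes a tab-separated layout of one `term<TAB>category<TAB>score` entry per line. That layout is an assumption made here for illustration; the actual column order and separators should be checked against the documentation shipped with each lexicon. The function name `load_lexicon` and the sample rows are hypothetical.

```python
import io
from collections import defaultdict

# Hypothetical sample in the ASSUMED 'term<TAB>category<TAB>score' layout.
SAMPLE = "term\tcategory\tscore\nhappy\tjoy\t1\nhappy\tanger\t0\n\nstorm\tfear\t1\n"


def load_lexicon(lines):
    """Parse an iterable of lines into {term: {category: score}}."""
    lexicon = defaultdict(dict)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        term, category, raw_score = parts
        try:
            score = float(raw_score)
        except ValueError:
            continue  # skip header rows or non-numeric scores
        lexicon[term][category] = score
    return dict(lexicon)


lexicon = load_lexicon(io.StringIO(SAMPLE))
print(lexicon["happy"])
```

To read a real file, an open file object can be passed instead of the in-memory sample, for example `load_lexicon(open(path, encoding="utf-8"))`, where `path` points to a lexicon file in the assumed layout.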

Fees

The lexicons are available free of charge for research purposes. They are also available for commercial applications via a perpetual commercial licence for a nominal one-time fee (available on the NRC virtual store). Contact us to find out more.

Contact us

Technical enquiries

Saif M. Mohammad, Senior Research Officer
Telephone: 613-993-0620
Email: Saif.Mohammad@nrc-cnrc.gc.ca

Business enquiries

Pierre Charron, Business Development Officer
Telephone: 613-990-0336
Email: Pierre.Charron@nrc-cnrc.gc.ca