Overview of the tool
The NRC Sentiment and Emotion Lexicons is a collection of seven lexicons, including the widely used Word-Emotion Association Lexicon. The lexicons have been developed with a wide range of applications in mind; they can be used in a multitude of contexts such as sentiment analysis, product marketing, consumer behaviour analysis, and even political campaign analysis. Each lexicon has a list of words and their associations with certain categories of interest such as emotions (joy, sadness, fear, etc.), sentiment (positive and negative), or colour (red, blue, black, etc.). All of the lexicons include entries for English words and can be used to analyze English texts. Also provided are automatic translations of the entries in the NRC Word-Emotion Association Lexicon to 40 other languages, including French, Arabic, Chinese, and Spanish.
Benefits to users
We call upon computers and algorithms to help us sift through enormous amounts of data and to understand its content, for example: "What is being said about a certain target entity?" (Common target entities include a company, product, policy, person, or country.) Lately, we have gone further and begun asking questions such as "Is something good or bad being said about the target entity?" and "Is the speaker happy with, angry at, or fearful of the target?" Leveraging advanced expertise in text analytics, the NRC has developed the Sentiment and Emotion Lexicons, which enable the creation of software for automatic sentiment analysis and provide a deeper understanding of the underlying sentiments and emotions contained within that information.
The lexicons have many uses, including:
- Improving customer relation models
- Identifying what evokes strong emotions in people
- Tracking sentiment towards politicians, movies, and products
- Detecting happiness and well-being
- Improving automatic dialogue and tutoring systems
- Detecting how emotional words and metaphors are used to persuade and constrain
- Developing affect-sensitive characters in computer games.
Targeted audience
The Sentiment and Emotion Lexicons are aimed at different types of audiences, such as:
- Industry interested in creating commercial products
- Researchers (Computational Linguists, Linguists, Psychologists, Data Analysts, etc.) interested in studying emotions
- Government and other research institutions interested in creating products for public welfare.
Lexicon | Description | Number of terms | Association scores |
---|---|---|---|
Word-Emotion Association (a.k.a. NRC Emotion Lexicon) | Lists associations of words with eight emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). Created by manual annotation on a crowdsourcing platform. Automatic translations available in 40 languages. | >14,000 unigrams (words); ~25,000 word senses | Binary (associated or not) |
Hashtag Emotion | Lists associations of words with eight emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) generated automatically from tweets using emotion-word hashtags such as #happy and #anger. | >16,000 unigrams (words) | Real-valued |
Hashtag Sentiment | Lists associations of words with positive (negative) sentiment. Generated automatically from tweets with sentiment-bearing hashtagged words such as #amazing and #terrible. | >54,000 unigrams; >316,000 bigrams; >308,000 pairs | Real-valued |
Hashtag Affirmative Context Sentiment and Hashtag Negated Context Sentiment | Lists associations of words with positive (negative) sentiment in affirmative or negated contexts. Generated automatically from tweets with sentiment-bearing hashtagged words such as #amazing and #terrible. | Affirmative contexts: >36,000 unigrams, >159,000 bigrams; Negated contexts: >7,000 unigrams, >23,000 bigrams | Real-valued |
Emoticon (a.k.a. Sentiment140) | Lists associations of words with positive (negative) sentiment. Generated automatically from tweets with emoticons such as :) and :(. | >62,000 unigrams; >677,000 bigrams; >480,000 pairs | Real-valued |
Emoticon Affirmative Context and Emoticon Negated Context | Lists associations of words with positive (negative) sentiment in affirmative or negated contexts. Generated automatically from tweets with emoticons such as :) and :(. | Affirmative contexts: >45,000 unigrams, >240,000 bigrams; Negated contexts: >9,000 unigrams, >34,000 bigrams | Real-valued |
Word-Colour Association | Lists associations of words with colours. Created by manual annotation on a crowdsourcing platform. | >14,000 unigrams (words); ~25,000 word senses | Binary (associated or not) |
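The two kinds of association scores in the table lend themselves to different uses. The sketch below illustrates one plausible approach: counting category hits with a binary lexicon and summing scores with a real-valued one. The entries, score values, and function names are hypothetical illustrations, not taken from the NRC lexicons.

```python
from collections import Counter

# Hypothetical toy entries for illustration only; the real lexicons
# contain thousands of terms and their own scores.
BINARY_LEXICON = {
    "happy": {"joy", "positive"},
    "afraid": {"fear", "negative"},
}
REAL_VALUED_LEXICON = {
    "amazing": 1.8,
    "terrible": -2.1,
}


def count_emotions(tokens, lexicon):
    """Count how many tokens are associated with each category (binary scores)."""
    counts = Counter()
    for token in tokens:
        # Words missing from the lexicon contribute nothing.
        counts.update(lexicon.get(token.lower(), ()))
    return counts


def sum_sentiment(tokens, lexicon):
    """Add up real-valued scores; a positive total suggests positive sentiment."""
    return sum(lexicon.get(token.lower(), 0.0) for token in tokens)


emotion_counts = count_emotions(["Happy", "and", "afraid"], BINARY_LEXICON)
sentiment_total = sum_sentiment(["amazing", "but", "terrible"], REAL_VALUED_LEXICON)
```

Simple counting and summing are only a starting point; a real application would typically add tokenization, negation handling, and normalization by text length.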
Relevant Research Articles (available in English only)
- Sentiment analysis of short informal texts, Svetlana Kiritchenko, Xiaodan Zhu and Saif Mohammad, Journal of Artificial Intelligence Research, Volume 50, August 2014, Pages 723-762.
- Crowdsourcing a Word-Emotion Association Lexicon, Saif Mohammad and Peter Turney, Computational Intelligence, Volume 29, Issue 3, 2013, Pages 436-465.
- From Once Upon a Time to Happily Ever After: Tracking Emotions in Mail and Books, Saif Mohammad, Decision Support Systems, Volume 53, Issue 4, November 2012, Pages 730–741.
- Even the Abstract have Colour: Consensus in Word Colour Associations, Saif Mohammad, In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, June 2011, Portland, OR.
System requirements
The lexicons are distributed as textual files that provide emotion or sentiment scores for a set of words. These textual files can be viewed with any text editor on any platform.
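Because the lexicons are plain text, they can be loaded with a few lines of code. The sketch below assumes a tab-separated layout of one `word<TAB>category<TAB>score` entry per line. That layout, the sample entries, and the `load_lexicon` name are assumptions for illustration, so check the documentation shipped with each lexicon for its actual format.

```python
import io
from collections import defaultdict


def load_lexicon(lines):
    """Parse lines of 'word<TAB>category<TAB>score' into a nested dict.

    The three-column, tab-separated layout is an assumption; adjust the
    parsing to match the file you actually receive.
    """
    lexicon = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip headers or malformed lines
        word, category, score = parts
        lexicon[word][category] = float(score)
    return dict(lexicon)


# Hypothetical sample content standing in for a lexicon file.
sample = io.StringIO("abandon\tfear\t1\nabandon\tjoy\t0\n")
lexicon = load_lexicon(sample)
```

To read a real file, pass an open file object instead of the `io.StringIO` sample, for example `with open(path, encoding="utf-8") as f: lexicon = load_lexicon(f)`, where `path` points to the downloaded lexicon.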
Fees
The lexicons are available free of charge for research purposes. They are also available for commercial applications via a perpetual commercial licence for a nominal one-time fee (available on the NRC virtual store). Contact us to find out more.
Contact us
Technical enquiries
Saif M. Mohammad, Senior Research Officer
Telephone: 613-993-0620
Email: Saif.Mohammad@nrc-cnrc.gc.ca
Business enquiries
Pierre Charron, Business Development Officer
Telephone: 613-990-0336
Email: Pierre.Charron@nrc-cnrc.gc.ca