The National Research Council of Canada's (NRC) multilingual text processing team carries out research and development in multilingual natural language processing (NLP). This includes machine translation and other language technologies for multilingual contexts.
In particular, we collaborate with government, industry, academia, and other partners on language technologies to support Canada's official languages and the revitalization of Indigenous languages. We also conduct foundational research and excel in international competitions where the calibre of our research and technology is benchmarked against other leaders in the field.
What we offer
Housed within the NRC's Digital Technologies Research Centre, our team's core competencies include:
- computer-assisted translation
- machine learning for natural language applications
- machine translation
- multilingual text mining
- social media analysis and modelling
- translation quality evaluation
We apply our expertise to:
- translation and language service providers, in support of the Government of Canada's Policy on Official Languages:
  - computer-assisted translation with the Translation Bureau, the Courts Administration Service, and private sector language service providers
  - machine translation quality evaluation and estimation with the Translation Bureau
  - parallel corpus filtering and cleaning with the Translation Bureau and the Université de Montréal
  - translation routing with the Translation Bureau
  - translation equivalence error detection with the Public Service Commission of Canada
- learning technologies:
  - automatic language proficiency assessment and modelling
  - Indigenous Languages Technology Project: software and tools to support Indigenous language schools, educators, students, communities, and technology developers, with multiple partners
  - Language Comprehension Tool, a second-language reading assistant for Canadian government employees, with the Translation Bureau
  - machine translation for second-language writing with Dublin City University and the Université du Québec en Outaouais
- intelligence, monitoring, and security:
  - detection of changes within an unfolding event in real time from news articles or social media
  - machine translation of social media content for business and security intelligence
Software and applications
- Portage statistical and neural network automatic translation software
- YiSi semantic machine translation evaluation metric software
- Multi-source translation for the Sockeye neural machine translation system
- Document categorization toolbox
- Bayesian online change point detection – package for the R programming language
Why work with us
Our team is a unique mix of world-class researchers with backgrounds in computational linguistics, engineering, and machine learning, combined with strong, savvy software developers. Our collaborators appreciate our deep technical knowledge, our ability to deliver software components that are easy to integrate, and the state-of-the-art results and models we can deliver from their data.
We can take translation and other language technologies from research concepts all the way to products suitable for distributors and end users. Past examples of language technologies we have developed and delivered include word alignment for terminology extraction, statistical machine translation for language comprehension, and cross-lingual semantic similarity for detecting translation errors.
International competitions and shared tasks
Our team is a regular participant and top performer in several tasks at the annual Conference on Machine Translation (WMT, formerly the Workshop on Machine Translation). We are also a leading participant in the International Workshops on Semantic Evaluation (SemEval), the Discriminating Similar Languages series, and the Native Language Identification evaluations.
Team results: WMT 2019
Team results: WMT 2018
Team results: SemEval
- Cross-lingual textual similarity 2016 and 2017, task 1
- Cross-lingual word sense disambiguation 2013, task 10
- Second language writing assistant 2014 task
Team results: Discriminating Similar Languages series
Team results: Native Language Identification evaluations
Team members
Aidan Pine
Anna Kazantseva
Chi-kiu (Jackie) Lo
Cyril Goutte
Darlene Stewart
Eddie Santos
Éric Joanis
Gabriel Bernier-Colborne
Marc Tessier
Michel Simard
Patrick Littell
Rebecca Knowles
Roland Kuhn
Samuel Larkin
Serge Léger
Sowmya Vajjala
Yunli Wang
Contact us
Interested in applying our multilingual text processing expertise to your project? Contact our experts today!
Cyril Goutte
Team Leader, Multilingual Text Processing
Email: Cyril.Goutte@nrc-cnrc.gc.ca
Targeted industries
Information and communications technology; Analytics; Learning systems.
Locations
- Moncton
- Montréal Decelles
- Ottawa Montreal Road
- Edmonton
- Victoria
Selected publications
- Accurate semantic textual similarity for cleaning noisy parallel corpora using semantic machine translation evaluation metric: the NRC supervised submissions to the Parallel Corpus Filtering task
- Indigenous language technologies in Canada: assessment, challenges, and successes
- Real-time change point detection using on-line topic models
- Cost weighting for neural machine translation domain adaptation
- A challenge set approach to evaluating machine translation
- Transferring markup tags in statistical machine translation: a two-stream approach
- Feature space selection and combination for native language identification
- The trouble with SMT consistency
- Statistical Phrase-based Post-editing