Roles and responsibilities
I am a Research Council Officer in the Multilingual Text Processing (MTP) team at NRC Digital Technologies. Since joining NRC in June 2005, I have been able to combine my twin passions: computational linguistics and software engineering. Broadly speaking, I have two responsibilities. The first is to ensure the high quality of the software produced by the MTP team, which involves not only writing code myself but also testing, debugging and optimizing code written by others, as well as making it more robust and documenting it. The second is to support the researchers in my team in using the computing and software infrastructure we have at the NRC. For many years I was the head software developer for Portage, NRC's statistical machine translation system. Most recently, I have been working with the Indigenous Languages Technology (ILT) project at NRC.
Current research and/or projects
The Indigenous Languages Technology (ILT) project began in 2017. Within this project, I worked on the publication of the Nunavut Hansard Inuktitut–English Parallel Corpus 3.0, which has become a key resource for researchers around the world working on machine translation (MT) between Inuktitut and English. For instance, it was a central component of the evaluation of MT systems for this language pair at the WMT 2020 conference (see Findings of the 2020 Conference on Machine Translation (WMT20), ACL Anthology). I currently lead the ReadAlong Studio subproject within ILT. This subproject, a collaboration with Carleton University and David Huggins-Daines, allows language communities to publish their own interactive aligned text/audio stories. The system consists of an end-to-end forced text/audio aligner and a web component for visualizing the output. The subproject has worked with educators representing over 20 language communities, and we are constantly receiving new requests from educators who are interested in Read-Alongs for their own languages. For details, see "ReadAlong Studio: Application for Indigenous audiobooks and videos project" on the National Research Council Canada website.
Research and/or project statements
My long-term interests include machine translation, high-performance computing, software optimization and parallelization, and computation with very large corpora, models, and data structures. However, my current work consists of software development for technologies that support language teachers and activists in Indigenous communities in Canada. I was already interested in applying my expertise in computational linguistics to Indigenous languages, but my motivation for working in this fascinating and socially worthwhile field was strengthened by “Indigenous Canada”, an excellent course offered by the University of Alberta on Coursera, which I took in 2020–2021.
Education
- MSc Computational Linguistics, University of Toronto, 2002
- BMath Computer Science, University of Waterloo, 1996
Affiliations
Association for Computational Linguistics
Awards
A list of the awards I'm most proud of:
- 2021: NRC Intellectual Property Achievement Award (IPAA) for impactful open-source software contributions by the ILT team. This was a team award, but it was great to be part of this socially valuable work on Indigenous language technology.
- 2009: NRC-IIT Outstanding Achievements Award – General Award to the Portage Project Team
- 2007: NRC-IIT Director’s Award for Technology Transfer
- 2007: NRC-IIT Outstanding Achievements Award – Inventors and Innovators Award
- 1998: University of Toronto Computer Science Graduate Entrance Award
- 1992–1996: University of Waterloo Dean’s Honour List and Graduating Dean’s Honour List
- 1992: First in Quebec in the American Mathematics Competition
Key publications
Indigenous Languages Technology
- Gᵢ2Pᵢ: rule-based, index-preserving grapheme-to-phoneme transformations, A. Pine et al., ComputEL 2022.
- ReadAlong studio: practical zero-shot text-speech alignment for indigenous language audiobooks, P. Littell et al., LREC 2022.
- The Nunavut Hansard Inuktitut–English Parallel Corpus 3.0 with preliminary machine translation results, E. Joanis et al., LREC 2020.
Portage Statistical Machine Translation
- Coarse “split and lump” bilingual language models for richer source information in SMT, D. Stewart, R. Kuhn, E. Joanis and G. Foster, AMTA 2014.
- Transferring markup tags in statistical machine translation: a two-stream approach, E. Joanis, D. Stewart, S. Larkin and R. Kuhn, MT Summit 2013.
- Tightly Packed Tries: How to Fit Large Models into Memory, and Make them Load Fast, Too, U. Germann, E. Joanis and S. Larkin, SETQA-NLP 2009.
Statistical Lexical Semantics
- A general feature space for automatic verb classification, E. Joanis, S. Stevenson and D. James, Natural Language Engineering, 2008.
Previous work experience
In 1996, after receiving my BMath, I joined a small company called Televitesse and worked on automatic categorization and topic segmentation of television news clips using the closed-captioning stream. After receiving my MSc in computational linguistics in 2002, I worked as a research assistant for one year at the University of Toronto and two years at the University of Geneva. In both places, I worked on statistical lexical semantics: classification and token-wise disambiguation of verbs.
Outside work, I've been volunteering as a Scouter with the 1st Aylmer Cub Scout Pack since 2016.