Jonne Sälevä

News

2026

Organizing SIGTURK workshop at EACL 2026 Workshop
I am serving as the Publicity Chair for the Second Workshop on Natural Language Processing for Turkic Languages, which will be held at EACL 2026 in Rabat, Morocco.
This year the workshop will also feature a Shared Task on Terminology-Aware Machine Translation for English–Turkish Scientific Texts. I encourage everyone to participate!

2025

LLM uncertainty quantification paper accepted to IJCNLP-AACL 2025 Paper
My new paper, From tests to effect sizes: Quantifying uncertainty and statistical variability in multilingual and multitask NLP evaluation benchmarks, was accepted to IJCNLP-AACL 2025.
The paper shows that conventional evaluation paradigms wildly underestimate the uncertainty and statistical variability of model performance, which can lead to overconfidence in rankings and overestimates of statistical significance.
Multilingual NER paper published at EMNLP 2025 Paper
Our lab’s collaborative NER benchmarking paper, OpenNER 1.0: Standardized Open-Access Named Entity Recognition Datasets in 50+ Languages, will be published at EMNLP 2025.
Thanks to Chester for letting me use the data for my evaluation paper.
Paper on morphological reasoning skills of LLMs accepted to NAACL 2025 Paper
Our collaborative paper, Evaluating Morphological Compositional Generalization in Large Language Models, was accepted to NAACL 2025. In it, we show that LLMs struggle significantly when presented with out-of-distribution linguistic reasoning tasks.
Thanks to Mete, Defne, Duygu and everyone else for the great collaboration!

2024

Organizing Multilingual Representation Learning workshop at EMNLP 2024 Workshop
I am co-organizing the Fourth Workshop on Multilingual Representation Learning at EMNLP 2024. The workshop brings together researchers working on multilingual representation learning with applications to a variety of tasks.
Organizing SIGTURK workshop at ACL 2024 Workshop
I am co-organizing the First Workshop on Natural Language Processing for Turkic Languages at ACL 2024. The workshop brings together researchers working on NLP for Turkic languages and morphologically rich languages more generally.
Northern Sámi NMT paper published at ACL Findings 2024 Paper
Our paper, Language Model Priors and Data Augmentation Strategies for Low-resource Machine Translation: A Case Study Using Finnish to Northern Sámi, was published at Findings of the ACL 2024. Looking forward to presenting this at ACL 2024 in Bangkok!
Thanks to the Finnish Cultural Foundation and the Lapland Regional Fund for their support!
ParaNames published at LREC-COLING 2024 Workshop
Our paper, ParaNames 1.0: Creating an Entity Name Corpus for 400+ Languages using Wikidata, was accepted to LREC-COLING 2024. Check out the resource for free on GitHub.
Shared task paper at VarDial 2024 Paper
Our team took part in the DSL-ML - Multi-label classification of similar languages shared task at VarDial 2024 and placed 1st on all the languages we participated in! Big thanks to Chester Palen-Michel for collaborating on this with me!

2023

Internship at Google DeepMind Summer
This summer/fall, I’ll be working at Google DeepMind as a Student Researcher, focusing on out-of-distribution detection-style topics. Very excited to be working with LLMs in addition to my low-resource NLP work! Thanks GDM!
Best Paper Award at Insights 2023 Paper
Our paper, What changes when you randomly choose BPE merge operations? Not much., was accepted to the Fourth Workshop on Insights from Negative Results in NLP, held in conjunction with EACL 2023.
Organizing CoCo4MT Shared Task at MTSummit 2023 Workshop
I am co-organizing the CoCo4MT 2023 Shared Task on Corpus Construction for Machine Translation, held in conjunction with MTSummit 2023.
Extended abstract accepted at SIGTYP 2022 Paper
Our extended abstract, ParaNames: A Massively Multilingual Entity Name Corpus, was accepted to SIGTYP 2022, held in conjunction with NAACL 2022.
The abstract describes the work our lab has been doing on ParaNames, a multilingual entity name corpus that covers over 400 languages and 18 million entities.
Feel free to also take a look at the preprint and the GitHub repository.

2022

Organizing the DCLRL workshop at LREC 2022 Workshop
I’ll be co-organizing the first Workshop on Dataset Creation for Lower-Resourced Languages held at LREC 2022. Hope to see you there!
Internship at USC Information Sciences Institute Summer
This summer, I’ll be joining the Information Sciences Institute at the University of Southern California as a Visiting Research Assistant.
Looking forward to spending the summer in sunny California!
One paper accepted at Findings of the ACL 2022 Paper
Our new position paper, Toward More Meaningful Resources for Lower-resourced Languages, was accepted to Findings of the ACL for ACL 2022.
Recommended reading for anyone working on lower-resourced languages, as well as anyone thinking of using Wikidata or WikiAnn out-of-the-box.

2021

Two workshop papers accepted at EACL 2021 Workshop
My paper on morphology and low-resource NMT, The Effectiveness of Morphology-aware Segmentation in Low-Resource Neural Machine Translation, was accepted to the Student Research Workshop.
Another paper of mine, Mining Wikidata for Name Resources for African Languages, was also accepted to the AfricaNLP Workshop.

2020

Starting as a PhD student at Brandeis. :)

Deep Learning Summer School at MILA Summer
I’ll be joining the Montreal Institute for Learning Algorithms for the Deep Learning & Reinforcement Learning Summer School virtually this year.
Yiddish corpus paper accepted at LREC 2020 Paper
My paper A Multi-Orthography Parallel Corpus of Yiddish Nouns was accepted at LREC 2020.
Sadly, due to COVID, there was no opportunity to present the work in Marseille.
You can still find the paper in the proceedings, though!

2019

Yay, starting as an MS student at Brandeis. :)