News
2026
- Organizing SIGTURK workshop at EACL 2026 Workshop
- I am serving as the Publicity Chair for the Second Workshop on Natural Language Processing for Turkic Languages, which will be held at EACL 2026 in Rabat, Morocco.
- This year the workshop will also feature a Shared Task on Terminology-Aware Machine Translation for English–Turkish Scientific Texts. I encourage everyone to participate!
2025
- LLM uncertainty quantification paper accepted to IJCNLP-AACL 2025 Paper
- My new paper, From tests to effect sizes: Quantifying uncertainty and statistical variability in multilingual and multitask NLP evaluation benchmarks, was accepted to IJCNLP-AACL 2025.
- The paper shows that conventional evaluation paradigms wildly underestimate the uncertainty and statistical variability of model performance, which can lead to overconfidence in rankings and overestimates of statistical significance.
- Multilingual NER paper published at EMNLP 2025 Paper
- Our lab’s collaborative NER benchmarking paper, OpenNER 1.0: Standardized Open-Access Named Entity Recognition Datasets in 50+ Languages, will be published at EMNLP 2025.
- Thanks to Chester for letting me use the data for my evaluation paper.
- Paper on morphological reasoning skills of LLMs accepted to NAACL 2025 Paper
- Our collaboration paper, Evaluating Morphological Compositional Generalization in Large Language Models, was accepted to NAACL 2025. In it, we show that LLMs struggle significantly when presented with out-of-distribution linguistic reasoning tasks.
- Thanks to Mete, Defne, Duygu and everyone else for the great collaboration!
2024
- Organizing Multilingual Representation Learning workshop at EMNLP 2024 Workshop
- I am co-organizing the Fourth Workshop on Multilingual Representation Learning at EMNLP 2024. The workshop brings together researchers working on multilingual representation learning with applications to a variety of tasks.
- Organizing SIGTURK workshop at ACL 2024 Workshop
- I am co-organizing the First Workshop on Natural Language Processing for Turkic Languages at ACL 2024. The workshop brings together researchers working on NLP for Turkic languages and morphologically rich languages more generally.
- Northern Sámi NMT paper published at ACL Findings 2024 Paper
- Our paper, Language Model Priors and Data Augmentation Strategies for Low-resource Machine Translation: A Case Study Using Finnish to Northern Sámi, was published at Findings of the ACL 2024. Looking forward to presenting this at ACL 2024 in Bangkok!
- Thanks to the Finnish Cultural Foundation and the Lapland Regional Fund for their support!
- ParaNames published at LREC-COLING 2024 Workshop
- Our paper, ParaNames 1.0: Creating an Entity Name Corpus for 400+ Languages using Wikidata, was accepted to LREC-COLING 2024. Check out the resource for free on GitHub.
- Shared task paper at VarDial 2024 Paper
- Our team took part in the DSL-ML (Multi-label classification of similar languages) shared task at VarDial 2024 and placed 1st in all the languages we participated in! Big thanks to Chester Palen-Michel for collaborating on this with me!
2023
- Internship at Google DeepMind Summer
- This summer/fall, I’ll be working at Google DeepMind as a Student Researcher, focusing on out-of-distribution detection-style topics. Very excited to be working with LLMs in addition to my low-resource NLP work! Thanks GDM!
- Best Paper Award at Insights 2023 Paper
- Our paper, What changes when you randomly choose BPE merge operations? Not much. was accepted to the Fourth Workshop on Insights from Negative Results in NLP, held in conjunction with EACL 2023.
- Organizing CoCo4MT Shared Task at MTSummit 2023 Workshop
- I am co-organizing the CoCo4MT 2023 Shared Task on Corpus Construction for Machine Translation, held in conjunction with MTSummit 2023.
- Extended abstract accepted at SIGTYP 2022 Paper
- Our extended abstract, ParaNames: A Massively Multilingual Entity Name Corpus, was accepted to SIGTYP 2022, held in conjunction with NAACL 2022.
- The abstract describes the work our lab has been doing on ParaNames, a multilingual entity name corpus that covers over 400 languages and 18 million entities.
- Feel free to also take a look at the preprint and the GitHub repository.
2022
- Organizing the DCLRL workshop at LREC 2022 Workshop
- I’ll be co-organizing the first Workshop on Dataset Creation for Lower-Resourced Languages, held at LREC 2022. Hope to see you there!
- Internship at USC Information Sciences Institute Summer
- This summer, I’ll be joining the Information Sciences Institute at the University of Southern California as a Visiting Research Assistant.
- Looking forward to spending the summer in sunny California!
- One paper accepted at Findings of the ACL 2022 Paper
- Our new position paper, Toward More Meaningful Resources for Lower-resourced Languages, was accepted to Findings of the ACL for ACL 2022.
- Recommended reading for anyone working on lower-resourced languages, as well as anyone thinking of using Wikidata or WikiAnn out-of-the-box.
2021
- Two workshop papers accepted at EACL 2021 Workshop
- My paper on morphology and low-resource NMT, The Effectiveness of Morphology-aware Segmentation in Low-Resource Neural Machine Translation, was accepted to the Student Research Workshop.
- Another paper of mine, Mining Wikidata for Name Resources for African Languages, was also accepted to the AfricaNLP Workshop.
2020
Starting as a PhD student at Brandeis. :)
- Deep Learning Summer School at MILA Summer
- I’ll be joining the Montreal Institute for Learning Algorithms for the Deep Learning & Reinforcement Learning Summer School virtually this year.
- Yiddish corpus paper accepted at LREC 2020 Paper
- My paper, A Multi-Orthography Parallel Corpus of Yiddish Nouns, was accepted at LREC 2020.
- Sadly, due to COVID, there was no opportunity to present the work in Marseille.
- You can still find the paper in the proceedings, though!
2019
Yay, starting as an MS student at Brandeis. :)