Jonne Sälevä

Research

My research centers on low-resource and multilingual NLP/AI. I'm motivated by the observation that, of the world's roughly 7,000 languages, only about 100 are supported by current AI and language technology.

In addition to model building and resource creation, I seek to understand which tasks and languages our models handle well, where they struggle, and what factors explain the performance disparities.

Finally, I see low-resource NLP as a case study in generalization: can models learn to perform well in settings that do not resemble their training data but are still related to it in some way, e.g., linguistically?

Concretely, my research can be divided into a few main themes and questions:

  1. Learning in low-resource environments and bridging the linguistic divide:
    • How can we build models for low-resource languages/domains across a variety of NLP/AI tasks, such as machine translation, named entity recognition (NER), and language identification?
    • How can we create high-quality benchmarks for such low-resource languages/domains at a reasonable annotation cost?
    • How can we generate synthetic data or inject external knowledge into our models when training data is unavailable?
  2. Large language models and their evaluation:
    • How can we build LLMs that are able to use compositional reasoning to generalize beyond the training data distribution?
    • How can we identify failure modes where LLM performance degrades due to simple transformations that would not confuse humans?
      • e.g., prompt rewriting, reordering multiple-choice options, transliteration
  3. Uncertainty quantification in benchmarks and leaderboards:
    • How should we quantify the (statistical) significance of performance differences we observe when comparing models empirically?
    • How can we generalize based on multiple task-specific measurements and assess the “overall performance” of different models?
    • Do factors like test set size and the choice of leaderboard tasks affect the statistical precision of our results, and by how much?
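
As a toy illustration of the third theme, one common way to quantify the significance of an observed performance gap is a paired bootstrap test over test-set items. The sketch below is a minimal, self-contained example; the scores are synthetic and all names are hypothetical, not code from any of my papers.

```python
import random

# Hypothetical per-example scores (1 = correct, 0 = incorrect) for two models
# evaluated on the same 200-item test set; in practice these would come from
# real model outputs.
random.seed(0)
scores_a = [1 if random.random() < 0.72 else 0 for _ in range(200)]
scores_b = [1 if random.random() < 0.68 else 0 for _ in range(200)]

def paired_bootstrap(scores_a, scores_b, n_resamples=10_000, seed=1):
    """Fraction of bootstrap resamples in which model A outscores model B."""
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        # Resample test items with replacement, keeping the pairing intact.
        idx = [rng.randrange(n) for _ in range(n)]
        diff = sum(scores_a[i] - scores_b[i] for i in idx)
        if diff > 0:
            wins += 1
    return wins / n_resamples

print(f"P(A > B under resampling) = {paired_bootstrap(scores_a, scores_b):.3f}")
```

Because resampling is done over test items, the spread of the resampled differences directly reflects how test set size limits the precision of a leaderboard comparison: smaller test sets yield noisier differences and weaker evidence for ranking one model above another.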