Applied AI Research

AI for the long tail.

We research multilingual model architectures, data representation, and controlled generation for the long tail of languages and domains beyond today's state-of-the-art models.

Apr 2026 Paper "Grammar as Control: Modular Language Generation for the Long Tail" accepted at ACL 2026 PDF →
Mar 2026 Kwanyama language reference and learning platform Visit →
Feb 2026 Paper on Kwanyama LLMs (1B, 3B, 8B) accepted at LREC 2026 PDF →
Jul 2025 Outstanding Paper Award at ACL 2025 for typology-guided multilingual adaptation PDF →

Research

Foundational research on multilingual NLP and long-tail language modeling. Selected publications:

ACL 2026

Grammar as Control: Modular Language Generation for the Long Tail

LREC 2026

Sentiment Analysis and Language Models for Oshikwanyama

1B, 3B, and 8B parameter LLMs for Oshikwanyama

ACL 2025 Outstanding Paper Award

Typology-Guided Adaptation in Multilingual Models

All publications →

Models

Language models and other machine learning systems for the long tail of languages and domains. One example from our current work:

OkaLM

Kwanyama Language Models

OkaLM is the first family of publicly available large language models for Kwanyama. Available in three sizes (1B, 3B, 8B parameters) to suit different use cases, from lightweight applications to more capable generation.

🤗 View on Hugging Face →
3 Model Sizes
1B–8B Parameters

Tools and Data

Data and interactive tools built on top of our models and datasets. One example from our current work:

OkaLex

Kwanyama Language Reference and Learning Platform

OkaLex is a Kwanyama language reference and interactive learning platform. It features a bilingual dictionary with translations, definitions, parts of speech, and example sentences.

The platform includes interactive quizzes, flashcards, and a word-matching game for vocabulary practice, plus nearly 50 grammar modules for Kwanyama learners.

For schools, linguists, and anyone exploring Kwanyama.
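A dictionary of the kind described above, headwords carrying translations, parts of speech, definitions, and example sentences, can be sketched as a simple data structure with a lookup over both languages. This is an illustrative sketch only; the field names and the sample entries are invented placeholders, not OkaLex data or its actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical bilingual dictionary entry; the fields mirror the features
# described above (translations, definition, part of speech, examples).
@dataclass
class Entry:
    headword: str            # Kwanyama headword (placeholder values below)
    pos: str                 # part of speech
    translations: list[str]  # English translations
    definition: str = ""
    examples: list[tuple[str, str]] = field(default_factory=list)  # (Kwanyama, English)

def search(entries: list[Entry], query: str) -> list[Entry]:
    """Case-insensitive match against headwords and English translations."""
    q = query.lower()
    return [e for e in entries
            if q in e.headword.lower()
            or any(q in t.lower() for t in e.translations)]

# Placeholder entries for illustration only (not real Kwanyama vocabulary).
lexicon = [
    Entry("headword-1", "noun", ["house"], "a dwelling"),
    Entry("headword-2", "verb", ["to learn", "to study"]),
]

print([e.headword for e in search(lexicon, "learn")])  # ['headword-2']
```

Searching in both directions from a single index is what lets one search box serve learners starting from either language.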

Visit OkaLex →

Try it: search a word

Applications

Application areas we work on for the long tail of languages and domains.

Machine Translation

Translation across long-tail language pairs, including morphologically rich and under-tokenized languages.

Question Answering

Open-domain and grounded QA across long-tail languages, with retrieval and generation adapted to sparse-data regimes.

Structured Data Representation

Extraction, alignment, and modeling of structured knowledge (entities, relations, and lexicons) across long-tail languages and domains.

Who we are

Okalai AI was founded by Ndapa Nakashole, who serves as Chief Scientist. Ndapa is an Associate Professor of Computer Science at the University of California, San Diego (UCSD). Her research focuses on Natural Language Processing (NLP) and, more broadly, Artificial Intelligence (AI).

Get in touch: hello@okalai.org

Contact Us