AI for the long tail.
We research multilingual model architectures, data representation, and controlled generation for the long tail of languages and domains that lie beyond the reach of today's state-of-the-art models.
Foundational research that defines the principles and methods behind our systems.
Language models and other machine learning systems for the long tail of languages and domains.
Data and interactive tools built on top of our models and datasets, for learning, exploration, and practical use.
Application areas including machine translation, question answering, and structured data representation.
Research
Foundational research on multilingual NLP and long-tail language modeling. Selected publications:
Sentiment Analysis and Language Models for Oshikwanyama
1B, 3B, and 8B parameter LLMs for Oshikwanyama
Models
Language models and other machine learning systems for the long tail of languages and domains. One example from our current work:
OkaLM
Kwanyama Language Models
OkaLM is the first family of publicly available large language models for Kwanyama. It is available in three sizes (1B, 3B, and 8B parameters) to suit different use cases, from lightweight applications to more capable generation.
🤗 View on Hugging Face →
Tools and Data
Data and interactive tools built on top of our models and datasets. One example from our current work:
OkaLex
Kwanyama Language Reference and Learning Platform
OkaLex is a Kwanyama language reference and interactive learning platform. It features a bilingual dictionary with translations, definitions, parts of speech, and example sentences.
The platform includes interactive quizzes, flashcards, and a word-matching game for vocabulary practice, plus nearly 50 grammar modules for Kwanyama learners.
For schools, linguists, and anyone exploring Kwanyama.
Visit OkaLex →
Try it: search a word
Applications
Application areas we work on for the long tail of languages and domains.
Machine Translation
Question Answering
Structured Data Representation
Who we are
Founded in 2021, Okalai held its first AI school in 2022 and grew from those schools into a research program on long-tail languages and domains. We have trained the first-ever LLMs and machine-translation systems for several languages that previously had none.
Okalai AI was founded by Ndapa Nakashole, who serves as Chief Scientist. Ndapa is an Associate Professor of Computer Science at the University of California, San Diego (UCSD). Her research focuses on Natural Language Processing (NLP) and, more broadly, Artificial Intelligence (AI).
Get in touch: hello@okalai.org