AI for the long tail.
We research multilingual model architectures, data representation, and controlled generation for the long tail of languages and domains beyond the reach of today's state-of-the-art models.
Foundational research that defines the principles and methods behind our systems.
Language models and other machine learning systems for the long tail of languages and domains.
Data and interactive tools built on top of our models and datasets, for learning, exploration, and practical use.
Application areas including machine translation, question answering, and structured data representation.
Research
Foundational research on multilingual NLP and long-tail language modeling. Selected publications:
Sentiment Analysis and Language Models for Oshikwanyama
1B, 3B, and 8B parameter LLMs for Oshikwanyama
Models
Language models and other machine learning systems for the long tail of languages and domains. One example from our current work:
OkaLM
Kwanyama Language Models
OkaLM is the first family of publicly available large language models for Kwanyama. Available in three sizes (1B, 3B, 8B parameters) to suit different use cases, from lightweight applications to more capable generation.
🤗 View on Hugging Face →
Tools and Data
Data and interactive tools built on top of our models and datasets. One example from our current work:
OkaLex
Kwanyama Language Reference and Learning Platform
OkaLex is a Kwanyama language reference and interactive learning platform. It features a bilingual dictionary with translations, definitions, parts of speech, and example sentences.
The platform includes interactive quizzes, flashcards, and a word-matching game for vocabulary practice, plus nearly 50 grammar modules for Kwanyama learners.
For schools, linguists, and anyone exploring Kwanyama.
Visit OkaLex →
Try it: search a word
Applications
Application areas we work on for the long tail of languages and domains.
Machine Translation
Translation across long-tail language pairs, including morphologically rich and under-tokenized languages.
Question Answering
Open-domain and grounded QA across long-tail languages, with retrieval and generation adapted to sparse-data regimes.
Structured Data Representation
Extraction, alignment, and modeling of structured knowledge (entities, relations, and lexicons) across long-tail languages and domains.
Who we are
Okalai AI was founded by Ndapa Nakashole, who serves as Chief Scientist. Ndapa is an Associate Professor of Computer Science at the University of California, San Diego (UCSD). Her research focuses on natural language processing (NLP) and, more broadly, artificial intelligence (AI).
Get in touch: hello@okalai.org