Muhammad Dehan Al Kautsar

I am a Research Engineer at MBZUAI working on multilinguality and dialogue in NLP. I like tweaking and tinkering with tokenization and representations to understand how LLMs handle underrepresented languages. I am also interested in language code-mixing and code-switching in NLP. My goal is to make those models more inclusive, especially for languages across the Global South.

I also play piano and sports in my spare time. If you’re interested in collaborating or discussing research (or anything else), feel free to get in touch!

NLP: Multilinguality in LLM | NLP: Dialogue System | Data Science

News

Recent updates & highlights.

Apr 2026

Several papers were accepted!
- [1] Evaluating Vision-Language and Large Language Models for Automated Student Assessment in Indonesian Classrooms
- [2] Vision Language Models are Confused Tourists
- [3] Cultural Benchmarking of LLMs in MSA and Arabic Dialectal Dialogue
These papers were accepted at AIED 2026 Main Conference^[1] (in Seoul, South Korea), CVPR 2026 Findings^[2] (in Denver, USA), and ACL 2026 Main Conference^[3] (in San Diego, USA). 😎🗽

Oct 2025

Our papers titled
- Role-Aware Language Models for Secure and Contextualized Access Control in Organizations
- Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation
were accepted at AACL-IJCNLP 2025 and Eval4NLP 2025, respectively, in Mumbai, India!

Jul 2025

Our papers titled
- IndoSafety: Culturally Grounded Safety for LLMs in Indonesian Languages
- What Do Indonesians Really Need from Language Technology? A Nationwide Survey
were accepted at EMNLP 2025 Main. I presented the latter in Suzhou, China. It was nice to meet you there!🧧

Nov 2024

Started a new position as Research Associate I (now: Research Engineer II) at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) under the supervision of Fajri Koto.

Nov 2023

Our paper titled 'IndoToD: A multi-domain Indonesian benchmark for end-to-end task-oriented dialogue systems' was accepted and selected as the Best Paper at the SEALP 2023 Workshop, co-located with AACL-IJCNLP 2023 in Bali, Indonesia.🏖️🏆

Oct 2023

I went to Tokyo, Japan as a delegate of Institut Teknologi Bandung in a technology and cultural exchange program hosted by The University of Electro-Communications (UEC).🌸

Nov 2021

Became a finalist in Pusat Prestasi Nasional GEMASTIK - Smart City Division. We developed 'Virtual Hospital', an IoT-based system for monitoring patients remotely in response to the impact of COVID-19.

Education

M.Sc. in Informatics — Institut Teknologi Bandung

2023 - 2024
Bandung, Indonesia

Final GPA: 3.96 / 4.00

Thesis: “End-to-end Fused Dialogue System in Open-Source Large Language Model”.
Supervised by Ayu Purwarianti, Samuel Cahyawijaya, and Genta Indra Winata.

B.Sc. in Informatics — Institut Teknologi Bandung

2019 - 2023
Bandung, Indonesia

Final GPA: 3.93 / 4.00

Final Task: “End-to-end Task-oriented Dialogue System in Indonesia”.
Supervised by Ayu Purwarianti, Samuel Cahyawijaya, and Genta Indra Winata.

Work Experience

Research Engineer II — Mohamed bin Zayed University of Artificial Intelligence

Nov 2024 - Present
Abu Dhabi, UAE

Dept: Natural Language Processing

AI Engineer Intern — GLAIR

Nov 2022 - Aug 2023
Jakarta, Indonesia

Dept: Computer Vision

AI Engineer Intern — Prosa.ai

May - Oct 2022
Bandung, Indonesia

Dept: Natural Language Processing

Academic & Laboratory Assistant — Institut Teknologi Bandung

2021 - 2024
Bandung, Indonesia

Courses: Natural Language Processing, Programming Fundamentals, Introduction to Computation

Publications

Selected papers & preprints (reverse chronological order).

Grounding AI-in-Education Development in Teachers' Voices: Findings from a National Survey in Indonesia

Nurul Aisyah, Muhammad Dehan Al Kautsar, Arif Hidayat, Fajri Koto. (2026).

Preprint — arXiv:2604.01630

PDF

Cultural Benchmarking of LLMs in MSA and Arabic Dialectal Dialogue

Muhammad Dehan Al Kautsar, Saeed Almheiri, Momina Ahsan, Bilal Elbouardi, ... , Zhuohan Xie, Junhong Liang, Mohammad Rustom Al Nasar, Preslav Nakov, Fajri Koto. (2026).

In: ACL 2026 Main

PDF (Coming soon)

Vision Language Models are Confused Tourists

Patrick Amadeus Irawan, Ikhlasul Akmal Hanif, Muhammad Dehan Al Kautsar, Genta Indra Winata, Fajri Koto, Alham Fikri Aji. (2025).

In: CVPR 2026 Findings

PDF

What Do Indonesians Really Need from Language Technology? A Nationwide Survey

Muhammad Dehan Al Kautsar, Lucky Susanto, Derry Tanti Wijaya, Fajri Koto. (2025).

In: EMNLP 2025 Main

PDF

IndoSafety: Culturally Grounded Safety for LLMs in Indonesian Languages

Muhammad Falensi Azmi, Muhammad Dehan Al Kautsar, Alfan Wicaksono, Fajri Koto. (2025).

In: EMNLP 2025 Main

PDF

Parallel Tokenizers: Rethinking Vocabulary Design for Cross-Lingual Transfer

Muhammad Dehan Al Kautsar, Fajri Koto. (2025).

Preprint — arXiv:2510.06128

PDF

SEADialogues: A Multilingual Culturally Grounded Multi-turn Dialogue Dataset on Southeast Asian Languages

Muhammad Dehan Al Kautsar, Aswin Candra, Muhammad Alif Al Hakim, Maxalmina Satria Kahfi, Fajri Koto, Alham Fikri Aji, Peerat Limkonchotiwat, Ekapol Chuangsuwanich, Genta Indra Winata. (2025).

Preprint — arXiv:2508.07069

PDF

Role-Aware Language Models for Secure and Contextualized Access Control in Organizations

Saeed Almheiri, Yerulan Kongrat, Adrian Santosh, Ruslan Tasmukhanov, Josemaria Loza Vera, Muhammad Dehan Al Kautsar, Fajri Koto. (2025).

In: AACL-IJCNLP 2025 Main

PDF

Evaluating Vision-Language and Large Language Models for Automated Student Assessment in Indonesian Classrooms

Nurul Aisyah, Muhammad Dehan Al Kautsar, Arif Hidayat, Raqib Chowdhury, Fajri Koto. (2025).

In: AIED 2026 Main Conference

PDF

Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation

Naila Shafirni Hidayat, Muhammad Dehan Al Kautsar, Alfan Wicaksono, Fajri Koto. (2025).

In: Eval4NLP 2025 Workshop (Co-located with AACL-IJCNLP 2025)

PDF

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages

Holy Lovenia, Rahmad Mahendra, Salsabil Maulana Akbar, Lester James Validad Miranda, ... , Muhammad Dehan Al Kautsar, ... , Genta Indra Winata, Ruochen Zhang, Fajri Koto, Zheng-Xin Yong, Samuel Cahyawijaya. (2024).

In: EMNLP 2024 Main

PDF

IndoToD: A Multi-Domain Indonesian Benchmark for End-to-End Task-Oriented Dialogue Systems

Muhammad Dehan Al Kautsar, Rahmah Khoirussyifa' Nurdini, Samuel Cahyawijaya, Genta Indra Winata, Ayu Purwarianti. (2023).

In: SEALP 2023 Workshop (Co-located with AACL-IJCNLP 2023). BEST PAPER

PDF

Contact

You can reach me via email or follow me on social media.

muhammad.dehan@mbzuai.ac.ae

Location

Abu Dhabi, United Arab Emirates

Profiles