Indian Multilingual Jailbreaking
Built Indic-JailbreakBench and proposed four jailbreak techniques achieving up to 35% higher ASR in Indic languages than in English.
CS626 — Speech, NLP, and the Web · IIT Bombay · Guide: Prof. Pushpak Bhattacharyya · Aug 2024 – May 2025
GitHub: JailbreakinLLMs
Overview
LLM safety research has overwhelmingly focused on English. This project asked: how much weaker are safety filters in Indic languages? The answer: significantly weaker, with attack success rates up to 35% higher than for equivalent English prompts.
Key Contributions
- Indic-JailbreakBench: a new dataset of 1,668 multilingual malicious prompts across 12 harm categories, covering major Indic languages
- Four novel jailbreak techniques specifically designed for Indic language morphology and code-switching patterns
- Up to 35% higher Attack Success Rate (ASR) in Indic languages compared to equivalent English prompts
- Multi-judge ASR evaluation framework, applied across six state-of-the-art LLMs for robust cross-lingual safety assessment
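To make the ASR metric concrete, here is a minimal sketch of how a multi-judge score could be computed per language. It is an illustration under stated assumptions, not the project's actual framework: the `Verdict` record, the language codes, the strict-majority aggregation rule, and the toy data are all hypothetical, and the gap is reported in percentage points, which the summary above does not specify.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Verdict:
    language: str            # hypothetical language code, e.g. "en" or "hi"
    prompt_id: str
    judge_votes: List[bool]  # True = this judge marked the attack as successful


def attack_succeeded(votes: List[bool]) -> bool:
    """Strict majority vote across judges (an assumed aggregation rule)."""
    return sum(votes) * 2 > len(votes)


def asr_by_language(verdicts: List[Verdict]) -> Dict[str, float]:
    """ASR per language = successful attacks / total prompts, in percent."""
    totals: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for v in verdicts:
        totals[v.language] += 1
        successes[v.language] += int(attack_succeeded(v.judge_votes))
    return {lang: 100.0 * successes[lang] / totals[lang] for lang in totals}


# Toy, made-up verdicts purely for illustration; not results from the project.
toy = [
    Verdict("en", "p1", [False, False, True]),
    Verdict("en", "p2", [True, True, False]),
    Verdict("hi", "p1", [True, True, True]),
    Verdict("hi", "p2", [True, False, True]),
]
rates = asr_by_language(toy)
gap = rates["hi"] - rates["en"]  # cross-lingual gap in percentage points
```

Aggregating several judges before counting a success reduces the influence of any single judge's errors. Other rules, such as unanimous agreement, would slot into `attack_succeeded` without changing the rest of the sketch.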
Stack
Python · Hugging Face Transformers · Multilingual LLMs · Prompt Engineering · ASR Evaluation Framework