ArabicNLP 2026 Shared Task

ArGuard

Harmful Content Detection in Arabic Memes and LLM Prompts

A two-track benchmark for Arabic content safety research, combining multimodal hateful meme understanding with harmful prompt detection for LLM interactions.

Overview

Arabic safety evaluation across modalities

Harmful Arabic content appears in multiple forms: memes that combine visual cues with embedded text, and prompts that seek unsafe responses from large language models. ArGuard evaluates both settings with a shared design that moves from binary detection to fine-grained categorization.

The task emphasizes Arabic-specific challenges such as dialect use, sarcasm, code-switching, cultural references, and the interaction between language and visual context.

Tasks

Two complementary tracks

Participants may work on either track or both. Each track includes a binary classification subtask and a fine-grained categorization subtask.

Track A

Multimodal Hateful Meme Detection

Systems receive an Arabic meme image with extracted text and predict whether the meme is hateful, then identify the fine-grained label set associated with the content.

  • A1: Hateful vs. Not Hateful classification.
  • A2: Multi-label fine-grained category prediction.

Labels include mocking, incitement, dehumanization, slurs, contempt, inferiority, exclusion, stereotyping, extremism, threat, insults, historical, humor, sarcasm, and other categories.
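As a rough illustration of what a multi-label prediction for A2 could look like, the sketch below encodes a set of labels as a 0/1 indicator vector. It is an assumption for illustration only: the label inventory is a subset of the names listed above, the ordering is arbitrary, and the helper names `encode_labels` and `decode_labels` are hypothetical. The official label strings and submission format are not specified here.

```python
# Hypothetical label inventory, using label names listed in the task text.
# The official label strings and their order may differ.
LABELS = [
    "mocking", "incitement", "dehumanization", "slurs", "contempt",
    "inferiority", "exclusion", "humor", "sarcasm",
]


def encode_labels(assigned):
    """Turn a collection of label names into a 0/1 indicator vector."""
    unknown = set(assigned) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if label in assigned else 0 for label in LABELS]


def decode_labels(vector):
    """Recover the label names from a 0/1 indicator vector."""
    return [label for label, flag in zip(LABELS, vector) if flag]


# A meme tagged with two labels becomes a vector with two positions set.
vector = encode_labels({"mocking", "slurs"})
print(vector)
print(decode_labels(vector))
```

An indicator vector keeps every label decision independent, which is the usual way to let one meme carry several labels at once.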

Track B

Textual Harmful Prompt Detection

Systems receive an Arabic prompt directed at an LLM and decide whether it is safe or unsafe, then classify unsafe prompts by harm domain.

  • B1: Safe vs. Unsafe prompt detection.
  • B2: Harm category classification.

Harm domains include self-harm, harm to others, harassment, adult content, bullying, hate speech, and fraud or illegal activities.
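One way to read the two subtasks is as a two-stage cascade: a B1 decision first, and a B2 category only for prompts flagged unsafe. The sketch below illustrates that control flow under stated assumptions. The function `classify_prompt`, the two toy classifiers, and the bracketed placeholder markers are all hypothetical stand-ins; participants are free to model the subtasks jointly or separately.

```python
from typing import Callable, Optional, Tuple


def classify_prompt(
    prompt: str,
    is_unsafe: Callable[[str], bool],
    harm_category: Callable[[str], str],
) -> Tuple[str, Optional[str]]:
    """Run the B1 decision first; call the B2 classifier only if unsafe."""
    if not is_unsafe(prompt):
        return ("safe", None)
    return ("unsafe", harm_category(prompt))


# Toy stand-ins so the sketch runs; a real system would plug in trained models.
def toy_is_unsafe(prompt: str) -> bool:
    return "[UNSAFE]" in prompt


def toy_harm_category(prompt: str) -> str:
    return "fraud" if "[FRAUD]" in prompt else "harassment"


print(classify_prompt("benign question", toy_is_unsafe, toy_harm_category))
print(classify_prompt("[UNSAFE][FRAUD] request", toy_is_unsafe, toy_harm_category))
```

A cascade keeps the two decisions separable for error analysis, at the cost of propagating B1 mistakes into B2.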

Data

Dataset summary

Training, development, and test splits have been prepared and will be released after task acceptance. Baselines and evaluation scripts will be published with the starter kit.

Track A: Summary (Binary Labels)

Label         Train    Dev    Test   Total
Hateful       1,717    263     493   2,473
Not Hateful   2,177    312     761   3,250
Total         3,894    575   1,254   5,723

Distribution of binary labels across the Train, Dev, and Test splits for the 5,723 memes in the dataset.

Track A: Fine-Grained (Multilabel)

Label                   Train    Dev   Test
Hateful Categories
Mocking                   706     90    211
Incitement                320     51     85
Dehumanization            247     42     58
Slurs                     252     42     47
Contempt                  107     18     50
Inferiority                57     14     32
Exclusion                  10      4      3
Other (Hateful)            18      2      7
Not Hateful Categories
Sarcasm                   934    126    333
Humor                     863    136    332
Other (Not Hateful)       380     50     96

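Breakdown of fine-grained categories by split. In the multi-label setting a single meme can carry more than one label. The counts shown here sum exactly to the binary totals (for example, 1,717 hateful and 2,177 not hateful memes in Train), so this table appears to count one label per meme.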

Track B: Prompt Safety (Subtask B1)

Category    Train     Dev    Test    Total
Safe        2,789     348     350    3,487
Unsafe     17,267   2,158   2,161   21,586
Total      20,056   2,506   2,511   25,073

Distribution of safe and unsafe LLM prompts across the Train, Dev, and Test splits.
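The B1 table is heavily skewed: roughly 86% of the training prompts are unsafe. The sketch below works through that arithmetic and derives inverse-frequency class weights, one common way to compensate for imbalance during training. The weighting scheme is an assumption chosen for illustration, not a recommendation from the organizers; only the two counts come from the table.

```python
# Training counts for Subtask B1, copied from the table above.
train_counts = {"safe": 2789, "unsafe": 17267}

total = sum(train_counts.values())
num_classes = len(train_counts)

# Share of each class in the training split.
shares = {label: count / total for label, count in train_counts.items()}

# Inverse-frequency ("balanced") weights: total / (num_classes * class_count).
# Rare classes get weights above 1, frequent classes get weights below 1.
weights = {
    label: total / (num_classes * count)
    for label, count in train_counts.items()
}

for label in train_counts:
    print(f"{label}: share={shares[label]:.3f}, weight={weights[label]:.3f}")
```

With these weights, each class contributes equally to a weighted loss in aggregate, so the minority safe class is not drowned out by the unsafe majority.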

Track B: Harmful Categories (Subtask B2)

Category            Train    Dev   Test   Total
Adult Content       5,249    656    656   6,561
Self-Harm           3,932    491    492   4,915
Harming Others      3,473    434    435   4,342
Harassment          2,574    322    322   3,218
Fraud & Deception   2,016    252    252   2,520
Bully               1,734    217    217   2,168
Hate Speech         1,078    134    135   1,347

Distribution of prompts across harm categories for Subtask B2, by split.

Timeline

Important dates

All dates are tentative and aligned with the ArabicNLP 2026 shared task schedule.

  1. Registration deadline and beginning of the evaluation cycle (test sets release).
  2. End of the evaluation cycle (run submission).
  3. Leaderboard release.
  4. Shared task papers due date.
  5. Notification of acceptance.
  6. Camera-ready papers due.
  7. ArabicNLP conference.

Resources

Links for participants

Core participant resources will be updated here as data, starter-kit material, baselines, and the CodaBench competition become available.

Organizers

Task organizing team

Firoj Alam

Qatar Computing Research Institute, HBKU

Md. Rafiul Biswas

Hamad Bin Khalifa University

Mohamed Bayan Kmainasi

Qatar Computing Research Institute, HBKU

Ali Ezzat Shahroor

Qatar Computing Research Institute, HBKU

Hamdy Mubarak

Qatar Computing Research Institute, HBKU

Wajdi Zaghouani

Northwestern University in Qatar

Acknowledgments

The contributions of this work were funded by the NPRP grant 14C-0916-210015, which is provided by the Qatar National Research Fund (a member of Qatar Foundation).

Contact

Questions?

For task updates, questions, and coordination, contact the organizers by email or Slack.