Traffic Anomaly Reasoning (TAR): Moving Beyond Detection in the 2026 AI City Challenge

Example scenario of a car crash on a busy road from the Traffic Anomaly Reasoning (TAR) dataset.

Dr. David C. Anastasiu

TL;DR: The 2026 AI City Challenge introduces Track 3, centered on the Traffic Anomaly Reasoning (TAR) dataset. This track marks a transition from binary anomaly detection to complex causal reasoning. Santa Clara University maintains the evaluation infrastructure for all challenge tracks, while a new NSF CAREER award in explainable video anomaly anticipation provides the research foundation for this shift. The inaugural leaderboard results show open-world foundation models outperforming other baselines.

Table of Contents

The Reasoning Problem: Why Detection Isn't Enough
The AI City Challenge: A Decade of Advancing Computer Vision
Santa Clara University's Role in Challenge Infrastructure
Spotlight on Student Achievement: Creating the SynWTS Dataset
Explainability as a Research Core: The NSF CAREER Award
Understanding the Anomalous Events in Transportation Track and the TAR Dataset
Initial Leaderboard Standings: Cosmos 3 Performance
Acknowledgments: High-Performance Computing Support
Frequently Asked Questions

The Reasoning Problem: Why Detection Isn't Enough {#reasoning-problem}

Traditional computer vision for traffic safety has focused on a binary output: "Is there an anomaly?" While useful, this lacks the depth required for real-world Intelligent Transportation Systems (ITS) deployments. A system that simply flags a crash is only providing half the solution. For effective emergency response and infrastructure planning, we need models that can explain what happened and why an event occurred.

The "Reasoning Problem" involves identifying causal chains—recognizing that a distracted driver led to a sudden lane departure, which then resulted in a multi-vehicle collision. By focusing on Traffic Anomaly Reasoning (TAR), we are moving toward AI that provides structured explanations, temporal grounding, and causal analysis. This transparency is vital for building trust in automated monitoring systems.
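One way to picture a causal chain with temporal grounding is as a linked sequence of events. The sketch below is purely illustrative: the `Event` class, its field names, and the timestamps are assumptions made for this example and are not the TAR annotation schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """One step in a hypothetical causal chain (illustrative fields only)."""
    label: str
    start_s: float                    # temporal grounding: start time in seconds
    end_s: float                      # temporal grounding: end time in seconds
    cause: Optional["Event"] = None   # the event that led to this one, if any


def causal_chain(event: Event) -> list[str]:
    """Walk back from an outcome to its root cause and return labels root-first."""
    labels = []
    current: Optional[Event] = event
    while current is not None:
        labels.append(current.label)
        current = current.cause
    return labels[::-1]


# Made-up timestamps for the scenario described in the text.
distracted = Event("distracted driver", 0.0, 4.0)
departure = Event("sudden lane departure", 4.0, 5.5, cause=distracted)
collision = Event("multi-vehicle collision", 5.5, 8.0, cause=departure)

print(" -> ".join(causal_chain(collision)))
```

Storing the cause as a reference on each event keeps the explanation and its time span together, so a report can state both what happened and when.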

The AI City Challenge: A Decade of Advancing Computer Vision {#ai-city-challenge}

Since its inception in 2017, the AI City Challenge has served as a major driver of real-world computer vision applications. It has evolved from basic vehicle counting to complex tasks involving 3D multi-camera tracking and fine-grained spatial reasoning.

The 2025 edition marked a major milestone, with 245 registered teams from 15 countries and over 30,000 dataset downloads. Last year's results showcased a move toward multimodal fusion and domain-specific design. For 2026, the challenge continues this momentum with six tracks:

Track 1: Multi-Camera 3D Perception (Sim2Real)
Track 2: Transportation Safety Understanding and Captioning (Sim2Real)
Track 3: Anomalous Events in Transportation (TAR)
Track 4: Text-Based Person Re-Identification (Sim2Real)
Track 5: Generative Traffic Video Forecasting
Track 6: Cross-City Object Detection (Milestone Project Hafnia)

Santa Clara University’s Role in Challenge Infrastructure {#scu-infrastructure}

Santa Clara University (SCU) has been a committed partner in the execution of the AI City Challenge. My lab has maintained the evaluation servers for all tracks since the challenge's inception in 2017. This infrastructure ensures fair, reproducible benchmarking for hundreds of teams globally. As the Evaluation Chair, I work with track owners to design appropriate metrics and oversee a centralized evaluation portal.

We designed the AI City Challenge evaluation portal to provide a secure, transparent, and scientifically rigorous benchmarking environment for our global research community. Beyond serving as a centralized submission hub, the portal maintains strict access control through a multi-stage verification process that requires verified account registration and manual administrator approval. To foster reproducibility, we host separate Public and General leaderboards; award-eligible teams must release their complete code, models, and labels and may not use external private data. To mitigate the risk of overfitting to the test set, the system enforces daily and total submission caps and initially displays scores computed on only a 50% subset of the test data. During the active challenge window, only the top three team scores are visible. Final rankings on the full test set are revealed automatically after the deadline has passed, so that the results reflect genuine generalization.
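The submission caps and partial-test-set scoring described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the portal's actual code: the cap values, the function names, and the hash-based split are all hypothetical. The hash split yields roughly, not exactly, half of the samples.

```python
import hashlib
from datetime import date

DAILY_CAP = 5    # assumed value, not the portal's real limit
TOTAL_CAP = 20   # assumed value, not the portal's real limit


def in_public_subset(sample_id: str, fraction: float = 0.5) -> bool:
    """Deterministically assign a test sample to the subset scored during the challenge."""
    digest = hashlib.sha256(sample_id.encode("utf-8")).digest()
    # Compare the first 8 bytes, read as an integer, against fraction * 2^64.
    return int.from_bytes(digest[:8], "big") < fraction * 2**64


def may_submit(history: list[date], today: date) -> bool:
    """Check a team's earlier submission dates against the total and daily caps."""
    if len(history) >= TOTAL_CAP:
        return False
    return sum(1 for d in history if d == today) < DAILY_CAP


def mean_score(scores: dict[str, float], public_only: bool) -> float:
    """Average per-sample scores over the public subset or over the full test set."""
    ids = [s for s in scores if in_public_subset(s)] if public_only else list(scores)
    return sum(scores[s] for s in ids) / len(ids) if ids else 0.0


today = date(2026, 6, 1)
print(may_submit([today] * DAILY_CAP, today))  # the daily cap is already reached
```

Hashing the sample identifier keeps the split stable across submissions without storing a separate list, which is one simple way to make interim and final scores comparable.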

Spotlight on Student Achievement: Creating the SynWTS Dataset {#student-achievement}

While Track 3 in this year’s challenge focuses on reasoning, our lab is also the main organizer of another challenge track. Students Ridham Kachhadiya, Dhanishtha Patil, and Andrew Vattuone were instrumental in creating the SynWTS dataset, a synthetic-to-real benchmark used for this year's competition.

This follows an exceptional performance in 2025, when Ridham and Dhanishtha secured 2nd place in the Traffic Safety Description and Analysis track. Competing against teams from industry and academia, including NVIDIA and Rutgers, they finished behind only Chunghwa Telecom, the largest telecom company in Taiwan. Their ability to deliver world-class results reflects the high-caliber research occurring at Santa Clara University.

Explainability as a Research Core: The NSF CAREER Award {#nsf-career}

The push for reasoning in Track 3 is closely tied to my lab’s ongoing research at Santa Clara University. We recently received an NSF CAREER award (#2442814) for a project on explainable video anomaly anticipation (2025–2030). A central focus of this research is explainability: ensuring that a model's predictions are not just accurate but also interpretable. This directly informs the goals of the TAR dataset, where we require models to perform 10 distinct task types (such as causal linkage and video summarization) grounded in explicit chain-of-thought logic.

Understanding the Anomalous Events in Transportation Track and the TAR Dataset {#tar-dataset}

Track 3, Anomalous Events in Transportation, tasks participants with building unified video understanding models that detect, reason about, and explain anomalous events in transportation videos. Models must go beyond basic anomaly localization by handling causal analysis and temporal-spatial reasoning.

Key details of Track 3 include:

Task Requirements: Participants must build a single model capable of performing 10 diverse event tasks. Each task requires grounded, chain-of-thought reasoning over spatial and temporal evidence in traffic CCTV videos.
Datasets: Training utilizes 44,040 chain-of-thought reasoning annotations across 3,670 CCTV videos (approx. 26 hours of footage) from eight public sources. Testing is conducted on in-domain data, plus two out-of-domain scenarios (fisheye intersection footage and egocentric dashcams).
Source Diversity: Data is sourced from eight public repositories, including UCF Crime, SO-TAD, and Accident-Bench.
Evaluation Metrics: Submissions are evaluated using temporal Intersection over Union (IoU), spatial/action accuracy, and natural language metrics such as BLEU, METEOR, ROUGE-L, and BERTScore.
Labeling: Annotations were generated via a hierarchical pipeline using Gemma-4, with 910 videos incorporating human-provided context for higher grounding.
License: Released under CC-BY-4.0, ensuring low friction for commercial and academic use.
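Of the metrics listed above, temporal IoU is the easiest to show concretely. The function below is a minimal sketch of the standard interval IoU, assuming events are given as `(start, end)` pairs in seconds. The function name and the example intervals are made up for illustration, and the official scoring script may differ in details such as thresholds or averaging.

```python
def temporal_iou(pred: tuple[float, float], truth: tuple[float, float]) -> float:
    """Intersection over union of two (start, end) time intervals in seconds."""
    pred_start, pred_end = pred
    true_start, true_end = truth
    # Overlap is zero when the intervals do not intersect.
    intersection = max(0.0, min(pred_end, true_end) - max(pred_start, true_start))
    union = (pred_end - pred_start) + (true_end - true_start) - intersection
    return intersection / union if union > 0 else 0.0


# A predicted anomaly window of 12-20 s against a reference window of 15-25 s:
# the overlap is 5 s and the union is 13 s, so the IoU is 5/13 (about 0.385).
print(temporal_iou((12.0, 20.0), (15.0, 25.0)))
```

A prediction that matches the reference window exactly scores 1.0, and one that does not overlap it at all scores 0.0.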

Figure 1: Example scenario of a car crash on a busy road from the Traffic Anomaly Reasoning (TAR) dataset.

The TAR dataset serves as the technical foundation for Track 3 of the 10th AI City Challenge, providing the data required to move Intelligent Transportation Systems (ITS) toward explainable AI. Figure 1 shows an example scenario from the TAR dataset. The dataset makes it possible to train and fine-tune large-scale Vision-Language Models (VLMs) or foundation models specifically for the nuances of traffic safety and anomaly analysis. It enables researchers to move beyond binary detection toward models that understand causal links, localize events temporally, and provide structured reports, capabilities that were previously difficult to develop or evaluate without a benchmark of this scale and depth. Ultimately, the TAR dataset bridges the gap between general AI research and the need for deployed models that can reason through and explain their decisions in complex urban environments.

Initial Leaderboard Standings: Cosmos 3 Performance {#leaderboard-analysis}

To establish a baseline for the community, we evaluated several state-of-the-art VLMs on the TAR dataset and posted the results to the Track 3 leaderboard. The initial leaderboard reveals that specialized reasoning architectures currently hold a significant advantage.

Figure 2: The 2026 AI City Challenge TAR Leaderboard.

Figure 2 shows the baseline submissions to the 2026 AI City Challenge Traffic Anomaly Reasoning Leaderboard. The Cosmos 3 model is the current front-runner with a mean score of 0.5484. It performs especially well on MCQ (multiple-choice question) and BCQ (binary-choice question) tasks. Notably, it outperforms the general-purpose baseline models on complex tasks such as Causal Linkage F1 and Summarization F1. This suggests that models trained specifically with spatial-temporal reasoning traces are better equipped for the intricacies of traffic safety analysis.

Acknowledgments: High-Performance Computing Support {#acknowledgments}

The computational demands of VLM fine-tuning and inference, processing 44,000+ annotations, and hosting global evaluation servers are immense. This project was made possible by the generous donation of three high-performance systems, including one 8-way node, to my lab by NVIDIA and Supermicro. These high-performance nodes are the backbone of our AI research, enabling the large-scale experimentation required for the AI City Challenge.

Frequently Asked Questions {#faq}

Q: How can I participate in the 2026 AI City Challenge?
A: Participation is open to all researchers and practitioners. You can start by navigating to the page of your target track on the AI City Challenge website, where you can learn about the track, how to obtain the relevant dataset(s), and the submission requirements.

Q: How is the Track 3 test set evaluated?
A: The test data is human-verified and redacted in the public release. Participants must submit their predictions to the evaluation server maintained by Santa Clara University to receive their leaderboard scores.

Q: Are the baseline models listed on the leaderboard publicly available?
A: Yes, many of the baseline models, including the Cosmos 3 and Qwen series, are available on platforms like Hugging Face for the research community to test and build upon.

Q: What makes the TAR dataset different from previous anomaly datasets?
A: Most datasets focus on binary classification (Anomaly vs. Normal). TAR spans 10 different tasks, all requiring explicit chain-of-thought reasoning traces that explain "how" and "why" an event is occurring.

Q: When is the final submission deadline, and where will the winners be announced?
A: All challenge track submissions must be finalized by July 10, 2026 (Anywhere on Earth). The official results and awards will be announced during the 10th AI City Challenge Workshop, tentatively scheduled for September 8 or 9, 2026.

About the Author

David C. Anastasiu is an Associate Professor at Santa Clara University and the Evaluation Chair for the AI City Challenge. A recipient of the NSF CAREER award for his work on explainable video anomaly anticipation, his research sits at the intersection of machine learning, high-performance computing, and smart city infrastructure. His lab at Santa Clara University leads the development of global evaluation frameworks for computer vision, focusing on moving AI from simple detection toward trustworthy, causal reasoning.

Jun 1, 2026
