Hi there, welcome to my homepage!

Hi! I am a junior (3rd-year) undergraduate student at Xidian University.

My research interests include multimodal learning, efficient AI, human-centric AI, and other emerging areas of AI. Methodologically, I prefer approaches that are simple, clear, and extensible. I enjoy exploring diverse research directions and collaborating with researchers across different fields.

Feel free to reach out if you are interested in collaboration or potential opportunities.

News

2026.05 🎉🎉 I started my internship at Alibaba Group.
2026.04 🎉🎉 Two papers accepted to ACL 2026.
2026.02 🎉🎉 One paper accepted to CVPR 2026.
2026.01 🎉🎉 I started my internship at TeleAI.
2025.08 🎉🎉 One paper accepted to PRCV 2025.
2025.07 🎉🎉 One paper accepted to ICCV 2025.
2024.10 🎉🎉 I joined the HPC-AI Lab at NUS as a Research Assistant.

Experience

Alibaba Group
2026.05 - Present
MLLM Engineer Intern advised by Liang Ding and Xintong Wang

Institute of Artificial Intelligence of China Telecom
2026.01 - 2026.03
Started my journey with LLMs

National University of Singapore
2024.10 - 2025.10
Research Assistant at HPC-AI Lab advised by Yang You, Wangbo Zhao and Pengfei Zhou

Xidian University
2023.09 - Present
Rank 5/99, B.E. at the School of Telecommunication Engineering; Research Assistant advised by Xiumei Wang

Publications

(* equal contribution · † corresponding author · ‡ project leader)

GroupToM-Bench: Benchmarking Group Theory of Mind and Nonlinear Social Emergence in MLLMs
Weidong Tang, Jierui Li, Yueling Hou, Zihan Mei, Zhigang Tian, Weicheng Jiao, Can Zhang, Xinyan Wan, Zhiyuan Liang, Pengfei Zhou†, Yang You, Wangbo Zhao†
We propose GroupToM-Bench and show that current models fail at nonlinear group reasoning despite strong individual-level ToM, exposing a clear group cognitive gap.
ACL 2026 [Poster] [arXiv] [code]

MagicBench: Diagnosing Visual Agency Loss and Semantic Dependency in Multimodal LLMs
Tang Da Huang, Weidong Tang‡, Wen Qi Xu, Xianpeng Guo†
We introduce MagicBench, a video dataset of magic tricks, to test whether multimodal LLMs actually rely on visual physics or depend too heavily on language narratives (which, in magic tricks, are deliberately deceptive).
ACL 2026 [arXiv] [code]

Efficient Video Object Segmentation and Tracking with Recurrent Dynamic Submodel
Weidong Tang, Zhiyuan Liang, Xinyan Wan, Chen Zhu, Zhaopan Xu, Pengfei Zhou, Yan Song, Yang You, Wangbo Zhao†
We propose a Recurrent Dynamic Submodel for efficient Video Object Segmentation and Tracking. By integrating temporal-prior-guided global dynamic routing and Importance-aware LoRA, it achieves a favorable trade-off between performance and speed while using minimal trainable parameters and training data.
CVPR 2026 [Poster] [arXiv] [code]

Inference-Time Scaling for Visual AutoRegressive modeling by Searching Representative Samples
Weidong Tang, Xinyan Wan, Xiumei Wang†
We explore inference-time scaling in discrete spaces by mapping them to continuous spaces to obtain density distributions, thereby optimizing the sampling of early coarse-scale features.
PRCV 2025 [Poster] [arXiv] [code]

Projects

WowPage
Weidong Tang, Yue Su
In collaboration with Yue Su, I refined and improved his original homepage template. A clean standalone template version is coming soon.
Project [code]

Weidong Tang 汤维栋
