👋 About Me

I’m currently a second-year PhD student at Tsinghua University Shenzhen International Graduate School, supervised by Prof. Yansong Tang and Prof. Jiwen Lu. I received my bachelor’s degree from the Department of Automation, Tsinghua University, in 2023.

My research interests lie in Computer Vision, including Video Generation and Video Understanding.

Email / GitHub


✨ News


  • 2024-07: One paper on Embodied Vision was accepted to ECCV 2024
  • 2024-03: One paper on Video Understanding (Narrative Action Evaluation) was accepted to CVPR 2024
  • 2023-03: One paper on Video Understanding (Action Quality Assessment) was accepted to CVPR 2023

🔬 Research


  * indicates equal contribution
FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios
Shiyi Zhang*, Junhao Zhuang*, Zhaoyang Zhang, Yansong Tang
Preprint

We achieve action transfer in heterogeneous scenarios, where the source and target differ in spatial structure or come from different domains.

KV-Edit: Training-Free Image Editing for Precise Background Preservation
Tianrui Zhu*, Shiyi Zhang*, Jiawei Shao, Yansong Tang
Preprint
[PDF] [Project Page]

We propose KV-Edit, a training-free approach that addresses background preservation in image editing by preserving the key-value pairs of the background, and that effectively handles common semantic editing tasks.

Narrative Action Evaluation with Prompt-Guided Multimodal Interaction
Shiyi Zhang*, Sule Bai*, Guangyi Chen, Lei Chen, Jiwen Lu, Junle Wang, Yansong Tang
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
[PDF] [Project Page]

We investigate a new problem called narrative action evaluation (NAE) and propose a prompt-guided multimodal interaction framework.

LOGO: A Long-Form Video Dataset for Group Action Quality Assessment
Shiyi Zhang, Wenxun Dai, Sujia Wang, Xiangwei Shen, Jiwen Lu, Jie Zhou, Yansong Tang
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[PDF] [Project Page]

LOGO is a new multi-person long-form video dataset for action quality assessment.

ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
Guanxing Lu, Shiyi Zhang, Ziwei Wang, Changliu Liu, Jiwen Lu, Yansong Tang
European Conference on Computer Vision (ECCV), 2024
[PDF] [Project Page]

We propose a dynamic Gaussian Splatting method named ManiGaussian for multi-task robotic manipulation, which mines scene dynamics via future scene reconstruction.