FaceSnap: Enhanced ID-Fidelity Network for Tuning-Free Portrait Customization

Benefiting from the significant advancements in text-to-image diffusion models, research in personalized image generation, particularly …

Benxiang Zhai(翟本祥), Yifang Xu(徐一舫), Guofeng Zhang, Yang Li(李杨), Sidan Du(都思丹)

HiFi-Portrait: Zero-shot Identity-preserved Portrait Generation with High-fidelity Multi-face Fusion

Recent advancements in diffusion-based technologies have made significant strides, particularly in identity-preserved portrait …

Yifang Xu(徐一舫), Benxiang Zhai(翟本祥), Yunzhuo Sun(孙运卓), Ming Li(李明), Yang Li(李杨), Sidan Du(都思丹)

ATHENA - Autonomous Vehicle Trajectory Planning Considered Human Action Awareness

Large language models have brought revolutionary changes to autonomous driving algorithms, ushering them into the era of multimodality. …

Jinghao Cao(曹靖豪), Sheng Liu(刘晟), Chaofan Wu(武超凡), Yang Li(李杨), Sidan Du(都思丹)

Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models

The target of video moment retrieval (VMR) is predicting temporal spans within a video that semantically match a given linguistic …

Yifang Xu(徐一舫), Yunzhuo Sun(孙运卓), Benxiang Zhai(翟本祥), Wenxin Liang, Yang Li(李杨), Sidan Du(都思丹)

HiFi-Portrait: Zero-shot Identity-preserved Portrait Generation with High-fidelity Multi-face Fusion

In this paper, we propose a global gaze following method using the patch-based multi-task multi-scale reborn network (MMRGaze360) …

Jingzhao Dai(戴京昭), Yang Li(李杨), Sidan Du(都思丹)

HP3: Tuning-Free Head-Preserving Portrait Personalization Via 3D-Controlled Diffusion Models

Portrait personalization (PP) has garnered considerable attention recently due to its potential applications. However, existing methods …

Yifang Xu(徐一舫), Chenyu Zhang, Benxiang Zhai(翟本祥), Sidan Du(都思丹)

An efficient action proposal processing approach for temporal action detection

Temporal action detection is a fundamental yet challenging task in video understanding. It is important to process the action proposals …

Xuejiao Hu(胡雪娇), Jingzhao Dai(戴京昭), Ming Li(李明), Yang Li(李杨), Sidan Du(都思丹)

CASSC: Context-aware method for depth guided semantic scene completion

Semantic scene completion is a crucial end-to-end 3D perception task, and the 3D information perception subjects is vital for …

Jinghao Cao(曹靖豪), Ming Li(李明), Sheng Liu(刘晟), Yang Li(李杨), Sidan Du(都思丹)

ARES: Text-Driven Automatic Realistic Simulator for Autonomous Traffic

The large-scale generation of real-world scenario datasets is a pivotal task in the field of autonomous driving. Existing methods …

Jinghao Cao(曹靖豪), Sheng Liu(刘晟), Xiong Yang(杨雄), Yang Li(李杨), Sidan Du(都思丹)

Modal Fusion and Query Refinement Network for Video Moment Retrieval and Highlight Detection

Given a video and a linguistic query, video moment retrieval and highlight detection (MR&HD) aim to locate all the relevant spans, …

Yifang Xu(徐一舫), Yunzhuo Sun, Benxiang Zhai(翟本祥), Zien Xie(谢子恩), Youyao Jia, Sidan Du(都思丹)