Publications

(2023). Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression. In ECCV 2024.

(2023). Context Diffusion: In-Context Aware Image Generation. In ECCV 2024.

(2023). GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation. In CVPR 2024.

(2023). Gen2Det: Generate to Detect. In CVPRW 2024.

(2023). Unaligned Video-Text Pre-training using Iterative Alignment. Under Review.

(2022). FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning. In EMNLP 2022.

(2022). CommerceMM: Large-Scale Commerce MultiModal Representation Learning with Omni Retrieval. In KDD 2022.

(2021). Large-Scale Attribute-Object Compositions. In arXiv:2105.11373.
