Selected Publication List (see full list in Google Scholar)

Research areas: Generative AI - Robust and Trustworthy Deep Learning - Video or Multimodal Understanding - Curriculum Learning


Generative AI


An Overview of Lu's Research on Image and Video Generation using Transformers.
graph LR;
BLT[BLT<br>ECCV'22] --> |image| MaskGIT[MaskGIT<br>CVPR'22]
MaskGIT[MaskGIT<br>CVPR'22] --> |text-to-image| MUSE[MUSE<br>ICML'23]
MaskGIT[MaskGIT<br>CVPR'22] --> |video| MAGVIT[MAGVIT<br>CVPR'23]
MUSE[MUSE<br>ICML'23] --> StyleDrop[StyleDrop<br>NeurIPS'23]
MAGVIT[MAGVIT<br>CVPR'23] --> MAGVITv2[MAGVIT-v2<br>ICLR'24]
MAGVIT[MAGVIT<br>CVPR'23] --> |+semantics| SPAE[SPAE<br>NeurIPS'23]
MAGVITv2[MAGVIT-v2<br>ICLR'24] --> |LLM| VideoPoet[VideoPoet]
MAGVITv2[MAGVIT-v2<br>ICLR'24] --> |Diffusion| WALT[WALT]
click BLT "https://arxiv.org/abs/2112.05112"
click MaskGIT "https://arxiv.org/abs/2202.04200"
click MUSE "https://muse-model.github.io/"
click MAGVIT "https://magvit.cs.cmu.edu/"
click MAGVITv2 "https://magvit.cs.cmu.edu/v2/"
click SPAE "https://arxiv.org/abs/2306.17842"
click VideoPoet "https://sites.research.google/videopoet/"
click StyleDrop "https://styledrop.github.io/"
click WALT "https://walt-video-diffusion.github.io/"
  • VideoPoet: A Large Language Model for Zero-Shot Video Generation
    Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Rachel Hornung, Hartwig Adam, Hassan Akbari, Yair Alon, Vighnesh Birodkar, Yong Cheng, Ming-Chang Chiu, Josh Dillon, Irfan Essa, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, David Ross, Grant Schindler, Mikhail Sirotenko, Kihyuk Sohn, Krishna Somandepalli, Huisheng Wang, Jimmy Yan, Ming-Hsuan Yang, Xuan Yang, Bryan Seybold, Lu Jiang
    preprint 2023 [pdf], [project page]

  • Photorealistic Video Generation with Diffusion Models
    Agrim Gupta, Lijun Yu, Kihyuk Sohn, Xiuye Gu, Meera Hahn, Li Fei-Fei, Irfan Essa, Lu Jiang, José Lezama
    preprint 2023 [pdf], [project page known as WALT]

  • Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
    Lijun Yu, José Lezama, Nitesh B. Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Agrim Gupta, Xiuye Gu, Alexander G. Hauptmann, Boqing Gong, Ming-Hsuan Yang, Irfan Essa, David A. Ross, Lu Jiang
    ICLR 2024 [pdf], [project page known as MAGVIT-v2]

  • SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs [Spotlight]
    Lijun Yu, Yong Cheng, Zhiruo Wang, Vivek Kumar, Wolfgang Macherey, Yanping Huang, David A. Ross, Irfan Essa, Yonatan Bisk, Ming-Hsuan Yang, Kevin Murphy, Alexander G. Hauptmann, Lu Jiang
    NeurIPS 2023 [pdf], [code (coming)]

  • StyleDrop: Text-to-Image Generation in Any Style
    Kihyuk Sohn, Nataniel Ruiz, Kimin Lee, Daniel Castro Chin, Irina Blok, Huiwen Chang, Jarred Barber, Lu Jiang, Glenn Entis, Yuanzhen Li, Yuan Hao, Irfan Essa, Michael Rubinstein, Dilip Krishnan
    NeurIPS 2023 [pdf], [project page]

  • MAGVIT: Masked Generative Video Transformer [Highlight]
    Lijun Yu, Yong Cheng, Kihyuk Sohn, José Lezama, Han Zhang, Huiwen Chang, Alexander G Hauptmann, Ming-Hsuan Yang, Yuan Hao, Irfan Essa, Lu Jiang
    CVPR 2023 [pdf], [code]

  • Muse: Text-To-Image Generation via Masked Generative Transformers
    Huiwen Chang, Han Zhang, Jarred Barber, AJ Maschinot, José Lezama, Lu Jiang, Ming-Hsuan Yang, Kevin Murphy, William T Freeman, Michael Rubinstein, Yuanzhen Li, Dilip Krishnan
    ICML 2023 [pdf], [project page]

  • MaskGIT: Masked Generative Image Transformer
    Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, William T. Freeman
    CVPR 2022 [pdf], [code]

  • Visual Prompt Tuning for Generative Transfer Learning
    Kihyuk Sohn, Huiwen Chang, José Lezama, Luisa Polania, Han Zhang, Yuan Hao, Irfan Essa, Lu Jiang
    CVPR 2023 [pdf], [code]

  • Auditing Gender Presentation Differences in Text-to-Image Models
    Yanzhe Zhang, Lu Jiang, Greg Turk, Diyi Yang
    arXiv preprint arXiv:2302.03675 [pdf], [code]

  • ViTGAN: Training GANs with Vision Transformers [Spotlight]
    Kwonjoon Lee, Huiwen Chang, Lu Jiang, Han Zhang, Zhuowen Tu, Ce Liu
    ICLR 2022 [pdf], [code]

  • Discrete Predictor-Corrector Diffusion Models for Image Synthesis
    José Lezama, Tim Salimans, Lu Jiang, Huiwen Chang, Jonathan Ho, Irfan Essa
    ICLR 2023 [pdf]

  • Improved Masked Image Generation with Token-Critic
    José Lezama, Huiwen Chang, Lu Jiang, Irfan Essa
    ECCV 2022 [pdf], [project page]

  • Regularizing Generative Adversarial Networks under Limited Data
    Hung-Yu Tseng, Lu Jiang, Ce Liu, Ming-Hsuan Yang, Weilong Yang
    CVPR 2021 [pdf], [code]

  • Text as Neural Operator: Image Manipulation by Text Instruction
    Tianhao Zhang, Hung-Yu Tseng, Lu Jiang, Weilong Yang, Honglak Lee, Irfan Essa
    ACM Multimedia 2021 [pdf], [code], [patent]

  • RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval
    Hung-Yu Tseng*, Hsin-Ying Lee*, Lu Jiang, Ming-Hsuan Yang, Weilong Yang
    ECCV 2020 (*equal contribution) [pdf]

  • BLT: Bidirectional Layout Transformer for Controllable Layout Generation
    Xiang Kong, Lu Jiang, Huiwen Chang, Han Zhang, Yuan Hao, Haifeng Gong, Irfan Essa
    ECCV 2022 [pdf], [code]

  • Neural Design Network: Graphic Layout Generation with Constraints [Spotlight]
    Hsin-Ying Lee, Lu Jiang, Irfan Essa, Phuong B Le, Haifeng Gong, Ming-Hsuan Yang, Weilong Yang
    ECCV 2020 [pdf]


Robust and Trustworthy Deep Learning


  • Pyramid Adversarial Training Improves ViT Performance [Oral, Best paper finalist]
    Charles Herrmann*, Kyle Sargent*, Lu Jiang, Ramin Zabih, Huiwen Chang, Ce Liu, Dilip Krishnan, Deqing Sun
    CVPR 2022 (*equal contribution) [pdf]

  • Robust Neural Machine Translation with Doubly Adversarial Inputs [Best paper candidate]
    Yong Cheng, Lu Jiang, Wolfgang Macherey
    ACL 2019 [pdf], [Google AI blog]

  • Beyond Synthetic Noise: Deep Learning on Controlled Noisy Labels
    Lu Jiang, Di Huang, Mason Liu, Weilong Yang
    ICML 2020 [pdf], [slides], [video], [Google AI blog], [dataset, tfds], [code]

  • MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels
    Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, Li Fei-Fei
    ICML 2018 [pdf], [supplementary materials], [code], [slides]

  • Confident Learning: Estimating Uncertainty in Dataset Labels
    Curtis G. Northcutt, Lu Jiang, Isaac L. Chuang
    Journal of Artificial Intelligence Research (JAIR) 2021 [pdf], [code(cleanlab)]

  • Regularizing Generative Adversarial Networks under Limited Data
    Hung-Yu Tseng, Lu Jiang, Ce Liu, Ming-Hsuan Yang, Weilong Yang
    CVPR 2021 [pdf], [code]

  • Faster Meta Update Strategy for Noise-Robust Deep Learning [Oral]
    Youjiang Xu, Linchao Zhu, Lu Jiang, Yi Yang
    CVPR 2021 [pdf], [code]

  • Discrete Representations Strengthen Vision Transformer Robustness
    Chengzhi Mao, Lu Jiang, Mostafa Dehghani, Carl Vondrick, Rahul Sukthankar, Irfan Essa
    ICLR 2022 [pdf], [code]

  • Contrastive Adaptation Network for Single- and Multi-Source Domain Adaptation
    Guoliang Kang, Lu Jiang, Yunchao Wei, Yi Yang, Alexander G. Hauptmann
    IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2021 [pdf], [code]


Video or Multimodal Understanding


  • Peeking into the Future: Predicting Future Person Activities and Locations in Videos
    Junwei Liang, Lu Jiang, Juan Carlos Niebles, Alexander Hauptmann, Li Fei-Fei
    CVPR 2019 [pdf], [code], [demo video]

  • The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction
    Junwei Liang, Lu Jiang, Kevin Murphy, Ting Yu, Alexander Hauptmann
    CVPR 2020 [pdf], [code], [blog post]

  • Composing Text and Image for Image Retrieval - An Empirical Odyssey [Oral]
    Nam Vo, Lu Jiang, Chen Sun, Kevin Murphy, Li-Jia Li, Li Fei-Fei, James Hays
    CVPR 2019 [pdf], [code]

  • Eidetic 3D LSTM: A Model for Video Prediction and Beyond
    Yunbo Wang, Lu Jiang, Ming-Hsuan Yang, Li-Jia Li, Mingsheng Long, Li Fei-Fei
    ICLR 2019 [pdf], [code]

  • Focal Visual-Text Attention for Memex Question Answering
    Junwei Liang, Lu Jiang, Liangliang Cao, Yannis Kalantidis, Li-Jia Li, Alexander Hauptmann
    IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2019 [pdf], [dataset], [code], [demo video]

  • Focal Visual-Text Attention for Visual Question Answering [Spotlight]
    Junwei Liang, Lu Jiang, Liangliang Cao, Li-Jia Li, Alexander Hauptmann
    CVPR 2018 [pdf], [project page]

  • Revisiting EmbodiedQA: A Simple Baseline and Beyond
    Yu Wu, Lu Jiang, Yi Yang
    IEEE Transactions on Image Processing 2020 [pdf]

  • Graph Distillation for Action Detection with Privileged Information
    Zelun Luo, Jun-Ting Hsieh, Lu Jiang, Juan Carlos Niebles, Li Fei-Fei
    ECCV 2018 [pdf], [project page]

  • Switchable Novel Object Captioner
    Yu Wu, Lu Jiang, Yi Yang
    TPAMI 2022 [pdf] and ACM MM 2018 [pdf]

  • Delving Deep into Personal Photo and Video Search
    Lu Jiang, Yannis Kalantidis, Liangliang Cao, Sachin Farfade, Jiliang Tang, Alex Hauptmann
    WSDM 2017 [pdf]

  • Revealing Event Saliency in Unconstrained Video Collection
    Dingwen Zhang, Junwei Han, Lu Jiang, Senmao Ye, Xiaojun Chang
    IEEE Transactions on Image Processing 26(4): 1746-1758 2017 [pdf]

  • Web-scale Multimedia Search for Internet Video Content
    Lu Jiang
    WWW 2016 [pdf], [full thesis]

  • Fast and Accurate Content-based Semantic Search in 100M Internet Videos
    Lu Jiang, Shoou-I Yu, Deyu Meng, Yi Yang, Teruko Mitamura, Alexander Hauptmann
    ACM MM 2015, [pdf], [slides], [project page]

  • Bridging the Ultimate Semantic Gap: A Semantic Search Engine for Internet Videos [Best paper candidate]
    Lu Jiang, Shoou-I Yu, Deyu Meng, Teruko Mitamura, Alexander Hauptmann
    ICMR 2015, [pdf], [slides], [project page]

  • Content-Based Video Search over 1 Million Videos with 1 Core in 1 Second
    Shoou-I Yu, Lu Jiang, Zhongwen Xu, Yi Yang, Alexander Hauptmann
    ICMR 2015, [pdf]

  • Zero-Example Event Search using MultiModal Pseudo Relevance Feedback
    Lu Jiang, Teruko Mitamura, Shoou-I Yu, Alexander Hauptmann
    ICMR 2014, [pdf], [slides]

  • Leveraging High-level and Low-level Features for Multimedia Event Detection
    Lu Jiang, Alexander Hauptmann, Guang Xiang
    ACM MM 2012, [pdf], [slides]

  • CMU-Informedia@ TRECVID 2014 [Top performer in the TRECVID multimedia event detection task 2014]
    Shoou-I Yu, Lu Jiang, Zhongwen Xu, Zhenzhong Lan, Shicheng Xu, Xiaojun Chang, Xuanchong Li, Zexi Mao, Chuang Gan, Yajie Miao, Xingzhong Du, Yang Cai, Lara Martin, Nikolas Wolfe, Anurag Kumar, Huan Li, Ming Lin, Zhigang Ma, Yi Yang, Deyu Meng, Shiguang Shan, Pinar Duygulu Sahin, Susanne Burger, Florian Metze, Rita Singh, Bhiksha Raj, Teruko Mitamura, Richard Stern, Alexander Hauptmann
    NIST TRECVID 2014 [pdf]

  • CMU-Informedia@ TRECVID 2013 [Top performer in the TRECVID multimedia event search task 2013]
    Zhen-Zhong Lan, Lu Jiang, Shoou-I Yu, Shourabh Rawat, Yang Cai, Chenqiang Gao, Shicheng Xu, Haoquan Shen, Xuanchong Li, Yipei Wang, Waito Sze, Yan Yan, Zhigang Ma, Nicolas Ballas, Deyu Meng, Wei Tong, Yi Yang, Susanne Burger, Florian Metze, Rita Singh, Bhiksha Raj, Richard Stern, Teruko Mitamura, Eric Nyberg, and Alexander Hauptmann
    NIST TRECVID 2013 [pdf]

  • Towards Efficient Learning of Optimal Spatial Bag-of-Words Representations [Best paper candidate]
    Lu Jiang, Wei Tong, Deyu Meng, Alexander Hauptmann
    ICMR 2014, [pdf], [slides]


Curriculum Learning


  • MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels
    Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, Li Fei-Fei
    ICML 2018 [pdf], [supplementary materials], [code], [slides]

  • Self-paced Curriculum Learning [Oral]
    Lu Jiang, Deyu Meng, Qian Zhao, Shiguang Shan, Alexander Hauptmann
    AAAI 2015, [pdf], [supplementary materials], [demo code]

  • Self-paced Learning with Diversity
    Lu Jiang, Deyu Meng, Shoou-I Yu, Zhenzhong Lan, Shiguang Shan, Alexander Hauptmann
    NIPS 2014 (NeurIPS 2014), [pdf], [supplementary materials], [project page]

  • Easy Samples First: Self-paced Reranking for Zero-Example Multimedia Search
    Lu Jiang, Deyu Meng, Teruko Mitamura, Alexander Hauptmann
    ACM MM 2014, [pdf], [slides]

  • Self-paced Learning for Matrix Factorization
    Qian Zhao, Deyu Meng, Lu Jiang, Qi Xie, Zongben Xu, Alexander Hauptmann
    AAAI 2015, [pdf], [supplementary materials]


Miscellaneous


  • Improvements to Speaker Adaptive Training of Deep Neural Networks [Best poster at SLT]
    Yajie Miao, Lu Jiang, Hao Zhang, Florian Metze
    SLT 2014, [pdf], [project page]

  • Mining Learning-Dependency between Knowledge Units from Text
    Jun Liu, Lu Jiang, Zhaohui Wu, Qinghua Zheng, Yanan Qian
    The VLDB Journal, 20(3): 335-345, 2011, [pdf]