I’m currently working on a Diffusion-based Large Avatar Model. I received my Ph.D. from Zhejiang University, advised by Prof. Zhou Zhao (赵洲). My academic experience includes 3D Talking Face Generation and Text-to-Speech (TTS). I have also worked extensively on Deep Reinforcement Learning (DRL) and Multi-Agent Systems (MAS). I have published 15+ papers in high-impact conferences and journals, including ICLR, IJCAI, ACL, and IEEE TMC.

📝 Publications

🦸 Digital Avatar

NeurIPS 2024

MimicTalk: Mimicking a personalized and expressive 3D talking face in few minutes
Zhenhui Ye, Tianyun Zhong, Yi Ren, Ziyue Jiang, Jiawei Huang, Rongjie Huang, Jinglin Liu, Chen Zhang, Zehan Wang, Xize Cheng, Xiang Yin, Zhou Zhao

NeurIPS 2024

Project Page

MimicTalk aims to train a high-quality personalized digital avatar in several minutes.
With the ICS-A2M model, we can mimic the talking style of the target identity in context.
By fine-tuning Real3D-Portrait (our previous large-scale talking face model), we achieve fast adaptation in several minutes.

ICLR 2024 Spotlight

Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis
Zhenhui Ye, Tianyun Zhong, Yi Ren, Jiaqi Yang, Weichuang Li, Jiawei Huang, Ziyue Jiang, Jinzheng He, Rongjie Huang, Jinglin Liu, Chen Zhang, Xiang Yin, Zejun Ma, Zhou Zhao

ICLR 2024 Spotlight

Project Page

Real3D-Portrait is the first one-shot NeRF-based talking face system with realistic head, torso, and background segments.
It supports both audio-driven and video-driven one-shot talking face generation.

arXiv

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation
Zhenhui Ye, Jinzheng He, Ziyue Jiang, Rongjie Huang, Jiawei Huang, Jinglin Liu, Yi Ren, Xiang Yin, Zejun Ma, Zhou Zhao

Under Review

Project Page

GeneFace++ is a modern talking face system that aims for generalized lip synchronization, good video quality, and high system efficiency.
It greatly improves the stability and efficiency of NeRF-based methods.

ICLR 2023

GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis
Zhenhui Ye, Ziyue Jiang, Yi Ren, Jinglin Liu, Jinzheng He, Zhou Zhao

ICLR 2023 Poster

Project Page

GeneFace is a NeRF-based talking face system that generalizes well to various out-of-domain (OOD) audio.
It first uses a generative model to learn the audio-to-motion mapping.

🎙 Speech Synthesis

ACL 2023

CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training
Zhenhui Ye, Rongjie Huang, Yi Ren, Ziyue Jiang, Jinglin Liu, Jinzheng He, Zhou Zhao

ACL 2023 Poster

Project Page

CLAPSpeech is the first cross-modal contrastive learning method that focuses on extracting prosody-related text representations for text-to-speech (TTS).
It provides a convenient plug-in text encoder applicable to all TTS models to improve prosody.

IJCAI 2022

SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech
Zhenhui Ye, Zhou Zhao, Yi Ren, Fei Wu

IJCAI 2022 Poster

Project Page

SyntaSpeech is the first syntax-aware non-autoregressive TTS acoustic model.
We design a syntactic graph encoder to extract syntax-related prosody from text.

Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias, Ziyue Jiang, Yi Ren, Zhenhui Ye, Jinglin Liu, Chen Zhang, Qian Yang, Shengpeng Ji, Rongjie Huang, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao, ICLR 2024

📚 Deep Reinforcement Learning

IEEE TMC 2022

Soft-DRGN: Multi-UAV Navigation for Partially Observable Communication Coverage by Graph Reinforcement Learning
Zhenhui Ye, Ke Wang, Yining Chen, Xiaohong Jiang, Guanghua Song

IEEE Transactions on Mobile Computing 2022

Project Page

We propose Soft-DRGN to learn robust stochastic policies for large-scale multi-agent cooperation.
We use a graph attention network to learn inter-agent communication.

Applied Intelligence 2022

Improving Sample Efficiency in Multi-Agent Actor-Critic Methods

Zhenhui Ye, Yining Chen, Xiaohong Jiang, Guanghua Song

Applied Intelligence 2022

We propose Experience Augmentation (EA) to improve sample efficiency in homogeneous MARL tasks.
We also propose a sample-efficient training pipeline called PEDMA.

Multi-agent Deep Reinforcement Learning for Voltage Control with Coordinated Active and Reactive Power Optimization, Daner Hu, Zhenhui Ye, Yuanqi Gao, Zuzhao Ye, Yonggang Peng, Nanpeng Yu, IEEE Transactions on Smart Grid 2022

🎖 Honors and Awards

2022.12 Runner-up in the China Graduate AI Innovation Competition (2/1217)
2022.10 Tencent Scholarship (as Ph.D. Student) (Top 1%)
2021.10 National Scholarship (as Master Student) (Top 1%)
2020.06 Outstanding Graduate of Zhejiang University (as Undergraduate Student) (Top 5%)

📖 Education

2021.09 - 2025.06, Ph.D. student, College of Computer Science and Technology, Zhejiang University, Hangzhou.
2020.06 - 2021.09, Master student, School of Aerospace and Astronautics, Zhejiang University, Hangzhou.
2016.09 - 2020.06, Undergraduate, School of Aerospace and Astronautics, Zhejiang University, Hangzhou.

Academic Services

Conference Reviewer: ICLR 2023, EMNLP 2023, NeurIPS 2023, ACL 2024, ICLR 2024, CVPR 2024

Zhenhui Ye (叶振辉)