I am currently a Master's student at Stanford University studying Computer Science.
I finished my Bachelor's degree with a double major in Computer Science and Mathematics at the Hong Kong University of Science and Technology, with a GPA of 4.012/4.3. I also spent an exchange semester at the Georgia Institute of Technology, with a GPA of 4.0/4.0. My resume can be downloaded here.
My research interests include Computer Vision, Deep Learning, Statistical Machine Learning, and Artificial Intelligence.
My strengths are in the following fields:
• Optical Flow, Scene Flow
• Medical Imaging
• Human Pose Estimation, Human Pose/Motion Generation
• Generative Models, Video Generation
Haoye Cai*, Chunyan Bai*, Yu-Wing Tai, and Chi-Keung Tang
European Conference on Computer Vision (ECCV), 2018
We propose a two-stage generative model that handles human action video generation, prediction, and completion in a single unified framework. Our method generates better videos than existing state-of-the-art methods, both qualitatively and quantitatively.
Paper available at: Here
Links: Project Page | Video Result Demo (Highlight!)
- In submission, first author
We propose a novel framework for cardiac motion flow estimation that uses motion correspondence from another modality, DENSE, as supervision to learn cardiac motion flow in ordinary SSFP MRI images. Our method outperforms existing state-of-the-art optical flow algorithms applied to this medical imaging domain.
Links: Project Page | Video Demo (Highlight!) | Slides
This is our solution for the CodeIT Suisse 2016 hackathon, which we won as a team of five. In this project, we built a high-frequency arbitrage trading solution for several stock markets, using a master-slave architecture of our own design to improve concurrency.
Skills used: Node.js cluster, Redis Queue, AMI, Firebase, D3.js, etc.
Links: Project Page | CodeIT Suisse
In this project, we built a fully functional micro operating system, JOS, with an extended paging system. We implemented paging to disk so that virtual memory could exceed RAM. Furthermore, we proposed a novel paging heuristic to improve the performance of the paging system, and explored the influence of the process scheduling policy on paging.
Skills used: C, x86 Assembly.
Links: Project Page
This is the final project for COMP3111H Honors Software Engineering. In this project, we built an easy-to-use, good-looking, full-featured website for team formation. After registering and logging in, a user can view, create, or join a team and invite team members. All information can be viewed and all operations performed within our interface.
We also developed a complete user system with a set of access rules.
Skills used: AngularJS, Ionic (for iOS app development), Bootstrap, AMI, Firebase, Karma, etc.
Links: Project Page
I built a text recognition pipeline using CRNN and an attention model. I also built an end-to-end text detection-recognition pipeline that combines the two tasks in one model. In this pipeline, I implemented a feature transformation that lets the recognition network reuse features obtained by the detection network. We achieved state-of-the-art text recognition accuracy.
I completed a summer internship in Algorithm Research on the Depth and Reconstruction Team, where I studied 3D human pose estimation from monocular images. I first reproduced prior work from ICCV 2017 that uses fully-connected neural nets to learn 2D-to-3D pose regression. I then proposed and implemented two potential improvements. First, I built a DenseNet to extract features from raw images and concatenated those features with the 2D poses in a multi-stage fashion to compensate for the ambiguity in 2D space. Second, I reframed the problem as dimensionality reduction followed by reconstruction, and so tried PCA space instead of 2D pose space. The results achieved are state-of-the-art. All frameworks were built in multi-GPU mode and deployed on clusters.