Haoye CAI

Master of Science in Computer Science, Stanford University | hcaiaa@stanford.edu

I am currently a Master's student in Computer Science at Stanford University. I received my Bachelor's degree with a double major in Computer Science and Mathematics from the Hong Kong University of Science and Technology (HKUST), with a GPA of 4.012/4.3. I also spent an exchange semester at the Georgia Institute of Technology, with a GPA of 4.0/4.0. My resume can be downloaded here.

My research interests include Computer Vision, Deep Learning, Statistical Machine Learning, and Artificial Intelligence.
My strengths are in the following fields:
  • Optical Flow, Scene Flow
  • Medical Imaging
  • Human Pose Estimation, Human Pose/Motion Generation
  • Generative Models, Video Generation


Deep Video Generation, Prediction and Completion of Human Action Sequences

Haoye Cai*, Chunyan Bai*, Yu-Wing Tai, and Chi-Keung Tang
European Conference on Computer Vision (ECCV), 2018

Computer Vision, Video Generation, Generative Models

We propose a two-stage generative model that addresses human action video generation, prediction, and completion in a unified framework. Our method generates better videos than existing state-of-the-art methods, both qualitatively and quantitatively.

Paper available here.
Links: Project Page | Video Result Demo (Highlight!)

June -- November 2017

Cross-modality Training to Learn Cardiac Motion Flow for SSFP MRI Images

Computer Vision, Medical Imaging, Optical Flow

- In the process of submission; first author.
We propose a novel framework for cardiac motion flow estimation that uses motion correspondences from another MRI modality, DENSE, as supervision to learn cardiac motion flow in ordinary SSFP MRI images. Our method outperforms existing state-of-the-art optical flow algorithms applied to this medical imaging domain.

Links: Project Page | Video Demo (Highlight!) | Slides

January -- May 2017


CodeIT Suisse 2016

Web Development, Backend Development, Fintech

This is our solution for the CodeIT Suisse 2016 hackathon, where our team of five won first place. In this project, we built a high-frequency arbitrage trading solution for several stock markets, using a master-slave architecture of our own design to enhance concurrency.
Skills used: Node.js cluster, Redis queue, AMI, Firebase, D3.js, etc.
Links: Project Page | CodeIT Suisse

October 2016
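The entry above mentions a master-slave architecture for concurrency without giving details. As a rough illustration of the general pattern only, the sketch below has a master enqueue batches of quotes while worker threads scan each batch for a cross-market spread (buy on one market at its ask, sell on another at its bid); a `None` sentinel shuts each worker down. The `Quote` type, `find_arbitrage`, `run_master`, and the spread rule are hypothetical; the actual project used Node.js cluster and Redis, not Python threads.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    symbol: str
    market: str
    bid: float  # best price a buyer on this market will pay
    ask: float  # best price a seller on this market will accept


def find_arbitrage(quotes, min_profit):
    """Best (symbol, buy_market, sell_market, profit) across markets, or None."""
    best = None
    for buy in quotes:
        for sell in quotes:
            if buy.market == sell.market:
                continue
            profit = sell.bid - buy.ask  # buy at the ask here, sell at the bid there
            if profit > min_profit and (best is None or profit > best[3]):
                best = (buy.symbol, buy.market, sell.market, profit)
    return best


def worker(tasks, results, min_profit):
    """Slave loop: pull batches until the master sends the None sentinel."""
    while True:
        batch = tasks.get()
        try:
            if batch is None:
                return
            opportunity = find_arbitrage(batch, min_profit)
            if opportunity is not None:
                results.put(opportunity)
        finally:
            tasks.task_done()


def run_master(quote_batches, num_workers=4, min_profit=0.0):
    """Master: start workers, dispatch batches, stop workers, collect results."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, min_profit))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for batch in quote_batches:
        tasks.put(batch)
    for _ in threads:
        tasks.put(None)  # one shutdown sentinel per worker
    for t in threads:
        t.join()
    found = []
    while not results.empty():
        found.append(results.get())
    return sorted(found)
```

Python threads share the GIL, so this shows the structure (a queue decoupling the master from its workers) rather than CPU parallelism; Node.js cluster gets parallelism by forking separate worker processes.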

JOS with extended paging system

Operating System, Kernel Design

In this project, we built a fully functional micro operating system, JOS, with an extended paging system. We implemented paging to disk so that virtual memory could exceed physical RAM. Furthermore, we proposed a novel paging heuristic to improve the performance of the paging system, and explored the influence of the process scheduling policy on the paging system.
Skills used: C, x86 Assembly.
Links: Project Page

April 2017
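The description does not specify the novel paging heuristic, so the sketch below does not attempt to reproduce it. Instead, it simulates the classic clock (second-chance) page-replacement policy, a standard baseline for choosing which resident page to evict to disk when physical frames run out. It is a user-space Python simulation with a hypothetical function name, not JOS kernel code (which is written in C).

```python
def clock_page_faults(references, num_frames):
    """Simulate clock (second-chance) replacement; return the number of page faults."""
    frames = [None] * num_frames      # page held by each physical frame
    ref_bit = [False] * num_frames    # "recently used" bit per frame
    where = {}                        # resident page -> frame index
    hand = 0
    faults = 0
    for page in references:
        if page in where:
            ref_bit[where[page]] = True  # hit: give the page a second chance
            continue
        faults += 1
        # Sweep the hand, clearing reference bits, until a frame with bit 0 is found.
        while ref_bit[hand]:
            ref_bit[hand] = False
            hand = (hand + 1) % num_frames
        victim = frames[hand]
        if victim is not None:
            del where[victim]  # a kernel would write the page to disk here if dirty
        frames[hand] = page
        where[page] = hand
        ref_bit[hand] = True
        hand = (hand + 1) % num_frames
    return faults
```

Counting faults for a fixed reference string is one simple way to compare a new heuristic against such a baseline.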

Team-Forming Website

Web Development, Backend/Frontend Development

This is the final project for COMP3111H Honors Software Engineering. In this project, we built an easy-to-use, good-looking, fully functional team-forming website. After registering and logging in, a user can view, create, or join a team and invite team members. All information can be viewed and all operations performed within our interface. We also developed a complete user system with a set of access rules.
Skills used: AngularJS, Ionic (for iOS app development), Bootstrap, AMI, Firebase, Karma, etc.
Links: Project Page

November 2016


Tencent YouTu Lab

Text Detection and Recognition

I built a text recognition pipeline using CRNN and an attention model. I also built an end-to-end text detection-recognition pipeline that combines the two tasks in one model. In this pipeline, I implemented a feature transformation that lets the recognition network reuse features computed by the detection network. We achieved state-of-the-art text recognition accuracy.

December 2017 - February 2018
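CRNN, as originally proposed, transcribes its per-frame predictions with Connectionist Temporal Classification (CTC). As an illustration of that transcription step only (not of the internship pipeline itself), the sketch below implements greedy CTC decoding: take the best class per frame, merge consecutive repeats, then drop blanks. Treating class 0 as the blank and the `alphabet` string layout are assumptions.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Greedy (best-path) CTC decoding.

    frame_scores: one list of class scores per time step.
    alphabet: character for each class index (the blank's entry is never emitted).
    """
    # Best class at each time step.
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    decoded = []
    prev = None
    for label in best_path:
        # Emit only when the label changes and is not the blank;
        # a blank between two equal labels lets a real repeat ("cc") survive.
        if label != prev and label != blank:
            decoded.append(alphabet[label])
        prev = label
    return "".join(decoded)
```

For example, with `alphabet = "-cat"`, the best path `[1, 1, 0, 2, 2, 3]` decodes to `"cat"`, while `[1, 0, 1]` decodes to `"cc"`.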

SenseTime Group Limited, Hong Kong

3D Human Pose Estimation for Monocular Images

I completed a summer internship in Algorithm Research with the Depth and Reconstruction Team, studying 3D human pose estimation from monocular images. I first reproduced prior work from ICCV 2017 that uses fully connected neural networks to learn 2D-to-3D pose regression. I then proposed and implemented two potential improvements. First, I built a DenseNet to extract features from raw images and concatenated these features with the 2D poses in a multi-stage fashion to compensate for the ambiguity of the 2D space. Second, I reframed the problem as dimensionality reduction followed by reconstruction, and therefore worked in PCA space instead of 2D pose space. The results are state-of-the-art. All frameworks were built in multi-GPU mode and deployed on clusters.

June 2017 - August 2017
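The PCA idea above is described only at a high level. The sketch below assumes each pose is flattened into a vector of joint coordinates and shows how such a vector is projected onto the top-k principal components and reconstructed from them. It uses pure-Python power iteration with deflation only to stay dependency-free; a real pipeline would more likely call a linear-algebra library's SVD. All function names are hypothetical.

```python
import math
import random


def mean_vector(rows):
    """Per-dimension mean of equal-length pose vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows, mu):
    """Population covariance matrix of the centered pose vectors."""
    d, n = len(mu), len(rows)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        c = [x - m for x, m in zip(row, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / n
    return cov


def matvec(a, v):
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def top_components(cov, k, iters=200, seed=0):
    """Top-k eigenvectors of a symmetric PSD matrix via power iteration plus deflation."""
    rng = random.Random(seed)
    a = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(len(a))]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = matvec(a, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining variance is numerically zero
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(a, v)))  # Rayleigh quotient
        comps.append(v)
        # Deflate: subtract lam * v v^T so the next pass finds the next component.
        for i in range(len(a)):
            for j in range(len(a)):
                a[i][j] -= lam * v[i] * v[j]
    return comps


def project(pose, mu, comps):
    """Coordinates of a pose vector in PCA space."""
    centered = [x - m for x, m in zip(pose, mu)]
    return [sum(c * x for c, x in zip(comp, centered)) for comp in comps]


def reconstruct(coeffs, mu, comps):
    """Map PCA coordinates back to the original pose space."""
    out = list(mu)
    for coef, comp in zip(coeffs, comps):
        out = [o + coef * c for o, c in zip(out, comp)]
    return out
```

Keeping only k components makes the low-dimensional PCA coordinates the quantity to estimate, with `reconstruct` mapping an estimate back to a full pose vector.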


  • First Place in CodeIT Suisse Coding Challenge, Credit Suisse
  • Hong Kong University of Science and Technology Academic Achievement Medal
  • Dean's List (for each semester), HKUST
  • The Hong Kong Electric Co. Ltd. Scholarship, HKUST
  • The Cheng Foundation Scholarship for Chinese Mainland Undergraduate Students
  • University's Scholarship Scheme for Continuing Undergraduate Students, HKUST
  • HKSAR Government Scholarship Fund - Reaching Out Award, HKUST
  • Second Prize in National Olympiad in Informatics in Provinces, CCF