Ph. D. Student of
University of Southern California
I am a first-year Ph.D. student at the University of Southern California, under the supervision of Professor Hao Li. Previously, I was in the Machine Learning Group in the School of Software, Tsinghua University, under the supervision of Professor Mingsheng Long. I spent 12 wonderful weeks as a research assistant in Prof. Kilian Q. Weinberger's lab at Cornell University during the summer of 2017, working with postdoc Gao Huang. I am currently focusing on unsupervised 3D object reconstruction through rendering. I am also interested in transfer learning and neural architecture design.
Here is my CV
Abstract: Rendering is the process of generating 2D images from 3D assets, simulated in a virtual environment, typically with a graphics pipeline. By inverting such a renderer, one can think of a learning approach to predict a 3D shape from an input image. However, standard rendering pipelines involve a fundamental discretization step called rasterization, which prevents the rendering process from being differentiable and hence from being suitable for learning. We present the first non-parametric and truly differentiable rasterizer based on silhouettes. Our method enables unsupervised learning for high-quality 3D mesh reconstruction from a single image. We call our framework "soft rasterizer" as it provides an accurate soft approximation of the standard rasterizer. The key idea is to fuse the probabilistic contributions of all mesh triangles with respect to the rendered pixels. When combined with a mesh generator in a deep neural network, our soft rasterizer is able to generate an approximated silhouette of the generated polygon mesh in the forward pass. The rendering loss is back-propagated to supervise the mesh generation without the need for 3D training data. Experimental results demonstrate that our approach significantly outperforms the state-of-the-art unsupervised techniques, both quantitatively and qualitatively. We also show that our soft rasterizer can achieve results comparable to the cutting-edge supervised learning method, and in various cases even better ones, especially for real-world data.
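As a rough illustration of the "fuse the probabilistic contributions of all mesh triangles" idea, here is a minimal sketch for a single pixel. It is not the paper's implementation: the sign convention for distances, the `sigma` sharpness parameter, and the probabilistic-OR fusion rule are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def triangle_probability(signed_distance, sigma=0.01):
    """Soft coverage probability of one triangle at one pixel.

    Assumption: signed_distance is positive when the pixel lies inside
    the projected triangle and negative outside. A smaller sigma gives a
    sharper falloff, approaching a hard rasterizer.
    """
    return sigmoid(signed_distance / sigma)


def soft_silhouette(signed_distances, sigma=0.01):
    """Fuse per-triangle probabilities into one silhouette value.

    Assumption: a probabilistic OR, i.e. the pixel is covered unless
    every triangle fails to cover it.
    """
    not_covered = 1.0
    for d in signed_distances:
        not_covered *= 1.0 - triangle_probability(d, sigma)
    return 1.0 - not_covered
```

Because every step is a smooth function of the distances, the silhouette value varies continuously as triangles move, which is what lets a rendering loss pass gradients back to the mesh in a setup like the one the abstract describes.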
Abstract: A technical challenge of deep learning is recognizing target classes without seen data. Zero-shot learning leverages semantic representations such as attributes or class prototypes to bridge source and target classes. Existing standard zero-shot learning methods may be prone to overfitting the seen data of source classes as they are blind to the semantic representations of target classes. In this paper, we study generalized zero-shot learning, which assumes that the target classes of unseen data are accessible during training, and in which prediction on unseen data is made by searching over both source and target classes. We propose a novel Deep Calibration Network (DCN) approach towards this generalized zero-shot learning paradigm, which enables simultaneous calibration of deep networks on the confidence of source classes and the uncertainty of target classes. Our approach maps visual features of images and semantic representations of class prototypes to a common embedding space such that the compatibility of seen data to both source and target classes is maximized. We show superior accuracy of our approach over the state of the art on benchmark datasets for generalized zero-shot learning, including AwA, CUB, SUN, and aPY.
Abstract: The high accuracy of convolutional networks (CNNs) in visual recognition tasks, such as image classification, has fueled the desire to deploy these networks on platforms with limited computational resources, e.g., in robotics, self-driving cars, and on mobile devices. Unfortunately, the most accurate deep CNNs, such as the winners of the ImageNet and COCO challenges, were designed without taking strict compute restrictions into consideration. As a result, these models cannot be used to perform real-time inference on low-compute devices.
Abstract: Compact coding has been widely applied to approximate nearest neighbor search for large-scale image retrieval, due to its computational efficiency and retrieval quality. This paper presents a compact coding solution with a focus on the deep learning to quantization approach, which improves retrieval quality by end-to-end representation learning and compact encoding and has already shown superior performance over hashing solutions for similarity retrieval. We propose Deep Visual-Semantic Quantization (DVSQ), which is the first approach to learning deep quantization models from labeled image data as well as the semantic information underlying general text domains. The main contribution lies in jointly learning deep visual-semantic embeddings and visual-semantic quantizers using carefully designed hybrid networks and well-specified loss functions. DVSQ enables efficient and effective image retrieval by supporting maximum inner-product search, which is computed based on learned codebooks with fast distance table lookup. Comprehensive empirical evidence shows that DVSQ can generate compact binary codes and yield state-of-the-art similarity retrieval performance on standard benchmarks.
Abstract: Cross-modal similarity retrieval is the problem of designing a retrieval system that supports querying across content modalities, e.g., using an image to retrieve texts. This paper presents a compact coding solution for efficient cross-modal retrieval, with a focus on the quantization approach, which has already shown superior performance over hashing solutions in single-modal similarity retrieval. We propose a collective deep quantization (CDQ) approach, which is the first attempt to introduce quantization into an end-to-end deep architecture for cross-modal retrieval. The major contribution lies in jointly learning deep representations and the quantizers for both modalities using carefully crafted hybrid networks and well-specified loss functions. In addition, our approach simultaneously learns the common quantizer codebook for both modalities, through which the cross-modal correlation can be substantially enhanced. CDQ enables efficient and effective cross-modal retrieval using inner product distance computed based on the common codebook with fast distance table lookup. Extensive experiments show that CDQ yields state-of-the-art cross-modal retrieval results on standard benchmarks.
- Lisp interpreter and compiler in Haskell [Compiler] (Functional Programming)
- C to LLVM/Python compiler [Compiler] (Principle of Compiler, best project)
- FTP Server [Server] (Computer Networking, best project)
- Carstructor [Game] (Web Front-end Technology)
- Cellular Automata (Software Engineering)
- 3rd prize in 19th Intelligent body contest
- Science and technology Scholarship, 2015
- Science and technology Scholarship, 2016
- Artistic Scholarship, 2016
- Qualcomm Scholarship, 2016
- Sensetime Scholarship, 2017
- Best project in Web Front-end Technology course, Aug. 2016
- Best project in Computer Networking course, Jan. 2017
- From Caffe to TensorFlow
- How to implement a progress bar in Python
- Guide to installing AUCTeX in Emacs
- My understanding of Web front-end programming