I hold a Ph.D. in Computer Science from Columbia University, where I specialized in audio-visual deep learning. I build and study systems that work in real time, scale to deployment, and create meaningful user experiences. My work lies at the intersection of audio, vision, and human–AI interaction, advancing both research and applications.
My focus is on closing the theory-to-product gap: connecting algorithm design and deep models with practical concerns of data, latency, deployment, and usability. I believe in a pluralistic approach to system design, integrating classical signal processing, modern learning, and human-in-the-loop interaction, so that intelligent systems are not only effective in theory but also responsive to human needs and creativity.
My Research Interests
Deep Learning, Speech Enhancement, Audio-Visual Learning, Multimodal Representation, Cross-Modal Generation, Real-Time Systems, Human–AI Interaction
Education
Doctor of Philosophy in Computer Science
2018 - 2024
Columbia University, New York, NY
Master of Science in Computer Science
2015 - 2017
Columbia University, New York, NY
Bachelor of Science in Computer Science
2010 - 2014
University of Illinois at Urbana-Champaign, Champaign, IL
Industry / Research Experience
Ph.D. Researcher
2023 - 2025
Snap Inc., New York, NY
Columbia University, New York, NY
- Designed and developed DanceCraft, a real-time, music-reactive 3D dance improv system that trades scripted choreography for spontaneous, engaging improvisation in response to live audio.
- Built a hybrid pipeline: music descriptors (tempo/energy/beat) → graph-based selection of motion segments → state-of-the-art motion in-betweening network for seamless transitions and realism (the selection step is sketched after this list).
- Curated an 8+ hour 3D dance dataset, spanning diverse genres, tempi, and energy levels, enriched with idle behaviors and facial expressions to enhance expressiveness.
- Shipped production features for interactivity & personalization: users (or DJs) drive the dance with any live music; Bitmoji avatar support (used by 250M+ Snapchat users) for personal embodiment.
- Deployed at Snap as a production-ready service, adaptable from kiosks to large-scale online events; showcased at Billie Eilish events (120M+ followers).
- Evaluated through user studies, demonstrating engaging and immersive experiences.
- Presented at ACM MOCO 2024.
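To illustrate the descriptor-driven selection step, here is a minimal Python sketch: it extracts coarse tempo/energy descriptors with librosa and picks the next motion segment from a small transition graph. The segment names, graph, and cost function are hypothetical placeholders rather than the production DanceCraft system, and the motion in-betweening network that smooths transitions is omitted.

```python
# Minimal sketch of the "music descriptors -> graph-based segment selection"
# idea; all segments, features, and weights below are illustrative only.
import librosa
import numpy as np

def music_descriptors(audio_path: str) -> dict:
    """Extract coarse tempo/energy/beat descriptors from a music clip."""
    y, sr = librosa.load(audio_path, sr=None, mono=True)
    tempo, beats = librosa.beat.beat_track(y=y, sr=sr)
    energy = float(np.mean(librosa.feature.rms(y=y)))
    return {"tempo": float(tempo), "energy": energy, "n_beats": len(beats)}

# Hypothetical motion graph: each segment stores descriptors it suits,
# and edges list segments that can plausibly follow it.
MOTION_GRAPH = {
    "idle_sway": {"tempo": 80,  "energy": 0.02, "next": ["groove_a", "idle_sway"]},
    "groove_a":  {"tempo": 110, "energy": 0.05, "next": ["groove_b", "spin"]},
    "groove_b":  {"tempo": 120, "energy": 0.07, "next": ["spin", "groove_a"]},
    "spin":      {"tempo": 128, "energy": 0.10, "next": ["groove_b", "idle_sway"]},
}

def pick_next_segment(current: str, desc: dict) -> str:
    """Choose the outgoing edge whose segment best matches the live descriptors."""
    def cost(name: str) -> float:
        seg = MOTION_GRAPH[name]
        return (abs(seg["tempo"] - desc["tempo"]) / 10.0
                + abs(seg["energy"] - desc["energy"]) * 100.0)
    return min(MOTION_GRAPH[current]["next"], key=cost)
```

In the full system, the chosen segments would then be stitched together by the in-betweening network rather than played back directly.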
Ph.D. Researcher
2021 - 2023
Snap Inc., New York, NY
Columbia University, New York, NY
- Proposed an environment- & speaker-specific dereverberation method with a one-time personalization step (measuring a representative room impulse response and having the user read aloud while moving around for a short duration).
- Designed a two-stage pipeline (classical Wiener filtering → neural refinement) for robust dereverberation while preserving high-frequency detail (the two-stage structure is sketched after this list).
- Outperformed classical and learned baselines on PESQ/STOI/SRMR; user studies showed strong preference for our results.
- Integrated components into Snap's internal audio enhancement pipeline for immersive/AR and creative tools.
- Presented at INTERSPEECH 2023.
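A minimal sketch of the two-stage structure referenced above, assuming a pre-computed late-reverberation power estimate (e.g., derived from a measured RIR): a Wiener-style spectral gain suppresses reverberant energy, and a tiny stand-in network refines the result. The gain rule, network, and constants are illustrative placeholders, not the INTERSPEECH 2023 system.

```python
# Sketch: classical spectral filtering followed by neural refinement.
import torch
import torch.nn as nn

N_FFT, HOP = 512, 128

def wiener_like_stage(wet: torch.Tensor, reverb_psd: torch.Tensor) -> torch.Tensor:
    """Suppress late reverberation with a Wiener-style spectral gain.

    wet:        (time,) reverberant waveform
    reverb_psd: (freq,) assumed estimate of late-reverb power per frequency bin
    """
    window = torch.hann_window(N_FFT)
    spec = torch.stft(wet, N_FFT, HOP, window=window, return_complex=True)  # (freq, frames)
    power = spec.abs() ** 2
    gain = torch.clamp((power - reverb_psd.unsqueeze(1)) / (power + 1e-8),
                       min=0.1, max=1.0)
    return torch.istft(spec * gain, N_FFT, HOP, window=window, length=wet.shape[-1])

class RefineNet(nn.Module):
    """Tiny stand-in for the neural refinement stage (waveform in, waveform out)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, 9, padding=4), nn.ReLU(),
            nn.Conv1d(16, 1, 9, padding=4),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x.unsqueeze(0).unsqueeze(0)).squeeze()

def dereverberate(wet: torch.Tensor, reverb_psd: torch.Tensor, refiner: RefineNet) -> torch.Tensor:
    coarse = wiener_like_stage(wet, reverb_psd)   # stage 1: classical filtering
    return refiner(coarse)                        # stage 2: learned refinement
```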
Ph.D. Researcher
2019 - 2022
SoftBank Group Corp., Tokyo, Japan
Columbia University, New York, NY
- Conducted research on generative denoising and inpainting of everyday soundscapes, reconstructing missing or obscured audio to restore ambient context and temporal continuity.
- Developed a deep generative model with a signal-processing front-end, capable of inferring plausible background textures and transients from partial or noisy inputs; designed and implemented dataset curation, training, and evaluation pipelines.
- Achieved state-of-the-art naturalness and continuity over baselines (objective metrics + perceptual studies), with results showcased in public audio demos and project documentation.
- Outcomes informed subsequent multimodal alignment work (e.g., music–motion synchronization in DanceCraft).
- Presented at NeurIPS 2020.
- Follow-up: Extended the approach to real-time/streaming denoising, building a low-latency pipeline suitable for interactive use; presented at ICASSP 2022.
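The low-latency follow-up relies on chunked, windowed inference; the sketch below shows the generic idea with a fixed rolling context window. The chunk/context sizes and pass-through "denoiser" are placeholders, and the dynamic window scheduling of the ICASSP 2022 paper is not reproduced here.

```python
# Sketch of streaming denoising with a sliding context window.
import numpy as np

CHUNK = 1024      # new samples consumed per step (controls latency)
CONTEXT = 4096    # total window fed to the model, including past samples

def denoise_window(window: np.ndarray) -> np.ndarray:
    """Placeholder for a neural denoiser; here it simply passes audio through."""
    return window

def stream_denoise(chunks):
    """Yield denoised chunks as audio arrives, keeping a rolling context."""
    history = np.zeros(CONTEXT - CHUNK, dtype=np.float32)
    for chunk in chunks:                       # each chunk: (CHUNK,) float32
        window = np.concatenate([history, chunk])
        out = denoise_window(window)
        yield out[-CHUNK:]                     # emit only the newest samples
        history = window[CHUNK:]               # slide the window forward
```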
Ph.D. Researcher
2017 - 2019
Columbia University, New York, NY
- Proposed a planar-mirror "light trap" combined with pulsed time-of-flight (ToF) and first-return measurements to induce multiple ray bounces, mitigate multipath, and enable single-scan, surround 3D capture of geometrically complex shapes (two of the geometric ingredients are sketched after this list).
- Conducted extensive simulations and theoretical analysis, showing that light rays can reach 99.9% of surface area after a few bounces; pyramid trap configurations achieved 99% coverage across diverse objects with ~3 reflections.
- Implemented a fully-working hardware prototype (pulsed ToF + planar mirrors) with bespoke calibration and reconstruction, producing sharper edges and more accurate depth recovery in challenging scenes.
- Presented at CVPR 2018.
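Two small geometric ingredients of this kind of setup, sketched under simplifying assumptions: reflecting a point across a planar mirror (the standard way to "unfold" a multi-bounce path into a straight line) and converting a pulsed-ToF first-return time into a one-way path length. Mirror placement, calibration, and the actual reconstruction pipeline are not shown.

```python
# Sketch of mirror "unfolding" and first-return path length; illustrative only.
import numpy as np

C = 2.998e8  # speed of light, m/s

def reflect_point(p, plane_point, normal):
    """Mirror a 3D point across a planar mirror given a point on the plane and
    the plane normal; repeated reflection unfolds a multi-bounce ray path."""
    p, plane_point, normal = map(np.asarray, (p, plane_point, normal))
    n = normal / np.linalg.norm(normal)
    return p - 2.0 * np.dot(p - plane_point, n) * n

def path_length_from_first_return(tof_seconds: float) -> float:
    """Round-trip first-return time -> one-way optical path length; mirror
    bounces add path length but do not change this relation."""
    return C * tof_seconds / 2.0
```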
Student Researcher
2016 - 2016
Columbia University, New York, NY
- Developed anchor frame detection algorithms (C++/OpenCV) leveraging facial recognition, color histograms, and a novel adaptive background-based method to improve efficiency and accuracy.
- Processed Chinese video metadata using Python (JieBa, TextBlob) to generate keyword tags with TF-IDF–based weighting; automated reporting with PrettyTable (a tagging sketch follows this list).
- Presented at ICALIP 2016.
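A minimal sketch of the tagging and reporting step referenced above: jieba's built-in TF-IDF extractor produces weighted keyword tags from Chinese text, and PrettyTable renders a plain-text summary. The metadata fields are made up for illustration, and the anchor-frame detection itself (C++/OpenCV) is not shown.

```python
# Sketch of TF-IDF keyword tagging for video metadata plus a text report.
import jieba.analyse
from prettytable import PrettyTable

def tag_metadata(title: str, description: str, top_k: int = 5):
    """Return TF-IDF weighted keyword tags for one video's text metadata."""
    text = f"{title} {description}"
    # jieba.analyse.extract_tags uses a built-in IDF table for TF-IDF weighting.
    return jieba.analyse.extract_tags(text, topK=top_k, withWeight=True)

def report(videos):
    """Summarize tags per video in a plain-text table.

    videos: iterable of (video_id, title, description) tuples (hypothetical schema).
    """
    table = PrettyTable(["video_id", "tags"])
    for vid, title, desc in videos:
        tags = ", ".join(f"{word}:{weight:.2f}" for word, weight in tag_metadata(title, desc))
        table.add_row([vid, tags])
    print(table)
```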
Earlier Industry Experience
Software Engineer
2014 - 2015
Foxit Software, Fremont, CA
Business Analyst Co-op
2013 - 2013
Monsanto Company, St. Louis, MO
Software Engineer Intern
2012 - 2012
Foxit Software, Fremont, CA
Programming Languages
Python, JavaScript, C#, Java, C/C++, HTML/CSS, MATLAB, LaTeX, Docker
Machine Learning & Deep Learning
PyTorch, TensorFlow, Keras, torchaudio, librosa, SciPy, scikit-learn, Matplotlib, pandas, OpenCV
Harmonizing Audio and Human Interaction: Enhancement, Analysis, and Application of Audio Signals via Machine Learning Approaches
Ph.D. Dissertation
- Advisor: Shree K. Nayar
- Published at: Columbia University, 2024.
- Link: ProQuest Open Access
DanceCraft: A Music-Reactive Real-time Dance Improv System
Conference Paper
- Authors: Ruilin Xu, Vu An Tran, Shree K. Nayar, and Gurunandan Krishnan
- Published at: In Proceedings of the 9th International Conference on Movement and Computing (MOCO 2024).
- Link: ACM Digital Library
Neural-network-based approach for speech denoising
US Patent
- Authors: Changxi Zheng, Ruilin Xu, Rundi Wu, Carl Vondrick, and Yuko Ishiwaka
- Patent Info: US Patent 11894012, 2024.
- Link: Google Patents
Personalized Dereverberation of Speech
Conference Paper
- Authors: Ruilin Xu, Gurunandan Krishnan, Changxi Zheng, and Shree K. Nayar
- Published at: In Proceedings of the 24th Annual Conference of the International Speech Communication Association (INTERSPEECH 2023).
- Link: ISCA Archive
Dynamic Sliding Window for Realtime Denoising Networks
Conference Paper
- Authors: Jinxu Xiang, Yuyang Zhu, Rundi Wu, Ruilin Xu, Changxi Zheng
- Published at: In Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2022).
- Link: IEEE Xplore
Listening to Sounds of Silence for Speech Denoising
Conference Paper
- Authors: Ruilin Xu, Rundi Wu, Yuko Ishiwaka, Carl Vondrick, and Changxi Zheng
- Published at: In Proceedings of the 34th International Conference on Neural Information Processing Systems (NeurIPS 2020).
- Link: ACM Digital Library
Trapping Light for Time of Flight
Conference Paper
- Authors: Ruilin Xu, Mohit Gupta, and Shree K. Nayar
- Published at: In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2018).
- Link: IEEE Xplore
News event understanding by mining latent factors from multimodal tensors
Conference Paper
- Authors: Chun-Yu Tsai, Ruilin Xu, Robert E Colgan, and John R Kender
- Published at: In Proceedings of the 2016 ACM workshop on Vision and Language Integration Meets Multimedia Fusion (iV&L-MM 2016).
- Link: ACM Digital Library
An adaptive anchor frame detection algorithm based on background detection for news video analysis
Conference Paper
- Authors: Ruilin Xu, Chun-Yu Tsai, and John R Kender
- Published at: In Proceedings of the 2016 International Conference on Audio, Language and Image Processing (ICALIP 2016).
- Link: IEEE Xplore