I am a PhD student in Stanford University's Computer Science Department, advised by Professor Jiajun Wu. Previously, I was a graduate research assistant in Carnegie Mellon's Robotics Institute, co-advised by Professors Chris Atkeson and Oliver Kroemer. Recently, my research has focused on learning for manipulation, especially how robots can use sound while manipulating objects.
We record thousands of room impulse responses and music clips in a variety of real rooms, with humans standing at different positions in each room. Learning-based models can exploit the resulting minute differences in room acoustics to track, identify, or detect humans in the room. Our data can be used to develop more robust and sample-efficient methods, with applications in home assistants, security, and robotics.
We introduce a framework for accomplishing long-horizon soft-body manipulation tasks and show that it can learn to make dumplings with a variety of tools, using very little training data per tool. We also show that the framework can learn to use tools for other soft-body manipulation tasks, such as shaping dough into target shapes, autonomously selecting a tool for each step of the task.
We collect 150,000 annotated recordings of impact sounds from 50 everyday objects, captured at 600 distinct microphone locations. We show how our data can be used to tune and validate acoustic simulations, or applied directly to downstream audio and audiovisual tasks.
Differentiable physics-based models provide a useful inductive bias for learning from impact sounds, letting us solve both forward and inverse problems on impact audio. We show that we can infer model parameters from recordings in the wild and then use the inferred models to perform source separation better than generic learning-based alternatives.
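To make the idea concrete, here is a minimal, hypothetical sketch of a differentiable impact-sound model: the sound is rendered as a sum of exponentially decaying sinusoidal modes, and the mode parameters are fit to a recording by gradient descent on a spectrogram loss. The mode count, parameterization, and loss below are illustrative assumptions, not the implementation from the paper.

```python
# Minimal sketch (assumptions throughout): fit a modal impact-sound model
# to a recording by gradient descent through a differentiable synthesizer.
import math
import torch
import torch.nn.functional as F

SR = 16000          # sample rate in Hz (assumed)
DUR = 0.5           # clip length in seconds
N_MODES = 16        # number of resonant modes to fit (illustrative)

t = torch.arange(int(SR * DUR)) / SR

# Learnable modal parameters: amplitude, damping, and frequency per mode.
amps  = torch.randn(N_MODES, requires_grad=True)
damps = torch.randn(N_MODES, requires_grad=True)
freqs = torch.randn(N_MODES, requires_grad=True)

def synthesize():
    """Render the impact as a sum of exponentially decaying sinusoids."""
    decay = torch.exp(-100.0 * F.softplus(damps)[:, None] * t[None, :])
    hz = 100.0 + 4000.0 * torch.sigmoid(freqs)   # keep modes in an audible range
    carrier = torch.sin(2 * math.pi * hz[:, None] * t[None, :])
    return (amps[:, None] * decay * carrier).sum(dim=0)

def spectrogram(x):
    """Magnitude spectrogram, a more forgiving domain for matching audio."""
    return torch.stft(x, n_fft=512, hop_length=128, return_complex=True).abs()

# Placeholder target: in practice this would be a real impact recording.
target_spec = spectrogram(torch.randn(int(SR * DUR)))

# Fit the modal parameters to the recording by gradient descent.
opt = torch.optim.Adam([amps, damps, freqs], lr=1e-2)
for step in range(2000):
    opt.zero_grad()
    loss = F.l1_loss(spectrogram(synthesize()), target_spec)
    loss.backward()
    opt.step()
```

Once fit, the per-mode parameters act as a physically interpretable representation of the sound, which is what makes inverse tasks like source separation tractable.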
Deep learning-based data-driven models can both predict the effects of a scooping operation on a granular material from vision and learn to use audio as feedback for scooping and pouring.
Deep learning-based models can accurately predict the amount of granular material a robot pours or shakes based only on audio recordings. With machine learning, recordings from a $3 microphone can outperform the measurement resolution of a $3,000 wrist-mounted force-torque sensor.
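As a hypothetical illustration of this kind of pipeline (not the actual models from the paper), one could summarize each pouring recording with coarse spectral-band energies and fit a simple regressor to predict the poured mass; the features, regressor, and placeholder data below are all assumptions.

```python
# Minimal sketch (assumed pipeline): regress poured mass from audio features.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

SR = 16000        # sample rate in Hz (assumed)
N_BANDS = 32      # number of frequency bands used as features (illustrative)

def band_energies(audio):
    """Log energy in N_BANDS evenly spaced frequency bands of one recording."""
    spectrum = np.abs(np.fft.rfft(audio)) ** 2
    bands = np.array_split(spectrum, N_BANDS)
    return np.log1p(np.array([b.sum() for b in bands]))

# Placeholder dataset: each entry is one pour, labeled with the poured mass in grams.
recordings = [np.random.randn(SR * 2) for _ in range(200)]   # stand-in for real audio
masses = np.random.uniform(10, 100, size=200)                # stand-in for scale readings

X = np.stack([band_energies(a) for a in recordings])
X_train, X_test, y_train, y_test = train_test_split(X, masses, test_size=0.2)

model = Ridge(alpha=1.0).fit(X_train, y_train)
print("mean absolute error (g):", np.mean(np.abs(model.predict(X_test) - y_test)))
```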
We took the Matrix Cube, a tool for visualizing time-evolving graphs, and developed a new 3D interface controlled by hand gestures. Gestures were captured with a Leap Motion device.
With data collected from a user's brief interactions with common UI elements, such as checkboxes and sliders, machine learning models can uniquely identify the user. Such a system could authenticate a mobile device's user continuously and seamlessly, without many of the vulnerabilities common to traditional authentication methods.
This project was the precursor to my paper on predicting the effects of scooping. I attempted to learn a scooping policy with deep reinforcement learning, posing the task of scooping a target mass as a contextual bandit problem, and adapted popular techniques such as Actor-Critic and the Cross-Entropy Method. What I learned from these experiments was very helpful in refining my approach for the later publication.
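For illustration, here is a minimal sketch of the contextual bandit formulation solved with the Cross-Entropy Method: the target mass acts as the context, a single scoop is the one-step action, and CEM iteratively refits a Gaussian over scoop parameters to the highest-reward samples. The action parameterization, reward, and stand-in dynamics below are hypothetical; in practice the reward would come from executing scoops in simulation or on the robot, or from a learned reward model.

```python
# Minimal sketch (assumptions: action = scoop depth/angle, context = target mass,
# reward/dynamics are placeholders) of scooping as a contextual bandit solved with CEM.
import numpy as np

ACTION_DIM = 2            # e.g. scoop depth and scoop angle (illustrative)
POP, ELITE, ITERS = 64, 8, 20

def execute_scoop(action, target_mass):
    """Placeholder for running one scoop and returning a reward,
    e.g. negative error between scooped and target mass."""
    scooped = 50.0 * action[0] + 10.0 * action[1]      # stand-in dynamics
    return -abs(scooped - target_mass)

def cem_scoop(target_mass):
    """Cross-Entropy Method: refit a Gaussian over actions to the elite
    (highest-reward) samples for this particular context (target mass)."""
    mean, std = np.zeros(ACTION_DIM), np.ones(ACTION_DIM)
    for _ in range(ITERS):
        actions = np.random.normal(mean, std, size=(POP, ACTION_DIM))
        rewards = np.array([execute_scoop(a, target_mass) for a in actions])
        elites = actions[np.argsort(rewards)[-ELITE:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mean

print("best scoop parameters for a 30 g target:", cem_scoop(target_mass=30.0))
```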
We designed a mask that walks an untrained rescuer, in real time, through performing standard-of-care CPR on a cardiac arrest victim. The mask is equipped with sensors that monitor the state of the victim and the quality of the CPR, and it uses a speaker and LEDs to give instructions and cues to the rescuer.
GPA: 4.17/4.33
GPA: 4.0/4.0