Projects

PhD Projects

Audiovisual Singing Voice Separation

Separating a song into vocal and accompaniment components is an active research topic, and recent years have seen substantial performance gains from supervised deep learning methods. We propose to leverage visual information corresponding to the singers' vocal activity to further improve the quality of the separated vocal signals.
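
To illustrate the general idea (a minimal sketch with assumed layer sizes and an assumed BLSTM encoder, not the project's actual architecture), one can condition a mask-based separator on per-frame visual features of the singer:

```python
# Minimal sketch: fuse per-frame visual features of the singer with the
# mixture spectrogram to predict a soft vocal mask. All dimensions are
# illustrative assumptions.
import torch
import torch.nn as nn

class AVVocalSeparator(nn.Module):
    def __init__(self, n_freq=513, visual_dim=512, hidden=256):
        super().__init__()
        self.audio_enc = nn.LSTM(n_freq, hidden, batch_first=True,
                                 bidirectional=True)
        self.visual_proj = nn.Linear(visual_dim, hidden)
        self.mask_head = nn.Sequential(
            nn.Linear(3 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_freq), nn.Sigmoid())  # soft mask in [0, 1]

    def forward(self, mix_spec, visual_feats):
        # mix_spec: (batch, time, n_freq) mixture magnitude spectrogram
        # visual_feats: (batch, time, visual_dim), resampled to the audio frame rate
        a, _ = self.audio_enc(mix_spec)
        v = self.visual_proj(visual_feats)
        mask = self.mask_head(torch.cat([a, v], dim=-1))
        return mask * mix_spec  # estimated vocal magnitude spectrogram
```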




URSing Dataset

We introduce a dataset to facilitate audio-visual analysis of singing performances. The dataset comprises a number of singing performances as audio and video recordings. Each song includes an isolated track of the solo singing voice and its mixture with the accompaniment track. We anticipate that the dataset will be useful for multi-modal analysis of singing performances, such as audiovisual singing voice separation, and will serve as ground truth for evaluation.
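
Because each song ships with an isolated voice track, the dataset supports standard separation metrics directly. A hedged sketch using mir_eval's BSS-Eval (random signals stand in for real tracks here):

```python
# Evaluate a separation result against the isolated voice track with BSS-Eval.
# Random signals stand in for the real ground-truth and estimated tracks.
import numpy as np
import mir_eval

sr = 16000
rng = np.random.default_rng(0)
ref_vocal = rng.standard_normal(3 * sr)                    # isolated solo-voice track
est_vocal = ref_vocal + 0.1 * rng.standard_normal(3 * sr)  # a system's estimate
sdr, sir, sar, _ = mir_eval.separation.bss_eval_sources(
    ref_vocal[np.newaxis, :], est_vocal[np.newaxis, :])
print(f"SDR: {sdr[0]:.2f} dB")
```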



Query by Video: Cross-modal Music Retrieval

We present results on recommending music from a large catalog for a given unconstrained video, based on deep emotion representations learned from the two modalities.
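
At retrieval time, the idea reduces to a nearest-neighbor search in the shared emotion space. A minimal sketch, assuming both encoders already produce fixed-length embeddings:

```python
# Rank catalog tracks by cosine similarity to the query video's embedding.
# The embeddings themselves would come from the two learned encoders.
import numpy as np

def recommend(video_emb: np.ndarray, music_embs: np.ndarray, k: int = 5):
    v = video_emb / np.linalg.norm(video_emb)
    m = music_embs / np.linalg.norm(music_embs, axis=1, keepdims=True)
    scores = m @ v                       # cosine similarity per catalog track
    return np.argsort(scores)[::-1][:k]  # indices of the k best matches

# Toy usage with random embeddings standing in for learned ones:
rng = np.random.default_rng(0)
print(recommend(rng.standard_normal(128), rng.standard_normal((1000, 128))))
```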





Skeleton Plays Piano: Generating Pianist Movement from MIDI Data

We train a model that takes MIDI data as input and outputs a visual performance in the form of expressive body movements of a pianist. It can be used for demonstrations for music learners, immersive music enjoyment systems, or human-computer interaction in automatic accompaniment systems. We show demo videos of the generated visual performances.
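
A minimal sketch of the mapping (the GRU choice and layer sizes are assumptions, not the trained model): a recurrent network converts a frame-level piano-roll into 2-D joint coordinates per frame:

```python
# Map a MIDI piano-roll sequence to per-frame 2-D upper-body joint positions.
import torch
import torch.nn as nn

class MidiToSkeleton(nn.Module):
    def __init__(self, n_pitches=88, hidden=128, n_joints=15):
        super().__init__()
        self.rnn = nn.GRU(n_pitches, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, n_joints * 2)  # (x, y) for each joint

    def forward(self, piano_roll):
        # piano_roll: (batch, time, 88) velocity-scaled note activations
        h, _ = self.rnn(piano_roll)
        b, t, _ = piano_roll.shape
        return self.out(h).view(b, t, -1, 2)  # (batch, time, n_joints, 2)
```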



Audio-visual Analysis of Music Performance

We propose to leverage visual information captured from music performance videos to advance several music information retrieval (MIR) tasks, such as source association, multi-pitch analysis, and vibrato analysis.
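
As a toy illustration of the source-association idea (this only shows the principle, not the actual method), one can match each player's visual motion to each audio source by correlation:

```python
# Associate players with sources: correlate each player's motion-magnitude
# curve with each source's framewise activity and take the best match.
import numpy as np

def associate(motion: np.ndarray, activity: np.ndarray) -> np.ndarray:
    # motion: (n_players, time); activity: (n_sources, time), same frame rate
    n = len(motion)
    corr = np.corrcoef(np.vstack([motion, activity]))[:n, n:]
    return corr.argmax(axis=1)  # source index assigned to each player
```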





URMP Dataset

We create an audio-visual, multi-track, multi-instrument music performance dataset comprising a number of chamber music pieces assembled from coordinated but separately recorded performances of the individual tracks. With ground-truth pitch/note annotations and clean individual audio tracks available, the dataset can be used for multi-modal analysis of music performance.
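
For example, the per-track pitch annotations can serve directly as ground truth for multi-pitch evaluation. A hedged sketch using mir_eval, with synthetic frames standing in for real annotations:

```python
# Score a multi-pitch estimate against ground-truth frame-level F0 sets.
import numpy as np
import mir_eval

ref_time = np.arange(0.0, 2.0, 0.01)                     # annotation frame times (s)
ref_freqs = [np.array([440.0, 261.6])] * len(ref_time)   # active F0s per frame
est_time = ref_time.copy()                               # in practice: system output
est_freqs = [np.array([442.0, 262.0])] * len(est_time)
scores = mir_eval.multipitch.evaluate(ref_time, ref_freqs, est_time, est_freqs)
print(scores["Accuracy"])
```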



Score Following For Expressive Piano Performance

We address the "sustained effect" in piano performance, caused by use of the sustain pedal or legato articulation. Due to this effect, energy from sustained notes mixes with that of the following notes (which is not notated in the score), often causing delay errors in score-following systems. We propose to modify the audio feature representation to reduce the sustained effect and enhance the robustness of score following.
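
One simple feature along these lines (an illustrative choice, not necessarily the representation used in the project) is a half-wave rectified spectral difference, which suppresses slowly decaying sustained energy and emphasizes new note onsets:

```python
# Half-wave rectified spectral flux: decaying (sustained) bins contribute
# nothing, so the feature is dominated by newly struck notes.
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex("trumpet"))   # example clip stands in for piano audio
S = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))
flux = np.maximum(S[:, 1:] - S[:, :-1], 0.0)  # keep only spectral increases
```

A feature like this can then replace the raw magnitude spectrogram in the alignment front end.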



Automatic Lyrics Display For A Live Chorus Performance

Live musical performances (e.g., choruses, concerts, and operas) often require the display of lyrics for the convenience of the audience. We propose a computational system that automates this real-time lyrics display process using signal processing techniques.
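
The underlying alignment can be illustrated offline (a real system would run an online variant): align the live audio against a reference rendition with chroma-based DTW, then index the lyric line by the matched reference time. A sketch, with a time-stretched example clip standing in for the live performance:

```python
# Align a "live" performance to a reference via chroma DTW; the warping path
# maps each live frame to a reference frame, which indexes the lyric line.
import librosa

y, sr = librosa.load(librosa.ex("choice"))        # stand-in reference rendition
live = librosa.effects.time_stretch(y, rate=0.9)  # stand-in slower live take
C_ref = librosa.feature.chroma_cqt(y=y, sr=sr)
C_live = librosa.feature.chroma_cqt(y=live, sr=sr)
D, wp = librosa.sequence.dtw(X=C_ref, Y=C_live)   # wp: (ref_frame, live_frame) pairs
```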