Skeleton Plays Piano: Online Generation of Pianist Body Movements from MIDI Performance

Bochen Li, Akira Maezawa, and Zhiyao Duan

This project is in collaboration with the Yamaha Corporation. This project is partially supported by the National Science Foundation under grant No. 1741472, titled "BIGDATA: F: Audio-Visual Scene Understanding".
Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

 

Publication

  • Bochen Li, Akira Maezawa, and Zhiyao Duan, Skeleton plays piano: online generation of pianist body movements from MIDI performance, in Proc. International Society for Music Information Retrieval (ISMIR), 2018. <pdf> <slides>

  • Akira Maezawa and Bochen Li, Information processing method, U.S. Patent 16/983,341, November 2020.


    What is the problem?

    We aim to train a system that generates a virtual pianist animation with expressive performance motions, given symbolic music in MIDI format.

    • Input: a live data stream of key depression actions and the corresponding metric structure (optional)
    • Output: a time sequence of body joint coordinates
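
    As a rough illustration of the input and output listed above, the sketch below tracks key-down and key-up events and emits one input frame per time step. The names KeyEvent, InputFrame, KeyboardState, Pose and beat_phase are hypothetical, and the 88-key layout, velocity scaling and 2-D joint coordinates are assumptions for illustration, not details taken from the project.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

NUM_KEYS = 88            # assumption: a standard 88-key piano
LOWEST_MIDI_PITCH = 21   # MIDI note number of A0, the lowest piano key


@dataclass
class KeyEvent:
    """One key action arriving on the live stream (hypothetical layout)."""
    pitch: int      # MIDI note number
    velocity: int   # MIDI velocity, 0..127
    down: bool      # True for key depression, False for release


@dataclass
class InputFrame:
    """What the generator would see at one time step."""
    keys: List[float]                   # per-key intensity in [0, 1]
    beat_phase: Optional[float] = None  # optional metric information


# Output for one time step: one (x, y) pair per body joint (assumed 2-D).
Pose = List[Tuple[float, float]]


class KeyboardState:
    """Keeps track of which keys are currently held down."""

    def __init__(self) -> None:
        self.held = [0.0] * NUM_KEYS

    def apply(self, event: KeyEvent) -> None:
        key = event.pitch - LOWEST_MIDI_PITCH
        if 0 <= key < NUM_KEYS:  # ignore pitches outside the keyboard
            self.held[key] = event.velocity / 127.0 if event.down else 0.0

    def frame(self, beat_phase: Optional[float] = None) -> InputFrame:
        # Copy the state so later events do not alter frames already emitted.
        return InputFrame(keys=list(self.held), beat_phase=beat_phase)
```

    Because the state is updated event by event, a frame can be produced at any moment without knowing how long a note will be held, which is what an online setting requires.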

    Motivation

    • Generating expressive body movement is important for music interactions
    • Most existing frameworks cannot incorporate music context information for whole-body expressive movement generation

    Applications

    • Demonstration for music learners by replicating a musician's body interpretations of music
    • A more immersive music listening experience
    • Visual interactions in automatic computer accompaniment


    What is our approach?

    We first use two CNN structures to parse the raw input of the MIDI note stream and the metric structure, and then feed the extracted feature representations to an LSTM network to generate the body movements, as a sequence of upper-body joint coordinates forming a skeleton.
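
    The data flow described above can be sketched in dependency-free Python as follows. This is only a toy illustration, not the published model: the class names, kernel sizes, feature sizes, joint count and the 4-dimensional metric vector are all hypothetical, the weights are random and untrained, biases are omitted, and the convolution runs over the key axis of a single frame purely for brevity.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def conv1d_relu(x: Vector, kernel: Vector) -> Vector:
    """Valid 1-D cross-correlation followed by ReLU."""
    k = len(kernel)
    return [max(0.0, sum(x[i + j] * kernel[j] for j in range(k)))
            for i in range(len(x) - k + 1)]


def max_pool(x: Vector, size: int) -> Vector:
    """Non-overlapping max pooling."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


class LSTMCell:
    """Minimal LSTM cell without biases; keeps its state between steps."""

    def __init__(self, input_size: int, hidden_size: int, rng: random.Random):
        n = input_size + hidden_size
        # One weight matrix per gate: input, forget, candidate, output.
        self.w = [[[rng.uniform(-0.1, 0.1) for _ in range(n)]
                   for _ in range(hidden_size)] for _ in range(4)]
        self.h = [0.0] * hidden_size
        self.c = [0.0] * hidden_size

    def step(self, x: Vector) -> Vector:
        z = x + self.h  # concatenate the input with the previous hidden state
        i, f, g, o = [[sum(w * v for w, v in zip(row, z)) for row in gate]
                      for gate in self.w]
        self.c = [sigmoid(fj) * cj + sigmoid(ij) * math.tanh(gj)
                  for ij, fj, gj, cj in zip(i, f, g, self.c)]
        self.h = [sigmoid(oj) * math.tanh(cj) for oj, cj in zip(o, self.c)]
        return self.h


class SkeletonGenerator:
    """Two small conv encoders feeding one LSTM, then a linear read-out."""

    def __init__(self, num_keys=88, metric_dim=4, hidden=16, num_joints=8, seed=0):
        rng = random.Random(seed)
        self.note_kernel = [rng.uniform(-1, 1) for _ in range(5)]
        self.metric_kernel = [rng.uniform(-1, 1) for _ in range(2)]
        note_feat = (num_keys - 5 + 1) // 4  # conv output length, pooled by 4
        metric_feat = metric_dim - 2 + 1     # conv output length
        self.lstm = LSTMCell(note_feat + metric_feat, hidden, rng)
        self.out = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)]
                    for _ in range(2 * num_joints)]
        self.num_joints = num_joints

    def step(self, keys: Vector, metric: Vector) -> List[Tuple[float, float]]:
        note_feat = max_pool(conv1d_relu(keys, self.note_kernel), 4)
        metric_feat = conv1d_relu(metric, self.metric_kernel)
        h = self.lstm.step(note_feat + metric_feat)
        coords = [sum(w * v for w, v in zip(row, h)) for row in self.out]
        return [(coords[2 * j], coords[2 * j + 1]) for j in range(self.num_joints)]


# One online step: a single held key plus a one-hot metric vector.
gen = SkeletonGenerator()
keys = [0.0] * 88
keys[39] = 1.0
pose = gen.step(keys, [1.0, 0.0, 0.0, 0.0])
```

    Calling step once per incoming frame keeps the recurrent state inside the generator, so each pose depends only on past input, in keeping with the online setting.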



    Our Results

    Subjective Evaluations

    We conduct subjective evaluations in which participants rate the expressiveness and naturalness of the generated skeleton movements against those extracted from real human players. Specifically, we recruit 18 subjects from Yamaha Corporation to watch 32 ten-second video excerpts of "skeleton plays piano": 16 generated and 16 real. The ratings are plotted in the following figure, where tracks with significantly different ratings are marked with "*".



    Demo Videos

    The generated skeleton movements for all 16 tracks, shown alongside those of real human players, are listed here:




    Visit the YouTube playlist for the above 16 demo videos <here>

    Visit the YouTube playlist for demo videos without the comparison with real human players <here>