Real-Time Score Following for Computer Accompaniment
Project Goals
- Real-time score follower for computer accompaniment
- The score consists of note events and markup
- Note events are generated from a MIDI score
- Robust against performer variations/improvisations/mistakes
- adaptive, predictive beat tracking
- Score position (measure, beat) is sent with a timestamp in pseudo-real time via OSC
- ultra low latency/offset
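A minimal sketch of how a position update could be packed as an OSC 1.0 message and sent over UDP, using only the Python standard library. The address "/score/position", the argument layout (int32 measure, float32 beat, int32 timestamp in milliseconds), and the function names are assumptions for illustration; the project does not specify a message format.

```python
import socket
import struct


def osc_string(text: str) -> bytes:
    """OSC string: ASCII, null-terminated, zero-padded to a multiple of 4 bytes."""
    raw = text.encode("ascii") + b"\x00"
    return raw + b"\x00" * (-len(raw) % 4)


def encode_position(measure: int, beat: float, time_ms: int,
                    address: str = "/score/position") -> bytes:
    """Build an OSC message carrying (measure, beat, timestamp in ms).

    The address and argument layout are hypothetical, chosen for this sketch.
    """
    # Type tags: i = int32, f = float32, i = int32; arguments are big-endian.
    return (osc_string(address) + osc_string(",ifi")
            + struct.pack(">ifi", measure, beat, time_ms))


def send_position(sock: socket.socket, host: str, port: int,
                  measure: int, beat: float, time_ms: int) -> None:
    """Send one position update as a single UDP datagram."""
    sock.sendto(encode_position(measure, beat, time_ms), (host, port))
```

An accompanist process would open a UDP socket with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) and call send_position once per reported beat; the host and port are deployment choices.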
Assumptions
Structural
- The score consists of sections
- Sections have a nominal order, but the performer may depart from it (see Unforeseeable Events)
Unforeseeable Events
- Repeat Section
- Skip Section
- note skip/add/change
- temporal compression/expansion of melody
- tempo variation
Methods
Markup
- Section beginning/end
- time signature
Feature Extraction (Observation)
- Based on Arshia Cont's thorough analysis [1], a low-level HMM is effective at modeling individual note events
- Features used are Log of Energy, Spectral Balance, and Peak Structure Match [5]
- States are attack, sustain, and rest
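The note-level model above can be sketched as a three-state, left-to-right HMM filtered frame by frame with the forward algorithm. The transition probabilities below are made-up placeholders, and the per-frame likelihoods are assumed to come from an observation model over the features in [5] that is not shown here; this is an illustration, not the model from [1] or [5].

```python
STATES = ("attack", "sustain", "rest")

# Hypothetical left-to-right transition probabilities (each row sums to 1).
TRANS = {
    "attack":  {"attack": 0.5, "sustain": 0.5, "rest": 0.0},
    "sustain": {"attack": 0.0, "sustain": 0.9, "rest": 0.1},
    "rest":    {"attack": 0.0, "sustain": 0.0, "rest": 1.0},
}


def forward_filter(frame_likelihoods):
    """Return the filtered state distribution after each audio frame.

    frame_likelihoods: list of dicts mapping state -> p(observation | state).
    The note is assumed to start in the attack state.
    """
    belief = {"attack": 1.0, "sustain": 0.0, "rest": 0.0}
    history = []
    for index, likelihood in enumerate(frame_likelihoods):
        if index > 0:
            # Prediction step: push the previous belief through the transitions.
            belief = {s: sum(belief[p] * TRANS[p][s] for p in STATES)
                      for s in STATES}
        # Update step: weight by the frame likelihood and renormalize.
        weighted = {s: belief[s] * likelihood[s] for s in STATES}
        total = sum(weighted.values())
        if total == 0.0:
            raise ValueError("observation has zero likelihood in every state")
        belief = {s: w / total for s, w in weighted.items()}
        history.append(belief)
    return history
```

Taking the most probable state per frame gives a running attack/sustain/rest label that the alignment layer can consume.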

Figure from [5]

Figure from [2]
Alignment
- HMM: each note in the score is a state
- Ghost states are used to model local mismatches (note skipped/note added/wrong note) [4]
- States at the end of a section transition to the beginnings of all potential other sections [6]
- The estimated timestamp of the next beat is sent slightly before the beat is expected to occur, to compensate for algorithmic and transmission latency
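One possible state topology for the alignment HMM, sketched as a successor map. Each score note gets a note state and a parallel ghost state, and the last note of every section can jump to the start of any section (including its own, to allow a repeat). The exact ghost-state wiring in [4] and the section model in [6] may differ; the state naming and the "ghost may skip one note" rule are assumptions made for this sketch.

```python
def build_transitions(sections):
    """Map each state to the set of states it may move to.

    sections: list of sections, each a list of note labels.
    States are tuples: ("note", section, index) or ("ghost", section, index).
    """
    section_starts = {("note", s, 0)
                      for s, notes in enumerate(sections) if notes}
    successors = {}
    for s, notes in enumerate(sections):
        last = len(notes) - 1
        for i in range(len(notes)):
            note, ghost = ("note", s, i), ("ghost", s, i)
            if i < last:
                advance = {("note", s, i + 1)}
            else:
                # Section end: any section may follow (repeat or skip).
                advance = set(section_starts)
            # A note can hold (self-loop), advance, or fall into its ghost.
            successors[note] = {note, ghost} | advance
            # A ghost absorbs mismatches, then rejoins the score at the
            # next note or, to model a skipped note, the one after it.
            recover = set(advance)
            if i + 2 <= last:
                recover.add(("note", s, i + 2))
            successors[ghost] = {ghost} | recover
    return successors
```

Transition probabilities would then be assigned over this map; the map itself only records which moves are allowed.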

Figure from [4]
Beat Tracking
- Because we're given the score and time signature, estimating beat position and tempo from observed note events is relatively simple
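As a rough illustration of why this is simple: once aligned notes give pairs of (score beat position, observed onset time), tempo is the slope of a straight-line fit. The sketch below uses an ordinary least-squares fit over a sliding window of recent notes, then predicts the next beat and the time at which to send it. The window size, the lead time, and all function names are assumptions, not the project's actual tracker.

```python
def fit_tempo(beats, times, window=8):
    """Least-squares fit of time = t0 + period * beat over the last `window` notes.

    beats: score positions in beats; times: observed onset times in seconds.
    Returns (t0, period), where period is seconds per beat.
    """
    beats, times = beats[-window:], times[-window:]
    n = len(beats)
    mean_b = sum(beats) / n
    mean_t = sum(times) / n
    var_b = sum((b - mean_b) ** 2 for b in beats)
    if var_b == 0.0:
        raise ValueError("need notes at two or more distinct beat positions")
    cov = sum((b - mean_b) * (t - mean_t) for b, t in zip(beats, times))
    period = cov / var_b
    return mean_t - period * mean_b, period


def predict_beat_time(beat, t0, period):
    """Predicted wall-clock time (seconds) of a given beat position."""
    return t0 + period * beat


def send_time(beat, t0, period, lead=0.02):
    """When to emit the message so it arrives by the predicted beat.

    lead is a hypothetical allowance for algorithmic and transmission latency.
    """
    return predict_beat_time(beat, t0, period) - lead
```

For example, onsets at 10.0, 10.5, 11.0, and 11.5 s on beats 0 to 3 give a period of 0.5 s per beat (120 BPM), so beat 4 is predicted at 12.0 s.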
Corpus
- Annotated texts available for use in MIREX
- The MIREX material does not test structural variation (repeated or skipped sections)
Evaluation
- Work has been done by Cont, Schwarz, Schnell, and Raphael [3] to develop a quantitative metric for score follower evaluation
- Systems are evaluated by:
  - error: time between estimated event time and reference event time
  - latency: time between estimated event time and decision reporting time
  - offset: time between reference time and decision reporting time
  - missed notes: notes not reported
  - misaligned notes: notes reported incorrectly (error is beyond a threshold, e.g., 300 ms)
  - piece completion: percentage of events followed before the follower got lost
- Acting predictively allows us to tolerate larger latency and offset times while still triggering events in the virtual accompanist that occur synchronously with the actions of the human performer
- Alternatively, for comparison within the framework of Cont et al., we could use the predicted beat divisions to time-warp the given score. Event decisions are then made before the events actually happen, resulting in negative latency and offset as defined in [3].
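The measures listed above can be computed from three times per event: reference time, estimated time, and report time. The sketch below assumes the sign conventions error = estimated - reference, latency = report - estimated, and offset = report - reference, so that a predictive follower yields negative latency and offset. It excludes misaligned notes from the timing lists and takes piece completion as the share of events up to the last correctly followed one. These choices and the data layout are assumptions for illustration; [3] gives the authoritative definitions.

```python
def evaluate(reference, reports, threshold=0.3):
    """Summarize a follower run.

    reference: list of reference onset times in seconds, one per score event.
    reports: dict mapping event index -> (estimated_time, report_time).
    threshold: misalignment threshold in seconds (0.3 s = 300 ms).
    """
    errors, latencies, offsets = [], [], []
    missed = misaligned = 0
    last_followed = -1
    for i, t_ref in enumerate(reference):
        if i not in reports:
            missed += 1          # event never reported
            continue
        t_est, t_report = reports[i]
        error = t_est - t_ref
        if abs(error) > threshold:
            misaligned += 1      # reported, but too far from the reference
            continue
        errors.append(error)
        latencies.append(t_report - t_est)
        offsets.append(t_report - t_ref)
        last_followed = i
    completion = 100.0 * (last_followed + 1) / len(reference) if reference else 0.0
    return {
        "errors": errors,
        "latencies": latencies,
        "offsets": offsets,
        "missed": missed,
        "misaligned": misaligned,
        "piece_completion": completion,
    }
```

Means and standard deviations of the three timing lists can then be reported alongside the missed and misaligned counts.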
References
1. Cont, A. (2004). Improvement of Observation Modeling for Score Following. DEA ATIAM thesis, IRCAM.
2. Cont, A. (2006). Score Following at IRCAM. Proceedings of MIREX.
3. Cont, A., D. Schwarz, N. Schnell, and C. Raphael (2007). Evaluation of Real-Time Audio-to-Score Alignment. Proceedings of the International Conference on Music Information Retrieval.
4. Orio, N. and F. Dechelle (2001). Score Following Using Spectral Analysis and Hidden Markov Models. Proceedings of the International Computer Music Conference.
5. Orio, N. and D. Schwarz (2001). Alignment of Monophonic and Polyphonic Music to a Score. Proceedings of the International Computer Music Conference.
6. Pardo, B. and W. Birmingham (2005). Modeling Form for On-line Following of Musical Performances. Proceedings of the Twentieth National Conference on Artificial Intelligence.





