Transformer-Based Unpaired Piano Accompaniment Style Transfer
Hsin Ai and Yi-Hsuan Yang
National Taiwan University, Taiwan
Abstract
Arranger-specific style transfer for pop piano covers requires effective content-style disentanglement. To address this, we propose a framework that uses a lead sheet (namely, melody and chords) as a style-agnostic content anchor, enabling precise style manipulation without requiring paired data. We then systematically compare several Transformer-based architectures to evaluate the efficacy of a direct token-based conditioning strategy versus more complex embedding-based methods. While all approaches effectively capture the target style, our evaluation shows that the simpler token-based model achieves superior performance in both objective and subjective assessments of content preservation and style matching. This finding provides empirical evidence that a robust, explicit content representation (i.e., the lead sheet) is highly effective for this task, offering a practical benchmark for controllable music generation.
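As a rough illustration of the token-based conditioning strategy discussed above, the PyTorch sketch below prepends a learned style token (one ID per arranger) to the token stream of a causal Transformer. The class name, vocabulary size, and hyperparameters are hypothetical, and positional encoding is omitted for brevity; this is a minimal sketch of the general technique, not the exact model evaluated here.

```python
import torch
import torch.nn as nn

class TokenConditionedLM(nn.Module):
    """Minimal sketch of style-token conditioning: a learned style token
    (one ID per arranger) is prepended to the lead-sheet/accompaniment
    token stream of a causal Transformer. All sizes are assumptions."""

    def __init__(self, vocab_size=512, n_styles=2, d_model=256,
                 n_heads=4, n_layers=4):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.style_emb = nn.Embedding(n_styles, d_model)  # e.g. arranger 1 / arranger 2
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.body = nn.TransformerEncoder(layer, n_layers)  # run with a causal mask
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, style_id):
        # tokens: (B, T) token IDs; style_id: (B,) arranger IDs
        x = self.tok_emb(tokens)                    # (B, T, d)
        s = self.style_emb(style_id).unsqueeze(1)   # (B, 1, d)
        x = torch.cat([s, x], dim=1)                # prepend the style token
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        h = self.body(x, mask=mask)
        return self.head(h[:, 1:])                  # logits for the content positions

model = TokenConditionedLM()
logits = model(torch.randint(0, 512, (1, 32)), torch.tensor([0]))  # condition on arranger 1
```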
The following piano generations are conditioned on two arrangers: arranger 1 and arranger 2, representing different accompaniment styles.
Arranger 1 features a simpler accompaniment style with fewer note variations, while arranger 2 is more complex, often producing dense arrangements with many notes played simultaneously.
Task 1: whole song style transfer
Model 1: STYLE-TOK + CONTENT-TOK
lead sheet + style token (song-level)
[Demo 1 - Arranger 1]
average rhythmic intensity: 0.36
average polyphony: 4.13
pitch range: 36
melodic fidelity: 1.0
[Demo 1 - Arranger 2]
average rhythmic intensity: 0.51
average polyphony: 5.07
pitch range: 36
melodic fidelity: 1.0
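For reference, the sketch below shows one common way such objective metrics can be computed from note data. It does not reproduce the exact definitions used in our evaluation: the (start_beat, end_beat, pitch) note representation, the per-beat onset formulation of rhythmic intensity, and the onset/pitch-overlap formulation of melodic fidelity are all assumptions for illustration. Under this reading, a melodic fidelity of 1.0 would mean every melody note of the lead sheet reappears in the generation.

```python
# Notes are assumed to be (start_beat, end_beat, pitch) tuples on a shared beat grid.

def rhythmic_intensity(notes, n_beats):
    """Fraction of beats containing at least one note onset (assumed definition)."""
    onset_beats = {int(start) for start, _, _ in notes}
    return len(onset_beats) / n_beats

def average_polyphony(notes, step=0.25):
    """Mean number of simultaneously sounding notes, sampled every `step` beats."""
    end = max(e for _, e, _ in notes)
    t, counts = 0.0, []
    while t < end:
        active = sum(1 for s, e, _ in notes if s <= t < e)
        if active:
            counts.append(active)
        t += step
    return sum(counts) / len(counts) if counts else 0.0

def pitch_range(notes):
    """Difference between the highest and lowest pitch, in semitones."""
    pitches = [p for _, _, p in notes]
    return max(pitches) - min(pitches)

def melodic_fidelity(melody, generated):
    """Fraction of melody (onset, pitch) pairs reproduced in the generation,
    assuming onsets are quantized to the same grid (assumed definition)."""
    gen = {(s, p) for s, _, p in generated}
    hits = sum((s, p) in gen for s, _, p in melody)
    return hits / len(melody)
```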
Ablation: lead sheet (no chords) + style token (song-level)