Transformer-Based Unpaired
Piano Accompaniment Style Transfer

Hsin Ai and Yi-Hsuan Yang
National Taiwan University, Taiwan

Abstract

Arranger-specific style transfer for pop piano covers requires effective content-style disentanglement. To address this, we propose a framework that uses a lead sheet (namely, melody and chords) as a style-agnostic content anchor, enabling precise style manipulation without requiring paired data. We then systematically compare several Transformer-based architectures to evaluate the efficacy of a direct token-based conditioning strategy against more complex embedding-based methods. While all approaches capture the target style, our evaluation shows that the simpler token-based model achieves superior performance in both objective and subjective assessments of content preservation and style matching. This finding provides empirical evidence that a robust, explicit content representation (i.e., the lead sheet) is highly effective for this task, offering a practical benchmark for controllable music generation.
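
For concreteness, the sketch below illustrates what the token-based conditioning (style token + lead-sheet content tokens) could look like: one discrete style token per arranger prepended to a REMI-like serialization of the lead sheet. The token names and ordering are illustrative assumptions, not the paper's exact specification.

```python
# A minimal sketch of the STYLE-TOK + CONTENT-TOK input layout, assuming a
# REMI-like bar/position vocabulary.  Token names are illustrative only.

def build_input_sequence(style_id: int, lead_sheet: list) -> list:
    """Prepend one discrete style token per arranger, then serialize the
    lead sheet bar by bar; the decoder generates the accompaniment tokens
    autoregressively after this prefix."""
    seq = [f"Style_{style_id}"]
    for bar in lead_sheet:
        seq.append("Bar")
        seq.append(f"Chord_{bar['chord']}")          # chord of the bar
        for onset, pitch, dur in bar["melody"]:      # melody notes in the bar
            seq += [f"Pos_{onset}", f"Pitch_{pitch}", f"Dur_{dur}"]
    return seq

# Example: two bars of a toy lead sheet, conditioned on arranger 1.
lead_sheet = [
    {"chord": "Cmaj", "melody": [(0, 64, 4), (8, 67, 4)]},
    {"chord": "Am",   "melody": [(0, 69, 8)]},
]
print(build_input_sequence(1, lead_sheet))
```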


The following piano generations are conditioned on two arrangers, each representing a distinct accompaniment style.
Arranger 1 favors a simpler accompaniment with fewer note-level variations, while arranger 2 is more complex, often featuring dense arrangements with many notes played simultaneously.
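
Each demo below is summarized with four objective metrics. The sketch that follows shows one plausible way to compute them from a note list, assuming notes are (onset, pitch, duration) tuples quantized to 16th-note steps in 4/4; these definitions are our assumptions and the paper's exact formulas may differ in detail.

```python
# Plausible readings of the four reported metrics; definitions are assumed.

STEPS_PER_BAR = 16  # 16th-note quantization, 4/4 time

def rhythmic_intensity(notes):
    """Fraction of quantized steps that carry at least one note onset."""
    n_steps = (max(o + d for o, _, d in notes) // STEPS_PER_BAR + 1) * STEPS_PER_BAR
    return len({o for o, _, _ in notes}) / n_steps

def average_polyphony(notes):
    """Mean number of simultaneously sounding notes over non-silent steps."""
    counts = {}
    for onset, _, dur in notes:
        for t in range(onset, onset + dur):
            counts[t] = counts.get(t, 0) + 1
    return sum(counts.values()) / len(counts)

def pitch_range(notes):
    """Span between the highest and lowest pitch, in semitones."""
    pitches = [p for _, p, _ in notes]
    return max(pitches) - min(pitches)

def melodic_fidelity(melody, generated):
    """Fraction of melody notes reproduced with the same onset and pitch."""
    produced = {(o, p) for o, p, _ in generated}
    return sum((o, p) in produced for o, p, _ in melody) / len(melody)

notes = [(0, 60, 4), (0, 64, 4), (4, 67, 4), (8, 72, 8)]  # toy accompaniment
print(rhythmic_intensity(notes), average_polyphony(notes), pitch_range(notes))
```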

Task 1: whole-song style transfer

Model 1: STYLE-TOK + CONTENT-TOK

lead sheet + style token (song-level)

[Demo 1 - Arranger 1]

  • average rhythmic intensity: 0.36
  • average polyphony: 4.13
  • pitch range: 36
  • melodic fidelity: 1.0

[Demo 1 - Arranger 2]

  • average rhythmic intensity: 0.51
  • average polyphony: 5.07
  • pitch range: 36
  • melodic fidelity: 1.0

Ablation: lead sheet (no chords) + style token (song-level)

[Demo 2 - Arranger 1]

  • average rhythmic intensity: 0.32
  • average polyphony: 3.98
  • pitch range: 29
  • melodic fidelity: 1.0

[Demo 2 - Arranger 2]

  • average rhythmic intensity: 0.5
  • average polyphony: 6.43
  • pitch range: 48
  • melodic fidelity: 0.98

Model 3: STYLE-EMB + CONTENT-TOK
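
Below is a minimal PyTorch sketch of how such embedding-based conditioning could work: a Transformer encoder maps the tokens of a reference segment to a single style vector, which is prepended to the embedded content tokens as a soft prompt in place of the discrete style token. Module names and dimensions are illustrative assumptions, not the paper's implementation. The two conditions that follow differ only in which segments of the reference performance feed the encoder.

```python
# A sketch of the STYLE-EMB variant: reference tokens -> one style vector,
# prepended to the embedded content sequence.  All sizes are assumptions.

import torch
import torch.nn as nn

class StyleEncoder(nn.Module):
    def __init__(self, vocab_size=512, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, ref_tokens):             # (batch, ref_len)
        h = self.encoder(self.embed(ref_tokens))
        return h.mean(dim=1, keepdim=True)     # (batch, 1, d_model) style vector

# The style vector sits where the discrete style token would have appeared,
# so the decoder attends to it in front of the content tokens.
style_enc = StyleEncoder()
ref = torch.randint(0, 512, (1, 64))           # tokens of a reference segment
content = torch.randn(1, 128, 256)             # embedded lead-sheet tokens
decoder_input = torch.cat([style_enc(ref), content], dim=1)
print(decoder_input.shape)                     # torch.Size([1, 129, 256])
```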

style reference: same segments

[Demo 3 - Arranger 1]

  • average rhythmic intensity: 0.45
  • average polyphony: 4.39
  • pitch range:
  • melodic fidelity: 1.0

[Demo 3 - Arranger 2]

  • average rhythmic intensity: 0.42
  • average polyphony: 5.59
  • pitch range:
  • melodic fidelity: 0.39

style reference: adjacent segments

[Demo 4 - Arranger 1]

  • average rhythmic intensity: 0.43
  • average polyphony: 3.66
  • pitch range:
  • melodic fidelity: 0.97

[Demo 4 - Arranger 2]

  • average rhythmic intensity: 0.47
  • average polyphony: 5.36
  • pitch range:
  • melodic fidelity: 0.98

Task 2: style change within one song

Arranger 1 -> 2, change after bar 8

  • average rhythmic intensity: 0.48 -> 0.52
  • average polyphony: 5.5 -> 6.1
  • pitch range: 43 -> 44

Arranger 2 -> 1, change after bar 8

  • average rhythmic intensity: 0.6 -> 0.42
  • average polyphony: 6.68 -> 4.4
  • pitch range: 48 -> 34
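
The mid-song switch in Task 2 can be pictured as swapping the style condition at a bar boundary during autoregressive decoding, so the remaining bars continue under the other arranger's style. The sketch below exercises this control flow with a hypothetical decode_step sampling call and a toy model stub; the token names and the stub are placeholders, not the paper's implementation.

```python
# A sketch of the Task 2 control flow: swap the style token after bar 8
# so bars 9 onward are generated under arranger 2's style.

class ToyModel:
    """Stand-in for the trained Transformer; always emits a bar token so
    the control flow below can run without a real checkpoint."""
    def decode_step(self, prefix):
        return "Bar"

def generate_with_style_change(model, content_tokens, switch_bar=8, n_bars=16):
    seq = ["Style_1"] + content_tokens        # start under arranger 1
    out, bar_count = [], 0
    while bar_count < n_bars:
        token = model.decode_step(seq + out)  # hypothetical sampling call
        if token == "Bar":
            bar_count += 1
            if bar_count > switch_bar:
                seq[0] = "Style_2"            # bars after the switch: arranger 2
        out.append(token)
    return out

print(len(generate_with_style_change(ToyModel(), ["Chord_Cmaj"])))  # 16
```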