Transformer-Based Unpaired
Piano Accompaniment Style Transfer

Hsin Ai and Yi-Hsuan Yang
National Taiwan University, Taiwan

Abstract

Arranger-specific style transfer for pop piano covers requires effective content-style disentanglement. To address this, we propose a framework that uses a lead sheet (namely, melody and chords) as a style-agnostic content anchor, enabling precise style manipulation without requiring paired data. We then systematically compare several Transformer-based architectures to evaluate the efficacy of a direct token-based conditioning strategy against more complex embedding-based methods. While all approaches capture the target style, our evaluation shows that the simpler token-based model achieves superior performance in both objective and subjective assessments of content preservation and style matching. This finding provides empirical evidence that a robust, explicit content representation (i.e., the lead sheet) is highly effective for this task, offering a practical benchmark for controllable music generation.
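
For concreteness, the sketch below illustrates what the token-based conditioning (style token + lead-sheet content tokens) could look like: one discrete style token per arranger prepended to a REMI-like serialization of the lead sheet. The token names and ordering are illustrative assumptions, not the paper's exact specification.

```python
# A minimal sketch of the STYLE-TOK + CONTENT-TOK input layout, assuming a
# REMI-like bar/position vocabulary.  Token names are illustrative only.

def build_input_sequence(style_id: int, lead_sheet: list) -> list:
    """Prepend one discrete style token per arranger, then serialize the
    lead sheet bar by bar; the decoder generates the accompaniment tokens
    autoregressively after this prefix."""
    seq = [f"Style_{style_id}"]
    for bar in lead_sheet:
        seq.append("Bar")
        seq.append(f"Chord_{bar['chord']}")          # chord of the bar
        for onset, pitch, dur in bar["melody"]:      # melody notes in the bar
            seq += [f"Pos_{onset}", f"Pitch_{pitch}", f"Dur_{dur}"]
    return seq

# Example: two bars of a toy lead sheet, conditioned on arranger 1.
lead_sheet = [
    {"chord": "Cmaj", "melody": [(0, 64, 4), (8, 67, 4)]},
    {"chord": "Am",   "melody": [(0, 69, 8)]},
]
print(build_input_sequence(1, lead_sheet))
```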


The following piano generations are conditioned on two arrangers, each representing a distinct accompaniment style.
Arranger 1 favors a simpler accompaniment with fewer note-level variations, while arranger 2 is more complex, often featuring dense arrangements with many notes played simultaneously.
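
Each demo below is summarized with four objective metrics. The sketch that follows shows one plausible way to compute them from a note list, assuming notes are (onset, pitch, duration) tuples quantized to 16th-note steps in 4/4; these definitions are our assumptions and the paper's exact formulas may differ in detail.

```python
# Plausible readings of the four reported metrics; definitions are assumed.

STEPS_PER_BAR = 16  # 16th-note quantization, 4/4 time

def rhythmic_intensity(notes):
    """Fraction of quantized steps that carry at least one note onset."""
    n_steps = (max(o + d for o, _, d in notes) // STEPS_PER_BAR + 1) * STEPS_PER_BAR
    return len({o for o, _, _ in notes}) / n_steps

def average_polyphony(notes):
    """Mean number of simultaneously sounding notes over non-silent steps."""
    counts = {}
    for onset, _, dur in notes:
        for t in range(onset, onset + dur):
            counts[t] = counts.get(t, 0) + 1
    return sum(counts.values()) / len(counts)

def pitch_range(notes):
    """Span between the highest and lowest pitch, in semitones."""
    pitches = [p for _, p, _ in notes]
    return max(pitches) - min(pitches)

def melodic_fidelity(melody, generated):
    """Fraction of melody notes reproduced with the same onset and pitch."""
    produced = {(o, p) for o, p, _ in generated}
    return sum((o, p) in produced for o, p, _ in melody) / len(melody)

notes = [(0, 60, 4), (0, 64, 4), (4, 67, 4), (8, 72, 8)]  # toy accompaniment
print(rhythmic_intensity(notes), average_polyphony(notes), pitch_range(notes))
```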

Task 1: whole-song style transfer

Model 1: STYLE-TOK + CONTENT-TOK

lead sheet + style token (song-level)

[Demo 1 - Arranger 1]

  • average rhythmic intensity: 0.36
  • average polyphony: 4.13
  • pitch range: 36
  • melodic fidelity: 1.0

[Demo 1 - Arranger 2]

  • average rhythmic intensity: 0.51
  • average polyphony: 5.07
  • pitch range: 36
  • melodic fidelity: 1.0

Ablation: lead sheet (no chords) + style token (song-level)

[Demo 2 - Arranger 1]

  • average rhythmic intensity: 0.32
  • average polyphony: 3.98
  • pitch range: 29
  • melodic fidelity: 1.0

[Demo 2 - Arranger 2]

  • average rhythmic intensity: 0.5
  • average polyphony: 6.43
  • pitch range: 48
  • melodic fidelity: 0.98

Model 3: STYLE-EMB + CONTENT-TOK
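
Below is a minimal PyTorch sketch of how such embedding-based conditioning could work: a Transformer encoder maps the tokens of a reference segment to a single style vector, which is prepended to the embedded content tokens as a soft prompt in place of the discrete style token. Module names and dimensions are illustrative assumptions, not the paper's implementation. The two conditions that follow differ only in which segments of the reference performance feed the encoder.

```python
# A sketch of the STYLE-EMB variant: reference tokens -> one style vector,
# prepended to the embedded content sequence.  All sizes are assumptions.

import torch
import torch.nn as nn

class StyleEncoder(nn.Module):
    def __init__(self, vocab_size=512, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, ref_tokens):             # (batch, ref_len)
        h = self.encoder(self.embed(ref_tokens))
        return h.mean(dim=1, keepdim=True)     # (batch, 1, d_model) style vector

# The style vector sits where the discrete style token would have appeared,
# so the decoder attends to it in front of the content tokens.
style_enc = StyleEncoder()
ref = torch.randint(0, 512, (1, 64))           # tokens of a reference segment
content = torch.randn(1, 128, 256)             # embedded lead-sheet tokens
decoder_input = torch.cat([style_enc(ref), content], dim=1)
print(decoder_input.shape)                     # torch.Size([1, 129, 256])
```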

style reference: same segments

[Demo 3 - Arranger 1]

  • average rhythmic intensity: 0.45
  • average polyphony: 4.39
  • pitch range:
  • melodic fidelity: 1.0

[Demo 3 - Arranger 2]

  • average rhythmic intensity: 0.42
  • average polyphony: 5.59
  • pitch range:
  • melodic fidelity: 0.39

style reference: adjacent segments

[Demo 4 - Arranger 1]

  • average rhythmic intensity: 0.43
  • average polyphony: 3.66
  • pitch range:
  • melodic fidelity: 0.97

[Demo 4 - Arranger 2]

  • average rhythmic intensity: 0.47
  • average polyphony: 5.36
  • pitch range:
  • melodic fidelity: 0.98

Task 2: style change within one song

Arranger 1 -> 2, change after bar 8

  • average rhythmic intensity: 0.48 -> 0.52
  • average polyphony: 5.5 -> 6.1
  • pitch range: 43 -> 44

Arranger 2 -> 1, change after bar 8

  • average rhythmic intensity: 0.6 -> 0.42
  • average polyphony: 6.68 -> 4.4
  • pitch range: 48 -> 34
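
The mid-song switch in Task 2 can be pictured as swapping the style condition at a bar boundary during autoregressive decoding, so the remaining bars continue under the other arranger's style. The sketch below exercises this control flow with a hypothetical decode_step sampling call and a toy model stub; the token names and the stub are placeholders, not the paper's implementation.

```python
# A sketch of the Task 2 control flow: swap the style token after bar 8
# so bars 9 onward are generated under arranger 2's style.

class ToyModel:
    """Stand-in for the trained Transformer; always emits a bar token so
    the control flow below can run without a real checkpoint."""
    def decode_step(self, prefix):
        return "Bar"

def generate_with_style_change(model, content_tokens, switch_bar=8, n_bars=16):
    seq = ["Style_1"] + content_tokens        # start under arranger 1
    out, bar_count = [], 0
    while bar_count < n_bars:
        token = model.decode_step(seq + out)  # hypothetical sampling call
        if token == "Bar":
            bar_count += 1
            if bar_count > switch_bar:
                seq[0] = "Style_2"            # bars after the switch: arranger 2
        out.append(token)
    return out

print(len(generate_with_style_change(ToyModel(), ["Chord_Cmaj"])))  # 16
```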