Abstract

In the domain of music production and audio processing, automatic pitch correction of the singing voice, also known as Auto-Tune, has significantly transformed the landscape of vocal performance. While auto-tuning technology lets musicians tune their vocal pitch to a desired level of precision, its use has also sparked debates regarding its impact on authenticity and artistic integrity. As a result, detecting and analyzing Auto-Tuned vocals in music recordings has become valuable for music scholars, producers, and listeners. However, to the best of our knowledge, no prior effort has been made in this direction. This study introduces a data-driven approach leveraging triplet networks for the detection of Auto-Tuned songs, backed by the creation of a dataset composed of original and Auto-Tuned audio clips. The experimental results demonstrate the superiority of the proposed method in terms of both accuracy and robustness when compared to two baseline models: RawNet2, an end-to-end model proposed for anti-spoofing and widely used for other audio forensic tasks, and a Graph Attention Transformer-based approach specifically designed for singing vocal deepfake detection.

Performance on some external tracks

Ed Sheeran - Photograph (Acoustic Version): This song doesn’t contain any type of auto-tuning and the vocals are raw and untuned.

The Detector Predictions:

Number of Segments	Auto-Tuned Segments (Number)	Auto-Tuned Segments (%)	Average Likelihood
23	0	0%	0.04%

You & I feat. Kata Kozma: There is a low amount of pitch correction in a limited number of segments of the song.

The Detector Predictions:

Number of Segments	Auto-Tuned Segments (Number)	Auto-Tuned Segments (%)	Average Likelihood
16	3	18.75%	18.67%

Timmy Trumpet, SwedishRedElephant, 22Bullets - The City: This song has undergone more intense auto-tuning; however, it remains barely noticeable to non-professional listeners.

The Detector Predictions:

Number of Segments	Auto-Tuned Segments (Number)	Auto-Tuned Segments (%)	Average Likelihood
13	3	23.08%	21.95%

Lua freestyle 2: This track is intensely auto-tuned throughout, and the effect is fully audible even to non-professional listeners.

The Detector Predictions:

Number of Segments	Auto-Tuned Segments (Number)	Auto-Tuned Segments (%)	Average Likelihood
9	8	88.89%	88.85%
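The per-track figures in the tables above can be derived from per-segment detector outputs. The following is a minimal sketch of that aggregation, not the authors' implementation. It assumes the detector emits one likelihood in [0, 1] per segment, that a segment counts as Auto-Tuned when its likelihood reaches a decision threshold of 0.5, and that "Average Likelihood" is the mean over all segments. The names `TrackSummary` and `summarize_track` and the threshold value are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TrackSummary:
    """Per-track statistics matching the columns of the prediction tables."""
    num_segments: int
    num_autotuned: int
    autotuned_percent: float
    average_likelihood_percent: float


def summarize_track(likelihoods: Sequence[float], threshold: float = 0.5) -> TrackSummary:
    """Aggregate per-segment likelihoods into track-level statistics.

    `threshold` is an assumed decision boundary; the source does not state
    which value the detector uses.
    """
    if not likelihoods:
        raise ValueError("at least one segment likelihood is required")

    num_segments = len(likelihoods)
    # A segment is flagged when its likelihood reaches the threshold.
    num_autotuned = sum(1 for p in likelihoods if p >= threshold)

    return TrackSummary(
        num_segments=num_segments,
        num_autotuned=num_autotuned,
        autotuned_percent=100.0 * num_autotuned / num_segments,
        average_likelihood_percent=100.0 * sum(likelihoods) / num_segments,
    )


# Illustrative input only: 16 made-up likelihoods, 3 of them above the threshold.
example = summarize_track([0.9, 0.8, 0.7] + [0.1] * 13)
print(example.num_segments, example.num_autotuned, f"{example.autotuned_percent:.2f}%")
```

Reporting the mean likelihood alongside the flagged-segment percentage is useful because the two can diverge: a track whose segments all sit just below the threshold would show 0% flagged segments but a noticeably non-zero average.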

Runtime evaluation

The average runtime of the proposed method on the test dataset (10-second segments), evaluated across various backbone architectures, is summarized in the table below. The experiments were conducted on a system with the following specifications:

CPU: AMD EPYC 7742 64-Core Processor
RAM: 32 GB
GPU: NVIDIA A100
Operating System: Ubuntu 22.04.4 LTS

Backbones	Feature Extractor (ms)	Classifier (ms)	Total (ms)
ResNeXt	10.562	0.285	10.847
EfficientNet	21.350	0.285	21.635
ResNet18	5.741	0.285	6.026
ResNet50	10.457	0.285	10.742
B01	-	-	13.379
B02	-	-	10.601
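In the table above, each reported total for the proposed method equals the feature-extractor time plus the classifier time (for example, 5.741 + 0.285 = 6.026 ms for ResNet18). The sketch below shows one way such per-stage averages could be measured; it is an illustration under stated assumptions, not the authors' benchmarking script. The names `average_runtime_ms` and `profile_pipeline`, the warm-up count, and the stand-in stage functions are all hypothetical.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List


def average_runtime_ms(fn: Callable, inputs: Iterable, warmup: int = 3) -> float:
    """Return the mean wall-clock time of fn(x), in milliseconds, over all inputs."""
    items: List = list(inputs)
    if not items:
        raise ValueError("at least one input is required")

    # Warm-up calls are excluded from the measurement (caches, lazy initialization).
    for item in items[:warmup]:
        fn(item)

    timings_ms = []
    for item in items:
        start = time.perf_counter()
        fn(item)
        # Note: if fn launches asynchronous GPU work, the device must be
        # synchronized before the clock is read, or the time is underestimated.
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.fmean(timings_ms)


def profile_pipeline(extract: Callable, classify: Callable, segments: Iterable) -> Dict[str, float]:
    """Time the two stages separately and report their sum as the total."""
    items = list(segments)
    features = [extract(segment) for segment in items]  # inputs for the classifier stage
    extractor_ms = average_runtime_ms(extract, items)
    classifier_ms = average_runtime_ms(classify, features)
    return {
        "feature_extractor_ms": extractor_ms,
        "classifier_ms": classifier_ms,
        "total_ms": extractor_ms + classifier_ms,
    }


# Stand-in stages for illustration; real stages would be the backbone and classifier.
result = profile_pipeline(
    extract=lambda segment: [sum(segment)],
    classify=lambda feature: feature[0] > 0,
    segments=[[0.1] * 1000 for _ in range(20)],
)
print(result)
```

Timing the stages separately makes it visible that the classifier cost is constant across backbones, so differences in total runtime come from the feature extractor alone.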

Auto-Tune Detection

This is a deep learning and spectrogram-based tool to detect Auto-Tuned vocals within music recordings.
