An Effective Framework for Speech and Music Segregation
Sidra Sajid, Ali Javed, and Aun Irtaza
Department of Software Engineering, University of Engineering and Technology Taxila, Pakistan
Abstract: Speech and music segregation from a single channel is a challenging task due to background interference and the intermingled signals of the voice and music channels. It is of immense importance due to its utility in a wide range of applications, such as music information retrieval, singer identification, and lyrics recognition and alignment. This paper presents an effective method for speech and music segregation. Considering the repeating nature of music, we first detect the local repeating structures in the signal using a locally defined window for each segment. After detecting the repeating structures, we extract them and perform separation using a soft time-frequency mask. We then apply an ideal binary mask to enhance speech and music intelligibility. We evaluated the proposed method on mixture sets at -5 dB, 0 dB, and 5 dB from the Multimedia Information Retrieval-1000 clips (MIR-1K) dataset. Experimental results demonstrate that the proposed method for speech and music segregation outperforms existing state-of-the-art methods in terms of Global Normalized Signal-to-Distortion Ratio (GNSDR).
Keywords: Ideal binary mask, source segregation, repeating pattern, spectrogram, speech intelligibility.
Received December 7, 2017; accepted October 28, 2018
https://doi.org/10.34028/iajit/17/4/9
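As a concrete illustration of the pipeline summarized in the abstract, the sketch below implements a simplified, REPET-SIM-style version of the same idea in Python with NumPy/SciPy: a locally windowed frame-similarity search models the repeating (music) part of the magnitude spectrogram, a soft time-frequency mask separates it from the mixture, and the standard ideal-binary-mask rule is included for reference. This is not the authors' code; the file name `mixture.wav`, the parameters `k`, `win`, and `nperseg`, and the median-of-similar-frames model are illustrative assumptions, and a true ideal binary mask requires the oracle (ground-truth) source spectrograms.

```python
# Illustrative sketch only, not the paper's implementation.
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft, istft

def repeating_soft_mask(V, k=5, win=100):
    """Soft mask built from locally repeating frames of magnitude spectrogram V.

    For each frame, the k most similar frames inside a local window
    (+/- win frames) are found via cosine similarity; their element-wise
    median models the repeating (music) structure, as in REPET-SIM.
    """
    eps = 1e-8
    n_bins, n_frames = V.shape
    Vn = V / (np.linalg.norm(V, axis=0, keepdims=True) + eps)
    S = Vn.T @ Vn                                # frame-to-frame cosine similarity
    W = np.empty_like(V)
    for j in range(n_frames):
        lo, hi = max(0, j - win), min(n_frames, j + win + 1)
        idx = lo + np.argsort(S[j, lo:hi])[-k:]  # k most similar local frames
        W[:, j] = np.median(V[:, idx], axis=1)   # repeating-structure estimate
    W = np.minimum(W, V)                         # model cannot exceed the mixture
    return W / (V + eps)                         # soft time-frequency mask

def ideal_binary_mask(voice_mag, music_mag):
    """Standard ideal-binary-mask rule: 1 where the voice dominates.
    Needs the oracle source spectrograms, so it gives an upper bound."""
    return (voice_mag > music_mag).astype(float)

sr, x = wavfile.read("mixture.wav")              # hypothetical mono mixture file
x = x.astype(np.float64)
f, t, X = stft(x, fs=sr, nperseg=1024)
V = np.abs(X)

M_music = repeating_soft_mask(V)                 # soft mask for the music part
_, music = istft(M_music * X, fs=sr, nperseg=1024)
_, voice = istft((1.0 - M_music) * X, fs=sr, nperseg=1024)
```

The soft mask keeps time-frequency energy in proportion to how well it is explained by the repeating model, which tends to reduce musical-noise artifacts relative to hard masking; the binary rule is shown only because the abstract's intelligibility-enhancement step is mask binarization, which in practice would be applied to the estimated (or oracle) source magnitudes.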