An Effective Framework for Speech and Music Segregation

Sidra Sajid, Ali Javed, and Aun Irtaza

Department of Software Engineering, University of Engineering and Technology Taxila, Pakistan

Abstract: Speech and music segregation from a single channel is a challenging task because of background interference and the intermingling of voice and music signals. It is important for a wide range of applications, such as music information retrieval, singer identification, and lyrics recognition and alignment. This paper presents an effective method for speech and music segregation. Considering the repeating nature of music, we first detect the local repeating structures in the signal using a locally defined window for each segment. After detecting the repeating structures, we extract them and perform separation using a soft time-frequency mask. We then apply an ideal binary mask to enhance speech and music intelligibility. We evaluated the proposed method on mixtures set at -5 dB, 0 dB, and 5 dB from the Multimedia Information Retrieval-1000 clips (MIR-1K) dataset. Experimental results demonstrate that the proposed method outperforms existing state-of-the-art methods in terms of Global Normalized Signal-to-Distortion Ratio (GNSDR) values.

Keywords: Ideal binary mask, source segregation, repeating pattern, spectrogram, speech intelligibility.
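
The two masks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the ratio form of the soft mask, and the 0 dB threshold are assumptions made for illustration, and the binary mask is computed here from known source magnitudes, which are available only in an evaluation setting.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log of zero


def soft_mask(repeating_mag, mixture_mag):
    """Soft time-frequency mask (assumed ratio form).

    Each bin holds the fraction of the mixture magnitude attributed to the
    estimated repeating (music) part, clipped to the range [0, 1].
    """
    return np.clip(repeating_mag / (mixture_mag + EPS), 0.0, 1.0)


def ideal_binary_mask(speech_mag, music_mag, threshold_db=0.0):
    """Binary mask: 1 where speech dominates music by more than threshold_db."""
    local_ratio_db = 20.0 * np.log10((speech_mag + EPS) / (music_mag + EPS))
    return (local_ratio_db > threshold_db).astype(float)


# Toy 2x2 magnitude spectrograms (rows: frequency bins, columns: frames).
speech = np.array([[2.0, 0.5], [1.0, 3.0]])
music = np.array([[1.0, 1.0], [2.0, 1.0]])
mixture = speech + music  # simplification: magnitudes assumed additive

music_soft = soft_mask(music, mixture)
speech_binary = ideal_binary_mask(speech, music)
speech_estimate = speech_binary * mixture  # keep only speech-dominated bins
```

The soft mask keeps a graded share of each time-frequency bin, whereas the binary mask assigns each bin wholly to one source.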

Received December 7, 2017; accepted October 28, 2018 

https://doi.org/10.34028/iajit/17/4/9
