Music Source Separation Using ASPP Based on Coupled U-Net Model

Suwon Yang (1), Daewon Lee (2)
(1) Department of Computer Science and Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul, 02841, Korea
(2) Department of Computer Engineering, Seokyeong University, 124 Seogyeong-ro, Seongbuk-gu, Seoul, 02173, Korea
How to cite (IJASEIT):
Yang, Suwon, and Daewon Lee. “Music Source Separation Using ASPP Based on Coupled U-Net Model”. International Journal on Advanced Science, Engineering and Information Technology, vol. 11, no. 2, Apr. 2021, pp. 589-94, doi:10.18517/ijaseit.11.2.12833.
Noise is one of the factors that interfere with modern life, and a variety of noise-canceling techniques have been studied to suppress it. Whereas earlier approaches relied on physical soundproofing, recent work has concentrated on active noise canceling (ANC), which removes only the unwanted sound. ANC, or digital noise canceling, is built on sound source separation: the technology of extracting individual sound signals from a mixture. Most source separation research targets speech enhancement rather than noise reduction; separating sources makes it possible to recover the desired sound more accurately and, by eliminating unwanted components, further improves noise canceling. To obtain a deeper network and stronger separation than existing structures, we focus on the coupled U-Net model and the atrous spatial pyramid pooling (ASPP) technique. This paper presents a music source separation method that combines the coupled U-Net structure with ASPP. To validate the proposed method, we compared GNSDR, GSIR, and GSAR on MIR-1K, a dataset designed for evaluating music source separation. The results show that the proposed method overcomes the disadvantages of other methods and strengthens the feature map.
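The coupled U-Net stacks two U-Nets so that the second refines the features of the first, and ASPP widens the receptive field with parallel dilated convolutions. Below is a minimal PyTorch sketch of such an ASPP block applied to spectrogram feature maps; the channel counts and dilation rates (DeepLab-style defaults) are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of an ASPP block of the kind coupled with a U-Net bottleneck.
# Layer sizes and dilation rates here are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        # Parallel atrous (dilated) convolutions capture multi-scale context
        # from the same feature map without extra downsampling.
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])
        # Image-level pooling branch, as in DeepLab-style ASPP.
        self.pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, kernel_size=1),
        )
        # 1x1 convolution fuses the concatenated branches.
        self.project = nn.Conv2d(out_ch * (len(rates) + 1), out_ch, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.pool(x), size=(h, w),
                               mode="bilinear", align_corners=False)
        feats.append(pooled)
        return self.project(torch.cat(feats, dim=1))

if __name__ == "__main__":
    # A magnitude-spectrogram feature map: (batch, channels, freq, time).
    x = torch.randn(1, 64, 128, 128)
    print(ASPP(64, 64)(x).shape)  # torch.Size([1, 64, 128, 128])
```

Under these assumptions, the block would sit at each U-Net bottleneck, letting the coupled networks aggregate context at several time-frequency scales. The reported GNSDR, GSIR, and GSAR are the length-weighted global averages of per-clip NSDR, SIR, and SAR. A sketch of how these are commonly computed on MIR-1K with the mir_eval library follows; the NSDR baseline (scoring the raw mixture as the estimate) and the length weighting follow standard practice, which we assume matches the paper's protocol.

```python
# Sketch of per-clip NSDR/SIR/SAR and their length-weighted global averages.
import numpy as np
import mir_eval

def clip_metrics(voice_ref, accomp_ref, voice_est, accomp_est, mixture):
    refs = np.stack([voice_ref, accomp_ref])
    ests = np.stack([voice_est, accomp_est])
    sdr, sir, sar, _ = mir_eval.separation.bss_eval_sources(refs, ests)
    # NSDR: SDR improvement of the estimate over using the raw mixture.
    base_sdr, _, _, _ = mir_eval.separation.bss_eval_sources(
        refs, np.stack([mixture, mixture]))
    return sdr - base_sdr, sir, sar

def global_metric(values, lengths):
    # Length-weighted average over all clips (the "G" in GNSDR/GSIR/GSAR).
    values, lengths = np.asarray(values), np.asarray(lengths)
    return (values * lengths).sum() / lengths.sum()
```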

