SelfRemaster: SSL Speech Restoration

Overview

SelfRemaster: Self-Supervised Speech Restoration

Official implementation of SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling

Demo

Setup

  1. Clone this repository: git clone https://github.com/Takaaki-Saeki/ssl_speech_restoration.git
  2. Change into the repository directory: cd ssl_speech_restoration
  3. Install the Python packages and download some pretrained models: ./setup.sh

Getting started

  • To use the default Japanese corpora:
    • Download JSUT Basic5000 and the JVS Corpus.
    • Downsample them to 22.05 kHz and place them under data/ as jsut_22k and jvs_22k.
    • Place the simulated low-quality data under data/ as jsut_22k-low and jvs_22k-low.
  • Alternatively, use arbitrary datasets by modifying the config files.
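
Before preprocessing, it can help to confirm that the directories described above are laid out as expected. The following is a minimal sketch, not part of this repository: the helper name check_pair is hypothetical, and it assumes that every low-quality file mirrors the relative path of its clean counterpart and that both sets are PCM .wav files sampled at 22.05 kHz.

```python
import wave
from pathlib import Path


def check_pair(clean_dir, low_dir, expected_rate=22050):
    """Return a list of problems found in a clean / low-quality directory pair."""
    clean_dir, low_dir = Path(clean_dir), Path(low_dir)
    problems = []

    # Collect .wav paths relative to each root so the two trees can be compared.
    clean = {p.relative_to(clean_dir) for p in clean_dir.rglob("*.wav")}
    low = {p.relative_to(low_dir) for p in low_dir.rglob("*.wav")}

    if not clean:
        problems.append(f"no .wav files found under {clean_dir}")
    for rel in sorted(clean - low):
        problems.append(f"no low-quality counterpart for {rel}")

    # Check the sample rate recorded in every file header.
    for root, rels in ((clean_dir, clean), (low_dir, low)):
        for rel in sorted(rels):
            try:
                with wave.open(str(root / rel), "rb") as wav_file:
                    rate = wav_file.getframerate()
            except (wave.Error, EOFError) as exc:
                problems.append(f"{root / rel}: unreadable ({exc})")
                continue
            if rate != expected_rate:
                problems.append(f"{root / rel}: {rate} Hz, expected {expected_rate} Hz")
    return problems


if __name__ == "__main__":
    # Directory names follow the layout described above.
    for name in ("jsut_22k", "jvs_22k"):
        for problem in check_pair(f"data/{name}", f"data/{name}-low"):
            print(problem)
```

An empty result means every clean file has a counterpart and all headers report the expected rate. It does not verify the audio content itself.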

Training

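You can choose the MelSpec or SourFilter model with the --config_path option.
As shown in the paper, the MelSpec model yields higher quality.
In the commands below, ${feature} stands for the config directory of the chosen model (e.g., melspec).

First, split the data into train/val/test sets and dump them with the following command.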

python preprocess.py --config_path configs/train/${feature}/ssl_jsut.yaml

To perform self-supervised learning with dual learning, run the following command.

python train.py \
    --config_path configs/train/${feature}/ssl_jsut.yaml \
    --stage ssl-dual \
    --run_name ssl_melspec_dual

For other options, refer to train.py.
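
The two steps above can also be chained from a small driver script. This is a minimal sketch, assuming it is run from the repository root and that the config directory for the MelSpec model is named melspec. The helper names build_commands and run_pipeline are hypothetical, and only the flags shown above are used.

```python
import subprocess
import sys


def build_commands(feature, run_name, stage="ssl-dual"):
    """Return the preprocess and train command lines for one feature type."""
    config = f"configs/train/{feature}/ssl_jsut.yaml"
    preprocess = [sys.executable, "preprocess.py", "--config_path", config]
    train = [
        sys.executable, "train.py",
        "--config_path", config,
        "--stage", stage,
        "--run_name", run_name,
    ]
    return [preprocess, train]


def run_pipeline(feature, run_name, dry_run=True):
    """Print each command; execute them in order only when dry_run is False."""
    for command in build_commands(feature, run_name):
        print(" ".join(command))
        if not dry_run:
            # check=True stops the pipeline if preprocessing or training fails.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run by default: the commands are printed, not executed.
    run_pipeline("melspec", "ssl_melspec_dual")
```

Pass dry_run=False to actually launch preprocessing followed by training.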

Speech restoration

To perform speech restoration of the test data, run the following command.

python eval.py \
    --config_path configs/test/${feature}/ssl_jsut.yaml \
    --ckpt_path ${path to checkpoint} \
    --stage ssl-dual \
    --run_name ssl_melspec_dual

For other options, see eval.py.

Audio effect transfer

You can run a simple audio effect transfer demo using a model pretrained with real data.
Run the following command.

python aet_demo.py

To customize the dataset or model, edit audio_effect_transfer.yaml and run the following command.

python aet.py \
    --config_path configs/test/melspec/audio_effect_transfer.yaml \
    --stage ssl-dual \
    --run_name aet_melspec_dual

For other options, see aet.py.

Pretrained models

See here.

Reproducing results

You can generate simulated low-quality data as in the paper with the following command.

python simulated_data.py \
    --in_dir ${input_directory (e.g., path to jsut_22k)} \
    --output_dir ${output_directory (e.g., path to jsut_22k-low)} \
    --corpus_type ${single-speaker corpus or multi-speaker corpus} \
    --deg_type lowpass
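
For intuition about what a low-pass degradation does to a signal, here is a generic sketch. It is not the implementation in simulated_data.py and does not reproduce the paper's settings: it is a textbook Hamming-windowed sinc FIR filter in pure Python, and the function names, the tap count, and any cutoff you pass in are assumptions chosen for illustration.

```python
import math


def lowpass_kernel(cutoff_hz, sample_rate, num_taps=101):
    """Hamming-windowed sinc FIR low-pass kernel with unit gain at 0 Hz."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the kernel has a center tap")
    fc = cutoff_hz / sample_rate  # cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response: sin(2*pi*fc*x) / (pi*x), 2*fc at x = 0.
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def lowpass(samples, cutoff_hz, sample_rate, num_taps=101):
    """Filter a sequence of floats; samples outside the signal count as zero."""
    kernel = lowpass_kernel(cutoff_hz, sample_rate, num_taps)
    half = num_taps // 2
    length = len(samples)
    output = []
    for i in range(length):
        acc = 0.0
        for k, h in enumerate(kernel):
            j = i + k - half
            if 0 <= j < length:
                acc += h * samples[j]
        output.append(acc)
    return output
```

Frequencies well above the cutoff are strongly attenuated while those well below it pass almost unchanged, which is why such data sounds muffled. The pure-Python loop is slow on full recordings and is meant only to show the idea.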

Then download the pretrained model corresponding to the deg_type and run the following command.

python eval.py \
    --config_path configs/train/${feature}/ssl_jsut.yaml \
    --ckpt_path ${path to checkpoint} \
    --stage ssl-dual \
    --run_name ssl_melspec_dual

Citation

@article{saeki22selfremaster,
  title={{SelfRemaster}: {S}elf-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling},
  author={T. Saeki and S. Takamichi and T. Nakamura and N. Tanji and H. Saruwatari},
  journal={arXiv preprint arXiv:2203.12937},
  year={2022}
}

Comments
  • running train.py complains about lack of data

    Thank you very much for the interesting paper and the code repo.

    I downloaded jvs and jsut dataset, unpacked them, renamed and degraded them accordingly, e.g.:

    #!/usr/bin/env bash
    
    set -ev
    
    dir=jsut_ver1.1
    
    [ -e "$dir" ] || {
      >&2 echo "error: invalid directory '$dir'"
      exit 1
    }
    
    lowdir="jsut_22k"
    degradedir="jsut_22k-low"
    
    replace_once() {
      s=$1; shift
      from=$1; shift
      to=$1; shift
      env python3 -c "print('$s'.replace('$from', '$to', 1))"
    }
    
    # create subdirs
    find "$dir" -type d | while IFS= read -r line; do
      mkdir -pv "$(replace_once "$line" "$dir" "$lowdir")"
    done
    
    # downsample to 22k
    find "$dir" -type f | sort -n | while IFS= read -r line; do
      [ -e "$line" ] || {
        echo "no such file $line"
        exit 1
      }
      output=$(replace_once "$line" "$dir" "$lowdir")
      [ -e "$output" ] &&  {
        continue
      }
      if [ -z "$(echo "$line" | grep -E ".wav$")" ]; then
        #cp -v "$line" "$output"
        continue
      fi
      echo "downsample '$line' -> '$output'"
      ffmpeg -nostdin -hide_banner -loglevel error -i "$line" -ac 1 -ar 22050 -q:a 0 -y "$output"
    done
    
    # create subdirs
    find "$dir" -type d | while IFS= read -r line; do
      mkdir -p "$(replace_once "$line" "$dir" "$degradedir")"
    done
    
    # degrade audio
    find "$lowdir" -type f | sort -n | while IFS= read -r line; do
      [ -e "$line" ] || {
        echo "no such file $line"
        exit 1
      }
      output=$(replace_once "$line" "$lowdir" "$degradedir")
      [ -e "$output" ] &&  {
        continue
      }
      if [ -z "$(echo "$line" | grep -E ".wav$")" ]; then
        #cp -v "$line" "$output"
        continue
      fi
      echo "degrade '$line' -> '$output'"
      tmp="/tmp/jsut_$(basename "$output")"
      ./degrade_audio.py "$line" "$tmp"
      mv "$tmp" "$output"
    done
    

    Then I did the same with the JVS dataset, but restructured it so that the *.wav files match the */*.wav pattern (about 15k files).

    In configs/train/melspec/ssl_jsut.yaml I changed:

      source_path: "./data/jsut_22k-low/basic5000/wav"
      aux_path: "./data/jsut_22k/basic5000/wav"
    

    Running this seems to generate a lot of pickles for 5000+14997 files (changing jsut:

    python3 preprocess.py --config_path configs/train/melspec/ssl_jsut.yaml
    

    Then running

    env python3 train.py \
        --config_path configs/train/melspec/ssl_jsut.yaml \
        --stage ssl-dual \
        --run_name ssl_melspec_dual
    

    Produces "index 0 not found" errors in the dataset, e.g:

      File "./ssl_speech_restoration/dataset.py", line 205, in __getitem__
        d_batch["wavstask"] = torch.from_numpy(self.d_out["wavstask"][idx])
    IndexError: index 0 is out of bounds for axis 0 with size 0
    

    Changing ssl-dual into pretrain produces some "augment key not found" error.

    What would be the correct pipeline? Is there something I could try to make it train?

    Thanks

    opened by theoden8 5
  • RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False

    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/gradio/routes.py", line 275, in predict
    	output = await app.blocks.process_api(body, username, session_state)
      File "/usr/local/lib/python3.8/dist-packages/gradio/blocks.py", line 274, in process_api
    	predictions = await run_in_threadpool(block_fn.fn, *processed_input)
      File "/usr/local/lib/python3.8/dist-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    	return await anyio.to_thread.run_sync(func, *args)
      File "/usr/local/lib/python3.8/dist-packages/anyio/to_thread.py", line 31, in run_sync
    	return await get_asynclib().run_sync_in_worker_thread(
      File "/usr/local/lib/python3.8/dist-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    	return await future
      File "/usr/local/lib/python3.8/dist-packages/anyio/_backends/_asyncio.py", line 867, in run
    	result = context.run(func, *args)
      File "/usr/local/lib/python3.8/dist-packages/gradio/interface.py", line 500, in <lambda>
    	lambda *args: self.run_prediction(args)[0]
      File "/usr/local/lib/python3.8/dist-packages/gradio/interface.py", line 682, in run_prediction
    	prediction = predict_fn(*processed_input)
      File "aet_demo.py", line 60, in transfer
    	src_model = SSLDualLightningModule(config).load_from_checkpoint(
      File "/root/ssl_speech_restoration/lightning_module.py", line 623, in __init__
    	super().__init__(config)
      File "/root/ssl_speech_restoration/lightning_module.py", line 307, in __init__
    	self.vocoder = load_vocoder(config)
      File "/root/ssl_speech_restoration/utils.py", line 44, in load_vocoder
    	vocoder.load_state_dict(torch.load(config["general"]["hifigan_path"])["generator"])
      File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 608, in load
    	return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
      File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 787, in _legacy_load
    	result = unpickler.load()
      File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 743, in persistent_load
    	deserialized_objects[root_key] = restore_location(obj, location)
      File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 175, in default_restore_location
    	result = fn(storage, location)
      File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 151, in _cuda_deserialize
    	device = validate_cuda_device(location)
      File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 135, in validate_cuda_device
    	raise RuntimeError('Attempting to deserialize object on a CUDA '
    RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
    

    This happens when I upload a sample file in Spanish.

    no-issue-activity 
    opened by johnfelipe 3
  • No versions in requirements.txt

    Hello. Thanks for publishing your code and checkpoints 😃

    I've come across the following error

    dataset.py:145: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
    

    Although this warning disappears when you add dtype=object, I came across another problem later on and was unable to get the system running.

    My suggestion is to add version numbers for each dependency in requirements.txt. That way, we can know which versions of each library form a working solution, and the code will continue to work in the future after libraries have changed.

    opened by chrisbaume 1
  • quality of restored speech not good

    Hi

    I tried the Hugging Face demo on my wav file, but the quality is not good. Is it because the vocoder was trained on a Japanese corpus? Is there a general speech restoration model?

    opened by sciai-ai 1
Owner
Takaaki Saeki
Ph.D. Student @ UTokyo / Spoken Language Processing