PercepNet implemented using Keras; it still needs to be optimized and tuned.

Overview
PercepNet (still needs to be tuned)

Unofficial implementation of PercepNet: A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech

https://www.researchgate.net/publication/343568932_A_Perceptually-Motivated_Approach_for_Low-Complexity_Real-Time_Enhancement_of_Fullband_Speech

Compared with https://github.com/jzi040941/PercepNet, this version is implemented using Keras.

----------------------------------------------------------
Because GitHub's file size limit is 100 MB, rnn_data.c is compressed as rnn_data.c.tgz.
This file needs to be extracted before compiling:
% cd src
% tar -xzvf rnn_data.c.tgz
% cd ..


To compile, just type:
% ./autogen.sh
% ./configure
% make

A simple command-line tool is
provided as an example. It operates on RAW 16-bit (machine endian) mono
PCM files sampled at 48 kHz. It can be used as:

./examples/rnnoise_demo <noisy speech> <output denoised>

The output is also a 16-bit raw PCM file.
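
If your test clips are WAV files, you need to strip the WAV header to get raw PCM first, and wrap the
denoised raw output back into a WAV file to listen to it. Below is a minimal Python sketch using the
standard wave module; it assumes the input WAV is already 48 kHz, mono, 16-bit (no resampling is done),
and the file names are just placeholders.

# wav_raw.py -- convert between 48 kHz mono 16-bit WAV and the raw PCM
# format expected by examples/rnnoise_demo (no resampling is performed).
import sys
import wave

def wav_to_raw(wav_path, raw_path):
    with wave.open(wav_path, "rb") as w:
        assert w.getnchannels() == 1 and w.getsampwidth() == 2 and w.getframerate() == 48000
        pcm = w.readframes(w.getnframes())
    with open(raw_path, "wb") as f:
        f.write(pcm)

def raw_to_wav(raw_path, wav_path, rate=48000):
    with open(raw_path, "rb") as f:
        pcm = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)        # 16-bit samples
        w.setframerate(rate)
        w.writeframes(pcm)

if __name__ == "__main__":
    # usage: python wav_raw.py noisy.wav noisy.raw
    wav_to_raw(sys.argv[1], sys.argv[2])

For example: python wav_raw.py noisy.wav noisy.raw, run the demo on noisy.raw, then use raw_to_wav to listen to the result.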

------------------------------------------------------------

How to train:
(change to the src subdirectory; this assumes the clean and noise file directories are in ~/DNS-Challenge/datasets/rnnoise3/)
cd ~/percepnet/src
./denoise_training ~/DNS-Challenge/datasets/rnnoise3/clean  ~/DNS-Challenge/datasets/rnnoise3/noise 80000000 training.f32

(change to the training subdirectory)
cd ../training 
python bin2hdf5.py ../src/training.f32 80000000 138 training.h5
python rnn_train.py
python dump_rnn_float.py weights.hdf5 rnn_data.c rnn_data.h orig
cp rnn_data.c ../src/   
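
For reference, the bin2hdf5.py step above essentially reshapes the raw float32 dump written by
denoise_training into a (frames x features) matrix and stores it as HDF5. A rough sketch of that
conversion is shown below, assuming 138 float32 values per frame as in the command above; the
dataset name "data" and the variable names are illustrative, not necessarily the script's own.

# Rough equivalent of the bin2hdf5.py step above (sketch, not the actual script).
import numpy as np
import h5py

nb_frames = 80000000       # first numeric argument passed to bin2hdf5.py above
nb_features = 138          # second numeric argument: float32 values per frame

data = np.fromfile("../src/training.f32", dtype="float32")
data = data[: nb_frames * nb_features].reshape(-1, nb_features)

with h5py.File("training.h5", "w") as f:
    f.create_dataset("data", data=data)   # dataset name is an assumption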

(change to the percepnet directory)
cd ~/percepnet/
make clean
make

(change to the examples subdirectory)
cd examples 
./rnnoise_demo test2.raw test2_denoised.raw

----------------------------------------------------------------
More:
The performance of this version needs further optimization and tuning.
In some cases it is worse than RNNoise.
Any comments on how to optimize/tune it are welcome.

test_gr in src/ is used to test the classical processing path: it applies computed g and r values
directly (not taken from the deep learning model) to check whether g and r work or not.
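
For intuition, applying the per-band gains g in this classical path boils down to interpolating the
band gains onto FFT bins and scaling the spectrum (the pitch-filter strengths r act per band on the
pitch-delayed signal in a similar way). The Python sketch below shows only the gain-interpolation
idea with made-up band edges; it is a simplified illustration, not the ERB layout or the exact code
in src/.

import numpy as np

def apply_band_gains(X, gains, band_edges):
    # X: complex FFT bins; gains: one gain per band;
    # band_edges: start bin of each band plus the end (len(gains)+1 entries,
    # last entry == len(X)). Linear interpolation between neighbouring bands.
    g = np.zeros(len(X))
    for b in range(len(gains)):
        lo, hi = band_edges[b], band_edges[b + 1]
        width = max(hi - lo, 1)
        nxt = gains[b + 1] if b + 1 < len(gains) else gains[b]
        for j in range(hi - lo):
            frac = j / width
            g[lo + j] = (1 - frac) * gains[b] + frac * nxt
    return X * g

# Toy example: 4 bands over the 16-bin half spectrum of a length-30 signal.
X = np.fft.rfft(np.random.randn(30))
Y = apply_band_gains(X, [1.0, 0.5, 0.2, 0.8], [0, 4, 8, 12, 16])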

The overall framework is based on https://github.com/xiph/rnnoise,
and the speech signal processing code is from https://github.com/jzi040941/PercepNet.
Compared with RNNoise, the training data is standardized, and the weights are kept as float (not quantized) when converted to rnn_data.c.
During training, clip_norm is set to 0.1; otherwise the loss becomes NaN.
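
As a hedged illustration of those two points (not the exact code in rnn_train.py), feature
standardization and gradient-norm clipping in Keras look roughly like this; the array shape and
the choice of the Adam optimizer are assumptions.

# Sketch: standardize input features and clip the gradient norm so the loss stays finite.
import numpy as np
import tensorflow as tf

# Stand-in for the real feature matrix loaded from training.h5.
x_train = np.random.randn(1000, 138).astype("float32")

# Standardize each feature dimension to zero mean / unit variance.
mean = x_train.mean(axis=0)
std = x_train.std(axis=0) + 1e-8
x_train = (x_train - mean) / std

# Clip the global gradient norm to 0.1; without this the loss can become NaN.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=0.1)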

The training data are from:
https://github.com/microsoft/DNS-Challenge

The WAV file processing code (wav.h, wav.c) is from https://faculty.fiu.edu/~wgillam/wavfiles.html
Comments
  • Confusion regarding 2 different rnn_data.h

    Hi cookcodes,

    Did you use the self-generated rnn_data.h when running this project?

    Recently I tried to build and train the Keras model myself, and when I dumped the model using dump_rnn_float.py, I found that the rnn_data.h generated by the command python dump_rnn_float.py weights.hdf5 rnn_data.c rnn_data.h orig is quite different from the rnn_data.h in /src. Apparently the one in /src looks like the correct one.

    So I was wondering what's the purpose/use of this self-generated rnn_data.h? Did you use this file in this project? Thanks!

    opened by OscarLiau 2
  • Any recommendation for deploying the PercepNet model to a web browser?

    Thank you for the very useful source code. I'm a novice web developer, but I need to deploy your PercepNet model in the web browser (client-side). What are the feasible ways to do that?

    opened by tungedng2710 0
  • NaN error raised while making data using 16 kHz speech/noise

    Hi, thanks for your work. I am trying to train with 16 kHz data (DNS). I modified the upper frequency in erbband.c from 20000 to 8000 and changed the frame size from 480 to 160. I can successfully compile the code using compile.sh and run this command: ./denoise_training ~/speech_folder ~/noise_folder 100 training.f32

    But I encounter a NaN problem when creating the dataset; the log is shown below:

    alpha[30] is NAN, a= nan, r[30] is NAN
    alpha[31] is NAN, a= nan, r[31] is NAN
    alpha[32] is NAN, a= nan, r[32] is NAN
    alpha[33] is NAN, a= nan, r[33] is NAN
    Ephaty[30] is Nan detected. r[30] is Nan detected
    Ephaty[31] is Nan detected

    Is there anything I missed? Thank you.

    opened by aaronhsueh0506 0
  • Hi, a question about compute_band_energy()

    compute_band_energy() in PercepNet looks like this:

    ERBBand *erb_band = new ERBBand(WINDOW_SIZE, NB_BANDS-2, 0, 20000);

    void compute_band_energy(float *bandE, const kiss_fft_cpx *X) {
      int i;
      float sum[NB_BANDS] = {0};
      for (i=0;i<NB_BANDS;i++) {
        int j;
        int band_size;
        band_size = (erb_band->nfftborder[i+1] - erb_band->nfftborder[i]);
        for (j=0;j<band_size;j++) {
          float tmp;
          float frac = (float)j/band_size;
          tmp = SQUARE(X[(erb_band->nfftborder[i]) + j].r);
          tmp += SQUARE(X[(erb_band->nfftborder[i]) + j].i);
          sum[i] += (1-frac)*tmp;
          sum[i+1] += frac*tmp;
        }
        /* //ERBBand cosfilter is not working in interp_gain
        int low_nfft_idx = erb_band->filters[i].first.first;
        int high_nfft_idx = erb_band->filters[i].first.second;
        for(j=low_nfft_idx; j<high_nfft_idx; j++){
          float tmp;
          tmp = SQUARE(X[j].r);
          tmp += SQUARE(X[j].i);
          sum[i] += tmp*erb_band->filters[i].second[j-low_nfft_idx];
        }
        */
      }
      sum[0] *= 2;
      sum[NB_BANDS-1] *= 2;
      for (i=0;i<NB_BANDS;i++) {
        bandE[i] = sum[i];
      }
    }

    and yours looks like this:

    void compute_band_energy(float *bandE, const kiss_fft_cpx *X) {
      int i;
      float sum[NB_BANDS] = {0};
      for (i=0;i<NB_BANDS;i++) {
        int j;
        int low_nfft_idx = erbweights[i].begin;
        int high_nfft_idx = erbweights[i].end;
        for(j=low_nfft_idx; j<high_nfft_idx; j++){
          float tmp;
          tmp = SQUARE(X[j].r);
          tmp += SQUARE(X[j].i);
          sum[i] += tmp*erbweights[i].weights[j-low_nfft_idx];
        }
      }
      for (i=0;i<NB_BANDS;i++) {
        bandE[i] = sum[i];
      }
    }

    My question is: the comment in PercepNet's compute_band_energy() says that the ERBBand cos filter is not working in interp_gain. Is that correct? If it is correct, why do you still use the cos filter? Thanks very much.

    opened by YangangCao 4
  • Pitch filtering question

    Hi cookcodes! Thanks for sharing your code with us! In the PercepNet paper, there is a scale bands module after pitch filtering which uses the pitch-enhanced signal z and the postfilter output Xb_hat. However, there are few details about this module in the paper. I want to know your idea about the scale bands module. By the way, could you please share the MSE values of gb and rb after convergence in your experiments?

    opened by qijiajun 2
  • Segmentation fault when extracting features and error raised when training

    Hi, thanks for your work. I encountered two problems:

    1. I set count to 10000000 and used the original 48 kHz speech and noise as the dataset. The output looks like this:
    x_lp[0] is Nan detected after pitch_downsample, so not filtered. total count:1
    x_lp[0] is Nan detected after pitch_downsample, so not filtered. total count:2
    ...
    x_lp[0] is Nan detected after pitch_downsample, so not filtered. total count:87
    Segmentation fault (core dumped)
    

    I retried several times; the segmentation fault occurs every time after the total count reaches 87. I have no idea about that. When I set count to 1000000 it works normally, so I used this config to train the model and encountered the second problem.

    2. When I run rnn_train.py (tensorflow-gpu 2.5.0), it raises an error like this:
    Traceback (most recent call last):
      File "rnn_train.py", line 206, in <module>
        callbacks=[checkpoint_cb],
      File "/home/edev/.local/lib/python3.6/site-packages/keras/engine/training.py", line 1122, in fit
        steps_per_execution=self._steps_per_execution)
      File "/home/edev/.local/lib/python3.6/site-packages/keras/engine/data_adapter.py", line 1348, in get_data_handler
        return DataHandler(*args, **kwargs)
      File "/home/edev/.local/lib/python3.6/site-packages/keras/engine/data_adapter.py", line 1136, in __init__
        adapter_cls = select_data_adapter(x, y)
      File "/home/edev/.local/lib/python3.6/site-packages/keras/engine/data_adapter.py", line 978, in select_data_adapter
        _type_name(x), _type_name(y)))
    ValueError: Failed to find data adapter that can handle input: <class '__main__.CustomDataGen'>, <class 'NoneType'>
    

    Can you please tell me your TensorFlow version?

    Update: I changed some code and fixed the second problem. However, only the CPU can be used for training; when I use the GPU, it raises the error: Unknown: CUDNN_STATUS_BAD_PARAM

    Can you please tell me whether you can train with a GPU?

    opened by YangangCao 2
  • performance

    1. As you mention that performance is sometimes worse than RNNoise, can you post some examples?
    2. Is the DNS-Challenge dataset the master branch's fullband data, interspeech2020, or interspeech2021? Is there any evaluation?
    opened by deyituo 1