PercepNet (still needs to be tuned)

Unofficial implementation of PercepNet: A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech
https://www.researchgate.net/publication/343568932_A_Perceptually-Motivated_Approach_for_Low-Complexity_Real-Time_Enhancement_of_Fullband_Speech

Compared with https://github.com/jzi040941/PercepNet, this version is implemented using Keras.

----------------------------------------------------------

Because GitHub limits file size to 100 MB, rnn_data.c is compressed to rnn_data.c.tgz. This file needs to be extracted before compiling:

% cd src
% tar -xzvf rnn_data.c.tgz
% cd ..

To compile, just type:

% ./autogen.sh
% ./configure
% make

A simple command-line tool is provided as an example. It operates on RAW 16-bit (machine endian) mono PCM files sampled at 48 kHz. It can be used as:

./examples/rnnoise_demo <noisy speech> <output denoised>

The output is also a 16-bit raw PCM file.

------------------------------------------------------------

How to train:

(change to the src subdirectory; this assumes the clean and noise directories are under ~/DNS-Challenge/datasets/rnnoise3/)

cd ~/percepnet/src
./denoise_training ~/DNS-Challenge/datasets/rnnoise3/clean ~/DNS-Challenge/datasets/rnnoise3/noise 80000000 training.f32

(change to the training subdirectory)

cd ../training
python bin2hdf5.py …/src/training.f32 80000000 138 training.h5
python rnn_train.py
python dump_rnn_float.py weights.hdf5 rnn_data.c rnn_data.h orig
cp rnn_data.c ../src/

(change to the percepnet directory)

cd ~/percepnet/
make clean
make

(change to the examples subdirectory)

cd examples
./rnnoise_demo test2.raw test2_denoised.raw

----------------------------------------------------------------

More:

The performance of this version needs further optimization and tuning. In some cases, it is worse than RNNoise. Any comments on how to optimize/tune are welcome.
test_gr in src/ is used to test the classical processing: it uses the computed g and r (not the ones from the deep learning model) directly to check whether g and r work or not.

The overall framework is based on https://github.com/xiph/rnnoise, and the speech signal processing code is from https://github.com/jzi040941/PercepNet.

Compared with RNNoise, the training data is standardized, and floats (not quantized values) are used when converting to rnn_data.c. During training, clip_norm is set to 0.1; otherwise the loss becomes NaN.

The training data are from https://github.com/microsoft/DNS-Challenge

The WAV file processing code (wav.h, wav.c) is from https://faculty.fiu.edu/~wgillam/wavfiles.html
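Since the demo tool reads headerless 16-bit mono PCM at 48 kHz, a WAV file has to be stripped of its header first. The following is only a minimal sketch of one way to do that with Python's standard library; it is not part of this repository, and the function name wav_to_raw and the file names are made up for illustration. It assumes the input is already 48 kHz, mono, 16-bit and does no resampling.

```python
import sys
import wave
from array import array


def wav_to_raw(wav_path, raw_path):
    """Write the samples of a 48 kHz mono 16-bit WAV as headerless PCM.

    Hypothetical helper, not part of the repository.
    Returns the number of samples written.
    """
    with wave.open(wav_path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getframerate() != 48000:
            raise ValueError("expected a 48 kHz sample rate")
        frames = wav.readframes(wav.getnframes())

    samples = array("h")
    samples.frombytes(frames)
    # WAV stores samples little-endian; the demo expects machine endian,
    # so swap bytes only when running on a big-endian host.
    if sys.byteorder == "big":
        samples.byteswap()

    with open(raw_path, "wb") as out:
        samples.tofile(out)
    return len(samples)
```

Usage would look like wav_to_raw("noisy.wav", "noisy.raw"), after which noisy.raw can be passed to ./examples/rnnoise_demo as the input file.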
PercepNet implemented using Keras; it still needs to be optimized and tuned.
Comments
-
Confusion regarding two different rnn_data.h files
Hi cookcodes,
Did you use the self-generated `rnn_data.h` when running this project? Recently I tried to build and train the Keras model myself, and when I dumped the model using `dump_rnn_float.py`, I found that the rnn_data.h generated by the command `python dump_rnn_float.py weights.hdf5 rnn_data.c rnn_data.h orig` is quite different from the rnn_data.h in /src. Apparently the one in /src looks much more like the correct one. So I was wondering: what is the purpose of this self-generated `rnn_data.h`? Did you use this file in this project? Thanks!
-
Any recommendations for deploying the PercepNet model to a web browser?
Thank you for the very useful source code. I'm a noob web developer, but I need to deploy your PercepNet model in my web browser (client-side). What are the feasible ways to do that?
-
NaN error while generating data from 16 kHz speech/noise
Hi, thanks for your work. I am trying to use 16 kHz data (DNS) for training. I changed the high frequency in erbband.c from 20000 to 8000 and the frame size from 480 to 160. The code compiles successfully with compile.sh, and I ran this command:
./denoise_training ~/speech_folder ~/noise_folder 100 training.f32
But I ran into NaN problems when creating the dataset. The log is shown below:
alpha[30] is NAN, a= nan, r[30] is NAN
alpha[31] is NAN, a= nan, r[31] is NAN
alpha[32] is NAN, a= nan, r[32] is NAN
alpha[33] is NAN, a= nan, r[33] is NAN
Ephaty[30] is Nan detected.
r[30] is Nan detected
Ephaty[31] is Nan detected
Is there anything I missed? Thank you
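One way to narrow down a report like this is to check which frames of the generated feature file actually contain NaN values. The sketch below is only an illustration under two assumptions that are not confirmed by the source: that training.f32 is a flat array of native-endian 32-bit floats, and that each frame holds 138 values (the width passed to bin2hdf5.py in the README). The function name find_nan_frames is hypothetical.

```python
import math
from array import array


def find_nan_frames(path, frame_width=138, max_report=10):
    """Return indices of the first frames in a flat float32 file that contain NaN.

    Assumes native-endian float32 data with frame_width values per frame;
    both are assumptions about the feature file, not documented facts.
    """
    bad = []
    frame_bytes = frame_width * 4  # 4 bytes per float32
    with open(path, "rb") as f:
        index = 0
        while len(bad) < max_report:
            chunk = f.read(frame_bytes)
            if len(chunk) < frame_bytes:
                break  # end of file (or a trailing partial frame)
            values = array("f")
            values.frombytes(chunk)
            if any(math.isnan(v) for v in values):
                bad.append(index)
            index += 1
    return bad
```

Calling find_nan_frames("training.f32") returns up to ten frame indices; an empty list means no NaN was found in the complete frames of the file.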
-
Hi, a question about compute_band_energy()
compute_band_energy() in PercepNet looks like this:
```cpp
ERBBand *erb_band = new ERBBand(WINDOW_SIZE, NB_BANDS-2, 0, 20000);

void compute_band_energy(float *bandE, const kiss_fft_cpx *X) {
  int i;
  float sum[NB_BANDS] = {0};
  for (i=0;i<NB_BANDS;i++)
  {
    int j;
    int band_size;
    band_size = (erb_band->nfftborder[i+1]-erb_band->nfftborder[i]);
    for (j=0;j<band_size;j++) {
      float tmp;
      float frac = (float)j/band_size;
      tmp = SQUARE(X[(erb_band->nfftborder[i]) + j].r);
      tmp += SQUARE(X[(erb_band->nfftborder[i]) + j].i);
      sum[i] += (1-frac)*tmp;
      sum[i+1] += frac*tmp;
    }
    /**
    //ERBBand cosfilter is not working in interp_gain
    int low_nfft_idx = erb_band->filters[i].first.first;
    int high_nfft_idx = erb_band->filters[i].first.second;
    for(j=low_nfft_idx; j<high_nfft_idx; j++){
      float tmp;
      tmp = SQUARE(X[j].r);
      tmp += SQUARE(X[j].i);
      sum[i] += tmp*erb_band->filters[i].second[j-low_nfft_idx];
    }
    **/
  }
  sum[0] *= 2;
  sum[NB_BANDS-1] *= 2;
  for (i=0;i<NB_BANDS;i++)
  {
    bandE[i] = sum[i];
  }
}
```
and yours looks like this:
```cpp
void compute_band_energy(float *bandE, const kiss_fft_cpx *X) {
  int i;
  float sum[NB_BANDS] = {0};
  for (i=0;i<NB_BANDS;i++)
  {
    int j;
    int low_nfft_idx = erbweights[i].begin;
    int high_nfft_idx = erbweights[i].end;
    for(j=low_nfft_idx; j<high_nfft_idx; j++){
      float tmp;
      tmp = SQUARE(X[j].r);
      tmp += SQUARE(X[j].i);
      sum[i] += tmp*erbweights[i].weights[j-low_nfft_idx];
    }
  }

  for (i=0;i<NB_BANDS;i++)
  {
    bandE[i] = sum[i];
  }
}
```
My question is: the comment in PercepNet's compute_band_energy() says that the ERBBand cos filter is not working in interp_gain. Is that correct? If it is, why do you still use the cos filter? Thanks very much.
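To make the difference between the two snippets easier to see, here is a small Python sketch of both accumulation schemes. It is an illustration only, not code from either repository: the function names, the border list, and the filter tuples are hypothetical, and the first function loops over adjacent border pairs, which is an assumption about the intended loop bounds. The first scheme splits each bin's power linearly between two neighbouring bands; the second multiplies each bin's power by a precomputed per-band weight.

```python
def band_energy_triangular(spectrum, borders):
    """Split each bin's power linearly between band i and band i+1.

    spectrum: list of complex FFT bins.
    borders:  increasing bin indices, one per band (hypothetical values).
    """
    nb = len(borders)
    sums = [0.0] * nb
    for i in range(nb - 1):
        size = borders[i + 1] - borders[i]
        for j in range(size):
            frac = j / size
            x = spectrum[borders[i] + j]
            power = x.real * x.real + x.imag * x.imag
            sums[i] += (1.0 - frac) * power
            sums[i + 1] += frac * power
    # The first and last bands only receive half a triangle, so double them.
    sums[0] *= 2.0
    sums[-1] *= 2.0
    return sums


def band_energy_weighted(spectrum, filters):
    """Weight each bin's power by a precomputed filter response.

    filters: list of (begin, end, weights) tuples, one per band, where
             weights[k] applies to bin begin + k (hypothetical layout).
    """
    energies = []
    for begin, end, weights in filters:
        total = 0.0
        for j in range(begin, end):
            x = spectrum[j]
            total += (x.real * x.real + x.imag * x.imag) * weights[j - begin]
        energies.append(total)
    return energies
```

With a flat spectrum such as [1+0j] * 8 and borders [0, 4, 8], the triangular version gives [5.0, 4.0, 3.0]. The weighted version gives whatever the supplied filter shapes dictate, so the two only agree when the weights reproduce the same triangles.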
-
Pitch filtering question
Hi cookcodes! Thanks for sharing your code with us! In the PercepNet paper, there is a scale bands module after pitch filtering which uses the pitch-enhanced signal z and the postfilter output Xb_hat. However, the paper gives few details about this module. I would like to know how you think the scale bands module should work. By the way, could you please share the msse values of gb and rb after convergence in your experiments?
-
Segmentation fault when extracting features and an error when training
Hi, thanks for your work. I ran into two problems:
- I set count to 10000000 and used the original 48 kHz speech and noise as the dataset. The output looks like this:
x_lp[0] is Nan detected after pitch_downsample, so not filtered. total count:1
x_lp[0] is Nan detected after pitch_downsample, so not filtered. total count:2
x_lp[0] is Nan detected after pitch_downsample, so not filtered. total count:3
[the same message repeats, with total count increasing from 4 to 86]
x_lp[0] is Nan detected after pitch_downsample, so not filtered. total count:87
Segmentation fault (core dumped)
I retried several times, and the segmentation fault occurs every time after total count reaches 87. I have no idea why. When I set count to 1000000, it works normally, so I used that setting to train the model and ran into the second problem.
- When I run rnn_train.py (tensorflow-gpu 2.5.0), it raises an error like this:
Traceback (most recent call last):
  File "rnn_train.py", line 206, in <module>
    callbacks=[checkpoint_cb],
  File "/home/edev/.local/lib/python3.6/site-packages/keras/engine/training.py", line 1122, in fit
    steps_per_execution=self._steps_per_execution)
  File "/home/edev/.local/lib/python3.6/site-packages/keras/engine/data_adapter.py", line 1348, in get_data_handler
    return DataHandler(*args, **kwargs)
  File "/home/edev/.local/lib/python3.6/site-packages/keras/engine/data_adapter.py", line 1136, in __init__
    adapter_cls = select_data_adapter(x, y)
  File "/home/edev/.local/lib/python3.6/site-packages/keras/engine/data_adapter.py", line 978, in select_data_adapter
    _type_name(x), _type_name(y)))
ValueError: Failed to find data adapter that can handle input: <class '__main__.CustomDataGen'>, <class 'NoneType'>
Can you please tell me your TensorFlow version?
Update: I changed some code and fixed the second problem. However, training only works on the CPU; when I use the GPU, this error is raised:
Unknown: CUDNN_STATUS_BAD_PARAM
Can you please tell me whether you were able to train on a GPU?
-
Performance
- You mention that in some cases the performance is worse than RNNoise. Can you post some examples?
- Which DNS-Challenge dataset did you use: the master branch's fullband data, interspeech2020, or interspeech2021? Is there any evaluation?