kaldi-asr/kaldi is the official location of the Kaldi project.

Overview

Kaldi Speech Recognition Toolkit

To build the toolkit: see ./INSTALL. These instructions are valid for UNIX systems including various flavors of Linux; Darwin; and Cygwin (has not been tested on more "exotic" varieties of UNIX). For Windows installation instructions (excluding Cygwin), see windows/INSTALL.

To run the example system builds, see egs/README.txt

If you encounter problems (and you probably will), please do not hesitate to contact the developers (see below). In addition to specific questions, please let us know if there are specific aspects of the project that you feel could be improved, that you find confusing, etc., and which missing features you most wish it had.

Kaldi information channels

For HOT news about Kaldi see the project site.

Documentation of Kaldi:

  • Info about the project, description of techniques, tutorial for C++ coding.
  • Doxygen reference of the C++ code.

Kaldi forums and mailing lists:

We have two different lists:

  • User list kaldi-help
  • Developer list kaldi-developers

To sign up to any of those mailing lists, go to http://kaldi-asr.org/forums.html.

Development pattern for contributors

  1. Create a personal fork of the main Kaldi repository in GitHub.
  2. Make your changes in a named branch different from master, e.g. my-awesome-feature.
  3. Generate a pull request through the Web interface of GitHub.
  4. As a general rule, please follow the Google C++ Style Guide. There are a few exceptions in Kaldi. You can use Google's cpplint.py to verify that your code is free of basic mistakes.

Platform specific notes

PowerPC 64bits little-endian (ppc64le)

Android

  • Kaldi supports cross compiling for Android using Android NDK, clang++ and OpenBLAS.
  • See this blog post for details.
Issues
  • WIP: Multi-database English LVCSR recipe

    See #699.

    opened by guoguo12 71
  • Wake-word detection

    Results of the regular LF-MMI based recipes:

    Mobvoi: EER=~0.2%, FRR=1.02% at FAH=1.5 vs. FRR=3.8% at FAH=1.5 (Mobvoi paper)

    SNIPS: EER=~0.1%, FRR=0.08% at FAH=0.5 vs. FRR=0.12% at FAH=0.5 (SNIPS paper)

    E2E LF-MMI recipes are still being run to confirm the reproducibility of the previous results.
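
    As a reminder of how these metrics are defined, here is a minimal sketch; the counts below are invented for illustration and are not from the Mobvoi/SNIPS results:

```python
def frr(num_missed, num_positive):
    """False Rejection Rate: fraction of true wake-word examples missed."""
    return num_missed / num_positive

def fah(num_false_alarms, audio_hours):
    """False Alarms per Hour of audio."""
    return num_false_alarms / audio_hours

# e.g. 102 misses out of 10000 positives gives FRR = 1.02%,
# and 30 false alarms over 20 hours gives FAH = 1.5
print(f"FRR = {100 * frr(102, 10000):.2f}%")  # FRR = 1.02%
print(f"FAH = {fah(30, 20.0):.1f}")           # FAH = 1.5
```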

    opened by freewym 67
  • show L2 norm of parameters during training.

    In addition, set affine to false for batchnorm layers and switch to SGD optimizer.

    The training is still running and a screenshot of the L2-norms of the training parameters is as follows:

    [screenshot: L2 norms of the training parameters, 2020-02-12 09:05]

    I will post the decoding results once it is done.
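
    The quantity being plotted is just the global L2 norm over all model parameters; a minimal sketch (the parameter values are made up for illustration):

```python
import math

# Each inner list stands in for one flattened parameter tensor of the network.
params = [[3.0, 3.0, 3.0, 3.0],   # e.g. a small weight matrix, flattened
          [4.0, 4.0, 4.0, 4.0]]   # e.g. a bias vector

def global_l2_norm(tensors):
    """sqrt of the sum of squared entries over every parameter tensor."""
    return math.sqrt(sum(x * x for t in tensors for x in t))

print(global_l2_norm(params))  # -> 10.0
```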

    opened by csukuangfj 67
  • Multilingual using modified configs

    This is a modified multilingual setup based on the new xconfig and training scripts. In this setup, xconfig is used to create the network configuration for multilingual training. The egs generation has also been moved out of the training script, and the multilingual egs dir is passed to train_raw_dnn.py. A new script has been added for average posterior computation and prior adjustment.

    opened by pegahgh 65
  • CUDA context creation problem in nnet3 training with "--use-gpu=wait" option

    I am not sure if this is a Kaldi issue but I thought someone might have an idea.

    First some context. I am trying to tune a few TDNN chain models on a workstation with 2 Maxwell Titan X 12GB cards. The data sets I am working with are fairly small (Babel full language packs with 40-80 hours audio). Initially I set the number of initial and final training jobs to 2 and trained the models with scripts adapted from babel and swbd recipes. While this worked without any issues, I noticed that the models were overtraining, so I tried tuning relu-dim, number of epochs and xent-regularize with one of the language packs to see if I could get a better model. Eventually the best model I got was with a single epoch and xent-regularize=0.25 (WER base model: 45.5% vs best model: 41.4%).

    To see if the training schedule might have any further effects on the model performance, I also tried training with --num-jobs-initial=2, --num-jobs-final=8 after setting the GPUs to "default" compute mode to allow the creation of multiple CUDA contexts. I added 2 seconds delay between individual jobs so that earlier jobs would start allocating device memory before a new job is scheduled on the device with the largest amount of free memory. This mostly worked fine, except towards the end when 8 jobs were distributed 5-3 between the two cards. The resulting model had 40.9% WER after 2 epochs and the log probability difference between the train and validation sets was also smaller than before. It seems like the training schedule (number of jobs, learning rate, etc. at each iteration) has an effect on the model performance in this small data scenario. Maybe averaging gradients across a larger number of jobs is beneficial, or the learning rate schedule is somehow tuned for this type of training schedule.

    Now the actual problem. Since large number of jobs seemed to work better for me, I wanted to remove the job delay hack, set GPUs back to "exclusive process" compute mode and take advantage of the --use-gpu=wait option while scheduling the training jobs. However, it seems like I am missing something. If I launch multiple training processes with the --use-gpu=wait option while GPUs are in "exclusive process" compute mode, only one process can create a CUDA context on a given GPU card even after that one process completes. My expectation was that other processes would wait for the GPUs to be available and then one by one acquire the GPUs and complete their work. I added a few debug statements to GetCudaContext function to see what the problem was. cudaDeviceSynchronize call returns "all CUDA-capable devices are busy or unavailable" even after processes running on the GPUs are long gone. Any ideas?

    opened by dogancan 63
  • Modify TransitionModel for more compact chain-model graphs

    Placeholder for addressing #1031. WIP log:

    1. self_loop_pdf_class added to HmmState, done
    2. self_loop_pdf added to Tuple in TransitionModel. done
    3. another branch of ContextDependencyInterface::GetPdfInfo. ugly done
    4. create test code for new structures. done
    5. backward compatibility for all read code. done
    6. normal HMM validation using RM. done
    7. chain code modification. done
    8. chain validation using RM. done
    9. iterate 2nd version of GetPdfInfo. done
    10. documents and comments. tbd...
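
    Items 1-2 of the log can be pictured as follows; this is a toy illustration in Python, where only the field name self_loop_pdf comes from the log above and everything else is invented:

```python
from collections import namedtuple

# Toy stand-ins for the C++ structures being modified.
OldTuple = namedtuple("OldTuple", ["phone", "hmm_state", "pdf"])
NewTuple = namedtuple("NewTuple",
                      ["phone", "hmm_state", "forward_pdf", "self_loop_pdf"])

old = OldTuple(phone=10, hmm_state=0, pdf=42)
# Old code used one pdf for both the forward arc and the self-loop; the new
# tuple can keep them identical (backward compatibility) or let them differ.
new = NewTuple(phone=old.phone, hmm_state=old.hmm_state,
               forward_pdf=old.pdf, self_loop_pdf=old.pdf)
print(new.forward_pdf, new.self_loop_pdf)  # -> 42 42
```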
    opened by naxingyu 63
  • add PyTorch's DistributedDataParallel training.

    support distributed training across multiple GPUs.

    TODOs:

    • there are lots of code duplicates

    Part of the training log

    2020-02-19 13:55:10,646 INFO [ddp_train.py:160] Device (1) processing 1100/4724(23.285351%) global average objf: -0.225449 over 6165760.0 frames, current batch average objf: -0.130735 over 6400 frames, epoch 0
    2020-02-19 13:55:55,251 INFO [ddp_train.py:160] Device (0) processing 1200/4724(25.402202%) global average objf: -0.216779 over 6732672.0 frames, current batch average objf: -0.123979 over 3840 frames, epoch 0
    2020-02-19 13:55:55,252 INFO [ddp_train.py:160] Device (1) processing 1200/4724(25.402202%) global average objf: -0.216412 over 6738176.0 frames, current batch average objf: -0.132368 over 4736 frames, epoch 0
    

    The training seems to be working.
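
    As a side note, progress figures can be pulled out of log lines like the ones quoted above with a small script; the helper below is hypothetical and not part of this PR:

```python
import re

LINE = ("2020-02-19 13:55:10,646 INFO [ddp_train.py:160] Device (1) "
        "processing 1100/4724(23.285351%) global average objf: -0.225449 "
        "over 6165760.0 frames, current batch average objf: -0.130735 "
        "over 6400 frames, epoch 0")

PATTERN = re.compile(r"global average objf: (-?\d+\.\d+)")

def global_objf(line):
    """Extract the global average objf from one training-log line."""
    match = PATTERN.search(line)
    return float(match.group(1)) if match else None

print(global_objf(LINE))  # -> -0.225449
```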

    opened by csukuangfj 62
  • Is there any speaker diarization documentation and already trained model?

    Hi there, thanks for Kaldi :)

    I want to perform speaker diarization on a set of audio recordings. I believe Kaldi recently added the speaker diarization feature. I have managed to find this link; however, I have not been able to figure out how to use it since there is very little documentation. Also, may I ask, is there any already trained model on conversations in English that I can use off-the-shelf, please?

    Thanks a lot!

    opened by bwang482 61
  • WIP: add TDNNF to pytorch.

    We are trying to replace TDNN with TDNNF in kaldi pybind training with PyTorch.

    opened by csukuangfj 60
  • How do I train a PLDA model on custom data?

    You only need to use one of those PLDA models for your system. Also, if you have enough in-domain training data, you'll have better results training a new PLDA model. If your data is wideband microphone data, you might even have better luck using a different x-vector system, such as this one: http://kaldi-asr.org/models/m7. It was developed for speaker recognition, but it should work just fine for diarization as well.

    In the egs/callhome_diarization, we split the evaluation dataset into two halves so that we can use one half as a development set for the other half. Callhome is split into callhome1 and callhome2. We then train a PLDA backend (let's call it backend1) on callhome1, and tune the stopping threshold so that it minimizes the error on callhome1. Then backend1 is used to diarize callhome2. Next, we do the same thing for callhome2: backend2 is developed on callhome2, and evaluated on callhome1. The concatenation at the end is so that we can evaluate on the entire dataset. It doesn't matter that the two backends would assign different labels to different speakers, since they diarized different recordings.
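
    The two-fold scheme described above can be sketched as follows; train_backend() and diarize() are toy stand-ins for the real PLDA training and diarization steps, and the recording ids are invented:

```python
def train_backend(half):
    """Placeholder for training/tuning a PLDA backend on one half."""
    return {"trained_on": tuple(half)}

def diarize(backend, half):
    """Placeholder for diarizing one half with the other half's backend."""
    return list(half)

callhome1 = ["rec_a", "rec_b"]
callhome2 = ["rec_c", "rec_d"]

backend1 = train_backend(callhome1)   # developed/tuned on callhome1
backend2 = train_backend(callhome2)   # developed/tuned on callhome2
results = diarize(backend1, callhome2) + diarize(backend2, callhome1)
print(results)  # the concatenation covers the entire dataset
```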

    Regarding the short segment, I think the issue is that your SAD has determined that there's a speech segment from 24.99 to 25.43 and a separate speech segment starting at 25.51. It might be a good idea to smooth these SAD decisions earlier in the pipeline (e.g., in your SAD system itself) to avoid having adjacent segments with small gaps between them. Increasing the min-segment threshold might cause the diarization system to throw out this segment, but to me it seems preferable to keep it, and just merge it with the adjacent segment. But this stuff requires a lot of tuning to get right, and it's hard to say what the optimal strategy is without playing with the data myself.
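
    The gap-merging idea can be sketched like this; the helper and the 0.1 s threshold are hypothetical, not from the recipe:

```python
# Merge adjacent speech segments whose gap falls below a threshold.
def merge_close_segments(segments, max_gap=0.1):
    """segments: time-sorted (start, end) pairs in seconds."""
    merged = [segments[0]]
    for start, end in segments[1:]:
        prev_start, prev_end = merged[-1]
        if start - prev_end <= max_gap:   # small gap: extend previous segment
            merged[-1] = (prev_start, end)
        else:
            merged.append((start, end))
    return merged

# The segments from the discussion: 24.99-25.43 and one starting at 25.51.
print(merge_close_segments([(24.99, 25.43), (25.51, 26.00)]))
# -> [(24.99, 26.0)]
```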

    By the way, what is this "nasa_telescopes" dataset you're using?

    Originally posted by @david-ryan-snyder in https://github.com/kaldi-asr/kaldi/issues/2523#issuecomment-409597254

    opened by saumyaborwankar 0
  • remove sgmm2* from the bin target

    This will finally stop everything depending on portaudio. sgmm2 and online are in the ext depend target, and the way depends are compiled for ext depend will pull portaudio issues into normal compilation.

    opened by jtrmal 1
  • link is broken to the "Generating exact lattices in the WFST framework" paper

    so in here: https://www.kaldi-asr.org/doc/lattices.html the link to the "Generating exact lattices in the WFST framework" paper is broken.

    I guess the link should be updated to go here instead: https://www.microsoft.com/en-us/research/wp-content/uploads/2012/03/4213.pdf ?

    bug 
    opened by lordadamson 0
  • Issue in lattice determinization

    I believe (I hope this is not a duplicate) that there is an outstanding issue with lattice determinization which, as I point out in this conversation: https://groups.google.com/d/msgid/kaldi-help/18ffcc4a-9228-464f-90e3-389a7050bba3n%40googlegroups.com, was created in the following commit: ee8ed396ae905bb8296b6c43c3aa50e89f4da3b9. Unfortunately the code kind of needs cleaning up. It may simply be easier to revert that commit if we care about correctness; and if there is a speed issue, address it another way. But I wouldn't want to cause any speed regressions, so any fix may need testing for speed.

    bug 
    opened by danpovey 0
  • Kaldi feedstock update on conda-forge

    Hello,

    I've recently updated the kaldi feedstock for conda-forge, and I was wondering if any Kaldi maintainers were interested in being added as maintainers there as well.

    To get it working with the conda-forge system, I made some changes to the CMake build system. Some notes on these changes:

    • Kaldi on conda-forge doesn't link to libfst.so, but rather to libfst.16.so (to allow for installation beside the OpenFst conda-forge package which is 1.8.1)
    • On Windows, OpenFst binaries are included as well as Kaldi binaries (OpenFst on Conda doesn't support Windows)
    • Removed building of online and online2 on Windows, as I was getting some compiler errors
    • Added support for building fgmmbin on all platforms

    I can make a PR for the ones that would be helpful generally (i.e. fgmmbin included in the CMake build).

    discussion 
    opened by mmcauliffe 0
  • Allow override cxx/ar/ranlib when host is set

    Currently, when the HOST environment variable is set, CXX, AR and RANLIB are hardcoded. This change allows redefining them with environment variables, which follows default automake practice.
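
    The requested fallback behavior can be illustrated like this (sketched in Python for clarity; the actual change is in the Makefile, and the paths/values below are invented):

```python
# Use the tool from the environment when set, otherwise derive it from HOST
# the way the build currently hardcodes it.
def pick_tool(env, host, var, default_suffix):
    return env.get(var) or f"{host}-{default_suffix}"

env = {"HOST": "arm-linux-gnueabihf", "AR": "/opt/llvm/bin/llvm-ar"}
host = env["HOST"]
print(pick_tool(env, host, "CXX", "g++"))  # derived: arm-linux-gnueabihf-g++
print(pick_tool(env, host, "AR", "ar"))    # overridden: /opt/llvm/bin/llvm-ar
```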

    opened by nshmyrev 1
  • [Fisher] Issue on semisup/run_100k.sh

    Hi,

    I've come across two problems in the https://github.com/kaldi-asr/kaldi/blob/master/egs/fisher_english/s5/local/semisup/run_100k.sh recipe.

    1. On stage 7: when running local/fisher_train_lms_pocolm.sh I get an error because the number of n-grams of the dataset (100k) is smaller than the number of n-grams to prune:
    the num-ngrams(1544907) of input LM is less than the target-num-ngrams(5000000), can not do any pruning.
    
    2. On stage 10: the param --sup-lat-dir $exp_root/chain/tri4a_train_sup_unk_lats should be changed to --sup-lat-dir $exp_root/chain/tri4a_train_sup_sp_unk_lats, which uses the sp version instead.

    R, Juan Pablo

    bug 
    opened by JuanPZuluaga 0
  • [script] fix gitpod open error

    When I open gitpod I see this error: "File .gitpod.dockerfile does not exist in repository kaldi-asr/kaldi"

    This PR will fix this.

    opened by clarencetw 1
  • Kick portaudio install out of the main makefile

    We already have tools/extras/install_portaudio.sh.

    If you need portaudio, install portaudio. Make dependencies are so messed up that portaudio is a dependency of lmbin. This makes no sense.

    bug 
    opened by kkm000 1
Kaldi: a state-of-the-art automatic speech recognition toolkit.