kaldi-asr/kaldi is the official location of the Kaldi project.

Overview

Kaldi Speech Recognition Toolkit

To build the toolkit: see ./INSTALL. These instructions are valid for UNIX systems including various flavors of Linux; Darwin; and Cygwin (has not been tested on more "exotic" varieties of UNIX). For Windows installation instructions (excluding Cygwin), see windows/INSTALL.

To run the example system builds, see egs/README.txt

If you encounter problems (and you probably will), please do not hesitate to contact the developers (see below). In addition to specific questions, please let us know if there are specific aspects of the project that you feel could be improved, that you find confusing, etc., and which missing features you most wish it had.

Kaldi information channels

For HOT news about Kaldi see the project site.

Documentation of Kaldi:

  • Info about the project, description of techniques, tutorial for C++ coding.
  • Doxygen reference of the C++ code.

Kaldi forums and mailing lists:

We have two different lists:

  • User list kaldi-help
  • Developer list kaldi-developers

To sign up for either of those mailing lists, go to http://kaldi-asr.org/forums.html.

Development pattern for contributors

  1. Create a personal fork of the main Kaldi repository in GitHub.
  2. Make your changes in a named branch different from master, e.g. create a branch called my-awesome-feature.
  3. Generate a pull request through the Web interface of GitHub.
  4. As a general rule, please follow the Google C++ Style Guide. There are a few exceptions in Kaldi. You can use Google's cpplint.py to verify that your code is free of basic mistakes; a small helper for running it over your changed files is sketched below.
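
For example, a small helper along these lines can lint only the files you touched relative to master (a hypothetical convenience script, assuming cpplint.py has been downloaded and is executable on your PATH):

    #!/usr/bin/env python3
    # Hypothetical helper: run cpplint.py over the C++ files changed relative to master.
    import subprocess

    changed = subprocess.run(
        ["git", "diff", "--name-only", "master"],
        capture_output=True, text=True, check=True,
    ).stdout.split()

    cpp_files = [f for f in changed if f.endswith((".h", ".cc"))]
    if cpp_files:
        subprocess.run(["cpplint.py"] + cpp_files, check=False)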

Platform specific notes

PowerPC 64-bit little-endian (ppc64le)

Android

  • Kaldi supports cross-compiling for Android using the Android NDK, clang++, and OpenBLAS.
  • See this blog post for details.

Issues

  • show L2 norm of parameters during training.

    In addition, set affine to false for batchnorm layers and switch to SGD optimizer.
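
    A minimal PyTorch sketch of those two changes plus the parameter-norm logging (the model below is a stand-in, not the actual network from this PR):

    import torch

    # Stand-in model; the real network comes from the PR's training code.
    model = torch.nn.Sequential(
        torch.nn.Conv1d(40, 256, kernel_size=3),
        torch.nn.BatchNorm1d(256, affine=False),  # affine=False as described above
        torch.nn.ReLU(),
    )

    # Switch to plain SGD.
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

    # Log the L2 norm of every parameter tensor during training.
    for name, param in model.named_parameters():
        print(name, param.norm(p=2).item())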

    The training is still running and a screenshot of the L2-norms of the training parameters is as follows:

    [screenshot of the L2 norms of the training parameters omitted]

    I will post the decoding results once it is done.

    opened by csukuangfj 67
  • Wake-word detection

    Results of the regular LF-MMI based recipes:

    Mobvoi: EER=~0.2%, FRR=1.02% at FAH=1.5 vs. FRR=3.8% at FAH=1.5 (Mobvoi paper)

    SNIPS: EER=~0.1%, FRR=0.08% at FAH=0.5 vs. FRR=0.12% at FAH=0.5 (SNIPS paper)

    E2E LF-MMI recipes are still being run to confirm the reproducibility of the previous results.
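
    For readers unfamiliar with the operating points quoted above, the sketch below shows roughly how FRR at a fixed FAH can be read off from system scores (a hypothetical helper, not part of the recipes; ties between scores are ignored):

    import numpy as np

    def frr_at_fah(pos_scores, neg_scores, neg_hours, target_fah):
        """False-rejection rate at the threshold giving at most target_fah false alarms per hour."""
        pos = np.asarray(pos_scores, dtype=float)
        neg = np.sort(np.asarray(neg_scores, dtype=float))[::-1]  # descending
        allowed_fa = int(np.floor(target_fah * neg_hours))  # false alarms allowed on the negative data
        if allowed_fa >= len(neg):
            return 0.0  # even the loosest threshold stays under the target FAH
        threshold = neg[allowed_fa]  # accept a detection only when score > threshold
        return float(np.mean(pos <= threshold))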

    opened by freewym 67
  • Multilingual using modified configs

    This is a modified multilingual setup based on the new xconfig and training scripts. In this setup, xconfig is used to create the network configuration for multilingual training. The egs generation has also been moved out of the training script, and the multilingual egs dir is passed to train_raw_dnn.py. A new script has also been added for average posterior computation and prior adjustment.

    opened by pegahgh 65
  • CUDA context creation problem in nnet3 training with "--use-gpu=wait" option

    I am not sure if this is a Kaldi issue but I thought someone might have an idea.

    First some context. I am trying to tune a few TDNN chain models on a workstation with 2 Maxwell Titan X 12GB cards. The data sets I am working with are fairly small (Babel full language packs with 40-80 hours audio). Initially I set the number of initial and final training jobs to 2 and trained the models with scripts adapted from babel and swbd recipes. While this worked without any issues, I noticed that the models were overtraining, so I tried tuning relu-dim, number of epochs and xent-regularize with one of the language packs to see if I could get a better model. Eventually the best model I got was with a single epoch and xent-regularize=0.25 (WER base model: 45.5% vs best model: 41.4%). To see if the training schedule might have any further effects on the model performance, I also tried training with --num-jobs-initial=2, --num-jobs-final=8 after setting the GPUs to "default" compute mode to allow the creation of multiple CUDA contexts. I added 2 seconds delay between individual jobs so that earlier jobs would start allocating device memory before a new job is scheduled on the device with the largest amount of free memory. This mostly worked fine, except towards the end when 8 jobs were distributed 5-3 between the two cards. The resulting model had 40.9% WER after 2 epochs and the log probability difference between the train and validation sets was also smaller than before. It seems like the training schedule (number of jobs, learning rate, etc. at each iteration) has an effect on the model performance in this small data scenario. Maybe averaging gradients across a larger number of jobs is beneficial, or the learning rate schedule is somehow tuned for this type of training schedule.

    Now the actual problem. Since large number of jobs seemed to work better for me, I wanted to remove the job delay hack, set GPUs back to "exclusive process" compute mode and take advantage of the --use-gpu=wait option while scheduling the training jobs. However, it seems like I am missing something. If I launch multiple training processes with the --use-gpu=wait option while GPUs are in "exclusive process" compute mode, only one process can create a CUDA context on a given GPU card even after that one process completes. My expectation was that other processes would wait for the GPUs to be available and then one by one acquire the GPUs and complete their work. I added a few debug statements to GetCudaContext function to see what the problem was. cudaDeviceSynchronize call returns "all CUDA-capable devices are busy or unavailable" even after processes running on the GPUs are long gone. Any ideas?

    opened by dogancan 63
  • Modify TransitionModel for more compact chain-model graphs

    Place holder for addressing #1031 . WIP log:

    1. self_loop_pdf_class added to HmmState, done
    2. self_loop_pdf added to Tuple in TransitionModel. done
    3. another branch of ContextDependencyInterface::GetPdfInfo. ugly done
    4. create test code for new structures. done
    5. backward compatibility for all read code. done
    6. normal HMM validation using RM. done
    7. chain code modification. done
    8. chain validation using RM. done
    9. iterate 2nd version of GetPdfInfo. done
    10. documents and comments. tbd...
    opened by naxingyu 63
  • add PyTorch's DistributedDataParallel training.

    Support distributed training across multiple GPUs.

    TODOs:

    • there are lots of code duplicates

    Part of the training log

    2020-02-19 13:55:10,646 INFO [ddp_train.py:160] Device (1) processing 1100/4724(23.285351%) global average objf: -0.225449 over 6165760.0 frames, current batch average objf: -0.130735 over 6400 frames, epoch 0
    2020-02-19 13:55:55,251 INFO [ddp_train.py:160] Device (0) processing 1200/4724(25.402202%) global average objf: -0.216779 over 6732672.0 frames, current batch average objf: -0.123979 over 3840 frames, epoch 0
    2020-02-19 13:55:55,252 INFO [ddp_train.py:160] Device (1) processing 1200/4724(25.402202%) global average objf: -0.216412 over 6738176.0 frames, current batch average objf: -0.132368 over 4736 frames, epoch 0
    

    The training seems to be working.
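
    For context, a minimal DistributedDataParallel training loop looks roughly like the sketch below (illustrative only; the actual changes live in this PR's ddp_train.py, and the loss here is a placeholder rather than the LF-MMI objective):

    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def run(rank, world_size, model, dataset):
        os.environ.setdefault("MASTER_ADDR", "localhost")
        os.environ.setdefault("MASTER_PORT", "12355")
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        torch.cuda.set_device(rank)

        ddp_model = DDP(model.cuda(rank), device_ids=[rank])
        optimizer = torch.optim.Adam(ddp_model.parameters(), lr=1e-3)
        sampler = torch.utils.data.distributed.DistributedSampler(dataset)
        loader = torch.utils.data.DataLoader(dataset, batch_size=64, sampler=sampler)

        for epoch in range(1):
            sampler.set_epoch(epoch)  # reshuffle the per-rank shards each epoch
            for feats, target in loader:
                optimizer.zero_grad()
                loss = torch.nn.functional.mse_loss(ddp_model(feats.cuda(rank)), target.cuda(rank))
                loss.backward()   # gradients are all-reduced across GPUs here
                optimizer.step()

        dist.destroy_process_group()

    # Typically launched with torch.multiprocessing.spawn(run, args=(world_size, model, dataset), nprocs=world_size)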

    opened by csukuangfj 62
  • Is there any speaker diarization documentation and already trained model?

    Hi there, thanks for Kaldi :)

    I want to perform speaker diarization on a set of audio recordings. I believe Kaldi recently added the speaker diarization feature. I have managed to find this link; however, I have not been able to figure out how to use it since there is very little documentation. Also, may I ask whether there is any already-trained model on English conversations that I can use off the shelf, please?

    Thanks a lot!

    opened by bwang482 61
  • expose egs as Dataloader

    Expose egs as a DataLoader in PyTorch; training time decreased from 150 minutes to 90 minutes for 6 epochs with 4 workers.
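
    Roughly, the idea is to wrap the egs reading in a PyTorch IterableDataset and let DataLoader worker processes read in parallel; a sketch under that assumption (class and helper names below are illustrative, not the code in this PR):

    import torch
    from torch.utils.data import IterableDataset, DataLoader

    def read_and_merge_egs(scp_file):
        """Hypothetical placeholder for copying/shuffling/merging the egs in one scp file."""
        yield from ()  # the real code would yield merged minibatches

    class EgsDataset(IterableDataset):
        def __init__(self, scp_files):
            self.scp_files = scp_files

        def __iter__(self):
            info = torch.utils.data.get_worker_info()
            # Shard the scp files across DataLoader workers so each file is read exactly once.
            files = self.scp_files if info is None else self.scp_files[info.id::info.num_workers]
            for scp in files:
                yield from read_and_merge_egs(scp)

    # batch_size=None because the egs are already merged into minibatches.
    loader = DataLoader(EgsDataset(["egs.1.scp", "egs.2.scp"]), batch_size=None, num_workers=4)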

    RESULT

    | | TDNN-F (Pytorch, Adam, delta dropout without ivector) from @fanlu | TDNN-F (Pytorch, Adam, delta dropout without ivector) this PR 2nd run | TDNN-F (Pytorch, Adam, delta dropout without ivector) this PR 1st run | this PR with commit 0d8aada to make dropout go to zero at the end |
    |--|--|--|--|--|
    | dev_cer | 6.10 | 6.13 | 6.18 | 6.12 |
    | dev_wer | 13.86 | 13.89 | 13.96 | 13.92 |
    | test_cer | 7.14 | 7.19 | 7.20 | 7.26 |
    | test_wer | 15.49 | 15.54 | 15.66 | 15.63 |
    | training_time | 151mins | 88mins | 84mins | |

    WER/CER increase may come from:

    • Shuffle: we do not shuffle the egs minibatches during each epoch.
    • Dropout: we use pseudo_epoch (one scp file is one pseudo-epoch) to compute data_fraction for dropout, which is much more coarse-grained than using batch_idx (see the small sketch after this list).
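
    A small sketch of the difference in granularity (numbers other than the 56 scp files are hypothetical):

    NUM_SCP_FILES = 56        # one scp file == one pseudo-epoch (from this PR)
    BATCHES_PER_EPOCH = 4000  # hypothetical minibatch count per epoch

    def data_fraction_from_pseudo_epoch(pseudo_epoch):
        return pseudo_epoch / NUM_SCP_FILES      # moves in coarse steps of 1/56

    def data_fraction_from_batch(batch_idx):
        return batch_idx / BATCHES_PER_EPOCH     # moves in fine steps of 1/4000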

    Note that I have tried copy|shuffle|merge in the dataloader (see the code below), but it seems to take as much time as (or even a little more than) the original approach (egs as a Dataset); I may do further experiments to look into this:

    # Build a pipeline rspecifier that copies, shuffles, and merges the chain egs on the fly.
    scp_rspecifier = scp_file_to_process
    egs_rspecifier = ('ark,bg:nnet3-chain-copy-egs --frame-shift .. scp:' + scp_rspecifier + ' ark:- | '
                      'nnet3-chain-shuffle-egs --buffer-size .. --srand .. ark:- ark:- | '
                      'nnet3-chain-merge-egs --minibatch-size .. ark:- ark:- |')
    with SequentialNnetChainExampleReader(egs_rspecifier) as example_reader:
        for key, eg in example_reader:
            # Collate the merged example into a PyTorch-friendly batch.
            batch = self.collate_fn(eg)
            yield pseudo_epoch, batch
    

    TODO

    • [ ] Split the egs into more scp files (currently 56) to see whether this makes the dropout data_fraction more fine-grained.
    • [ ] Do further experiments and tracing for the copy|shuffle|merge-in-dataloader approach to confirm where its bottleneck is.
    • [ ] Profile the first epoch of training to see why it takes so long; currently the first epoch accounts for most of the total training time, no matter which approach (egs as Dataset or DataLoader) we use.
    opened by qindazhu 58
  • [src] CUDA Online/Offline pipelines + light batched nnet3 driver

    This is still WIP. Requires some cleaning, integrating the online mfcc into a separate PR (cf below), and some other things.

    Implementing a low-latency, high-throughput pipeline designed for online use. It uses the GPU decoder, the GPU MFCC/ivector, and a new lean nnet3 driver (including nnet3 context switching on device).

    • Online/Offline pipelines

    The online pipeline can be seen as taking a batch as input and then running a very regular algorithm, calling feature extraction, nnet3, the decoder, and postprocessing on that same batch in a synchronous fashion (i.e., all of those steps run when DecodeBatch is called; nothing is sent to async pipelines along the way). What happens when you run DecodeBatch is very regular, and because of that it is able to guarantee some latency constraints (because the way the code will be executed is very predictable). It also focuses on being lean, avoiding reallocations or recomputations (such as recompiling nnet3).

    The online pipeline takes care of computing [MFCC, iVectors], nnet3, the decoder, and postprocessing. It can either take chunks of raw audio as input (and then compute mfcc->nnet3->decoder->postprocessing), or it can be called directly with MFCC features/ivectors (and then compute nnet3->decoder->postprocessing). The second possibility is used by the offline wrapper when use_online_ivectors=false.

    The old offline pipeline is replaced by a new offline pipeline which is mostly a wrapper around the online pipeline. It provides an offline-friendly API (accepting full utterances as input instead of chunks) and can pre-compute ivectors on the full utterance first (use_online_ivectors=false). It then calls the online pipeline internally to do most of the work.

    The easiest way to test the online pipeline end-to-end is to call it through the offline wrapper for now, with use_online_ivectors=true. Please note that ivectors will be ignored for now in this fully end-to-end online mode (i.e., when use_online_ivectors=true); that's because the GPU ivectors are not yet ready for online use. However, the pipeline code is ready. The offline pipeline with use_online_ivectors=false should be fully functional and returns the same WER as before.

    • Light nnet3 driver designed for GPU and online

    It includes a new light nnet3 driver designed for the GPU. The key idea is that it's usually better to waste some flops computing things such as partial chunks or partial batches. For example, the last chunk of an utterance (say nframes=17) can be smaller than max_chunk_size (50 frames by default). In that case, compiling a new nnet3 computation for that exact chunk size is slower than just running it with a chunk size of 50 and ignoring the invalid output.

    The same idea applies to batch_size: the nnet3 computation always runs with a fixed minibatch size, defined as minibatch_size = std::min(max_batch_size, MAX_MINIBATCH_SIZE). MAX_MINIBATCH_SIZE is defined to be large enough to hide the kernel launch latency and increase the arithmetic intensity of the GEMMs, but no larger, so that partial batches are not slowed down too much (i.e., avoiding running a minibatch of size 512 where only 72 utterances are valid). MAX_MINIBATCH_SIZE is currently 128. We then run nnet3 multiple times on the same batch if necessary: if batch_size=512, we run nnet3 (with minibatch_size=128) four times.
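
    As a concrete illustration of the batching arithmetic described above (Python-style pseudocode, not the actual C++ driver):

    import math

    MAX_MINIBATCH_SIZE = 128  # large enough to hide kernel-launch latency, no larger

    def nnet3_passes(max_batch_size, num_valid_utts):
        minibatch_size = min(max_batch_size, MAX_MINIBATCH_SIZE)
        # Partial minibatches are padded and run anyway, so just round up.
        return math.ceil(num_valid_utts / minibatch_size)

    print(nnet3_passes(512, 512))  # -> 4, as in the example above
    print(nnet3_passes(512, 72))   # -> 1: a single minibatch of 128 with 72 valid utterances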

    The context-switch (to restore the nnet left and right context, and ivector) is done on device. Everything that needs context-switch is using the concept of channels, to be consistent with the GPU decoder.

    Those "lean" approaches gave us better performance, and a drop in memory usage (total GPU memory usage from 15GB to 4GB for librispeech and batch size 500). It also removes the need for "high level" multithreading (i.e. cuda-control-threads).

    • Parameters simplification

    Dropping some parameters because the new code design doesn't require them (--cuda-control-threads, the drain size parameter). In theory the configuration should be greatly simplified (only --max-batch-size needs to be set, others are optional).

    • Adding batching and online to GPU mfcc

    The code in cudafeat/ modifies the GPU MFCC code. MFCC features can now be batched and processed online (restoring a few hundred frames of past audio for each new chunk). That code was implemented by @mcdavid109 (thanks!). We'll create a separate PR for this; it requires some cleaning, and a large part of the code is redundant with existing MFCC files. GPU batched online ivectors and CMVN are WIP.

    • Indicative measurements

    When used with use_online_ivectors=false, this code reaches 4,940 XRTF on librispeech/test_clean, with a latency of around 6x realtime for max_batch_size=512 (latency would be lower with a smaller max_batch_size). One use case where only latency matters (and not throughput) is, for instance, the Jetson Nano, where some initial runs of the GPU pipeline were measured at 5-10x realtime latency for a single channel (max_batch_size=1) on librispeech/clean. Those measurements are indicative only; more reliable measurements will be done in the future.

    opened by hugovbraun 56
  • Online2 NNet3 TCP server program

    Several people asked for this and I feel like it would be a nice addition to the project.

    The protocol is much simpler than the audio-server program I did a while ago: audio in -> text out.
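
    For illustration, a client for such a server could look like the sketch below (assumptions: the server listens on localhost:5050 and expects raw 16 kHz, 16-bit mono PCM, replying with plain text; check the program's documentation for the actual protocol and port):

    import socket

    # "utt.raw" is an assumed local file containing raw 16 kHz, 16-bit mono PCM audio.
    with socket.create_connection(("localhost", 5050)) as sock, open("utt.raw", "rb") as audio:
        while True:
            chunk = audio.read(3200)  # 0.1 s of 16 kHz 16-bit mono audio
            if not chunk:
                break
            sock.sendall(chunk)
        sock.shutdown(socket.SHUT_WR)  # signal that we are done sending audio
        print(sock.makefile("r", errors="replace").read())  # read the decoded text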

    The way it's made now is nice for a live demo (I added some commands to the doxygen docs), but may still lack some features for real-life use.

    The main issue I have is that the new decoder is slightly different than before. The old decoder had a way to check which part of the output is final and which is "partial". This time, I can only check the current best path every N seconds (eg once per second of input audio). I use endpointing to determine when to finalize decoding.

    Now, what would be really nice is to have online speech detection and speaker diarization included with this, but I know that's probably not happening too soon. What can be done (and I may do it myself if I find time) is a multithreaded version of the program with a shared acoustic model and FST. Also, I bet it would be possible to combine the grammar version with the server to allow runtime vocabulary modification.

    I also have a web interface that works with this server, but I'm not sure if it would fit the main Kaldi repo, so I'll probably make a separate repo for that (if anyone wants it).

    I'm open to comments and suggestions.

    opened by danijel3 56
  • Xvectors: DNN Embeddings for Speaker Recognition

    Overview

    This pull request adds xvectors for speaker recognition. The system consists of a feedforward DNN with a statistics pooling layer. Training is multiclass cross entropy over the list of training speakers (we may add other methods in the future). After training, variable-length utterances are mapped to fixed-dimensional embeddings or “xvectors” and used in a PLDA backend. This is based on http://www.danielpovey.com/files/2017_interspeech_embeddings.pdf, but includes recent enhancements not in that paper, such as data augmentation.
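
    For intuition, a statistics pooling layer simply appends the mean and standard deviation over the time axis, turning a variable-length sequence of frame-level activations into a single fixed-dimensional vector. A PyTorch-style sketch (not the Kaldi nnet3 implementation):

    import torch

    class StatsPooling(torch.nn.Module):
        """Pool (batch, time, feat) activations to (batch, 2*feat): mean and stddev over time."""
        def forward(self, x):
            return torch.cat([x.mean(dim=1), x.std(dim=1)], dim=1)

    frames = torch.randn(8, 200, 512)          # one fixed length here for brevity
    embedding_input = StatsPooling()(frames)   # shape (8, 1024), fed to the embedding layers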

    This PR also adds a new data augmentation script, which is important to achieve good performance in the xvector system. It is also helpful for ivectors (but only in PLDA training).

    This PR adds a basic SRE16 recipe to demonstrate the system. An ivector system is in v1, and an xvector system is in v2.

    Example Generation

    An example consists of a chunk of speech features and the corresponding speaker label. Within an archive, all examples have the same chunk-size, but the chunk-size varies across archives. The relevant additions:

    • sid/nnet3/xvector/get_egs.sh — Top-level script for example creation
    • sid/nnet3/xvector/allocate_egs.py — This script is responsible for deciding what is contained in the examples and what archives they belong to.
    • src/nnet3bin/nnet3-xvector-get-egs — The binary for creating the examples. It constructs examples based on the ranges.* file.

    Training

    This version of xvectors is trained with multiclass cross entropy (softmax over the training speakers). Fortunately, steps/nnet3/train_raw_dnn.py is compatible with the egs created here, so no new code is needed for training. Relevant code:

    • sre16/v1/local/nnet3/xvector/tuning/run_xvector_1a.sh — Does example creation, creates the xconfig, and trains the nnet

    Extracting XVectors

    After training, the xvectors are extracted from a specified layer of the DNN after the temporal pooling layer. Relevant additions:

    • sid/nnet3/xvector/extract_xvectors.sh — Extracts embeddings from the xvector DNN. This is analogous to extract_ivectors.sh.
    • src/nnet3bin/nnet3-xvector-compute — Does the forward computation for the xvector DNN (variable-length input, with a single output).

    Augmentation

    We’ve found that embeddings almost always benefit from augmented training data. This appears to be true even when evaluated on clean telephone speech. Relevant additions:

    • steps/data/augment_data_dir.py — Similar to reverberate_data_dir.py but only handles additive noise.
    • egs/sre16/v1/run.sh — PLDA training list is augmented with reverb and MUSAN audio
    • egs/sre16/v2/run.sh — DNN training and PLDA list are augmented with reverb and MUSAN.

    SRE16 Recipe

    The PR includes a bare bones SRE16 recipe. The goal is primarily to demonstrate how to train and evaluate an xvector system. The version in egs/sre16/v1/ is a straightforward i-vector system. The recipe in egs/sre16/v2 contains the DNN embedding recipe. Relevant additions:

    • egs/sre16/v1/local/ — A bunch of dataprep scripts
    • egs/sre16/v2/local/nnet3/xvector/prepare_feats_for_egs.sh -- A script that applies cmvn and removes silence frames and writes the results to disk. This is what the nnet examples are generated from.
    • egs/sre16/v1/run.sh — ivector top-level script
    • egs/sre16/v2/run.sh — xvector top-level script

    Results for this example:

      xvector (from v2) EER: Pooled 8.76%, Tagalog 12.73%, Cantonese 4.86%
      ivector (from v1) EER: Pooled 12.98%, Tagalog 17.8%, Cantonese 8.35%
    

    Note that the recipe is somewhat "bare bones." We could improve the results for the xvector system further by adding even more training data (e.g., Voxceleb: http://www.robots.ox.ac.uk/~vgg/data/voxceleb/). Both systems would improve from updates to the backend such as adaptive score normalization or more effective PLDA domain adaptation techniques. However, I believe that is orthogonal to this PR.

    opened by david-ryan-snyder 55
  • How to create an action in Kaldi?

    Well, I'm new to this area, so I came here to ask for the help of some administrator or other experienced programmer. My question is that I don't know how to create an action for my React test example. For example, I say "Donuts", and after it understands the word it should reveal itself through some (IF) condition that I will put in...

    Note: I've tried several things to transform the final result into a string and print it. I've used JSON, but when I try, the word is invisible. If there is another way, or if I'm doing it wrong, please HELP me!!

    R = json.dumps(Result)
    if "blue" in rec.Result():
        print("blue")

    #I try to use JSON and the (print) as said above is invisible
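
    For what it's worth, the usual pattern (assuming rec is a recognizer object whose Result() returns a JSON string, as the snippet above suggests) is to parse the result with json.loads rather than json.dumps and then check the recognized text:

    import json

    result = json.loads(rec.Result())  # rec comes from the question's own setup
    text = result.get("text", "")
    print(text)                        # make the recognized words visible
    if "blue" in text:
        print("blue")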

    opened by MR-ALLONE 0
  • Training failing for custom egs

    Hi, I've been using Kaldi in a way that probably isn't supported. I'm trying to train a DNN using a sequence of vectors as input and a label (0, 1, or 2) as the target. Borrowing from the xvector training code, I tried to train a raw DNN; however, the training itself doesn't converge. My best guess is that my indexes might be wrong on my sequences of input vectors.

    bug 
    opened by ExarchD 0
  • kaldi build on riscv64

    Hi all,

    I want to build Kaldi on riscv64; can anyone help me with this?

    When I run the following command I hit an error; the output log is attached here:

    [email protected]:/opt/kaldi/src# make -j 10 online2 lm rnnlm

    test -d /opt/kaldi/src/lib || mkdir /opt/kaldi/src/lib
    The version of configure script matches kaldi.mk version. Good.
    make -C base
    make[1]: Entering directory '/opt/kaldi/src/base'
    c++ -std=c++17 -I.. -isystem /opt/kaldi/tools/openfst/include -O1 -Wall -Wno-sign-compare -Wno-unused-local-typedefs -Wno-deprecated-declarations -Winit-self -DKALDI_DOUBLEPRECISION=0 -DHAVE_EXECINFO_H=1 -DHAVE_CXXABI_H -DHAVE_OPENBLAS -I/opt/kaldi/tools/OpenBLAS/install/include -msse -msse2 -pthread -g -fPIC -DUSE_KALDI_SVD -c -o kaldi-math.o kaldi-math.cc
    c++ -std=c++17 -I.. -isystem /opt/kaldi/tools/openfst/include -O1 -Wall -Wno-sign-compare -Wno-unused-local-typedefs -Wno-deprecated-declarations -Winit-self -DKALDI_DOUBLEPRECISION=0 -DHAVE_EXECINFO_H=1 -DHAVE_CXXABI_H -DHAVE_OPENBLAS -I/opt/kaldi/tools/OpenBLAS/install/include -msse -msse2 -pthread -g -fPIC -DUSE_KALDI_SVD -c -o kaldi-error.o kaldi-error.cc
    c++ -std=c++17 -I.. -isystem /opt/kaldi/tools/openfst/include -O1 -Wall -Wno-sign-compare -Wno-unused-local-typedefs -Wno-deprecated-declarations -Winit-self -DKALDI_DOUBLEPRECISION=0 -DHAVE_EXECINFO_H=1 -DHAVE_CXXABI_H -DHAVE_OPENBLAS -I/opt/kaldi/tools/OpenBLAS/install/include -msse -msse2 -pthread -g -fPIC -DUSE_KALDI_SVD -c -o io-funcs.o io-funcs.cc
    c++ -std=c++17 -I.. -isystem /opt/kaldi/tools/openfst/include -O1 -Wall -Wno-sign-compare -Wno-unused-local-typedefs -Wno-deprecated-declarations -Winit-self -DKALDI_DOUBLEPRECISION=0 -DHAVE_EXECINFO_H=1 -DHAVE_CXXABI_H -DHAVE_OPENBLAS -I/opt/kaldi/tools/OpenBLAS/install/include -msse -msse2 -pthread -g -fPIC -DUSE_KALDI_SVD -c -o kaldi-utils.o kaldi-utils.cc
    c++ -std=c++17 -I.. -isystem /opt/kaldi/tools/openfst/include -O1 -Wall -Wno-sign-compare -Wno-unused-local-typedefs -Wno-deprecated-declarations -Winit-self -DKALDI_DOUBLEPRECISION=0 -DHAVE_EXECINFO_H=1 -DHAVE_CXXABI_H -DHAVE_OPENBLAS -I/opt/kaldi/tools/OpenBLAS/install/include -msse -msse2 -pthread -g -fPIC -DUSE_KALDI_SVD -c -o timer.o timer.cc
    c++: error: unrecognized command-line option '-msse'
    c++: error: unrecognized command-line option '-msse'
    c++: error: unrecognized command-line option '-msse2'
    c++: error: unrecognized command-line option '-msse2'
    c++: error: unrecognized command-line option '-msse'
    make[1]: *** [<builtin>: kaldi-error.o] Error 1
    make[1]: *** Waiting for unfinished jobs....
    make[1]: *** [<builtin>: kaldi-math.o] Error 1
    c++: error: unrecognized command-line option '-msse2'
    make[1]: *** [<builtin>: io-funcs.o] Error 1
    c++: error: unrecognized command-line option '-msse'
    c++: error: unrecognized command-line option '-msse2'
    make[1]: *** [<builtin>: kaldi-utils.o] Error 1
    c++: error: unrecognized command-line option '-msse'
    c++: error: unrecognized command-line option '-msse2'
    make[1]: *** [<builtin>: timer.o] Error 1
    make[1]: Leaving directory '/opt/kaldi/src/base'
    make: *** [Makefile:147: base] Error 2

    T&R Kush

    kaldi10-TODO 
    opened by kush930 4
  • The certificate has expired when running `install_portaudio.sh` in Github Action

    Hi team,

    I am not sure if this is the correct place to post this issue. I am running a CI pipeline using GitHub Actions which consists of building a Docker image.

    A portion of my Dockerfile:

    ...
    
    RUN mkdir -p /home/appuser/opt
    WORKDIR /home/appuser/opt
    #Commit on May 15, 2019 
    ENV KALDI_SHA1 35f96db7082559a57dcc222218db3f0be6dd7983
    RUN git clone https://github.com/kaldi-asr/kaldi && \
        cd /home/appuser/opt/kaldi && \
        git reset --hard $KALDI_SHA1 && \
        cd /home/appuser/opt/kaldi/tools && \
        make -j 2 && \
        ./install_portaudio.sh
    

    This is part of my error log:

    2022-06-06 21:17:42 (2.40 MB/s) - ‘sph2pipe_v2.5.tar.gz’ saved [329832/329832]
    wget -T 10 -t 3 -O cub-1.8.0.zip https://github.com/NVlabs/cub/archive/1.8.0.zip
    --2022-06-06 21:17:42--  https://github.com/NVlabs/cub/archive/1.8.0.zip
    Resolving github.com (github.com)... 140.82.114.3
    Connecting to github.com (github.com)|140.82.114.3|:443... connected.
    302 Found
    Location: https://us.openslr.org/resources/2/openfst-1.6.7.tar.gz [following]
    --2022-06-06 21:17:42--  https://us.openslr.org/resources/2/openfst-1.6.7.tar.gz
    Resolving us.openslr.org (us.openslr.org)... 46.101.158.64
    Connecting to us.openslr.org (us.openslr.org)|46.101.158.64|:443... HTTP request sent, awaiting response... connected.
    302 Found
    Location: https://codeload.github.com/NVlabs/cub/zip/refs/tags/1.8.0 [following]
    --2022-06-06 21:17:42--  https://codeload.github.com/NVlabs/cub/zip/refs/tags/1.8.0
    Resolving codeload.github.com (codeload.github.com)... 140.82.113.10
    Connecting to codeload.github.com (codeload.github.com)|140.82.113.10|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: unspecified [application/zip]
    Saving to: ‘cub-1.8.0.zip’
         0K .......... .......... .......... .......... .......... 6.76M
        50K .......... .......... .......... .......... .......... 5.19M
       100K .......... .......... .......... .......... .......... 5.88M
       150K .......... .......... .......... .......... .......... 6.18M
       200K .......... .......... .......... .......... .......... 6.32M
       250K .......... .......... .......... .......... .......... 7.28M
       300K .......... .......... .......... .......... .......... 6.65M
       350K .......... .......... .......... .......... ....ERROR: The certificate of ‘us.openslr.org’ is not trusted.
    ERROR: The certificate of ‘us.openslr.org’ has expired.
    ..make: *** [Makefile:92: openfst-1.6.7.tar.gz] Error 5
    .make: *** Waiting for unfinished jobs....
    ... 5.59M
       400K .................... .......... .......... .......... 5.89M
       450K .......... .......... .......... .......... .......... 5.67M
       500K .......... .......... .......... .......... .......... 6.02M
       550K .......... .......... .......... ........              7.81M=0.09s
    2022-06-06 21:17:43 (6.17 MB/s) - ‘cub-1.8.0.zip’ saved [602396]
    unzip -oq cub-1.8.0.zip
    rm -f cub
    ln -s cub-1.8.0 cub
    The command '/bin/sh -c git clone https://github.com/kaldi-asr/kaldi &&     cd /home/appuser/opt/kaldi &&     git reset --hard $KALDI_SHA1 &&     cd /home/appuser/opt/kaldi/tools &&     make -j 2 &&     ./install_portaudio.sh' returned a non-zero code: 2
    Error: Process completed with exit code 2.
    

    The Dockerfile works fine on my local machine. I suspected that the job was taking too long; however, the job has only run for 4 minutes, which is still below the limit for GitHub Actions' allowed job duration.

    I wonder if anyone has any insights or solutions for this? Thanks!

    opened by kaikiat 7
  • Initial Web Commit

    These are the initial commits for the website improvement. @danpovey, the design is not final; expect to see it finished by Thursday (June 9, 2022).

    The following contents are just placeholders for the big one!

    Plans

    1. The Download button, when clicked, should show Pre-Compiled Packages in the