Lyra: a generative low bitrate speech codec

What is Lyra?

Lyra is a high-quality, low-bitrate speech codec that makes voice communication available even on the slowest networks. To do this, it applies traditional codec techniques while leveraging advances in machine learning (ML), with models trained on thousands of hours of data, to create a novel method for compressing and transmitting voice signals.

Overview

The basic architecture of the Lyra codec is quite simple. Features are extracted from speech every 40ms and are then compressed for transmission at a bitrate of 3kbps. The features themselves are log mel spectrograms, a list of numbers representing the speech energy in different frequency bands, which have traditionally been used for their perceptual relevance because they are modeled after human auditory response. On the other end, a generative model uses those features to recreate the speech signal.

Lyra harnesses the power of new natural-sounding generative models to maintain the low bitrate of parametric codecs while achieving high quality, on par with state-of-the-art waveform codecs used in most streaming and communication platforms today.

Computational complexity is reduced by using a cheaper recurrent generative model, a WaveRNN variation, that works at a lower rate, but generates in parallel multiple signals in different frequency ranges that it later combines into a single output signal at the desired sample rate. This trick, plus 64-bit ARM optimizations, enables Lyra to not only run on cloud servers, but also on-device on mid-range phones, such as Pixel phones, in real time (with a processing latency of 90ms). This generative model is then trained on thousands of hours of speech data with speakers in over 70 languages and optimized to accurately recreate the input audio.

Prerequisites

There are a few things you'll need to do to set up your computer to build Lyra.

Common setup

Lyra is built using Google's build system, Bazel. Install it following these instructions.

Lyra can be built from Linux using Bazel for an ARM Android target or a Linux target. The Android target is optimized for realtime performance; the Linux target is typically used for development and debugging.

You will also need to install some tools (which may already be on your system). You can install them with:

sudo apt update
sudo apt install ninja-build git cmake clang python

Linux requirements

The instructions below are for Ubuntu and have been verified on 20.04.

You will need to install a certain version of clang to ensure ABI compatibility.

git clone https://github.com/llvm/llvm-project.git
cd llvm-project
git checkout 96ef4f307df2

mkdir build_clang
cd build_clang
cmake -G Ninja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_ENABLE_PROJECTS="clang" -DCMAKE_BUILD_TYPE=release ../llvm
ninja
sudo $(which ninja) install

cd ..
mkdir build_libcxx
cd build_libcxx
cmake -G Ninja -DCMAKE_C_COMPILER=/usr/local/bin/clang -DCMAKE_CXX_COMPILER=/usr/local/bin/clang++ -DLLVM_ENABLE_PROJECTS="libcxx;libcxxabi" -DCMAKE_BUILD_TYPE=release ../llvm
ninja
sudo $(which ninja) install

sudo ldconfig

Note: the above will install a particular version of libc++ to /usr/local/lib, and clang to /usr/local/bin, which the toolchain depends on.

Android requirements

Building for Android requires downloading a specific version of the Android NDK toolchain. If you already develop with Android Studio, you might not need these steps if ANDROID_HOME and ANDROID_NDK_HOME are defined and point at the right version of the NDK.

  1. Download the sdk manager from https://developer.android.com/studio
  2. Unzip and cd to the directory
  3. Check the available packages to install in case they don't match the following steps.
bin/sdkmanager  --sdk_root=$HOME/android/sdk --list

Some systems will already have the java runtime set up. But if you see an error here like ERROR: JAVA_HOME is not set and no 'java' command could be found on your PATH., this means you need to install the java runtime with sudo apt install default-jdk first. You will also need to add export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64 (type ls /usr/lib/jvm to see which path was installed) to your $HOME/.bashrc and reload it with source $HOME/.bashrc.

  4. Install the r21 NDK, Android SDK 29, and build tools:
bin/sdkmanager  --sdk_root=$HOME/android/sdk --install  "platforms;android-29" "build-tools;29.0.3" "ndk;21.4.7075529"
  5. Add the following to .bashrc (or export the variables):
export ANDROID_NDK_HOME=$HOME/android/sdk/ndk/21.4.7075529
export ANDROID_HOME=$HOME/android/sdk
  6. Reload .bashrc (with source $HOME/.bashrc).

Building

The building and running process differs slightly depending on the selected platform.

Building for Linux

You can build the cc_binaries with the default config. encoder_main is an example of a file encoder.

bazel build -c opt :encoder_main

You can run encoder_main to encode a test .wav file with some speech in it, specified by --input_path. The --model_path flag points at the model data necessary to encode, and --output_dir specifies the directory where the encoded (compressed) representation is written.

bazel-bin/encoder_main --model_path=wavegru --output_dir=$HOME/temp --input_path=testdata/16khz_sample_000001.wav

Similarly, you can build decoder_main and use it on the output of encoder_main to decode the encoded data back into speech.

bazel build -c opt :decoder_main
bazel-bin/decoder_main  --model_path=wavegru --output_dir=$HOME/temp/ --encoded_path=$HOME/temp/16khz_sample_000001.lyra

Building for Android

Android App

There is an example APK target called lyra_android_example that you can build after you have set up the NDK.

This example is an app with a minimal GUI that has buttons for two options. One option is to record from the microphone and encode/decode with Lyra so you can test what Lyra would sound like for your voice. The other option runs a benchmark that encodes and decodes in the background and prints the timings to logcat.

bazel build android_example:lyra_android_example --config=android_arm64 --copt=-DBENCHMARK
adb install bazel-bin/android_example/lyra_android_example.apk

After this you should see an app called "Lyra Example App".

You can open it and see a simple TextView that reports when the benchmark is running and when it finishes.

Press "Record from microphone", say a few words (be sure to have your microphone near your mouth), and then press "Encode and decode to speaker". You should hear your voice being played back after being coded with Lyra.

If you press 'Benchmark', you should see something like the following in logcat on a Pixel 4:

I  Starting benchmarkDecode()
I  I20210401 11:04:06.898649  6870 lyra_wavegru.h:75] lyra_wavegru running fast multiplication kernels for aarch64.
I  I20210401 11:04:06.900411  6870 layer_wrapper.h:162] |lyra_16khz_ar_to_gates_| layer:  Shape: [3072, 4]. Sparsity: 0
I  I20210401 11:04:07.031975  6870 layer_wrapper.h:162] |lyra_16khz_gru_layer_| layer:  Shape: [3072, 1024]. Sparsity: 0.9375
...
I  I20210401 11:04:26.700160  6870 benchmark_decode_lib.cc:167] Using float arithmetic.
I  I20210401 11:04:26.700352  6870 benchmark_decode_lib.cc:85] conditioning_only stats for generating 2000 frames of audio, max: 506 us, min: 368 us, mean: 391 us, stdev: 10.3923.
I  I20210401 11:04:26.725538  6870 benchmark_decode_lib.cc:85] model_only stats for generating 2000 frames of audio, max: 12690 us, min: 9087 us, mean: 9237 us, stdev: 262.416.
I  I20210401 11:04:26.729460  6870 benchmark_decode_lib.cc:85] combined_model_and_conditioning stats for generating 2000 frames of audio, max: 13173 us, min: 9463 us, mean: 9629 us, stdev: 270.788.
I  Finished benchmarkDecode()

This shows that decoding one 25 Hz frame (each frame is .04 seconds) takes 9629 microseconds on average (.0096 seconds), so decoding runs about 4.15 times (.04/.0096) faster than realtime.

For even faster decoding, you can use a fixed point representation by building with --copt=-DUSE_FIXED16, although there may be some loss of quality.

To build your own Android app, you can either use the cc_library target outputs to create a .so for use in your own build system, or use an android_binary rule within Bazel to create an .apk file, as in this example.

There is a tutorial on building for android with Bazel in the bazel docs.

Android command-line binaries

There are also binary targets that you can use to experiment with encoding and decoding .wav files.

You can build the example cc_binary targets with:

bazel build -c opt :encoder_main --config=android_arm64
bazel build -c opt :decoder_main --config=android_arm64

This builds an executable binary that can be run on android 64-bit arm devices (not an android app). You can then push it to your android device and run it as a binary through the shell.

# Push the binary and the data it needs, including the model, .wav, and .so files:
adb push bazel-bin/encoder_main /data/local/tmp/
adb push bazel-bin/decoder_main /data/local/tmp/
adb push wavegru/ /data/local/tmp/
adb push testdata/ /data/local/tmp/
adb shell mkdir -p /data/local/tmp/_U_S_S_Csparse_Uinference_Umatrixvector___Ulib_Sandroid_Uarm64
adb push bazel-bin/_solib_arm64-v8a/_U_S_S_Csparse_Uinference_Umatrixvector___Ulib_Sandroid_Uarm64/libsparse_inference.so /data/local/tmp/_U_S_S_Csparse_Uinference_Umatrixvector___Ulib_Sandroid_Uarm64

adb shell
cd /data/local/tmp
./encoder_main --model_path=/data/local/tmp/wavegru --output_dir=/data/local/tmp --input_path=testdata/16khz_sample_000001.wav
./decoder_main --model_path=/data/local/tmp/wavegru --output_dir=/data/local/tmp --encoded_path=16khz_sample_000001.lyra

The encoder_main/decoder_main flags work the same as in the Linux build above.

API

For integrating Lyra into any project only two APIs are relevant: LyraEncoder and LyraDecoder.

DISCLAIMER: At this time Lyra's API and bit-stream are not guaranteed to be stable and might change in future versions of the code.

On the sending side, LyraEncoder can be used to encode an audio stream using the following interface:

class LyraEncoder : public LyraEncoderInterface {
 public:
  static std::unique_ptr<LyraEncoder> Create(
      int sample_rate_hz, int num_channels, int bitrate, bool enable_dtx,
      const ghc::filesystem::path& model_path);

  absl::optional<std::vector<uint8_t>> Encode(
      const absl::Span<const int16_t> audio) override;

  int sample_rate_hz() const override;

  int num_channels() const override;

  int bitrate() const override;

  int frame_rate() const override;
};

The static Create method instantiates a LyraEncoder with the desired sample rate in Hertz, number of channels, and bitrate, as long as those parameters are supported; otherwise it returns a nullptr. The Create method also needs to know whether DTX should be enabled and where the model weights are stored. It also checks that these weights exist and are compatible with the current Lyra version.

Given a LyraEncoder, any audio stream can be compressed using the Encode method. The provided span of int16-formatted samples is assumed to contain 40ms of data at the sample rate chosen at Create time. As long as this condition is met the Encode method returns the encoded packet as a vector of bytes that is ready to be stored or transmitted over the network.

The rest of the LyraEncoder methods are just getters for the different predetermined parameters.

On the receiving end, LyraDecoder can be used to decode the encoded packet using the following interface:

class LyraDecoder : public LyraDecoderInterface {
 public:
  static std::unique_ptr<LyraDecoder> Create(
      int sample_rate_hz, int num_channels, int bitrate,
      const ghc::filesystem::path& model_path);

  bool SetEncodedPacket(absl::Span<const uint8_t> encoded) override;

  absl::optional<std::vector<int16_t>> DecodeSamples(int num_samples) override;

  absl::optional<std::vector<int16_t>> DecodePacketLoss(
      int num_samples) override;

  int sample_rate_hz() const override;

  int num_channels() const override;

  int bitrate() const override;

  int frame_rate() const override;

  bool is_comfort_noise() const override;
};

Once again, the static Create method instantiates a LyraDecoder with the desired sample rate in Hertz, number of channels, and bitrate, as long as those parameters are supported; otherwise it returns a nullptr. These parameters don't need to match the ones used by LyraEncoder. The Create method also needs to know where the model weights are stored, and it checks that these weights exist and are compatible with the current Lyra version.

Given a LyraDecoder, any packet can be decoded by first feeding it into SetEncodedPacket, which returns true if the provided span of bytes is a valid Lyra-encoded packet.

Then the int16-formatted samples can be obtained by calling DecodeSamples, as long as the total number of samples obtained this way between any two calls to SetEncodedPacket is less than 40ms of data at the sample rate chosen at Create time.

If there isn't a packet available, but samples still need to be generated, DecodePacketLoss can be used, which doesn't have a restriction on the number of samples.

In those cases, the decoder might switch to a comfort noise generation mode, which can be checked using is_comfort_noise.

The rest of the LyraDecoder methods are just getters for the different predetermined parameters.

For an example of how to use LyraEncoder and LyraDecoder to encode and decode a stream of audio, please refer to the integration test.

License

Use of this source code is governed by an Apache v2.0 license that can be found in the LICENSE file.

Please note that there is a closed-source kernel used for math operations that is linked via a shared object called libsparse_inference.so. We provide the libsparse_inference.so library to be linked, but are unable to provide source for it. This is the reason that a specific toolchain/compiler is required.

Papers

  1. Kleijn, W. B., Lim, F. S., Luebs, A., Skoglund, J., Stimberg, F., Wang, Q., & Walters, T. C. (2018, April). Wavenet based low rate speech coding. In 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 676-680). IEEE.
  2. Denton, T., Luebs, A., Lim, F. S., Storus, A., Yeh, H., Kleijn, W. B., & Skoglund, J. (2021). Handling Background Noise in Neural Speech Generation. arXiv preprint arXiv:2102.11906.
  3. Kleijn, W. B., Storus, A., Chinen, M., Denton, T., Lim, F. S., Luebs, A., ... & Yeh, H. (2021). Generative Speech Coding with Predictive Variance Regularization. arXiv preprint arXiv:2102.09660.
Comments
  • Lyra bitrate is way too high for a vocoder. How to reduce the bitrate?

    Hi, I'm looking the code and trying to guess where to change to reduce the bitrate / change quantizers. There are 25 frames/s of 15 bytes, correct? Is there a way to change this without having to re-train the network?

    3 kbit/s is way too high bitrate for a vocoder. State-of-the-art uses 1.6 kbit/s or less, for example, with LPCNet, or much less with AMBE, TWELP, codec2, or even 20 years old MELPe. For use in HF radios for example, 3 kbit/s it totally a no-go, way too high.

    Is it possible to get in the range of 1.5 kbit/s with Lyra? Even with a degraded quality, having a 1 kbit/s option is important, otherwise all the "standard narrow-band" HF radio use cases are definitely lost.

    enhancement 
    opened by rafael2k 18
  • bazel build -c opt :encoder_main error on 20.04-Ubuntu

    when use bazel build -c opt :encoder_main, some error has occur

    ERROR: /home/w/lyra-main/BUILD:860:10: Compiling encoder_main.cc failed: undeclared inclusion(s) in rule '//:encoder_main': this rule is missing dependency declarations for the following files included by 'encoder_main.cc':
      '/usr/local/lib/clang/14.0.0/include/stddef.h'
      '/usr/local/lib/clang/14.0.0/include/__stddef_max_align_t.h'
      '/usr/local/lib/clang/14.0.0/include/stdarg.h'
      '/usr/local/lib/clang/14.0.0/include/stdint.h'
      '/usr/local/lib/clang/14.0.0/include/limits.h'
    Target //:encoder_main failed to build
    Use --verbose_failures to see the command lines of failed build steps.
    INFO: Elapsed time: 6.123s, Critical Path: 4.18s
    INFO: 3 processes: 3 internal.
    FAILED: Build did NOT complete successfully

    bug 
    opened by wzhlovepy 11
  • "cannot find Foundation" error for Android example

    System Information: MacOS: 10.15.7 bazel 4.0.0-homebrew ANDROID_NDK_HOME=$HOME/android/sdk/ndk/21.4.7075529 ANDROID_HOME=$HOME/android/sdk Python 3.8.3 Apple clang version 12.0.0 (clang-1200.0.31.1), Target: x86_64-apple-darwin19.6.0 javac 1.8.0_181

    external/androidndk/ndk/toolchains/aarch64-linux-android-4.9/prebuilt/darwin-x86_64/lib/gcc/aarch64-linux-android/4.9.x/../../../../aarch64-linux-android/bin/ld: cannot find Foundation: No such file or directory clang: error: linker command failed with exit code 1 (use -v to see invocation) Target //android_example:lyra_android_example failed to build

    bug help wanted 
    opened by hitrefresh 10
  • Clang version to build llvm

    When i use Clang 3.9.1 to build LLVM 12.0, it failed. llvm-project/llvm/include/llvm/ADT/DenseMap.h:550:37: error: no matching constructor for initialization of 'llvm::ValueEnumerator::MDRange' llvm-project/llvm/include/llvm/ADT/DenseMap.h:201:12: error: no matching constructor for initialization of 'llvm::ValueEnumerator::MDRange' llvm-project/llvm/lib/Bitcode/Writer/ValueEnumerator.cpp:811:11: error: no matching constructor for initialization of 'llvm::ValueEnumerator::MDRange' Should i use which version of Clang to build llvm?

    bug 
    opened by HyacinthJingjing 9
  • Building on Linux (Debian) without Android support

    Hello,

    When building on a Linux using the command given in README (bazel build -c opt :encoder_main), I get the following errors:

    ERROR: /usr/local/src/lyra/WORKSPACE:121:1: name 'android_sdk_repository' is not defined
    ERROR: /usr/local/src/lyra/WORKSPACE:128:1: name 'android_ndk_repository' is not defined
    

    If I remove the lines related to android_sdk_repository and android_sdk_repository from the WORKSPACE file, I can build the encoder without issues.

    I would like to find the correct way to deal with this and create a PR, but I don't know bazel, so I didn't find a way to ignore Android targets when building for Linux. Hopefully someone can fix it or point me to the right direction :)

    edit: I'm using Debian in Docker, if that's any help:

    FROM debian:bullseye-slim
    
    WORKDIR /usr/local/src
    
    RUN mkdir -p /usr/share/man/man1 \
        && apt-get update \
        && apt-get install -y \
            ninja-build git cmake clang python bazel-bootstrap
    
    RUN git clone https://github.com/llvm/llvm-project.git \
        && cd llvm-project \
        && git checkout 96ef4f307df2 \
        && mkdir build_clang \
        && cd build_clang \
        && cmake -G Ninja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_ENABLE_PROJECTS="clang" -DCMAKE_BUILD_TYPE=release ../llvm \
        && ninja \
        && $(which ninja) install \
        && cd .. \
        && mkdir build_libcxx \
        && cd build_libcxx \
        && cmake -G Ninja -DCMAKE_C_COMPILER=/usr/local/bin/clang -DCMAKE_CXX_COMPILER=/usr/local/bin/clang++ -DLLVM_ENABLE_PROJECTS="libcxx;libcxxabi" -DCMAKE_BUILD_TYPE=release ../llvm \
        && ninja \
        && $(which ninja) install \
        && ldconfig
    
    RUN git clone --depth 1 --branch v0.0.1 https://github.com/google/lyra \
        && cd lyra \
        && bazel build -c opt :encoder_main \
        && bazel build -c opt :decoder_main
    
    opened by a-rose 9
  • Build Lyra with Bazel 5.0.0

    This fixes #76.

    In the ubuntu-latest and macos-latest virtual environments, the default Bazel version is 5.0.0, which requires build-tools 30.0.0 or newer version to work properly. Lyra is currently using build-tools 29.0.3, which will cause Lyra failed to build with the default Bazel in GitHub Actions virtual environments.

    This change will build Lyra with:

    • Bazel 5.0.0
    • build-tools 30.0.3
    • platforms android-30
    • ndk 21.4.7075529 (unchanged)

    Built successfully on hosts:

    • Ubuntu 20.04 (amd64)
      • linux-bin ✅
      • android-bin ✅
      • android-app ✅
    • macOS 11 (Intel)
      • macos-bin ✅
      • android-bin ❌
      • android-app ❌
    • macOS 12.2 (Apple Silicon)
      • macos-bin ✅
      • android-bin ❌
      • android-app ❌

    I also added actions to build on more platforms:

    • android-bin
    • android-app
    • linux-amd64
    • macos-amd64

    GitHub Actions also uploads artifacts for download nightly builds now.

    opened by reekystive 8
  • Lyra + Gstreamer + webRTC - missing specs?

    Hi folks, first of all thanks for shipping version 2! So stoked on this work.

    We're looking to explore integrating lyra into a gstreamer pipeline that is stacked with webRTC. I'm not the webRTC expert on our team, but my contacts are mentioning they had trouble finding the required specs in order to get started on building such an integration.

    They mentioned RTP payloading/depayloading, and perhaps there are other items/specs needed to be understood before embarking on this journey.

    We would greatly appreciate some guidance :)

    documentation 
    opened by rocco-haro 6
  • How to include the encoder + decoder in a simple c program

    Hello, I am trying to use lyra in a simple c program (hello world) which is built with cmake. As you can see in the screenshot, I can build lyra with bazel (run cmake .. && make && ./main.o from the build dir), but now I am stuck on how I can actually include lyra in the main.c file and how I can link the encoder_main and decoder_main binaries to my main.o file. Can someone help me out? Thanks!

    question 
    opened by GatCode 6
  • fail to build lyra_android_example

    I use the command " bazel build android_example:lyra_android_example --config=android_arm64 --copt=-DBENCHMARK" follow the README.md and got the error below:

    WARNING: /private/var/tmp/_bazel_jiahong/91206591588ab49765e9be8ccee0dd3b/external/com_google_audio_dsp/third_party/fft2d/BUILD:3:11: in linkstatic attribute of cc_library rule @com_google_audio_dsp//third_party/fft2d:fft2d: setting 'linkstatic=1' is recommended if there are no object files
    INFO: Analyzed target //android_example:lyra_android_example (0 packages loaded, 0 targets configured).
    INFO: Found 1 target...
    ERROR: /private/var/tmp/_bazel_jiahong/91206591588ab49765e9be8ccee0dd3b/external/androidsdk/BUILD.bazel:13:25: Middleman _middlemen/external_Sandroidsdk_Saapt2_Ubinary-runfiles failed: missing input file 'external/androidsdk/build-tools/29.0.3/aapt2', owner: '@androidsdk//:build-tools/29.0.3/aapt2'
    Target //android_example:lyra_android_example failed to build
    Use --verbose_failures to see the command lines of failed build steps.
    ERROR: /private/var/tmp/_bazel_jiahong/91206591588ab49765e9be8ccee0dd3b/external/androidsdk/BUILD.bazel:13:25 Middleman _middlemen/external_Sandroidsdk_Saapt2_Ubinary-runfiles failed: 1 input file(s) do not exist
    INFO: Elapsed time: 5.936s, Critical Path: 5.68s
    INFO: 14 processes: 6 internal, 8 darwin-sandbox.
    FAILED: Build did NOT complete successfully


    system:Mac I installed Android Studio and installed the android sdk 29 and ndk 21 for requirement. How do I solve it ? Thanks !

    bug help wanted 
    opened by aijianiula0601 6
  • Fixed an issue where Android example would not compile on Mac.

    I think "fail to build lyra_android_example" is a better solution to this problem. I changed it like this, 1st. Modify WORKSPACE in:WORKSPACE#L36-L40:

    git_repository(
        name = "com_google_absl",
        remote = "https://github.com/abseil/abseil-cpp.git",
        tag = "20210324.2",
        # Remove after https://github.com/abseil/abseil-cpp/issues/326 is solved.
        patches = [
            "@//third_party:com_google_absl_f863b622fe13612433fdf43f76547d5edda0c93001.diff"
        ],
        patch_args = [
            "-p1",
        ]
    )
    

    2nd.

    mkdir third_party
    cd third_party
    touch BUILD
    touch com_google_absl_f863b622fe13612433fdf43f76547d5edda0c93001.diff
    

    The content in BUILD is:

    licenses(["notice"])  # Apache License 2.0
    
    exports_files(["LICENSE"])
    
    package(default_visibility = ["//visibility:public"])
    

    The content in com_google_absl_f863b622fe13612433fdf43f76547d5edda0c93001.diff is:

    diff --git a/absl/time/internal/cctz/BUILD.bazel b/absl/time/internal/cctz/BUILD.bazel
    index 9fceffe..e7f9d01 100644
    --- a/absl/time/internal/cctz/BUILD.bazel
    +++ b/absl/time/internal/cctz/BUILD.bazel
    @@ -69,8 +69,5 @@ cc_library(
             "include/cctz/zone_info_source.h",
         ],
         linkopts = select({
    -        ":osx": [
    -            "-framework Foundation",
    -        ],
             ":ios": [
                 "-framework Foundation",
             ],
    

    (I learned from mediapipe#WORKSPCE#L25-L40... (:

    opened by lilinxiong 5
  • failure in building Lyra

    I am building Lyra on Ubuntu 18.04.1 for Linux. The bazel version that I have is 4.0.0 (although I have tried with 4.1.0 as well). The gcc/g++ version that I have is 7.5.0.

    The error appears to be that bazel is invoking gcc with -std=c++0x, yet calling in functionalies that are only supported starting C++17.

    Any help would be appreciated.

    In file included from layer_wrapper.h:29:0, from conv1d_layer_wrapper.h:27, from layer_wrappers_lib.h:21, from causal_convolutional_conditioning.h:28, from wavegru_model_impl.h:28, from wavegru_model_impl.cc:15:
    layer_wrapper_interface.h: At global scope:
    layer_wrapper_interface.h:82:8: error: 'variant' in namespace 'std' does not name a template type
      std::variant<FromDisk, FromConstant> from = FromDisk();

    bug 
    opened by shoham5 5
  • Can Lyra be parameterized?

    I'm wanting to try to modify Lyra to try encoding at 1200 bps or even sub-1000. I've just started looking at Lyra. Has anyone looked at the difficulty of adding a custom bitrate?

    opened by psommerfeld 0
  • v1.3.2 is much slower than v1.3.1 if it's built into WebAssembly

    [NOTE] This is just an FYI issue as I know this project doesn't officially support WebAssembly.

    As I mentioned in https://github.com/google/lyra/issues/49, shiguredo/lyra-wasm maintains a no-patch WebAssembly build of Lyra. Today, I updated the Lyra version to 1.3.2 (https://github.com/shiguredo/lyra-wasm/pull/10). However, it turned out that the encoding and decoding performance is degraded after the update.

    The following table is a benchmark result from https://shiguredo.github.io/lyra-wasm/lyra-benchmark.html. (elapsed times taken to encode / decode 10 seconds audio data)

    | Browser           | Lyra Version | Encode Time | Decode Time |
    |-------------------|--------------|------------:|------------:|
    | Chrome (m1 mac)   | 1.3.1        |  550.230 ms |  804.230 ms |
    | Chrome (m1 mac)   | 1.3.2        |  898.375 ms | 1144.754 ms |
    | Safari (m1 mac)   | 1.3.1        |  596.880 ms |  866.779 ms |
    | Safari (m1 mac)   | 1.3.2        |  905.639 ms | 1168.120 ms |
    | Firefox (m1 mac)  | 1.3.1        |  540.199 ms |  800.540 ms |
    | Firefox (m1 mac)  | 1.3.2        |  609.940 ms | 1064.080 ms |
    | Chrome (android)  | 1.3.1        | 1002.769 ms | 1040.140 ms |
    | Chrome (android)  | 1.3.2        | 1398.920 ms | 1621.900 ms |

    I don't know the reason of this performance drop. Any information that helps alleviate this problem is more than welcomed.

    opened by sile 3
  • Implementing a Python wrapper for Lyra

    Hi!

    I'm trying to create a python binding for your project that would be easily pip-installable. To do so I am also developing a C binding. This is fairly simple, as it mainly translates C++ vectors into C structs. This is required because C++ name mangling makes it quite hard to export .so libraries directly from C++. I think, from what I gathered, that this is also required for most other programming languages to be able to access the library (I found similar methods for JAVA for example). Maybe this would be a contribution useful to other people. It would also allow making Lyra to work without using Bazel.

    On another side, I'm trying to export the most useful functionalities of the library. I already made wrappers for LyraEncoder and LyraDecoder, but I'm not sure what other functionalities would be useful for people to have access to. I'm planning to tackle SoundStream next. What other functionalities do you think would be useful to expose?

    Thanks!

    opened by fedingo 1
  • Feeding FB audio outputs WB

    Running the codec with full-band input (48 khz sr):

    bazel-bin/encoder_main --input_path=path/to/fullband-audio/.wav --output_dir= dir/to/bs_output --bitrate=6000
    bazel-bin/decoder_main --encoded_path= dir/to/bs_output/.lyra --output_dir= dir/to/output --bitrate=6000

    Is giving me WB output samplerate 16 khz. Is it about the bitrate? with lower bitrates you lower the bandwidth? if so, what's the bitrate for SWB and FB output? thanks!

    question 
    opened by ChamMoradi 7
  • In what configuration is the Soundstream in Lyra V2 trained?

    Referring to the original Soundstream article, Soundstream should be trained on 24kHz data. I would like to know what sample rate wavs these models released in lyraV2 (soundstream_encoder.tflite; quantizer.tflite; lyragan.tflite) were trained on. Can these models also support processing 24kHz wavs? Could these models be used on 24kHz wavs to do some interesting experiments similar to another Google work AudioLM.

    I found that the existing models seem to process 16kHz wavs. However, I found on line 48 of lyra_encoder.h that the supported sample rates are not only 16000, but also 8000, 32000, and 48000. This makes me confused. A different sample rate means that the fixed 320 samples span a different amount of time. I'm not quite sure if this fixed soundstream_encoder can directly handle data of different sample rates, because given 46 4-bit quantizers, the encoded data does not match the supported bit rates (9.2kbps) mentioned in the API doc. Actually, I used the three released models to encode, quantize, and decode a 16kHz and a 24kHz wav with the same content, and the two decoded waves sound the same. Due to the limited number of test examples, I am not sure about the recovery quality. Can anyone explain this? Much thanks.

    question 
    opened by yd11111 9
Releases (v1.3.2)
  • v1.3.2(Dec 20, 2022)

    Lyra 1.3.2 is now available. Updating should be a medium priority for most users. This release is a relatively small change. It upgrades from TensorFlow 2.9 to the latest stable 2.11, which produces a ~10% speed improvement due to more modules being supported in XNNPack. The benchmark on the README is also updated to reflect the current bazel build, but note that previously it measured the internal build speed, which was already based on the faster TF 2.12.

    Notes

    • Added tests to the github CI actions. This is not complete test coverage, but spans the majority of the code.
    • Bazel 6.0.0, which was just released, breaks the build. A .bazelversion file was added to pin Bazel to 5.3.2. The bazelisk wrapper can be used to automatically select the correct Bazel version.
    Source code(tar.gz)
    Source code(zip)
  • v1.3.1(Dec 5, 2022)

    Lyra 1.3.1 is now available. Updating should be a low priority for most users. This release contains mostly cosmetic changes due to a directory restructuring. However, for those who are considering submitting PRs, it would be helpful to sync to prevent merge conflicts.

    Resolved Issues

    • The directory structure was refactored to remove clutter from the root directory (#90). Note that this will change the target path for most targets.
    Source code(tar.gz)
    Source code(zip)
  • v1.3.0(Nov 10, 2022)

    Lyra 1.3.0 is now available. This release increases the speed and reduces the storage space of the model. We recommend all users upgrade if they do not need to reuse the earlier versions’ bitstream.

    New Features

    • The new model is 43% smaller (TFLite model size) and 20% faster (comparing 1.2.0 to 1.3.0 on a Pixel 6 Pro). This is accomplished by storing some weights and performing arithmetic operations in 8-bit integers instead of 32-bit floats. Thanks to quantization-aware training, the audio quality of the smaller and faster model remains as good as that of the previous model; our listening tests show that users have no preference between the two. The bitstream, however, differs from the previous model's due to the changed weights.
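
    As an illustration of where the storage saving comes from (a generic sketch, not Lyra's actual quantization scheme; symmetric per-tensor int8 quantization is assumed here):

```python
import numpy as np

# Generic sketch: symmetric per-tensor quantization of float32 weights
# to int8, which cuts storage to a quarter at a small precision cost.

def quantize_int8(w: np.ndarray):
    scale = float(np.abs(w).max()) / 127.0   # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.01, 1.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(w.nbytes, "->", q.nbytes)   # 16 -> 4 bytes (4x smaller)
```

    Quantization-aware training goes further than this post-hoc rounding: the model sees the quantization error during training and learns weights that tolerate it, which is why the quality loss can be negligible.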

    Breaking Changes

    • The 1.3.0 bitstream is incompatible with 1.2.0.
    Source code(tar.gz)
    Source code(zip)
  • v1.2.0(Sep 30, 2022)

    Lyra V2 (1.2.0) is now available. This release increases the quality and flexibility of the model. We recommend all users upgrade if they do not need to use the V1 bitstream.

    New Features

    • Speed is significantly faster (~5x improvement seen on Android devices).
    • The SoundStream-based model produces significantly higher quality speech (when comparing 3kbps V1 to 3.2 kbps V2).
    • Selectable bitrate (3200, 6000, 9200 bits per second).
    • Codec latency reduced from 100 ms to 20 ms.
    • Mac and Windows support (in addition to continuing support for Linux and Android). Note: we have verified that these build and run correctly, but there are numerous compilation and linker warnings (Windows in particular, due to the MSVC/gcc mismatch). These issues, and support for other platforms such as iOS, can be addressed by modifying the .bazelrc file. We welcome community contributions for this.
    • More portable code: the TensorFlow Lite model in the .tflite files can be used on other platforms. The TFLite runtime is optimized for each platform, replacing the need to write platform-specific assembly.
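
    Assuming 20 ms packets (50 packets per second, consistent with the codec latency stated above; the packet interval itself is an assumption, not an official figure), each selectable bitrate maps to a fixed packet size:

```python
# Packet-size arithmetic for the selectable bitrates, assuming a 20 ms
# packet interval. All three rates divide evenly into whole bytes.
FRAME_MS = 20
PACKETS_PER_SECOND = 1000 // FRAME_MS   # 50

for bps in (3200, 6000, 9200):
    bytes_per_packet = bps // PACKETS_PER_SECOND // 8
    print(f"{bps} bps -> {bytes_per_packet} bytes per packet")
# 3200 -> 8, 6000 -> 15, 9200 -> 23
```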

    Breaking Changes

    • The V2 bitstream is incompatible with V1.
    • Bitrate can now be set dynamically on the encoder using the new set_bitrate() API. Likewise, the bitrate parameter of the decoder was dropped, since it decodes each packet correctly regardless of bitrate.
    • The DecodePacketLoss() API was folded into DecodeSamples(), which now switches to Packet Loss Concealment (PLC) when needed.
    • lyra_encoder.h API changes
      • Addition of set_bitrate()
      • Encode() returns std::optional instead of absl::optional
    • lyra_decoder.h API changes
      • Removal of bitrate as an argument to Create()
      • Removal of DecodePacketLoss()
      • DecodeSamples() returns std::optional instead of absl::optional
      • Removal of bitrate()
    Source code(tar.gz)
    Source code(zip)
  • v0.0.2(Jun 28, 2021)

    Lyra version 0.0.2 is now available on GitHub. The main improvement of this version is the open-source release of the sparse_matmul library code, which was co-developed by Google and DeepMind. That means no more pre-compiled “.so” dynamic library binaries and no more restrictions on which toolchain to use, which opens the door to porting Lyra to different platforms. The full list of features and fixes includes:

    • Release sparse_matmul library code and remove pre-compiled dynamic library binaries.
    • Add support for the Bazel default gcc toolchain on Linux, and make it the default instead of the clang toolchain.
    • Fix noise bursts at the beginning of output audio files.
    • Abstract out UnitFloatToInt16Scalar, UnitFloatToInt16 and Int16ToUnitFloat functions.
    • Provide operator<< for unique_ptr so it can be used with CHECK() macros.
    • Fix float distribution compatibility in benchmark_decode_lib.
    Source code(tar.gz)
    Source code(zip)
  • v0.0.1(Apr 6, 2021)

Owner
Google
Google ❤️ Open Source
Speech Algorithms Collections

Speech Algorithms Collections

Ryuk 497 Jan 4, 2023
This speech synthesizer is actually the SAM speech synthesizer in an ESP8266

SSSSAM Serial Speech Synthesizer SAM This speech synthesizer is actually the SAM speech synthesizer in an ESP8266. Where SAM was a software applicatio

Jan 12 Oct 4, 2022
Through hole PCB version of the HAGIWO 005 Generative Sequencer Eurorack module.

HAGIWO 005 Eurorack Sequencer PCB and Code Through hole PCB version of the HAGIWO 005 Generative Sequencer Eurorack module. The module is a very simpl

null 11 Sep 28, 2022
LFAC - Low Fidelity Audio Codec

LFAC - Low-Fidelity Audio Codec Copyright 2021 Jari Komppa, http://iki.fi/sol Licensed under Unlicense. Not to be confused with FLAC. What is this? Do

Jari Komppa 15 Sep 18, 2022
Legion Low Level Rendering Interface provides a graphics API agnostic rendering interface with minimal CPU overhead and low level access to verbose GPU operations.

Legion-LLRI Legion-LLRI, or “Legion Low Level Rendering Interface” is a rendering API that aims to provide a graphics API agnostic approach to graphic

Rythe Interactive 25 Dec 6, 2022
CC2500 Low-Cost Low-Power 2.4 GHz RF Transceiver driver for esp-idf

esp-idf-cc2500 CC2500 Low-Cost Low-Power 2.4 GHz RF Transceiver driver for esp-idf. I ported from this. 2.00mm pitch External Antena 1.27mm pitch PCB

null 3 May 29, 2022
Simple Binary Encoding (SBE) - High Performance Message Codec

Simple Binary Encoding (SBE) SBE is an OSI layer 6 presentation for encoding and decoding binary application messages for low-latency financial applic

Real Logic 2.8k Dec 28, 2022
Open h.265 video codec implementation.

libde265 - open h.265 codec implementation libde265 is an open source implementation of the h.265 video codec. It is written from scratch and has a pl

struktur AG 1.4k Dec 30, 2022
Open Source H.264 Codec

OpenH264 OpenH264 is a codec library which supports H.264 encoding and decoding. It is suitable for use in real time applications such as WebRTC. See

Cisco Systems 4.8k Jan 1, 2023
hessian2-codec it is a complete C++ implementation of hessian2 spec

hessian2-codec is a C++ library from Alibaba for hessian2 codec. It is a complete C++ implementation of hessian2 spec. Because it was originally intended to implement the Dubbo Filter of Envoy, it did not provide good support for serialization of user-defined types (there is only one way to implement user-defined types using ADL, but it is not very complete and does not support nested types well). At the moment it is simply deserializing content into some C++ intermediate types.

Alibaba 16 Nov 15, 2022
Lossless data compression codec with LZMA-like ratios but 1.5x-8x faster decompression speed, C/C++

LZHAM - Lossless Data Compression Codec Public Domain (see LICENSE) LZHAM is a lossless data compression codec written in C/C++ (specifically C++03),

Rich Geldreich 641 Dec 22, 2022
A bespoke sample compression codec for 64k intros

pulsejet A bespoke sample compression codec for 64K intros codec pulsejet lifts a lot of ideas from Opus, and more specifically, its CELT layer, which

logicoma 34 Jul 25, 2022
A free, fast, cross-platform volumetric codec for everyone.

The open source Universal Volumetric (".uvol") compressed interchange format for streaming mesh sequences. This project also includes a cross-platform player implementation using h.264 video for texture.

XR Foundation 85 Dec 28, 2022
Python bindings of silk codec.

Python silk module. --- pysilk --- APIs See test\test.py. import pysilk as m m.silkEncode(buf , 24000) m.silkDecode(buf , 24000) #the first param is b

DCZ_Yewen 16 Oct 11, 2022
Cross-platform silk codec wrap library depends on ploverlake/silk.

libSilkCodec Cross-platform silk codec wrap library depends on ploverlake/silk. Clone & Build Linux/Unix like # clone $ git clone https://github.c

KonataDev 8 Sep 9, 2022
ffmpeg supporting EVC codec and file formats.

ffevc ffmpeg supporting EVC codec and file formats. MPEG-5 Essential Video Coding (EVC) integration with FFmpeg project. It is supported under Linux a

MPEG-5 28 Nov 23, 2022
Nuvoton codec/amp driver with different ASoC version

Nuvoton-ASoC Nuvoton codec and amp drivers with different ASoC versions. The Nuvoton-ASoC repository stores the Linux driver for codec and amplifier m

Nuvoton Technology Corp. 1 Aug 31, 2022
Basis Universal GPU Texture Codec

basis_universal Basis Universal Supercompressed GPU Texture Codec Basis Universal is a "supercompressed" GPU texture data interchange system that supp

null 2.3k Dec 28, 2022
Facebook AI Research's Automatic Speech Recognition Toolkit

wav2letter++ Important Note: wav2letter has been moved and consolidated into Flashlight in the ASR application. Future wav2letter development will occ

Facebook Research 6.2k Jan 3, 2023
🐸 Coqui STT is an open source Speech-to-Text toolkit which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers

Coqui STT (🐸 STT) is an open-source deep-learning toolkit for training and deploying speech-to-text models. 🐸 STT is battle tested in both producti

Coqui.ai 1.7k Jan 2, 2023