TensorVox is an application designed to enable user-friendly and lightweight neural speech synthesis on the desktop

Overview

TensorVox

TensorVox is an application designed to enable user-friendly and lightweight neural speech synthesis on the desktop, aimed at making such technology more accessible.

Powered by TensorFlowTTS, it is written in pure C++/Qt and uses the TensorFlow C API to interact with the models. This way, inference runs without installing gigabytes of pip libraries; only a 100 MB DLL is needed.

Interface with Tac2 model loaded

Try it out

Grab it from the releases, and check the Google Drive folder for models and installation instructions.

TODO: Add instructions for training and exporting models

Supported architectures

Currently, only FastSpeech2, Tacotron2 (phoneme-based) and Multi-Band MelGAN from TensorFlowTTS are supported.

Build instructions

Currently, only Windows x64 is supported.

Requirements:

  1. Qt Creator
  2. MSVC 2017 (v141) compiler

Primed build (with all provided libraries):

  1. Download precompiled binary dependencies and includes
  2. Unzip it so that the deps folder is in the same place as the .pro and main source files.
  3. Open the project with Qt Creator, add your compiler and compile

Note that to try your shiny new executable you'll need to download the program as described above and insert the models folder where your new build is output.

TODO: Add instructions for compiling from scratch.

Externals (and thanks)

Contact

You can open an issue here, or join the Discord server to discuss or ask anything.

Note about licensing

This project is MIT licensed everywhere except Vietnam, where, because it uses TensorFlowTTS models as its backend, it cannot be used without permission from the TensorFlowTTS authors. See here for details.

Comments
  • Batch Inference on C++

    Hi @ZDisket, thanks for your great work. With FastSpeech2 and Tacotron, batch inference is easy in the Python version. How can I run batch inference in the C++ version?

    opened by zhangsanfeng86 4
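
Batching variable-length inputs usually starts with padding every ID sequence to a common length. The sketch below shows only that step, in Python for brevity; the same logic carries over to C++. It assumes that ID 0 is the padding symbol and that the exported model accepts more than one sequence per call, neither of which is confirmed here. `pad_batch` is a hypothetical helper, not part of TensorVox or TensorFlowTTS.

```python
from typing import List, Sequence, Tuple

PAD_ID = 0  # assumption: 0 is the padding symbol in the model's ID mapping


def pad_batch(
    sequences: Sequence[Sequence[int]], pad_id: int = PAD_ID
) -> Tuple[List[List[int]], List[int]]:
    """Right-pad every ID sequence to the length of the longest one.

    Returns the padded batch and the original lengths, so padded
    positions can be trimmed from the outputs afterwards.
    """
    if not sequences:
        return [], []
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths)
    padded = [list(seq) + [pad_id] * (max_len - len(seq)) for seq in sequences]
    return padded, lengths


batch, lengths = pad_batch([[5, 9, 2], [7], [4, 4]])
print(batch)    # [[5, 9, 2], [7, 0, 0], [4, 4, 0]]
print(lengths)  # [3, 1, 2]
```

Keeping the original lengths matters because the audio generated for padded positions has to be cut off after inference.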
  • using the models in python

    Hello, thank you for your code. I tried to test the models with Python, but the generated wav file is wrong. Can you help me to check my code? thank you!

    import time
    
    import matplotlib.pyplot as plt
    import soundfile as sf
    import tensorflow as tf
    from tensorflow_tts.inference import AutoProcessor
    
    # Convert the input text to symbol IDs using the LJSpeech mapper.
    processor = AutoProcessor.from_pretrained(pretrained_path="./test/files/ljspeech_mapper.json")
    input_text = "There’s a way to measure the acute emotional intelligence that has never gone out of style."
    input_ids = processor.text_to_sequence(input_text)
    
    # Load the exported models from the demo folder.
    fastspeech2 = tf.saved_model.load(r"temp\Win64DemoWithModel\LJ\melgen")
    mb_melgan = tf.saved_model.load(r"temp\Win64DemoWithModel\LJ\vocoder")
    print("fs2")
    
    # Run FastSpeech2 on a batch containing a single utterance.
    start = time.time()
    mel_before, mel_after, duration_outputs, _, _ = fastspeech2.inference(
        input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
        speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
        speed_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
        f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
        energy_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    )
    print(mel_before.shape)  # (1, 345, 80)
    print("fs:", time.time() - start)
    
    # Vocode both mel outputs with Multi-Band MelGAN.
    audios = mb_melgan.inference(mel_before)
    audio_after = mb_melgan.inference(mel_after)
    print("fs + vocoder:", time.time() - start)
    print(audios.shape)
    
    # Write 22.05 kHz, 16-bit PCM files and plot the first waveform.
    sf.write('./mel_before2.wav', audios[0, :, 0], 22050, "PCM_16")
    sf.write('./mel_after2.wav', audio_after[0, :, 0], 22050, "PCM_16")
    plt.plot(audios[0, :, 0])
    plt.show()
    
    
    opened by copperdong 3
  • JSON parse error

    [json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - unexpected end of input; expected '[', '{', or a literal

    That's it, there's nothing else. Please fix :(

    opened by YTR76 2
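
An "unexpected end of input" error at line 1, column 1 generally means the JSON parser received empty input, for example because a file was missing or empty. The actual cause in this report is not stated. As a minimal sketch of the idea, written in Python rather than the application's C++, a loader can check for those cases before parsing and report which file is at fault. `load_json_config` is a hypothetical helper, not a TensorVox function.

```python
import json
from pathlib import Path


def load_json_config(path):
    """Parse a JSON file, raising a descriptive error if it is missing, empty or invalid."""
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"Config file not found: {file}")
    text = file.read_text(encoding="utf-8")
    if not text.strip():
        # An empty file is what produces "unexpected end of input" at line 1, column 1.
        raise ValueError(f"Config file is empty: {file}")
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(f"Config file {file} is not valid JSON: {err}") from err
```

Naming the offending file in the message gives the user something to act on, which the bare parser error does not.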
  • G2P model fails and crashes with words exceeding 23 characters.

    Hi!

    I'm trying to re-implement your G2P model in Android and Java in TensorFlow Lite. While doing this I noticed that if I gave inputs greater than 23 characters long, the TensorFlow interpreter failed with the error:

    RuntimeError: Fill dimensions must be >= 0
    Node number 3 (FILL) failed to invoke.

    For example, this input does not crash the application: abcdefghijklmnopqrstuvw (23 chars), but this input does: abcdefghijklmnopqrstuvwx (24 chars).

    I tried this out inside this TensorVox application, and could reproduce the same crash within your program too, so I do not think it is a mistake I have made. See this video:

    https://user-images.githubusercontent.com/7999692/107827010-d5308b00-6d7d-11eb-87de-4fc27f17b41d.mp4

    Of course, no real words are 24 characters long (some locations are though!), but we should probably both sanitise our inputs to avoid this, or maybe this is unintended behaviour from your model? :)

    opened by OscarVanL 2
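
One way to sanitise inputs as suggested above is to split any over-long word into chunks before it reaches the G2P model. The following is a minimal sketch in Python, assuming the 23-character limit observed in this report; `split_long_word` and `MAX_G2P_CHARS` are hypothetical names, and whether chunking gives acceptable pronunciations is untested.

```python
MAX_G2P_CHARS = 23  # longest input reported to work in the issue above


def split_long_word(word, max_chars=MAX_G2P_CHARS):
    """Split a word into consecutive chunks of at most max_chars characters."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    chunks = [word[i:i + max_chars] for i in range(0, len(word), max_chars)]
    return chunks or [word]  # an empty word is returned unchanged


print(split_long_word("abcdefghijklmnopqrstuvw"))   # ['abcdefghijklmnopqrstuvw']
print(split_long_word("abcdefghijklmnopqrstuvwx"))  # ['abcdefghijklmnopqrstuvw', 'x']
```

Truncating or rejecting the word would also avoid the crash; chunking is shown here only because it loses no input.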
  • Add TorchMoji-enabled VITS

    Using TorchMoji hidden states as embedding during training allows for rich emotion control. Time to support this model in my program.

    • [x] Add TorchMoji model loader
    • [x] Add word-level tokenizer
    • [x] Make VITS compatible
    opened by ZDisket 0
  • Add VITS (PyTorch)

    I managed to export jaywalnut310's VITS with TorchScript

    • [x] Write VITS class with speed control
    • [x] Integrate single-speaker VITS into the program
    • [ ] Add support for Coqui VITS
    • [x] Add VITS export notebook
    enhancement 
    opened by ZDisket 0
  • Add a Gitter chat badge to README.md

    ZDisket/TensorVox now has a Chat Room on Gitter

    @ZDisket has just created a chat room. You can visit it here: https://gitter.im/TensorVox/community.

    This pull-request adds this badge to your README.md:

    Gitter

    If my aim is a little off, please let me know.

    Happy chatting.

    PS: Click here if you would prefer not to receive automatic pull-requests from Gitter in future.

    opened by gitter-badger 0
  • Add support for Coqui-TTS models

    This program supports TensorFlowTTS, which is great, but Coqui has been attracting a lot of attention recently. Supporting it would help increase visibility and adoption.

    • [x] Tacotron 2 and MB-MelGAN support
    • [x] IPA phonemizers
    • [x] Documentation and export instructions
    • [x] Pre-exported model(s) in English (and German?)
    • [x] Adapt README
    enhancement 
    opened by ZDisket 0
  • Update to CppFlow 2

    With CppFlow 2, which uses the TensorFlow 2 C API, we can extract multiple outputs from the models. This allows things like taking the Tacotron 2 alignment and plotting it (included in this PR).

    opened by ZDisket 0
  • Remove Phonetisaurus, add Tensorflow-based g2p.

    This will remove Phonetisaurus and OpenFST as dependencies, which were the most problematic ones. However, since my current model design isn't reliable for known words, the program uses a dictionary lookup first and falls back to the G2P model only when a word is not found, instead of relying purely on the model.

    • [x] Add model interaction code
    • [x] Add dictionary lookup
    • [x] Remove Phonetisaurus, OpenFST
    • [x] Add model training code
    • [x] Add documentation for model training
    enhancement 
    opened by ZDisket 0
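
The combined lookup described in this pull request can be sketched as follows. This is an illustration in Python under assumed names, not the project's C++ implementation: `phonemize`, the toy dictionary, and `toy_g2p` are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List


def phonemize(
    word: str,
    dictionary: Dict[str, List[str]],
    g2p_model: Callable[[str], List[str]],
) -> List[str]:
    """Return the dictionary pronunciation if known, else ask the G2P model."""
    key = word.lower()
    if key in dictionary:
        return dictionary[key]
    return g2p_model(key)


# Toy stand-ins for illustration only.
toy_dictionary = {"hello": ["HH", "AH0", "L", "OW1"]}


def toy_g2p(word: str) -> List[str]:
    # Placeholder "model": one upper-case symbol per character.
    return [ch.upper() for ch in word]


print(phonemize("Hello", toy_dictionary, toy_g2p))  # ['HH', 'AH0', 'L', 'OW1']
print(phonemize("zq", toy_dictionary, toy_g2p))     # ['Z', 'Q']
```

Checking the dictionary first keeps known words exact and reserves the model for out-of-vocabulary input.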
Releases(V1.0.0.0)
  • V1.0.0.0(Oct 16, 2022)

    Big release

    • Added VITS PyTorch support.
    • Language Standard V1, reflected in new model config JSONs, now allows for flexible addition of languages without having to change code (more relevant for developers than end users). Program can still load old models as it performs conversion.

    For a VITS model, Tupac VITS has been released on the Google Drive folder. More coming.

    Source code(tar.gz)
    Source code(zip)
    TensorVox.zip(170.64 MB)
  • V0.9.9.0(Jun 9, 2022)

  • V0.8.9.0(Aug 14, 2021)

  • V0.8.8.5(Aug 9, 2021)

    Bugfix and new feature release:

    • Fixed invalid character-language values for non-English languages
    • Also releasing a German model and "g2p" (just German character listings)
    • Batch denoiser can now output a specific sampling rate

    As always, the release comes without any models. You'll have to grab some off the Google Drive folder, including the new Thorsten one, or train and export your own.

    Source code(tar.gz)
    Source code(zip)
    TensorVox.zip(90.82 MB)
  • V0.8.8.1A(Jun 22, 2021)

  • V0.8.8.0(Apr 26, 2021)

    Second release. Main improvements:

    • Tacotron 2 support
    • Full Spanish (voice, not interface) support
    • Spectrogram, attention (TT2 only), waveform views
    • Many bugfixes

    Note that as before, this distribution is without any voice models (but does include the G2P models), so you'll have to grab some from the Google Drive folder or train and export it yourself.

    Source code(tar.gz)
    Source code(zip)
    TensorVox.zip(90.81 MB)
  • v0.8.2.0(Jan 22, 2021)

Owner
C++ & C# desktop application developer; Python neural voice synthesis