Snowboy reimplementation

Snowman Hotword Detection

Snowman Hotword Detection is an open source rewrite of the popular Snowboy library originally developed by Kitt.AI. It was created to preserve and improve support for the library and to make it usable on modern devices, both embedded and desktop.

Disclaimer

While I did my best, reverse engineering software tends to be more art than science, so it is very likely that I introduced some bugs along the way. The fact that I have very little experience with audio processing and neural networks does not help either. If you have experience in either area and could spare some time to proofread what I did, that would be highly appreciated. I won't provide any warranty for it, but I did my best to make sure this library won't set your PC on fire.

Feature support

In the default build configuration, it should be a drop-in replacement for the original Snowboy library. However, it does not implement everything the original library did. The most important differences are the following:

  • Missing frontend processing: I have not implemented automatic gain control or noise suppression (both were part of the library when "ApplyFrontend" was enabled), so make sure you have a good audio source until they are implemented. Voice Activity Detection (VAD) does work, however.

  • Missing support for some hotword search algorithms: Universal models use several different hotword search algorithms. So far I have only implemented "Naive". The algorithms that no model uses trigger an assert, and the ones that are used are redirected to the Naive method, which seems to work fine. However, we should probably implement all of them at some point.

  • Split radix FFT: The original library supported two FFT modes, normal FFT and split radix FFT. So far I have only implemented normal FFT and hardcoded models that request SRFFT to use normal FFT instead. As far as I understand, the results should be identical, but split radix FFT might perform better.

  • PipelineVAD: Although it has been reverse engineered, it is completely untested. That said, most of its code is identical to PipelineDetect and thus somewhat tested, so I don't expect any major bugs in it.

  • Wave reading, PipelineNNETForward: While present in the original binary, these were never exposed through headers, so no user code should rely on them. I might implement them at some point, though.
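The expectation that normal and split radix FFT give identical results rests on the fact that every FFT variant is a faster factorization of the same discrete Fourier transform, so outputs differ only by floating-point rounding. As a minimal sketch (not Snowman's FFT code, and using a plain radix-2 Cooley-Tukey FFT rather than split radix, which is not shown here), the following compares an FFT against the direct O(n^2) DFT. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Direct O(n^2) DFT: X[k] = sum over t of x[t] * exp(-2*pi*i*k*t/n).
std::vector<cd> NaiveDft(const std::vector<cd>& x) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cd sum{0.0, 0.0};
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = -2.0 * pi * double(k) * double(t) / double(n);
            sum += x[t] * std::polar(1.0, angle);
        }
        out[k] = sum;
    }
    return out;
}

// Recursive radix-2 Cooley-Tukey FFT. The input size must be a power of two.
std::vector<cd> Radix2Fft(const std::vector<cd>& x) {
    const std::size_t n = x.size();
    if (n <= 1) return x;
    // Split into even- and odd-indexed samples and transform each half.
    std::vector<cd> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = x[2 * i];
        odd[i] = x[2 * i + 1];
    }
    const std::vector<cd> e = Radix2Fft(even);
    const std::vector<cd> o = Radix2Fft(odd);
    const double pi = std::acos(-1.0);
    std::vector<cd> out(n);
    // Butterfly step: combine the half-size spectra using twiddle factors.
    for (std::size_t k = 0; k < n / 2; ++k) {
        const cd tw = std::polar(1.0, -2.0 * pi * double(k) / double(n)) * o[k];
        out[k] = e[k] + tw;
        out[k + n / 2] = e[k] - tw;
    }
    return out;
}

// Largest absolute difference between two spectra of equal length.
double MaxAbsDiff(const std::vector<cd>& a, const std::vector<cd>& b) {
    double m = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        m = std::max(m, std::abs(a[i] - b[i]));
    }
    return m;
}
```

On any power-of-two input, `MaxAbsDiff(NaiveDft(x), Radix2Fft(x))` should stay at the level of rounding error; a split radix implementation could be checked against `NaiveDft` in the same way.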

Universal models

Existing universal models should work out of the box and perform similarly to the original library. Since they are designed to work with "ApplyFrontend" disabled, the missing AGC/NoiseSuppression should not have an effect.

New universal models should be possible in theory. However, I don't know enough about neural networks to create them. If you do, please reach out to me. Another issue is the lack of a way to gather samples. In the future I might build a website similar to the original kitt.ai website, where people can train their personal models through a nice UI and share audio samples for building universal models, but this is still far off.

Personal models

Training personal models is now possible using the enroll utility built alongside the library. While the resulting model is not bit-identical to models trained with the original library, it matches them to 5 digits of precision. The remaining differences are most likely the result of rounding errors during training and should not affect the model's performance.
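To make "identical to 5 digits of precision" concrete, here is a hypothetical helper (not part of Snowman or its enroll utility) that checks whether two sets of model parameters agree to a given number of significant digits. It assumes the numeric values of both models have already been extracted into float arrays; parsing the model files is not covered here.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true if every pair of values agrees to about
// `digits` significant decimal digits, i.e. |x - y| <= 10^-digits * max(|x|, |y|).
// A tiny floor on the scale keeps the comparison defined when both values are 0.
bool AgreeToDigits(const std::vector<float>& a, const std::vector<float>& b, int digits) {
    if (a.size() != b.size()) return false;
    const double tol = std::pow(10.0, -digits);
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double x = a[i];
        const double y = b[i];
        const double scale = std::max({std::abs(x), std::abs(y), 1e-30});
        if (std::abs(x - y) > tol * scale) return false;
    }
    return true;
}
```

A relative tolerance is used because model weights can span several orders of magnitude, so a fixed absolute tolerance would be too loose for small values and too strict for large ones.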

Usage

As before, the main interface is snowboy-detect.h, which declares the well-known snowboy::SnowboyDetect, snowboy::SnowboyVad, snowboy::SnowboyPersonalEnroll and snowboy::SnowboyTemplateCut classes. These classes provide a very high-level interface to Snowboy that should be sufficient for most applications. There is also snowboy-detect-c.h, which provides a C wrapper for the aforementioned classes and should make integration into other languages a lot easier.
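A rough sketch of how detection results are typically consumed. The method names in the commented-out loop (`SetSensitivity`, `SetAudioGain`, `ApplyFrontend`, `RunDetection`) and the return-code convention (-2 silence, -1 error, 0 no event, N > 0 for hotword N) come from the original Snowboy header; since Snowman aims to be a drop-in replacement they are expected to match, but check snowboy-detect.h. The helper `DescribeDetectionResult` and the file paths are hypothetical placeholders.

```cpp
#include <string>

// Hypothetical helper: maps a RunDetection() result to a readable label,
// following the original Snowboy convention: -2 silence, -1 error,
// 0 no event, N > 0 the 1-based index of the detected hotword.
std::string DescribeDetectionResult(int result) {
    if (result == -2) return "silence";
    if (result == -1) return "error";
    if (result == 0) return "no event";
    if (result > 0) return "hotword " + std::to_string(result);
    return "unknown";
}

// Sketch of a detection loop (commented out because it needs the library
// and real model files; the paths below are placeholders):
//
//   #include "snowboy-detect.h"
//   snowboy::SnowboyDetect detector("common.res", "hotword.umdl");
//   detector.SetSensitivity("0.5");
//   detector.SetAudioGain(1.0f);
//   detector.ApplyFrontend(false);
//   // frame: std::vector<int16_t> of 16 kHz, mono, 16-bit samples
//   int result = detector.RunDetection(frame.data(),
//                                      static_cast<int>(frame.size()));
//   std::string label = DescribeDetectionResult(result);
```

Keeping the interpretation of return codes in one place makes it easy to spot cases like a detector that only ever reports silence.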

Contributing

Any help would be highly appreciated. I am particularly looking for people with knowledge of machine learning, audio processing or reverse engineering. However, all help is welcome. Simply take a look at the open issues and pull requests, as well as the TODO.md file.

License

The original project was licensed under the Apache 2 license, so this one is as well. Just please don't use it as a voice interface for Skynet.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Comments
  • WebAssembly build (32-bit)

    Good morning! First of all, thank you very much for the effort you are putting into reverse engineering and open sourcing the original Snowboy library. Those are impressive skills!

    I have wanted a Snowboy WASM build running in the browser for a while, and now that the source code is available it seems possible. I have made an attempt in my fork of the project, but although it builds and runs detection on the audio data sent to it, it always returns silence (-2).

    I put up a demo you can try out. Unfortunately, it only works in Chrome, because Firefox does not support retrieving user audio at 16000 Hz, and resampling in the browser would add another step where things might go wrong. I would like to focus on getting it to work first.

    In order to make it build, I had to comment out several self_assert statements that assume a 64-bit architecture, and I found an issue stating that only 32-bit ARM is supported.

    So I assume the WebAssembly build might not work because some of the math assumes 64 bits? I am a bit out of my depth here :) so I was hoping you could shed some light on the issue or on the changes needed to make it work.

    Thanks!

    enhancement more-info-needed 
    opened by ccoreilly 9
  • Updated documentation formatting, fixed spelling errors etc.

    Not much in the way of a contribution, but I improved documentation formatting, fixed some spelling mistakes/typos and added a license file. Thank you for your great work on this!

    opened by sveinbjornt 5
  • Samples divided but never multiplied in gain-control-stream.cpp

    Hi! First of all, thanks for making snowman! It's really great to see snowboy living on thanks to the open source community :)

    I've noticed a small issue: calling detector->SetAudioGain() with any value other than 1.0f seems to break detection.

    Issue Source

    Browsing the code, I think I've found the cause: in gain-control-stream.cpp:39 we divide the audio samples by m_maxAudioAmplitude (set to 32767.0 in line 29) in order to normalize the values before applying the gain modifier.

    However, we never multiply by m_maxAudioAmplitude afterwards, so the samples end up very quiet and detection stops working.

    This explains why detection only breaks when the gain differs from 1.0f: the arithmetic is only applied when a value other than 1.0 is passed.

    Solution

    A simple fix is to add a multiplication in line 47:

    - ptr[i] = v;
    + ptr[i] = v * m_maxAudioAmplitude;
    
    opened by leso-kn 2
  • Webassembly build and examples

    Thanks to the good initial work of @ccoreilly, there is now a working WebAssembly example, as well as support for building the library with WebAssembly.

    #8

    opened by Thalhammer 1
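The gain-control issue reported above (samples divided by m_maxAudioAmplitude but never multiplied back) can be illustrated with a minimal sketch. This is a hypothetical stand-alone function, not the actual gain-control-stream.cpp: it returns a new vector instead of modifying `ptr[i]` in place, and the clamping to [-1, 1] is an assumption that the issue does not mention.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the gain step discussed in the issue. Samples are
// normalized to [-1, 1], scaled by `gain`, clamped (assumption), and then
// scaled back to the 16-bit range, which is the step the issue says is missing.
std::vector<float> ApplyGain(const std::vector<float>& samples, float gain) {
    const float maxAudioAmplitude = 32767.0f;  // value named in the issue
    if (gain == 1.0f) return samples;  // mirrors the issue: no arithmetic for 1.0
    std::vector<float> out(samples.size());
    for (std::size_t i = 0; i < samples.size(); ++i) {
        float v = samples[i] / maxAudioAmplitude;      // normalize
        v = std::max(-1.0f, std::min(1.0f, v * gain));  // apply gain and clamp
        out[i] = v * maxAudioAmplitude;                 // the fix: scale back up
    }
    return out;
}
```

Without the final multiplication, a sample of 1000 with gain 2.0 would come out as roughly 0.061 instead of 2000, which matches the "very quiet" audio described in the issue.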