OpenSpeaker is a completely independent and open source speaker recognition project.

Overview

OpenSpeaker

OpenSpeaker is a completely independent and open source speaker recognition project. It covers the entire speaker recognition pipeline, including data preparation, model training, multi-platform deployment, and model optimization.

Build

  • Clone the repo
git clone https://github.com/zycv/OpenSpeaker.git
  • Build
cd OpenSpeaker && mkdir build && cd build && cmake .. && cmake --build . -j8
  • Run
GLOG_logtostderr=1 ./voiceprint_main 
  • Output

Then you will see output like this:

I1010 00:04:44.024065 5598 voiceprint_main.cc:107] Enroll wav embedding:-0.000527721 -42.1776 0.383387 -4.88499 -35.6465 22.2982 28.4302 0.335538 -7.77056 21.861 0.852521 38.6987 -33.6933 8.14212 2.1728 5.32039 -31.5929 26.2519 1.12099 17.0451 -34.2525 28.0443 2.62836 13.5042 24.5696 8.05435 1.86737 14.1633 15.315 15.0323 -0.243613 22.7958 23.3888 -3.52539 -4.50719 2.26 22.6081 16.9342 -3.73238 -2.30486 34.241 -4.34527 -8.53935 13.4037 -5.2506 31.0014 0.477698 19.1969 -0.0354049 8.56949 -0.00334034 -0.557936 -18.4449 11.7907 -28.2117 9.47196 -17.3517 36.2643 -1.56259 21.1091 -32.0706 -31.3819 1.85436 -0.00354689 4.80609 21.506 -5.79249 0.25165 -55.7074 -4.32717e-05 18.4236 -16.799 -30.733 1.3678 8.01844 -25.2722 20.525 -1.54043 11.9003 -2.97013 4.9329 5.92645 27.0518 -2.54181 -0.00735733 4.37697 -0.149234 15.4831 15.8355 0.597157 -12.4455 -23.3576 12.2849 -0.0110179 -18.9306 -0.0170257 -4.67177e-06 -22.8124 14.2268 -34.0223 ...

I1010 00:04:44.024065 5598 voiceprint_main.cc:107] Test wav embedding:-0.000527721 -42.1776 0.383387 -4.88499 -35.6465 22.2982 28.4302 0.335538 -7.77056 21.861 0.852521 38.6987 -33.6933 8.14212 2.1728 5.32039 -31.5929 26.2519 1.12099 17.0451 -34.2525 28.0443 2.62836 13.5042 24.5696 8.05435 1.86737 14.1633 15.315 15.0323 -0.243613 22.7958 23.3888 -3.52539 -4.50719 2.26 22.6081 16.9342 -3.73238 -2.30486 34.241 -4.34527 -8.53935 13.4037 -5.2506 31.0014 0.477698 19.1969 -0.0354049 8.56949 -0.00334034 -0.557936 -18.4449 11.7907 -28.2117 9.47196 -17.3517 36.2643 -1.56259 21.1091 -32.0706 -31.3819 1.85436 -0.00354689 4.80609 21.506 -5.79249 0.25165 -55.7074 -4.32717e-05 18.4236 -16.799 -30.733 1.3678 8.01844 -25.2722 20.525 -1.54043 11.9003 -2.97013 4.9329 5.92645 27.0518 -2.54181 -0.00735733 4.37697 -0.149234 15.4831 15.8355 0.597157 -12.4455 -23.3576 12.2849 -0.0110179 -18.9306 -0.0170257 -4.67177e-06 -22.8124 14.2268 -34.0223 ...

I1010 00:04:44.358219 5598 voiceprint_main.cc:118] Cosine similarity: 0.975539

The cosine similarity on the last line indicates how similar the speakers in the two recordings are: the closer it is to 1, the more similar the voices.
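The README does not show how this score is computed. Assuming OpenSpeaker uses the standard definition, cosine similarity is the dot product of the two embeddings divided by the product of their norms. Below is a minimal C++ sketch that assumes each embedding is a `std::vector<float>` of equal length. `CosineSimilarity` is a hypothetical helper, not OpenSpeaker's API.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: cosine similarity = dot(a, b) / (|a| * |b|).
// The result lies in [-1, 1]; values near 1 mean the embeddings point the same way.
double CosineSimilarity(const std::vector<float>& a, const std::vector<float>& b) {
  if (a.empty() || a.size() != b.size()) {
    throw std::invalid_argument("embeddings must be non-empty and of equal length");
  }
  double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    // Accumulate in double to limit rounding error over long embeddings.
    dot += static_cast<double>(a[i]) * b[i];
    norm_a += static_cast<double>(a[i]) * a[i];
    norm_b += static_cast<double>(b[i]) * b[i];
  }
  if (norm_a == 0.0 || norm_b == 0.0) {
    throw std::invalid_argument("cosine similarity is undefined for a zero vector");
  }
  return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}
```

Identical embeddings score 1, orthogonal ones score 0, and opposite ones score -1. A same-speaker decision then comes down to comparing the score against a threshold.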

  • Optional

To test other audio files or use a different model, run:

GLOG_logtostderr=1 ./voiceprint_main --help

Then you will see the help information as follows:

-enroll_wav (First wav as enroll wav.) type: string
    default: "../test_data/BAC009S0749W0480.wav"

-feats_dims (Dims for input features.) type: uint32 default: 24

-model (Path to voiceprint model.) type: string default: "../model/tdnn.pt"

-sample_rate (Wav sample rate supported.) type: uint32 default: 16000

-test_wav (Second wav as test wav.) type: string
    default: "../test_data/BAC009S0749W0489.wav"
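The `-sample_rate` default is 16000, so input wavs presumably need to be 16 kHz. Before passing your own recordings with `-enroll_wav` and `-test_wav`, it may help to check their sample rate. The following is a minimal sketch. It assumes a canonical 44-byte PCM WAV header in which the `fmt ` chunk immediately follows the RIFF header. `ReadWavSampleRate` and `ReadWavSampleRateFromFile` are hypothetical helpers, not part of OpenSpeaker.

```cpp
#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the sample rate stored in a canonical PCM WAV
// header, or 0 if the bytes do not look like one. Assumes "RIFF" at offset 0,
// "WAVE" at 8, the "fmt " chunk at 12, and the little-endian 32-bit sample rate
// at offset 24. Files with other chunks before "fmt " are not handled.
std::uint32_t ReadWavSampleRate(const std::vector<std::uint8_t>& header) {
  if (header.size() < 28) return 0;
  if (std::memcmp(header.data(), "RIFF", 4) != 0 ||
      std::memcmp(header.data() + 8, "WAVE", 4) != 0 ||
      std::memcmp(header.data() + 12, "fmt ", 4) != 0) {
    return 0;
  }
  // Assemble the value byte by byte so the result does not depend on host endianness.
  return static_cast<std::uint32_t>(header[24]) |
         (static_cast<std::uint32_t>(header[25]) << 8) |
         (static_cast<std::uint32_t>(header[26]) << 16) |
         (static_cast<std::uint32_t>(header[27]) << 24);
}

// Hypothetical helper: reads the first 44 bytes of a file and parses them.
// Returns 0 if the file cannot be read or is not a canonical WAV.
std::uint32_t ReadWavSampleRateFromFile(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  std::vector<std::uint8_t> header(44);
  if (!in.read(reinterpret_cast<char*>(header.data()),
               static_cast<std::streamsize>(header.size()))) {
    return 0;
  }
  return ReadWavSampleRate(header);
}
```

For example, `ReadWavSampleRateFromFile("my.wav") == 16000` would confirm that a file matches the default. WAV files that place other chunks (for example `LIST`) before `fmt ` would need a real chunk walker instead of fixed offsets.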
Issues
  • Does the letter case need to be changed?

    OpenSpeaker/voiceprint/voiceprint_main.cc line 89: OpenSpeaker::Features features(sample_rate, feats_dims);

    The header file declares it as namespace openspeaker {

    The case doesn't match, so the build fails; making the two spellings consistent fixes it.

    opened by larryKra 1
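C++ identifiers, including namespace names, are case-sensitive. As a result, `OpenSpeaker::Features` does not refer to anything declared inside `namespace openspeaker`. Below is a minimal, self-contained sketch of the fix this issue describes. The `Features` class here is a hypothetical stub that only mirrors the two constructor arguments from the quoted call. It is not OpenSpeaker's real class.

```cpp
#include <cstdint>

// Hypothetical stub standing in for the header's declaration; only the
// lower-case namespace spelling matters for this example.
namespace openspeaker {
class Features {
 public:
  Features(std::uint32_t sample_rate, std::uint32_t feats_dims)
      : sample_rate_(sample_rate), feats_dims_(feats_dims) {}
  std::uint32_t sample_rate() const { return sample_rate_; }
  std::uint32_t feats_dims() const { return feats_dims_; }

 private:
  std::uint32_t sample_rate_;
  std::uint32_t feats_dims_;
};
}  // namespace openspeaker

// Would not compile here: 'OpenSpeaker' is not a declared namespace.
//   OpenSpeaker::Features features(sample_rate, feats_dims);

// Compiles: the qualifier matches the header's namespace exactly.
openspeaker::Features MakeFeatures(std::uint32_t sample_rate, std::uint32_t feats_dims) {
  return openspeaker::Features(sample_rate, feats_dims);
}
```

A namespace alias such as `namespace OpenSpeaker = openspeaker;` would also make the original line compile. Renaming the qualifier, as the issue suggests, keeps a single spelling.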
  • Making Licence explicit.

    From your cpp files I can see you've applied a header licensing them under the Apache licence. Unfortunately, GitHub doesn't recognise the licence until you include it as a LICENCE(.md/txt) file. So this is just a PR adding the Apache licence as a file, to inform GitHub and anything that pulls from GitHub. :)

    Licence taken from the GitHub "Automatic licence chooser".

    opened by mo-g 1
  • gflags error at compile time: Some (but not all) targets in this export set were already defined.

    CMake Error at /usr/local/lib/cmake/gflags/gflags-nonamespace-targets.cmake:37 (message): Some (but not all) targets in this export set were already defined.

    Targets Defined: gflags_nothreads_static

    Targets not yet defined: gflags_static

    opened by summerandautum 0
  • padding?

    Dear author: I tried comparing your fbank with torchaudio's fbank. For an input of shape [1, 16000], yours outputs [1, 98, 80], whereas torchaudio gives [101, 80]. I guess the padding differs between your implementation and torchaudio's. Could you give some tips on how to align the two implementations? Thank you very much.

    opened by dragen1860 1
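One likely source of the 98 vs 101 mismatch is how frames are counted at the signal edges. The sketch below assumes a 25 ms window (400 samples) and a 10 ms shift (160 samples) at 16 kHz. These are common defaults but are not confirmed for OpenSpeaker's fbank. It compares three conventions:

- Kaldi-style framing with `snip_edges` enabled keeps only frames that fit entirely inside the signal.
- Kaldi-style framing with `snip_edges` disabled centres frames on multiples of the shift.
- Centred STFT framing (`torch.stft` with `center=True`, the default in `torchaudio.transforms.MelSpectrogram`) pads `n_fft / 2` samples on each side.

The issue does not say which torchaudio function was used. The function names are hypothetical.

```cpp
#include <cstdint>

// Kaldi-style framing with snip_edges=true: only frames lying fully inside the
// signal count, so num_frames = 1 + floor((num_samples - window) / shift).
std::int64_t NumFramesSnipEdges(std::int64_t num_samples, std::int64_t window,
                                std::int64_t shift) {
  if (num_samples < window) return 0;
  return 1 + (num_samples - window) / shift;
}

// Kaldi-style framing with snip_edges=false: frames are centred on multiples
// of the shift, giving (num_samples + shift / 2) / shift frames.
std::int64_t NumFramesNoSnip(std::int64_t num_samples, std::int64_t shift) {
  return (num_samples + shift / 2) / shift;
}

// Centred STFT framing: the signal is padded by n_fft / 2 on both sides,
// giving 1 + floor(num_samples / hop) frames regardless of n_fft.
std::int64_t NumFramesCentered(std::int64_t num_samples, std::int64_t hop) {
  return 1 + num_samples / hop;
}
```

Under these assumed settings, the three conventions give 98, 100, and 101 frames for 16000 samples. The reported 98 matches the first formula, and the reported 101 matches the third. `torchaudio.compliance.kaldi.fbank` defaults to `snip_edges=True`, so it may be the closer match to a 98-frame output.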
Releases (v0.1)
Owner
ZY
Software and Algorithm Engineer
UVIC ECE 499 Real-Time Gesture Recognition Project

Welcome to GitHub Pages. You can use the editor on GitHub to maintain and preview the content for your website in Markdown files. Whenever you commit t

Lyndon Bauto 3 Sep 21, 2019
WeNet is to close the gap between research and production end-to-end (E2E) speech recognition models,

WeNet is to close the gap between research and production end-to-end (E2E) speech recognition models, to reduce the effort of productionizing E2E models, and to explore better E2E models for production.

2.3k Aug 9, 2022
PocketSphinx is a lightweight speech recognition engine, specifically tuned for handheld and mobile devices, though it works equally well on the desktop

PocketSphinx 5prealpha. This is PocketSphinx, one of Carnegie Mellon University's open source large vocabulary, speaker-independent continuous speech r

3.1k Aug 9, 2022
Number recognition with MNIST on Raspberry Pi Pico + TensorFlow Lite for Microcontrollers

About: Number recognition with MNIST on Raspberry Pi Pico + TensorFlow Lite for Microcontrollers. Device: Raspberry Pi Pico, LCD display 2.8" 240x320 SPI TFT

iwatake 48 Jul 28, 2022
ICRA 2021 - Robust Place Recognition using an Imaging Lidar

Robust Place Recognition using an Imaging Lidar. A place recognition package using high-resolution imaging lidar. For best performance, a lidar equippe

Tixiao Shan 275 Jul 28, 2022
In this tutorial, we will use machine learning to build a gesture recognition system that runs on a tiny microcontroller, the RP2040.

Pico-Motion-Recognition. This repository has the code used in the two-part tutorial TinyML - Motion Recognition Using Raspberry Pi Pico. The first part i

Marcelo Rovai 16 Jun 18, 2022
This is the code of our paper An Efficient Training Approach for Very Large Scale Face Recognition or F²C for simplicity.

Fast Face Classification (F²C). This is the code of our paper An Efficient Training Approach for Very Large Scale Face Recognition or F²C for simplicit

33 Jun 27, 2021
A simple facial recognition script using OpenCV's FaceRecognizer module implemented in C++

Local Binary Patterns Histogram Recognizer. A project that implements the LBPHRecognizer class of the OpenCV library to determine if a detected face co

Pablo Agustín Ortega-Kral 0 Jan 18, 2022
Very portable voice recorder with speech recognition.

DictoFun. Small wearable voice recorder, NRF52832-based. Concept: the device was initiated after my frustration while using a voice recorder for storing ideas

Roman 5 Feb 3, 2022
Cinder is a community-developed, free and open source library for professional-quality creative coding in C++.

Cinder 0.9.3dev: libcinder.org. Cinder is a peer-reviewed, free, open source C++ library for creative coding. Please note that Cinder depends on a few

Cinder 4.9k Aug 10, 2022
Insight Toolkit (ITK) is an open-source, cross-platform toolkit for N-dimensional scientific image processing, segmentation, and registration

ITK: The Insight Toolkit

Insight Software Consortium 1.1k Aug 8, 2022
An open source iOS framework for GPU-based image and video processing

GPUImage. Brad Larson, http://www.sunsetlakesoftware.com, @bradlarson. Overview: The GPUImage framework is a BSD-licensed iO

Brad Larson 20k Aug 1, 2022
An Open Source Machine Learning Framework for Everyone

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, a

166.8k Aug 3, 2022
Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit

CNTK. The Microsoft Cognitive Toolkit (https://cntk.ai) is a unified deep learning toolkit that describes

Microsoft 17.2k Aug 6, 2022
🐸 Coqui STT is an open source Speech-to-Text toolkit which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers

Coqui STT (🐸STT) is an open-source deep-learning toolkit for training and deploying speech-to-text models. 🐸STT is battle tested in both producti

Coqui.ai 1.4k Aug 8, 2022
An open source machine learning library for performing regression tasks using the RVM technique.

Introduction: neonrvm is an open source machine learning library for performing regression tasks using the RVM technique. It is written in C programming la

Siavash Eliasi 33 May 31, 2022
An open source python library for automated feature engineering

"One of the holy grails of machine learning is to automate more and more of the feature engineering process." ― Pedro Domingos, A Few Useful Things to

alteryx 6.3k Aug 2, 2022
Open source modules to interface Metavision Intelligence Suite with event-based vision hardware equipment

Metavision: installation from source. This page describes how to compile and install the OpenEB codebase. For more information, refer to our online doc

PROPHESEE 77 Aug 4, 2022