MITIE: library and tools for information extraction

Overview

MITIE: MIT Information Extraction

This project provides free (even for commercial use) state-of-the-art information extraction tools. The current release includes tools for performing named entity extraction and binary relation detection as well as tools for training custom extractors and relation detectors.

MITIE is built on top of dlib, a high-performance machine-learning library[1], and makes use of several state-of-the-art techniques, including distributional word embeddings[2] and Structural Support Vector Machines[3]. MITIE offers several pre-trained models providing varying levels of support for English, Spanish, and German, trained using a variety of linguistic resources (e.g., CoNLL 2003, ACE, Wikipedia, Freebase, and Gigaword). The core MITIE software is written in C++, but bindings for several other languages, including Python, R, Java, C, and MATLAB, allow users to quickly integrate MITIE into their own applications.

Outside projects have created API bindings for OCaml, .NET, .NET Core, and Ruby. There is also an interactive tool for labeling data and training MITIE.

Using MITIE

MITIE's primary API is a C API which is documented in the mitie.h header file. Beyond this, there are many example programs showing how to use MITIE from C, C++, Java, R, or Python 2.7.

Initial Setup

Before you can run the provided examples you will need to download the trained model files which you can do by running:

make MITIE-models

or by simply downloading the MITIE-models-v0.2.tar.bz2 file and extracting it in your MITIE folder. Note that the Spanish and German models are supplied in separate downloads: to use the Spanish NER model, download MITIE-models-v0.2-Spanish.zip and extract it into your MITIE folder; for the German model, do the same with MITIE-models-v0.2-German.tar.bz2.
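If you would rather script the manual route, the extraction step can be sketched in Python with the standard tarfile module. This is only an illustration: the function name extract_models and the default target folder are assumptions, and the archive is assumed to have been downloaded already.

```python
import tarfile
from pathlib import Path


def extract_models(archive_path, mitie_dir="."):
    """Extract a .tar.bz2 model archive into the MITIE folder.

    Returns the sorted list of member names found in the archive.
    """
    target = Path(mitie_dir)
    target.mkdir(parents=True, exist_ok=True)
    # "r:bz2" opens a bzip2-compressed tar archive for reading.
    with tarfile.open(str(archive_path), "r:bz2") as archive:
        names = sorted(archive.getnames())
        archive.extractall(str(target))
    return names


# Example (assumes the archive sits in the current folder):
# for name in extract_models("MITIE-models-v0.2.tar.bz2"):
#     print(name)
```

The Spanish models ship as a .zip file, so they would need the zipfile module instead. Only extract archives you trust, since extractall writes whatever paths the archive contains.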

Using MITIE from the command line

MITIE comes with a basic streaming NER tool, so you can tell MITIE to process each line of a text file independently and output marked up text with the command:

cat sample_text.txt | ./ner_stream MITIE-models/english/ner_model.dat  
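The same pipeline can be driven from a script. The sketch below wraps the command above with Python's subprocess module; the helper names (ner_stream_command, to_stream_input, run_ner_stream) are made up for illustration, and it assumes ner_stream has been built and the English model has been downloaded.

```python
import subprocess


def ner_stream_command(model_path, binary="./ner_stream"):
    """Build the argument list for the ner_stream tool."""
    return [binary, model_path]


def to_stream_input(lines):
    """Join items into newline-terminated input, one item per line.

    ner_stream treats each line independently, so newlines inside an
    item are replaced with spaces first.
    """
    return "".join(line.replace("\n", " ") + "\n" for line in lines)


def run_ner_stream(lines, model_path, binary="./ner_stream"):
    """Send lines to ner_stream on stdin and return its output lines."""
    result = subprocess.run(
        ner_stream_command(model_path, binary),
        input=to_stream_input(lines),
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if the tool exits non-zero
    )
    return result.stdout.splitlines()


# Example (requires the compiled tool and the model file):
# for marked_up in run_ner_stream(["Some text to tag."],
#                                 "MITIE-models/english/ner_model.dat"):
#     print(marked_up)
```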

The ner_stream executable can be compiled by running make in the top level MITIE folder, or by navigating to the tools/ner_stream folder and running make there. You can also build it with CMake using the following commands:

cd tools/ner_stream
mkdir build
cd build
cmake ..
cmake --build . --config Release

Compiling MITIE as a shared library

On a UNIX-like system, this can be accomplished by running make in the top level MITIE folder or by running:

cd mitielib
make

This produces shared and static library files in the mitielib folder. Or you can use CMake to compile a shared library by typing:

cd mitielib
mkdir build
cd build
cmake ..
cmake --build . --config Release --target install

Either of these methods will create a MITIE shared library in the mitielib folder.

Compiling MITIE using OpenBLAS

If you compile MITIE using cmake then it will automatically find and use any optimized BLAS libraries on your machine. However, if you compile using regular make then you have to manually locate your BLAS libraries, or dlib will default to its built-in, but slower, BLAS implementation. Therefore, to use OpenBLAS when compiling without cmake, locate libopenblas.a and libgfortran.a, then run make as follows:

cd mitielib 
make BLAS_PATH=/path/to/libopenblas.a LIBGFORTRAN_PATH=/path/to/libgfortran.a
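Locating the two static libraries by hand can be tedious. As a rough sketch, the Python helper below walks a list of candidate folders and assembles the make invocation shown above. The function names and the example search folders are assumptions; only BLAS_PATH and LIBGFORTRAN_PATH come from the MITIE makefile usage described here.

```python
import os


def find_library(filename, search_dirs):
    """Return the first path under search_dirs that contains filename, or None."""
    for root_dir in search_dirs:
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            if filename in filenames:
                return os.path.join(dirpath, filename)
    return None


def openblas_make_command(search_dirs):
    """Build the make argument list, or return None if a library is missing."""
    blas = find_library("libopenblas.a", search_dirs)
    gfortran = find_library("libgfortran.a", search_dirs)
    if blas is None or gfortran is None:
        return None
    return ["make", "BLAS_PATH=" + blas, "LIBGFORTRAN_PATH=" + gfortran]


# Example (the folders are guesses; adjust them for your system):
# command = openblas_make_command(["/usr/lib", "/usr/local/lib"])
# print(" ".join(command) if command else "libraries not found")
```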

Note that if your BLAS libraries are not in standard locations cmake will fail to find them. However, you can tell it what folder to look in by replacing cmake .. with a statement such as:

cmake -DCMAKE_LIBRARY_PATH=/home/me/place/i/put/blas/lib ..

Using MITIE from a Python 2.7 program

Once you have built the MITIE shared library, you can go to the examples/python folder and simply run any of the Python scripts. Each script is a tutorial explaining some aspect of MITIE: named entity recognition and relation extraction, training a custom NER tool, or training a custom relation extractor.
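For orientation, here is a minimal sketch of the kind of code those tutorials contain. It assumes the mitie module exposes named_entity_extractor and tokenize, and that extract_entities returns (token range, tag, score) tuples, as in examples/python/ner.py; check the scripts shipped with your MITIE version, since older releases may differ. The helper names describe_entities and extract_from_text are illustrative, not part of MITIE.

```python
def describe_entities(tokens, entities):
    """Turn (token_range, tag, score) tuples into (tag, text, score) tuples.

    Tokens may be bytes or str depending on the Python and MITIE versions,
    so bytes are decoded as UTF-8 before joining.
    """
    described = []
    for token_range, tag, score in entities:
        words = []
        for index in token_range:
            token = tokens[index]
            if isinstance(token, bytes):
                token = token.decode("utf-8")
            words.append(token)
        described.append((tag, " ".join(words), score))
    return described


def extract_from_text(text, model_path="MITIE-models/english/ner_model.dat"):
    """Run NER on a string. Requires the mitie module and a model file."""
    # Imported lazily so describe_entities can be used without MITIE installed.
    from mitie import named_entity_extractor, tokenize

    ner = named_entity_extractor(model_path)
    tokens = tokenize(text)
    return describe_entities(tokens, ner.extract_entities(tokens))


# Example (requires the built library and the English model):
# for tag, words, score in extract_from_text("Some text mentioning a person."):
#     print(tag, words, score)
```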

You can also install mitie directly from GitHub with this command: pip install git+https://github.com/mit-nlp/MITIE.git.

Using MITIE from R

MITIE can be installed as an R package. See the README for more details.

Using MITIE from a C program

There are example C programs in the examples/C folder. To compile any of them you simply go into those folders and run make. Or use CMake like so:

cd examples/C/ner
mkdir build
cd build
cmake ..
cmake --build . --config Release

Using MITIE from a C++ program

There are example C++ programs in the examples/cpp folder. To compile any of them you simply go into those folders and run make. Or use CMake like so:

cd examples/cpp/ner
mkdir build
cd build
cmake ..
cmake --build . --config Release

Using MITIE from a Java program

There is an example Java program in the examples/java folder. Before you can run it you must compile MITIE's Java interface, which you can do like so:

cd mitielib/java
mkdir build
cd build
cmake ..
cmake --build . --config Release --target install

That will place a javamitie shared library and jar file into the mitielib folder. Once you have those two files you can run the example program in examples/java by running run_ner.bat if you are on Windows or run_ner.sh if you are on a POSIX system like Linux or OS X.

Also note that you must have Swig 1.3.40 or newer, CMake 2.8.4 or newer, and the Java JDK installed to compile the MITIE interface. Finally, note that if you are using 64bit Java on Windows then you will need to use a command like:

cmake -G "Visual Studio 10 Win64" ..

instead of cmake .. so that Visual Studio knows to make a 64bit library.

Running MITIE's unit tests

You can run a simple regression test to validate your build. Do this by running the following command from the top level MITIE folder:

make test

make test both builds the example programs and downloads the required example models. If you require a non-standard C++ compiler, change CC in examples/C/makefile and in tools/ner_stream/makefile.

Precompiled Python 2.7 binaries

We have built Python 2.7 binaries packaged with sample models for 64bit Linux and Windows (both 32 and 64 bit versions of Python). You can download the precompiled package here: Precompiled MITIE 0.2

Precompiled Java 64bit binaries

We have built Java binaries for the 64bit JVM which work on Linux and Windows. You can download the precompiled package here: Precompiled Java MITIE 0.3. The package contains an examples/java folder. You can run the example by executing the provided .bat or .sh file.

Citing MITIE

There isn't any paper specifically about MITIE. However, since MITIE is basically just a thin wrapper around dlib, please cite dlib's JMLR paper if you use MITIE in your research:

Davis E. King. Dlib-ml: A Machine Learning Toolkit. Journal of Machine Learning Research 10, pp. 1755-1758, 2009

@Article{dlib09,
  author = {Davis E. King},
  title = {Dlib-ml: A Machine Learning Toolkit},
  journal = {Journal of Machine Learning Research},
  year = {2009},
  volume = {10},
  pages = {1755-1758},
}

License

MITIE is licensed under the Boost Software License - Version 1.0 - August 17th, 2003.

Permission is hereby granted, free of charge, to any person or organization obtaining a copy of the software and accompanying documentation covered by this license (the "Software") to use, reproduce, display, distribute, execute, and transmit the Software, and to prepare derivative works of the Software, and to permit third-parties to whom the Software is furnished to do so, all subject to the following:

The copyright notices in the Software and this entire statement, including the above license grant, this restriction and the following disclaimer, must be included in all copies of the Software, in whole or in part, and all derivative works of the Software, unless such copies or derivative works are solely in the form of machine-executable object code generated by a source language processor.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

References

[1] Davis E. King. Dlib-ml: A Machine Learning Toolkit. Journal of Machine Learning Research 10, pp. 1755-1758, 2009.

[2] Paramveer Dhillon, Dean Foster and Lyle Ungar, Eigenwords: Spectral Word Embeddings, Journal of Machine Learning Research (JMLR), 16, 2015.

[3] T. Joachims, T. Finley, Chun-Nam Yu, Cutting-Plane Training of Structural SVMs, Machine Learning, 77(1):27-59, 2009.

Comments
  • MITIE install fails on Windows 10

    Hi,

    I'm trying to install the MITIE backend on Windows 10 using pip install git+https://github.com/mit-nlp/MITIE.git. However, the install fails at the build step with the following error message:

    Collecting git+https://github.com/mit-nlp/MITIE.git
      Cloning https://github.com/mit-nlp/MITIE.git to c:\users\njones\appdata\local\temp\pip-97mlgx-build
    Installing collected packages: mitie
      Running setup.py install for mitie: started
      Running setup.py install for mitie: finished with status 'error'
      Complete output from command c:\python27\python.exe -u -c "import setuptools, tokenize;__file__='c:\users\njones\appdata\local\temp\pip-97mlgx-build\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record c:\users\njones\appdata\local\temp\pip-ydwrn7-record\install-record.txt --single-version-externally-managed --compile:
      running install
      running build
      error: [Error 2] The system cannot find the file specified

    Command "c:\python27\python.exe -u -c "import setuptools, tokenize;__file__='c:\users\njones\appdata\local\temp\pip-97mlgx-build\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record c:\users\njones\appdata\local\temp\pip-ydwrn7-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in c:\users\njones\appdata\local\temp\pip-97mlgx-build

    Any suggestions on what I could be doing wrong would be really helpful!

    Thanks

    opened by maxiz 40
  • Training NER on a new corpus

    Is there a memory leak? Training takes a lot of memory for very few training samples. The process gets killed after printing this:

    num feats in chunker model: 4095
    train: precision, recall, f1-score: 0.984615 0.984615 0.984615
    now do training
    num training samples: 198

    I watched the memory usage and saw that it kept increasing gradually once training reached this point, as if some memory were being filled with garbage on each iteration.

    opened by KanwalSingh 26
  • Add text classification support

    Hi Davis, as discussed, I simply use the average of the feature vectors of the words as the feature vector for the entire document. Overall, a few new files support this feature, and no existing line of code was deleted. Therefore, existing features should not be affected, and future improvements can build on these new files.

    Regarding performance in its current form, I tested it on a small internal mail classification dataset, and the result is surprisingly good (e.g., more than 90% F1-score).

    I would really appreciate it if you could spare some time to review the code and give me your comments. Thanks.

    opened by jinyichao 25
  • Errors Building MITIE for Java

    /Users/davidlaxer/MITIE/dlib/dlib/gui_widgets/nativefont.h:29:10: fatal error: 'X11/Xlocale.h' file not found
    #include <X11/Xlocale.h>
             ^
    1 error generated.

    OS X 10.10.4.

    David-Laxers-MacBook-Pro:MITIE davidlaxer$ xcode-select --install
    xcode-select: error: command line tools are already installed, use "Software Update" to install updates

    David-Laxers-MacBook-Pro:java davidlaxer$ pwd
    /Users/davidlaxer/MITIE/mitielib/java

    David-Laxers-MacBook-Pro:java davidlaxer$ java -version
    java version "1.8.0_05"
    Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)
    David-Laxers-MacBook-Pro:java davidlaxer$

    -- The C compiler identification is AppleClang 6.1.0.6020053
    -- The CXX compiler identification is AppleClang 6.1.0.6020053

    David-Laxers-MacBook-Pro:java davidlaxer$ mkdir build
    David-Laxers-MacBook-Pro:java davidlaxer$ cmake ..
    -- The C compiler identification is AppleClang 6.1.0.6020053
    -- The CXX compiler identification is AppleClang 6.1.0.6020053
    -- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc
    -- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
    -- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Looking for png_create_read_struct
    -- Looking for png_create_read_struct - found
    -- Looking for jpeg_read_header
    -- Looking for jpeg_read_header - found
    -- Searching for BLAS and LAPACK
    -- Looking for sys/types.h
    -- Looking for sys/types.h - found
    -- Looking for stdint.h
    -- Looking for stdint.h - found
    -- Looking for stddef.h
    -- Looking for stddef.h - found
    -- Check size of void*
    -- Check size of void* - done
    -- Found OpenBLAS library
    -- Looking for sgetrf_single
    -- Looking for sgetrf_single - not found
    -- Found LAPACK library
    -- Looking for cblas_ddot
    -- Looking for cblas_ddot - found
    -- Check for STD namespace
    -- Check for STD namespace - found
    -- Looking for C++ include iostream
    -- Looking for C++ include iostream - found
    -- Configuring done
    CMake Warning (dev):
      Policy CMP0042 is not set: MACOSX_RPATH is enabled by default. Run "cmake --help-policy CMP0042" for policy details. Use the cmake_policy command to set the policy and suppress this warning.

      MACOSX_RPATH is not specified for the following targets:

       mitie

    This warning is for project developers. Use -Wno-dev to suppress it.

    -- Generating done
    -- Build files have been written to: /Users/davidlaxer/MITIE/mitielib/java
    David-Laxers-MacBook-Pro:java davidlaxer$ cmake --build . --config Release --target install
    Scanning dependencies of target dlib
    [ 0%] Building CXX object dlib_build/CMakeFiles/dlib.dir/base64/base64_kernel_1.o
    [ 1%] Building CXX object dlib_build/CMakeFiles/dlib.dir/bigint/bigint_kernel_1.o
    [ 2%] Building CXX object dlib_build/CMakeFiles/dlib.dir/bigint/bigint_kernel_2.o
    [ 3%] Building CXX object dlib_build/CMakeFiles/dlib.dir/bit_stream/bit_stream_kernel_1.o
    [ 4%] Building CXX object dlib_build/CMakeFiles/dlib.dir/entropy_decoder/entropy_decoder_kernel_1.o
    [ 5%] Building CXX object dlib_build/CMakeFiles/dlib.dir/entropy_decoder/entropy_decoder_kernel_2.o
    [ 6%] Building CXX object dlib_build/CMakeFiles/dlib.dir/entropy_encoder/entropy_encoder_kernel_1.o
    [ 7%] Building CXX object dlib_build/CMakeFiles/dlib.dir/entropy_encoder/entropy_encoder_kernel_2.o
    [ 8%] Building CXX object dlib_build/CMakeFiles/dlib.dir/md5/md5_kernel_1.o
    [ 9%] Building CXX object dlib_build/CMakeFiles/dlib.dir/tokenizer/tokenizer_kernel_1.o
    [ 10%] Building CXX object dlib_build/CMakeFiles/dlib.dir/unicode/unicode.o
    [ 11%] Building CXX object dlib_build/CMakeFiles/dlib.dir/data_io/image_dataset_metadata.o
    [ 12%] Building CXX object dlib_build/CMakeFiles/dlib.dir/sockets/sockets_kernel_1.o
    [ 13%] Building CXX object dlib_build/CMakeFiles/dlib.dir/bsp/bsp.o
    [ 14%] Building CXX object dlib_build/CMakeFiles/dlib.dir/dir_nav/dir_nav_kernel_1.o
    [ 15%] Building CXX object dlib_build/CMakeFiles/dlib.dir/dir_nav/dir_nav_kernel_2.o
    [ 16%] Building CXX object dlib_build/CMakeFiles/dlib.dir/dir_nav/dir_nav_extensions.o
    [ 17%] Building CXX object dlib_build/CMakeFiles/dlib.dir/linker/linker_kernel_1.o
    [ 18%] Building CXX object dlib_build/CMakeFiles/dlib.dir/logger/extra_logger_headers.o
    [ 19%] Building CXX object dlib_build/CMakeFiles/dlib.dir/logger/logger_kernel_1.o
    [ 20%] Building CXX object dlib_build/CMakeFiles/dlib.dir/logger/logger_config_file.o
    [ 20%] Building CXX object dlib_build/CMakeFiles/dlib.dir/misc_api/misc_api_kernel_1.o
    [ 21%] Building CXX object dlib_build/CMakeFiles/dlib.dir/misc_api/misc_api_kernel_2.o
    [ 22%] Building CXX object dlib_build/CMakeFiles/dlib.dir/sockets/sockets_extensions.o
    [ 23%] Building CXX object dlib_build/CMakeFiles/dlib.dir/sockets/sockets_kernel_2.o
    [ 24%] Building CXX object dlib_build/CMakeFiles/dlib.dir/sockstreambuf/sockstreambuf.o
    [ 25%] Building CXX object dlib_build/CMakeFiles/dlib.dir/sockstreambuf/sockstreambuf_unbuffered.o
    [ 26%] Building CXX object dlib_build/CMakeFiles/dlib.dir/server/server_kernel.o
    [ 27%] Building CXX object dlib_build/CMakeFiles/dlib.dir/server/server_iostream.o
    [ 28%] Building CXX object dlib_build/CMakeFiles/dlib.dir/server/server_http.o
    [ 29%] Building CXX object dlib_build/CMakeFiles/dlib.dir/threads/multithreaded_object_extension.o
    [ 30%] Building CXX object dlib_build/CMakeFiles/dlib.dir/threads/threaded_object_extension.o
    [ 31%] Building CXX object dlib_build/CMakeFiles/dlib.dir/threads/threads_kernel_1.o
    [ 32%] Building CXX object dlib_build/CMakeFiles/dlib.dir/threads/threads_kernel_2.o
    [ 33%] Building CXX object dlib_build/CMakeFiles/dlib.dir/threads/threads_kernel_shared.o
    [ 34%] Building CXX object dlib_build/CMakeFiles/dlib.dir/threads/thread_pool_extension.o
    [ 35%] Building CXX object dlib_build/CMakeFiles/dlib.dir/timer/timer.o
    [ 36%] Building CXX object dlib_build/CMakeFiles/dlib.dir/stack_trace.o
    [ 37%] Building CXX object dlib_build/CMakeFiles/dlib.dir/gui_widgets/fonts.o
    In file included from /Users/davidlaxer/MITIE/dlib/dlib/gui_widgets/fonts.cpp:14:
    /Users/davidlaxer/MITIE/dlib/dlib/gui_widgets/nativefont.h:29:10: fatal error: 'X11/Xlocale.h' file not found
    #include <X11/Xlocale.h>
             ^
    1 error generated.
    dlib_build/CMakeFiles/dlib.dir/build.make:974: recipe for target 'dlib_build/CMakeFiles/dlib.dir/gui_widgets/fonts.o' failed
    gmake[2]: *** [dlib_build/CMakeFiles/dlib.dir/gui_widgets/fonts.o] Error 1
    CMakeFiles/Makefile2:122: recipe for target 'dlib_build/CMakeFiles/dlib.dir/all' failed
    gmake[1]: *** [dlib_build/CMakeFiles/dlib.dir/all] Error 2
    Makefile:127: recipe for target 'all' failed
    gmake: *** [all] Error 2
    David-Laxers-MacBook-Pro:java davidlaxer$

    opened by dbl001 24
  • UTF-8 problems

    Hi,

    First of all, let me thank you for this great tool. We are using MITIE via Python 2.7. To the best of my knowledge, we have to convert our strings from unicode to plain bytes before passing them to MITIE. When using tokenize_with_offset, this can lead to an offset being detected in the middle of a Unicode character that spans multiple bytes, which results in "UnicodeDecodeError: 'utf8' codec can't decode bytes in position 4-5: unexpected end of data" when we attempt to decode.

    Any ideas?

    Many thanks, Jakub

    opened by jaksmid 19
  • Does MITIE support Python 3.6.3?

    Hello,

    I need to use Python 3.6.3 for another python library on Windows, so just wondering if MITIE supports Python 3 and if not, how difficult is it to support it? Thanks!

    opened by lzhao7812 14
  • python API for text categorizer

    Ok so not done yet but wanted to start discussion. Have quite a few editor artefacts looking at the changes here.

    Currently if I try to call the add function on an instance of the text_categorizer_trainer I get an error thrown in the checked_cast method in mitie.cpp (see examples/python/train_text_categorizer.py )

    opened by amn41 14
  • How to make multiple models share the same extractor?

    Hi Davis,

    Thanks as always for your help.

    We always want to reduce memory usage. Since we normally cannot control the extractor, we at least hope that multiple models can share the same extractor.

    With the current C++ implementation, which does not use pointers, there seems to be no way to share the extractor among multiple models. I tried the following code in three cases.

    TotalWordFeatureExtractor totalWordFeatureExtractor = TotalWordFeatureExtractor.getEnglishExtractor();
    NamedEntityExtractor ner = new NamedEntityExtractor(file.getAbsolutePath(), totalWordFeatureExtractor);
    

    The above code consumes around 680 MB JVM memory.

    TotalWordFeatureExtractor totalWordFeatureExtractor = TotalWordFeatureExtractor.getEnglishExtractor();
    NamedEntityExtractor ner = new NamedEntityExtractor(file.getAbsolutePath(), totalWordFeatureExtractor);
    NamedEntityExtractor ner2 = new NamedEntityExtractor(file.getAbsolutePath(), totalWordFeatureExtractor);
    

    The above code consumes around 975 MB JVM memory, as shown in the following screenshot.

    [screenshot: screen shot 2016-01-09 at 7 47 15 pm]

    TotalWordFeatureExtractor totalWordFeatureExtractor = TotalWordFeatureExtractor.getEnglishExtractor();
    NamedEntityExtractor ner = new NamedEntityExtractor(file.getAbsolutePath(), totalWordFeatureExtractor);
    NamedEntityExtractor ner2 = new NamedEntityExtractor(file.getAbsolutePath(), totalWordFeatureExtractor);
    NamedEntityExtractor ner3 = new NamedEntityExtractor(file.getAbsolutePath(), totalWordFeatureExtractor);
    

    The above code consumes around 1.26 GB JVM memory.

    For the detailed code, please refer to the following link. https://github.com/wihoho/MITIE/blob/master/mitielib/java/maven/src/test/java/edu/mit/ll/mitie/NamedEntityExtractorTest.java#L41

    Obviously, this is not what we want. Ideally, memory usage would still be around 690 MB even with three different models. So I assume that using pointers in the C++ code is the only way to overcome this issue. We would like your opinion on resolving it, because we are not very experienced with C++.

    Thank you.

    opened by wihoho 13
  • Exposing scores from the named entity extractor

    Exposing scores in mitie::named_entity_extractor by adding a predict method, which follows the design patterns of the underlying dlib::multiclass_linear_decision_function.

    opened by arjunmajum 13
  • Is it possible to reduce the size of the model

    Hi, I've come across this library, and found it is really amazing! The accuracy is even better than Stanford NER demo!

    Although I understand it contains a high dimensional space with over 500,000 dimensions, is it possible to reduce the model size?

    opened by jinyichao 12
  • Mitie installation failing on Windows 10

    I tried executing "cmake --build . --config Release --target install", but I get the following error:

    The system cannot find the file specified
    CMake Error: Generator: execution of make failed. Make command was: "nmake" "/NOLOGO" "install"

    Can you please help me?

    opened by ankitarath2011 11
  • How does mitie deal with the segmentation of OOV

    Expected Behavior

    Hi, I want to know how MITIE deals with the segmentation of OOV (out-of-vocabulary) words. Two of my training examples look like this:

    1. The daily life of the [League Of Legends](name) on November 10 (Chinese: [英雄联盟](name)11.10的日活)
    2. The daily life of the [Tomb Raider3](name) on November 10 (Chinese: [古墓丽影3](name)11.10的日活)

    My training samples are in Chinese and contain many entities that are game names. Some game names contain numbers and some do not, like "古墓丽影3" and "英雄联盟". In the examples above, I want MITIE to identify the entities as "古墓丽影3" and "英雄联盟". 11.10 is a short representation of the date, which should not be included.

    Current Behavior

    I labeled the entities correctly. However, the first sample is often identified as "英雄联盟11" rather than "英雄联盟". How can I deal with this problem? I tried adding more data, but it doesn't work. Should I add even more data?

    • Version: 0.7.0
    • Where did you get MITIE: pip install
    • Platform: windows64 and linux64
    opened by rookiebird 1
  • extract_entities returns score of 0

    Expected Behavior

    So, I'm using the named_entity_extractor, trained it on some data, then extracting entities from some data the model has never seen before using the extract_entities. Expecting to get back the extracted entities with their scores ranging 0-1

    Current Behavior

    The entities are extracted correctly, but the score is 0.

    Steps to Reproduce

    Train model, give it new data, get back score == 0

    • Version: downloaded on 19th of July 2019
    • Where did you get MITIE: Github
    • Platform: 64 bit
    opened by sketing 1
  • “std::bad_alloc”: am I using too much memory?

    Behaviour and Step to Reproduce

    When running on a machine with 8GB of RAM and 16GB of swap, the process uses all of the RAM and 10GB of swap, leaving ~5GB of swap free. Why does it raise std::bad_alloc instead of using the rest of the swap?

    • Version: 0.7.0
    • Where did you get MITIE: Clone latest
    • Platform: Ubuntu 16.04
    • Compiler: GNU Make 4.1
    opened by ilham-bintang 0
  • MITIE training fails to generate entity_extractor.dat for more examples

    We are using Windows 10 Pro 64-bit, with RASA .13.0 and MITIE .5.0. The machine has 16GB of RAM and a 2.30GHz CPU; the page file shows 25677MB used, 938 available.

    I'm running MITIE with 180 examples and 24 threads. It took 4.5 hours and then threw an exception. (I have uploaded the exception message I get: mitie_issue.)

    mitie.py, line 271, in save_disk
    if(_f.mitie_save_named_entity_extractor_pure_model(filename, self._obj)!=0):
    OSError: exception: access violation reading 0x00000000000..0000030

    Also, my model_20180918-150254 contains only training_data.json; other files such as entity_extractor.dat, intent_classifier.dat, metadata.json, and regex_featurizer.json are not generated.

    But when I test the same setup with only 2-3 examples, it all works.

    opened by vicky88377 3
  • AttributeError: function 'mitie_extract_entities_with_extractor' not found

    Hi, when I try to run ner.py, I get the following error:

    File "C:\Users\xxxx\rasa_nlu\MITIE\examples\python\ner.py", line 15, in <module>
      from mitie import *
    File "C:\Users\xxxx\rasa_nlu\MITIE\examples\python\mitielib\mitie.py", line 61, in <module>
      _f.mitie_extract_entities_with_extractor.restype = ctypes.c_void_p
    File "C:\Program Files\Python37\lib\ctypes\__init__.py", line 369, in __getattr__
      func = self.__getitem__(name)
    File "C:\Program Files\Python37\lib\ctypes\__init__.py", line 374, in __getitem__
      func = self._FuncPtr((name_or_ordinal, self))
    AttributeError: function 'mitie_extract_entities_with_extractor' not found

    I see references to mitie_extract_entities_with_extractor in mitie.cpp and mitie.h which are in C:\Users\xxxx\rasa_nlu\MITIE\mitielib\src and C:\Users\xxxx\rasa_nlu\MITIE\mitielib\include

    Why is it not able to get to the function call?

    opened by uchandr 2