Huggingface Transformers Tokenizer in C++

Overview

A tokenizer is in charge of preparing the inputs for a model.

The tokenizer handles mixed Chinese-English text on Linux.

This project mainly addresses Chinese character encoding problems.
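To illustrate the kind of encoding issue involved, here is a minimal sketch of splitting UTF-8 text that mixes Chinese and English. It is not this project's implementation: the function names (`utf8_length`, `decode`, `is_cjk`, `split_mixed_text`) are hypothetical, the input is assumed to be valid UTF-8, and only two CJK ideograph ranges are checked. Each Chinese character becomes its own token, while runs of other characters are split on ASCII whitespace.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Byte length of the UTF-8 sequence that starts with lead byte `c`.
// Invalid lead bytes are treated as one-byte sequences (a simplification).
static std::size_t utf8_length(unsigned char c) {
    if (c < 0x80) return 1;
    if ((c >> 5) == 0x06) return 2;
    if ((c >> 4) == 0x0E) return 3;
    if ((c >> 3) == 0x1E) return 4;
    return 1;
}

// Decode the code point stored in text[pos, pos + len).
// Assumes valid UTF-8; continuation bytes are not validated.
static char32_t decode(const std::string& text, std::size_t pos, std::size_t len) {
    unsigned char lead = static_cast<unsigned char>(text[pos]);
    if (len == 1) return lead;
    char32_t cp = lead & (0xFF >> (len + 1));
    for (std::size_t k = 1; k < len; ++k) {
        cp = (cp << 6) | (static_cast<unsigned char>(text[pos + k]) & 0x3F);
    }
    return cp;
}

// Assumption: only the main CJK Unified Ideographs block and Extension A count.
static bool is_cjk(char32_t cp) {
    return (cp >= 0x4E00 && cp <= 0x9FFF) || (cp >= 0x3400 && cp <= 0x4DBF);
}

// Split on ASCII whitespace and emit every CJK ideograph as a separate token.
std::vector<std::string> split_mixed_text(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    auto flush = [&]() {
        if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    };
    for (std::size_t pos = 0; pos < text.size();) {
        std::size_t len = std::min(
            utf8_length(static_cast<unsigned char>(text[pos])), text.size() - pos);
        char32_t cp = decode(text, pos, len);
        if (cp == ' ' || cp == '\t' || cp == '\n' || cp == '\r') {
            flush();                                  // whitespace ends a token
        } else if (is_cjk(cp)) {
            flush();                                  // close any pending run
            tokens.push_back(text.substr(pos, len));  // one ideograph, one token
        } else {
            current.append(text, pos, len);           // extend the current run
        }
        pos += len;
    }
    flush();
    return tokens;
}
```

Calling `split_mixed_text` on a UTF-8 string containing `hello`, a space, two Chinese characters, a space, and `world` yields four tokens: `hello`, each Chinese character on its own, and `world`. Working on whole code points rather than raw bytes is what keeps a multi-byte Chinese character from being cut in the middle.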

Requirements

  • Boost

  • C++ Unicode support

Owner
Eric