A flexible, high-performance serving system for machine learning models

Overview

XGBoost Serving

This is a fork of TensorFlow Serving, extended with support for the XGBoost, alphaFM and alphaFM_softmax frameworks. For more information about TensorFlow Serving, switch to the master branch or visit the TensorFlow Serving website.


XGBoost Serving is a flexible, high-performance serving system for XGBoost && FM models, designed for production environments. It deals with the inference aspect of XGBoost && FM models, taking models after training and managing their lifetimes, providing clients with versioned access via a high-performance, reference-counted lookup table. XGBoost Serving derives from TensorFlow Serving and is used widely inside iQIYI.
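The "versioned access via a high-performance, reference-counted lookup table" mentioned above can be illustrated with a small sketch. This is plain Python for illustration only, not XGBoost Serving's actual C++ internals (which are inherited from TensorFlow Serving's servable manager); all names here are hypothetical:

```python
import threading

class ModelRegistry:
    """Toy sketch of versioned model access with reference counting.

    A request pins ("acquires") a version for its lifetime; a version
    is only unloaded once no in-flight request still references it.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._versions = {}   # version -> loaded model object
        self._refcounts = {}  # version -> number of in-flight requests

    def publish(self, version, model):
        with self._lock:
            self._versions[version] = model
            self._refcounts.setdefault(version, 0)

    def acquire(self, version=None):
        """Pin a version (newest by default) for one request."""
        with self._lock:
            if version is None:
                version = max(self._versions)
            self._refcounts[version] += 1
            return version, self._versions[version]

    def release(self, version):
        with self._lock:
            self._refcounts[version] -= 1

    def retire(self, version):
        """Unload a version only once no request still holds it."""
        with self._lock:
            if self._refcounts.get(version, 0) == 0:
                self._versions.pop(version, None)
                self._refcounts.pop(version, None)
                return True
            return False  # still referenced; try again later
```

Publishing a new version and retiring the old one does not interrupt requests that already hold a reference, which is what allows deployments without client changes.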

A few of its features:

  • Can serve multiple models, or multiple versions of the same model simultaneously
  • Exposes gRPC inference endpoints
  • Allows deployment of new model versions without changing any client code
  • Supports canarying new versions and A/B testing experimental models
  • Adds minimal latency to inference time due to efficient, low-overhead implementation
  • Supports XGBoost servables, XGBoost && FM servables and XGBoost && alphaFM_softmax servables
  • Supports computation latency distribution statistics
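Canarying and A/B testing, as listed above, amount to deterministically routing a fraction of traffic to an experimental model version. The following is a minimal client-side sketch of that idea in plain Python; it is not part of XGBoost Serving's API, and all names are hypothetical:

```python
import hashlib

def pick_version(user_id: str, canary_fraction: float = 0.05,
                 stable: str = "v1", canary: str = "v2") -> str:
    """Route a fixed fraction of users to the canary model version.

    Hashing the user id (rather than sampling randomly) keeps each
    user pinned to the same version across requests, which makes
    A/B comparisons consistent.
    """
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # in [0, 1)
    return canary if bucket < canary_fraction else stable
```

In practice the routing decision would select which served model version the client queries; the version-selection mechanics on the server side come from TensorFlow Serving's version policies.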

Documentation

Set up

The easiest and most straightforward way of building and using XGBoost Serving is with Docker images. We highly recommend this route unless you have specific needs that are not addressed by running in a container.
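A typical Docker invocation looks like the following. This is a hedged sketch: the image name, port, and flag values are assumptions based on TensorFlow Serving conventions; check the project's Docker documentation for the exact image and options.

```shell
# Pull the serving image (image name assumed) and start a gRPC
# endpoint on port 8500, mounting a locally exported model
# directory into the container.
docker pull iqiyi/xgboost-serving:latest
docker run -d -p 8500:8500 \
  -v /path/to/exported/models:/models/my_model \
  iqiyi/xgboost-serving:latest \
  --model_name=my_model --model_base_path=/models/my_model
```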

Use

Export your XGBoost && FM model

In order to serve an XGBoost && FM model, simply export your XGBoost model, leaf mapping and FM model.

Please refer to Export XGBoost && FM model for details about the model's specification and how to export an XGBoost && FM model.

Configure and Use XGBoost Serving
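Since XGBoost Serving inherits TensorFlow Serving's configuration machinery, multi-model setups are typically described with a model config file. The fragment below is a sketch in TensorFlow Serving's protobuf text format; the `model_platform` value, model name, and path are assumptions, so consult the project's configuration documentation for the exact schema.

```
model_config_list {
  config {
    name: "xgboost_fm_model"            # model name assumed
    base_path: "/models/xgboost_fm_model"
    model_platform: "xgboost"           # platform string assumed
  }
}
```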

Extend

XGBoost Serving derives from TensorFlow Serving and benefits from TensorFlow Serving's highly modular architecture. You can use some parts individually and/or extend it to serve new use cases.

Contribute

If you'd like to contribute to XGBoost Serving, be sure to review the contribution guidelines.

Feedback and Getting involved

  • Report bugs, ask questions, or make suggestions via GitHub Issues