Yggdrasil Decision Forests (YDF) is a collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models.

Overview

YDF is developed in C++ and is available in C++, as a CLI (command-line interface, i.e. shell commands), and in TensorFlow under the name TensorFlow Decision Forests (TF-DF).

Developing models in TF-DF and productionizing them (possibly including re-training) in C++ with YDF combines flexible, fast development with efficient, safe serving.

Usage example

Train, evaluate and benchmark the speed of a model in a few shell lines with the CLI:

# Training configuration
echo 'label:"my_label" learner:"RANDOM_FOREST" ' > config.pbtxt
# Scan the dataset
infer_dataspec --dataset="csv:train.csv" --output="spec.pbtxt"
# Train a model
train --dataset="csv:train.csv" --dataspec="spec.pbtxt" --config="config.pbtxt" --output="my_model"
# Evaluate the model
evaluate --dataset="csv:test.csv" --model="my_model" > evaluation.txt
# Benchmark the speed of the model
benchmark_inference --dataset="csv:test.csv" --model="my_model" > benchmark.txt

(see the examples/beginner.sh for more details)

or use the C++ interface:

auto dataset_path = "csv:/train@10";
// Training configuration
TrainingConfig train_config;
train_config.set_learner("RANDOM_FOREST");
train_config.set_task(Task::CLASSIFICATION);
train_config.set_label("my_label");
// Scan the dataset
DataSpecification spec;
CreateDataSpec(dataset_path, false, {}, &spec);
// Train a model
std::unique_ptr<AbstractLearner> learner;
GetLearner(train_config, &learner);
auto model = learner->Train(dataset_path, spec);
// Export the model
SaveModel("my_model", model.get());

(see the examples/beginner.cc for more details)

or use the Keras/Python interface of TensorFlow Decision Forests:

import tensorflow_decision_forests as tfdf
import pandas as pd
# Load the dataset in a Pandas dataframe.
train_df = pd.read_csv("project/train.csv")
# Convert the dataset into a TensorFlow dataset.
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="my_label")
# Train the model
model = tfdf.keras.RandomForestModel()
model.fit(train_ds)
# Export a SavedModel.
model.save("project/model")

(see TensorFlow Decision Forests for more details)

Documentation & Resources

The following resources are available:

Installation from pre-compiled binaries

Download one of the build releases, and then run examples/beginner.{sh,bat}.

Installation from Source

Install Bazel and run:

git clone https://github.com/google/yggdrasil-decision-forests.git
cd yggdrasil-decision-forests
bazel build //yggdrasil_decision_forests/cli:all --config=linux_cpp17 --config=linux_avx2

# Then, run the example:
examples/beginner.sh

See the installation page for more details, troubleshooting and alternative installation solutions.

Long-term support commitments

Inference and serving

  • The serving code is isolated from the rest of the framework (i.e., training, evaluation) and has minimal dependencies.
  • Changes to serving-related code are guaranteed to be backward compatible.
  • Model inference is deterministic: the same example is guaranteed to yield the same prediction.
  • Learners and models are extensively tested, including integration testing on real datasets. No execution path in the serving code crashes as a result of an error; instead, in case of failure (e.g., a malformed input example), the inference code returns a util::Status.

Training

  • The semantics of hyper-parameters are never modified.
  • The default value of hyper-parameters is never modified.
  • The default value of a newly-introduced hyper-parameter is set in such a way that the hyper-parameter is effectively disabled.

Quality Assurance

The following mechanisms will be put in place to ensure the quality of the library:

  • Peer-reviewing.
  • Unit testing.
  • Training benchmarks with ranges of acceptable evaluation metrics.
  • Sanitizers.

Contributing

Contributions to TensorFlow Decision Forests and Yggdrasil Decision Forests are welcome. If you want to contribute, make sure to review the user manual, developer manual and contribution guidelines.

Credits

TensorFlow Decision Forests was developed by:

  • Mathieu Guillame-Bert (gbm AT google DOT com)
  • Jan Pfeifer (janpf AT google DOT com)
  • Sebastian Bruch (sebastian AT bruch DOT io)
  • Arvind Srinivasan (arvnd AT google DOT com)

License

Apache License 2.0

Issues
  • release-tag pre-compiled binaries for armv7

    Could we get some pre-compiled binaries for armv7? I'm currently trying to compile on a Raspberry Pi 3B and am stuck on:

    Compiling org_tensorflow/tensorflow/core/framework/node_def_util.cc; 12299s local
    Compiling org_tensorflow/tensorflow/core/util/batch_util.cc; 12280s local
    [Sched] Compiling org_tensorflow/tensorflow/core/util/tensor_slice_set.cc; 12299s
    [Sched] Compiling org_tensorflow/tensorflow/core/util/matmul_autotune.cc; 12280s
    

    Note that I used the config=use_tensorflow_io flag, which I think installs the TensorFlow IO libraries; however, it makes compilation much longer. I had to do it this way because of an error I received with the default settings. I also had to take out the

    # Instruction set optimizations
    build:linux_avx2 --copt=-mavx2
    

    because the build would fail with an error saying that -mavx2 is an unknown command.

    Also, an unrelated question: is there any way we could get hypertuner support for TensorFlow Decision Forests? https://www.tensorflow.org/tutorials/keras/keras_tuner#instantiate_the_tuner_and_perform_hypertuning

    Just as sklearn has GridSearchCV and Keras neural networks have the hypertuner, it would be nice to have this kind of optimization for decision forests.

    Best Regards,

    enhancement 
    opened by jdubz93 5
  • Not able to use yggdrasil-decision-forests as a dependency through Bazel

    I am trying to use yggdrasil-decision-forests as a C++ dependency in another project that uses Bazel. I am following the suggestions in the documentation, mainly:

    cc_library(
        name = "models",
        srcs = ["models.cpp"],
        hdrs = ["models.h"],
        deps = [
            "@ydf//yggdrasil_decision_forests/model:all_models",
            "@ydf//yggdrasil_decision_forests/learners:all_learners",
        ]
    )
    

    In the BUILD file and:

    http_archive(
        name = "ydf",
        strip_prefix = "yggdrasil_decision_forests-master",
        urls = ["https://github.com/google/yggdrasil_decision_forests/archive/master.zip"],
    )
    
    load("@ydf//yggdrasil_decision_forests:library.bzl", ydf_load_deps = "load_dependencies")
    ydf_load_deps(repo_name = "@ydf")
    

    In the WORKSPACE file.

    Nonetheless, I get the following error when running bazel build models:

    ERROR: /home/jfilipe/Repos/vvc-early-term-models/deploy/WORKSPACE:9:1: name 'http_archive' is not defined
    ERROR: error loading package '': Encountered error while reading extension file 'yggdrasil_decision_forests/library.bzl': no such package '@ydf//yggdrasil_decision_forests': error loading package 'external': Could not load //external package
    
    opened by JNSFilipe 5
  • Follow Up Question

    This is a follow-up question to the last issue.

    Hi again. We looked further into this matter by comparing the implemented algorithms, and we think we found a difference. At this point we are unsure whether this difference is significant, but we wanted to share it in case the information is useful.

    While comparing, we looked at the trees that the algorithms produced. We noticed that the trees in Yggdrasil were getting much larger than in XGBoost. To find out why, we compared how the two algorithms grow their trees and how they decide whether a split happens. We think we found the reason why the trees grow larger.

    To get a better view on the internals we used this small dataset:

    Index | Feature_a | Feature_b | Label
    --- | --- | --- | ---
    0 | 0 | 1 | 0
    1 | 0 | 0 | 0
    2 | 1 | 1 | 1
    3 | 0 | 1 | 0

    Splits

    Xgboost split

    Split 1

    Best split candidate:

    • Feature: Feature_a
    • Root Gain: 1
    • Index Left: 2
    • Index Right: 0, 1, 3
    • Score Left: 1
    • Score Right: 3
    • Split happening: Yes

    Split 2

    Index | Feature_a | Feature_b | Label
    --- | --- | --- | ---
    0 | 0 | 1 | 0
    1 | 0 | 0 | 0
    2 | 0 | 1 | 0

    Best split candidate:

    • Feature: Feature_b
    • Root Gain: 3
    • Index Left: 0, 2
    • Index Right: 1
    • Score Left: 2
    • Score Right: 1
    • Split happening: No

    Yggdrasil split

    Split 1

    Best split candidate:

    • Feature: Feature_a
    • Root Gain: 0
    • Index Left: 2
    • Index Right: 0, 1, 3
    • Score Left: 1
    • Score Right: 3
    • Split happening: Yes

    Split 2

    Index | Feature_a | Feature_b | Label
    --- | --- | --- | ---
    0 | 0 | 1 | 0
    1 | 0 | 0 | 0
    2 | 0 | 1 | 0

    Best split candidate:

    • Feature: Feature_b
    • Root Gain: 0
    • Index Left: 0, 2
    • Index Right: 1
    • Score Left: 0.666667
    • Score Right: 0.33333
    • Split happening: Yes

    Explanation

    Unlike XGBoost, Yggdrasil does not seem to carry the gain of the previous split through to the next one. It seems to be reset to 0 before each new split is evaluated. We think this is why we found Yggdrasil trees to grow larger than XGBoost trees: Yggdrasil sets the score to 0 on every iteration, while XGBoost looks for splits that score better than the parent node. The NodeCondition is initialized every time the split function is called, and a new NodeCondition defaults to 0. In this small example this is not an issue, but it may or may not be one for more complex problems.
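The difference described above can be made concrete on the toy dataset. The following is a minimal sketch, not code from either library: it assumes a logistic loss with an initial prediction of 0.5, L2 regularization lambda = 1, no L1 term, and the gain formula from the XGBoost paper. The helper names (`grad_hess`, `leaf_score`, `split_gain`) are hypothetical. It compares a gain measured relative to the parent node with a score that starts from 0.

```python
# Illustrative sketch only; helper names are hypothetical and this is not
# the XGBoost or YDF implementation.

def grad_hess(labels, p=0.5):
    """Sum of gradients and Hessians of the logistic loss at prediction p."""
    g_sum = sum(p - y for y in labels)
    h_sum = len(labels) * p * (1.0 - p)
    return g_sum, h_sum

def leaf_score(g_sum, h_sum, lam=1.0):
    """Structure score G^2 / (H + lambda) of a single node."""
    return g_sum * g_sum / (h_sum + lam)

def split_gain(left_labels, right_labels, lam=1.0, gamma=0.0):
    """Gain relative to the parent node (formula from the XGBoost paper)."""
    g_l, h_l = grad_hess(left_labels)
    g_r, h_r = grad_hess(right_labels)
    parent = leaf_score(g_l + g_r, h_l + h_r, lam)
    children = leaf_score(g_l, h_l, lam) + leaf_score(g_r, h_r, lam)
    return 0.5 * (children - parent) - gamma

# Split 1 on Feature_a: example 2 (label 1) vs. examples 0, 1, 3 (label 0).
gain_1 = split_gain([1], [0, 0, 0])

# Split 2 on Feature_b inside the pure node: examples {0, 2} vs. {1}.
gain_2 = split_gain([0, 0], [0])

# The same second split scored without subtracting the parent score.
children_only_2 = leaf_score(*grad_hess([0, 0])) + leaf_score(*grad_hess([0]))

print(f"split 1 gain vs. parent: {gain_1:.4f}")
print(f"split 2 gain vs. parent: {gain_2:.4f}")
print(f"split 2 score from zero: {children_only_2:.4f}")
```

Under these assumptions the first split has a positive gain, the second has a negative gain once the parent score is subtracted, and the same second split looks beneficial when the baseline is 0. This matches the behavior reported above, but it does not establish what YDF actually computes.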

    Comparison

    We tried to find some difference in performance on different datasets. These datasets were small toy datasets so far.

    Setup

    XGBoost was running on a macOS laptop.

    • 2.6 GHz 6-Core Intel Core i7
    • 16 GB 2667 MHz DDR4
    • AMD Radeon Pro 5300M 4 GB, Intel UHD Graphics 630 1536 MB

    TF-DF was running on Google Colab and Multipass.

    Multipass:

    • Ubuntu 20.04.3 LTS
    • 14.4G Disk
    • 3.8G Memory

    Datasets

    Sklearn - Breast Cancer Dataset

    • Classes: 2
    • Samples per class: 212 (M), 357 (B)
    • Samples total: 569
    • Dimensionality: 30
    • Features: real, positive

    Sklearn - Digit Dataset

    • Classes: 10
    • Samples per class: 180
    • Samples total: 1797
    • Dimensionality: 64
    • Features: integers 0-16

    Both datasets were split with sklearn's train_test_split with test_size=0.2 and random_state=42.

    Comparison

    The datasets were loaded with sklearn, and different configurations of both models were compared. We first tried to make the configurations as similar as possible.

    Parameter

    Yggdrasil

    dt_kwargs_base = {
        'num_trees':1000,
        'growing_strategy':"BEST_FIRST_GLOBAL",
        'max_depth':6,
        'use_hessian_gain':True,
        'sorting_strategy':"IN_NODE",
        'shrinkage':1.,
        'subsample':1.,
        'sampling_method': 'RANDOM',
        'l1_regularization':1.,
        'l2_regularization':1.,
        'l2_categorical_regularization':1.,
        'num_candidate_attributes': -1,
        'num_candidate_attributes_ratio': -1.,
        'min_examples':1,
        'validation_ratio':0.,
        'early_stopping':"NONE",
        'in_split_min_examples_check':False,
        'max_num_nodes': -1,
        'verbose': 0,
    }
    

    XGBoost

    gb_kwargs = {
            'n_estimators': 1000,
            'max_depth': 6,
            'colsample_bytree': 1.,
            'colsample_bynode': 1.,
            'colsample_bylevel': 1.,
            'use_label_encoder': False,
            'reg_lambda': 1.,
            'reg_alpha': 1.,
            'min_child_weight': 0,
            'min_split_loss': 0,
            'max_delta_step': 0,
            'base_score': 0.5,
            'learning_rate': 1.,
            'tree_method': 'exact',
            'booster': 'gbtree',
            'nthread': 1,
            'eval_metric': eval_metric,
            'objective': objective,
            'subsample': 1,
            'verbosity': 0,
            'validate_parameters': False,
            'scale_pos_weight': 1,
            'refresh_leaf': 1,
            #'early_stopping_rounds': 10,
        }
    

    Model/Dataset | Breast Cancer | Digits
    --- | --- | ---
    XGBoost | 0.1072 | 0.1606
    Yggdrasil | 0.0944 | 0.2350

    Conclusion

    So far we are not sure whether the different approaches of the two algorithms make a large difference in model performance. After tweaking the hyperparameters, Yggdrasil performed as well as or even better than XGBoost. So I wanted to reach out and ask whether you can confirm or refute our observation and give further insight.

    Thanks a lot and best Regards

    Timo

    opened by omit-ai 2
  • Compiling on Mac

    Hi all,

    I am currently experimenting with Yggdrasil and trying to compile it on macOS Big Sur. On Linux it runs without problems.

    My setup:

    • Apple clang version 13.0.0 (clang-1300.0.29.3) or gcc/g++ (Homebrew GCC 9.4.0) 9.4.0
    • bazel 4.0.0 (also tried bazelisk)
    • Python 3.9.5
    • numpy Version: 1.21.3

    I tried different flags from the .bazelrc for the macos and linux configs.

    Whatever I do, the compiler fails with the following error:

    yggdrasil_decision_forests/utils/bitmap.cc:198:12: error: out-of-line definition of 'BitWriter' does not match any declaration in 'yggdrasil_decision_forests::utils::bitmap::BitWriter' BitWriter::BitWriter(const uint64_t size, std::string* bitmap)

    In other branches you have different versions of the BitWriter class. The ones not in main add ~BitWriter(); to the public section of the BitWriter class in bitmap.h. I added this but got the same result.

    Can you tell me what I am doing wrong? Is there a way to get it running on Mac?

    Thanks so much for your response.

    Best regards,

    Timo

    Edit 1:

    We seem to have found the error.

    In bitmap.h the declaration is as follows:

    BitWriter(size_t size, std::string* bitmap);

    In bitmap.cc the definition uses a different type:

    BitWriter::BitWriter(const uint64_t size, std::string* bitmap)

    After changing uint64_t size to size_t size in bitmap.cc, compiling on macOS worked.

    opened by omit-ai 2
  • XGBoost implementation

    Hi all,

    I'm trying to reproduce XGBoost using gradient_boosted_trees with use_hessian_gain. After some tweaking of the parameters, while working with different datasets on binary classification problems, I cannot replicate the trees and results of xgboost's XGBClassifier; but as far as I understand the code, it should implement the same algorithm.

    Has anybody tried this, or is this even possible? If yes, can you point me to how I need to configure GradientBoostedTreesModel? If not, is an implementation of XGBoost planned?

    Thanks so much in advance,

    Timo

    opened by omit-ai 2
  • platforms dependency

    Hi, I was following the installation instructions, but the build failed with this error:

     ERROR: .../external/com_google_absl/absl/BUILD.bazel:84:15: no such target '@platforms//cpu:wasm32': target 'wasm32' not declared in package 'cpu' defined by .../external/platforms/cpu/BUILD and referenced by '@com_google_absl//absl:platforms_wasm32'
    ERROR: While resolving configuration keys for @com_google_absl//absl:wasm_3: Analysis failed
    ERROR: While resolving configuration keys for @com_google_absl//absl/synchronization:synchronization: Analysis failed
    ERROR: Analysis of target '//yggdrasil_decision_forests/cli:cli_test' failed; build aborted: Analysis failed
    

    It was resolved by adding this to yggdrasil/yggdrasil-decision-forests/third_party/absl/workspace.bzl:

    http_archive(
            name = "platforms",
            sha256 = "b601beaf841244de5c5a50d2b2eddd34839788000fa1be4260ce6603ca0d8eb7",
            strip_prefix = "platforms-98939346da932eef0b54cf808622f5bb0928f00b",
            urls = ["https://github.com/bazelbuild/platforms/archive/98939346da932eef0b54cf808622f5bb0928f00b.zip"],
        )
    

    Environment: Ubuntu 18.04, bazel 4.0.0. Build command: bazel build //yggdrasil_decision_forests/cli/...:all --config=linux_cpp17 --config=linux_avx2 --repo_env=CC=gcc-9 --copt=-mavx2 --config=use_tensorflow_io --define=no_absl_statusor=1

    Is this because of some dependency changes, or have I done something wrong?

    opened by amir2040 1
  • Is the model prediction probability calibrated?

    I understand that the prediction probabilities of sklearn random forest models are not calibrated, and that extra steps are needed to calibrate them.

    I wanted to understand whether the prediction probabilities of yggdrasil-decision-forests are calibrated.

    If not, how can we calibrate them?

    opened by Vedant-R 1
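Whether a given model is calibrated can be checked empirically before deciding on a calibration step. The sketch below is not part of YDF or TF-DF: it is a plain-Python estimate of the expected calibration error using equal-width probability bins, and the function name, bin count and toy inputs are assumptions chosen for illustration. It expects predicted positive-class probabilities and binary labels from a held-out set.

```python
# Illustrative sketch only; not a YDF or TF-DF API.

def expected_calibration_error(probs, labels, num_bins=5):
    """Weighted mean gap between predicted probability and observed frequency."""
    bins = [[] for _ in range(num_bins)]
    for p, y in zip(probs, labels):
        # Equal-width bins over [0, 1]; p == 1.0 falls into the last bin.
        idx = min(int(p * num_bins), num_bins - 1)
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        positive_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_prob - positive_rate)
    return ece

# Toy inputs (made up for illustration): an over-confident model that
# predicts 0.9 while only half of the examples are positive.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 0, 0])
perfect = expected_calibration_error([0.0, 0.0, 1.0, 1.0], [0, 0, 1, 1])
print(overconfident, perfect)
```

A value close to 0 suggests the probabilities can be used as they are; a large value suggests fitting a post-hoc calibrator on held-out data.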
  • Is hessian gain supported or not? on what loss?

    Yggdrasil is a really nice project with versatile support for GBT. One intriguing point is that Yggdrasil supports use_hessian_gain as a toggle between different training styles.

    I'm currently looking into implementing my own loss function for certain tasks, and questions arise when dealing with the Hessian.

    • Is the Hessian essential for a newly implemented loss function?
    • Does the use_hessian_gain behavior actually default to what is set in the loss, or does it only come from use_hessian_gain=false in the proto?
    • If wanted, how can Hessian training be properly supported?
      • Is Yggdrasil training on gain = G^2/(H+λ)?
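For reference, the expression in the last bullet is the node score from the standard second-order (Newton) derivation used in XGBoost-style boosting. The sketch below comes from the general gradient boosting literature, not from the YDF source, so whether YDF computes exactly this quantity remains the open question of this issue. Here g_i and h_i are the first and second derivatives of the loss with respect to the current prediction, G and H are their sums over the examples in a node, w is the leaf value, and λ and γ are assumed L2 and per-split regularization terms.

```latex
\begin{align}
\mathcal{L}(w) &\approx \sum_{i \in \text{node}} \left( g_i w + \tfrac{1}{2} h_i w^2 \right) + \tfrac{1}{2} \lambda w^2
  = G w + \tfrac{1}{2} (H + \lambda) w^2 \\
w^{*} &= -\frac{G}{H + \lambda},
  \qquad \mathcal{L}(w^{*}) = -\frac{1}{2} \, \frac{G^2}{H + \lambda} \\
\text{gain} &= \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{G^2}{H + \lambda} \right] - \gamma
\end{align}
```

A loss without a usable second derivative cannot supply h_i, which is one reason a Hessian-based gain may be restricted to certain losses.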

    S: use_hessian_gain "Available for all losses except regression."

    https://github.com/google/yggdrasil-decision-forests/blob/be943445871d61726da063c838b213d20a8104a5/documentation/learners.md#L303-L309

    A: hessian available for LogLikelihood, not MSE

    A.1 . Error when use_hessian_gain = true and hessian_col_idx not set

    https://github.com/google/yggdrasil-decision-forests/blob/1700e1908e04d8af9fc40c207b685786f001a541/yggdrasil_decision_forests/learner/gradient_boosted_trees/gradient_boosted_trees.cc#L1231-L1235

    A.2 gradient.hessian_col_idx set iff hessian = true

    https://github.com/google/yggdrasil-decision-forests/blob/1700e1908e04d8af9fc40c207b685786f001a541/yggdrasil_decision_forests/learner/gradient_boosted_trees/gradient_boosted_trees.cc#L2378-L2390

    A.3 has hessian = false for MSE

    https://github.com/google/yggdrasil-decision-forests/blob/15781866817e2c0d9d94ebca3e7a5fea50242678/yggdrasil_decision_forests/learner/gradient_boosted_trees/gradient_boosted_trees_loss.h#L288-L299

    A.4 has hessian = use_hessian_gain for LogLikelihoodLoss

    https://github.com/google/yggdrasil-decision-forests/blob/15781866817e2c0d9d94ebca3e7a5fea50242678/yggdrasil_decision_forests/learner/gradient_boosted_trees/gradient_boosted_trees_loss.h#L226-L238

    B hessian available for Regression, not Classification

    B.1 Error when use_hessian_gain on Task::CLASSIFICATION

    https://github.com/google/yggdrasil-decision-forests/blob/f897874b809520e1d8f8296d995de9fb8ef59671/yggdrasil_decision_forests/learner/decision_tree/training.cc#L1207-L1210

    B.2 No error when use_hessian_gain on Task::REGRESSION

    https://github.com/google/yggdrasil-decision-forests/blob/f897874b809520e1d8f8296d995de9fb8ef59671/yggdrasil_decision_forests/learner/decision_tree/training.cc#L1237-L1240

    C hessian only trainable on Regression task with LogLikelihood Loss

    C.1: G, H, W (sums of gradients, Hessians and weights) available for LogLikelihoodLoss, nothing else

    https://github.com/google/yggdrasil-decision-forests/blob/15781866817e2c0d9d94ebca3e7a5fea50242678/yggdrasil_decision_forests/learner/gradient_boosted_trees/gradient_boosted_trees_loss.cc#L283-L287

    C.2: G, H, W only accessed on Task::REGRESSION, nothing else

    https://github.com/google/yggdrasil-decision-forests/blob/f897874b809520e1d8f8296d995de9fb8ef59671/yggdrasil_decision_forests/learner/decision_tree/training.cc#L1237-L1251

    D: MSE = Regression+Ranking, LogLikelihood = Classification

    D.1: MSE only available for Regression or Ranking, not Classification

    https://github.com/google/yggdrasil-decision-forests/blob/15781866817e2c0d9d94ebca3e7a5fea50242678/yggdrasil_decision_forests/learner/gradient_boosted_trees/gradient_boosted_trees_loss.cc#L369-L375

    D.2 LogLikelihood Loss only available for Classification

    https://github.com/google/yggdrasil-decision-forests/blob/15781866817e2c0d9d94ebca3e7a5fea50242678/yggdrasil_decision_forests/learner/gradient_boosted_trees/gradient_boosted_trees_loss.cc#L150-L154

    Conclusions drawn from S, A, B, C, D:

    S, B => conflict: either use_hessian_gain is documented incorrectly, or the feature is not implemented yet, or there is something really wrong going on here.

    C, D => Hessian not trainable, as the LogLikelihood loss does not make much sense in regression tasks. It is much more likely that C is invalid.

    A, D => E: Hessian available for Classification, not Regression/Ranking.

    B, E => conflict. As B also conflicts in the first conclusion, it is likely that B is the false one.

    opened by Willian-Zhang 1
  • building off external hard drive

    [email protected]:/mnt/sdcard/yggdrasil-decision-forests# bazel --output_user_root=/mnt/sdcard/install build //examples:beginner_cc --config=linux_cpp17 --config=linux_avx2
    
    Extracting Bazel installation...
    FATAL: failed to create installation symlink '/mnt/sdcard/install/32ee77bc3907dda3edc97c30cd47096e/install': (error: 1): Operation not permitted
    [email protected]:/mnt/sdcard/yggdrasil-decision-forests#
    

    Does anyone know whether Bazel can be used on an external hard disk? For example, say I mounted an SD card on my Raspberry Pi. Can I use Bazel to build from the mounted drive? I know Bazel uses Java, which I assume can cause some odd permission issues.

    opened by jdubz93 1
  • Simple Model problem

    Hi,

    I have the following example. How should I rewrite the code so that I can use the model's Predict for the given input? There is no example in the docs...

    #include "yggdrasil_decision_forests/dataset/data_spec.h"
    #include "yggdrasil_decision_forests/dataset/data_spec.pb.h"
    #include "yggdrasil_decision_forests/dataset/data_spec_inference.h"
    #include "yggdrasil_decision_forests/dataset/vertical_dataset_io.h"
    #include "yggdrasil_decision_forests/learner/learner_library.h"
    #include "yggdrasil_decision_forests/metric/metric.h"
    #include "yggdrasil_decision_forests/metric/report.h"
    #include "yggdrasil_decision_forests/model/model_library.h"
    #include "yggdrasil_decision_forests/utils/filesystem.h"
    #include "yggdrasil_decision_forests/utils/logging.h"
    #include "yggdrasil_decision_forests/serving/decision_forest/decision_forest.h"
    #include <chrono>
    #include <memory>
    #include <string>
    #include <vector>
    
    namespace ygg = yggdrasil_decision_forests;
    
    int main(int argc, char** argv) {
      // Enable the logging. Optional in most cases.
      InitLogging(argv[0], &argc, &argv, true);
    
      // Import the model.
      LOG(INFO) << "Import the model";
      const std::string model_path = "/tmp/my_saved_model/1/assets";
      std::unique_ptr<ygg::model::AbstractModel> model;
      QCHECK_OK(ygg::model::LoadModel(model_path, &model));
    
      // Show information about the model.
      // Like :show_model, but without the list of compatible engines.
      std::string model_description;
      model->AppendDescriptionAndStatistics(/*full_definition=*/false,
                                            &model_description);
      LOG(INFO) << "Model:\n" << model_description;
    
      auto start = std::chrono::high_resolution_clock::now();
    
      // Compile the model for fast inference.
      const std::unique_ptr<ygg::serving::FastEngine> serving_engine =
          model->BuildFastEngine().value();
      const auto& features = serving_engine->features();
    
      // Handle to two features.
      const auto age_feature = features.GetNumericalFeatureId("age").value();
      const auto sex_feature =
          features.GetCategoricalFeatureId("sex").value();
    
      // Allocate a batch of 1 examples.
      std::unique_ptr<ygg::serving::AbstractExampleSet> examples =
          serving_engine->AllocateExamples(1);
    
      // Set all the values as missing. This is only necessary if you don't set all
      // the feature values manually e.g. SetNumerical.
      //examples->FillMissing(features);
    
      // Set the value of "age" and "sex" for the first example.
      examples->SetNumerical(/*example_idx=*/0, age_feature, 50.f, features);
      examples->SetCategorical(/*example_idx=*/0, sex_feature, "Male",
                               features);
    
      // Run the predictions on the first (and only) example.
      std::vector<float> batch_of_predictions;
      serving_engine->Predict(*examples, 1, &batch_of_predictions);
    
      auto stop = std::chrono::high_resolution_clock::now();
      auto duration =
          std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
    
      // To get the value of duration use the count()
      // member function on the duration object
      LOG(INFO) << duration.count();
    
      LOG(INFO) << "Predictions:";
      for (const float prediction : batch_of_predictions) {
        LOG(INFO) << "\t" << prediction;
      }
    
      return 0;
    }
    

    Output:

    [INFO beginner4.cc:31] Import the model
    [INFO beginner4.cc:41] Model:
    Type: "RANDOM_FOREST"
    Task: CLASSIFICATION
    Label: "__LABEL"
    
    Input Features (2):
    	age
    	sex
    
    No weights
    
    Variable Importance: MEAN_MIN_DEPTH:
        1. "__LABEL"  8.480250 ################
        2.     "sex"  1.313142 ##
        3.     "age"  0.000000 
    
    Variable Importance: NUM_AS_ROOT:
        1. "age" 300.000000 
    
    Variable Importance: NUM_NODES:
        1. "age" 34255.000000 ################
        2. "sex" 1584.000000 
    
    Variable Importance: SUM_SCORE:
        1. "age" 516361.208020 ################
        2. "sex" 148174.885377 
    
    
    
    Winner take all: true
    Out-of-bag evaluation: accuracy:0.756613 logloss:5.68258
    Number of trees: 300
    Total number of nodes: 71978
    
    Number of nodes by tree:
    Count: 300 Average: 239.927 StdDev: 4.94044
    Min: 223 Max: 251 Ignored: 0
    ----------------------------------------------
    [ 223, 224)  1   0.33%   0.33%
    [ 224, 225)  0   0.00%   0.33%
    [ 225, 227)  1   0.33%   0.67%
    [ 227, 228)  3   1.00%   1.67% #
    [ 228, 230)  3   1.00%   2.67% #
    [ 230, 231)  0   0.00%   2.67%
    [ 231, 233)  9   3.00%   5.67% ##
    [ 233, 234) 17   5.67%  11.33% ###
    [ 234, 236) 33  11.00%  22.33% ######
    [ 236, 237)  0   0.00%  22.33%
    [ 237, 238) 28   9.33%  31.67% #####
    [ 238, 240) 51  17.00%  48.67% ##########
    [ 240, 241)  0   0.00%  48.67%
    [ 241, 243) 46  15.33%  64.00% #########
    [ 243, 244) 47  15.67%  79.67% #########
    [ 244, 246) 34  11.33%  91.00% #######
    [ 246, 247)  0   0.00%  91.00%
    [ 247, 249) 13   4.33%  95.33% ###
    [ 249, 250) 10   3.33%  98.67% ##
    [ 250, 251]  4   1.33% 100.00% #
    
    Depth by leafs:
    Count: 36139 Average: 8.48037 StdDev: 2.34049
    Min: 3 Max: 15 Ignored: 0
    ----------------------------------------------
    [  3,  4)   70   0.19%   0.19%
    [  4,  5)  662   1.83%   2.03% #
    [  5,  6) 3633  10.05%  12.08% ######
    [  6,  7) 3980  11.01%  23.09% #######
    [  7,  8) 4520  12.51%  35.60% ########
    [  8,  9) 5143  14.23%  49.83% #########
    [  9, 10) 5817  16.10%  65.93% ##########
    [ 10, 11) 5152  14.26%  80.18% #########
    [ 11, 12) 3519   9.74%  89.92% ######
    [ 12, 13) 2003   5.54%  95.46% ###
    [ 13, 14)  975   2.70%  98.16% ##
    [ 14, 15)  483   1.34%  99.50% #
    [ 15, 15]  182   0.50% 100.00%
    
    Number of training obs by leaf:
    Count: 36139 Average: 190.182 StdDev: 179.903
    Min: 5 Max: 2370 Ignored: 0
    ----------------------------------------------
    [    5,  123) 14849  41.09%  41.09% ##########
    [  123,  241) 10680  29.55%  70.64% #######
    [  241,  359)  3917  10.84%  81.48% ###
    [  359,  478)  6234  17.25%  98.73% ####
    [  478,  596)   137   0.38%  99.11%
    [  596,  714)     7   0.02%  99.13%
    [  714,  833)    17   0.05%  99.18%
    [  833,  951)    19   0.05%  99.23%
    [  951, 1069)     6   0.02%  99.24%
    [ 1069, 1188)   135   0.37%  99.62%
    [ 1188, 1306)    33   0.09%  99.71%
    [ 1306, 1424)     0   0.00%  99.71%
    [ 1424, 1542)     0   0.00%  99.71%
    [ 1542, 1661)     8   0.02%  99.73%
    [ 1661, 1779)    53   0.15%  99.88%
    [ 1779, 1897)     6   0.02%  99.89%
    [ 1897, 2016)     0   0.00%  99.89%
    [ 2016, 2134)     2   0.01%  99.90%
    [ 2134, 2252)    28   0.08%  99.98%
    [ 2252, 2370]     8   0.02% 100.00%
    
    Attribute in nodes:
    	34255 : age [NUMERICAL]
    	1584 : sex [CATEGORICAL]
    
    Attribute in nodes with depth <= 0:
    	300 : age [NUMERICAL]
    
    Attribute in nodes with depth <= 1:
    	600 : age [NUMERICAL]
    	300 : sex [CATEGORICAL]
    
    Attribute in nodes with depth <= 2:
    	1721 : age [NUMERICAL]
    	379 : sex [CATEGORICAL]
    
    Attribute in nodes with depth <= 3:
    	3328 : age [NUMERICAL]
    	1102 : sex [CATEGORICAL]
    
    Attribute in nodes with depth <= 5:
    	11208 : age [NUMERICAL]
    	1583 : sex [CATEGORICAL]
    
    Condition type in nodes:
    	34255 : HigherCondition
    	1584 : ContainsBitmapCondition
    Condition type in nodes with depth <= 0:
    	300 : HigherCondition
    Condition type in nodes with depth <= 1:
    	600 : HigherCondition
    	300 : ContainsBitmapCondition
    Condition type in nodes with depth <= 2:
    	1721 : HigherCondition
    	379 : ContainsBitmapCondition
    Condition type in nodes with depth <= 3:
    	3328 : HigherCondition
    	1102 : ContainsBitmapCondition
    Condition type in nodes with depth <= 5:
    	11208 : HigherCondition
    	1583 : ContainsBitmapCondition
    Node format: BLOB_SEQUENCE
    
    Training OOB:
    	trees: 1, Out-of-bag evaluation: accuracy:0.750237 logloss:9.00239
    	trees: 13, Out-of-bag evaluation: accuracy:0.754722 logloss:7.09704
    	trees: 23, Out-of-bag evaluation: accuracy:0.753863 logloss:6.3117
    	trees: 33, Out-of-bag evaluation: accuracy:0.75395 logloss:6.19856
    	trees: 43, Out-of-bag evaluation: accuracy:0.754299 logloss:6.0429
    	trees: 53, Out-of-bag evaluation: accuracy:0.753165 logloss:5.9747
    	trees: 63, Out-of-bag evaluation: accuracy:0.754867 logloss:5.96594
    	trees: 73, Out-of-bag evaluation: accuracy:0.75443 logloss:5.92934
    	trees: 83, Out-of-bag evaluation: accuracy:0.754954 logloss:5.92356
    	trees: 93, Out-of-bag evaluation: accuracy:0.756307 logloss:5.89293
    	trees: 103, Out-of-bag evaluation: accuracy:0.756569 logloss:5.89233
    	trees: 113, Out-of-bag evaluation: accuracy:0.755696 logloss:5.81216
    	trees: 123, Out-of-bag evaluation: accuracy:0.755653 logloss:5.81
    	trees: 133, Out-of-bag evaluation: accuracy:0.755347 logloss:5.80588
    	trees: 143, Out-of-bag evaluation: accuracy:0.755914 logloss:5.77847
    	trees: 153, Out-of-bag evaluation: accuracy:0.755783 logloss:5.77828
    	trees: 163, Out-of-bag evaluation: accuracy:0.755522 logloss:5.74301
    	trees: 173, Out-of-bag evaluation: accuracy:0.756264 logloss:5.73794
    	trees: 183, Out-of-bag evaluation: accuracy:0.756176 logloss:5.7384
    	trees: 193, Out-of-bag evaluation: accuracy:0.756831 logloss:5.73852
    	trees: 203, Out-of-bag evaluation: accuracy:0.756613 logloss:5.73565
    	trees: 213, Out-of-bag evaluation: accuracy:0.757268 logloss:5.73545
    	trees: 223, Out-of-bag evaluation: accuracy:0.757486 logloss:5.7286
    	trees: 233, Out-of-bag evaluation: accuracy:0.757093 logloss:5.72556
    	trees: 243, Out-of-bag evaluation: accuracy:0.757093 logloss:5.7087
    	trees: 253, Out-of-bag evaluation: accuracy:0.757224 logloss:5.70704
    	trees: 263, Out-of-bag evaluation: accuracy:0.757006 logloss:5.69758
    	trees: 273, Out-of-bag evaluation: accuracy:0.756831 logloss:5.69816
    	trees: 283, Out-of-bag evaluation: accuracy:0.756526 logloss:5.69813
    	trees: 293, Out-of-bag evaluation: accuracy:0.756526 logloss:5.68217
    	trees: 300, Out-of-bag evaluation: accuracy:0.756613 logloss:5.68258
    
    [INFO decision_forest.cc:639] Model loaded with 300 root(s), 71978 node(s), and 2 input feature(s).
    [INFO abstract_model.cc:1158] Engine "RandomForestOptPred" built
    [INFO beginner4.cc:79] 18
    [INFO beginner4.cc:81] Predictions:
    [INFO beginner4.cc:83] 	0.649999
    

    Let's say I want to read the inference input continuously from stdin. Do I only need to load the model and initialize serving_engine once?

    Every time stdin has new data, do I just need to run the following code to get a prediction? Does anything change if I switch to HTTP inference? Is there anything to consider regarding thread safety: can I share the model and serving_engine across multiple threads?

    Is there anything I should change if I don't need batching? Do I still need to use std::unique_ptr<ygg::serving::AbstractExampleSet> examples = serving_engine->AllocateExamples(1)?

      // Handle to two features.
      const auto age_feature = features.GetNumericalFeatureId("age").value();
      const auto sex_feature =
          features.GetCategoricalFeatureId("sex").value();
    
      // Allocate a batch of one example.
      std::unique_ptr<ygg::serving::AbstractExampleSet> examples =
          serving_engine->AllocateExamples(1);
    
      // Set all the values as missing. This is only necessary if you don't set all
      // the feature values manually e.g. SetNumerical.
      //examples->FillMissing(features);
    
      // Set the value of "age" and "sex" for the first example.
      examples->SetNumerical(/*example_idx=*/0, age_feature, 50.f, features);
      examples->SetCategorical(/*example_idx=*/0, sex_feature, "Male",
                               features);
    
      // Run the prediction on the single example in the batch.
      std::vector<float> batch_of_predictions;
      serving_engine->Predict(*examples, 1, &batch_of_predictions);
    
      auto stop = high_resolution_clock::now();
      auto duration = std::chrono::duration_cast<milliseconds>(stop - start);
    
      // To get the value of duration use the count()
      // member function on the duration object
      LOG(INFO) << duration.count();
    
      LOG(INFO) << "Predictions:";
      for (const float prediction : batch_of_predictions) {
        LOG(INFO) << "\t" << prediction;
      }
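
The load-once, predict-per-request pattern asked about above can be sketched independently of the library. This is a minimal C++17 sketch under stated assumptions, not YDF code: `Request`, `ParseRequest`, `PredictFn` and `ServeLines` are hypothetical names introduced for illustration, and the "age,sex" line format is an assumption. The callback stands in for whatever runs one prediction; the sketch does not answer the thread-safety question.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <optional>
#include <ostream>
#include <sstream>
#include <string>

// One parsed request holding the two features used by the model above.
// (Hypothetical type, not part of YDF.)
struct Request {
  float age = 0.f;
  std::string sex;
};

// Parses a line of the assumed form "50,Male".
// Returns std::nullopt when the line is malformed.
std::optional<Request> ParseRequest(const std::string& line) {
  std::istringstream fields(line);
  Request request;
  char separator = '\0';
  if (!(fields >> request.age >> separator) || separator != ',') {
    return std::nullopt;
  }
  if (!std::getline(fields, request.sex) || request.sex.empty()) {
    return std::nullopt;
  }
  return request;
}

// Hypothetical callback that runs one prediction on already-loaded state.
using PredictFn = std::function<float(const Request&)>;

// Reads requests line by line until EOF and writes one result per line.
// Nothing is loaded inside the loop: the callback is created once by the
// caller. Returns the number of predictions written.
int ServeLines(std::istream& in, std::ostream& out, const PredictFn& predict) {
  assert(static_cast<bool>(predict));
  int served = 0;
  std::string line;
  while (std::getline(in, line)) {
    const std::optional<Request> request = ParseRequest(line);
    if (!request) {
      out << "error: malformed input\n";
      continue;
    }
    out << predict(*request) << '\n';
    ++served;
  }
  return served;
}
```

In a real program the callback would wrap the AllocateExamples, SetNumerical, SetCategorical and Predict calls from the snippet above, with the model and serving_engine created once before calling `ServeLines(std::cin, std::cout, predict)`.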
    

    I also tried to use the TensorFlow C API here, but the result tensor is always 1.0. Any idea why?

    // gcc -I/usr/local/include -L/usr/local/lib main.c -ltensorflow -o main
    #include <stdio.h>
    #include <stdlib.h>  // malloc
    #include <tensorflow/c/c_api.h>
    
    void NoOpDeallocator(void* data, size_t a, void* b) {}
    
    int main() {
      TF_Graph *Graph = TF_NewGraph();
      TF_Status *Status = TF_NewStatus();
      TF_SessionOptions *SessionOpts = TF_NewSessionOptions();
      TF_Buffer *RunOpts = NULL;
      TF_Library *library;
    
      library = TF_LoadLibrary("/usr/local/lib/python3.10/dist-packages/tensorflow_decision_forests/tensorflow/ops/inference/inference.so",
                                  Status);
    
      const char *saved_model_dir = "/tmp/my_saved_model/1/";
      const char *tags = "serve";
      int ntags = 1;
    
      TF_Session *Session = TF_LoadSessionFromSavedModel(
          SessionOpts, RunOpts, saved_model_dir, &tags, ntags, Graph, NULL, Status);
    
      printf("status: %s\n", TF_Message(Status));
    
      if(TF_GetCode(Status) == TF_OK) {
        printf("loaded\n");
      }else{
        printf("not loaded\n");
      }
    
      /* Get Input Tensor */
      int NumInputs = 2;
    
      TF_Output* Input = malloc(sizeof(TF_Output) * NumInputs);
      TF_Output t0 = {TF_GraphOperationByName(Graph, "serving_default_age"), 0};
    
      if(t0.oper == NULL)
        printf("ERROR: Failed TF_GraphOperationByName serving_default_input_1\n");
      else
        printf("TF_GraphOperationByName serving_default_input_1 is OK\n");
    
      TF_Output t1 = {TF_GraphOperationByName(Graph, "serving_default_sex"), 0};
    
      if(t1.oper == NULL)
        printf("ERROR: Failed TF_GraphOperationByName serving_default_input_2\n");
      else
        printf("TF_GraphOperationByName serving_default_input_2 is OK\n");
    
      Input[0] = t0;
      Input[1] = t1;
    
      // Get Output tensor
      int NumOutputs = 1;
      TF_Output* Output = malloc(sizeof(TF_Output) * NumOutputs);
      TF_Output tout = {TF_GraphOperationByName(Graph, "StatefulPartitionedCall_1"), 0};
    
      if(tout.oper == NULL)
          printf("ERROR: Failed TF_GraphOperationByName StatefulPartitionedCall\n");
      else
        printf("TF_GraphOperationByName StatefulPartitionedCall is OK\n");
    
      Output[0] = tout;
    
      /* Allocate data for inputs and outputs */
      TF_Tensor** InputValues  = (TF_Tensor**)malloc(sizeof(TF_Tensor*)*NumInputs);
      TF_Tensor** OutputValues = (TF_Tensor**)malloc(sizeof(TF_Tensor*)*NumOutputs);
    
      int ndims = 1;
      int64_t dims[] = {1};
      int64_t data[] = {50};
    
      int ndata = sizeof(int64_t);
      TF_Tensor* int_tensor0 = TF_NewTensor(TF_INT64, dims, ndims, data, ndata, &NoOpDeallocator, 0);
    
      if (int_tensor0 != NULL)
        printf("TF_NewTensor is OK\n");
      else
        printf("ERROR: Failed TF_NewTensor\n");
    
      const char test_string[] = "Male";
      TF_TString tstr[1];
      TF_TString_Init(&tstr[0]);
      TF_TString_Copy(&tstr[0], test_string, sizeof(test_string)-1);
      TF_Tensor* int_tensor1 = TF_NewTensor(TF_STRING, NULL, 0, &tstr[0], sizeof(tstr), &NoOpDeallocator, 0);
    
      if (int_tensor1 != NULL)
        printf("TF_NewTensor is OK\n");
      else
        printf("ERROR: Failed TF_NewTensor\n");
    
      InputValues[0] = int_tensor0;
      InputValues[1] = int_tensor1;
    
      // Run the Session
      TF_SessionRun(Session,
                    NULL, // Run options.
                    Input, InputValues, NumInputs, // Input tensors name, input tensor values, number of inputs.
                    Output, OutputValues, NumOutputs, // Output tensors name, output tensor values, number of outputs.
                    NULL, 0, // Target operations, number of targets.
                    NULL,
                    Status); // Output status.
    
      if(TF_GetCode(Status) == TF_OK)
        printf("Session is OK\n");
      else
        printf("%s",TF_Message(Status));
    
      // Free memory
      TF_DeleteGraph(Graph);
      TF_DeleteSession(Session, Status);
      TF_DeleteSessionOptions(SessionOpts);
      TF_DeleteStatus(Status);
    
      /* Get Output Result */
      void* buff = TF_TensorData(OutputValues[0]);
      float* offsets = (float*)buff;
      printf("Result Tensor :\n");
      printf("%f\n",offsets[0]);
      return 0;
    }
    

    Output:

    # gcc -I/usr/local/include -L/usr/local/lib main.c -ltensorflow -o main; ./main
    2022-06-16 21:05:24.070748: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/my_saved_model/1/
    2022-06-16 21:05:24.072148: I tensorflow/cc/saved_model/reader.cc:81] Reading meta graph with tags { serve }
    2022-06-16 21:05:24.072208: I tensorflow/cc/saved_model/reader.cc:122] Reading SavedModel debug info (if present) from: /tmp/my_saved_model/1/
    2022-06-16 21:05:24.072280: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
    To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
    2022-06-16 21:05:24.085806: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
    2022-06-16 21:05:24.086985: I tensorflow/cc/saved_model/loader.cc:228] Restoring SavedModel bundle.
    2022-06-16 21:05:24.116570: I tensorflow/cc/saved_model/loader.cc:212] Running initialization op on SavedModel bundle at path: /tmp/my_saved_model/1/
    [INFO kernel.cc:1176] Loading model from path /tmp/my_saved_model/1/assets/ with prefix cf8326335a66430a
    [INFO decision_forest.cc:639] Model loaded with 300 root(s), 71978 node(s), and 2 input feature(s).
    [INFO abstract_model.cc:1246] Engine "RandomForestOptPred" built
    [INFO kernel.cc:1022] Use fast generic engine
    2022-06-16 21:05:24.373058: I tensorflow/cc/saved_model/loader.cc:301] SavedModel load for tags { serve }; Status: success: OK. Took 302321 microseconds.
    status: 
    loaded
    TF_GraphOperationByName serving_default_input_1 is OK
    TF_GraphOperationByName serving_default_input_2 is OK
    TF_GraphOperationByName StatefulPartitionedCall is OK
    TF_NewTensor is OK
    TF_NewTensor is OK
    Session is OK
    Result Tensor :
    1.000000
    
    opened by Arnold1 3
  • Is there a way to add priors to a RF classifier?

    Hi

    Does anyone know how to add priors (a priori class probabilities) for the class labels when creating a Random Forest classifier with Yggdrasil? I haven't found any interface for priors similar to what OpenCV offers here:

    https://docs.opencv.org/3.4/d8/d89/classcv_1_1ml_1_1DTrees.html#a66756433f31db77a5511fc3f85403bd9
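
For readers with the same question, one generic workaround that needs no library support is to re-weight the predicted class probabilities after inference. The sketch below assumes the model outputs calibrated probabilities and that the class frequencies of the training set are known. It is not a YDF interface and it is not equivalent to the training-time priors of OpenCV; `ApplyPriors` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Re-weights predicted class probabilities for a change of class priors:
//   adjusted[c] is proportional to predicted[c] * target_prior[c] / train_prior[c]
// and the result is renormalised so that it sums to 1.
// (Hypothetical helper, applied after inference; not part of YDF.)
std::vector<double> ApplyPriors(const std::vector<double>& predicted,
                                const std::vector<double>& train_prior,
                                const std::vector<double>& target_prior) {
  assert(predicted.size() == train_prior.size());
  assert(predicted.size() == target_prior.size());
  std::vector<double> adjusted(predicted.size());
  double total = 0.0;
  for (std::size_t c = 0; c < predicted.size(); ++c) {
    assert(train_prior[c] > 0.0);
    adjusted[c] = predicted[c] * target_prior[c] / train_prior[c];
    total += adjusted[c];
  }
  // Leave the vector untouched if every adjusted value is zero.
  if (total > 0.0) {
    for (double& p : adjusted) p /= total;
  }
  return adjusted;
}
```

With `train_prior` equal to `target_prior` the function returns the input probabilities unchanged (up to rounding), which is a quick sanity check.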

    Many thanks.

    opened by JoseAF 4
  • Can Yggdrasil run on 8-bit?

    Hi

    I am using a GBT model with the quickscorer algorithm. For training, I create a yggdrasil_decision_forests::dataset::VerticalDataset and use the AppendExample interface for which I have to convert the features to string (very slow!). Then I train an AbstractLearner using the dataset. For prediction, I cast the abstract model to GradientBoostedTreesBinaryClassificationQuickScorerExtended and create an ExampleSet which I then use to predict. So far I'm using the SetNumerical interface, which takes float, for creating the example set.

    Since all my features come as uint8, I was wondering whether it is possible to use the Yggdrasil library (both training and prediction) in 8-bit mode directly, without having to convert either to string or to float? I haven't found interfaces to do this. If they are not available, is there a way to change/template the code to make this possible? Has anyone done this already? Do you envision problems with this?
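
On the prediction side, the uint8-to-float conversion itself loses no information: every integer in [0, 255] is exactly representable as a 32-bit float. The sketch below shows only that conversion step. It does not provide a native 8-bit mode, it makes no YDF calls, and `ToFloatRow` is a hypothetical helper introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts one row of uint8 features into floats, for use with a
// float-based setter such as the SetNumerical call mentioned above.
// The conversion is exact for every possible uint8 value.
// (Hypothetical helper, not part of YDF.)
std::vector<float> ToFloatRow(const std::uint8_t* features, std::size_t count) {
  assert(features != nullptr || count == 0);
  return std::vector<float>(features, features + count);
}

// Variant that reuses a caller-owned buffer to avoid one allocation per row.
void ToFloatRow(const std::uint8_t* features, std::size_t count,
                std::vector<float>* out) {
  assert(out != nullptr);
  assert(features != nullptr || count == 0);
  out->assign(features, features + count);
}
```

The second overload is meant for tight prediction loops, where the same output buffer can be reused across examples.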

    Many thanks.

    opened by JoseAF 7
  • Decision forests prediction question

    Hi,

    Is the generated Yggdrasil decision forests model the same format as other tf models? Could I use https://github.com/galeone/tfgo and call predict from a golang app?

    I ran into an issue with Bazel when building the standalone example. Any idea what the problem could be?

    [email protected]:/notebooks/yggdrasil-decision-forests# uname -a
    Linux efc8844082ba 5.10.103-0-virt #1-Alpine SMP Tue, 08 Mar 2022 10:06:11 +0000 x86_64 x86_64 x86_64 GNU/Linux
    [email protected]:/notebooks/yggdrasil-decision-forests# 
    [email protected]:/notebooks/yggdrasil-decision-forests# 
    [email protected]:/notebooks/yggdrasil-decision-forests# cat /etc/os-release
    NAME="Ubuntu"
    VERSION="20.04.4 LTS (Focal Fossa)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 20.04.4 LTS"
    VERSION_ID="20.04"
    HOME_URL="https://www.ubuntu.com/"
    SUPPORT_URL="https://help.ubuntu.com/"
    BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
    PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
    VERSION_CODENAME=focal
    UBUNTU_CODENAME=focal
    
    [email protected]:/notebooks/yggdrasil-decision-forests# uname -a
    Linux efc8844082ba 5.10.103-0-virt #1-Alpine SMP Tue, 08 Mar 2022 10:06:11 +0000 x86_64 x86_64 x86_64 GNU/Linux
    
    # [email protected]:/notebooks/yggdrasil-decision-forests# bazel --version
    bazel 5.1.1
    
    [email protected]:/notebooks/yggdrasil-decision-forests# bazel build //yggdrasil_decision_forests/cli:all --config=linux_cpp17 --config=linux_avx2
    Extracting Bazel installation...
    Starting local Bazel server and connecting to it...
    INFO: Reading rc options for 'build' from /notebooks/yggdrasil-decision-forests/.bazelrc:
      Inherited 'common' options: --experimental_repo_remote_exec --incompatible_restrict_string_escapes=false
    ERROR: --incompatible_restrict_string_escapes=false :: Unrecognized option: --incompatible_restrict_string_escapes=false
    

    Here is how I installed Bazel: https://docs.bazel.build/versions/main/install-ubuntu.html#19

    To fix the issue, I disabled this line: https://github.com/google/yggdrasil-decision-forests/blob/main/.bazelrc#L43

    But then I got this error:

    [email protected]:/notebooks/yggdrasil-decision-forests# bazel build //yggdrasil_decision_forests/cli:all --config=linux_cpp17 --config=linux_avx2
    Extracting Bazel installation...
    Starting local Bazel server and connecting to it...
    INFO: Options provided by the client:
      Inherited 'common' options: --isatty=1 --terminal_columns=96
    INFO: Reading rc options for 'build' from /notebooks/yggdrasil-decision-forests/.bazelrc:
      Inherited 'common' options: --experimental_repo_remote_exec
    INFO: Reading rc options for 'build' from /notebooks/yggdrasil-decision-forests/.bazelrc:
      'build' options: -c opt --spawn_strategy=standalone --announce_rc --noincompatible_strict_action_env --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --define=grpc_no_ares=true --color=yes
    INFO: Found applicable config definition build:linux_cpp17 in file /notebooks/yggdrasil-decision-forests/.bazelrc: --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --config=linux
    INFO: Found applicable config definition build:linux in file /notebooks/yggdrasil-decision-forests/.bazelrc: --copt=-fdiagnostics-color=always --copt=-w --host_copt=-w
    INFO: Found applicable config definition build:linux_avx2 in file /notebooks/yggdrasil-decision-forests/.bazelrc: --copt=-mavx2
    DEBUG: /root/.cache/bazel/_bazel_root/e69e42dd9f08c8f44fd8644c44ecd3fd/external/org_tensorflow/third_party/repo.bzl:108:14: 
    Warning: skipping import of repository 'com_google_absl' because it already exists.
    DEBUG: /root/.cache/bazel/_bazel_root/e69e42dd9f08c8f44fd8644c44ecd3fd/external/org_tensorflow/third_party/repo.bzl:108:14: 
    Warning: skipping import of repository 'farmhash_archive' because it already exists.
    DEBUG: /root/.cache/bazel/_bazel_root/e69e42dd9f08c8f44fd8644c44ecd3fd/external/org_tensorflow/third_party/repo.bzl:108:14: 
    Warning: skipping import of repository 'com_google_protobuf' because it already exists.
    DEBUG: /root/.cache/bazel/_bazel_root/e69e42dd9f08c8f44fd8644c44ecd3fd/external/org_tensorflow/third_party/repo.bzl:108:14: 
    Warning: skipping import of repository 'com_google_googletest' because it already exists.
    DEBUG: /root/.cache/bazel/_bazel_root/e69e42dd9f08c8f44fd8644c44ecd3fd/external/org_tensorflow/third_party/repo.bzl:108:14: 
    Warning: skipping import of repository 'zlib' because it already exists.
    DEBUG: /root/.cache/bazel/_bazel_root/e69e42dd9f08c8f44fd8644c44ecd3fd/external/org_tensorflow/third_party/repo.bzl:108:14: 
    Warning: skipping import of repository 'rules_cc' because it already exists.
    DEBUG: /root/.cache/bazel/_bazel_root/e69e42dd9f08c8f44fd8644c44ecd3fd/external/org_tensorflow/third_party/repo.bzl:108:14: 
    Warning: skipping import of repository 'rules_python' because it already exists.
    DEBUG: /root/.cache/bazel/_bazel_root/e69e42dd9f08c8f44fd8644c44ecd3fd/external/org_tensorflow/third_party/repo.bzl:108:14: 
    Warning: skipping import of repository 'bazel_skylib' because it already exists.
    INFO: Repository local_execution_config_python instantiated at:
      /notebooks/yggdrasil-decision-forests/WORKSPACE:38:4: in <toplevel>
      /root/.cache/bazel/_bazel_root/e69e42dd9f08c8f44fd8644c44ecd3fd/external/org_tensorflow/tensorflow/workspace2.bzl:1108:19: in workspace
      /root/.cache/bazel/_bazel_root/e69e42dd9f08c8f44fd8644c44ecd3fd/external/org_tensorflow/tensorflow/workspace2.bzl:84:27: in _tf_toolchains
      /root/.cache/bazel/_bazel_root/e69e42dd9f08c8f44fd8644c44ecd3fd/external/tf_toolchains/toolchains/remote_config/configs.bzl:6:28: in initialize_rbe_configs
      /root/.cache/bazel/_bazel_root/e69e42dd9f08c8f44fd8644c44ecd3fd/external/tf_toolchains/toolchains/remote_config/rbe_config.bzl:158:27: in _tensorflow_local_config
    Repository rule local_python_configure defined at:
      /root/.cache/bazel/_bazel_root/e69e42dd9f08c8f44fd8644c44ecd3fd/external/org_tensorflow/third_party/py/python_configure.bzl:275:41: in <toplevel>
    ERROR: An error occurred during the fetch of repository 'local_execution_config_python':
       Traceback (most recent call last):
    	File "/root/.cache/bazel/_bazel_root/e69e42dd9f08c8f44fd8644c44ecd3fd/external/org_tensorflow/third_party/py/python_configure.bzl", line 213, column 39, in _create_local_python_repository
    		numpy_include = _get_numpy_include(repository_ctx, python_bin) + "/numpy"
    	File "/root/.cache/bazel/_bazel_root/e69e42dd9f08c8f44fd8644c44ecd3fd/external/org_tensorflow/third_party/py/python_configure.bzl", line 187, column 19, in _get_numpy_include
    		return execute(
    	File "/root/.cache/bazel/_bazel_root/e69e42dd9f08c8f44fd8644c44ecd3fd/external/org_tensorflow/third_party/remote_config/common.bzl", line 219, column 13, in execute
    		fail(
    Error in fail: Problem getting numpy include path.
    OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
    Is numpy installed?
    ERROR: /notebooks/yggdrasil-decision-forests/WORKSPACE:38:4: fetching local_python_configure rule //external:local_execution_config_python: Traceback (most recent call last):
    	File "/root/.cache/bazel/_bazel_root/e69e42dd9f08c8f44fd8644c44ecd3fd/external/org_tensorflow/third_party/py/python_configure.bzl", line 213, column 39, in _create_local_python_repository
    		numpy_include = _get_numpy_include(repository_ctx, python_bin) + "/numpy"
    	File "/root/.cache/bazel/_bazel_root/e69e42dd9f08c8f44fd8644c44ecd3fd/external/org_tensorflow/third_party/py/python_configure.bzl", line 187, column 19, in _get_numpy_include
    		return execute(
    	File "/root/.cache/bazel/_bazel_root/e69e42dd9f08c8f44fd8644c44ecd3fd/external/org_tensorflow/third_party/remote_config/common.bzl", line 219, column 13, in execute
    		fail(
    Error in fail: Problem getting numpy include path.
    OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
    Is numpy installed?
    INFO: Repository go_sdk instantiated at:
      /notebooks/yggdrasil-decision-forests/WORKSPACE:42:4: in <toplevel>
      /root/.cache/bazel/_bazel_root/e69e42dd9f08c8f44fd8644c44ecd3fd/external/org_tensorflow/tensorflow/workspace0.bzl:117:20: in workspace
      /root/.cache/bazel/_bazel_root/e69e42dd9f08c8f44fd8644c44ecd3fd/external/com_github_grpc_grpc/bazel/grpc_extra_deps.bzl:36:27: in grpc_extra_deps
      /root/.cache/bazel/_bazel_root/e69e42dd9f08c8f44fd8644c44ecd3fd/external/io_bazel_rules_go/go/toolchain/toolchains.bzl:379:28: in go_register_toolchains
      /root/.cache/bazel/_bazel_root/e69e42dd9f08c8f44fd8644c44ecd3fd/external/io_bazel_rules_go/go/private/sdk.bzl:65:21: in go_download_sdk
    Repository rule _go_download_sdk defined at:
      /root/.cache/bazel/_bazel_root/e69e42dd9f08c8f44fd8644c44ecd3fd/external/io_bazel_rules_go/go/private/sdk.bzl:53:35: in <toplevel>
    ERROR: Analysis of target '//yggdrasil_decision_forests/cli:all_file_systems' failed; build aborted: Problem getting numpy include path.
    OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
    Is numpy installed?
    INFO: Elapsed time: 49.501s
    INFO: 0 processes.
    FAILED: Build did NOT complete successfully (11 packages loaded, 15 targets configured)
        currently loading: @bazel_tools//tools/python ... (2 packages)
        Fetching https://dl.google.com/go/go1.12.5.linux-amd64.tar.gz; 1,613,824B
    

    My Dockerfile (to reproduce the Bazel error):

    # image is based on:
    # https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/dockerfiles/dockerfiles/cpu.Dockerfile
    
    ARG UBUNTU_VERSION=20.04
    
    FROM ubuntu:${UBUNTU_VERSION} as base
    
    ENV DEBIAN_FRONTEND=noninteractive
    
    RUN apt-get update && apt-get install -y curl
    
    # See http://bugs.python.org/issue19846
    ENV LANG C.UTF-8
    
    RUN apt-get update && apt-get install -y \
        python3 \
        python3-pip
    
    RUN python3 -m pip --no-cache-dir install --upgrade \
        "pip<20.3" \
        setuptools
    
    # Some TF tools expect a "python" binary
    RUN ln -s $(which python3) /usr/local/bin/python
    
    # Options:
    #   tensorflow
    #   tensorflow-gpu
    #   tf-nightly
    #   tf-nightly-gpu
    # Set --build-arg TF_PACKAGE_VERSION=1.11.0rc0 to install a specific version.
    # Installs the latest version by default.
    ARG TF_PACKAGE=tensorflow
    ARG TF_PACKAGE_VERSION=
    RUN python3 -m pip install --no-cache-dir ${TF_PACKAGE}${TF_PACKAGE_VERSION:+==${TF_PACKAGE_VERSION}}
    
    # install tensorflow_decision_forests and numpy
    RUN pip3 install tensorflow_decision_forests --upgrade
    RUN python3 -m pip install numpy
    
    # install bazel
    RUN apt install apt-transport-https curl gnupg -y
    RUN curl -fsSL https://bazel.build/bazel-release.pub.gpg | gpg --dearmor > bazel.gpg
    RUN mv bazel.gpg /etc/apt/trusted.gpg.d/
    RUN echo "deb [arch=amd64] https://storage.googleapis.com/bazel-apt stable jdk1.8" | tee /etc/apt/sources.list.d/bazel.list
    RUN apt update && apt install bazel -y
    RUN apt update && apt full-upgrade -y
    RUN apt install bazel-1.0.0 -y
    #RUN ln -s /usr/bin/bazel-1.0.0 /usr/bin/bazel
    RUN bazel --version
    
    # WORKDIR /tf
    # VOLUME ["/tf"]
    
    COPY bashrc /etc/bash.bashrc
    RUN chmod a+rwx /etc/bash.bashrc
    

    cc @achoum

    opened by Arnold1 8
  • performance issue with random forests

    Hi, I'm running a RANDOM_FOREST model trained in TF-DF through the Yggdrasil C++ API, and inference takes about 50 μs. As you said, it probably shouldn't take more than 10 μs.

    Running in large batches (vs. batch size = 1) or using --copt=-mavx2 makes no difference at all. I also used the benchmark_inference tool and the result was the same. Another interesting observation is the difference between the min and max execution time per instance. Execution times for 10 runs:

        run  max      min    avg
        0    2133059  17293  52250
        1    1054634  16696  52982
        2    1038110  14949  45468
        3    1068611  16752  53064
        4    1657415  16790  54514
        5    1125537  16432  53145
        6    1939590  17591  74354
        7    2997816  17284  70325
        8    1064365  16554  56063
        9    1044182  16145  51429

    Even if I ignore some of the first executions (to account for cache misses), the variance between execution times is still high:

        run  max      min    avg
        0    1488318  15841  51085
        1    955384   16501  45567
        2    928377   16370  44606
        3    1018261  15124  44204
        4    1429345  17299  79810
        5    1628887  17539  80997
        6    2126679  16487  67346
        7    1058939  16616  53941
        8    1098449  16242  48047
        9    1103341  16659  53750

    I am wondering whether there is a problem in the model or the inference setup, or whether this is the best performance I can get.

    Model spec: RANDOM_FOREST, 300 root(s), 618972 node(s), and 28 input feature(s), RandomForestOptPred engine.

    Compiled with these flags: --config=linux_cpp17 --config=linux_avx2 --repo_env=CC=gcc-9 --copt=-mavx2

    System spec: Ubuntu 18.04.4, Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz, on an ESXi virtual machine.
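
In the numbers above, the max is tens of times larger than the avg, so min, max and avg alone cannot tell whether a few outliers or a generally wide spread is responsible. A median and a high percentile help separate the two cases. The following is a minimal sketch with hypothetical names (`LatencySummary`, `NearestRank`, `Summarize`); it is not part of benchmark_inference and assumes the per-call timings have already been collected as integers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Summary of a set of per-call latencies, in the unit of the input samples.
struct LatencySummary {
  std::int64_t min = 0;
  std::int64_t median = 0;
  std::int64_t p99 = 0;
  std::int64_t max = 0;
  double mean = 0.0;
};

// Nearest-rank percentile on samples sorted in ascending order.
// `percent` must be in [1, 100].
std::int64_t NearestRank(const std::vector<std::int64_t>& sorted,
                         std::size_t percent) {
  assert(!sorted.empty());
  assert(percent >= 1 && percent <= 100);
  // rank = ceil(percent / 100 * n), computed with integer arithmetic.
  const std::size_t rank = (percent * sorted.size() + 99) / 100;
  return sorted[rank - 1];
}

// Sorts a copy of the samples and computes the summary statistics.
LatencySummary Summarize(std::vector<std::int64_t> samples) {
  assert(!samples.empty());
  std::sort(samples.begin(), samples.end());
  LatencySummary summary;
  summary.min = samples.front();
  summary.max = samples.back();
  summary.median = NearestRank(samples, 50);
  summary.p99 = NearestRank(samples, 99);
  summary.mean = std::accumulate(samples.begin(), samples.end(), 0.0) /
                 static_cast<double>(samples.size());
  return summary;
}
```

If the median stays close to the min while the mean sits well above it, a small number of slow calls dominates the average, which would point at the environment (for example scheduling on a virtual machine) rather than at the model. That reading is a hypothesis to check, not a diagnosis.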

    opened by amir2040 4
Releases(0.2.4)
Owner
Google
Google ❤️ Open Source
Google
SSL_SLAM2: Lightweight 3-D Localization and Mapping for Solid-State LiDAR (mapping and localization separated) ICRA 2021

SSL_SLAM2 Lightweight 3-D Localization and Mapping for Solid-State LiDAR (Intel Realsense L515 as an example) This repo is an extension work of SSL_SL

Wang Han 王晗 311 Jun 9, 2022
R2LIVE is a robust, real-time tightly-coupled multi-sensor fusion framework, which fuses the measurement from the LiDAR, inertial sensor, visual camera to achieve robust, accurate state estimation.

R2LIVE is a robust, real-time tightly-coupled multi-sensor fusion framework, which fuses the measurement from the LiDAR, inertial sensor, visual camera to achieve robust, accurate state estimation.

HKU-Mars-Lab 568 Jun 21, 2022
A robot localisation package for lidar-map based localisation using multi-sensor state estimation.

A ROS-based NDT localizer with multi-sensor state estimation This repo is a ROS based multi-sensor robot localisation. An NDT localizer is loosely-cou

null 39 Jun 6, 2022
An open library of computer vision algorithms

VLFeat -- Vision Lab Features Library Version 0.9.21 The VLFeat open source library implements popular computer vision algorithms specialising in imag

VLFeat.org 1.5k Jun 20, 2022
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

eXtreme Gradient Boosting Community | Documentation | Resources | Contributors | Release Notes XGBoost is an optimized distributed gradient boosting l

Distributed (Deep) Machine Learning Community 22.9k Jun 21, 2022
Distributed (Deep) Machine Learning Community 677 Apr 14, 2022
A toolkit for making real world machine learning and data analysis applications in C++

dlib C++ library Dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real worl

Davis E. King 11.2k Jun 26, 2022
A lightweight C++ machine learning library for embedded electronics and robotics.

Fido Fido is a lightweight, highly modular C++ machine learning library for embedded electronics and robotics. Fido is especially suited for robotic

The Fido Project 411 May 31, 2022
A RGB-D SLAM system for structural scenes, which makes use of point-line-plane features and the Manhattan World assumption.

This repo proposes a RGB-D SLAM system specifically designed for structured environments and aimed at improved tracking and mapping accuracy by relying on geometric features that are extracted from the surrounding.

Yanyan Li 223 Jun 21, 2022
MITIE: library and tools for information extraction

MITIE: MIT Information Extraction This project provides free (even for commercial use) state-of-the-art information extraction tools. The current rele

null 2.8k Jun 27, 2022
LIO-SAM: Tightly-coupled Lidar Inertial Odometry via Smoothing and Mapping

A real-time lidar-inertial odometry package. We strongly recommend the users read this document thoroughly and test the package with the provided dataset first.

Tixiao Shan 1.8k Jun 27, 2022
MATLAB and C++ implementations of sideslip angle estimators

sideslip-angle-vehicle-estimation MATLAB and C++ implementations of sideslip angle estimators Factor graph sideslip angle estimator Papers: "A Factor

Libraries for Multibody Dynamics Simulation (MBDS) 9 Jan 20, 2022
null 5.5k Jun 21, 2022
A flexible, high-performance serving system for machine learning models

XGBoost Serving This is a fork of TensorFlow Serving, extended with the support for XGBoost, alphaFM and alphaFM_softmax frameworks. For more informat

iQIYI 115 Jun 20, 2022
ThunderGBM: Fast GBDTs and Random Forests on GPUs

Documentations | Installation | Parameters | Python (scikit-learn) interface What's new? ThunderGBM won 2019 Best Paper Award from IEEE Transactions o

Xtra Computing Group 624 Jun 13, 2022
A programming game, in which your goal is to help a group of dwarves establish a small outpost in the middle of a dangerous forest.

"Since they were to come in the days of the power of Melkor, Aulë made the dwarves strong to endure. Therefore they are stone-hard, stubborn, fast in

Alexey Nikolaev 4 Jan 8, 2022
✔️The smallest header-only GUI library(4 KLOC) for all platforms

Welcome to GUI-lite The smallest header-only GUI library (4 KLOC) for all platforms. 中文 Lightweight ✂️ Small: 4,000+ lines of C++ code, zero dependenc

null 6.3k Jun 27, 2022
🏅State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes

The Piecewise Geometric Model index (PGM-index) is a data structure that enables fast lookup, predecessor, range searches and updates in arrays of bil

Giorgio Vinciguerra 604 Jun 24, 2022
Static analyzer for C/C++ based on the theory of Abstract Interpretation.

IKOS IKOS (Inference Kernel for Open Static Analyzers) is a static analyzer for C/C++ based on the theory of Abstract Interpretation. Introduction IKO

NASA - Software V&V 1.7k Jun 27, 2022
Training and Evaluating Facial Classification Keras Models using the Tensorflow C API Implemented into a C++ Codebase.

CFace Training and Evaluating Facial Classification Keras Models using the Tensorflow C API Implemented into a C++ Codebase. Dependancies Tensorflow 2

null 8 Nov 23, 2021
null 4.8k Jun 20, 2022
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a

Microsoft 13.9k Jun 18, 2022
Simple embeddable C++11 async tcp,http and websocket serving.

net11 Simple embeddable C++11 async tcp,http and websocket serving. What is it? An easily embeddable C++11 networking library designed to make buildin

Jonas Lund 9 Mar 28, 2020
MozoLM: A language model (LM) serving library

A language model serving library, with middleware functionality including mixing of probabilities from disparate base language model types and tokenizations along with RPC client/server interactions.

Google Research 35 Jun 19, 2022
Flutter plugin serving utilities related to Windows taskbar. 💙

windows_taskbar Flutter plugin serving utilities related to Windows taskbar Install dependencies: windows_taskbar: ^0.0.1 Demo Checkout the exam

Hitesh Kumar Saini 83 Jun 25, 2022