A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Overview

Website | Documentation | Tutorials | Installation | Release Notes

GitHub license PyPI version Conda Version GitHub issues Telegram

CatBoost is a machine learning method based on gradient boosting over decision trees.

Main advantages of CatBoost:

Get Started and Documentation

All CatBoost documentation is available here.

Install CatBoost by following the guide for the

Next you may want to investigate:

If you cannot open documentation in your browser try adding yastatic.net and yastat.net to the list of allowed domains in your privacy badger.

Catboost models in production

If you want to evaluate Catboost model in your application read model api documentation.

Questions and bug reports

Help to Make CatBoost Better

  • Check out open problems and help wanted issues to see what can be improved, or open an issue if you want something.
  • Add your stories and experience to Awesome CatBoost.
  • To contribute to CatBoost you need to first read CLA text and add to your pull request, that you agree to the terms of the CLA. More information can be found in CONTRIBUTING.md
  • Instructions for contributors can be found here.

News

Latest news are published on twitter.

Reference Paper

Anna Veronika Dorogush, Andrey Gulin, Gleb Gusev, Nikita Kazeev, Liudmila Ostroumova Prokhorenkova, Aleksandr Vorobev "Fighting biases with dynamic boosting". arXiv:1706.09516, 2017.

Anna Veronika Dorogush, Vasily Ershov, Andrey Gulin "CatBoost: gradient boosting with categorical features support". Workshop on ML Systems at NIPS 2017.

License

© YANDEX LLC, 2017-2019. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.

Comments
  • UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 9: ordinal not in range(128)

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 9: ordinal not in range(128)

    Problem:UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 9: ordinal not in range(128) catboost version: catboost 0.25 Operating System:win10

    When I use setup.py to install Catboost, this error occurs, and if I look closely it is divided into two parts: 1. Using CUDA to create _catboost.pyd will cause an error like 'UnicodeDecodeError:' ASCII 'codec can't decode byte 0xCD in position 9: Ordinal not in range(128). 2. Do not use the CUDA to create _catboost. pyd, there will be "subprocess. CalledProcessError:Command '['D:\anaconda3\python.exe', 'D:\learn\catboost-master\ya', 'make', 'D:\learn\catboost-master\catboost\python-package\..\..\catboost\python-package\catboost', '--no-src-links', '--output', 'D:\ learn\ catboost-master\catboost\python-package\build\temp.win-amd64-3.8\Release', '-dpython_config =python3-config',' -duse_arcadia_python =no', '-dos_sdk =local', '-r','-DNO_DEBUGINFO', '-DHAVE_CUDA= NO '] returned non-zero exit status 1." I also tried converting _catboost.pyx from GitHub to _catboost.pyd using 'python setup.py build_ext --inplace' directly, but I got the same error as when installing CatBoost.

    C:\Users\王普聪>pip install -e D:\learn\catboost-master\catboost\python-package
    Obtaining file:///D:/learn/catboost-master/catboost/python-package
    Requirement already satisfied: graphviz in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (0.16)
    Requirement already satisfied: plotly in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (4.14.3)
    Requirement already satisfied: six in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (1.15.0)
    Requirement already satisfied: matplotlib in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (3.2.2)
    Requirement already satisfied: numpy>=1.16.0 in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (1.18.5)
    Requirement already satisfied: pandas>=0.24 in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (1.0.5)
    Requirement already satisfied: scipy in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (1.5.0)
    Requirement already satisfied: retrying>=1.3.3 in d:\anaconda3\lib\site-packages (from plotly->catboost==0.24.4) (1.3.3)
    Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in d:\anaconda3\lib\site-packages (from matplotlib->catboost==0.24.4) (2.4.7)
    Requirement already satisfied: cycler>=0.10 in d:\anaconda3\lib\site-packages (from matplotlib->catboost==0.24.4) (0.10.0)
    Requirement already satisfied: kiwisolver>=1.0.1 in d:\anaconda3\lib\site-packages (from matplotlib->catboost==0.24.4) (1.2.0)
    Requirement already satisfied: python-dateutil>=2.1 in d:\anaconda3\lib\site-packages (from matplotlib->catboost==0.24.4) (2.8.1)
    Requirement already satisfied: pytz>=2017.2 in d:\anaconda3\lib\site-packages (from pandas>=0.24->catboost==0.24.4) (2020.1)
    Installing collected packages: catboost
      Running setup.py develop for catboost
        ERROR: Command errored out with exit status 1:
         command: 'D:\anaconda3\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'D:\\learn\\catboost-master\\catboost\\python-package\\setup.py'"'"'; __file__='"'"'D:\\learn\\catboost-master\\catboost\\python-package\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps
             cwd: D:\learn\catboost-master\catboost\python-package\
        Complete output (159 lines):
        running develop
        15:30:22 I Targeting for CUDA support with C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1
        running egg_info
        writing catboost.egg-info\PKG-INFO
        writing dependency_links to catboost.egg-info\dependency_links.txt
        writing requirements to catboost.egg-info\requires.txt
        writing top-level names to catboost.egg-info\top_level.txt
        15:30:24 I Targeting for CUDA support with C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1
        reading manifest file 'catboost.egg-info\SOURCES.txt'
        writing manifest file 'catboost.egg-info\SOURCES.txt'
        running build_ext
        15:30:24 I Targeting for CUDA support with C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1
        15:30:24 I Buildling _catboost.pyd with ymake
        15:30:24 I EXECUTE: D:\anaconda3\python.exe D:\learn\catboost-master\ya make D:\learn\catboost-master\catboost\python-package\..\..\catboost\python-package\catboost --no-src-links --output D:\learn\catboost-master\catboost\python-package\build\temp.win-amd64-3.8\Release -DPYTHON_CONFIG=python3-config -DUSE_ARCADIA_PYTHON=no -DOS_SDK=local -r -DNO_DEBUGINFO -DHAVE_CUDA=yes "-DCUDA_ROOT=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1"
        Output root is subdirectory of Arcadia root, this may cause non-idempotent build
        Traceback (most recent call last):
          File "devtools/ya/app.py", line 422, in configure_exit_interceptor
            yield
          File "devtools/ya/app.py", line 65, in helper
            return action(args)
          File "devtools/ya/entry/entry.py", line 55, in do_main
            res = handler.handle(handler, args, prefix=['ya'])
          File "devtools/ya/core/handler.py", line 159, in handle
            return handler.handle(self, args[1:], prefix + [name])
          File "devtools/ya/core/dispatch.py", line 37, in handle
            return self.command().handle(root_handler, args, prefix)
          File "devtools/ya/core/handler.py", line 341, in handle
            return self._action(params)
          File "devtools/ya/app.py", line 92, in helper
            return action(ctx.params)
          File "devtools/ya/build/build_handler.py", line 85, in do_ya_make
            builder = ya_make.YaMake(params, app_ctx)
          File "devtools/ya/build/ya_make.py", line 895, in __init__
            self.ctx = Context(self.opts, app_ctx=app_ctx, graph=graph, tests=tests, stripped_tests=stripped_tests, configure_errors=configure_errors, make_files=make_files, lite_graph=lite_graph)
          File "devtools/ya/build/ya_make.py", line 574, in __init__
            self.graph, self.tests, self.stripped_tests, self.configure_errors, self.make_files = _build_graph_and_tests(self.opts, app_ctx)
          File "devtools/ya/build/ya_make.py", line 258, in _build_graph_and_tests
            graph, tests, stripped_tests, gh, make_files = lg.build_graph_and_tests(opts, check=True, ev_listener=ev_listener, display=display)
          File "devtools/ya/build/graph.py", line 1688, in build_graph_and_tests
            return _build_graph_and_tests(opts, check, ev_listener, exit_stack, display)
          File "devtools/ya/build/graph.py", line 1992, in _build_graph_and_tests
            real_ymake_bin = tools.tool('ymake')
          File "devtools/ya/yalibrary/tools/__init__.py", line 220, in tool
            return toolchain.find(name, with_params, for_platform, cache=cache)
          File "devtools/ya/yalibrary/tools/__init__.py", line 158, in find
            executable = cur_bottle[executable_name]  # if executable_name is None it's Ok
          File "devtools/ya/yalibrary/tools/__init__.py", line 64, in __getitem__
            path = self.resolve()
          File "devtools/ya/yalibrary/tools/__init__.py", line 46, in resolve
            return self.__fetcher.fetch_if_need(self.__formula["match"], tared, binname, cache=cache).where
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 385, in fetch_if_need
            self.__c[key] = self._fetch_if_need(*args, **kwargs)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 452, in _fetch_if_need
            if self._fetch(name, tared, lambda x: name.lower() in x.lower(), binname):
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 368, in _fetch
            _install(res_path, do_install)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 104, in _install
            fs_handler(install_guard)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 95, in fs_handler
            func(install_guard)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 350, in do_install
            deploy_params=(UNTAR, resource_info if resource_info else {"file_name": "FILE"}, ""))
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 137, in _deploy_tool
            exts.archive.extract_from_tar(archive, extract_to)
          File "devtools/ya/exts/archive.py", line 16, in extract_from_tar
            archive.extract_tar(tar_file_path, output_dir)
          File "library/python/archive/__init__.py", line 62, in extract_tar
            output_dir = encode(output_dir, ENCODING)
          File "library/python/archive/__init__.py", line 58, in encode
            return value.encode(encoding)
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 9: ordinal not in range(128)
        15:30:37 E Cannot build _catboost.pyd with CUDA support, will build without CUDA
        15:30:37 I EXECUTE: D:\anaconda3\python.exe D:\learn\catboost-master\ya make D:\learn\catboost-master\catboost\python-package\..\..\catboost\python-package\catboost --no-src-links --output D:\learn\catboost-master\catboost\python-package\build\temp.win-amd64-3.8\Release -DPYTHON_CONFIG=python3-config -DUSE_ARCADIA_PYTHON=no -DOS_SDK=local -r -DNO_DEBUGINFO -DHAVE_CUDA=no
        Output root is subdirectory of Arcadia root, this may cause non-idempotent build
        Traceback (most recent call last):
          File "devtools/ya/app.py", line 422, in configure_exit_interceptor
            yield
          File "devtools/ya/app.py", line 65, in helper
            return action(args)
          File "devtools/ya/entry/entry.py", line 55, in do_main
            res = handler.handle(handler, args, prefix=['ya'])
          File "devtools/ya/core/handler.py", line 159, in handle
            return handler.handle(self, args[1:], prefix + [name])
          File "devtools/ya/core/dispatch.py", line 37, in handle
            return self.command().handle(root_handler, args, prefix)
          File "devtools/ya/core/handler.py", line 341, in handle
            return self._action(params)
          File "devtools/ya/app.py", line 92, in helper
            return action(ctx.params)
          File "devtools/ya/build/build_handler.py", line 85, in do_ya_make
            builder = ya_make.YaMake(params, app_ctx)
          File "devtools/ya/build/ya_make.py", line 895, in __init__
            self.ctx = Context(self.opts, app_ctx=app_ctx, graph=graph, tests=tests, stripped_tests=stripped_tests, configure_errors=configure_errors, make_files=make_files, lite_graph=lite_graph)
          File "devtools/ya/build/ya_make.py", line 574, in __init__
            self.graph, self.tests, self.stripped_tests, self.configure_errors, self.make_files = _build_graph_and_tests(self.opts, app_ctx)
          File "devtools/ya/build/ya_make.py", line 258, in _build_graph_and_tests
            graph, tests, stripped_tests, gh, make_files = lg.build_graph_and_tests(opts, check=True, ev_listener=ev_listener, display=display)
          File "devtools/ya/build/graph.py", line 1688, in build_graph_and_tests
            return _build_graph_and_tests(opts, check, ev_listener, exit_stack, display)
          File "devtools/ya/build/graph.py", line 1992, in _build_graph_and_tests
            real_ymake_bin = tools.tool('ymake')
          File "devtools/ya/yalibrary/tools/__init__.py", line 220, in tool
            return toolchain.find(name, with_params, for_platform, cache=cache)
          File "devtools/ya/yalibrary/tools/__init__.py", line 158, in find
            executable = cur_bottle[executable_name]  # if executable_name is None it's Ok
          File "devtools/ya/yalibrary/tools/__init__.py", line 64, in __getitem__
            path = self.resolve()
          File "devtools/ya/yalibrary/tools/__init__.py", line 46, in resolve
            return self.__fetcher.fetch_if_need(self.__formula["match"], tared, binname, cache=cache).where
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 385, in fetch_if_need
            self.__c[key] = self._fetch_if_need(*args, **kwargs)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 452, in _fetch_if_need
            if self._fetch(name, tared, lambda x: name.lower() in x.lower(), binname):
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 368, in _fetch
            _install(res_path, do_install)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 104, in _install
            fs_handler(install_guard)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 95, in fs_handler
            func(install_guard)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 350, in do_install
            deploy_params=(UNTAR, resource_info if resource_info else {"file_name": "FILE"}, ""))
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 137, in _deploy_tool
            exts.archive.extract_from_tar(archive, extract_to)
          File "devtools/ya/exts/archive.py", line 16, in extract_from_tar
            archive.extract_tar(tar_file_path, output_dir)
          File "library/python/archive/__init__.py", line 62, in extract_tar
            output_dir = encode(output_dir, ENCODING)
          File "library/python/archive/__init__.py", line 58, in encode
            return value.encode(encoding)
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 9: ordinal not in range(128)
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "D:\learn\catboost-master\catboost\python-package\setup.py", line 259, in <module>
            setup(
          File "D:\anaconda3\lib\site-packages\setuptools\__init__.py", line 153, in setup
            return distutils.core.setup(**attrs)
          File "D:\anaconda3\lib\distutils\core.py", line 148, in setup
            dist.run_commands()
          File "D:\anaconda3\lib\distutils\dist.py", line 966, in run_commands
            self.run_command(cmd)
          File "D:\anaconda3\lib\distutils\dist.py", line 985, in run_command
            cmd_obj.run()
          File "D:\anaconda3\lib\site-packages\setuptools\command\develop.py", line 34, in run
            self.install_for_development()
          File "D:\anaconda3\lib\site-packages\setuptools\command\develop.py", line 136, in install_for_development
            self.run_command('build_ext')
          File "D:\anaconda3\lib\distutils\cmd.py", line 313, in run_command
            self.distribution.run_command(command)
          File "D:\anaconda3\lib\distutils\dist.py", line 985, in run_command
            cmd_obj.run()
          File "D:\learn\catboost-master\catboost\python-package\setup.py", line 186, in run
            self.build_with_ymake(topsrc_dir, build_dir, catboost_ext, put_dir, verbose, dry_run)
          File "D:\learn\catboost-master\catboost\python-package\setup.py", line 219, in build_with_ymake
            logging_execute(ymake_cmd + ['-DHAVE_CUDA=no'], verbose, dry_run)
          File "D:\learn\catboost-master\catboost\python-package\setup.py", line 62, in logging_execute
            subprocess.check_call(cmd, universal_newlines=True)
          File "D:\anaconda3\lib\subprocess.py", line 364, in check_call
            raise CalledProcessError(retcode, cmd)
        subprocess.CalledProcessError: Command '['D:\\anaconda3\\python.exe', 'D:\\learn\\catboost-master\\ya', 'make', 'D:\\learn\\catboost-master\\catboost\\python-package\\..\\..\\catboost\\python-package\\catboost', '--no-src-links', '--output', 'D:\\learn\\catboost-master\\catboost\\python-package\\build\\temp.win-amd64-3.8\\Release', '-DPYTHON_CONFIG=python3-config', '-DUSE_ARCADIA_PYTHON=no', '-DOS_SDK=local', '-r', '-DNO_DEBUGINFO', '-DHAVE_CUDA=no']' returned non-zero exit status 1.
        ----------------------------------------
    ERROR: Command errored out with exit status 1: 'D:\anaconda3\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'D:\\learn\\catboost-master\\catboost\\python-package\\setup.py'"'"'; __file__='"'"'D:\\learn\\catboost-master\\catboost\\python-package\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps Check the logs for full command output.
    
    opened by Wangpc-972 67
  • User description is used by default. Move metric creation metric to corresponding class factories.

    User description is used by default. Move metric creation metric to corresponding class factories.

    Each metric now uses user-specified parameters in their descriptions by default.

    Design

    TMetric now stores a TMap<TString, TString> of user parameters, which are used to construct a metric description (e.g. MetricName:key1=value1;key2=value2). This implementation is defined in the base class and is now the default behaviour for building metric descriptions.

    Some of specifiv GetDescription method implementations are kept in order to be consistent with the existing behaviour.

    Note

    UserQuerywiseMetric now uses the options in its representation as well.

    opened by ivanychev 38
  • Sum of shap values does not equal to the prediction

    Sum of shap values does not equal to the prediction

    Problem: Sum of shap values does not equal to the prediction catboost version: 0.18.1 Operating System: Ubuntu 19.10 CPU: i7-8565U

    It only happens sometimes but we find that the of shap values does not equal to the prediction. Please let us know how we can provide further information

    in progress bug 
    opened by hopoluicha 27
  • How catboost handle with big data?

    How catboost handle with big data?

    Hi! I try to use catboost in kaggle competition. https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection The size of my train set about 40m rows with 14 features. When i try to train model, kernel always dies without any errors...

    need info 
    opened by Mechanix12 27
  • Unknown class labels

    Unknown class labels

    I'm beginner using boosting models ,I'm trying to implement catboost . My input data has 6 categorical features and 2 numerical feature . My target variable is numerical data. I'm running on GPU . I'm facing the problem below please help me. Cannot chare data due privacy issue.

    Traceback (most recent call last): File "/work/ilt/css8222/cat_boost/cat_boost.py", line 127, in save_snapshot = True File "/fibus/fs2/15/css8222/.local/lib/python3.6/site-packages/catboost/core.py", line 4718, in fit silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, init_model, callbacks, log_cout, log_cerr) File "/fibus/fs2/15/css8222/.local/lib/python3.6/site-packages/catboost/core.py", line 2042, in _fit train_params["init_model"] File "/fibus/fs2/15/css8222/.local/lib/python3.6/site-packages/catboost/core.py", line 1464, in _train self._object._train(train_pool, test_pool, params, allow_clear_pool, init_model._object if init_model else None) File "_catboost.pyx", line 4393, in _catboost._CatBoost._train File "_catboost.pyx", line 4442, in _catboost._CatBoost._train _catboost.CatBoostError: catboost/private/libs/target/target_converter.cpp:226: Unknown class label: "14289"

    opened by sujay003 25
  • Faster SHAP values for small batches

    Faster SHAP values for small batches

    For small batches use direct SHAP values calculation. Direct algorithm (without precalculation) is faster when (where DocumentsNumber < MeanLeafCount), because for preprocessing we find SHAP values for MeanLeafCount documents.

    (algorithm from https://arxiv.org/abs/1802.03888)

    With preprocessing final complexity was O(NT(D+F))+O(TL^2 D^2) where N is the number of documents(objects), T - number of trees, D - average tree depth, F - average number of features in tree, L - average number of leaves in tree. But if the batch is small we can use default algorithm with complexity O(NTLD^2), which is better when N < L.

    Example: On dataset gisette (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html) with 100 first features train CatBoostRegressor(iterations=500, depth=6, random_seed=42) and then use get_feature_importance to find SHAP values for the first object in test.

    Old:

    • 0.32 s

    New:

    • shap_mode="Auto" or "NoPreCalc"- 0.015 s
    • shap_mode="UsePreCalc" - 0.32 s (this is like it was before)

    I hereby agree to the terms of the CLA available at: link

    opened by Lokutrus 25
  • Tutorial for ranking modes in CatBoost

    Tutorial for ranking modes in CatBoost

    Hello.

    Looks like the current version of CatBoost supports learning to rank. There are some clues about it in the documentation, but I couldn't find any minimal working examples. I wonder which methods should be considered as a baseline approach and what are the prerequisites?

    Should we use YetiRank as the training metric and just provide a query id as the Pool group_id parameter? What other CatBoost parameters should be taken into account specifically for a ranking problem?

    Thank you!

    planned documentation 
    opened by hanky 24
  • GPU yields worse metric than CPU

    GPU yields worse metric than CPU

    Problem:various measurements become worse when I switch from CPU to GPU catboost version:0.22 Operating System:Linux 4.4.0-1100-aws x86_64 CPU: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz

    GPU: Tesla M60

    I wanted to reduce the training time and so I specified 'task_type' as 'GPU'. I immediately noticed that its metrics got worse. The only change I made was setting task_type as GPU. The rest are the same.

    The training dataset has 1.2M rows and 218 columns. Among these 218 columns, 42 are categorical features. The rest are floats or integers, no text features. The validation dataset has 120K rows.

    The following are the parameters for the CPU version: {'nan_mode': 'Min', 'eval_metric': 'Logloss', 'combinations_ctr': ['Borders:CtrBorderCount=15:CtrBorderType=Uniform:TargetBorderCount=1:TargetBorderType=MinEntropy:Prior=0/1:Prior=0.5/1:Prior=1/1', 'Counter:CtrBorderCount=15:CtrBorderType=Uniform:Prior=0/1'], 'iterations': 1000, 'sampling_frequency': 'PerTree', 'fold_permutation_block': 0, 'leaf_estimation_method': 'Newton', 'od_pval': 0, 'counter_calc_method': 'SkipTest', 'grow_policy': 'SymmetricTree', 'boosting_type': 'Plain', 'model_shrink_mode': 'Constant', 'feature_border_type': 'GreedyLogSum', 'ctr_leaf_count_limit': 18446744073709551615, 'bayesian_matrix_reg': 0.10000000149011612, 'one_hot_max_size': 2, 'l2_leaf_reg': 3, 'random_strength': 1, 'od_type': 'Iter', 'rsm': 1, 'boost_from_average': False, 'max_ctr_complexity': 4, 'model_size_reg': 0.5, 'simple_ctr': ['Borders:CtrBorderCount=15:CtrBorderType=Uniform:TargetBorderCount=1:TargetBorderType=MinEntropy:Prior=0/1:Prior=0.5/1:Prior=1/1', 'Counter:CtrBorderCount=15:CtrBorderType=Uniform:Prior=0/1'], 'subsample': 0.800000011920929, 'use_best_model': True, 'od_wait': 35, 'class_names': [0, 1], 'random_seed': 42, 'depth': 6, 'ctr_target_border_count': 1, 'has_time': False, 'store_all_simple_ctr': False, 'border_count': 254, 'classes_count': 0, 'sparse_features_conflict_fraction': 0, 'leaf_estimation_backtracking': 'AnyImprovement', 'best_model_min_trees': 1, 'model_shrink_rate': 0, 'min_data_in_leaf': 1, 'loss_function': 'Logloss', 'learning_rate': 0.30000001192092896, 'score_function': 'Cosine', 'task_type': 'CPU', 'leaf_estimation_iterations': 10, 'bootstrap_type': 'MVS', 'max_leaves': 64, 'permutation_count': 4}

    The following are the parameters for the GPU version: {'nan_mode': 'Min', 'gpu_ram_part': 0.95, 'eval_metric': 'Logloss', 'combinations_ctr': ['Borders:CtrBorderCount=15:CtrBorderType=Uniform:TargetBorderCount=1:TargetBorderType=MinEntropy:Prior=0/1:Prior=0.5/1:Prior=1/1', 'FeatureFreq:CtrBorderCount=15:CtrBorderType=Median:Prior=0/1'], 'iterations': 1000, 'fold_permutation_block': 64, 'leaf_estimation_method': 'Newton', 'observations_to_bootstrap': 'TestOnly', 'od_pval': 0, 'counter_calc_method': 'SkipTest', 'grow_policy': 'SymmetricTree', 'boosting_type': 'Plain', 'ctr_history_unit': 'Sample', 'feature_border_type': 'GreedyLogSum', 'bayesian_matrix_reg': 0.10000000149011612, 'one_hot_max_size': 2, 'devices': '-1', 'pinned_memory_bytes': '104857600', 'l2_leaf_reg': 3, 'random_strength': 1, 'od_type': 'Iter', 'rsm': 1, 'boost_from_average': False, 'fold_size_loss_normalization': False, 'max_ctr_complexity': 4, 'gpu_cat_features_storage': 'GpuRam', 'simple_ctr': ['Borders:CtrBorderCount=15:CtrBorderType=Uniform:TargetBorderCount=1:TargetBorderType=MinEntropy:Prior=0/1:Prior=0.5/1:Prior=1/1', 'FeatureFreq:CtrBorderCount=15:CtrBorderType=MinEntropy:Prior=0/1'], 'use_best_model': True, 'od_wait': 35, 'class_names': [0, 1], 'random_seed': 42, 'depth': 6, 'ctr_target_border_count': 1, 'has_time': False, 'border_count': 128, 'min_fold_size': 100, 'data_partition': 'FeatureParallel', 'bagging_temperature': 1, 'classes_count': 0, 'leaf_estimation_backtracking': 'AnyImprovement', 'best_model_min_trees': 1, 'min_data_in_leaf': 1, 'add_ridge_penalty_to_loss_function': False, 'loss_function': 'Logloss', 'learning_rate': 0.30000001192092896, 'score_function': 'Cosine', 'task_type': 'GPU', 'leaf_estimation_iterations': 10, 'bootstrap_type': 'Bayesian', 'max_leaves': 64, 'permutation_count': 4}

    opened by kdlin 23
  • Using parameters from saved model for cross-validation leads to 'exclusive parameters' error.

    Using parameters from saved model for cross-validation leads to 'exclusive parameters' error.

    Problem: "Only one of parameters ['verbose', 'logging_level', 'verbose_eval', 'silent'] should be set" printed by cv function after loading from file previously saved model. catboost version: 0.12.2 Operating System: CentOS Linux release 7.4.1708 CPU: Intel(R) Xeon(R) CPU E5-2450 v2 @ 2.50GHz

    model = CatBoostClassifier(loss_function='MultiClass')
    model.fit(train_pool, 
      verbose=False, 
      plot=True,
      eval_set=validation_pool)
    model.save_model(str(model_path.absolute()))
    model = CatBoostClassifier()
    model.load_model(str(model_path.absolute()))
    cv_data = cv(
        whole_pool,
        params=model.get_params()
    )
    
    ---------------------------------------------------------------------------
    CatboostError                             Traceback (most recent call last)
    <ipython-input-40-f150897615b8> in <module>
          1 cv_data = cv(
          2     whole_pool,
    ----> 3     params=model.get_params()
          4 )
    
    ~/.conda/envs/catboost/lib/python3.6/site-packages/catboost/core.py in cv(pool, params, dtrain, iterations, num_boost_round, fold_count, nfold, inverted, partition_random_seed, seed, shuffle, logging_level, stratified, as_pandas, metric_period, verbose, verbose_eval, plot, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, max_time_spent_on_fixed_cost_ratio, dev_max_iterations_batch_size)
       2876 
       2877     params = deepcopy(params)
    -> 2878     _process_synonyms(params)
       2879 
       2880     metric_period, verbose, logging_level = _process_verbose(metric_period, verbose, logging_level, verbose_eval)
    
    ~/.conda/envs/catboost/lib/python3.6/site-packages/catboost/core.py in _process_synonyms(params)
        754         del params['silent']
        755 
    --> 756     metric_period, verbose, logging_level = _process_verbose(metric_period, verbose, logging_level, verbose_eval, silent)
        757 
        758     if metric_period is not None:
    
    ~/.conda/envs/catboost/lib/python3.6/site-packages/catboost/core.py in _process_verbose(metric_period, verbose, logging_level, verbose_eval, silent)
        133     at_most_one = sum(params.get(exclusive) is not None for exclusive in exclusive_params)
        134     if at_most_one > 1:
    --> 135         raise CatboostError('Only one of parameters {} should be set'.format(exclusive_params))
        136 
        137     if verbose is None:
    
    CatboostError: Only one of parameters ['verbose', 'logging_level', 'verbose_eval', 'silent'] should be set
    
    bug 
    opened by protsenkovi 23
  • Flag not copied unnecessarily with blank and whitespace

    Flag not copied unnecessarily with blank and whitespace

    Before submitting a pull request, please do the following steps:

    1. Read instructions for contributors here.
    2. Run ya make in catboost folder to make sure the code builds.
    3. Add tests that test your change.
    4. Run tests using ya make -t -A command.
    5. If you haven't already, complete the CLA. I hereby agree to the terms of the CLA available at https://yandex.ru/legal/cla/?lang=en.
    opened by sharaalfa 23
  • Issue trying to compile with specified gcc version

    Issue trying to compile with specified gcc version

    I'm trying to compile the catboost python wheel on my system. The default gcc version I have is 8, but I also have 7 installed so I'm trying to use that by setting the CC and CXX environment variables. However, when running:

    python mk_wheel.py -DCUDA_ROOT="/opt/cuda"
    

    I get the message:

    Info: Attention! Using system user-defined compiler: g++-7 (check CC and CXX env vars).
    Cross compilation with system CXX is not supported
    

    catboost version: git master Operating System: Linux CPU: i7 GPU: GTX 1080

    Thanks!

    build issues 
    opened by ctlaltdefeat 23
  • Which CUDA version is compatible with catboost-1.1.1

    Which CUDA version is compatible with catboost-1.1.1

    CatBoostError: catboost/cuda/cuda_lib/cuda_base.h:281: CUDA error 35: CUDA driver version is insufficient for CUDA runtime version

    catboost version: catboost-1.1.1 Operating System: ubuntu 18.04 GPU: Tesla V100 Driver Version: 418.181.07
    CUDA Version: 10.1

    GPU doesn't work after I updated the catboost , so can anyone tell me which cuda shoud I install ? How do I configure my system to use the gpu?

    opened by skyking363 0
  • Fix the R parameters

    Fix the R parameters

    Fix the R parameters

    Before submitting a pull request, please do the following steps:

    1. Read instructions for contributors.
    2. Run ya make in catboost folder to make sure the code builds.
    3. Add tests that test your change.
    4. Run tests using ya make -t -A command.
    5. If you haven't already, complete the CLA.

    I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

    opened by elenaspb2019 0
  • MultiLogloss metric produces different results in train and eval_metric functions

    MultiLogloss metric produces different results in train and eval_metric functions

    Catboost version: 1.1.1 Operating System: Ubuntu 20.04

    Problem: I train a model for multilabel classification

    from sklearn import datasets
    import catboost
    from catboost.utils import eval_metric
    from sklearn.model_selection import train_test_split
    
    X, y = datasets.make_multilabel_classification(n_samples=500, n_features=20, n_classes=5, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    
    data = catboost.Pool(X_train, y_train)
    eval_data = catboost.Pool(X_test, y_test)
    
    classifier = catboost.CatBoostClassifier(iterations=5, learning_rate=0.05, min_data_in_leaf=1024, used_ram_limit='60Gb',
                                             max_ctr_complexity=1, border_count=None, loss_function='MultiLogloss',
                                             eval_metric='MultiLogloss', random_seed=32))
    
    classifier.fit(data, eval_set=eval_data, logging_level='Verbose', early_stopping_rounds=10)
    
    print(classifier.best_score_)
    

    0: learn: 0.6763151 test: 0.6802836 best: 0.6802836 (0) total: 1.92ms remaining: 7.69ms 1: learn: 0.6592310 test: 0.6696275 best: 0.6696275 (1) total: 3.65ms remaining: 5.47ms 2: learn: 0.6460435 test: 0.6585833 best: 0.6585833 (2) total: 5.32ms remaining: 3.55ms 3: learn: 0.6323740 test: 0.6514793 best: 0.6514793 (3) total: 6.99ms remaining: 1.75ms 4: learn: 0.6211555 test: 0.6421816 best: 0.6421816 (4) total: 8.69ms remaining: 0us

    bestTest = 0.6421816043 bestIteration = 4

    {'learn': {'MultiLogloss': 0.6211555005044889}, 'validation': {'MultiLogloss': 0.6421816043238042}}

    But, when I want to calculate test metrics directly, I see the strange results

    y_predict = classifier.predict_proba(X_test)
    
    print(eval_metric(label=y_test, approx=y_predict, metric='MultiLogloss'))
    print(catboost.metrics.MultiLogloss().eval(label=y_test, approx=y_predict))
    

    [0.7810892821977388 [0.7810892821977388]

    I implemented multi_log_loss function: Also, I implemented the multi_log_loss function:

    def multi_log_loss(approxes, label, weight=None):
        error_sum = 0.0
        weight_sum = 0.0
    
        for ind in range(approxes.shape[1]):
            w = 1.0 if weight is None else weight[i]
            weight_sum += w
            e = numpy.exp(approxes[:, ind])
            p = e / (1 + e)
            error = -w * (label[:, ind] * numpy.log(p) + (1 - label[:, ind]) * numpy.log(1 - p))
            error_sum += error.sum()
        return error_sum / (len(approxes) * weight_sum)
        
        print(multi_log_loss(label=y_test, approxes=y_predict))
    

    0.7810892821977385

    Why do we have here 0.78, but during the train, it was 0.64? What am I doing wrong? P.S. HammingLoss is ok.

    opened by FiksII 0
  • Some structure fixes and improvements

    Some structure fixes and improvements

    Some structure fixes and improvements

    Before submitting a pull request, please do the following steps:

    1. Read instructions for contributors.
    2. Run ya make in catboost folder to make sure the code builds.
    3. Add tests that test your change.
    4. Run tests using ya make -t -A command.
    5. If you haven't already, complete the CLA.

    I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en.

    opened by elenaspb2019 5
  • Added the title

    Added the title

    Added the title, fixed the order

    I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

    Before submitting a pull request, please do the following steps:

    1. Read instructions for contributors.
    2. Run ya make in catboost folder to make sure the code builds.
    3. Add tests that test your change.
    4. Run tests using ya make -t -A command.
    5. If you haven't already, complete the CLA.
    opened by elenaspb2019 7
  • CatBoost crashes Colab session while training on GPU after several successful iterations

    CatBoost crashes Colab session while training on GPU after several successful iterations

    I am trying to train CatBoost on GPU in Colab to find optimal hyperparameters in optuna, but it crashes every time after several successful iterations. But when I train it on CPU - no errors occur.

    Here is my code:

    def objective(trial):
    
        params = {'objective': 'MultiClass',
                  'task_type': 'GPU',
                  'eval_metric': 'TotalF1',
                  'logging_level': 'Silent',
                  'n_estimators': trial.suggest_int('n_estimators', 500, 1000),
                  'learning_rate': trial.suggest_loguniform('learning_rate', 1e-3, 0.1),
                  'l2_leaf_reg': trial.suggest_loguniform('l2_leaf_reg', 1e-3, 1.0),
                  'bootstrap_type': 'Bayesian',
                  'depth': trial.suggest_int('depth', 3, 12),
                  'boosting_type': 'Plain'}
        
        model = CatBoostClassifier(**params)
        model.fit(train_data,
                  train_labels,
                  cat_features=categorical_features,
                  eval_set=[(valid_data, valid_labels)],
                  early_stopping_rounds=5)
        preds = model.predict(valid_data)
        score = f1_score(valid_labels, preds)
        return score
    
    study = optuna.create_study(direction='maximize', sampler=TPESampler())
    study.optimize(objective, n_trials=50, show_progress_bar=True)
    

    This code does 10-20 successful iterations and after that Colab session crashes. My train and validation dataset have no missing values.

    catboost version: 1.1.1 Operating System: Ubuntu

    UPD: This code works fine when using Kaggle GPU. Session crushes only in Colab.

    opened by MaxTeselkin 3
Releases(v1.1.1)
Owner
CatBoost
CatBoost is a fast, scalable, high performance gradient boosting on decision trees library. Used for ranking, classification, regression and other ML tasks.
CatBoost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

eXtreme Gradient Boosting Community | Documentation | Resources | Contributors | Release Notes XGBoost is an optimized distributed gradient boosting l

Distributed (Deep) Machine Learning Community 23.5k Dec 2, 2022
An open source machine learning library for performing regression tasks using RVM technique.

Introduction neonrvm is an open source machine learning library for performing regression tasks using RVM technique. It is written in C programming la

Siavash Eliasi 33 May 31, 2022
Learning embeddings for classification, retrieval and ranking.

StarSpace StarSpace is a general-purpose neural model for efficient learning of entity embeddings for solving a wide variety of problems: Learning wor

Facebook Research 3.8k Nov 29, 2022
High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.

What is xLearn? xLearn is a high performance, easy-to-use, and scalable machine learning package that contains linear model (LR), factorization machin

Chao Ma 3k Nov 25, 2022
Python Inference Script is a Python package that enables developers to author machine learning workflows in Python and deploy without Python.

Python Inference Script(PyIS) Python Inference Script is a Python package that enables developers to author machine learning workflows in Python and d

Microsoft 13 Nov 4, 2022
heuristically and dynamically sample (more) uniformly from large decision trees of unknown shape

PROBLEM STATEMENT When writing a randomized generator for some file format in a general-purpose programming language, we can view the resulting progra

John Regehr 4 Feb 15, 2022
Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.

This is the Vowpal Wabbit fast online learning code. Why Vowpal Wabbit? Vowpal Wabbit is a machine learning system which pushes the frontier of machin

Vowpal Wabbit 8.1k Nov 27, 2022
Deep Scalable Sparse Tensor Network Engine (DSSTNE) is an Amazon developed library for building Deep Learning (DL) machine learning (ML) models

Amazon DSSTNE: Deep Scalable Sparse Tensor Network Engine DSSTNE (pronounced "Destiny") is an open source software library for training and deploying

Amazon Archives 4.4k Nov 17, 2022
SecMML: Secure MPC(multi-party computation) Machine Learning Framework

SecMML 介绍 SecMML是FudanMPL(Multi-Party Computation + Machine Learning)的一个分支,是用于训练机器学习模型的高效可扩展的安全多方计算(MPC)框架,基于BGW协议实现。此框架可以应用到三个及以上参与方联合训练的场景中。目前,SecMM

null 84 Nov 25, 2022
Edge ML Library - High-performance Compute Library for On-device Machine Learning Inference

Edge ML Library (EMLL) offers optimized basic routines like general matrix multiplications (GEMM) and quantizations, to speed up machine learning (ML) inference on ARM-based devices. EMLL supports fp32, fp16 and int8 data types. EMLL accelerates on-device NMT, ASR and OCR engines of Youdao, Inc.

NetEase Youdao 178 Nov 16, 2022
Fast, differentiable sorting and ranking in PyTorch

Torchsort Fast, differentiable sorting and ranking in PyTorch. Pure PyTorch implementation of Fast Differentiable Sorting and Ranking (Blondel et al.)

Teddy Koker 651 Nov 25, 2022
A lightweight 2D Pose model can be deployed on Linux/Window/Android, supports CPU/GPU inference acceleration, and can be detected in real time on ordinary mobile phones.

A lightweight 2D Pose model can be deployed on Linux/Window/Android, supports CPU/GPU inference acceleration, and can be detected in real time on ordinary mobile phones.

JinquanPan 56 Nov 28, 2022
SHARK - High Performance Machine Learning for CPUs, GPUs, Accelerators and Heterogeneous Clusters

SHARK Communication Channels GitHub issues: Feature requests, bugs etc Nod.ai SHARK Discord server: Real time discussions with the nod.ai team and oth

nod.ai 85 Dec 1, 2022
A flexible, high-performance serving system for machine learning models

XGBoost Serving This is a fork of TensorFlow Serving, extended with the support for XGBoost, alphaFM and alphaFM_softmax frameworks. For more informat

iQIYI 128 Nov 18, 2022
Toy path tracer for my own learning purposes (CPU/GPU, C++/C#, Win/Mac/Wasm, DX11/Metal, also Unity)

Toy Path Tracer Toy path tracer for my own learning purposes, using various approaches/techs. Somewhat based on Peter Shirley's Ray Tracing in One Wee

Aras Pranckevičius 922 Nov 24, 2022
CTranslate2 is a fast inference engine for OpenNMT-py and OpenNMT-tf models supporting both CPU and GPU executio

CTranslate2 is a fast inference engine for OpenNMT-py and OpenNMT-tf models supporting both CPU and GPU execution. The goal is to provide comprehensive inference features and be the most efficient and cost-effective solution to deploy standard neural machine translation systems such as Transformer models.

OpenNMT 370 Nov 23, 2022
a fast and user-friendly runtime for transformer inference (Bert, Albert, GPT2, Decoders, etc) on CPU and GPU.

a fast and user-friendly runtime for transformer inference (Bert, Albert, GPT2, Decoders, etc) on CPU and GPU.

Tencent 1.2k Nov 23, 2022
A program developed using MPI for distributed computation of Histogram for large data and their performance anaysis on multi-core systems

mpi-histo A program developed using MPI for distributed computation of Histogram for large data and their performance anaysis on multi-core systems. T

Raj Shrestha 2 Dec 21, 2021
fast face classification

Fast Face Classification (F²C)—— An Efficient Training Approach for Very Large Scale Face Recognition Training on ultra-large-scale datasets is time-c

null 66 Nov 8, 2022