❗ We're keeping our codebase healthy by removing features that are end of life. Read the deprecation notice to check if you are affected.
Netdata open-source growth
- 7.6M+ troubleshooters monitor with Netdata
- 1.6M unique nodes currently live
- 3.3k+ new nodes per day
- Over 557M Docker pulls all-time total
- Over 60,000 stargazers on GitHub
New metric correlation algorithm (tech preview)
The Agent's default algorithm for running a metric correlations job (ks2) is based on the Kolmogorov-Smirnov test. In this release, we also included the Volume algorithm, a heuristic based on the percentage change in averages between the highlighted window and a baseline, with various edge cases sensibly controlled. You can explore our implementation in the Agent's source code.
This algorithm is almost 73 times faster than the default algorithm (ks2) with nearly the same accuracy. Give it a try by enabling it by default in your netdata.conf.
# enable metric correlations = yes
metric correlations method = volume
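To make the idea behind the Volume heuristic concrete, here is a minimal Python sketch of a percentage-change-in-averages score. It is an illustration only, not the Agent's implementation: the function names (`volume_score`, `rank_metrics`), the `eps` threshold, and the zero-baseline handling are all assumptions made for this example.

```python
from statistics import mean


def volume_score(baseline, highlight, eps=1e-9):
    """Percentage change of the highlight-window average vs. the baseline average."""
    base_avg = mean(baseline)
    high_avg = mean(highlight)
    if abs(base_avg) < eps:
        # Assumed edge-case rule: a flat-zero baseline scores 0 if nothing
        # changed, otherwise it is treated as a 100% change.
        return 0.0 if abs(high_avg) < eps else 100.0
    return (high_avg - base_avg) / abs(base_avg) * 100.0


def rank_metrics(metrics):
    """Sort metrics by the absolute size of their change, largest first.

    `metrics` maps a metric name to a (baseline, highlight) pair of value lists.
    """
    scores = {name: volume_score(b, h) for name, (b, h) in metrics.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

Because each score needs only two averages per metric, a heuristic of this shape is cheap to compute compared with a full statistical test.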
Cooperation of the Metric Correlations (MC) component with the Anomaly Advisor
The Anomaly Advisor feature lets you quickly surface potentially anomalous metrics and charts related to a particular highlight window of interest. When the Agent trains its internal Machine Learning models, it produces an Anomaly Rate for each metric.
With this release, Netdata can now perform Metric Correlation jobs based on these Anomaly Rate values for your metrics.
Metric correlations dashboard
In the past, you used to run MC jobs from the Node's dashboard with all the settings predefined. Now, Netdata gives you some extra functionality to run an MC job for a window of interest with the following options:
- Run an MC job on both metrics and their Anomaly Rate.
- Change the aggregation method of datapoints for the metrics.
- Choose between different algorithms.
All this from the same, single dashboard.
What's next with Metric Correlations
Troubleshooting complicated infrastructures can get increasingly hard, but Netdata wants to continually provide you with the best troubleshooting experience. On that note, here are some logical next steps for our Metric Correlations feature, planned for upcoming releases:
- Enriching the Agent with more Metric Correlation algorithms.
- Making the Metric Correlation component run seamlessly (you can explore the /weights endpoint in the Agent's API; this is a WIP).
- Giving you the ability to run Metric Correlation Jobs across multiple nodes.
Be on the lookout for these upgrades and feel free to reach us in our channels with your ideas.
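If you want to experiment with the /weights endpoint, the sketch below builds a request URL for it in Python. The path `/api/v1/weights` comes from this release's changelog; the query parameter names (`after`, `before`, `baseline_after`, `baseline_before`, `method`) are assumptions for illustration, and since the endpoint is a work in progress you should check the Agent's API documentation before relying on them. The helper name `weights_url` is hypothetical.

```python
from urllib.parse import urlencode


def weights_url(host, after, before, baseline_after, baseline_before, method="volume"):
    """Build a URL for the Agent's /api/v1/weights endpoint.

    Times are relative seconds (negative means "seconds ago").
    The parameter names below are assumed, not confirmed by this document.
    """
    params = {
        "after": after,
        "before": before,
        "baseline_after": baseline_after,
        "baseline_before": baseline_before,
        "method": method,
    }
    return f"http://{host}/api/v1/weights?{urlencode(params)}"


# Highlight the last 5 minutes and compare against the 20 minutes before them.
print(weights_url("localhost:19999", -300, 0, -1500, -300))
```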
Tiering, providing almost unlimited metrics for your nodes
Netdata is a high-fidelity monitoring solution. That fidelity comes at a cost: keeping all that data on your disks. To help reduce this cost, this release introduces the Tiering mechanism for the Agent's time-series database (dbengine).
Tiering is the mechanism of providing multiple tiers of data with different granularity on metrics by doing the following:
- Downsampling the data into lower resolution data.
- Keeping statistical information about the metrics to recreate the original* metrics.
Visit the Tiering in a nutshell section in our docs to understand the maximum potential of this feature. Also, don't hesitate to enable this feature to change the retention of your metrics.
Note: *Of course, the recreated metric may vary; you can't recreate the exact time series without taking other parameters into consideration.
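As a rough illustration of the two steps above (downsampling, plus keeping statistics about each group of points), here is a minimal Python sketch. It does not reflect dbengine's on-disk format: the `TierPoint` structure, its fields, and the `downsample` helper are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class TierPoint:
    """One lower-resolution point summarizing several original points."""
    count: int
    total: float
    minimum: float
    maximum: float

    @property
    def average(self):
        return self.total / self.count


def downsample(points, factor):
    """Collapse every `factor` consecutive points into one TierPoint."""
    tier = []
    for i in range(0, len(points), factor):
        chunk = points[i:i + factor]
        tier.append(TierPoint(len(chunk), sum(chunk), min(chunk), max(chunk)))
    return tier


# Six per-second samples become two lower-resolution points.
for p in downsample([1, 2, 3, 4, 5, 6], 3):
    print(p.average, p.minimum, p.maximum)
```

Keeping the count, sum, minimum, and maximum means a query can still answer "average", "min", or "max" over the downsampled data, even though the individual samples are gone.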
A Kubernetes Cluster can easily have hundreds (or even thousands) of pods running containers. Netdata is now able to provide you with an overview of the workloads and the nodes of your Cluster. Explore the full capabilities of the k8s_state module.
Anomaly Rate on every chart
In a previous release, we introduced unsupervised ML & Anomaly Detection in Netdata with Anomaly Advisor. With this next step, we’re bringing anomaly rates to every chart in Netdata Cloud. Anomaly information is no longer limited to the Anomalies tab and will be accessible to you from the Overview and Single Node view tabs as well. We hope this will make your troubleshooting journey easier, as you will have the anomaly rates for any metric available with a single click, whichever metric or chart you happen to be exploring at that instant.
If you are looking at a particular metric in the overview or single node dashboard and are wondering if the metric was truly anomalous or not, you can now confirm or disprove that feeling by clicking on the anomaly icon and expanding the anomaly rate view. Anomaly rates are calculated per second based on ML models that are trained every hour.
For more details please check our blog post and video walkthrough.
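One simple way to think about an anomaly rate is as the share of samples in a time window that were flagged as anomalous. The Python sketch below shows that reading; it assumes each collected sample carries a 0/1 anomaly flag, and the function name `anomaly_rate` is hypothetical rather than part of any Netdata API.

```python
def anomaly_rate(anomaly_bits):
    """Percentage of samples in the window flagged as anomalous (bit == 1)."""
    if not anomaly_bits:
        return 0.0
    return 100.0 * sum(anomaly_bits) / len(anomaly_bits)


# Two of four samples flagged in this window.
print(anomaly_rate([0, 0, 1, 1]))
```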
Centralized Admin Interface & Bulk deletion of offline nodes
We've listened and understood your pain around Space and War Room settings in Netdata Cloud. In response, we have simplified and organized these settings into a Centralized Administration Interface!
In a single place, you're now able to access and change attributes around:
- War Rooms
Along with this change, the deletion of offline nodes has been greatly improved. In the Space settings, open Nodes, filter for Offline nodes, then mass select and bulk delete them.
Agent and Cloud chart metadata syncing
In this release, we made a major improvement to our chart metadata syncing protocol. We moved from a very granular message exchange at the chart dimension level to a higher-level exchange at the context level.
This approach decreases the complexity and the points of failure in this flow, since we reduced the number of events being exchanged and the scenarios that need to be dealt with. We will continue to fix complex, hard-to-track existing bugs and any potential unknown ones.
It also benefits data transfer between Agents and Cloud, since we reduced the number of messages being transmitted.
To sum up these changes:
- The traffic between Netdata Cloud and Agents is reduced significantly.
- Netdata Cloud scales more smoothly with hundreds of nodes.
- Netdata Cloud is aware of charts and nodes metadata.
Composite chart enhancements
We have restructured composite charts into a more natural presentation. You can now read composite charts as if reading a simple sentence, and make better sense of how and what queries are being triggered.
In addition to this, we've added more control over time aggregations. You can now instruct the agent nodes on what type of aggregation you want to apply when multiple points are grouped into a single one.
The options available are: min, max, average, sum, incremental sum (delta), standard deviation, coefficient of variation, median, exponential weighted moving average and double exponential smoothing.
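To show what grouping multiple points into a single one means in practice, here is a small Python sketch covering a subset of the listed options. It is illustrative only and not the Agent's query engine: the `AGGREGATIONS` table, the `aggregate` helper, the `alpha` smoothing factor, and the reading of incremental sum as "last value minus first value" are assumptions for this example.

```python
from statistics import mean, median, pstdev


def ewma(values, alpha=0.5):
    """Exponentially weighted moving average; returns the final smoothed value."""
    smoothed = values[0]
    for v in values[1:]:
        smoothed = alpha * v + (1 - alpha) * smoothed
    return smoothed


# Each function reduces a list of points to a single value.
AGGREGATIONS = {
    "min": min,
    "max": max,
    "average": mean,
    "sum": sum,
    "median": median,
    "incremental sum": lambda vs: vs[-1] - vs[0],  # assumed: last minus first
    "standard deviation": pstdev,
    "ewma": ewma,
}


def aggregate(values, method):
    """Collapse `values` into one point using the named aggregation."""
    return AGGREGATIONS[method](values)


points = [1, 2, 3, 4]
for name in ("min", "max", "average", "median", "incremental sum"):
    print(name, aggregate(points, name))
```

The choice matters: for the same group of points, "max" preserves spikes that "average" would smooth away.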
We've also put some effort into improving our light and dark themes. The focus was on:
- optimizing space for the information that is crucial to you when you're exploring and/or troubleshooting your nodes.
- improving contrast ratios so that the components and data that are more relevant don't get lost among other noise.
Labels on every chart
Most of the time, you will group metrics by their dimension or their instance, but there are some benefits to other groupings. So, you can now group them by logical representations.
For instance, you can represent the traffic in your network interfaces by their interface type, virtual or physical.
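The interface-type example can be sketched in a few lines of Python. The data layout, the `interface_type` label key, the sample values, and the `group_by_label` helper are all hypothetical and exist only to illustrate grouping by a label instead of by dimension or instance.

```python
from collections import defaultdict


def group_by_label(instances, label):
    """Sum instance values per distinct value of the given label."""
    totals = defaultdict(float)
    for inst in instances:
        key = inst["labels"].get(label, "unknown")
        totals[key] += inst["value"]
    return dict(totals)


# Hypothetical traffic readings per network interface.
interfaces = [
    {"name": "eth0", "labels": {"interface_type": "physical"}, "value": 120.0},
    {"name": "wlan0", "labels": {"interface_type": "physical"}, "value": 50.0},
    {"name": "veth1", "labels": {"interface_type": "virtual"}, "value": 10.0},
    {"name": "docker0", "labels": {"interface_type": "virtual"}, "value": 20.0},
]

print(group_by_label(interfaces, "interface_type"))
```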
This is still a work in progress, but you can explore the newly added labels on the following areas/charts:
- Mountpoints in your system
- Network interfaces both wired and wireless
- MD arrays
- Power supply units
- Filesystem (like BTRFS)
We would like to thank the dedicated, talented contributors who make up this amazing community. The time and expertise that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a remarkable product.
- @didier13150 for fixing boolean value for ProtectControlGroups in the systemd unit file.
- @kklionz for fixing a base64_encode bug in Exporting Engine.
- @kralewitz for fixing the parsing of multiple values in nginx upstream_response_time in go.d/web_log.
- @mhkarimi1383 for adding an alternative way to get ansible plays to Ansible quickstart.
- @tnyeanderson for fixing netdata-updater.sh sha256sum on BSDs.
- @xkisu for fixing cgroup name detection for docker containers in containerd cgroup.
- @boxjan for adding Chrony collector.
⚙️ Enhancing our collectors to collect all the data you need.
- Add PgBouncer collector (go.d/pgbouncer) (#748, @ilyam8)
- Add WireGuard collector (go.d/wireguard) (#744, @ilyam8)
- Add PostgreSQL collector (go.d/postgres) (#718, @ilyam8)
- Add Chrony collector (go.d/chrony) (#678, @boxjan)
- Add Kubernetes State collector (go.d/k8s_state) (#673, @ilyam8)
⚙️ Enhancing our collectors to collect all the data you need.
- Add WireGuard description and icon to dashboard info (#13483, @ilyam8)
- Resolve nomad containers name (cgroups.plugin) (#13481, @ilyam8)
- Update postgres dashboard info (#13474, @ilyam8)
- Improve Chrony dashboard info (#13371, @ilyam8)
- Improve config file parsing error message (python.d) (#13363, @ilyam8)
- Rename the chart of real memory usage in FreeBSD (freebsd.plugin) (#13271, @vlvkobal)
- Add fstype label to disk charts (diskspace.plugin) (#13245, @vlvkobal)
- Add support for loading modules from user plugin directories (python.d) (#13214, @ilyam8)
- Add user plugin dirs to environment variables (#13203, @vlvkobal)
- Add second data collection job that tries to read from '/var/lib/smartmontools/' (python.d/smartd) (#13188, @ilyam8)
- Add type label for network interfaces (proc.plugin) (#13187, @vlvkobal)
- Add k8s_state dashboard_info (#13181, @ilyam8)
- Add dimension per physical link state to the "Interface Physical Link State" chart (proc.plugin) (#13176, @ilyam8)
- Add dimension per operational state to the "Interface Operational State" chart (proc.plugin) (#13167, @ilyam8)
- Add dimension per duplex state to the "Interface Duplex State" chart (proc.plugin) (#13165, @ilyam8)
- Add cargo/rustc/bazel/buck to apps_groups.conf (apps.plugin) (#13143, @vkalintiris)
- Add Memory Available chart to FreeBSD (freebsd.plugin) (#13140, @MrZammler)
- Add a separate thread for slow mountpoints in the diskspace plugin (diskspace.plugin) (#13067, @vlvkobal)
- Add simple dimension algorithm guess logic when algorithm is not set (go.d/snmp) (#737, @ilyam8)
- Add common stub_status locations (go.d/nginx) (#702, @cpipilas)
🐞 Improving our collectors one bug fix at a time.
- Fix cgroup name detection for docker containers in containerd cgroup (cgroups.plugin) (#13470, @xkisu)
- Fix not handling log rotation (python.d/smartd) (#13460, @ilyam8)
- Fix kubepods patterns to filter pods when using Kind cluster (cgroups.plugin) (#13324, @ilyam8)
- Fix 'zmstat*' pattern to exclude zoneminder scripts (apps.plugin) (#13314, @ilyam8)
- Fix kubepods name resolution in a kind cluster (cgroups.plugin) (#13302, @ilyam8)
- Fix extensive error logging (cgroups.plugin) (#13274, @vlvkobal)
- Fix qemu VMs and LXC containers name resolution (cgroups.plugin) (#13220, @ilyam8)
- Fix duplicate mountinfo (proc.plugin) (#13215, @ktsaou)
- Fix removing netdev chart labels (cgroups.plugin) (#13200, @vlvkobal)
- Fix wired/cached/avail memory calculation on FreeBSD with ZFS (freebsd.plugin) (#13183, @ilyam8)
- Fix import collection for py3.10+ (python.d) (#13136, @ilyam8)
- Fix not setting connection timeout for pymongo4+ (python.d/mongodb) (#13135, @ilyam8)
- Fix not handling slow setting spec.NodeName for Pods (go.d/k8s_state) (#717, @ilyam8)
- Fix empty charts when ServerMPM is prefork (#715, @ilyam8)
- Fix parsing multiple values in nginx upstream_response_time (go.d/web_log) (#711, @kralewitz)
- Fix collecting metrics for Nodes with dots in name (go.d/k8s_state) (#710, @ilyam8)
- Fix adding dimensions to User CPU Time chart at runtime (go.d/mysql) (#689, @ilyam8)
📄 Keeping our documentation healthy together with our awesome community.
- Add a note about network interface monitoring when running in a Docker container (#13458, @ilyam8)
- Fix Anomaly Detection guide, so we can reference its subsections (#13455, @tkatsoulas)
- Fix a typo in PostgreSQL section header (#13440, @shyamvalsan)
- Add Discord, YouTube, LinkedIn links to README (#13419, @andrewm4894)
- Add ML bullet point to features section on README (#13418, @andrewm4894)
- Fix docs metadata fields (#13406, @tkatsoulas)
- Clarify python.d haproxy module readme (#13388, @ilyam8)
- Add missing openSUSE 15.4 to platform support list. (#13373, @Ferroin)
- Add another way to get ansible plays to Ansible quickstart (#13349, @mhkarimi1383)
- Add GitHub stars badge to readme (#13338, @andrewm4894)
- Explain new tiering mechanism in metric storage docs (#13327, @tkatsoulas)
- Add link to docker config section (#13323, @cakrit)
- Add a guide for troubleshooting Agent with Cloud connection for new nodes (#13322, @Ancairon)
- Update External Plugins API doc (#13273, @thiagoftsm)
- Update REST API documentation. (#13269, @Ferroin)
- Add document explaining how to proxy Netdata via H2O (#13266, @Ferroin)
- Improve anomaly detection guide (#13238, @andrewm4894)
- Improve configuration example in ML readme (#13182, @andrewm4894)
- Docs housekeeping (#13179, @tkatsoulas)
- Add ML alerts examples (#13173, @andrewm4894)
- Improve "if collector not there" section in Collectors readme (#13152, @cakrit)
- Fix indentation in StatsD readme (#13096, @ilyam8)
- Add missing commands to daemon readme (#13080, @tkatsoulas)
Packaging / Installation
📦 "Handle with care" - Just like handling physical packages, we put in a lot of care and effort to publish beautiful software packages.
- Update go.d.plugin version to v0.34.0 (#13484, @ilyam8)
- Fix netdata-updater.sh sha256sum on BSDs (#13391, @tnyeanderson)
- Add Oracle Linux 9 to officially supported platforms (#13367, @Ferroin)
- Vendor Judy (#13362, @underhood)
- Add additional Docker image build with debug info included (#13359, @Ferroin)
- Fix not respecting CFLAGS arg when building Docker image (#13340, @ilyam8)
- Remove python-mysql from install-required-packages.sh (#13288, @ilyam8)
- Remove obsolete --use-system-lws option from netdata-installer.sh help (#13272, @Dim-P)
- Fix issues with DEB postinstall script (#13252, @Ferroin)
- Don’t pull in GCC for build if Clang is already present. (#13244, @Ferroin)
- Upload packages to new self-hosted repository infrastructure (#13240, @Ferroin)
- Bump repoconfig package version used in kickstart.sh (#13235, @Ferroin)
- Properly handle interactivity in the updater code (#13209, @Ferroin)
- Don’t use realpath to find kickstart source path (#13208, @Ferroin)
- Ensure tmpdir is set for every function that uses it (#13206, @Ferroin)
- Add netdata user to secondary group in RPM package (#13197, @iigorkarpov)
- Remove a call to 'cleanup_old_netdata_updater()' because it no longer exists (#13189, @ilyam8)
- Don’t manipulate positional parameters in DEB postinst script (#13169, @Ferroin)
- Add CAP_SYS_RAWIO to Netdata's systemd unit CapabilityBoundingSet (#13154, @ilyam8)
- Add netdata user to secondary group in DEB package (#13109, @iigorkarpov)
- Fix updating when using --force-update and new version of the updater script is available (#13104, @ilyam8)
- Remove unnecessary ‘cleanup’ code (#13103, @Ferroin)
- Remove official support for Debian 9. (#13065, @Ferroin)
- Add openSUSE Leap 15.4 to CI and package builds. (#12270, @Ferroin)
- Fix boolean value for ProtectControlGroups in the systemd unit file (#11281, @didier13150)
Other Notable Changes
⚙️ Greasing the gears to smoothen your experience with Netdata.
- Enable rrdcontexts by default (#13471, @stelfrag)
- Add rrdcontext support for hidden charts (#13466, @ktsaou)
- Load host labels for archived hosts (#13464, @stelfrag)
- Add /api/v1/weights endpoint (#13449, @ktsaou)
- Add stats about currently collected metrics and disk space to tiering endpoint (#13445, @ktsaou)
- Show last 15 alerts in notification (#13434, @MrZammler)
- Add tiering statistics API endpoint (#13420, @ktsaou)
- Send chart context with alert events to the cloud (#13409, @MrZammler)
- Send node info message sooner (#13348, @MrZammler)
- Use new MQTT as default (#13287, @underhood)
- Better ACLK debug communication log (#13281, @underhood)
- Add Multi-Tier database backend for long term metrics storage (#13263, @stelfrag)
- Add natural and virtual points support to Query Engine (#13248, @ktsaou)
- Delay health until obsoletions check is complete (#13239, @MrZammler)
- Enable ML by default (#13158, @andrewm4894)
- Add multi-granularity support to Query Engine and MC improvements (#13155, @ktsaou)
- Add an option to use malloc for page cache instead of mmap (#13142, @stelfrag)
- Significantly improve metrics correlations (73 times faster) (#13107, @ktsaou)
- Add SSL received/send bytes statistics to ACLK (#13091, @underhood)
🐞 Increasing Netdata's reliability one bug fix at a time.
- Fix crash on Agent startup if data rotation needs to be done (#13473, @stelfrag)
- Fix agent crash when archived host has not been registered to the cloud (#13437, @stelfrag)
- Fix gap filling on dbengine gaps (#13417, @MrZammler)
- Fix 32bit calculation on array allocator (#13343, @ktsaou)
- Fix crash on start on slow disks because ml is initialized before dbengine starts (#13342, @ktsaou)
- Fix crash when the host_labels health line contains the name/value of a label that does not exist on the host (#13305, @MrZammler)
- Fix incorrect dimension names in Redis alarms (#13296, @ilyam8)
- Fix Query Engine alignment (#13282, @ktsaou)
- Fix vbi parser in mqtt5 implementation (#13277, @underhood)
- Fix alignment in charts endpoint (#13275, @thiagoftsm)
- Fix RAM calculation on macOS in system-info (#13260, @ilyam8)
- Fix data query on stale chart (#13159, @stelfrag)
- Fix crashes due to misaligned allocations (#13137, @ktsaou)
- Fix buffer overflow detected by the compiler (#13120, @ktsaou)
- Fix 100% CPU when using SSL and a child disconnect from a parent (#13112, @thiagoftsm)
- Fix virtualization detection on FreeBSD (#13087, @ilyam8)
🏋️ Changes to keep our code base in good shape.
- Handle cases where entries were stored as text (with strftime("%s")) (#13472, @stelfrag)
- Get last_entry_t only when st changes (#13448, @MrZammler)
- Store host label information in the metadata database (#13441, @stelfrag)
- Fix tests so that the actual metadata database is not accessed (#13439, @stelfrag)
- Delete aclk_alert table on start streaming from seq 1 batch 1 (#13438, @MrZammler)
- Query queue only for queries (#13431, @underhood)
- Add missing comma (handle coverity warning CID 379360) (#13413, @stelfrag)
- Remove python.d/web_log alarms (#13404, @ilyam8)
- Store host system information in the database (#13402, @stelfrag)
- Fix coverity issue 379240 (Unchecked return value) (#13401, @stelfrag)
- Fix bitmap unit tests (#13374, @stelfrag)
- Remove python.d collectors announced in v1.35.0 deprecation notice (#13370, @ilyam8)
- Address Coverity issues (#13364, @stelfrag)
- Omit first point if not needed in Query Engine (#13345, @ktsaou)
- Fix coverity 379241 (#13336, @MrZammler)
- Add Rrdcontext in memory indexing (#13335, @ktsaou)
- Detect stored metric size by page type (#13334, @stelfrag)
- Silence compile warnings on external source (#13332, @MrZammler)
- Add UpdateNodeCollectors message (#13330, @MrZammler)
- Fix Cid 379238 379238 (#13328, @stelfrag)
- Fix two helgrind reports (#13325, @vkalintiris)
- Add array allocator for dbengine page descriptors (#13312, @ktsaou)
- Protect shared variables with log lock. (#13306, @vkalintiris)
- Null terminate string if file read was not successful (#13299, @stelfrag)
- Remove deprecated modules from python.d.conf (#13264, @ilyam8)
- Remove warnings while compiling ML on FreeBSD (#13255, @thiagoftsm)
- Remove strftime from statements and use unixepoch instead (#13250, @stelfrag)
- Updates the sqlite version in the agent (#13233, @stelfrag)
- Migrate data when machine GUID changes (#13232, @stelfrag)
- Add more sqlite unittests (#13227, @stelfrag)
- Add Netdata doubles (#13217, @ktsaou)
- Print INTERNAL BUG messages only when NETDATA_INTERNAL_CHECKS is enabled (#13207, @MrZammler)
- Add hostname in the worker structure to avoid constant lookups (#13199, @stelfrag)
- Allow for an easy way to do metadata migrations (#13196, @stelfrag)
- Add dictionaries with reference counters and full deletion support during traversal (#13195, @ktsaou)
- Add configuration for dbengine page fetch timeout and retry count (#13194, @stelfrag)
- Clean sqlite prepared statements on thread shutdown (#13193, @stelfrag)
- Set default for minimum num samples to train to 900 (#13174, @andrewm4894)
- Remove warnings when openssl 3 is used. (#13170, @thiagoftsm)
- Fix coverity issues (#13168, @stelfrag)
- Allow traversing null-value dictionaries (#13162, @ktsaou)
- Use memset to mark the empty words in the quoted_strings_splitter function (#13161, @stelfrag)
- Fix labels unit test (#13156, @stelfrag)
- Use ks2 as MC default (#13131, @andrewm4894)
- Allow label names to have slashes (#13125, @ktsaou)
- Fix coverity 379136 379135 379134 379133 (#13123, @ktsaou)
- Removes Legacy JSON Cloud Protocol Support In Agent (#13111, @underhood)
- Add labels with dictionary (#13070, @ktsaou)
- Fix coverity 378587 (#13024, @MrZammler)
The following items will be removed in our next minor release (v1.37.0):
Patch releases (if any) will not be affected.
| Component | Type | Will be replaced by |
|---|---|---|
| python.d/postgres | collector | go.d/postgres |
All the deprecated components will be moved to the netdata/community repository.
Deprecated in this release
In accordance with our previous deprecation notice, the following items have been removed in this release:
| Component | Type | Replaced by |
|---|---|---|
| python.d/chrony | collector | go.d/chrony |
| python.d/ovpn_status_log | collector | go.d/openvpn_status_log |
Netdata Release Meetup
Join the Netdata team on the 11th of August for the Netdata Agent Release Meetup, which will be held on the Netdata Discord.
Together we’ll cover:
- Release Highlights
- Q&A with the community
We look forward to meeting you.
As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us through one of the following channels:
- Netdata Learn: Find documentation, guides, and reference material for monitoring and troubleshooting your systems with Netdata.
- GitHub Issues: Make use of the Netdata repository to report bugs or open a new feature request.
- GitHub Discussions: Join the conversation around the Netdata development process and be a part of it.
- Community Forums: Visit the Community Forums and contribute to the collaborative knowledge base.
- Discord: Jump into the Netdata Discord and hang out with like-minded sysadmins, DevOps, SREs and other troubleshooters. More than 1100 engineers are already using it!