There are a multitude of changes with this release; I'll try to describe them all at the level of detail they deserve.
Fusion of hashcat and oclHashcat (CPU + GPU)
This was a big milestone which I've planned for a very long time. If you know me, or if you're idling on our IRC channel (#hashcat on Freenode), you probably knew that this was on my agenda for many years. To those of you who don't know what I'm talking about:
There are two different versions of hashcat.
- One that was utilizing your CPU (hashcat)
- One that was utilizing your GPU (oclHashcat)
But that's changed...
This fusion became possible because of the following preparations:
- Going Open Source, which enabled the use of the JIT compiler.
- Provide the OpenCL kernels as sources instead of binaries for every hardware and algorithm.
- Full OpenCL device type integration (I'll explain later in detail).
- A complete rewrite of the SIMD handling from scratch.
The last of these was important for making use of CPU-specific extensions (like XOP, AVX2, etc.) from within OpenCL. It also had a positive side effect on GPUs, because it reduced the number of registers required in the kernel to 1/Nth of what was previously required, where N is the SIMD width at which a hash-mode runs.
Here are a few of the advantages of having just one unified tool:
- Supported hashes are now in sync. For example, oclHashcat had support to crack TrueCrypt container while hashcat did not.
- Supported options are now in sync. For example, oclHashcat had support for
--stdin while hashcat did not.
- It's no longer required to know all of the specific limits both programs have. For example, the maximum supported password- and salt-length.
- Tutorials and Videos you find in the wild will be less confusing. Some explained hashcat while others explained oclHashcat. This was often very frustrating for new users who may have been following along with a tutorial for the wrong application.
- Developers no longer need to back-port one hash-mode from hashcat to oclHashcat or vice versa. This means no more waiting for algorithms to appear in one version or another; you will be able to use them immediately on CPU, GPU, or both.
- Package maintainers can also integrate hashcat into a distribution package much more easily.
- A single tool means fewer dependencies. This could mean that you will see more distribution-specific packages in the near future.
- Last but not least, it's simply easier and more compact to say, and everyone knows what you're talking about when you say "hashcat".
Oh... speaking about hashcat CPU, to help distinguish them in the future, I'll rename it to hashcat-legacy.
Newly added hash-modes
- Android FDE (Samsung DEK)
- Kerberos 5 TGS-REP etype 23
- AxCrypt in memory SHA1
- Keepass 1 (AES/Twofish) and Keepass 2 (AES)
- PeopleSoft PS_TOKEN
- Windows 8+ phone PIN/Password
Some special notes about optimizations: Behind the WinZip KDF optimization
Support to utilize multiple different OpenCL platforms in parallel
Here's a list of OpenCL runtimes that are supported and have been tested by either myself, or some of the hashcat beta testers:
- AMD OpenCL runtime
- Apple OpenCL runtime
- NVidia OpenCL runtime (replaces CUDA)
- Mesa (Gallium) OpenCL runtime
- Pocl OpenCL runtime
- Intel (CPU, GPU and Accelerator) OpenCL runtime
I tried to stay as close as possible to the OpenCL specifications. That means that if you have a device which comes with an OpenCL runtime, it should work. That could also be, for example, an OpenCL runtime that supports utilizing an FPGA. Some of the FPGA vendors that provide such an OpenCL runtime have just not been available to me for testing.
Another addition to the support of mixed OpenCL platforms is the ability to run them in parallel and within the same hashcat session. Yes, that actually means you can put both an AMD and an NVidia GPU into your system and make use of both. There still may be some work needed to properly utilize multiple sets of drivers. More information may be provided on the wiki later.
In case you do not want a specific OpenCL runtime to be used, you can select specific platforms to be used with the new
--opencl-device-platforms command line option.
Support to utilize OpenCL device types other than GPU
When it comes to compatibility, oclHashcat was limited to just two different vendors: AMD and NVidia. They provide the fastest GPUs by far, and it was therefore important to support them, but there are many other options available that aren't even building a GPU.
As a result, hashcat will support the following device types:
- Anything else which comes with an OpenCL runtime
For example, Intel CPUs will now instantly pop up as an available OpenCL device after you've installed the Intel OpenCL runtime.
Support to utilize multiple different OpenCL device types in parallel
When I redesigned the core that handles workload distribution across multiple different GPUs in the same system (which oclHashcat v2.01 already supported), I thought it would be nice to support not just GPUs of different kinds and speeds but also different device types. What I'm talking about is running a GPU and a CPU (and even an FPGA) all in parallel and within the same hashcat session.
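To make the idea of sharing one job between devices of different speeds concrete, here is a minimal sketch that splits a keyspace in proportion to each device's measured rate. It is an illustration only and not a description of hashcat's actual dispatcher; the function name `split_keyspace`, the device names, and the speed figures are all hypothetical.

```python
def split_keyspace(total, speeds):
    """Split `total` candidates across devices in proportion to their speeds.

    `speeds` maps a (hypothetical) device name to an integer rate in
    candidates per second. Returns device name -> (start, end) half-open range.
    """
    total_speed = sum(speeds.values())
    names = list(speeds)
    ranges = {}
    start = 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = total  # the last device takes the remainder so nothing is lost
        else:
            end = start + total * speeds[name] // total_speed
        ranges[name] = (start, end)
        start = end
    return ranges


# Hypothetical mix of two GPUs and one CPU running in the same session.
shares = split_keyspace(1_000_000, {"gpu0": 8000, "gpu1": 1500, "cpu": 500})
for device, (lo, hi) in shares.items():
    print(f"{device}: candidates {lo}..{hi - 1}")
```

A fixed split like this is the simplest possible scheme; a real scheduler would more likely hand out smaller chunks on demand so that a device that slows down does not hold up the whole session.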
Beware! This is not always a clever thing to do. For example, NVidia's OpenCL runtime still has a known bug, now five years old, which creates 100% CPU load on a single core per NVidia GPU (NVidia's OpenCL busy-wait). If you've been using oclHashcat for quite a while, you may remember that the same bug affected AMD years ago.
Basically, what NVidia is missing here is that they use spinning instead of yielding. Their goal was to increase the performance but in our case there's actually no gain from having a CPU burning loop. The hashcat kernels run for ~100ms and that's quite a long time for an OpenCL kernel. At such a scale, spinning creates only disadvantages and there's no way to turn it off (Only CUDA supports that).
But why is this a problem? If the OpenCL runtime spins on a core to find out whether a GPU kernel has finished, it creates 100% CPU load. Now imagine you have another OpenCL device, e.g. your CPU, that also creates 100% CPU load. That causes problems, even though it's legitimate for the CPU to do that here. The GPU's CPU-burning thread will slow down by 50%, and you end up with a slower GPU rate just by enabling your CPU too (
--opencl-device-type 1). For AMD GPUs that's not the case (they fixed that bug years ago).
To help mitigate this issue, I've implemented the following behavior:
- Hashcat will try to work around the problem by sleeping for a precalculated time after the kernel has been queued and flushed. This brings the CPU load down to less than 10% with almost no impact on cracking performance.
- By default, if hashcat detects both CPU and GPU OpenCL devices in your system, the CPU will be disabled. If you really want to run them both in parallel, you can still set the option
1,2 to utilize both device types, CPU and GPU.
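The sleep-based workaround can be sketched roughly as follows. This is a simplified illustration and not hashcat's code: `wait_for_kernel`, the 90% sleep fraction, and the 1 ms poll interval are assumptions, and the kernel is simulated with a fake clock so the example runs without any OpenCL device.

```python
import time


def wait_for_kernel(is_finished, expected_runtime_s, sleep_fraction=0.9,
                    poll_interval_s=0.001, sleep=time.sleep):
    """Wait for a queued kernel while keeping the host thread mostly idle.

    Returns the number of completion checks, a rough proxy for CPU work.
    """
    # Sleep through most of the expected runtime instead of spinning on a core.
    sleep(expected_runtime_s * sleep_fraction)
    polls = 1
    while not is_finished():
        sleep(poll_interval_s)  # yield the core between checks
        polls += 1
    return polls


# Simulation: a fake clock stands in for a kernel that takes ~100 ms.
clock = [0.0]


def fake_sleep(seconds):
    clock[0] += seconds


def kernel_done():
    return clock[0] >= 0.1


polls = wait_for_kernel(kernel_done, expected_runtime_s=0.1, sleep=fake_sleep)
print(f"kernel finished after {polls} checks")
```

A pure busy-wait would perform its checks back to back for the whole 100 ms, which is what pins a core at 100%. With the sleep in front, only a handful of checks remain.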
Here's some related information:
Added makefile native compilation targets; Adds GPU support for OSX and *BSD
To make it even easier for everyone to compile hashcat from source (which hopefully also increases the number of commits from the community), I've decided to add a target for a native build. That should help with compiling hashcat on Linux, OSX, *BSD and some other exotic operating systems.
But it turned out that I could not simply add a native compilation target to the Makefile without doing some preparations.
- For example, on Linux the first step was to achieve Linux FHS compatibility.
- Another preparation would be having a hashcat binary (without a .bin extension) somewhere located in
- Ideally a Makefile which provides a
DESTDIR variables to modify that and finally to have our files that need to be accessible by all users somewhere at
/usr/share/hashcat or so.
But when I started to implement that it turned out, again, that this is not fully ideal. There was still the problem of where to store pot files, dict files, etc. The logical answer was to add support for a home directory-specific folder. That folder is named
$HOME/.hashcat/ and it will be automatically created by hashcat. You can also remove it whenever you want (hashcat will continue to work and will recreate it as needed.)
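The create-on-demand behaviour of the profile folder can be pictured with a few lines of Python. This is only a sketch of the idea, not hashcat's implementation; the helper name `ensure_profile_dir` is hypothetical, and a temporary directory stands in for `$HOME` so the example does not touch a real home directory.

```python
import tempfile
from pathlib import Path


def ensure_profile_dir(home=None):
    """Return <home>/.hashcat, creating it if it does not exist yet."""
    base = Path(home) if home is not None else Path.home()
    profile = base / ".hashcat"
    profile.mkdir(parents=True, exist_ok=True)  # no error if already present
    return profile


with tempfile.TemporaryDirectory() as fake_home:
    first = ensure_profile_dir(fake_home)
    first.rmdir()                            # the user removes the folder...
    second = ensure_profile_dir(fake_home)   # ...and it is recreated as needed
    recreated = second.is_dir()
    print("profile folder recreated:", recreated)
```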
In summary, the following changes were mandatory:
- Added a native Makefile target
- Added an install and uninstall Makefile target
- Added true Linux FHS compatibility
- Added separate Install-, Profile- and Session-folder
These changes are only active once the install target (
make install) is executed; those who choose not to install will keep using the source directory, as in the past.
Here's the full discussion:
Here's another piece of great news: There are no longer dependencies on AMD-APP-SDK, AMD-ADL, NV-CUDA-SDK, NV-Drivers, NV-NVML or NV-NVAPI.
Our first OSS version of oclHashcat just had too many dependencies, and they were all required to compile oclHashcat. We tried to provide a script to handle these for you (deps.sh), but you still had to download the archives yourself. That wasn't very comfortable and surely held people back from compiling oclHashcat, leaving them to use the binary version instead.
Having dependencies in general is not always bad, but it creates some overhead for both developers and package maintainers. Regular users usually do not notice this. Having no dependencies usually results in fewer features, so how did we manage to get rid of the dependencies while keeping the features they provided?
The answer is simple. For both Linux and Windows we simply used their dynamic library loading capability instead of linking the libraries at compile time. So don't get me wrong here, we still use those libraries, we just load them at runtime.
This provides a lot of advantages for both users and developers, such as:
- The library
libOpenCL.so on Linux was loaded as-is. This was a problem when a user had a bad OpenCL installation that created
libOpenCL.so.1. Unless the user fixed the filename or created a link the binary would be unable to locate the library.
- The Windows binary becomes smaller since it does not need to ship the code, it reuses the code from your installed library.
- For developers, there is no longer a need to have a 32 bit and a 64 bit library object. That was always a problem with NVML provided by the Nvidia drivers; we had to manually symlink them to get them working.
- The installed library does not need to be of the same version as the one used by the person who compiled the hashcat binary. For example, if you remember this error you know what I'm talking about:
./oclHashcat64.bin: /usr/lib/x86_64-linux-gnu/libOpenCL.so.1: version 'OPENCL_2.0' not found (required by ./oclHashcat64.bin)
- Package maintainers should now have a really easy job. No more (compile-time) dependencies means way less work.
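As a rough illustration of runtime loading, the sketch below tries a list of library names in order and uses the first one that loads. It uses Python's `ctypes` rather than the C `dlopen`/`LoadLibrary` calls a native program would use, and both the helper name `load_first_available` and the exact fallback order are assumptions made for this example, not hashcat's code.

```python
import ctypes


def load_first_available(candidates):
    """Try each shared-library name in order; return the first that loads.

    Returns None if none of the names can be loaded.
    """
    for name in candidates:
        try:
            return ctypes.CDLL(name)  # resolved at runtime, not at link time
        except OSError:
            continue  # not installed under this name, try the next one
    return None


# Accept both the unversioned and the versioned file name.
opencl = load_first_available(["libOpenCL.so", "libOpenCL.so.1"])
print("OpenCL runtime found" if opencl else "no OpenCL runtime found")
```

Because nothing is linked at build time, a missing library becomes a condition the program can detect and report, instead of a loader error before the program even starts.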
Added auto-tuning engine and user-configurable tuning database
The auto-tuning engine is exactly what it says it is: it automatically tunes the
-u parameters (aka workload) to a value which gives you the best performance to reach a specific kernel runtime.
To understand what that means you need to understand that the kernel runtime influences the desktop response time. If you don't care about desktop lags (because you have a dedicated cracking machine) you simply set
-w 3 and everything is fine. In that case, hashcat will optimize the kernel runtime to a very efficient one, efficient in terms of power consumption versus performance. There is indeed a way for us to control how much power your GPU consumes while cracking. It's like a car: if you drive it at 220 km/h, it consumes twice as much gas as at 200 km/h. Well, not exactly, but you get the idea.
Having said that, the best way to control your workload is all about
-w now. There's still
-u, but this is mostly for development and debugging use. There's a total of 4 different workload settings, here's a snippet of
| # | Performance | Runtime | Power Consumption | Desktop Impact |
| 1 | Low | 2 ms | Low | Minimal |
| 2 | Default | 12 ms | Economic | Noticeable |
| 3 | High | 96 ms | High | Unresponsive |
| 4 | Nightmare | 480 ms | Insane | Headless |
The -w setting defaults to "2". Number "1" could also be interesting, in case you're watching an HD video or playing a game.
OK, so there's an auto-tuning engine that controls
-u, so what is that tuning database used for? If, for whatever reason, you do not like the setting the auto-tuning engine has calculated for you, you can force a specific
-u setting to be used. This also decreases the startup time a bit, because hashcat does not need to test runtimes with setting N and U.
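One way to picture tuning a workload toward a target kernel runtime is a simple doubling search. The target runtimes below come from the -w table above; everything else (the `autotune` function, the doubling strategy, the cap of 1024, and the linear fake device) is a hypothetical sketch and should not be read as hashcat's actual tuning algorithm.

```python
# Target kernel runtimes in milliseconds per workload profile (-w 1..4).
TARGET_RUNTIME_MS = {1: 2, 2: 12, 3: 96, 4: 480}


def autotune(measure_runtime_ms, profile, max_workload=1024):
    """Double the workload while the measured runtime stays within the target."""
    target = TARGET_RUNTIME_MS[profile]
    workload = 1
    while (workload * 2 <= max_workload
           and measure_runtime_ms(workload * 2) <= target):
        workload *= 2
    return workload


def fake_device(workload):
    """Hypothetical device: runtime grows linearly, 0.5 ms per workload unit."""
    return 0.5 * workload


tuned = {profile: autotune(fake_device, profile) for profile in TARGET_RUNTIME_MS}
print(tuned)
```

Each call to `measure_runtime_ms` stands for a test run of the kernel, which is why pinning a fixed value in a tuning database can shorten startup: the search is skipped entirely.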
But there's another setting that can be controlled from within the database: the vector width used within the OpenCL kernel. Note, however, that not all kernels support a vector width greater than 1. The vector width can also be controlled with the new command line parameter
At this point I don't want to go too deep into the details of the new auto-tuning engine, especially the database (hashcat.hctune). You need a lot more information to make your own database.
Therefore, please read this dedicated thread: The Autotune Engine
Extended Hardware-Management support
With the increased interest in power consumption per GPU, vendors started to add complicated clock speed changes from inside the driver and the GPU BIOS. The problem with that is, some of the settings are related to the workload, some to the power consumption, and some to temperature. This can increase the complexity of troubleshooting hashcat issues (for example, if you are trying to determine why cracking performance has rather suddenly and dramatically dropped.) To prevent users sending in invalid "bug" reports related to performance, I decided to add the clock and memory rate of the current GPU to the status display. The user will notice the clocks jumping around as the speeds jump around and hopefully realize that there's something wrong with their setup.
Most of the time it's a cooling issue. In the past oclHashcat already showed the temperature in the status display, but the problem is that current drivers may try to hold a target temperature by either increasing the fan speed or by decreasing the clock rate. The latter case will lead the user to the false assumption their setup is well cooled; the speed dropped over time but since the temperature was not going up, they did not make the link that the clocks have been decreased.
Switching from NVAPI to NVML will be a very important change for setups using NVidia GPUs and Windows. NVidia is actually distributing a 64 bit .dll for NVML with their latest driver version, and hashcat will find the .dll by checking the Windows registry. If it does not find it, you can also simply copy the nvml.dll into the hashcat installation folder (though that should not be necessary). There's another reason why we've switched to NVML. AMD users already had a workaround to stop the GPU BIOS from trying to optimize power consumption. They simply switched on the flag
--powertune-enable which sets the maximum power the GPU can consume to 120%, the same way as you can do it by using e.g. MSI Afterburner. With hashcat, and because we're using NVML now, this option is also available to NVidia users.
There is still one exception where NVAPI is used, namely the NVAPI calls in
ext_nvapi.c: hashcat needs this NVAPI dependency to recognize core clock throttling when temperatures exceed the threshold. This is a configurable setting in Windows (for example, it may be modified with Afterburner).
Added the option to quit at next restore checkpoint
One important user interface change that you might immediately recognize is the new checkpoint-stop feature. This new feature is visible at the status prompt, which now has a sixth option labeled
[c]heckpoint (in addition to the previous: [s]tatus, [p]ause, [r]esume, [b]ypass and [q]uit).
The goal of this new feature is to tell hashcat that it should delay stopping until it reaches the next restore point. Hitting the "q" key on your keyboard and "quitting" is not always the best choice; doing so will force hashcat to stop immediately, wherever the workload is. Since the restore option (
--restore) works on batched key space segments, this could lead to re-calculating work you have already done or even missing candidates altogether when trying to restore your session.
Stopping at checkpoints will make sure a particular workload segment is completed and a checkpoint is reached before terminating. This means no duplicate work or lost candidates when restoring sessions. We could say this new feature is an intelligent version of quitting hashcat.
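The difference between quitting immediately and stopping at a checkpoint can be shown with a toy model of batched key space segments. This is a sketch under simplifying assumptions (one restore point per segment, one candidate per step); `run_session` and its parameters are hypothetical and do not mirror hashcat's restore file format.

```python
def run_session(total, segment_size, quit_at=None, checkpoint_stop_at=None):
    """Process candidates [0, total) in segments.

    quit_at: stop immediately when this candidate is reached (like "q").
    checkpoint_stop_at: request a stop at this candidate, but finish the
    current segment first (like "c").
    Returns (restore_point, work_done).
    """
    restore_point = 0
    work_done = 0
    stop_requested = False
    processed = 0
    while processed < total:
        seg_end = min(processed + segment_size, total)
        for candidate in range(processed, seg_end):
            if quit_at is not None and candidate == quit_at:
                return restore_point, work_done  # partial segment is not saved
            if checkpoint_stop_at is not None and candidate == checkpoint_stop_at:
                stop_requested = True
            work_done += 1
        processed = seg_end
        restore_point = processed  # a restore point is written per segment
        if stop_requested:
            break
    return restore_point, work_done


quit_restore, quit_work = run_session(1000, 100, quit_at=250)
cp_restore, cp_work = run_session(1000, 100, checkpoint_stop_at=250)
print("quit:       redo", quit_work - quit_restore, "candidates on restore")
print("checkpoint: redo", cp_work - cp_restore, "candidates on restore")
```

In this model an immediate quit throws away the 50 candidates already tested in the unfinished segment, while the checkpoint stop runs 50 candidates longer and loses nothing.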
You will notice that the "Status" line in the status display will change to
Running (stop at checkpoint) whenever you enable this new feature.
However, if you have hit stop by mistake, or first decided to stop at the next checkpoint but then changed your mind, you can cancel the checkpoint stop just by hitting the
c key on your keyboard again. This will change from
Running (stop at checkpoint) back to
Running to let you know the checkpoint stop has been aborted.
Please note that quitting hashcat with the checkpoint-stop prompt option might take a little bit longer compared to stopping it with the "q" key. The total time depends on many factors, including the selected workload profile (
-w), the type of hashes you run (
-m), the total number of salts, etc.
In addition to all the improvements and newly added features, I'm always keen to optimize the performance.
The changes in performance from oclHashcat v2.01 to hashcat v3.00 largely depend on the combination of hash-mode and GPU. Here's a Spreadsheet that shows the changes in a more easy-to-read format, separated by hash-mode and GPU:
Hashcat v2.01 -> v3.00 performance comparison
Note that with older NVidia GPUs, and by old I mean before the Maxwell chipsets, there is a drop in performance. That is simply because NVidia's runtime isn't/wasn't optimized for OpenCL. They were made at a time when NVidia focused purely on CUDA, and it seems they are not putting any effort into updates for older cards. If you buy an NVidia GPU next time, just make sure it's of Shader Model 5.0 or higher.
Also note that the benchmarks for hashcat v3.00 were created using the option
--machine-readable which now can be used in combination with
--benchmark. This makes comparisons of the performance to older versions much easier. Also the time it takes to complete a full benchmark was reduced significantly. While it was around 45 minutes on hashcat v2.01, it's now just 20 minutes with hashcat v3.00 and that's including the new hash-modes, which were not available in v2.01.
I did not compare the CPU performance of hashcat v2.01 to hashcat v3.00, but you can be sure it is either faster or at least even. Just one example: NTLM performance on my i7-6700 CPU increased from 95.64 MH/s to 1046.1 MH/s, which is, by the way, a new world record for cracking NTLM on CPU.
... and there's still more, ... really!
If you want to know about all the changes please take a look at the redesigned
docs/changes.txt file. It includes all the fixed bugs and other changes, mostly of interest to developers, package maintainers and hashcat professionals.
Here's a small preview:
- Added support for --gpu-temp-retain for NVidia GPU, both Linux and Windows
- Added option --stdout to print candidates instead of trying to crack a hash
- Added human-readable error message for the OpenCL error codes
- Redesigned --help menu layout
- Added -cl-std=CL1.1 to all kernel build options