Implementations of FizzBuzz test, with different optimisations

Related tags

Debug fizzbuzz
Overview

fizzbuzz

Optimisation of classic FizzBuzz test.

supernaive

The least efficient implementation, with 3 ifs and two printfs per number. It is so inefficient that I don't even use it for comparison.

naive

A little bit optimised version, with 2 ifs (not counting the loop condition) and 1 printf per number.

unrolled

Single loop iteration for 15 numbers, it means for 15 numbers just one branch (vs 45 branches for naive) and 1 printf (vs 15 printfs for naive).

customprint

Generic printf replaced with custom print routine, tailored for this particular task.

customprint2

Like customprint, but buffer is filled in reverse, and adjacent Fizz and Buzz words are merged into single memcpy, as a result number of memcpy calls per iteration goes down from 15 to just 4. Courtesy of kariya-mitsuru.

reusebuf

Reuse buffer from previous iteration, update only the changed characters. Use x86_64 vector instructions for comparing buffers.

reusebuf2

Reuse buffer from previous iteration, update only the changed characters. Use x86_64 vector instructions for comparing buffers. Fill buffer in reverse to reduce number of memcpy calls. Courtesy of kariya-mitsuru.

multithreaded

Use worker threads to process the sets of numbers in parallel. Buffer is filled in reverse, like in reusebuf2.

Comparison

All tests are performed on Dell Latitude 7480 with Core i7-7600U (2 cores with hyperthreading) and 16 Gb RAM, OpenSUSE Leap 15.1, kernel 4.12.14, gcc 7.5.0.

Output redirected to /dev/null. Multithreaded implementation uses 4 worker threads, with load of 3M number per thread.

Implementation Time (sec.millisec) Relative to naive Relative to previous
supernaive 49.661 0.80
naive 39.650 1 1.25
unrolled 20.151 1.97 1.97
customprint 8.771 4.52 2.30
customprint2 6.695 5.92 1.31
reusebuf 4.490 8.83 1.49
reusebuf2 2.818 14.07 1.59
multithreaded 1.051 37.73 2.68
You might also like...
Unified interface for selecting different I2C implementations on Arduino platforms

AceWire Wrapper classes that provide a simple, unified interface for different I2C implementations on Arduino platforms. The code was initially part o

This is a beginner-friendly project aiming to build a problem-set on different data structures and algorithms in different programming languages.

DSAready Overview This is a beginner-friendly project that aims to create a problem-set for various Data Structures and Algorithms. Being a programmer

Different Example Programs from different programming languages

Example Programs Don't repeat the same program again. Code Refactoring and Code Cleanup are accepted Name the File According to the Program written in

A test showing a flipped bit in a file encrypted on two different machines

ChaCha ASM Test I have observed that the ChaCha cipher may have very rarely divergent code paths for AVX vs. SSE. I have seen this in earlier CryptoPP

TestFrame - This is a test framework that uses Raylib and ImGui together to help test and develop concepts
TestFrame - This is a test framework that uses Raylib and ImGui together to help test and develop concepts

This is a test framework that uses Raylib and ImGui together to help test and develop concepts. It is made using C++ because it uses classes for windows and views.

Bolt is a C++ template library optimized for GPUs. Bolt provides high-performance library implementations for common algorithms such as scan, reduce, transform, and sort.

Bolt is a C++ template library optimized for heterogeneous computing. Bolt is designed to provide high-performance library implementations for common

CommonMark spec, with reference implementations in C and JavaScript

CommonMark CommonMark is a rationalized version of Markdown syntax, with a spec and BSD-licensed reference implementations in C and JavaScript. Try it

Sorting routine implementations in "template" C

sort.h Overview sort.h is an implementation of a ton of sorting algorithms in C with a user-defined type that is provided at include time. This means

C++ implementations of well-known (and some rare) algorithms, while following good software development practices

ProAlgos: C++ This project is focused on implementing algorithms and data structures in C++, while following good software engineering practices, such

Comparing the performance of Wave Digital Filter implementations

WDF Bakeoff Comparing performance between Wave Digital Filters implemented in C++ and Faust. Building First clone the repository and submodules: git c

Reference implementations of post-quantum cryptographic primitives

PQ Crypto Catalog Implementation of quantum-safe signature and KEM schemes submitted to NIST PQC Standardization Process. The goal is to provide an ea

Celeborn is a Userland API Unhooker that I developed for learning Windows APIs and Syscall implementations

Celeborn is a Userland API Unhooker that I developed for learning Windows APIs and Syscall implementations. It mainly detects and patches hooking instructions in NTDLL.dll file. All PRs are welcome!

Competitive Programming Implementations, Resources, Solutions, and Tools

In competitive programming contests, one must write computer programs capable of solving clear-cut problems under the given contraints and limits. Most competitive programmers use C++, Java, or Python.

Benchmarking algebraic effect handler implementations

Benchmarking Effect handlers. Currently supported libraries and languages are: kk (extension .kk), Koka. ml (extension .ml), multi-core OCaml.

MATLAB and C++ implementations of sideslip angle estimators
MATLAB and C++ implementations of sideslip angle estimators

sideslip-angle-vehicle-estimation MATLAB and C++ implementations of sideslip angle estimators Factor graph sideslip angle estimator Papers: "A Factor

Implementations of Multiple View Geometry in Computer Vision and some extended algorithms.

MVGPlus Implementations of Multiple View Geometry in Computer Vision and some extended algorithms. Implementations Template-based RANSAC 2D Line estim

Optimized implementations of the Number Theoretic Transform (NTT) algorithm for the ring R/(X^N + 1) where N=2^m.

optimized-number-theoretic-transform-implementations This sample code package is an implementation of the Number Theoretic Transform (NTT) algorithm f

The foundation for many embedded graphical display implementations

Ubuntu Frame Description The foundation for many embedded graphical display implementations. ubuntu-frame is a simple fullscreen shell (based on Wayla

Collection of C99 dynamic array implementations

darc darc stands for Dynamic ARray Collection. This repo hosts 3 type-generic C99 implementations : mga (Macro Generated Array) type-safe 0-cost abstr

Comments
  • Supernaive needs to be re-measured.

    Supernaive needs to be re-measured.

    Supernaive has reduced the amount of output due to pull request #3 by @mattn, so I think it is better to be re-measured.

    cf. My machine's results.

    supernaive 91.526s
    naive      61.591s
    
    opened by kariya-mitsuru 2
  • Reduce output & memcpy

    Reduce output & memcpy

    • Since num + 13 must always be output to check the number of digits, output directly to the output buffer, and delete the subsequent output of num + 13 (from both reuse & not reuse buffer cases).
    • To achieve the above, output in reverse order.
    • For num + 13, output the last two digits unconditionally because those digits are always changing. On the other hand, the other digits do not change much, so output changed digits only. (Changed from previous PR)
    • When reusing buffers, the data to check changed digits is obtained directly from the output buffer instead of memcpy-ing it to the temporary buffer.

    Changes that does not affect the performance

    • Merge adjacent memcpy when buffer is not reused.
    • Don't save pos elements that will not be used later.
    • Use _mm_loadl_epi64 to save extra space at the end of the output buffer, since 8 digits at most are enough to compare for digit checking. (Changed from previous PR)

    My machine's results.

    reusebuffer                  7.367s
    reusebuffer2 (previous PR)   5.859s
    reusebuffer2                 5.119s
    
    opened by kariya-mitsuru 1
  • reduce print count per iteration

    reduce print count per iteration

    recudeprint.c By changing the output count per iteration from 15 to 30, reduce the number of myitoa calls from 8/15 to 3/30. (The last digit is always the same for every iteration, so there is no need to output it by myitoa.)

    reduceandreuse.c Combine reduceprint.c & reusebuffer2.c

    My machine's results.

    reusebuffer2    5.119s
    reduceprint     4.740s
    reduceandreuse  4.167s
    
    opened by kariya-mitsuru 0
  • use three buffers of 100 numbers each and increase by 300

    use three buffers of 100 numbers each and increase by 300

    As n = n + 300 (mod 15), this algorithm takes three buffers of 100 numbers each, writes them into output, and increases numbers (that aren't divisible by 3 and 5) by 300in each buffer (using decimal addition with carries)

    opened by lightln2 0
Owner
Ilya Caramishev
Ilya Caramishev
A modern, C++-native, header-only, test framework for unit-tests, TDD and BDD - using C++11, C++14, C++17 and later (or C++03 on the Catch1.x branch)

Catch2 v3 is being developed! You are on the devel branch, where the next major version, v3, of Catch2 is being developed. As it is a significant rewo

Catch Org 16k Jan 8, 2023
A testing micro framework for creating function test doubles

Fake Function Framework (fff) A Fake Function Framework for C Hello Fake World! Capturing Arguments Return Values Resetting a Fake Call History Defaul

Mike Long 551 Dec 29, 2022
test framework

Photesthesis This is a small, experimental parameterized-testing tool. It is intended to be used in concert with another unit-testing framework (eg. C

Graydon Hoare 11 Jun 2, 2021
DotX64Dbg aims to provide a seamless way to write and test plugins for X64Dbg using .Net 5.0 and C#.

DotX64Dbg (EARLY ALPHA) Plugins and Scripting with C# for x64Dbg. Create Plugins for X64Dbg with ease DotX64Dbg aims to provide a seamless way to writ

ζeh Matt 7 Oct 16, 2022
🎃 Submit creative FizzBuzz solutions in any language you want! Open for beginners !

?? Hacktoberfest 2021 FizzBuzz Submit creative FizzBuzz solutions in any language you want! TL;DR: We're searching for creative/extraordinary/weird Fi

Shubh4nk 17 Oct 25, 2022
WebAssembly from Scratch: From FizzBuzz to DooM.

WebAssembly from Scratch: From FizzBuzz to DooM Exploring WebAssembly from scratch from a backend-person-point-of-view. A story in four acts. Welcome

Cornelius Diekmann 1.4k Dec 24, 2022
A collection of different Cellular Automata implementations for recreational purposes

A collection of different Cellular Automata implementations for recreational purposes

Tsoding 19 Oct 4, 2022
My implementations of Ray Tracing in One Weekend written in many different languages.

Ray Tracing in Many Languages This repository contains my implementation of the Ray Tracing in One Weekend book written in several different programmi

Joshua Vega 1 Oct 19, 2021
Pool is C++17 memory pool template with different implementations(algorithms)

Object Pool Description Pool is C++17 object(memory) pool template with different implementations(algorithms) The classic object pool pattern is a sof

KoynovStas 1 Nov 18, 2022
Unified interface for selecting different implementations for communicating with a TM1637 LED controller chip on Arduino platforms

AceTMI Unified interface for communicating with a TM1637 LED controller chip on Arduino platforms. The code was initially part of the AceSegment libra

Brian Park 0 Feb 2, 2022