Implementation and tutorial for a dynamic vector data-structure in C.

Related tags

Miscellaneous c-vec
Overview

Vec: A Dynamic Vector in C

The subject.md file contains exercises for this implementation if you want to do it yourself. In that case, don't peek at the solutions in vec.c beforehand!

The C-standard library doesn't offer a good dynamic data structure often found in other languages such vector in C++ or Vec in Rust. This is an overview of the idea, design and implementation of such data structure in C.

Features of a Dynamic Data Structure

When we talk about a dynamic data structures, we usually mean a container with a growing memory buffer. Such a buffer has a certain strategy to allocate more memory when the existing memory runs out. A typical strategy is to double the size of the buffer each time a limit is reached. This way we minimize the number of reallocations in the buffer which is our most expensive operation. While we still have space left in the buffer we can append to the buffer with negligible cost. When we reach the limit and reallocate, we incur the cost of the allocating the space for a new buffer and copying over the data from the old buffer.

Such data structures can be typed, but in this tutorial we will make our best to mimic the usefulness of vectors in other languages by providing a generic interface which will accept any type of element.

Dynamic Vector

Now let's implement a dynamic vector data structure. Let's consider the equivalent Rust called Vec. Vec implements tons of functions, but in essence it's a growable and shrinkable buffer of elements of type <T>, and the main operations are push and pop. Pushing a new element to the end of the buffer and "popping" a.k.a. removing the element from the end and returning you a pointer to the element (this is an important detail). In addition, we implement the insert and remove operations that allow you to specify an index to which the element is added or which is removed. Furthermore, we usually have a get method, or we can access the elements by their index using the [] syntax.

It's important to note that there is a difference in performance between the operations. Adding to the end of the array and deleting from the end will always be cheapest. Everything else, such as inserting an element at the beginning or in the middle of the array, will incur the penalty of copying data around.

However, in various tests I've made, those penalties have been quite small compared to the penalties that incur with different data structures claiming to solve this problem. Sure you can prepend to a linked list without removing or shuffling around elements, but in a dynamic vector you are dealing with data that is all laid out sequantially in memory. This is important because modern processors are very fast when the required data can be found in the cache. In a linked list or other pointer-type list, the cache misses from pointer indirection far outweigh any other gains in my experience.

Design

First, let's talk a little about design patterns. We should adopt conventions that make our API easy to reason with and practices that make the code safe and easy to debug.

  1. Allocation and deallocation happen at the same level as the variable declaration.

This first pattern helps to avoid unnecessary allocations leading to increased performance as well as easier time with leaks. Let's see what it means.

// This "constructor" returns an allocated pointer to a `t_foo`. But what if
// we didn't need a pointer, because we use t_foo in the caller's scope only?
t_foo *create_a_foo(int foosize);

// We could just return a `t_foo`, but now the problem is the error condition.
// You can't return a NULL if something goes wrong.
t_foo create_a_foo(int foosize);

// This is how we create a `t_foo` without allocating a pointer (if it wasn't
// already allocated) and we can return an integer representing some error condition.
int create_a_foo(t_foo *foo, int foosize);

// ...or if we want to be passing foo around directly as a parameter to
// another function, we can return a pointer to `t_foo`.
t_foo *create_a_foo(t_foo *foo, int foosize);
  1. Parameter order is (dst, src, params, fpointers).

We always have the destination before the source if a destination pointer or variable is applicable.

int foo(char *dst, char *src, int len, (*bar)(char *));

Struct

We create a struct called s_vec and typedef it to t_vec.

typedef struct s_vec
{
    unsigned char *memory;    // Pointer to the first byte of allocated memory.
    size_t  elem_size;  // Size of a vector element in bytes.
    size_t  alloc_size; // Total size of allocated bytes.
    size_t  len;        // Length of the used-up part of the vector in
                        // `elem_size` chunks.
}   t_vec;

When we access elements in the vector our bounds are 0 -> len - 1. We might have allocated more memory in total, but we will only access memory in the byte-range 0 -> len * elem_size.

Implementation

Here is our vec.h header file with implementation prototypes;

#ifndef VEC_H
# define VEC_H

#include "stdlib.h"
#include "unistd.h"
#include "string.h"
#include "stdbool.h"

typedef struct s_vec
{
    unsigned char   *memory;
    size_t          elem_size;
    size_t          alloc_size;
    size_t          len;
}   t_vec;

int     vec_new(t_vec *src, size_t init_alloc, size_t elem_size);
void    vec_free(t_vec *src);
int     vec_from(t_vec *dst, void *src, size_t len, size_t elem_size);
int     vec_resize(t_vec *src, size_t target_size);
int     vec_push(t_vec *src, void *elem);
int     vec_pop(void *dst, t_vec *src);
int     vec_copy(t_vec *dst, t_vec *src);
void    *vec_get(t_vec *src, size_t index);
int     vec_insert(t_vec *dst, void *elem, size_t index);
int     vec_remove(t_vec *src, size_t index);
int     vec_append(t_vec *dst, t_vec *src);
int     vec_prepend(t_vec *dst, t_vec *src);
void    vec_iter(t_vec *src, void (*f) (void *));
int     vec_map(t_vec *dst, t_vec *src, void (*f) (void *));
int     vec_filter(t_vec *dst, t_vec *src, bool (*f) (void *));
int     vec_reduce(void *dst, t_vec *src, void (*f) (void *, void *));
void    vec_sort(t_vec *src, int (*f)(void *, void *));

#endif

Construction and Destruction

Function vec_new will take an allocated pointer dst and allocate len * elem_size amount of bytes in the buffer.

int main(void)
{
    t_vec   v;
    int ret;

    ret = vec_new(&v, 10, sizeof(int));
    if (ret < 0)
        printf("Error!");
}

Deallocation:

int main(void)
{
    t_vec   v;
    int ret;

    ret = vec_new(&v, 10, sizeof(int));
    if (ret < 0)
        printf("Error!");
    vec_free(&v);
}

A handy from method that creates a vec out data passed in as a pointer.

int main(void)
{
    int     vals[] = {1, 2, 3};
    t_vec   v;
    int ret;

    ret = vec_from(&v, vals, 3, sizeof(int));
    if (ret < 0)
        printf("Error!");
    vec_free(&v);
}

Copying and Resizing

Resizing is used by many functions that we will implement later, so better do it now. In the resizing function it's handy to call the vec_copy method so we'll implement that as well.

The copy function is very simple and will only copy at most as many bytes as are available in the dst vector.

int main(void)
{
    int     vals[] = {1, 2, 3};
    t_vec   v;
    t_vec   d;

    vec_from(&v, vals, 3, sizeof(int));
    vec_new(&v, 3, sizeof(int));
    vec_copy(&d, &v);
    vec_free(&v);
    vec_free(&d);
}

The resizing function vec_resize will take in a target_size parameter and either shrink (destructively) or grow the vector to the target size copying the old contents over to the new alloaction.

static int vec_resize(t_vec *src, size_t target_size);

Manipulating the Elements

We have several functions that we use to manipulate the vector. We can push elements to the end of the array or pop them from the end or get a pointer to an element at any index in the vector. These are the least expensive operations to perform.

int main(void)
{
    int     vals[] = {1, 2, 3};
    int     ret;
    int     *ptr;
    t_vec   v;

    vec_from(&v, vals, 3, sizeof(int));
    vec_push(&v, &vals[0]);
    vec_push(&v, &vals[1]);
    vec_pop(&ret, &v);
    ptr = vec_get(&v, 0);
    vec_free(&v);
}

A little more involved operation is inserting or removing an element from any index in the vector.

int main(void)
{
    int     vals[] = {1, 2, 3};
    t_vec   v;

    vec_from(&v, vals, 3, sizeof(int));
    vec_insert(&v, &vals[0], 2);
    vec_remove(&v, 2);
    vec_free(&t1);
}

Iterator Methods

When we iterate our vector we usually use iterator functions. It's quite surprising how many use cases these few simple functions cover, reducing errors and boilerplate code.

A simple iterator takes in a function pointer that is called for each element of the vector, with a pointer to the element passed as an argument. Any modifications to the element will modify the original values.

void print_int(void *src)
{
    printf("%d\n", *(int *)src);
}

int main(void)
{
    t_vec   t1;
    int     base[] = {1, 2, 3, 4, 5};

    vec_from(&t1, base, 5, sizeof(int);
    vec_iter(&t1, print_int);
    vec_free(&t1);
}

A map function which copies over the current elements (this operation is non-destructive), sends each element to f and then appends the element to a new vector dst.

void *add_one(void *ptr)
{
    *(int *)src += 1;
}

void main()
{
    t_vec values;

    vec_new(&values, 10, sizeof(int));
    for (int i = 0; i < 10; i++)
        vec_push(&values, &i);
    vec_map(&even, &values, add_one);
}

Next a filtering function. Now the function pointer will return true or false indicating if the element should be added to the resulting vector.

bool filter_even(void *src)
{
    if (*(int *)src % 2 == 0)
        return (true);
    return (false);
}

void main()
{
    t_vec values;

    vec_new(&values, 10, sizeof(int));
    for (int i = 0; i < 10; i++)
        vec_push(&values, &i);
    vec_filter(&even, &values, filter_even);
}

Finally, a function that iterates the elements of the vector, and at each iteration, applies the function f that gets the acc element passed to vec_reduce as the first parameter (the "accumulator"), and the current element as the second parameter. With the accumulator, the results of iterating over the elements are reduced to one element such as in summing a list of integers.

void sum(void *acc, void *elem)
{
    *(int *)acc += *(int *)elem;
}

void main()
{
    t_vec   t1;
    int     base[] = {1, 2, 3, 4, 5};
    int     result = 0;

    vec_from(&t1, base, 5, sizeof(int));
    vec_reduce(&result, &t1, reduce_tester);
    assert(result == 15);
    vec_free(&t1);
}

Conclusions

That's a quick overview on how to implement a performant and easy-to-use dynamic data structure in C. This structure can be used as a fundamental building block in many programs. The implementation has been kept simple because frankly, it doesn't have to be more complicated. New functions could definitely be added to grow the functionality of t_vec even further.

Issues
  • Added vec_clear and vec_find. Implemented lazy init

    Added vec_clear and vec_find. Implemented lazy init

    • Implemented lazy initialization for the vector. Now if user passes init_len == 0 to vec_new, memory won't be allocated at this point, but when the vector is first used (push, insert, etc).
    • Implemented vec_clear that just changes the length of the vector to zero reserving all allocated space.
    • Implemented vec_find a simple search taking f comaprison method as arg.
    opened by juliuskoskela 2
  • Typo in header for vec_new

    Typo in header for vec_new

    There's a mismatch in the naming convention for the src/dst parameters in vec_new.

    In the exercise: https://github.com/juliuskoskela/c-vec/blob/main/subject.md#ex0-vec_new The parameters are named dst and init_alloc, but the corresponding parameters are named src and len in the header https://github.com/juliuskoskela/c-vec/blob/main/subject.md#ex0-vec_new

    opened by Caruychen 2
  • Proof read and modify return values of some functions

    Proof read and modify return values of some functions

    Fixing some typos and clarifying some concepts. Small fixes in the implementation itself, such as error checking and the return values of some functions (used to return the allocated size, now returns 1 if the operation was successful)

    opened by satukoskinen 0
Owner
Julius Koskela
Co-founder Locationews Ltd., Student at HIVE
Julius Koskela
An implementation of a weak handle interface to a packed vector in C++

Experimental handle container in C++ Overview Following on from c-handle-container, this library builds on the same ideas but supports a dynamic numbe

Tom Hulton-Harrop 12 Mar 11, 2022
Just getting started with Data Structure and Algorithms? Make your first contribution here and start the journey of learning DSA.

Getting Started ! ✨ If you are just beginning with open source then let's make your first contribution in this repository ! Contributing Tutorial ?? P

amega 3 Apr 18, 2022
Add anything about data structure and algorithm in any language.

Hacktoberfest 2021 Follow the README below to get started! Note : This repo is excluded from the Hacktoberfest but you can still contribute and the re

null 29 Jun 15, 2022
OpenVDB - Sparse volume data structure and tools

OpenVDB AX Nano Houdini Website | Discussion Forum | Documentation OpenVDB is an open source C++ library comprising a novel hierarchical data structur

Academy Software Foundation 1.7k Jun 23, 2022
Grand Programs for every Data Structure including all their operations

Grand-Programs-UE20CS203 Grand Programs for every Data Structure including all their operations Some idioms that I use, so you won't get confused I pr

Aditya Rao 1 Apr 1, 2022
CSE2122: Data Structure Lab

CSE2122: Data Structure Lab Array Traversing Inserting (int) Inserting (string) Deleting Sorting (Bubble sort: int type data) Sorting (Bubble sort: st

Fahim Ahammed Firoz 12 Feb 28, 2022
Contribute a Data Structure you coded or solve the problems given in the Description.md file in any language! Happy coding!

Pro Lang Contribute a Data Structure you coded or solve the problems given in the Description.md file (can be found in the both the folders) in any la

IEEE-IAS VIT 1 Jan 16, 2022
Programming tutorial series for creating LV2 plugins using C/C++ and turtle.

Programming LV2 Plugins From Scratch Programming tutorial series for creating LV2 plugins using C/C++ and turtle. Content Programming LV2 Plugins From

null 29 Jun 21, 2022
Writing a basic compiler frontend following LLVM's tutorial, with complete added supports Hindi and UTF-8 in general

सारस | SARAS Started with following LLVM's tutorial In development, a hobby project only JIT is broken right now, 'jit' branch par code hai uska Compi

Aditya Gupta 4 May 1, 2022
MDE is a model extraction tool that converts Destiny 2 dynamic models into fbx files supporting textures, skeletons, and all provided vertex data.

MDE is a model extraction tool that converts Destiny 2 dynamic models into fbx files. A dynamic model is one that is animated or is spawned in during the game.

Montague 31 Jun 20, 2022
This is a tool for software engineers to view,record and analyse data(sensor data and module data) In the process of software development.

![Contributors][Huang Jianyu] Statement 由于工具源码在网上公开,除使用部分开源项目代码外,其余代码均来自我个人,工具本身不包含公司的知识产权,所有与公司有关的内容均从软件包中移除,软件发布遵循Apache协议,任何人均可下载进行修改使用,如使用过程中出现任何问

HuangJianyu 34 May 5, 2022
IconVG is a compact, binary format for simple vector graphics: icons, logos, glyphs and emoji.

IconVG IconVG is a compact, binary format for simple vector graphics: icons, logos, glyphs and emoji. WARNING: THIS FORMAT IS EXPERIMENTAL AND SUBJECT

Google 607 Jun 15, 2022
A C++ fast and lightweight 3D vector library

Vector3D A C++ fast and lightweight 3D vector library. Usage On your C++ code include the file. #include <vector.h> Then, declare your vector and ini

Carlos del Valle 6 Jan 18, 2022
Tutorial to connect the Waveshare display ST7789V to the ESP32

ESP32-ST7789V This repository contains the required configuration to connect a display Waveshare ST7789V to an ESP32 board. The correct connection to

Pedro Paulo Amorim 1 Nov 14, 2021
A set of tutorial projects for creating a simple digital radio receiver based on the STM32G431KB microcontroller

simple-radio Обучающие проекты по созданию простого цифрового радиоприемника на базе микроконтроллера STM32G431KB. Разработка программ выполнялась в W

null 7 Apr 11, 2022
Specular Lighting in OpenGL (Followed tutorial by Michael Grieco)

Specular lighting in OpenGL Specular Lighting in OpenGL (Followed tutorials by Michael Grieco). 2022-02-13.01-55-21.mp4 Setup Clone $ git clone https:

Aniket Rajnish 4 Feb 18, 2022
Juce tutorial for beginners, with DSP introduction.

JUCE Tutorial Juce tutorial for beginners, with DSP basics. My teaching materials :D Warning: working in progress lesson 0: Setup lesson 1: Basic Basi

Wen-Yi Hsiao 4 Apr 14, 2022
An implementation of a Windows loader that can load dynamic-linked libraries (DLLs) directly from memory

memory-module-loader memory-module-loader is an implementation of a Windows loader that can load dynamic-link libraries (DLLs) directly from memory. T

SCYTHE 110 Jun 15, 2022
Highly efficent, caching, copy-on-write lua vector math library

lua-vec Table constructions in Lua are expensive, creating and destroying thousands of "vector" tables frame to frame in an engine can cause serious p

Dale Weiler 3 Jul 24, 2021