Ideas, thoughts, and notes on a typeclass/interface based polymorphism pattern for standard C

Overview

Polymorphism through Typeclasses / Interface / Traits

Ideas, thoughts, and notes on an action based polymorphism pattern for good ol' C. Originally used in c-iterators, and explained in a small document.

This is meant to be an extension to the aforementioned document. In reality, this pattern was supposed to be a major focus point of c-iterators. But I realized that I needed another repo to log my ideas and thoughts about this pattern.

You're free to use this pattern and the related ideas discussed in this repository. Although, there is a LICENSE file, I don't expect people to have to include it everywhere. Attribution is all I ask for.

A brief introduction

Before we move on to implementations, I expect you to be familar with action based polymorphism. For the OO programmer, this is an Interface. For the Functional programmer, this is a Type class. Though in reality, the implementation is more akin to Trait objects than real typeclasses.

Goals

  • Type safety - try not to make "user" interfaces (i.e concrete implementations) use void*.

  • Full, and strict standard C[1] conformance - no hacky strict alias violating shenanigans.

    [1] Core idea supports C90; examples use compound literals (C99) for convenience (not required); further (highly optional) abstractions may require C99 or even C11

  • Extensible and usable in libraries (unlike _Generic) - possible through dynamic dispatch.

  • Open to being used with existing C libraries. Implementing typeclasses should not have special requirements.

  • As transparent as possible, especially from a usage perspective.

  • Base polymorphism around actions (abilities), not objects.

Core Idea

Reference code: barebones.c

A struct that contains 2 members-

  • The concrete type itself, to use with the respective functions (abilities of the type) - self
  • A vtable containing function pointers to the exact implementations of respective abilities for the specific type - tc

Feel free to name these members whatever you'd like. I used self, since it's widely used in this context, and tc, for "type class".

This struct has to be polymorphic. As in, the self member should be of type void*. The bonafide polymorphic type in C. This way, functions can simply ask for this struct to constrain types to ones that can do certain actions. The function can then call whatever function they need to call through the tc member, and pass in the self member.

This is what that'd look like in psuedocode-

typedef struct typeclass_name_vtable
{
    ReturnType (*const func_name)(void* self, ...);
    ... /* More "abilities" */
} TypeclassName_vtable;

typedef struct typeclass_name
{
    void* self;
    TypeclassName_vtable const* tc;
} TypeclassName;

The latter struct is called a Typeclass Instance.

A polymorphic function could then look like-

void poly_foo(TypeclassName x)
{
    /* Use x's abilities here */
    x.tc->func_name(self);
    ...
}

Coming back to the real world, you need to actually be able to turn concrete types into a typeclass. For that, you first need the concrete implementations for the abilities required by the typeclass. Once again, this is what that'd be like in psuedocode-

/* Assume `T` is a concrete type */
typedef some_type T;

/* The `func_name` ability (shown above) impl for `T` */
ReturnType T_func_name(T* self, ...);

Assuming this is the only ability required by a certain typeclass, you can now make a function to convert T to that typeclass-

TypeclassName T_to_TypeclassName(T* x)
{
    static TypeclassName_vtable const tc = {.func_name = (ReturnType (*const)(void*, ...))(T_func_name) };
    return (TypeclassName){.tc = &tc, .self = x};
}

This is called the Implementation Function. As expected, it accepts a pointer to that concrete type (since it has to be assignable to void*), wraps it around its typeclass instance, and returns it. The lifetime of the returned struct is the same as the lifetime of the data pointed to by the given pointer.

In general, neither this typeclass struct, nor the typeclass functions should take ownership of the concrete type. Though this isn't a forced requirement, just a suggested one.

That's all there is to it! This is the typeclass pattern (or interface pattern). These typeclass structs can be used in libraries to have polymorphic functions. The correct functions will be dynamically dispatched to.

The code snippets above are all psuedocode for a general idea. This core idea is used to define and implement a Show typeclass for an enum in barebones.c.

Combining multiple typeclasses/interfaces

Reference code: barebones-combined.c

In real world code, you'll require types that implement multiple typeclasses/interfaces. You can encode that by having a struct containing the usual self member, and multiple vtables - each corresponding to a specific typeclass.

Consider 2 typeclasses- Show and Enum, their vtable structs looks like-

typedef struct
{
    char* (*const show)(void* self);
} ShowTC;

typedef struct
{
    int (*const from_enum)(void* self);
} EnumTC;

You can combine these with-

typedef struct
{
    void* self;
    ShowTC const* showtc;
    EnumTC const* enumtc;
} ShowEnum;

Now, you have a typeclass instance - that requires multiple typeclass implementations. You'd wrap a type into this combined typeclass instance by obtaining the Show and Enum typeclass implementations for that type (by calling the implementation functions), extracting the vtables from those instances and putting them into this struct.

If you implemented Show for int, and named the implementation function int_to_show, implemented Enum for int, and named the implementation function int_to_enum, this whole process would look like-

int x = 42;
ShowEnum shen = { .self = &x, .showtc = int_to_show(&x).tc, .enumtc = int_to_enum(&x).tc };

Feel free to generalize this into a function. This concept is showcased in barebones-combined.c.

You now have full, type safe, and flexible polymorphism. Usable in any context where you can use regular types. Function arguments, container elements, polymorphic return values etc. Many of these polymorphic types will be the combination of many typeclasses. Which may feel somewhat dry, and repetitive. However, the core idea is intentionally barebones. You can always design macros around these to make it less dry. Or, you may choose to simply have this completely transparent, your choice.

For a few macros more

Reference code: barebones-macro.c

This pattern, as implemented with maximum transparency above, may seem rather unintuitive for the implementor. It's very easy for the implementor of a typeclass to make mistakes in the way showcased above. You can, instead, make a macro to generalize the implementation-

#define impl_TypeclassName(T, Name, func_name_f)                                                                       \
    TypeclassName Name(T* x)                                                                                           \
    {                                                                                                                  \
        ReturnType (*const func_name_)(T* self, ...) = (func_name_f);                                                  \
        (void)func_name_;                                                                                              \
        static TypeclassName_vtable const tc = {.func_name = (ReturnType (*const)(void*, ...))(func_name_f) };         \
        return (TypeclassName){.tc = &tc, .self = x};                                                                  \
    }

This is very similar to the T_to_TypeclassName above. But with a touch more "features".

ReturnType (*const func_name_)(T* self, ...) = (func_name_f);
(void)func_name_;

These 2 lines are a no-op, but they are very important to the goal of this pattern- Type safety. The implementation functions should take in the exact concrete type. Implementations are inherently "user targeted" - the user should not be using errant void pointers.

The first line ensures the function implementation is the exact type it needs to be, with void* self substituted for T* self. If you accidentally provided a function implementation with incorrect type, you'll know it.

The second line silences the "unused variable" warning and allows the 2 lines to be a no-op.

Otherwise, the macro is an exact analog of T_to_TypeclassName, it takes the concrete type the implementation is for, the name to define this wrapper function as, and all the required function implementations.

You can now simplify the implementation for T above-

impl_TypeclassName(T, T_to_TypeclassName, T_func_name)

The code snippets above are all psuedocode for a general idea. This concept is used to define a Show typeclass for an enum in barebones-macro.c.

With more constraints

NOTE: The ideas and abstractions described in this section are not integral enough to the actual pattern. If you want maximum transparency, you may safely ignore this.

Reference code: name-constrained.c

Constraints are a core part of abstractions. With more constraints, you can have more predictability - allowing for more "syntax sugar" macros.

If the user is disallowed from choosing their own names for the implementation functions - you get predictability. Which allows you to make a macro to wrap a user given type into the necessary typeclass. You'll need to abstract out the impl function naming in the impl_ macro to have consistent naming-

#define CONCAT_(x, y) x ## y
#define CONCAT(x, y) CONCAT_(x, y)

/* Consistently name the impl functions */
#define ImplName(T, TypeclassName) CONCAT(CONCAT(T, _to_), TypeclassName)

#define impl_TypeclassName(T, func_name_f)                                                                             \
    TypeclassName ImplName(T, TypeclassName)(T* x)                                                                     \
    {                                                                                                                  \
        ReturnType (*const func_name_)(T* self, ...) = (func_name_f);                                                  \
        (void)func_name_;                                                                                              \
        static TypeclassName_vtable const tc = {.func_name = (ReturnType (*const)(void*, ...))(func_name_f) };         \
        return (TypeclassName){.tc = &tc, .self = x};                                                                  \
    }

Now, if you implemented TypeclassName for your type T, the function used to turn T* into TypeclassName would just be ImplName(T, TypeclassName). With this knowledge, you can abstract out the "wrapping T into TypeclassName" part-

/* "Apply" a typeclass over a concrete type */
#define ap(x, T, TypeclassName) ImplName(T, TypeclassName)(x)

You can now simplify the implementation for T as well as the wrapping-

impl_TypeclassName(T, T_func_name)

int main(void)
{
    /* Initiate the concrete value */
    T val = ...;
    TypeclassName x = ap(&val, T, TypeclassName);
}

The code snippets above are all psuedocode for a general idea. This concept is used to define a Show typeclass for an enum in name-constrained.c.

Here be meta-macros

NOTE: The ideas and abstractions described in this section are not integral enough to the actual pattern. If you want maximum transparency, you may safely stop ignore this.

Ah, meta programming with macros. Remarkable projects like metalang99, C99-Lambda, obj.h, clofn.h and many more, really showcase how ridiculously strong (and evil) the C preprocessor can be in the right hands. Surely, meta macros can be used here too? To make some magical abstractions?

Well yes, not entirely sure how magical they would be though. I'm not going to go through the implementations of these - I'd rather just discuss the concepts. Following is a list of "abstractions", wrapping around the typeclass pattern, that you can implement with macros.

Default implementations

In many cases, typeclasses or interfaces have certain functions that aren't required to be implemented - and instead some default implementation is used. You can do this by having a variadic argument (...)[1] on the impl_ macro, representing the optional function implementations. You'll then need macros to work with these variadic macros, and figure out which optional functions were provided - and which weren't. This isn't a new concept and is already used in many of the meta macro projects mentioned above. You can also find an implementation of this on Stack Overflow.

For implementations that were provided, you simply use them - as long as they pass typecheck. For ones that weren't provided, you use some default implementation you have defined already.

[1] In standard C, A variadic argument represents 1 or more arguments. Not 0 or more. This means that you may have to have a dummy argument to really achieve the "0 or more" arguments concept that you'll need for truly optional arguments.

Less redundancy for the definer

In general, defining all the typeclasses and its respective impl_ macro is very similar. By spamming enough meta macros, you should be able to abstract out the defining part completely.

In general, you could have a singular macro to define the vtable and the typeclass instance together - it just needs to know some information about the functions. Next, you need a general impl macro. It should be able to deduce information about the functions of a typeclass (possibly through an object like macro), and define a function similar to how the current impl_ macro does. You'll definitely need mapping for this.

Limitations

  1. Polymorphic return types, i.e when the return type is a typeclass instance, generally involve heap allocation. This is because the self member is of type- void*. You can only assign pointers to it. But you can't assign the address of a local variable since its lifetime ends after the function returns.

  2. There's no way to have a function's return type, be the exact same as a polymorphic input (argument) type. This is because there's no way to know the exact type wrapped inside a typeclass. You can return the same polymorphic type. But in many cases, this isn't what you'd want.

    Consider addition- (+) :: Num a => a -> a -> a - 2 arguments and a return value, all of the same type. As long as the type implements Num. If you use (+) with ints, the return value is an int, with floats, the return value is a float.

    You simply can't do this in C, since there's no way to capture those types. You could return the Num typeclass instance itself. But the only thing you can (safely) do with that return value, is more Num operations.

  3. As an extension to the point 2, Return type polymorphism (not to be confused with polymorphic return types) is simply not possible (safely). Which means functions like Enum a => toEnum :: Int -> a cannot be implemented.

  4. It requires extra effort to pass combined typeclass instances to functions expecting less typeclass implementations.

    Suppose you have the combined typeclass instance Foo. It contains the usual self member, and vtables for 3 other typeclasses Atc, Btc, Ctc. You want to use this with a function that just wants a type implementing Atc. You need to manually extract the Atc vtable from Foo, the self member, and then create soley the Atc typeclass instance to be able to use it with the aforementioned function.

  5. Type safety, on functions taking multiple typeclass instance arguments, but requiring those arguments to be backed up by the same concrete type, cannot be guaranteed.

    Consider the compare function- Ord a => compare :: a -> a -> Ordering (assume Ordering is typedef enum { LT = -1, EQ = 0, GT = 1 } Ordering;) - 2 polymorphic arguments, bounded by the Ord typeclass, but required to be the same concrete type. It wouldn't make sense to compare an int with a char* - and yet both can be wrapped inside a Ord typeclass instance, as long as they implement it. So if you have 2 Ord typeclass instances, and you want to call compare from one of them, pass in self from both Ord instances - there's no guarantee that both instances are actually wrapping the same type.

    There's a solution to this though, but you must do it manually. Before calling compare, verify equality of the tc members of both Ord instances. If they're wrapping the same type - obtained from their respective implementation function - the typeclass address is the exact same.

Motivation

It's pretty common for people to ask for polymorphism after they've written enough C. Thankfully, there's no shortage of demonstrations, helper headers, and crazy cool meta macros for implementing OOP polymorphism in C. But I was looking for an action oriented pattern with as much type safety as possible.

Specifically, I wanted something like Haskell typeclasses, or Java/C# interfaces, or Rust/Scala traits. A sort of "interface" that declares a bunch of functions with specific types, leaving in a polymorphic self or the like. The exact implementations, for which, a concrete type must fill. This allows you to make polymorphic actions that ask for a type implementing a certain "interface".

The end result, after about a week of experimenting and refining, is this pattern. Over many refinements and re-evaluations, I think the final result is an actually extensible and practical polymorphism pattern based around actions. It's not a perfect encoding of actual typeclasses (especially not the haskell ones - those are extremely high level). But it's probably as good as it gets.

Issues
  • UB due to incompatible function pointer types

    UB due to incompatible function pointer types

    Interface function implementations accept T * as a first parameter, whereas a corresponding function in a virtual table accepts void *, thereby making these two function types incompatible (godbolt).

    Moreover (6.2.7 Compatible type and composite type):

    All declarations that refer to the same object or function shall have compatible type; otherwise, the behavior is undefined.

    And I am curious about this code:

    TypeclassName T_to_TypeclassName(T* x)
    {
        static TypeclassName_vtable const tc = {.func_name = (ReturnType (*const)(void*, ...))(T_func_name) };
        return (TypeclassName){.tc = &tc, .self = x};
    }
    

    Here, .func_name and T_func_name are two declarations to the same function having incompatible types. Then we cast T_func_name to the type accepting void * as a first parameter; we cast incompatible types. Does it even follow the standard, or it is UB?

    The C99 draft standard: http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf

    opened by Hirrolot 5
  • Correct the wrong links

    Correct the wrong links

    First of all, thanks for sharing these ideas to provide polymorphism using C, I really enjoy them a lot!

    For this PR, I found the wrong links of your C code and they are corrected here.

    opened by RinHizakura 1
Owner
Chase
Don't ya love the smell of bugs and glitches in the mornin'
Chase
The Repository Contains the Set Way of Learning Cpplus With the help of programs And Notes.

Preface Since the C++ language varies so heavily between versions (e.g. C++0x, C++11, C++17, etc.), I will preface this cheat sheet by saying that the

Pawan Roshan Gupta 5 Jan 11, 2022
Programs and my Notes from the course: "Beginning c++ Programming - From Beginner to Beyond" by Dr. Frank J. Mitropoulos

Project Info Technology Stack Linux (Arch) Visual Studio Code GCC 11.1.0 (since GCC 11.1 the default target is gnu++17, a superset of C++17) Source Ud

Alok Shandilya 1 Oct 22, 2021
Cs50 Week5 Data-Structure Notes

CS50-Week-5-Note Cs50 Week5 Data-Structure Notes What Is Data-Structured It is the way of arranging the data to be able to use effectively. Summa

Waz.sheeran 0 Dec 6, 2021
C++ Type Traits for Smart Pointers that are not included in the standard library, containing inheritance detection and member detection.

Smart Pointer Type Trait ?? A simple, header-only cpp library implementing smart pointer type traits. You can easily compile your code diffrently depe

Woon2 11 Dec 22, 2021
C++ standard library reference

Information This is source package for Cppreference C++ standard library reference documentation available at http://en.cppreference.com. If there is

Povilas Kanapickas 319 Jan 14, 2022
This is a C/C++ simulation project which illustrates the framing of standard ethernet protocol

This is a C/C++ simulation project which illustrates the framing of standard ethernet protocol. It creates server and client processes on the same machine and through IPC, it sends the data from the client to the server in a simplex communication.

P Punyacharan 6 Sep 7, 2021
This is a simple UNITEST to test the implementation of the the various container types of the C++ standard template library

ft_container UNITest. This is a simple UNITEST to test the implementation of the the various container types of the C++ standard template library that

Mohamed AMOUSSAOUI 27 Dec 23, 2021
A demonstration of implementing, and using, a "type safe", extensible, and lazy iterator interface in pure C99.

c-iterators A demonstration of implementing, and using, a "type safe", extensible, and lazy iterator interface in pure C99. The iterable is generic on

Chase 51 Dec 30, 2021
The DSiLanguagePatcher increases accessibility to foreign region DSi consoles by providing a mean to change the user interface language.

DSi Language Patcher The DSi Language patcher is a small tool, which runs on your DSi (homebrew execution required) and create a copy of your original

null 14 Oct 9, 2021
Introducing to the world - Maze Game! A game with an easily accessible, user-friendly interface that will provide you the serotonin a game should!

Maze-Project Maze game by Maze™ ?? About Introducing to the world - Maze game! ⛏️ Used technologies C++ ✅ Features 3 levels of difficulty User-friendl

Yoana Agafonova 5 Nov 10, 2021
Connect 4 clone written with c++ with the RSGL library. Based on my connect 4 clone written in python/pygame and my SDL port of that same repo. Along with 3DS support by SaCode

RSGL-Connect-4 Building linux git clone https://github.com/RSGL-Org/RSGL-Connect-4.git cd RSGL-Connect-4 make ./Connect4 Bulding 3ds (3ds support

RSGL 2 Dec 11, 2021
Code profiler based on Frida

Code Profiler Based on Frida This repository contains the code to profile LIEF functions with Frida. Get Started Make sure to download the right versi

LIEF 26 Nov 14, 2021
A curated list of project-based tutorials in C

A list of tutorials that work towards the making of small to large projects in C.

R 8.2k Jan 23, 2022
Ubpa small flat containers based on C++20

USmallFlat Ubpa small flat containers based on C++20 Containers basic_flat_map basic_flat_multimap basic_flat_multiset basic_flat_set basic_small_vect

Ubpa 14 Dec 27, 2021
λQ: A Simple Quantum Programming Language based on QWIRE.

λQ λQ: A Simple Language based on QWIRE which can be compiled to QASM. The name λQ means lambda calculus with quantum circuits. This is a term project

Wenhao Tang 5 Jul 11, 2021
Cmusic source code, based on the CARL 0.2.0 distribution

CMUSIC These are the sources and makefiles for a build of the classic CARL cmusic and related programs. The build has been tested on OSX and Linux, an

null 6 Apr 10, 2021
Stack based programming language

stacky WIP stack-based compiled concatenative programming language. Currently it's somewhere between B and C programming languages in universe where B

Robert Bendun 2 Jan 7, 2022
Implementation of kcp protocol based on c++11

?? kcp-cpp A C++11 header-only kcp library,It has been heavily optimized to support native heartbeat packets and multithreading There are a lot of ins

 KH 11 Jan 19, 2022
a header-only crossplatform type-safe dynamic compiler generator based on C++ 17.

Mu Compiler Generator MuCompilerGenerator(MuCplGen) a Header-Only dynamic compiler generator based on C++ 17. Why MuCplGen? header-only cross-platform

MuGdxy 11 Dec 31, 2021