A cross platform shader language with multi-threaded offline compilation or platform shader source code generation

Overview

pmfx-shader

A cross platform shader language with multi-threaded offline compilation or platform shader source code generation. It outputs json reflection info and a c++ header with your shader structs, and supports fx-like techniques and compile-time branch evaluation via (uber-shader) "permutations".

A single file does all the shader parsing and code generation. Simple syntax changes are handled through macros and defines found in platform, so it is simple to add new features or change things to behave how you like. More complex differences between shader languages are handled through code-generation.

This is a small part of the larger pmfx system found in pmtech. It has been moved into a separate repository so it can be used with other projects. If you are interested in seeing how pmfx shaders are integrated, please take a look here.

Supported Targets

  • HLSL Shader Model 3+
  • GLSL 330+
  • GLES 300+ (WebGL 2.0)
  • GLSL 200 (compatibility)
  • GLES (WebGL 1.0) (compatibility)
  • SPIR-V (Vulkan, OpenGL)
  • Metal 1.0+ (macOS, iOS, tvOS)
  • PSSL
  • NVN (Nintendo Switch)

(compatibility) platforms for older hardware might not support all pmfx features and may have missing legacy features.

Dependencies

Windows users need vcredist 2013 for the glsl/spirv validator.

Console Platforms

Compilation for Orbis and Nvn is possible, but you will need the SDKs installed and the environment variables set.

Usage

python3 build_pmfx.py -help

--------------------------------------------------------------------------------
pmfx shader (v3) ---------------------------------------------------------------
--------------------------------------------------------------------------------
commandline arguments:
    -shader_platform
    -shader_version (optional)
        hlsl: 3_0, 4_0 (default), 5_0
        glsl: 200, 330 (default), 420, 450
        gles: 100, 300, 310, 350
        spirv: 420 (default), 450
        metal: 2.0 (default)
        nvn: (glsl)
    -metal_sdk [metal only]
    -metal_min_os (optional) <9.0 - 13.0 (ios), 10.11 - 10.15 (macos)>
    -nvn_exe [nvn only]
    -extensions
    -i
    -o
    -t
    -h
    -d (optional) generate debuggable shader
    -root_dir sets working directory here
    -source (optional) (generates platform source into -o no compilation)
    -stage_in <0, 1> (optional) [metal only] (default 1) uses stage_in for metal vertex buffers, 0 uses raw buffers
    -cbuffer_offset (optional) [metal only] (default 4) specifies an offset applied to cbuffer locations to avoid collisions with vertex buffers
    -texture_offset (optional) [vulkan only] (default 32) specifies an offset applied to texture locations to avoid collisions with buffers
    -v_flip (optional) (inserts glsl uniform to conditionally flip verts in the y axis)
--------------------------------------------------------------------------------

Compiling Examples

Metal for macOS

python3 build_pmfx.py -shader_platform metal -metal_sdk macosx -metal_min_os 10.14 -shader_version 2.2 -i examples -o output/bin -h output/structs -t output/temp

Metal for iOS

python3 build_pmfx.py -shader_platform metal -metal_sdk iphoneos -metal_min_os 9.0 -shader_version 2.2 -i examples -o output/bin -h output/structs -t output/temp

SPIR-V for Vulkan

python3 build_pmfx.py -shader_platform spirv -i examples -o output/bin -h output/structs -t output/temp

HLSL for Direct3D11

python3 build_pmfx.py -shader_platform hlsl -shader_version 4_0 -i examples -o output/bin -h output/structs -t output/temp

GLSL

python3 build_pmfx.py -shader_platform glsl -shader_version 330 -i examples -o output/bin -h output/structs -t output/temp
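
If you build for several targets, the invocations above can be assembled from a script. The following is a minimal sketch, assuming build_pmfx.py sits in the working directory. The helper name pmfx_command is hypothetical, and only flags that appear in the examples above are used.

```python
import sys

def pmfx_command(platform, version=None, extra=()):
    """Return an argv list for build_pmfx.py (hypothetical helper)."""
    cmd = [sys.executable, "build_pmfx.py", "-shader_platform", platform]
    if version is not None:
        cmd += ["-shader_version", version]
    cmd += list(extra)
    # input, output, header and temp directories as used in the examples above
    cmd += ["-i", "examples", "-o", "output/bin",
            "-h", "output/structs", "-t", "output/temp"]
    return cmd

commands = [
    pmfx_command("hlsl", "4_0"),
    pmfx_command("glsl", "330"),
    pmfx_command("spirv"),
]

for cmd in commands:
    # print only; hand cmd to subprocess.run(cmd, check=True) to actually build
    print(" ".join(cmd[1:]))
```

Keeping the arguments as a list rather than a single string avoids shell quoting problems if a directory name contains spaces.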

Usage

Use mostly HLSL syntax for shaders, with some small differences:

Always use structs for inputs and outputs.

struct vs_input
{
    float4 position : POSITION;
};

struct vs_output
{
    float4 position : SV_POSITION0;
};

vs_output vs_main( vs_input input )
{
    vs_output output;
    
    output.position = input.position;
    
    return output;
}

Supported semantics and sizes

POSITION     // 32bit float
TEXCOORD     // 32bit float
NORMAL       // 32bit float
TANGENT      // 32bit float
BITANGENT    // 32bit float
BLENDWEIGHTS // 32bit float
COLOR        // 8bit unsigned int
BLENDINDICES // 8bit unsigned int

Shader resources

Due to fundamental differences across shader languages, shader resource declarations and access have a syntax unique to pmfx. Define a block of shader_resources to allow global textures or buffers as supported in HLSL and GLSL.

shader_resources
{
    texture_2d( diffuse_texture, 0 );
    texture_2dms( float4, 2, texture_msaa_2, 0 );
};

Resource types

// texture types
texture_2d( sampler_name, layout_index );
texture_2dms( type, samples, sampler_name, layout_index );
texture_2d_array( sampler_name, layout_index );
texture_cube( sampler_name, layout_index );
texture_cube_array( sampler_name, layout_index ); // requires sm 4+, gles 400+
texture_3d( sampler_name, layout_index );
texture_2d_external( sampler_name, layout_index ); // gles specific extension

// depth formats are required for sampler compare ops
depth_2d( sampler_name, layout_index ); 
depth_2d_array( sampler_name, layout_index );
depth_cube( sampler_name, layout_index ); 
depth_cube_array( sampler_name, layout_index );

// compute shader texture types
texture_2d_r( image_name, layout_index );
texture_2d_w( image_name, layout_index );
texture_2d_rw( image_name, layout_index );
texture_3d_r( image_name, layout_index );
texture_3d_w( image_name, layout_index );
texture_3d_rw( image_name, layout_index );
texture_2d_array_r( image_name, layout_index );
texture_2d_array_w( image_name, layout_index );
texture_2d_array_rw( image_name, layout_index );

// compute shader buffer types
structured_buffer( type, name, index );
structured_buffer_rw( type, name, index );
atomic_counter(name, index);

Accessing resources

// sample texture
float4 col = sample_texture( diffuse_texture, texcoord.xy );
float4 cube = sample_texture( cubemap_texture, normal.xyz );
float4 msaa_sample = sample_texture_2dms( msaa_texture, x, y, fragment );
float4 level = sample_texture_level( texture, texcoord.xy, mip_level);
float4 array = sample_texture_array( texture, texcoord.xy, array_slice);
float4 array_level = sample_texture_array_level( texture, texcoord.xy, array_slice, mip_level);

// sample compare
float shadow = sample_depth_compare( shadow_map, texcoord.xy, compare_ref);
float shadow_array = sample_depth_compare_array( shadow_map, texcoord.xy, array_slice, compare_ref);
float cube_shadow = sample_depth_compare_cube( shadow_map, texcoord.xyz, compare_ref);
float cube_shadow_array = sample_depth_compare_cube_array( shadow_map, texcoord.xyz, array_slice, compare_ref);

// compute rw texture
float4 rwtex = read_texture( tex_rw, gid );
write_texture( tex_rw, val, gid );

// compute structured buffer
struct val = structured_buffer[gid]; // read
structured_buffer[gid] = val;        // write

cbuffers

cbuffers are a unique kind of resource simply because they are so in HLSL. You can use cbuffers as you normally do in HLSL.

cbuffer per_view : register(b0)
{
    float4x4 view_matrix;
};

cbuffer per_draw_call : register(b1)
{
    float4x4 world_matrix;
};

vs_output vs_main( vs_input input )
{
    vs_output output;
    
    float4 world_pos = mul(input.position, world_matrix);
    output.position = mul(world_pos, view_matrix);
    
    return output;
}

GLES 2.0 / GLSL 2.0 cbuffers

cbuffers are emulated for older glsl versions: a cbuffer is packed into a single float4 array. The uniform float4 array (set with glUniform4fv) is named after the cbuffer, and you can find the uniform location from this name using glGetUniformLocation. Only float4 and float4x4 members are supported; the count of the float4 array is the number of members in the cbuffer, with each float4x4 counting for 4 array elements. You can use the generated c++ structs from pmfx to create a coherent copy of the uniform data on the cpu.
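
The array length described above can be worked out by hand or with a few lines of script. This is a minimal sketch of the counting rule only, not pmfx's own code; the function name and the camera_pos member are hypothetical.

```python
# number of float4 array elements each supported member type occupies
FLOAT4_SLOTS = {"float4": 1, "float4x4": 4}

def emulated_cbuffer_count(members):
    """members: list of (type, name) pairs; returns the float4 array length."""
    count = 0
    for member_type, name in members:
        if member_type not in FLOAT4_SLOTS:
            raise ValueError(f"unsupported member for emulation: {member_type} {name}")
        count += FLOAT4_SLOTS[member_type]
    return count

# hypothetical cbuffer: one matrix plus one vector
per_view = [("float4x4", "view_matrix"), ("float4", "camera_pos")]
print(emulated_cbuffer_count(per_view))  # 5
```

The result is the count you would pass to glUniform4fv when uploading the whole cbuffer in one call.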

Atomic Operations

Support for glsl, hlsl and metal compatible atomics and memory barriers is available. The atomic_counter resource type is a RWStructuredBuffer in hlsl, an atomic_uint read/write buffer in Metal and a uniform atomic_uint in GLSL.

// types
atomic_uint u;
atomic_int i;

// operations
atomic_load(atomic, original)
atomic_store(atomic, value)
atomic_increment(atomic, original)
atomic_decrement(atomic, original)
atomic_add(atomic, value, original)
atomic_subtract(atomic, value, original)
atomic_min(atomic, value, original)
atomic_max(atomic, value, original)
atomic_and(atomic, value, original)
atomic_or(atomic, value, original)
atomic_xor(atomic, value, original)
atomic_exchange(atomic, value, original)
threadgroup_barrier()
device_barrier()

// usage
shader_resources
{
    atomic_counter(counter, 0); // counter bound to index 0
}

// increments counter and stores the original value in 'index'
uint index = 0;
atomic_increment(counter, index);

Includes

Include files are supported even though some shader platforms or versions may not support them natively.

#include "libs/lighting.pmfx"
#include "libs/skinning.pmfx"
#include "libs/globals.pmfx"
#include "libs/sdf.pmfx"
#include "libs/area_lights.pmfx"

Extensions

To enable glsl extensions, pass a list of strings to the -extensions commandline argument. Each glsl extension will be inserted at the top of the generated code with : enabled set:

-extensions GL_OES_EGL_image_external GL_OES_get_program_binary

Unique pmfx features

cbuffer_offset / texture_offset

HLSL has different registers for textures, vertex buffers, cbuffers and un-ordered access views. Metal and Vulkan have some differences where the register indices are shared across different resource types. To avoid collisions in different API backends you can supply offsets using the following command line options.

Metal: -cbuffer_offset (cbuffers start binding at this offset to allow vertex buffers to be bound to the slots prior to these offsets)

Vulkan: -texture_offset (textures start binding at this point allowing uniform buffers to bind to the prior slots)
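
As a rough illustration of what the offsets do, the sketch below assumes the offset is simply added to the index declared in the shader. That is an assumption made for illustration rather than a description of pmfx internals, and both function names are hypothetical. The default values are the ones listed in the command line help.

```python
def metal_cbuffer_slot(register_index, cbuffer_offset=4):
    """Buffer slot a cbuffer at register(bN) would use on Metal (assumed: N + offset)."""
    return register_index + cbuffer_offset

def vulkan_texture_binding(texture_unit, texture_offset=32):
    """Binding a texture at the given unit would use on Vulkan (assumed: unit + offset)."""
    return texture_unit + texture_offset

# with the defaults, vertex buffers can occupy Metal slots 0-3
print(metal_cbuffer_slot(0))      # 4
print(metal_cbuffer_slot(1))      # 5
# and uniform buffers can occupy Vulkan bindings 0-31
print(vulkan_texture_binding(0))  # 32
```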

v_flip

OpenGL uses different conventions for viewport coordinates and texture coordinates, so rendering to the backbuffer and rendering into a render target can produce results that are flipped in the y-axis. This can propagate its way far into a code base, with conditional "v_flips" happening during different render passes.

To solve this issue in a cross platform way, pmfx will expose a uniform bool called "v_flip" in all gl vertex shaders, this allows you to conditionally flip the y-coordinate when rendering to the backbuffer or not.

To make this work make sure you also change the winding glFrontFace(GL_CCW) to glFrontFace(GL_CW).

cbuffer padding

HLSL/Direct3D requires cbuffers to be padded to 16-byte alignment. pmfx allows you to create cbuffers of any size and will pad the rest out for you.
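
The arithmetic behind the padding is a round-up to the next multiple of 16. The sketch below shows that calculation only; it is not taken from pmfx, and the function name is hypothetical.

```python
def padded_size(size_bytes, alignment=16):
    """Round size_bytes up to the next multiple of alignment."""
    return (size_bytes + alignment - 1) // alignment * alignment

# a cbuffer holding a float4x4 (64 bytes) and a single float (4 bytes)
size = 64 + 4
print(padded_size(size))         # 80
print(padded_size(size) - size)  # 12 bytes of padding
```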

Techniques

A single .pmfx file can contain multiple shader functions so you can share functionality. You can define a block of jsn in the shader to configure techniques (jsn is a more lenient and user-friendly data format similar to json).

Simply specify vs, ps or cs to select which function in the source to use for that shader stage. If no pmfx: json block is found you can still supply vs_main and ps_main which will be output as a technique named "default".

pmfx:
{    
    gbuffer:
    {
        vs: vs_main,
        ps: ps_gbuffer
    },
        
    zonly:
    {
        vs: vs_main_zonly,
        ps: ps_null
    },
}

You can also use json to specify technique constants with a range and ui type, so you can later hook them into a gui:

constants:
{
    albedo      : { type: float4, widget: colour, default: [1.0, 1.0, 1.0, 1.0] },
    roughness   : { type: float, widget: slider, min: 0, max: 1, default: 0.5 },
    reflectivity: { type: float, widget: slider, min: 0, max: 1, default: 0.3 },
}

pmfx constants

Technique constants are accessed in shader code with an m_ prefix.

ps_output ps_main(vs_output input)
{
    float4 col = m_albedo;
}

Inherit

You can inherit techniques by using the jsn inherit feature.

gbuffer(forward_lit):
{
    vs: vs_main,
    ps: ps_gbuffer,

    permutations:
    {
        SKINNED: [31, [0,1]],
        INSTANCED: [30, [0,1]],
        UV_SCALE: [1, [0,1]]
    }
},

Here gbuffer inherits from forward_lit; the base class is specified by putting it inside brackets.

Permutations

Permutations provide uber-shader style compile-time branch evaluation, generating optimal shaders while still letting you share as much code as possible. The pmfx block is used here again: you specify permutations inside a technique.

permutations:
{
    SKINNED: [31, [0,1]],
    INSTANCED: [30, [0,1]],
    UV_SCALE: [1, [0,1]]
}

The first value is a bit shift that can be checked against, so SKINNED is 1<<31 and UV_SCALE is 1<<1. The second value is the list of options; in the example above each permutation is simply on or off, but you could have a quality level of 0-5, for instance.
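
To make the bit shifts concrete, the sketch below computes the mask for each permutation in the example above and combines two of them into a single value. The dictionary layout and function name are illustrative only.

```python
# name -> (bit shift, options), mirroring the permutations block above
permutations = {
    "SKINNED": (31, [0, 1]),
    "INSTANCED": (30, [0, 1]),
    "UV_SCALE": (1, [0, 1]),
}

def permutation_mask(name):
    """Return the flag bit for a permutation: 1 shifted left by its shift value."""
    shift, _options = permutations[name]
    return 1 << shift

for name in permutations:
    print(name, permutation_mask(name))

# flags for a draw call that is skinned and uses uv scale
wanted = permutation_mask("SKINNED") | permutation_mask("UV_SCALE")
print(wanted)
```

The masks printed for SKINNED and INSTANCED are 2147483648 and 1073741824, the same values that appear as defines in the generated header.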

To insert a compile-time evaluated branch in code, use a colon after if / else:

if:(SKINNED)
{
    float4 sp = skin_pos(input.position, input.blend_weights, input.blend_indices);
    output.position = mul( sp, vp_matrix );
}
else:
{
    output.position = mul( input.position, wvp );
}

For each permutation a shader is generated with the technique plus the permutation id. The id is generated from the values passed in the permutation object.

Adding permutations can cause the number of generated shaders to grow exponentially. pmfx detects redundant shader combinations using md5 hashing, so that duplicate permutation combinations are re-used and unnecessary compilation is avoided.
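
The idea of hashing generated source to find duplicates can be sketched in a few lines. This is an illustration under stated assumptions, not pmfx's implementation: generate_source is a hypothetical stand-in for code generation in which only SKINNED changes the output, and the permutation id is assumed to be each option value shifted into place and OR'd together.

```python
import hashlib
from itertools import product

permutations = {"SKINNED": (31, [0, 1]), "UV_SCALE": (1, [0, 1])}

def generate_source(flags):
    """Stand-in for code generation: only SKINNED affects the result here."""
    if flags["SKINNED"]:
        return "output.position = mul( sp, vp_matrix );"
    return "output.position = mul( input.position, wvp );"

names = list(permutations)
unique = {}   # md5 digest -> first permutation id that produced that source
aliases = {}  # permutation id -> id of the compiled shader to re-use

for values in product(*(permutations[n][1] for n in names)):
    flags = dict(zip(names, values))
    pid = 0
    for n, v in flags.items():
        pid |= v << permutations[n][0]  # assumed id layout
    digest = hashlib.md5(generate_source(flags).encode("utf-8")).hexdigest()
    # the first id seen for a digest is compiled, later ones point back to it
    aliases[pid] = unique.setdefault(digest, pid)

print(len(aliases), "permutations,", len(unique), "unique shaders")
```

In this toy case four permutations collapse to two compiled shaders, because UV_SCALE never changes the generated source.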

C++ Header

After compilation a header is output for each .pmfx file containing c struct declarations for the cbuffers, technique constant buffers and vertex inputs. You can use these structs to fill buffers in your c++ code and use sizeof for buffer update calls in your graphics api.

It also contains defines for the shader permutation id / flags that you can check and test against to select the correct shader permutation for a draw call (e.g. skinned, instanced, etc.).

namespace debug
{
    struct per_pass_view
    {
        float4x4 view_projection_matrix;
        float4x4 view_matrix;
    };
    struct per_pass_view_2d
    {
        float4x4 projection_matrix;
        float4 user_data;
    };
    #define OMNI_SHADOW_SKINNED 2147483648
    #define OMNI_SHADOW_INSTANCED 1073741824
    #define FORWARD_LIT_SKINNED 2147483648
    #define FORWARD_LIT_INSTANCED 1073741824
    #define FORWARD_LIT_UV_SCALE 2
    #define FORWARD_LIT_SSS 4
    #define FORWARD_LIT_SDF_SHADOW 8
}

JSON Reflection Info

Each .pmfx file is accompanied by a json file containing reflection info. This info contains the locations textures / buffers are bound to, the size of structs, a vertex layout description and more. Note that the output reflection info is fully compliant json, not the lightweight jsn, because json is more widely supported.

"texture_sampler_bindings": [
    {
        "name": "gbuffer_albedo",
        "data_type": "float4",
        "fragments": 1,
        "type": "texture_2d",
        "unit": 0
    }]
   
"vs_inputs": [
    {
        "name": "position",
        "semantic_index": 0,
        "semantic_id": 1,
        "size": 16,
        "element_size": 4,
        "num_elements": 4,
        "offset": 0
    }]
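
Because the reflection info is plain json, it can be read with any json library. The sketch below is a minimal example that derives a vertex stride from vs_inputs. The enclosing object and the second (texcoord) entry, including its semantic_id, are assumptions added for illustration; the position entry is copied from the fragment above.

```python
import json

reflection_text = """
{
    "vs_inputs": [
        {"name": "position", "semantic_index": 0, "semantic_id": 1,
         "size": 16, "element_size": 4, "num_elements": 4, "offset": 0},
        {"name": "texcoord", "semantic_index": 0, "semantic_id": 2,
         "size": 16, "element_size": 4, "num_elements": 4, "offset": 16}
    ]
}
"""

reflection = json.loads(reflection_text)
inputs = reflection["vs_inputs"]

# each input ends at offset + size; the furthest end is the vertex stride
stride = max(i["offset"] + i["size"] for i in inputs)

for i in inputs:
    # in the fragment above, size equals element_size * num_elements
    assert i["size"] == i["element_size"] * i["num_elements"]
    print(i["name"], "at byte", i["offset"])

print("vertex stride:", stride)
```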
Comments
  • GLSL 330, GLES 310 (WebGL/macOS) - Sampling texture array inside loop causes undefined behaviour.

    When sampling a texture array, texture cube array or shadow array inside a loop, OpenGL exhibits undefined behaviour. I have tried using textureGrad, textureLod and replacing sampleShadow with a simple compare, and all of these eventually end up causing undefined behaviour...

    macOS can compile and run the shadow_maps sample in pmtech, but sometimes the lighting pass will result in no triangles being rasterised, with no warnings or errors. Compiling the program and running again can sometimes fix it, but it will eventually return to nothing being rendered. This sample was previously working on an older machine (MacBook Pro 2017); my current machine (MacBook Pro 2019) has never run the samples correctly when texture arrays are involved. I am not sure if it is to do with the GPU or the macOS version.

    I am not sure how to get around the issue, the same machine can do all of the desired behaviour using metal which indicates the GPU can support it.

    For the time being GLSL shaders built from macOS will omit any texture array calls of all flavours and return 0 for colour reads and 1 for compares, this fixes the OpenGL samples for macOS.

  • Experimental Version 2.0

    Version 2.0 is now working for HLSL, you can specify a full pipeline state object, auto generate vertex layout and other useful features.

    Version 1.0 is still supported through the -v1 flag.

Owner
Alex Dixon
gamedev, low level graphics and video.