A CUDA-accelerated cloth simulation engine based on Extended Position Based Dynamics (XPBD).

Overview

Velvet

Velvet is a CUDA-accelerated cloth simulation engine based on Extended Position Based Dynamics (XPBD).

screen-shot

Why another cloth simulator?

There are already lots of cloth simulation project around. But most of them are executed on the CPU, thus suffers from low performance.

Flex is a great GPU-accelerated physics engine which is developed by the authors of Position Based Dynamics, but the project close-sourced their physics simulation code.

It's a non-trivial task to implement cloth simulation on GPU. There are many more issues that need to solved and their solutions aren't easy to find.

This project intend to reproduce the behavior of Flex and open-source our solutions. We hope that this project can make it less painful for other developers to learn GPU cloth simulation.

Features

  • Constraints
    • Attach constraints
    • Long range attachment constraints
    • Stretch constraints
    • Bending constriants
  • Collisions
    • SDF collision
    • Particle collision
    • Spatial hash neighbor finding
  • OpenGL rendering

gif-2022-04-19

Quick Start

Velvet is built using Visual Studio 2019, C++ 17 and CUDA 11.1. Extra dependencies includes:

  • glfw3: For window managing
  • glad: For OpenGL
  • fmt: For better printing functionality
  • glm: For 3D mathematics
  • assimp: For model loading
  • imgui[core, opengl3-binding, glfw-binding]: For graphical user interface

Executable is provided and can be launched using run_velvet.bat.

If you want to build from source by yourself, dependencies can be installed using vcpkg:

./vcpkg.exe install glfw3:x64-windows
./vcpkg.exe install glad:x64-windows
./vcpkg.exe install fmt:x64-windows
./vcpkg.exe install glm:x64-windows
./vcpkg.exe install assimp:x64-windows
./vcpkg.exe install imgui[core, opengl3-binding, glfw-binding]:x64-windows

Implementation Details

In computer graphics, building your own wheel can often be unevitable. But what fears most is that sometimes you don't even have recipe for the wheel you want to build. There are lots of great paper describing their methods, but many of the implementation details are left out or scattered across the internet.

The following provides some implementation details on GPU cloth simulation that may help you in someway. Most of the related code can be found in VtClothSolverGPU.[hpp, cuh, cu].

1. Simulation pipeline

The methods and tricks of PBD is scattered across many papers and resources. Here is the summed up pipeline:

CollideSDF(); // Collide particles with Signed Distance Fields (Q1)
for (int substep = 0; substep < numSubsteps; substep++) // Reference: [Small steps in physics simulation, 2019]
{
    PredictPositions();           // Predict particle positions
    FindNeighborsBySpatialHash(); // Use spatial hashing to find particle neighbors
    
    // Handle Collisions
    CollideParticles();           // Collide particles with neighboring particles
    CollideSDF();                 // Collide particles with Signed Distance Fields
	
    // Solve constraints
    for (int iteration = 0; iteration < numIterations; iteration++)
    {
        SolveStretch();           // Stretch constraint keeps the distance between two particles
        SolveAttachment();        // Attachment constraint keeps the distance between a particle and a 3D position
        // SolveBending();        // Bending constraints doesn't work well on GPU (Q2)
        ApplyDeltas();            // Apply position deltas accumulated in previous constraint solving (Q3)
    }
    
    Finalize();                   // Update velocities and positions
}
ComputeNormals();                 // Compute vertex normals for rendering
  • Q1: Why CollideSDF is executed first?
    • Kinematic objects (objects controlled by animation, user input, etc.) can move really fast, and if particles collide with them inside PBD loop, particles will end up with really large velocities and produce unstable behavior (mostly large vibration). Therefore, we include a pre-stabilization pass (which useCollideSDF in our project). This pass update the particle position $x_i$ directly, therefore will not influence particle velocity.
    • Reference: [Unified particle physics for real-time applications, 2014, Section 4.4 Initial Conditions]
  • Q2: Why not SolveBending?
    • In my experience, bending constraints don't work well with Jacobi iteration. Using a small bending compliance is really difficult to converge, and results in particle clustering or vibration. Using large bending compliance makes little influence on overall visual results. Also, bending constraints are more expensive than other constraints. For your information, Flex also exclude bending constraints.
  • Q3: What is ApplyDeltas?
    • In GPU implementation, we normally replace the Gauss-Seidel iteration used in CPU version with Jacobi iteration for better parallization. The downside of Jacobi iteraiton is its slow convergence. If we directly update the predicted positions, the particles will vibrate. To address this issue, we accumulated the position changes of all constraints into another variable deltas, and apply the averaged value to predicted positions in the end.
    • Reference: [Unified particle physics for real-time applications, 2014, Section 4.3 Successive Over-Relaxation]

2. Collision Filtering for Stable Particle Collision

In Position Based Dynamics, particle collision is an easy way to implement cloth self collision. However, the particle size can be difficult to choose. A small particle size leads to gaps inside cloth geometry, which results in particle tunneling and bumpy surface. A large particle size makes the cloth really thick and neighboring particles may constantly collide, which may results in erroneous buckling or folding.

In this project, we use a relatively large particle size (1.5x spring rest length), and employ collision filtering technique to ignore collisions between particles that are initially close.

A naive way to achieve collision filtering is to store all the pair of particles that are connected by springs (stretch constraints). But this data structure can be quite inefficient to query on GPU.

Our method is to store the initial particles positions in a memory buffer initialPositions. During particle neighbor finding phase, we can easily determine whether particles are initially close by examine their initial distances using initialPositions.

Note that in industrial usage, simulators rarely use particles for collision. It's difficult to control the cloth thickness and the surface can be quite bumpy. But for simplicity's sake, particle collision is a great choice.

3. Avoid Stretchy Cloth by Long Range Attachments

A common problem you will notice after porting a CPU cloth solver to GPU is that cloth becomes really stretchy. This is caused by the slower convergence rate from using Jacobi iteration.

Of course you can just increase the number of substeps and iterations, but this way the performance will degrade really fast.

A relatively cheap way to address this problem is to use Long Range Attachments (Reference: Long range attachments - A method to simulate inextensible clothing in computer games, 2012).

The LRA is quite simple. In the original PBD, when solving an attachment constraint, you modify the attached particle's position only. With Long Range Attachment, all particles are required to stay within a distance from the attached point. This mechanism can be easily implemented by adding another attribute attachDistances into attachement constraints.

4. Performance Optimizations

4.1 Atomic updates

Many kernels in GPU cloth simulation use atomic updates to avoid conflict between different threads. Below is a useful method to improve the atomic performance by including a reorder parameter.

The atomic updates for the x,y,z components of a vector is reordered based on the reorder value. This decreases the chances by 2/3 that multiple threads are updating the same memory location at the same time.

The value of reorder can be choosed easily, for example stretch constraints can use the sum of two particle's ID.

__device__ inline void AtomicAdd(glm::vec3* address, int index, glm::vec3 val, int reorder)
{
    int r1 = reorder % 3;
    int r2 = (reorder + 1) % 3;
    int r3 = (reorder + 2) % 3;
    atomicAdd(&(address[index].x) + r1, val[r1]);
    atomicAdd(&(address[index].x) + r2, val[r2]);
    atomicAdd(&(address[index].x) + r3, val[r3]);
}

4.2 Spatial hashing

We follow the CUDA sample project particles for spatial hashing implementation. There are some easy optimizations that can be made to improve its performance:

  • Sort Algorithm

    The particles project use thrust::sort_by_key for sorting. In my experience, thrust's performance aren't that great. By replacing this with cub::DeviceRadiusSort::SortPairs, the sorting performance can be improved by roughly 50%.

  • Neighbor Caching

    Spatial hashing are expensive to compute, but we don't need to update the spatial hashing every time we do particle collisions. The neighbor information can be stored in an neighbors array and used for multiple times. In this project, we use a integer parameter interleavedHash to control how many times neighbors are reused before recompute.

    The neighbors array are structured such that neighbors[i + numMaxNeighborsPerParticle * j] stores the j-th neighbor for particle i. It's not recommended to organize neighbors as neighbors[i * numMaxNeighborsPerParticle + j]since this disallows coalesced memory access.

4.3 Particle Rendering

Render particles make it easier to debug. However, even with instanced rendering, it can be get quite slow to render more than 100k particles (On my laptop with RTX2060, 50k particles can't make it to 60FPS).

A more efficient way is screen space rendering [Reference: Screen Space Fluid Rendering for Games]. Particles are rendered in screen space as billboard quads using geometry shader. Then the normals and depth of the pixels are mannually updated to produce the illusion of spheres. This is not physically accurate but much more efficient.

You might also like...
imGuIZMO.quat is a ImGui widget: like a trackball it provides a way to rotate models, lights, or objects with mouse, and graphically visualize their position in space, also around any single axis (Shift/Ctrl/Alt/Super)
imGuIZMO.quat is a ImGui widget: like a trackball it provides a way to rotate models, lights, or objects with mouse, and graphically visualize their position in space, also around any single axis (Shift/Ctrl/Alt/Super)

imGuIZMO.quat v3.0 imGuIZMO.quat is a ImGui widget: like a trackball it provides a way to rotate models, lights, or objects with mouse, and graphicall

Plot airfield from a file containing the list of airfield of Germany and their position (latitude/longitude)
Plot airfield from a file containing the list of airfield of Germany and their position (latitude/longitude)

Plot aerodromes from a file containing the list of aerodromes of Germany and their position (latitude/longitude)

Extended kalman filter implementation.
Extended kalman filter implementation.

EKF (Extended Kalman Filter) This project is a C++ implementation of EKF.For the related principles of EKF, please check this tutorial (TODO). Project

Obfuscator refactored and extended from OLLVM.
Obfuscator refactored and extended from OLLVM.

OLLVM++ Obfuscator refactored and extended from OLLVM. Environment Ubuntu 18.04.5 LTS LLVM 12.0.1 Clang 12.0.1 CMake 3.21.1 Usage Compile Obfuscation

Half-Life : Extended main branch for developing purposes
Half-Life : Extended main branch for developing purposes

Half Life : Extended SDK Source Code of Half Life : Extended as a open source modbase for everyone publicly, make your own mod with alot of features e

Xtl - eXtended Template Library

eXtended Template Library Open Hub Linux Windows Coverage Technical Debt Code Quality License Contribute with Gratipay Contribute with Beerpay View th

Invariant-ekf - C++ library to implement invariant extended Kalman filtering for aided inertial navigation.
Invariant-ekf - C++ library to implement invariant extended Kalman filtering for aided inertial navigation.

inekf This repository contains a C++ library that implements an invariant extended Kalman filter (InEKF) for 3D aided inertial navigation. This filter

A small XM (FastTracker II Extended Module) player library.

libxm A small XM (FastTracker II Extended Module) player library. Main features: Small size in mind; many features can be disabled at compile-time, or

VID-Fusion: Robust Visual-Inertial-Dynamics Odometry for Accurate External Force Estimation
VID-Fusion: Robust Visual-Inertial-Dynamics Odometry for Accurate External Force Estimation

VID-Fusion VID-Fusion: Robust Visual-Inertial-Dynamics Odometry for Accurate External Force Estimation Authors: Ziming Ding , Tiankai Yang, Kunyi Zhan

Comments
Owner
Vital Chen
World famous bug breeder
Vital Chen
Improved version of real-time physics engine that couples FEM-based deformables and rigid body dynamics

Enhanced version of coupled FEM and constrained rigid body simulation Description This little playground aimed to test our Conjugate Gradients based M

Andrey Voroshilov 25 Apr 11, 2022
The core engine forked from NVidia's Q2RTX. Heavily modified and extended to allow for a nicer experience all-round.

Polyhedron - A Q2RTX Fork A fork of the famous Q2RTX project by NVIDIA™ that strives to improve all of its other factors of what was once upon a time

Polyhedron Studio 21 Dec 22, 2022
A PIC/FLIP fluid simulation based on the methods found in Robert Bridson's "Fluid Simulation for Computer Graphics"

GridFluidSim3d This program is an implementation of a PIC/FLIP liquid fluid simulation written in C++11 based on methods described in Robert Bridson's

Ryan Guy 727 Dec 8, 2022
A cheap,simple,Ongeki controller Use Keyboard Simulation and Mouse Simulation to controller the ongeki game. Using Pro-micro control.

N.A.G.E.K.I. A cheap,simple,Ongeki controller Use Keyboard Simulation and Mouse Simulation to controller the ongeki game. Using Pro-micro control. 中文版

NanaNana 39 Dec 8, 2022
A cheap,simple,Ongeki controller Use Keyboard Simulation and Mouse Simulation to controller the ongeki game. Using Pro-micro control.

N.A.G.E.K.I. PLEASE CHECK Main Project A cheap,simple,Ongeki controller Use Keyboard Simulation and Mouse Simulation to controller the ongeki game. Us

NanaNana 11 Dec 30, 2021
Building and Executing Position Independent Shellcode from Object Files in Memory

PIC-Privileges Building and Executing Position Independent Shellcode from Object Files in Memory. This is a pingback to the blogpost I wrote at https:

Paranoid Ninja 108 Dec 26, 2022
I modified the colmap,when it reconstructs from known pose ,only let it optimize rotation ,fixing position!

Mapping-base-lidar-pose-or-vslam-pose I simply modified the colmap,when it reconstructs from known pose ,only let it optimize rotation ,fixing positio

李城(大橙子) 23 Dec 16, 2022
PUBG ESP Hack for Emulator using C++ code. Player Position, Bones, Loots, Weapons, Vehicles, Boxes ... etc.

PUBG 1.7 ESP Hack for Emulator (C++ Source Code) PUBG ESP Hack for Emulator using C++ code. Player Position, Bones, Loots, Weapons, Vehicles, Boxes ..

Zero One Billion 38 Jan 5, 2023
Convenient, high-performance RGB color and position control for console output

Oof (omnipotent output friend) It's common for C++ programs to write output to the console. But consoles are far more capable than what they are usual

Sebastian Werhausen 776 Dec 27, 2022
Scroll pos - Provides some additional functions to ScrollController to define item position relative to the screen.

Scroll Position Provides some additional functions to ScrollController to define item position relative to the screen. A live version is available her

Kevin Vuilleumier 11 Nov 13, 2022