Algae: First steps

Partial shader JIT compilation

Siebencorgie published on
12 min, 2246 words

Categories: Sdf Compiler

Overview

In my previous post I closed with the realization that a GPU-based interpreter works, but is too slow to create complex signed distance fields. Another shortcoming of the former approach was the linear concatenation of operations. This made it difficult to work with operations like infinite repetition, since only "everything so far" or nothing could be repeated.

The new approach was already teased as

Can I somehow use the compiler at runtime to inline my SDF directly into the shader code?

While working on the basic concept I realized that using Rustc / Rust-Gpu would not give me the control I'd like. Therefore, the shader creation is split into two parts.

  1. Compile-time (via Rustc / Rust-Gpu)
  2. Runtime code injection (via Algae)

Step one happens at compile time, as with any Rust shader. It emits a valid SpirV module that could be used as is. The resulting shader module contains an injection function that serves as the entry point when injecting new code at runtime via Algae. The whole injection point definition is abstracted into a function-like proc-macro (algae_gpu::algae_inject).

This injection point allows the injected code to be tied to the rest of the shader easily. Supplied variables could either be sourced from push constants or be runtime parameters of the shader. In the example below the coord variable is a per-shader pixel coordinate and offset is read from a push constant. The actual test shader can be found here.

CPU-side rust code is now free to define a possibly complex operation that is injected at this point in the shader.

Before the technical implementation is discussed it makes sense to show the usage from the user's perspective.

Usage

Rust-Gpu Shader

GPU-side inject function definition and parameter handling: The algae_inject macro must return a valid function. The parameter names are used for later recognition when injecting a function. Again, a full example of this code can be found here. The usage of the macro might change in the future.


///Defines an injection function with two runtime parameters, as well as 
///default function if nothing is injected.
algae_gpu::algae_inject!(|coord: Vec2, offset: Vec2| -> f32 {
    let a = coord.value + offset.value; //our default function, could be returning 0.0 as well.
    a.dot(a)
});

//...

fn main(..){

//... normal rust-gpu shader code, at some point we want to evaluate the function injected by
//algae:

let result: f32 = algae_inject(coord, Vec2::from(push.offset));

//...
}

Example

The application loads a SpirV module from source (or bytes) first and searches for an injection function.

The application is free to define any function that fulfills basic compile time checks (mostly type safety). This is done through the Operation trait. This function can then be injected at runtime. The AlgaeJit returns the final shader byte-code that is used for pipeline creation.

//create a JIT instance for the SpirV module.
let mut compiler = AlgaeJit::new("resources/test_shader.spv").unwrap();

//Define a function that offsets `coord` based on `offset` and returns the signed distance from the
// offset `coord` to a circle with a constant radius=100
let mut function = Subtraction {
    minuent: Box::new(Length {
        inner: Box::new(Addition{
            a: Box::new(Variable::new("coord", Vec2::new(0.0, 0.0))),
            b: Box::new(Variable::new("offset", Vec2::new(0.0, 0.0)))
        }),        
    }),
    subtrahend: Box::new(Constant { value: 100.0 }),
};

//inject into the SpirV module. This can be done every time the function
//changes and should be *fast*.
compiler.injector().inject((), &mut function);

The example shown above injects a circle SDF at runtime whose position is controlled by an offset parameter. This parameter is defined by a push constant which is written by the application code. Since the parameter is defined as a variable, injection only has to take place once at application startup. The resulting graphics look like this:

[Figure: animated circle]
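
For comparison, the value the injected operation tree should produce can be written as a plain CPU function. This is only a reference sketch using arrays instead of glam's Vec2; the name circle_sdf is made up for illustration and is not part of Algae.

```rust
/// CPU reference of the injected function: signed distance from
/// `coord + offset` to a circle of radius 100 centred at the origin.
/// The result is negative inside the circle and positive outside.
fn circle_sdf(coord: [f32; 2], offset: [f32; 2]) -> f32 {
    // Addition node: move the coordinate by the offset.
    let x = coord[0] + offset[0];
    let y = coord[1] + offset[1];
    // Length node followed by the Subtraction of the constant radius.
    (x * x + y * y).sqrt() - 100.0
}
```

A function like this is handy for spot-checking single pixels of the rendered output against the expected distance.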

For most users it would be enough to understand algae up to this point. However, most people reading this will be interested in the implementation of all this, so keep reading :D.

Technical overview

Injection function

The injection macro takes care of creating the function which can be changed at run time as well as preparing runtime parameters to be recognizable in the resulting SpirV-bytecode.

The easiest way to recognize any function is of course to search for the function's definition in the byte code. In SpirV bytecode this looks something like this: %16 = OpFunction %24, which defines a callable function %16 with a return value of type %24. On its own, however, this definition has two problems. First, we don't know whether this is the correct float-returning function. Second, the function would probably be inlined when the code is optimized. We therefore add more context information: a no-inline attribute (DontInline in SpirV) and debug information that contains the function's name as a simple string.

The function name solution is not final, since it prevents the shader from being stripped of its debug information before being passed to the driver. However, for now it works well enough.
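
To make the lookup concrete, here is a minimal sketch of such a search over the raw SpirV words, without any helper crate. It assumes the module is already available as a slice of u32 words and that the debug name is stored in an OpName instruction; the function name find_injection_point is hypothetical, and Algae itself uses rspirv rather than a hand-written scan like this.

```rust
const OP_NAME: u32 = 5; // opcode of OpName
const OP_FUNCTION: u32 = 54; // opcode of OpFunction
const DONT_INLINE: u32 = 0x2; // DontInline bit of the function control mask

/// Decodes a SpirV literal string: UTF-8 bytes packed little-endian
/// into words, terminated by a zero byte.
fn decode_string(words: &[u32]) -> String {
    let bytes: Vec<u8> = words
        .iter()
        .flat_map(|w| w.to_le_bytes())
        .take_while(|b| *b != 0)
        .collect();
    String::from_utf8_lossy(&bytes).into_owned()
}

/// Returns the id of the function whose OpName equals `name`, provided
/// that function is declared with the DontInline function control.
fn find_injection_point(module: &[u32], name: &str) -> Option<u32> {
    let mut named_id = None;
    let mut pos = 5; // skip the five-word module header
    while pos < module.len() {
        // First word of each instruction: word count (high 16 bits), opcode (low 16 bits).
        let word_count = (module[pos] >> 16) as usize;
        let opcode = module[pos] & 0xffff;
        if word_count == 0 || pos + word_count > module.len() {
            return None; // malformed instruction stream
        }
        let inst = &module[pos..pos + word_count];
        match opcode {
            // OpName operands: target id, literal string.
            OP_NAME if word_count >= 3 && decode_string(&inst[2..]) == name => {
                named_id = Some(inst[1]);
            }
            // OpFunction operands: result type, result id, function control, function type.
            OP_FUNCTION if word_count == 5 && Some(inst[2]) == named_id => {
                return ((inst[3] & DONT_INLINE) != 0).then_some(inst[2]);
            }
            _ => {}
        }
        pos += word_count;
    }
    None
}
```

The scan relies on the SpirV module layout, in which debug instructions such as OpName appear before any function definition. It also shows why stripping debug information breaks the approach: without the OpName instruction there is nothing left to match against.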

The final function definition and call look like this:

Definition

%16 = OpFunction  %24  DontInline %32
%54 = OpFunctionParameter  %15
%55 = OpFunctionParameter  %15
//... whatever rust-gpu generated for the default function
OpReturnValue %67
OpFunctionEnd

Function call

%113 = OpCompositeConstruct  %15  %41 %94
%114 = OpCompositeConstruct  %15  %42 %99
%115 = OpFunctionCall  %24  %16 %113 %114

As you can see the function call parameters are constructed right before the actual call via OpCompositeConstruct. This is the second part I mentioned above: Parameter preparation.

Each parameter (in the example "coord" and "offset") is wrapped in a struct where the first field is a constant 32-bit hash of the name, and the second field is the actual runtime parameter.

This way each parameter can not only be identified at runtime by its type, but also its name. This makes it possible in the example to distinguish coord: Vec2 from offset: Vec2.
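
The wrapping can be pictured roughly as follows. This is a sketch under assumptions: the post does not say which hash function is used, so 32-bit FNV-1a stands in here, and the names name_hash and Tagged are invented for illustration.

```rust
/// 32-bit FNV-1a hash, usable in constant context.
/// Assumption: the real macro may use a different hash function.
const fn name_hash(name: &str) -> u32 {
    let bytes = name.as_bytes();
    let mut hash: u32 = 0x811c_9dc5; // FNV offset basis
    let mut i = 0;
    while i < bytes.len() {
        hash ^= bytes[i] as u32;
        hash = hash.wrapping_mul(0x0100_0193); // FNV prime
        i += 1;
    }
    hash
}

/// A runtime parameter tagged with the constant hash of its name.
/// The constant first field is what survives into the SpirV module.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Tagged<T> {
    name_hash: u32,
    value: T,
}

// Evaluated at compile time, so each tag ends up as a constant in the shader.
const COORD: u32 = name_hash("coord");
const OFFSET: u32 = name_hash("offset");
```

Because the hash is a compile-time constant, it shows up as a constant operand of the OpCompositeConstruct right before the call, which is what the JIT can look for.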

A similar marking mechanism could probably be used to identify the correct, not inlined function as well. But I did not find a reliable way yet that does not get stripped when using optimized SpirV.

Function interface analysis and runtime parameters

Since we left enough traces in the SpirV module, we just have to search for those traces when loading the byte code in AlgaeJit. Each parameter's type is analyzed at runtime to prevent type mismatches in the shader. Such mismatches are caught by Vulkan's validation layers, but I am not sure what would happen without those. It probably depends on the driver that is being used.
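
A simplified picture of that check, assuming the interface found in the module has already been reduced to a map from parameter name to type. SpvType, InterfaceError and check_variable are hypothetical names for this sketch, not Algae's actual types.

```rust
use std::collections::HashMap;

/// Small, hypothetical subset of the types a parameter can have.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SpvType {
    F32,
    Vec2,
    Vec3,
}

#[derive(Debug, PartialEq, Eq)]
enum InterfaceError {
    UnknownParameter(String),
    TypeMismatch { name: String, expected: SpvType, found: SpvType },
}

/// Checks that a variable used by the injected function exists in the
/// shader's interface and has the type the function expects.
fn check_variable(
    interface: &HashMap<String, SpvType>,
    name: &str,
    expected: SpvType,
) -> Result<(), InterfaceError> {
    match interface.get(name) {
        None => Err(InterfaceError::UnknownParameter(name.to_string())),
        Some(&found) if found != expected => Err(InterfaceError::TypeMismatch {
            name: name.to_string(),
            expected,
            found,
        }),
        Some(_) => Ok(()),
    }
}
```

Rejecting a mismatch on the CPU before the module reaches the driver means the outcome no longer depends on whether the validation layers are enabled.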

For interaction with the SpirV module and for constructing new byte code, rspirv is used. Thanks to Khyperia for pointing me to this crate, otherwise I might have rolled my own thing, which would have been much more tedious.

Function definition

Since the entry point is now well-defined it is time to talk about the actual function definition and byte-code construction.

At first, I wanted to create some kind of custom intermediate representation (IR) which is then serialized either to SpirV or to some other instruction set like x86 or RiscV. This would have allowed me to test the generated shaders on the CPU before testing them on a GPU, similar to what I did with Nako's interpreter implementation.

I decided against it since this would effectively mean designing a pretty big and complex IR that translates to SpirV, which in turn is also only an IR. This feels kind of redundant at the moment. Since I am only injecting a subset of SpirV anyway (no image operations for instance), it would be easier to write a small SpirV interpreter for the generated code.

Therefore, the only requirement for the function definition is that it is serializable into SpirV. The function itself is defined by a tree of operations. For instance the circle function is the operation tree:

Subtraction
├── Length
│   └── Addition
│       ├── Coord
│       └── Offset
└── Radius

In Rust this comes down to the Operation trait:

pub trait Operation {
    type Input;
    type Output;

    fn serialize(&mut self, serializer: &mut Serializer, input: Self::Input) -> Self::Output;
}

In most cases Input is some kind of jit-compile-time information or nothing, and Output is the variable-id of the result of this operation.

A variable ID is represented as DataId<T> where T is the Rust type of this ID's value, for instance f32 or glam::Vec2. This way the Rust compiler can check type safety, and implementations of the Operation trait can be made generic. The IntoSpvType trait allows turning Rust types into SpirV types, which is needed for the previously mentioned runtime type checking of variables. Currently, this is only implemented for some basic types like floats, integers and glam's Vec and Mat types. A derive macro for structs would enable custom structs to be used in Algae functions as well.
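
The interplay of the trait and typed IDs can be tried out with a toy version. In this sketch the Serializer records text instead of SpirV words, and DataId, Constant and Subtraction are simplified stand-ins written for this example; none of it is Algae's real implementation.

```rust
use std::marker::PhantomData;

/// Typed handle to a result id; `T` is the Rust type of the value behind it.
struct DataId<T> {
    id: u32,
    _ty: PhantomData<T>,
}

/// Toy serializer that records textual instructions instead of SpirV words.
struct Serializer {
    next_id: u32,
    code: Vec<String>,
}

impl Serializer {
    fn new() -> Self {
        Serializer { next_id: 1, code: Vec::new() }
    }

    /// Appends one instruction and hands back a typed id for its result.
    fn emit<T>(&mut self, text: String) -> DataId<T> {
        let id = self.next_id;
        self.next_id += 1;
        self.code.push(format!("%{} = {}", id, text));
        DataId { id, _ty: PhantomData }
    }
}

trait Operation {
    type Input;
    type Output;
    fn serialize(&mut self, serializer: &mut Serializer, input: Self::Input) -> Self::Output;
}

/// Boxed operation without input that yields a float.
type FloatOp = Box<dyn Operation<Input = (), Output = DataId<f32>>>;

struct Constant {
    value: f32,
}

impl Operation for Constant {
    type Input = ();
    type Output = DataId<f32>;
    fn serialize(&mut self, serializer: &mut Serializer, _input: ()) -> DataId<f32> {
        serializer.emit(format!("OpConstant {:?}", self.value))
    }
}

struct Subtraction {
    minuend: FloatOp,
    subtrahend: FloatOp,
}

impl Operation for Subtraction {
    type Input = ();
    type Output = DataId<f32>;
    fn serialize(&mut self, serializer: &mut Serializer, _input: ()) -> DataId<f32> {
        // Children are serialized first, so their ids exist before they are used.
        let a = self.minuend.serialize(serializer, ());
        let b = self.subtrahend.serialize(serializer, ());
        serializer.emit(format!("OpFSub %{} %{}", a.id, b.id))
    }
}
```

Since both children of Subtraction must produce a DataId<f32>, an operation returning a vector ID would be rejected by the Rust compiler before any byte code is generated.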

Serialization

Since the serializer is only a thin wrapper over rspirv's dr::Builder, each operation is free to serialize anything. This allows, for instance, the Length operation (which returns the Euclidean length of a vector) to use the extended instruction set GLSL.std.450.

To end the technical overview with the example again, a simple before/after diff of the injected circle SDF function with two runtime parameters looks like this:

%16 = OpFunction  %24  DontInline %32				%16 = OpFunction  %24  DontInline %32
%54 = OpFunctionParameter  %15					%54 = OpFunctionParameter  %15
%55 = OpFunctionParameter  %15					%55 = OpFunctionParameter  %15
%56 = OpLabel						      |	%122 = OpLabel
OpLine %4 29 12						      |	%123 = OpCompositeExtract  %31  %54 1
%57 = OpCompositeExtract  %31  %54 1			      |	%124 = OpCompositeExtract  %31  %55 1
OpLine %4 29 26						      |	%125 = OpCompositeExtract  %24  %123 0
%58 = OpCompositeExtract  %31  %55 1			      |	%126 = OpCompositeExtract  %24  %124 0
OpLine %6 192 15					      |	%127 = OpFAdd  %24  %125 %126
%59 = OpCompositeExtract  %24  %57 0			      |	%128 = OpCompositeExtract  %24  %123 1
OpLine %6 192 24					      |	%129 = OpCompositeExtract  %24  %124 1
%60 = OpCompositeExtract  %24  %58 0			      |	%130 = OpFAdd  %24  %128 %129
OpLine %13 101 44					      |	%131 = OpCompositeConstruct  %31  %127 %130
%61 = OpFAdd  %24  %59 %60				      |	%133 = OpExtInst  %24  %132 Length %131
OpLine %6 193 15					      |	%135 = OpFSub  %24  %133 %134
%62 = OpCompositeExtract  %24  %57 1			      |	OpReturnValue %135
OpLine %6 193 24					      <
%63 = OpCompositeExtract  %24  %58 1			      <
OpLine %13 101 44					      <
%64 = OpFAdd  %24  %62 %63				      <
OpLine %13 337 44					      <
%65 = OpFMul  %24  %61 %61				      <
%66 = OpFMul  %24  %64 %64				      <
OpLine %13 101 44					      <
%67 = OpFAdd  %24  %65 %66				      <
OpLine %4 31 2						      <
OpReturnValue %67					      <
OpFunctionEnd							OpFunctionEnd

Shortcomings

There are several unresolved issues with the current state. Function recognition currently relies on debug information, which I'd like to avoid in the future. Also, only one injection point per shader is currently expected. It would be nice to be able to define multiple functions.

Apart from that I did not do a real performance analysis of the injection process yet. A real advantage compared to Nako is that the shader data only needs to be altered if the function's structure changes, not at each value change. But how often that is actually the case heavily depends on the use case.

Another problem by design is the open nature of the Operation trait. In theory an implementation from outside Algae could break the guarantees that are assumed (valid data IDs, type IDs etc.) and with them the whole system. However, this is also an advantage, since a user of the library could implement specialized operations. For instance, before discovering the extended instruction set I had implemented a Length operation based on fast inverse square root.
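
For readers unfamiliar with the trick, this is roughly what such a length computation looks like on the CPU. It is a sketch of the widely known bit-level approximation with one Newton-Raphson step, not the code of the removed operation; fast_inv_sqrt and fast_length are names chosen for this example.

```rust
/// Approximates 1/sqrt(x) for non-negative, finite `x` using the well-known
/// bit-level initial guess followed by one Newton-Raphson refinement step.
fn fast_inv_sqrt(x: f32) -> f32 {
    let guess = f32::from_bits(0x5f37_59df_u32.wrapping_sub(x.to_bits() >> 1));
    guess * (1.5 - 0.5 * x * guess * guess)
}

/// Approximate Euclidean length of a 2D vector: |v| = |v|^2 * (1 / sqrt(|v|^2)).
fn fast_length(v: [f32; 2]) -> f32 {
    let squared = v[0] * v[0] + v[1] * v[1];
    squared * fast_inv_sqrt(squared)
}
```

The result is only an approximation, which is one reason the Length instruction from the extended instruction set is the more convenient choice.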

Future works

For now I'll extend the number of operation implementations. I'll probably rework the IntoSpvType trait to allow converting rust types to SpirV types and back. After that I'll move an experimental branch of Nako to use Algae instead of Nako's instruction set and compare the performance of both.

At the moment I am always writing for the use case of injecting signed distance field functions into a shader. Algae, however, does not care what the function really does. So it would be totally possible to inject a shading function or anything else at runtime. This will probably come in handy whenever I write my next renderer.

Another idea I had was trying out some kind of kinetic ML approach using Algae. The idea is to procedurally define functions through Algae that are tested on the GPU. The most promising configuration could then be improved until acceptable losses are achieved on the training data. An interesting property of this method is that the resulting model would be a mathematical function rather than a big matrix configuration. However, this is currently just an idea. Maybe this was done already by someone else.

Closing

The project is currently not in a stable state where it should be used by someone else. However, if you are interested you can use this commit to try it out. I'll try to keep the main branch in a working/compilable state.

If you have ideas, enhancements or questions, don't hesitate to write me an email, tweet, toot or open an issue on the GitLab. My contact info can be found on this blog's index.