Date: 07.03.2022
In my previous post I closed with the realization that a GPU-based interpreter works, but is too slow to create complex signed distance fields. Another shortcoming of the former approach was the linear concatenation of operations. This made it difficult to work with operations like infinite repetition, since either “everything up to this point” or nothing at all could be repeated.
The new approach was already teased as
Can I somehow use the compiler at runtime to inline my SDF directly into the shader code?
While working on the basic concept I realized that using Rustc / Rust-Gpu would not give me the control I’d like. Therefore, the shader creation is split into two parts.
Step one happens, as with any Rust shader, at compile time. It emits a valid SpirV module that could be used as is. The resulting shader module contains an injection function that serves as the entry point when injecting new code at runtime via algae. The whole injection point definition is abstracted into a function-like proc-macro (algae_gpu::algae_inject).
This injection point allows the injected code to be tied to the rest of the shader easily. Supplied variables can either be sourced from push constants or be runtime parameters of the shader. In the example below the coord variable is the shader's per-pixel coordinate and offset is read from a push constant. The actual test shader can be found here.
CPU-side rust code is now free to define a possibly complex operation that is injected at this point in the shader.
Before the technical implementation is discussed it makes sense to show the usage from the user’s perspective.
GPU-side inject function definition and parameter handling: The algae_inject macro must return a valid function. The parameter names are used to recognize the parameters later, when a function is injected. Again, a full example of this code can be found here. It is possible that the usage of the macro might change in the future.
///Defines an injection function with two runtime parameters, as well as
///default function if nothing is injected.
algae_gpu::algae_inject!(|coord: Vec2, offset: Vec2| -> f32 {
    //our default function, could be returning 0.0 as well.
    let a = coord.value + offset.value;
    a.dot(a)
});

//...
fn main(..){
    //... normal rust-gpu shader code, at some point we want to evaluate the function injected by
    //algae:
    let result: f32 = algae_inject(coord, Vec2::from(push.offset));
    //...
}
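The push constant block behind push.offset is not shown here. Purely as an illustration (the struct name and layout are my assumption, not the test shader's actual definition), the GPU side could declare it like this:

//Hypothetical push constant block carrying the offset written by the application.
//rust-gpu would pass it to the entry point through a #[spirv(push_constant)] parameter.
#[repr(C)]
pub struct Push {
    pub offset: [f32; 2],
}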
The application loads a SpirV module from source (or bytes) first and searches for an injection function. The application is free to define any function that fulfills basic compile-time checks (mostly type safety); this is done through the Operation trait. Such a function can then be injected at runtime. The AlgaeJit returns the final shader byte-code that is used for pipeline creation.
//create a JIT instance for the SpirV module.
let mut compiler = AlgaeJit::new("resources/test_shader.spv").unwrap();

//Define a function that offsets `coord` based on `offset` and returns the signed distance from the
//offsetted `coord` to a circle with a constant radius=100
let mut function = Subtraction {
    minuent: Box::new(Length {
        inner: Box::new(Addition{
            a: Box::new(Variable::new("coord", Vec2::new(0.0, 0.0))),
            b: Box::new(Variable::new("offset", Vec2::new(0.0, 0.0)))
        }),
    }),
    subtrahend: Box::new(Constant { value: 100.0 }),
};

//inject into the SpirV module. This can be done every time the function
//changes and should be *fast*.
compiler.injector().inject((), &mut function);
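Getting the patched byte-code into a pipeline is ordinary Vulkan plumbing. A rough sketch with ash, assuming a device from the usual application setup; the get_module() accessor is a placeholder name for illustration, not necessarily AlgaeJit's real method:

use ash::vk;

//Hypothetical: fetch the patched SpirV words from the JIT and build a shader module from them.
let words: &[u32] = compiler.get_module();
let create_info = vk::ShaderModuleCreateInfo::builder().code(words);
let shader_module = unsafe { device.create_shader_module(&create_info, None).unwrap() };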
The example shown above injects a circle SDF at runtime where the position is controlled by an offset parameter. This parameter is defined by a push constant which is written by the application code. Since the parameter is defined as a variable, injection only has to take place once at application startup; afterwards only the push constant value changes.
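Writing that push constant is plain application-side Vulkan work and does not involve algae at all. A minimal sketch with ash and bytemuck (layout, command buffer and stage flags come from the application's usual setup):

use ash::vk;

//Update the `offset` push constant every frame without touching the injected function.
fn write_offset(device: &ash::Device, cmd: vk::CommandBuffer, layout: vk::PipelineLayout, offset: [f32; 2]) {
    unsafe {
        device.cmd_push_constants(
            cmd,
            layout,
            vk::ShaderStageFlags::FRAGMENT, //or whatever stage the injection function lives in
            0,
            bytemuck::bytes_of(&offset),
        );
    }
}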
The resulting graphics look like this:
For most users it would be enough to understand algae up to this point. However, most people reading this will be interested in the implementation of all this, so keep reading :D.
The injection macro takes care of creating the function that can be exchanged at runtime, as well as preparing the runtime parameters so that they are recognizable in the resulting SpirV bytecode.
The easiest way to recognize any function is of course to search for the function's definition in the byte code. In SpirV bytecode this looks something like %16 = OpFunction %24, which defines a callable function %16 with a return value of type %24. At the moment this definition has two problems, though: first, we don't know whether this is the correct float-returning function, and second, this function would probably be inlined when the code is optimized. We therefore add more context information: a DontInline attribute, as well as some debug information that contains the function's name as a simple string.
The function-name solution is not final, since it prevents the shader from being stripped of its debug information before being passed to the driver. However, for now it works well enough.
The final function definition and call look like this:
Definition:

%16 = OpFunction %24 DontInline %32
%54 = OpFunctionParameter %15
%55 = OpFunctionParameter %15
//... whatever rust-gpu generated for the default function
OpReturnValue %67
OpFunctionEnd

Function call:

%113 = OpCompositeConstruct %15 %41 %94
%114 = OpCompositeConstruct %15 %42 %99
%115 = OpFunctionCall %24 %16 %113 %114
As you can see, the function call parameters are constructed right before the actual call via OpCompositeConstruct. This is the second part I mentioned above: parameter preparation.
Each parameter (in the example “coord” and “offset”) is wrapped in a struct where the first field is a constant 32-bit hash of the name and the second field is the actual runtime parameter.
This way each parameter can be identified at runtime not only by its type, but also by its name. In the example this makes it possible to distinguish coord: Vec2 from offset: Vec2.
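As a sketch of that idea (the wrapper type and the hash function are illustrative assumptions; which hash function is actually used is not important here, FNV-1a just serves as an example), the wrapping could look like this:

//Hypothetical parameter wrapper: a constant name hash plus the actual runtime value.
#[repr(C)]
pub struct Parameter<T> {
    pub hash: u32,
    pub value: T,
}

//Example 32-bit FNV-1a hash that can be evaluated at compile time.
const fn hash_name(name: &str) -> u32 {
    let bytes = name.as_bytes();
    let mut hash: u32 = 0x811c_9dc5;
    let mut i = 0;
    while i < bytes.len() {
        hash ^= bytes[i] as u32;
        hash = hash.wrapping_mul(0x0100_0193);
        i += 1;
    }
    hash
}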
A similar marking mechanism could probably be used to identify the correct, non-inlined function as well, but I have not yet found a reliable way that does not get stripped when using optimized SpirV.
Since we left enough traces in the SpirV module, we just have to search for those traces when loading the byte code in AlgaeJit. Each parameter's type is analyzed at runtime to prevent runtime type mismatches in the shader. Such mismatches are caught by Vulkan's validation layers, but I am not sure what would happen without those; it probably depends on the driver that is being used.
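A rough sketch of what searching for such a trace can look like with rspirv's data representation (illustrative only, not algae's actual code; the debug_names field and operand variants follow recent rspirv versions and may differ in older ones):

use rspirv::dr::{Module, Operand};
use rspirv::spirv::Op;

//Find the result id of a function whose OpName debug string contains `name`.
fn find_injection_function(module: &Module, name: &str) -> Option<u32> {
    module.debug_names.iter().find_map(|inst| {
        if inst.class.opcode != Op::Name {
            return None;
        }
        //OpName: the first operand is the target id, the second one the name string.
        match (inst.operands.get(0), inst.operands.get(1)) {
            (Some(Operand::IdRef(id)), Some(Operand::LiteralString(n))) if n.contains(name) => {
                Some(*id)
            }
            _ => None,
        }
    })
}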
For interaction with the SpirV module and the construction of new byte-code, rspirv is used. Thanks to Khyperia for pointing me to this crate, otherwise I might have rolled my own thing, which would have been much more tedious.
Since the entry point is now well-defined it is time to talk about the actual function definition and byte-code construction.
At first, I wanted to create some kind of custom intermediate representation (IR) which is then serialized either to SpirV or some other instruction set like x86 or RiscV. This would have allowed me to test the generated shaders on the CPU before testing them on a GPU, similar to what I did with Nako's interpreter implementation.
I decided against it, since this would effectively mean that I would have to design a pretty big and complex IR that translates to SpirV, which in turn is itself only an IR. That feels kind of redundant at the moment. Since I am only injecting a subset of SpirV anyway (no image operations, for instance), it would be easier to write a small SpirV interpreter for the generated code instead.
Therefore, the only requirement for the function definition is that it is serializable into SpirV. The function itself is defined by a tree of operations; the circle function from the example above, for instance, boils down to such an operation tree.
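Sketched in text form from the Rust example above (the node names are simply the operation structs used there):

Subtraction
 ├─ minuent: Length
 │   └─ inner: Addition
 │       ├─ a: Variable("coord")
 │       └─ b: Variable("offset")
 └─ subtrahend: Constant(100.0)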
In Rust this comes down to the Operation trait:
pub trait Operation {
    type Input;
    type Output;

    fn serialize(&mut self, serializer: &mut Serializer, input: Self::Input) -> Self::Output;
}
In most cases Input is some kind of jit-compile-time information or nothing, and Output is the variable ID of the result of this operation.
A variable ID is represented as DataId<T>, where T is the Rust type of this ID's value, for instance f32 or glam::Vec2. This way the Rust compiler can check type safety, and implementations of the Operation trait can be made generic. The IntoSpvType trait allows turning Rust types into SpirV types, which is needed for the aforementioned runtime type checking of variables. Currently, this is only implemented for some basic types like floats, integers and glam's Vec and Mat types. A derive-macro for structs would enable custom structs to be used in algae functions as well.
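To make the trait a bit more concrete, here is a hypothetical implementation: a Scale operation that multiplies the f32 result of an inner operation by a constant factor. The Serializer helpers (builder(), type_f32(), constant_f32()) and the DataId accessors (id(), new()) are assumptions for the sake of the sketch, not algae's real API:

//Hypothetical operation that scales the f32 result of an inner operation tree.
//Operation, Serializer and DataId are assumed to be in scope from algae.
pub struct Scale {
    pub inner: Box<dyn Operation<Input = (), Output = DataId<f32>>>,
    pub factor: f32,
}

impl Operation for Scale {
    type Input = ();
    type Output = DataId<f32>;

    fn serialize(&mut self, serializer: &mut Serializer, input: Self::Input) -> Self::Output {
        //serialize the sub-tree first and keep the id of its result
        let inner = self.inner.serialize(serializer, input);
        //assumed helpers: id of the f32 type and of an f32 constant
        let f32_ty = serializer.type_f32();
        let factor = serializer.constant_f32(self.factor);
        //emit an OpFMul through the wrapped rspirv builder (assumed accessor)
        let result = serializer
            .builder()
            .f_mul(f32_ty, None, inner.id(), factor)
            .unwrap();
        DataId::new(result)
    }
}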
Since the serializer is only a thin wrapper over rspirv's dr::Builder, each operation is free to serialize anything. This allows, for instance, the Length operation (which returns the Euclidean length of a vector) to use the extended instruction set GLSL.std.450.
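Roughly, the Length operation only has to import the extended instruction set and emit an OpExtInst through the builder. A small rspirv sketch (not algae's actual code; the type and vector ids are placeholders, and instruction 66 is Length in GLSL.std.450):

use rspirv::dr::{Builder, Operand};

//Emit `OpExtInst %f32 GLSL.std.450 Length %vec` and return the id of the result.
fn emit_length(builder: &mut Builder, f32_ty: u32, vec_id: u32) -> u32 {
    //import the extended instruction set (in practice this id would be created once and reused)
    let glsl = builder.ext_inst_import("GLSL.std.450");
    builder
        .ext_inst(f32_ty, None, glsl, 66, vec![Operand::IdRef(vec_id)])
        .unwrap()
}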
To close the technical overview with the example again, here is the injection function with its two runtime parameters, once before and once after the circle SDF was injected:
Before (the default function generated by rust-gpu):

%16 = OpFunction %24 DontInline %32
%54 = OpFunctionParameter %15
%55 = OpFunctionParameter %15
%56 = OpLabel
OpLine %4 29 12
%57 = OpCompositeExtract %31 %54 1
OpLine %4 29 26
%58 = OpCompositeExtract %31 %55 1
OpLine %6 192 15
%59 = OpCompositeExtract %24 %57 0
OpLine %6 192 24
%60 = OpCompositeExtract %24 %58 0
OpLine %13 101 44
%61 = OpFAdd %24 %59 %60
OpLine %6 193 15
%62 = OpCompositeExtract %24 %57 1
OpLine %6 193 24
%63 = OpCompositeExtract %24 %58 1
OpLine %13 101 44
%64 = OpFAdd %24 %62 %63
OpLine %13 337 44
%65 = OpFMul %24 %61 %61
%66 = OpFMul %24 %64 %64
OpLine %13 101 44
%67 = OpFAdd %24 %65 %66
OpLine %4 31 2
OpReturnValue %67
OpFunctionEnd

After (the injected circle SDF):

%16 = OpFunction %24 DontInline %32
%54 = OpFunctionParameter %15
%55 = OpFunctionParameter %15
%122 = OpLabel
%123 = OpCompositeExtract %31 %54 1
%124 = OpCompositeExtract %31 %55 1
%125 = OpCompositeExtract %24 %123 0
%126 = OpCompositeExtract %24 %124 0
%127 = OpFAdd %24 %125 %126
%128 = OpCompositeExtract %24 %123 1
%129 = OpCompositeExtract %24 %124 1
%130 = OpFAdd %24 %128 %129
%131 = OpCompositeConstruct %31 %127 %130
%133 = OpExtInst %24 %132 Length %131
%135 = OpFSub %24 %133 %134
OpReturnValue %135
OpFunctionEnd
There are several unresolved issues with the current state. The function recognition depends on debug information that I’d like to exclude in the future. Only one injection point per shader is currently expected. It would be nice to be able to define multiple functions.
Apart from that I did not do a real performance analysis of the injection process yet. A real advantage compared to Nako is that the shader data only needs to be altered if the function’s structure changes, not at each value change. But how often that is actually the case heavily depends on the use case.
Another problem by design is the open nature of the Operation trait. In theory, an algae-foreign implementation could violate the guarantees that are assumed (valid data IDs, type IDs etc.) and thereby break the system. However, this is also an advantage, since a user of the library can potentially implement specialized operations. For instance, before discovering the extended instruction set I had a Length operation implemented that was based on the fast inverse square root.
For now I'll extend the number of operation implementations. I'll probably rework the IntoSpvType trait to allow converting Rust types to SpirV types and back. After that I'll move an experimental branch of Nako to use Algae instead of Nako's instruction set and compare the performance of both.
At the moment I am always writing about the use case of injecting a signed distance field function into a shader. Algae, however, does not care what the function really does. So it would be totally possible to inject a shading function or anything else at runtime. This will probably come in handy whenever I write my next renderer.
Another idea I had was trying out some kind of genetic ML approach using Algae. The idea is to procedurally define functions through Algae that are tested on the GPU. The most promising configuration could then be improved until acceptable losses are achieved on the training data. An interesting property of this method is that the resulting model would be a plain mathematical function, in contrast to a big matrix configuration. However, this is currently just an idea. Maybe this was done already by someone else.
The project is currently not in a stable state where it should be used by someone else. However, if you are interested, you can use this commit to try it out. I'll try to keep the main branch in a working/compilable state.
If you have ideas, enhancements or questions, don't hesitate to write me an email, tweet, toot, or open an issue on the GitLab. My contact info can be found on this blog's index.