Signed distance field based synthesizer
Idea and motivation
In the winter of 2021 I wrote my bachelor's thesis, titled "Signed distance field based sound modeling for intuitive audio creation interfaces".
As you can see at 1:00, there is a 3D representation of whatever I build. So the idea was: "What if I could model this surface?"
After some research, the main points of interest were identified (actually most of the work for the thesis ^^):
- How to create the sound (time domain / frequency domain)?
- How does modeling work related to sound creation?
- What could a graphical interface for such a tool look like?
- ... And some less important questions...
Obviously, multiple further questions spawn from those. The plan was to use hand tracking for the input, since it makes a lot of sense here. However, I couldn't implement it in time, so I had to fall back to good old mouse+keyboard input.
The theoretical background isn't discussed here, but you may be able to find the thesis by its title, or send me a message if you are interested.
The easy stuff first: 3D signed distance field rendering. Since I already had Nako at this point, it made sense to use it here as well; it was a good real-world test to check what works and what doesn't. One aspect of Nako that directly influenced the development of the synth is its timeline-like way of adding primitives to each other. Nako combines primitives by taking a string of operations: something like "take this sphere, remove this cube, then add this plane". While this is really a technical limitation (I wrote about it here), it actually works quite well: the order can be expressed easily in the interface, and the result is predictable.
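Such an operation stream can be sketched in a few lines of Python. This is only an illustration of the idea; the distance functions and the fold are made up here and are not Nako's actual API:

```python
# Minimal sketch of an SDF "operation stream": each primitive is a
# distance function, and the stream is folded together in order.
# All names here are illustrative, not Nako's real API.
import math

def sphere(cx, cy, cz, r):
    return lambda x, y, z: math.sqrt((x-cx)**2 + (y-cy)**2 + (z-cz)**2) - r

def box(cx, cy, cz, half):
    def d(x, y, z):
        dx, dy, dz = abs(x-cx)-half, abs(y-cy)-half, abs(z-cz)-half
        outside = math.sqrt(max(dx, 0)**2 + max(dy, 0)**2 + max(dz, 0)**2)
        inside = min(max(dx, dy, dz), 0.0)
        return outside + inside
    return d

def union(a, b):    return lambda x, y, z: min(a(x, y, z), b(x, y, z))
def subtract(a, b): return lambda x, y, z: max(a(x, y, z), -b(x, y, z))

# "Take this sphere, remove this cube, then add this sphere" as a stream:
ops = [("add",    sphere(0.0, 0.0, 0.0, 1.0)),
       ("remove", box(0.6, 0.0, 0.0, 0.5)),
       ("add",    sphere(-1.2, 0.0, 0.0, 0.3))]

def fold(stream):
    scene = lambda x, y, z: float("inf")  # empty scene
    for op, prim in stream:
        scene = union(scene, prim) if op == "add" else subtract(scene, prim)
    return scene

scene = fold(ops)
print(round(scene(0.0, 0.0, 0.0), 3))  # → -0.1 (origin is inside the result)
```

Because the fold simply walks the list front to back, the order of operations is explicit and the result is predictable, which is exactly what makes this representation easy to show on a timeline.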
Apart from that, everything was already prepared by Nako and its companion crates.
A perk of using Nako's renderer was that I got 2D (also SDF-based) rendering for free, including features like caching already rendered and unchanged layers between frames. This lets the GPU work almost exclusively on the 3D rendering. While some 2D rendering components were already prepared (in the nako_std crate), like text rendering and some simple buttons, I ended up hard coding most of the interface elements specifically for the synth. I noticed that handling the compositor by hand is not really practical for bigger projects like this.
The whole purpose of creating the 3D surface is to retrieve some kind of signal. This is currently done by stepping along a plane and checking the height of the surface at each location. The resulting set of points is the base signal, which is either played back as is (time domain) or interpreted as a spectrum and transformed into the time domain using an inverse FFT (frequency domain).
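The sampling step amounts to evaluating a height at evenly spaced positions along the plane. A minimal sketch, with a toy height function standing in for the actual SDF surface query:

```python
# Sketch: step along a line on the sampling plane and record the
# surface height at each position. surface_height is a toy stand-in;
# in the synth the height comes from the modeled SDF surface.
import math

def surface_height(x):
    # stand-in for querying the modeled surface at position x in [0, 1)
    return 0.5 * math.sin(2 * math.pi * x) + 0.25 * math.sin(6 * math.pi * x)

N = 256  # number of sampling steps = length of the base signal
base_signal = [surface_height(i / N) for i in range(N)]
# time-domain mode: this table is played back directly as one waveform cycle
print(len(base_signal))  # → 256
```

In time-domain mode this table is the waveform itself; in frequency-domain mode the same set of points is read as magnitudes of a spectrum instead.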
While the time domain is straightforward, the frequency domain wasn't as easy to implement, mostly because I didn't want to give up on the synth's multi-voice feature. At the moment, whenever the model changes, a time-domain base signal is created via iFFT. This signal is then pitched to the correct offset for each voice's key. Therefore the number of iFFT transformations stays constant, and pitching the voices is mostly a matter of reading the base signal faster or slower. Obviously this is not the correct way: this pitching strategy introduces resampling artefacts, and there is no interaction between the voices' frequencies. However, doing it the right way requires a lot more engineering, which wasn't practical at the time.
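A minimal numpy sketch of that pitching strategy, assuming a wavetable-style readout; the spectrum, table size, and function names are made up for illustration:

```python
# Sketch of the per-voice pitching: one iFFT builds the base table
# whenever the model changes, then each voice reads the SAME table
# at a different rate. The spectrum here is made up; in the synth
# it comes from sampling the model.
import numpy as np

TABLE = 1024
spectrum = np.zeros(TABLE, dtype=complex)
spectrum[1] = 1.0   # fundamental
spectrum[3] = 0.3   # an extra harmonic, just as an example
base = np.fft.irfft(spectrum[:TABLE // 2 + 1], n=TABLE)  # one iFFT per model change

def voice_samples(ratio, n):
    """Read the shared base table at `ratio` times the base pitch,
    with linear interpolation (this resampling causes the artefacts)."""
    phase = (np.arange(n) * ratio) % TABLE
    i0 = phase.astype(int)
    i1 = (i0 + 1) % TABLE
    frac = phase - i0
    return base[i0] * (1 - frac) + base[i1] * frac

low = voice_samples(1.0, TABLE)   # base pitch: reads the table as-is
high = voice_samples(2.0, TABLE)  # one octave up: reads twice as fast
```

The key property is that no new iFFT is needed per voice; the trade-off is that non-integer read rates interpolate between table entries, which is where the resampling artefacts come from.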
Nako and graphics
The biggest problem of the synth is currently its performance: it needs a reasonably powerful graphics card to even start. For instance, my notebook's Intel HD 4000 won't render even one frame.
Nako's rendering cost also scales with model complexity, so creating highly detailed models is currently not practical. This, paired with the already mentioned operation-stream nature, is what led me to start Algae.
The 2D UI, which is also based on Nako, is quite tedious to work with. This brings me to the conclusion that it might be time to overcome not-invented-here syndrome and start using toolkits like egui. UI toolkits seem to be hard; even my third attempt isn't as good as I hoped.
After receiving my bachelor's degree for the thesis and the program, I got the chance to evolve the project and submit it, under the title "Creative Sound Modeling with Signed Distance Fields", to the Mensch und Computer (MuC) conference. Below is a small demonstration video of the final version.