- Sat 26 April 2025
- Experiments
- akuz
For years I've wanted to build a super-dense electronic-music compressor: keep only the loops and phase cues that really matter, then re-synthesise the track perfectly. Evenings and weekends, however, were never long enough to design the model, write the maths, and wrangle PyTorch. Recently I opened ChatGPT running the new o3 model and treated it as a design partner. If we could keep the conversation focused, perhaps we could sketch, and prototype, the entire idea in a single stretch.
Co-designing the generative model
We started by deciding how the data should look. I wanted a phase-aware spectrogram (complex numbers on an F × T grid) rebuilt from a handful of reusable patterns and a sparse list of occurrences. I proposed details; o3 replied with equations. We swapped 3 × 3 windows for 5 × 5, removed global gains then re-introduced per-occurrence magnitudes, and replaced hard clamping with bilinear interpolation so gradients would flow. After several iterations we froze a checkpoint: unit-normalised patterns, fractional offsets encoded as phases, occurrences positioned by two complex numbers rather than fixed indices. o3 typeset the whole formulation in LaTeX, and I compiled it into a concise PDF.
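To make the bilinear-interpolation step concrete, here is a minimal sketch in plain Python rather than PyTorch. It is not the code from the repository: the function names `bilinear_weights` and `splat` and the 4 × 4 grid are made up for illustration. It shows how a complex value written at a fractional position is shared between the four surrounding cells. Because each weight varies linearly with the position, a small change in position changes the output, which is what lets gradients flow where rounding or clamping would block them.

```python
import math

def bilinear_weights(y, x):
    """Return the four lattice cells around (y, x) with their bilinear weights."""
    y0, x0 = math.floor(y), math.floor(x)
    fy, fx = y - y0, x - x0  # fractional parts, each in [0, 1)
    return [
        ((y0, x0), (1 - fy) * (1 - fx)),
        ((y0, x0 + 1), (1 - fy) * fx),
        ((y0 + 1, x0), fy * (1 - fx)),
        ((y0 + 1, x0 + 1), fy * fx),
    ]

def splat(grid, y, x, value):
    """Add a complex value to the grid at the fractional position (y, x)."""
    rows, cols = len(grid), len(grid[0])
    for (r, c), w in bilinear_weights(y, x):
        if 0 <= r < rows and 0 <= c < cols:  # skip cells that fall off the grid
            grid[r][c] += w * value

# Hypothetical 4 x 4 lattice of complex zeros.
grid = [[0j] * 4 for _ in range(4)]
splat(grid, 1.25, 2.5, 1 + 1j)

for row in grid:
    print(row)
```

The four weights always sum to one, so the written value is redistributed rather than amplified or attenuated.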
Implementing and debugging the first learning loop
o3 then produced a clean repo: separate modules for patterns, occurrences, a differentiable lattice writer, and a training script. The first run showed falling loss yet every pattern remained zero. In chat we traced the issue to hard gates that silenced magnitudes before gradients could reach them; replacing the mask with soft weights solved the problem immediately, and patterns began to develop non-zero amplitudes and phases. For visibility we added a simple ASCII heat-map that printed the target spectrogram, the reconstruction, and their difference directly in the terminal.
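The gating problem can be reproduced with a toy example. The sketch below is an assumption about the general shape of the bug, not the actual code: the threshold gate, the sigmoid replacement, the scalar loss, and all function names are invented for illustration, and the gradient is estimated by finite differences instead of autograd so that the snippet needs only the standard library.

```python
import math

def hard_gate(score):
    """Binary mask: passes the magnitude only when the score is positive."""
    return 1.0 if score > 0.0 else 0.0

def soft_gate(score):
    """Sigmoid weight: small for negative scores but never exactly zero."""
    return 1.0 / (1.0 + math.exp(-score))

def loss(magnitude, score, gate, target=1.0):
    """Squared error between a target and the gated magnitude."""
    return (target - gate(score) * magnitude) ** 2

def grad_magnitude(magnitude, score, gate, eps=1e-6):
    """Central finite-difference estimate of d(loss)/d(magnitude)."""
    up = loss(magnitude + eps, score, gate)
    down = loss(magnitude - eps, score, gate)
    return (up - down) / (2 * eps)

# A magnitude initialised at zero behind a closed gate.
score, magnitude = -1.0, 0.0
print("hard gate gradient:", grad_magnitude(magnitude, score, hard_gate))
print("soft gate gradient:", grad_magnitude(magnitude, score, soft_gate))
```

With the hard gate closed, the loss does not depend on the magnitude at all, so its gradient is zero and the parameter never moves. The soft weight leaves a small but non-zero path for the gradient.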
ASCII illustrations for debugging
I initialised the data (a grid of complex numbers) to a wavy pattern (ASCII representation of the magnitude):
With 5000 occurrences of only 4 patterns, the algorithm was able to compress around one third of the data. The number of occurrences can of course be increased, but I decided to keep this result because it shows how the compression is limited by the constraints of the algorithm, namely the number and size of the patterns and the number of occurrences:
The ASCII illustration below shows the part of the data that is not described by the algorithm, due to the limited number of patterns and occurrences.
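For readers who want a similar terminal view, a heat-map of this kind can be sketched in a few lines of plain Python. This is an illustrative stand-in, not the repository's printer: the character ramp, the function name `ascii_heatmap`, the synthetic wavy grid, and the fake reconstruction (simply 60% of the target) are all assumptions.

```python
import math

RAMP = " .:-=+*#%@"  # characters ordered from low to high magnitude

def ascii_heatmap(grid, peak=None):
    """Render the magnitudes of a 2-D list of complex numbers as text rows."""
    mags = [[abs(v) for v in row] for row in grid]
    if peak is None:
        peak = max(max(row) for row in mags)
    peak = peak or 1.0  # avoid dividing by zero on an all-zero grid
    top = len(RAMP) - 1
    lines = []
    for row in mags:
        # Round to the nearest ramp index and clamp values above the peak.
        idx = [min(int(m / peak * top + 0.5), top) for m in row]
        lines.append("".join(RAMP[i] for i in idx))
    return "\n".join(lines)

# Synthetic wavy target, a stand-in reconstruction, and their difference.
target = [[math.sin(0.8 * r) * math.cos(0.5 * c) + 0j for c in range(24)]
          for r in range(8)]
recon = [[0.6 * v for v in row] for row in target]
residual = [[t - q for t, q in zip(t_row, q_row)]
            for t_row, q_row in zip(target, recon)]

# Share one peak so the three panels use the same scale.
peak = max(abs(v) for row in target for v in row)
for name, g in (("target", target), ("reconstruction", recon), ("residual", residual)):
    print(name)
    print(ascii_heatmap(g, peak))
```

Sharing a single `peak` across the three panels keeps them comparable, so a faint residual really does mean that little is left unexplained.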
One working day later...
By the evening the model could reconstruct a synthetic test grid with a small dictionary and far fewer occurrences than pixels. No extensive design document, no weekend-long coding marathon: just a day of iterative conversation with an AI partner. Next steps are clear: push the code to GitHub, train on real electronic tracks, and measure how low we can take the bitrate.
What makes this prototype different
The crucial detail is that occurrences are not tied to the lattice. Each centre is stored as two unit-complex numbers whose phases map to continuous coordinates, so patterns can be placed anywhere, even between grid cells, while gradients still flow. A single pattern can therefore be reused at arbitrary offsets instead of being cloned for every shift. This first experiment shows that phase-parametrised placement can turn a dense spectrogram into a sparse set of grid-free building blocks, opening the door to extremely compact music compression.
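One way to read the phase-to-coordinate idea is sketched below in plain Python. The linear mapping of the phase onto the axis, the grid size of 16 × 64, and the helper names `normalise`, `phase_to_coord`, and `coord_to_phase` are assumptions made for illustration; the prototype may use a different convention.

```python
import cmath
import math

def normalise(z):
    """Project a non-zero complex number onto the unit circle."""
    return z / abs(z)

def phase_to_coord(z, size):
    """Map the phase of z linearly onto a coordinate in [0, size)."""
    angle = cmath.phase(z) % (2 * math.pi)  # phase() returns a value in [-pi, pi]
    return (angle / (2 * math.pi) * size) % size

def coord_to_phase(coord, size):
    """Inverse mapping: a coordinate becomes a unit-complex number."""
    return cmath.rect(1.0, 2 * math.pi * coord / size)

# Hypothetical grid of 16 rows by 64 columns.
rows, cols = 16, 64

# An occurrence centre is stored as two complex numbers, one per axis.
zy = normalise(coord_to_phase(3.7, rows) * 2.5)  # any non-zero scale is removed
zx = normalise(coord_to_phase(40.25, cols))

centre = (phase_to_coord(zy, rows), phase_to_coord(zx, cols))
print(centre)  # a position between grid cells
```

Under this reading, an optimiser can update the real and imaginary parts freely and re-normalise afterwards; the position stays valid and wraps around the axis instead of running off the edge.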
Conclusion
Working with ChatGPT o3 felt like pairing with an always-awake research colleague: every question was answered instantly, every edit compiled on the spot, and roadblocks dissolved in minutes instead of months. An experiment that had lived in my "someday" notebook for years (designing a grid-free, phase-aware music compressor) went from sketch to running prototype in a single day of dialogue and iterative coding. Turning long-standing ideas into tangible results this quickly is both liberating and a glimpse of how research will feel in the very near future. Exciting times!
See the GitHub repository here.