(m)ORPH began as an experiment to disrupt traditional DAW-based stereo mixing and evolved into an XR platform for interactive music, immersive spatial-audio listening, and live performance. Using Unity, Wwise, HRTF rendering, and physics-driven behaviors, the system treats audio objects as spatial entities whose distance, motion, and interaction shape both the mix and the composition in real time. This session examines the architectural decisions, technical implementation, gestural interface design, and intentional abstraction that enable emergent behavior and “musical happy accidents.” Attendees will gain insight into designing interactive audio systems that function as instruments rather than playback engines, and into how such systems can inspire a new breed of music lovers who want to actively engage rather than passively consume.
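For a concrete feel of the idea, here is a toy sketch (not (m)ORPH's actual implementation; the curves and constants are assumptions) of how an object's distance might drive mix parameters in real time:

```python
import math

# Toy illustration: an audio object's distance maps to a send gain and a
# low-pass cutoff, so moving the object audibly reshapes the mix.
# All constants and curve shapes below are illustrative assumptions.

def object_mix_params(distance_m, max_distance_m=20.0):
    d = min(max(distance_m, 0.1), max_distance_m)
    gain = 1.0 / d                               # simple inverse-distance law
    cutoff_hz = 20000.0 * math.exp(-d / 10.0)    # farther objects sound duller
    return gain, cutoff_hz

print(object_mix_params(2.0))    # nearby: loud and bright
print(object_mix_params(15.0))   # distant: quiet and filtered
```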
This session explores the implementation of an oscillator that leverages the 50MHz clock of an inexpensive FPGA to generate waveforms at ultra-high sample rates far beyond the audible range. By concentrating computational resources on the sample rate rather than bit depth, we achieve alias-free output of harmonics exceeding 100kHz. We will share practical know-how for fully integrating the entire process on-chip—from computation to audio output via a 1-bit ΔΣ DAC—without relying on microcontrollers or external DAC ICs.
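As a rough, software-level illustration of the delta-sigma idea (the session covers the on-chip FPGA implementation; the rates and names below are assumptions), a first-order 1-bit modulator can be sketched as:

```python
import numpy as np

def delta_sigma_1bit(x):
    """First-order delta-sigma modulator: oversampled input in [-1, 1] -> 1-bit stream.

    Quantization noise is pushed toward high frequencies, so low-pass filtering
    the +/-1 output stream recovers the signal band.
    """
    acc, y = 0.0, 0.0
    out = np.empty(len(x))
    for n, sample in enumerate(x):
        acc += sample - y                    # integrate the error vs. previous output bit
        y = 1.0 if acc >= 0.0 else -1.0      # 1-bit quantizer
        out[n] = y
    return out

fs = 3_125_000                               # illustrative oversampled rate (50 MHz clock / 16)
t = np.arange(fs // 100) / fs                # 10 ms of signal
bits = delta_sigma_1bit(0.8 * np.sin(2 * np.pi * 1000 * t))
```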
AlphaTheta uses JUCE in multiple DJ products for audio processing and cross-platform development. In this presentation, we will briefly introduce the product areas where JUCE has been adopted and the main elements we rely on (Audio/DSP, GUI, peripheral tools and standardization), illustrated with concrete examples.
We will explain how JUCE is integrated within the constraints specific to DJ products, which aspects we customize for our own needs, and discuss how we plan to leverage it going forward.
This talk presents a methodology for working with very large, GPT-like deep learning models trained on (open and ethically sourced) MIDI data. This approach promotes nuanced, musical interfacing with the model, requiring practice and skill development instead of one-shot text-based prompting.
The full machine-learning pipeline is presented, including data pre-processing, tokenization, model training, and inference. The system will then be used to demonstrate multiple generative examples created through musical interaction with Large Piano Models.
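As an illustration of what event-style MIDI tokenization can look like, here is a hypothetical, simplified scheme (not the one used for the Large Piano Models; token names and bin sizes are assumptions):

```python
# Minimal event-style tokenization of MIDI notes, loosely in the spirit of
# GPT-style symbolic-music pipelines; real tokenizers (REMI, MIDI-Like, etc.)
# differ in detail.

def tokenize_notes(notes, time_step=0.01, max_shift=100):
    """notes: list of (onset_sec, pitch, velocity, duration_sec) tuples."""
    tokens, clock = [], 0.0
    for onset, pitch, velocity, duration in sorted(notes):
        shift = min(int(round((onset - clock) / time_step)), max_shift)
        if shift > 0:
            tokens.append(f"TIME_SHIFT_{shift}")
        tokens.append(f"NOTE_ON_{pitch}")
        tokens.append(f"VELOCITY_{velocity // 8}")                  # coarse velocity bins
        tokens.append(f"DURATION_{int(round(duration / time_step))}")
        clock = onset
    return tokens

print(tokenize_notes([(0.0, 60, 100, 0.5), (0.5, 64, 90, 0.5)]))
```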
This talk presents a modern alternative to C++-dominated audio plugin development. We will explore how to build commercial-grade, multi-format plugins (CLAP, VST3, AU, Standalone) without writing a single line of C++, by using Rust for DSP/logic and Web technologies for the GUI. Based on real-world adoption in NovoNotes products, we will cover the "CLAP First" architecture with clap-wrapper, solving async task management (run_loop), integrating WebViews (wxp), and comparing this approach with JUCE.
In recent years, generative models have become capable of producing high-quality music from natural language. However, mechanisms that adequately support the repeated trial and error and fine-grained nuance adjustments occurring throughout the production process are still at an early stage of development.
This presentation introduces design approaches based on interactive machine learning, where users can leverage small amounts of local data generated during the production process and manipulate the latent space of generative models. By incorporating exploration and parameter manipulation into an interactive loop, we present a structure that allows generative model outputs to be not merely "selected," but rather integrated into and utilized within one's own production process.
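A minimal sketch of the kind of latent-space steering this enables, assuming a handful of locally produced, user-rated clips and an external encoder/decoder (all names, shapes, and data below are illustrative assumptions):

```python
import numpy as np

# Estimate a steering direction from "liked" vs. "disliked" latents of local
# clips, then nudge the current working latent along it under user control.
# decoder(z_new) would render the steered audio in a real system.

def steering_direction(liked, disliked):
    """liked / disliked: arrays of shape (n, latent_dim) of encoded local clips."""
    d = liked.mean(axis=0) - disliked.mean(axis=0)
    return d / (np.linalg.norm(d) + 1e-9)

def nudge(z, direction, amount):
    """Move latent z by a user-controlled amount along the estimated direction."""
    return z + amount * direction

latent_dim = 64
rng = np.random.default_rng(0)
z = rng.normal(size=latent_dim)                       # current working latent
direction = steering_direction(rng.normal(size=(8, latent_dim)),
                               rng.normal(size=(8, latent_dim)))
z_new = nudge(z, direction, amount=0.5)
```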
Through the presenter's research case studies, we will introduce visualization of generative models, real-time control, applications to live performance, and example designs implemented as audio plugins and tools. We will discuss new practical approaches for integrating music-generation AI into workflows for composition, arrangement, and sound design.
In an era of generative automation, the traditional boundary between artist and audience is dissolving. This session explores the transition of the human voice from a static recording to a dynamic, professional instrument. Drawing on my experience as a Billboard-charting frontman and MBA strategist, I will demonstrate how vocal synthesis—specifically the development of the HXVOC voicebank—enables creators to bypass the 'cold wall' of the algorithm. We will discuss the ethical shift from mass-consumption to distributed authorship, showing that technology will not replace the performer, but empower a global community to build its own legacy.
Differentiable artificial reverberation has the potential to address a wide range of audio machine-learning tasks, including style transfer, blind estimation, and speech enhancement. This research area has grown rapidly, with many new approaches proposed over the past few years, particularly within the field of differentiable digital signal processing. As a result, numerous differentiable reverb architectures have emerged. At the same time, these developments highlight the need for loss functions that properly capture the perceptually important time- and frequency-domain characteristics of reverberation.
In this talk, we will review key results from recent literature with a focus on architectures suitable for real-time applications. Specifically, we will discuss different architecture choices, optimization strategies, and practical insights for designing loss functions tailored to reverberation. We will also explore how standard, off-the-shelf loss functions can be adapted to better handle reverb and reverberated signals. We will conclude with a forward-looking perspective, highlighting current challenges and open research questions, as well as spatial audio applications.
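As a concrete reference point, a common off-the-shelf starting place is a multi-resolution STFT magnitude loss; the sketch below (window sizes and weighting are assumptions, not a recommendation from the talk) shows the kind of objective that can then be adapted for reverberant signals, for example by adding longer windows that resolve late decay:

```python
import torch

def stft_mag(x, n_fft, hop):
    window = torch.hann_window(n_fft, device=x.device)
    return torch.stft(x, n_fft, hop_length=hop, window=window,
                      return_complex=True).abs()

def multires_stft_loss(pred, target,
                       resolutions=((512, 128), (2048, 512), (8192, 2048))):
    # The 8192-sample window is included so that slowly decaying reverb tails
    # are resolved; shorter windows capture onsets and early reflections.
    loss = 0.0
    for n_fft, hop in resolutions:
        p, t = stft_mag(pred, n_fft, hop), stft_mag(target, n_fft, hop)
        sc = torch.norm(t - p) / (torch.norm(t) + 1e-8)            # spectral convergence
        mag = torch.nn.functional.l1_loss(torch.log(p + 1e-8),
                                          torch.log(t + 1e-8))     # log-magnitude term
        loss = loss + sc + mag
    return loss / len(resolutions)

pred = torch.randn(48000, requires_grad=True)
target = torch.randn(48000)
multires_stft_loss(pred, target).backward()
```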
NKIDO is a live-coding audio environment built from scratch: a Tidal-inspired pattern language, a zero-allocation C++20 bytecode VM with 95+ DSP opcodes, and a browser IDE running it all via WebAssembly. This talk covers the language design, the runtime internals, and what it's like to vibe-code 60,000 lines of real-time audio C++ with AI.
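To give a flavor of what a stack-based DSP bytecode interpreter looks like in general, here is a toy sketch (NKIDO's actual opcodes, VM, and zero-allocation design are not reflected here):

```python
import math

SR = 48000

def run(program, n_samples):
    """Execute a tiny per-sample opcode program and return the output samples."""
    phase = 0.0                       # single piece of oscillator state for SINE
    out = []
    for _ in range(n_samples):
        stack = []
        for op, *args in program:
            if op == "CONST":
                stack.append(args[0])
            elif op == "SINE":        # pop frequency, push oscillator sample
                freq = stack.pop()
                stack.append(math.sin(2 * math.pi * phase))
                phase = (phase + freq / SR) % 1.0
            elif op == "MUL":
                b, a = stack.pop(), stack.pop()
                stack.append(a * b)
        out.append(stack.pop())
    return out

# 440 Hz sine scaled by 0.25: push 440, run the oscillator, push 0.25, multiply
samples = run([("CONST", 440.0), ("SINE",), ("CONST", 0.25), ("MUL",)], 128)
```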
Building cross-platform audio apps is difficult - and for a long time, Android lagged far behind iOS when it came to music-making tools. That's changing. Elementary Audio introduces a new paradigm for audio experiences: by exposing a shared JS API with both web and native renderers, it makes code reuse across platforms feel natural. In this talk, I'll introduce Elementary Audio, walk through react-native-elementary, and demo what's possible to build with it today - including how AI is removing what little friction remains.
Software engineer and music producer based in London. Building Midicircuit at Yonko Level — an interactive app for learning music production — and releasing beats as TXBROWN. Interested in audio engineering, learning UX, and making music technology accessible to everyone.
Hatsune Miku has evolved beyond a mere sound source into a "singing voice synthesizer" equipped with advanced expressiveness and real-time responsiveness. This session explains the core technologies of real-time singing voice synthesis developed to meet these requirements, focusing on the architectural shift from conventional singing synthesis methods based on subtractive synthesis to approaches based on additive synthesis.
We delve into fundamental technical challenges in singing voice synthesis: "balancing computational cost with the fidelity of spectral reconstruction" and "ensuring precise controllability without compromising naturalness." In particular, we detail why the additive synthesis architecture was adopted, and discuss the advantages and trade-offs in time-series fidelity and spectral manipulation flexibility compared to other methods such as subtractive synthesis.
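For orientation, the core of the additive approach is a sum of sinusoids at integer multiples of f0 with time-varying harmonic amplitudes; the sketch below is a generic illustration of that idea (harmonic count, amplitude envelopes, and anti-aliasing are assumptions, not the engine discussed in the talk):

```python
import numpy as np

def additive_synth(f0, harmonic_amps, sr=48000):
    """f0: per-sample fundamental in Hz; harmonic_amps: (n_samples, n_harmonics)."""
    n_samples, n_harmonics = harmonic_amps.shape
    phase = np.cumsum(2 * np.pi * f0 / sr)                  # fundamental phase
    k = np.arange(1, n_harmonics + 1)
    audible = (np.outer(f0, k) < sr / 2)                    # mute harmonics above Nyquist
    return np.sum(harmonic_amps * audible * np.sin(np.outer(phase, k)), axis=1)

n = 48000
f0 = np.full(n, 220.0)                                      # steady 220 Hz "vowel"
amps = np.tile(1.0 / np.arange(1, 41), (n, 1))              # 40 harmonics, 1/k rolloff
y = additive_synth(f0, amps)
```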
Additionally, we address parameter compression and computational load management as optimization strategies for maintaining real-time performance on typical consumer hardware. Finally, we share future perspectives, including SDK-oriented design to support next-generation creativity and engine extensibility.