In recent years, generative models have become capable of generating high-quality music from natural language. However, the mechanisms to adequately respond to repeated trial-and-error and fine-grained nuance adjustments that occur throughout the production process remain in a developmental stage.
This presentation introduces design approaches based on interactive machine learning, where users can leverage small amounts of local data generated during the production process and manipulate the latent space of generative models. By incorporating exploration and parameter manipulation into an interactive loop, we present a structure that allows generative model outputs to be not merely "selected," but rather integrated into and utilized within one's own production process.
Through research case studies from the presenter, we will introduce visualization of generative models, real-time control, applications to live performance, and design examples as audio plugins and tools. We will discuss new practical approaches for how music generation AI can be integrated into workflows for composition, arrangement, and sound design.
In an era of generative automation, the traditional boundary between artist and audience is dissolving. This session explores the transition of the human voice from a static recording to a dynamic, professional instrument. Drawing on my experience as a Billboard-charting frontman and MBA strategist, I will demonstrate how vocal synthesis—specifically the development of the HXVOC voicebank—enables creators to bypass the 'cold wall' of the algorithm. We will discuss the ethical shift from mass-consumption to distributed authorship, showing that technology will not replace the performer, but empower a global community to build its own legacy.
Differentiable artificial reverberation has the potential to address a wide range of audio machine-learning tasks, including style transfer, blind estimation, and speech enhancement. This research area has grown rapidly, with many new approaches proposed over the past few years, particularly within the field of differentiable digital signal processing. As a result, numerous differentiable reverb architectures have emerged. At the same time, these developments highlight the need for loss functions that properly capture the perceptually important time- and frequency-domain characteristics of reverberation.
In this talk, we will review key results from recent literature with a focus on architectures suitable for real-time applications. Specifically, we will discuss different architecture choices, optimization strategies, and practical insights for designing loss functions tailored to reverberation. We will also explore how standard, off-the-shelf loss functions can be adapted to better handle reverb and reverberated signals. We will conclude with a forward-looking perspective, highlighting current challenges and open research questions, as well as spatial audio applications.
NKIDO is a live-coding audio environment built from scratch: a Tidal-inspired pattern language, a zero-allocation C++20 bytecode VM with 95+ DSP opcodes, and a browser IDE running it all via WebAssembly. This talk covers the language design, the runtime internals, and what it's like to vibe-code 60,000 lines of real-time audio C++ with AI.
Building cross-platform audio apps is difficult - and for a long time, Android lagged far behind iOS when it came to music-making tools. That's changing. Elementary Audio introduces a new paradigm for audio experiences: by exposing a shared JS API with both web and native renderers, it makes code reuse across platforms feel natural. In this talk, I'll introduce Elementary Audio, walk through react-native-elementary, and demo what's possible to build with it today - including how AI is removing what little friction remains.
Software engineer and music producer based in London. Building Midicircuit at Yonko Level — an interactive app for learning music production — and releasing beats as TXBROWN. Interested in audio engineering, learning UX, and making music technology accessible to everyone.
Hatsune Miku has evolved beyond a mere sound source into a "singing voice synthesizer" equipped with advanced expressiveness and real-time responsiveness. This session explains the core technologies of real-time singing voice synthesis developed to meet these requirements, focusing on the architectural shift from conventional subtractive synthesis-based singing synthesis methods to additive synthesis-based approaches.
We delve into fundamental technical challenges in singing voice synthesis: "balancing computational cost with the fidelity of spectral reconstruction" and "ensuring precise controllability without compromising naturalness." In particular, we detail why the additive synthesis architecture was adopted, and discuss the advantages and trade-offs in time-series fidelity and spectral manipulation flexibility compared to other methods such as subtractive synthesis.
Additionally, as optimization strategies for maintaining real-time performance in general consumer environments, we address parameter compression concepts and computational load management techniques. Finally, we share future perspectives including SDK-oriented design to support next-generation creativity and engine extensibility.