In recent years, generative models have become capable of generating high-quality music from natural language. However, the mechanisms to adequately respond to repeated trial-and-error and fine-grained nuance adjustments that occur throughout the production process remain in a developmental stage. 近年の生成モデルは、自然言語から高品質な音楽を生成できるようになりました。一方で、制作の過程で繰り返される試行錯誤や細かなニュアンスの調整に、十分に応答できる仕組みはまだ発展途上にあります。
This presentation introduces design approaches based on interactive machine learning, where users can leverage small amounts of local data generated during the production process and manipulate the latent space of generative models. By incorporating exploration and parameter manipulation into an interactive loop, we present a structure that allows generative model outputs to be not merely "selected," but rather integrated into and utilized within one's own production process.
Through research case studies from the presenter, we will introduce visualization of generative models, real-time control, applications to live performance, and design examples as audio plugins and tools. We will discuss new practical approaches for how music generation AI can be integrated into workflows for composition, arrangement, and sound design.
In an era of generative automation, the traditional boundary between artist and audience is dissolving. This session explores the transition of the human voice from a static recording to a dynamic, professional instrument. Drawing on my experience as a Billboard-charting frontman and MBA strategist, I will demonstrate how vocal synthesis—specifically the development of the HXVOC voicebank—enables creators to bypass the 'cold wall' of the algorithm. We will discuss the ethical shift from mass-consumption to distributed authorship, showing that technology will not replace the performer, but empower a global community to build its own legacy.
Differentiable artificial reverberation has the potential to address a wide range of audio machine-learning tasks, including style transfer, blind estimation, and speech enhancement. This research area has grown rapidly, with many new approaches proposed over the past few years, particularly within the field of differentiable digital signal processing. As a result, numerous differentiable reverb architectures have emerged. At the same time, these developments highlight the need for loss functions that properly capture the perceptually important time- and frequency-domain characteristics of reverberation.
In this talk, we will review key results from recent literature with a focus on architectures suitable for real-time applications. Specifically, we will discuss different architecture choices, optimization strategies, and practical insights for designing loss functions tailored to reverberation. We will also explore how standard, off-the-shelf loss functions can be adapted to better handle reverb and reverberated signals. We will conclude with a forward-looking perspective, highlighting current challenges and open research questions, as well as spatial audio applications.
Doctoral Researcher, Aalto University Acoustics Lab
Gloria Dal Santo received the M.Sc. degree in electrical and electronic engineering from the EPFL, Switzerland, in 2022. During her studies, she focused on the modeling and cancellation of acoustic echo. She is currently pursuing a doctoral degree at the Acoustics Lab, Aalto University... Read More →
Wednesday June 3, 2026 11:20am - 12:10pm JST Next 2
In this talk, Tatsuya Shiozawa, Founder and CEO of COCOTONE, Inc., shares his personal experiences and the practical steps he has taken over ten years of starting and sustaining a community for audio programmers.
From self-publishing the technical book JUCE JAPAN—including researching trademarks and negotiating permissions with rights holders—to hosting meetups and study sessions, speaking as a guest at events, and sponsoring conferences and gatherings in the music field, he sets the technical topics aside to share the "reality of community activity" from a practitioner's point of view.
This is the story of "my case": the joys and the quiet frustrations of keeping things going, and both the successes and the challenges along the way—for anyone who has ever thought about getting involved in a community.
When we build cross-platform music apps and plugins, they are mostly desktop and sometimes including iOS, but much less happens on Android. Since Android audio latency has improved a lot by 2026, we have to tackle the next problem: we are missing audio plugin formats on Android. Apple has good ecosystem, so why not designing one for Android?
But you would wonder, why can't we just simply take VST3, CLAP, or LV2 on Android? Because, it is not that simple. We have a lot of lessons learned (or, learning) from Apple AudioUnit V3, along with their efforts on Logic Pro.
Throughout this session we will explain what is tricky to achieve audio plugin functionality on Android through past accomplishments, and how to deal with it. There are many issues such as, publishing audio plugin products from diverse plugin vendors without being tied to a specific DAW, passing audio and event data between a DAW and a plugin, showing plugin GUI on a DAW, and so on. We also discuss what's missing on Android platform itself to achieve full realtime capability within our apps, not just their own framework.
There are many trends on audio plugin development such as MIDI 2.0 integration like (upcoming next-gen. JUCE AudioProcessor), CLAP-first development, AI-capability such as MCP integration. We discuss what kind of features a plugin format should and should NOT tackle, especially taking CLAP as a reference. You would also learn why JUCE cannot be a "format" here.
At last, designing a plugin format is just a milestone and not the goal. We also have to achieve a plugin "ecosystem", which is very often understood as chicken and egg problem. We would discuss this with some existing efforts.
Building cross-platform audio apps is difficult - and for a long time, Android lagged far behind iOS when it came to music-making tools. That's changing. Elementary Audio introduces a new paradigm for audio experiences: by exposing a shared JS API with both web and native renderers, it makes code reuse across platforms feel natural. In this talk, I'll introduce Elementary Audio, walk through react-native-elementary, and demo what's possible to build with it today - including how AI is removing what little friction remains.
Software engineer and music producer based in London. Building Midicircuit at Yonko Level — an interactive app for learning music production — and releasing beats as TXBROWN. Interested in audio engineering, learning UX, and making music technology accessible to everyone.
Real-time convolution reverb is well understood, but continuously synthesizing long, spatial impulse responses (IRs) at runtime remains a significant engineering and perceptual challenge. This session presents a hybrid GPU/CPU acoustics pipeline that synthesizes listener-centric IRs in real time using multi-bounce raytracing. The pipeline is currently integrated into Elemental Games’ proprietary engine for its unannounced open-world debut title.
The system models frequency-dependent absorption, geometric propagation, and spatial encoding using Ambisonics, while balancing physical plausibility with perceptual clarity. Beyond straightforward multi-bounce tracing, the implementation explores performance-aware sampling strategies and hybrid visibility heuristics to better capture the contrast between enclosed and open spaces. Adaptive update strategies dynamically adjust IR refresh rates based on listener motion and scene changes, maintaining perceptual stability while respecting GPU budgets.
IR data is prepared using partitioned FFT processing on the GPU and transferred to the audio thread through a wait-free synchronization model, enabling stable time-varying convolution without blocking real-time audio processing. Particular focus is given to artifact-free IR updates under evolving conditions, including hybrid time- and frequency-domain crossfading techniques.
The talk examines architectural decisions, modeling trade-offs, perceptual post-processing techniques such as diffusion and stochastic smoothing, and the practical constraints of integrating real-time acoustic synthesis into a production engine. Attendees will gain insight into designing hybrid GPU/CPU DSP pipelines that balance physical modeling, runtime performance, and creative control.
Anton Lundberg is a software engineer and audio programmer specializing in high-performance real-time audio systems and game engine architecture. He develops next-generation game audio middleware at elias.audio and leads development of the audio technology stack at Elemental Games... Read More →