The convergence of AI and cloud computing is revolutionizing audio development. This session explores how AWS cloud services enable audio developers to build scalable, AI-powered applications for speech recognition, audio synthesis, real-time streaming, and generative audio content.
We'll demonstrate practical architectures for:
- Real-time audio processing with AI-based dubbing and translation using AWS Media Services
- Speech synthesis and recognition using Amazon Polly, Amazon Transcribe, and generative AI models
- Scalable audio streaming architectures with Amazon Kinesis and serverless computing
- Building audio ML models with Amazon SageMaker and deploying them at scale
- Sentiment analysis from audio data using AWS generative AI services

Attendees will learn cloud-native patterns for audio development, including containerization with Kubernetes, event-driven architectures, and GPU-optimized infrastructure for AI workloads.
Vishal Alhat is a Developer Advocate at Amazon Web Services (AWS) and a former AWS Hero, recognized for his significant contributions to the AWS community. With 11+ years of experience in cloud technologies, Vishal specializes in DevOps, cloud security, and AI/ML. As an active community...
Wednesday June 3, 2026 9:00am - 10:00am JST
How close is close enough when modeling analog ladder filters on a $1-class microcontroller?
Microcontrollers operate under strict computational constraints that fundamentally differ from desktop virtual analog environments. When implementing Moog-style ladder filters in such systems, defining and evaluating analog-likeness becomes a practical engineering challenge.
This work is motivated by the development of a virtual analog synthesizer running on an RP2350 microcontroller, designed to deliver a convincing analog feel. Practical evaluation metrics are consolidated, including resonance peak alignment, Q consistency, normalized harmonic spectra, level-dependent cutoff shift, and self-oscillation behavior. The talk also discusses how these metrics can be meaningfully and reproducibly measured.
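As a concrete illustration of what "reproducibly measured" can mean, the sketch below estimates a filter's magnitude response by driving it with steady sine tones at a small level and comparing output RMS to input RMS after a settling period; the resonance peak location and Q then follow from the measured curve. This is an illustrative method, not necessarily the talk's; the processSample callback is a hypothetical stand-in for any filter under test.

```cpp
// Sketch: measure the steady-state gain of a mono processor at one
// frequency by comparing output RMS to input RMS. For nonlinear
// filters this folds harmonics into the figure; the normalized
// harmonic spectra mentioned above would instead come from an FFT
// of the output signal.
#include <cmath>
#include <functional>

float measureGain(const std::function<float(float)>& processSample,
                  float freqHz, float sampleRate,
                  int settleSamples = 48000, int measureSamples = 48000) {
    const float w = 2.0f * 3.14159265f * freqHz / sampleRate;
    double inPow = 0.0, outPow = 0.0;
    for (int n = 0; n < settleSamples + measureSamples; ++n) {
        const float x = 0.1f * std::sin(w * float(n));  // small drive level
        const float y = processSample(x);
        if (n >= settleSamples) { inPow += x * x; outPow += y * y; }
    }
    return float(std::sqrt(outPow / inPow));  // linear gain at freqHz
}
```

Sweeping freqHz over a log-spaced grid, and repeating the sweep at several drive levels, yields the resonance peak alignment, Q consistency, and level-dependent cutoff shift figures in one measurement pass.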
Ladder filter implementations drawn from published algorithms and from open-source audio libraries such as the Teensy Audio Library, DaisySP, and JUCE are ported to the RP2350 and evaluated against a SPICE circuit simulation that serves as a reproducible analog reference. Measurement results are presented to examine how closely each implementation can approach analog behavior under strict hardware constraints.
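For readers unfamiliar with the algorithm family under evaluation, here is a minimal C++ sketch of a common digital ladder model: four cascaded one-pole lowpass stages with tanh-saturated resonance feedback, in the spirit of the Stilson/Smith and Huovilainen models. It is an illustrative baseline only, not the talk's RP2350 implementation or any of the ported libraries.

```cpp
// Simplified digital Moog-style ladder: 4 one-pole stages + feedback.
#include <cmath>

struct LadderFilter {
    float sampleRate = 48000.0f;
    float s[4] = {};   // one-pole stage states
    float g = 0.0f;    // per-stage coefficient
    float k = 0.0f;    // resonance feedback amount, 0..4

    void setParams(float cutoffHz, float resonance) {
        // Simple one-pole coefficient; with feedback the effective
        // cutoff and resonance peak shift, which is exactly the kind
        // of deviation the metrics above are designed to quantify.
        g = 1.0f - std::exp(-2.0f * 3.14159265f * cutoffHz / sampleRate);
        k = resonance;  // self-oscillation sets in as k approaches 4
    }

    float process(float x) {
        // tanh bounds the feedback loop so high resonance yields
        // stable self-oscillation rather than numerical blow-up.
        const float in = std::tanh(x - k * s[3]);
        s[0] += g * (in   - s[0]);
        s[1] += g * (s[0] - s[1]);
        s[2] += g * (s[1] - s[2]);
        s[3] += g * (s[2] - s[3]);
        return s[3];
    }
};
```

On a microcontroller, the std::exp and std::tanh calls are the obvious cost centers; table lookups or polynomial approximations are the usual trade, and each such substitution is precisely what moves the analog-likeness metrics.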
Modern audio plug-in development still pays a steep portability tax: separate UI/DSP stacks per OS and host, repeated rebuilds, and validation that’s hard to automate. This talk introduces a “write once, run everywhere” approach that treats Chromium as a compatibility runtime and tunnels its audio graph directly into a DAW plug-in host (VST)—allowing a single codebase for both UI and DSP to run across environments.
Beyond portability, the runtime enables a DevOps-style workflow for audio: externally controlled timing, deterministic offline rendering, and CI-friendly regression testing for AudioWorklet-style processing. We’ll present a working proof-of-concept, outline the key architectural choices and trade-offs, and show how this foundation can unlock faster iteration at ecosystem scale—especially as automated and AI-assisted development becomes the norm.
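To make the CI angle concrete, here is a minimal C++ sketch of the pattern under assumed names (Processor, renderOffline, and matchesGolden are illustrative, not the project's API): rendering is driven in fixed-size blocks with no real-time clock, so the output is a pure function of the input and can be diffed against a stored golden render in CI.

```cpp
// Sketch: deterministic offline rendering plus a golden-file check.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Processor {  // stand-in for any block-based DSP unit under test
    virtual void process(float* buffer, std::size_t numFrames) = 0;
    virtual ~Processor() = default;
};

// Externally controlled timing: the caller, not a device callback,
// decides when each block is processed.
std::vector<float> renderOffline(Processor& p, std::vector<float> input,
                                 std::size_t blockSize = 128) {
    for (std::size_t i = 0; i < input.size(); i += blockSize)
        p.process(input.data() + i, std::min(blockSize, input.size() - i));
    return input;
}

// Regression check against a previously approved render, with the
// tolerance expressed in dB relative to full scale.
bool matchesGolden(const std::vector<float>& out,
                   const std::vector<float>& golden,
                   float toleranceDb = -96.0f) {
    if (out.size() != golden.size()) return false;
    const float tol = std::pow(10.0f, toleranceDb / 20.0f);
    for (std::size_t i = 0; i < out.size(); ++i)
        if (std::abs(out[i] - golden[i]) > tol) return false;
    return true;
}
```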
When we build cross-platform music apps and plugins, they mostly target desktop, sometimes iOS, and much less often Android. Now that Android audio latency has improved substantially as of 2026, we have to tackle the next problem: Android lacks audio plugin formats. Apple has a good ecosystem, so why not design one for Android?
You might wonder: why can't we simply adopt VST3, CLAP, or LV2 on Android? Because it is not that simple. There are many lessons learned (or still being learned) from Apple's AudioUnit V3 and from Apple's efforts on Logic Pro.
Throughout this session we will explain, through past accomplishments, what makes audio plugin functionality tricky to achieve on Android and how to deal with it. The issues include publishing audio plugin products from diverse plugin vendors without tying them to a specific DAW, passing audio and event data between a DAW and a plugin, showing a plugin GUI inside a DAW, and so on; a hypothetical sketch of one such data exchange follows below. We also discuss what is missing from the Android platform itself to achieve full realtime capability within our apps, not just within Android's own frameworks.
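To make the data-passing problem concrete, here is one hypothetical C++ sketch of a cross-process exchange: a fixed-layout audio block in shared memory guarded by a seqlock-style sequence counter. It illustrates the problem space only and is not a proposed Android plugin ABI; a production design would also need event queues, process lifecycle handling, and much more careful memory-ordering work.

```cpp
// Hypothetical shared-memory audio handoff between host and plugin.
#include <atomic>
#include <cstdint>

constexpr std::uint32_t kMaxChannels = 2;
constexpr std::uint32_t kMaxFrames   = 512;

// Lives in a region mapped by both the DAW and the plugin process.
struct SharedAudioBlock {
    std::atomic<std::uint32_t> sequence{0};  // odd while the host writes
    std::uint32_t numFrames = 0;
    float samples[kMaxChannels][kMaxFrames] = {};
    // MIDI 2.0 UMP or other event words would travel in a similar
    // fixed-size region alongside the audio.
};

// Host side: publish one block (single writer).
void hostWrite(SharedAudioBlock& b, const float* const* in,
               std::uint32_t frames) {
    const std::uint32_t s = b.sequence.load(std::memory_order_relaxed);
    b.sequence.store(s + 1, std::memory_order_release);  // "writing"
    b.numFrames = frames;
    for (std::uint32_t c = 0; c < kMaxChannels; ++c)
        for (std::uint32_t i = 0; i < frames; ++i)
            b.samples[c][i] = in[c][i];
    b.sequence.store(s + 2, std::memory_order_release);  // "stable"
}

// Plugin side: read a consistent snapshot, retrying on torn reads.
std::uint32_t pluginRead(SharedAudioBlock& b,
                         float out[kMaxChannels][kMaxFrames]) {
    for (;;) {
        const std::uint32_t s1 = b.sequence.load(std::memory_order_acquire);
        if (s1 & 1u) continue;                  // writer is mid-update
        const std::uint32_t frames = b.numFrames;
        if (frames > kMaxFrames) continue;      // torn header, retry
        for (std::uint32_t c = 0; c < kMaxChannels; ++c)
            for (std::uint32_t i = 0; i < frames; ++i)
                out[c][i] = b.samples[c][i];
        if (b.sequence.load(std::memory_order_acquire) == s1)
            return frames;                      // snapshot was consistent
    }
}
```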
There are many trends in audio plugin development, such as MIDI 2.0 integration (as in the upcoming next-generation JUCE AudioProcessor), CLAP-first development, and AI capabilities such as MCP integration. We discuss what kinds of features a plugin format should and should NOT tackle, taking CLAP as a reference. You will also learn why JUCE cannot serve as a "format" here.
Finally, designing a plugin format is just a milestone, not the goal. We also have to build a plugin "ecosystem", which is often understood as a chicken-and-egg problem. We will discuss this alongside some existing efforts.
The Head-Related Transfer Function (HRTF) is a key technology for three-dimensional binaural audio rendering. However, for this technology to be adopted more widely, issues of audio quality and HRTF personalization must be resolved. When HRTFs are applied to music production, audio quality may suffer. Additionally, since HRTFs vary significantly between individuals, personalized HRTFs (that is, HRTFs measured or customized for each user) are desirable, but cost becomes an issue. Widespread adoption therefore calls for a typical HRTF that provides consistent effectiveness for everyone.
The speaker proposes using Generalized HRTF (GHRTF) based on machine learning as a solution to these problems. This presentation first outlines the fundamentals and challenges of HRTFs and binaural rendering. Then it presents the definition of GHRTFs that achieve high audio quality, along with estimation methods based on machine learning and their results. Next, the presentation demonstrates a learning method for Typical GHRTFs based on data from numerous subjects and provides estimation examples. Finally, the presentation describes its application to SoundObject, an object-based three-dimensional spatial audio VST 3 plug-in that the speaker has made freely available to the public. The presentation concludes that this approach yields clearer directionality and higher audio quality compared to conventional dummy head HRTFs.
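For context, the core operation any HRTF-based binaural renderer performs is a pair of convolutions of the source signal with the left- and right-ear head-related impulse responses (HRIRs) for the source direction. The C++ sketch below shows direct-form convolution for clarity; a real-time plug-in would typically use partitioned FFT convolution and interpolate HRIRs as the source moves. It is a generic illustration, not SoundObject's implementation.

```cpp
// Sketch: offline binaural rendering of a mono source via HRIRs.
#include <cstddef>
#include <vector>

struct Hrir {
    std::vector<float> left;   // left-ear impulse response
    std::vector<float> right;  // right-ear impulse response
};

void renderBinaural(const std::vector<float>& mono, const Hrir& h,
                    std::vector<float>& outL, std::vector<float>& outR) {
    outL.assign(mono.size() + h.left.size() - 1, 0.0f);
    outR.assign(mono.size() + h.right.size() - 1, 0.0f);
    for (std::size_t n = 0; n < mono.size(); ++n) {
        for (std::size_t k = 0; k < h.left.size(); ++k)
            outL[n + k] += mono[n] * h.left[k];
        for (std::size_t k = 0; k < h.right.size(); ++k)
            outR[n + k] += mono[n] * h.right[k];
    }
}
```

Swapping which Hrir is loaded (dummy-head, personalized, or a learned GHRTF) changes only the data, which is what makes the comparison described in the talk straightforward to set up.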
The presentation materials are in both English and Japanese.
Areas of expertise: analog and digital signal processing, circuit design, computer architecture, low-level programming, and the UNIX kernel.
Real-time convolution reverb is well understood, but continuously synthesizing long, spatial impulse responses (IRs) at runtime remains a significant engineering and perceptual challenge. This session presents a hybrid GPU/CPU acoustics pipeline that synthesizes listener-centric IRs in real time using multi-bounce raytracing. The pipeline is currently integrated into Elemental Games’ proprietary engine for its unannounced open-world debut title.
The system models frequency-dependent absorption, geometric propagation, and spatial encoding using Ambisonics, while balancing physical plausibility with perceptual clarity. Beyond straightforward multi-bounce tracing, the implementation explores performance-aware sampling strategies and hybrid visibility heuristics to better capture the contrast between enclosed and open spaces. Adaptive update strategies dynamically adjust IR refresh rates based on listener motion and scene changes, maintaining perceptual stability while respecting GPU budgets.
IR data is prepared using partitioned FFT processing on the GPU and transferred to the audio thread through a wait-free synchronization model, enabling stable time-varying convolution without blocking real-time audio processing. Particular focus is given to artifact-free IR updates under evolving conditions, including hybrid time- and frequency-domain crossfading techniques.
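As one concrete shape such a handoff can take (a sketch under assumptions, not the talk's implementation), the snippet below publishes freshly prepared IR data to the audio thread through an atomic pointer exchange: the audio thread adopts the newest IR without locking and retires the previous one to the non-real-time side for deletion. The real pipeline exchanges partitioned frequency-domain data and crossfades between old and new IRs; a plain sample buffer stands in here for brevity.

```cpp
// Sketch: single-producer/single-consumer wait-free IR mailbox.
#include <atomic>
#include <memory>
#include <vector>

struct ImpulseResponse { std::vector<float> samples; };

class IrMailbox {
public:
    ~IrMailbox() {
        delete pending_.exchange(nullptr);
        delete retired_.exchange(nullptr);
    }

    // Non-real-time side: publish a new IR, reclaim a retired one.
    void publish(std::unique_ptr<ImpulseResponse> ir) {
        delete retired_.exchange(nullptr, std::memory_order_acq_rel);
        delete pending_.exchange(ir.release(), std::memory_order_acq_rel);
    }

    // Audio thread: adopt the newest IR if one is waiting. Never
    // blocks, allocates, or frees. 'held' is the IR currently being
    // convolved (may be nullptr on the first call).
    ImpulseResponse* adopt(ImpulseResponse* held) {
        ImpulseResponse* fresh =
            pending_.exchange(nullptr, std::memory_order_acq_rel);
        if (!fresh) return held;
        // Defer deletion of the old IR to the publisher. This sketch
        // assumes one slot suffices; a production system would use a
        // reclamation queue so no retired IR is ever overwritten.
        retired_.store(held, std::memory_order_release);
        return fresh;
    }

private:
    std::atomic<ImpulseResponse*> pending_{nullptr};
    std::atomic<ImpulseResponse*> retired_{nullptr};
};
```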
The talk examines architectural decisions, modeling trade-offs, perceptual post-processing techniques such as diffusion and stochastic smoothing, and the practical constraints of integrating real-time acoustic synthesis into a production engine. Attendees will gain insight into designing hybrid GPU/CPU DSP pipelines that balance physical modeling, runtime performance, and creative control.
Anton Lundberg is a software engineer and audio programmer specializing in high-performance real-time audio systems and game engine architecture. He develops next-generation game audio middleware at elias.audio and leads development of the audio technology stack at Elemental Games...