Udio sets a new benchmark in music generation
The launch of Udio highlights rapid progress in AI music generation, raising questions about copyright and the enduring social value of human-created art.
Joel Miller

This week the launch of the new music generation service Udio put AI’s impact on music and the creative world front and centre. Services such as Udio (from will.i.am and some former Google DeepMind researchers), Suno or Stability AI’s recently launched Stable Audio are giving people the tools to dynamically create high-quality music in any genre, either with generated or provided lyrics or in instrumental form. Meanwhile Spotify, now somewhat the old guard, is adopting AI through its prompt-based playlist generation tool. Udio has impressed with its notably higher levels of audio fidelity and fluency, although there is still some way to go for these services to replicate the clarity and impact of a professionally prepared track. It’s also interesting to note that humans seem much more sensitive to anomalies in audio than to those found in other forms of generation. But much like with text, image and video, the progress is rapid; we are probably no more than a year away from creative works that can be generated in any medium and are indistinguishable from professional output.
Before we get onto the wider reflection, a quick word on the tech. Suno, and likely Udio, are re-using large language model (LLM) architectures, but they’re feeding in fragments (or tokens) of audio rather than text. These models are trained on lots of ‘tokenised’ musical sequences (no doubt meaning more battles over copyrighted materials) and they teach themselves to predict the tokens that could come next. This approach seems to be working for everything… from video, to music, to DNA, and even sensor data and robotic movements (see our research news)… chop up a pattern, feed it into a giant neural network and hey presto. And the bigger the neural network and the more compute, the better the output.
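For the curious, here is a toy sketch of that ‘chop up a pattern and predict what comes next’ idea in Python. It is emphatically not how Suno or Udio work internally (that is not public, and real systems would use learned audio codecs and giant neural networks); it assumes a crude fixed quantiser and a simple follower-count table in place of the network. The function names `tokenise`, `train_bigram` and `predict_next` are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenise(samples, levels=8):
    """Map each audio sample in [-1.0, 1.0] to an integer token in [0, levels - 1]."""
    tokens = []
    for s in samples:
        clipped = max(-1.0, min(1.0, s))  # keep the sample inside the expected range
        bucket = int((clipped + 1.0) / 2.0 * levels)
        tokens.append(min(levels - 1, bucket))  # a sample of exactly 1.0 lands in the top bucket
    return tokens


def train_bigram(tokens):
    """Count, for every token, which tokens followed it in the training sequence."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequently seen follower of `token`, or None if it was never seen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# A toy 'track': a sine wave standing in for real audio (an assumption for illustration).
wave = [math.sin(2 * math.pi * i / 16) for i in range(256)]
tokens = tokenise(wave)
model = train_bigram(tokens)

# Generate a short continuation by repeatedly predicting the next token.
generated = [tokens[0]]
for _ in range(15):
    nxt = predict_next(model, generated[-1])
    if nxt is None:
        break
    generated.append(nxt)
```

Swap the count table for a transformer, the sine wave for a vast library of recordings and the eight-level quantiser for a learned audio tokeniser, and you have the rough shape of the approach described above.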
The economic and scientific impacts of this tech are driving a productivity revolution. But creativity and productivity are not the same. And our evaluation and consumption of creative outputs are fundamentally different from the way we treat the fruits of industrial transformation. Music in particular has unique social value; it induces powerful emotional responses, physically synchronises the brain activity of live audiences, helps us form life-long identities, is bound up in our memories, and plays central roles in our rituals and community. It has a precious and finite value. And we humans are absolutely hard-wired to appreciate its scarcity. We signal our status by demonstrating our exclusive access, and we manufacture scarcity by assigning attributes that create rarity or personal significance. Plus, young and old, we can’t seem to get enough of the excitement generated by a concentrated group of cultural icons. The recording and streaming tech revolutions of the 20th and 21st centuries already changed music’s unit economics, but these human and social fundamentals did not and will not change. We may come to deify AI artists… Claude has a controversial side that is already building a devout following in the world of text performance (more on that next week)… but whether the music superstars are AI, human or human-augmented, we’ll tune out the noise to maintain the scarcity.
The music industry as we know it is likely to continue to grow, for now. But the music industry is roughly a tenth the size of the gaming industry. Because we listen linearly and in real time, there is a limit to how much music we can consume. Perhaps we can look to younger generations, who are increasingly combining their consumption of music, video and gaming experiences in fragmented, non-linear ways. Multi-modal AI generation could inspire decentralised, globally networked but hyper-personal open worlds. Curation, scarcity, rarity, connection, and meaning may be infinitely shaped and transmitted in new forms of interactive play. As with language models, music generation models will get open-sourced and become easy to run on your own machine and to train on your own musical tastes. The creation and sharing of music in this form will be less a centralised industry and more a new interactive entrainment medium, and that is likely the only path to a workable business model for these new services.
As Sam Altman teased this week (from his self-appointed position on the other side of the looking glass): “movies are going to become video games and video games are going to become something unimaginably better”. Perhaps music will also evolve into new forms, and we’ll see an expansion of what is possible rather than a replacement for the current music industry. Suno and Udio are part revolutionary, part conventional first experiments on the journey to an ever more decentralised and fluid creative landscape.
Takeaways: Here’s an uplifting ExoBrain rap… You can draw your own conclusions on whether the world needs more or less AI innovation hip hop, but Udio’s ability to generate believable music is undoubtedly impressive.