
We’ve previously covered how increasing compute and adding branching search at the point of use improves text models such as OpenAI’s o1. A new paper this week explores what the same approach can do for image generation. This picture shows the effect of extra compute on AI image quality in two ways: the first row shows standard processing with more denoising steps, and the lower set shows results from search techniques. (Prompt: “Photo of an athlete cat explaining it’s latest scandal at a press conference to journalists.”)
Takeaways: Unlike LLMs, which are typically fixed after training, image models can be adjusted during generation, either by changing the number of generation steps or by using search methods to find better starting points. The research shows that additional compute at generation time leads to better-quality images, with smaller models sometimes matching larger ones when given extra processing power. The team tested this on both class-based image generation and text-to-image generation. A key finding was that a model around one-tenth the size could match or outperform the much larger Flux model, suggesting that efficient scaling may be possible for both image and video generation.
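To make the idea of searching for better starting points concrete, here is a minimal toy sketch of a best-of-N random search. It is not the paper's implementation: `toy_generate` and `toy_verifier` are hypothetical stand-ins for a real diffusion sampler and a real image-quality scorer, and the numbers it works on are short lists of floats, not images. The only point is the shape of the loop: draw several starting noises, generate from each, and keep the result the scorer likes best.

```python
import random


def toy_generate(noise, steps=8):
    """Hypothetical stand-in for a diffusion sampler.

    Each step pulls every coordinate halfway toward +1 or -1, depending on
    its current sign, so the final sample is determined by the starting noise.
    """
    x = list(noise)
    for _ in range(steps):
        x = [v + 0.5 * ((1.0 if v > 0 else -1.0) - v) for v in x]
    return x


def toy_verifier(sample):
    """Hypothetical quality score (higher is better): prefers values near +1."""
    return -sum((v - 1.0) ** 2 for v in sample)


def random_search(num_candidates, dim=4, seed=0):
    """Best-of-N search over starting noise; returns (best_sample, best_score)."""
    rng = random.Random(seed)
    best_sample, best_score = None, float("-inf")
    for _ in range(num_candidates):
        # Each candidate costs one full generation, so compute grows with N.
        noise = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        sample = toy_generate(noise)
        score = toy_verifier(sample)
        if score > best_score:
            best_sample, best_score = sample, score
    return best_sample, best_score


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        _, score = random_search(n)
        print(f"candidates={n:3d}  best score={score:.3f}")
```

Because the same seed reuses the same first candidates, the best score in this sketch can only stay the same or improve as the number of candidates grows, which mirrors the trade of extra generation-time compute for quality described above.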
