ExoBrain
Image generation group test

A comparative test of leading image generation models reveals Midjourney 6.1 and Flux Pro 1.1 as top choices for professionals, while Ideogram 2.0 excels in text rendering.

Joel Miller

As this week’s Adobe MAX conference wraps up, it’s been a busy time for AI in content generation. MAX 2024 was centred around AI. The event’s headline announcement was copyright-safe Firefly Video, a model integrated into Premiere Pro for creating and manipulating video content from text prompts. This AI focus extended across the Creative Cloud suite, with Photoshop gaining a new Generative Workspace and Illustrator benefiting from enhanced AI-powered Image Trace features. Adobe also unveiled Gen Studio, an AI-driven application to streamline workflows between creative and marketing teams.

Google made headlines by rolling out Imagen 3 to all Gemini users, offering free AI image generation with improved quality, albeit with limitations on generating images of people. This followed Black Forest Labs’ release of Flux 1.1 Pro last week, which the company claims is six times faster than its predecessor while improving image quality and prompt compliance. The model outperformed competitors like Ideogram 2 and Midjourney 6.1 in benchmark tests, particularly in prompt adherence and coherence, and currently tops the Artificial Analysis leaderboard.

We decided to put these new tools to the test, alongside other leading models, using a particularly challenging Claude-crafted prompt. As the models get better at following nuanced instructions, we wanted to see how far this could go. You can draw your own conclusions, but much like their text-based cousins, the models have strengths and weaknesses. The prompt:

Create a photorealistic futuristic cityscape at sunset, viewed from a slight elevation. Blend art deco and bio-organic architecture, featuring a central DNA helix-shaped skyscraper labeled “HELIX TOWER”. Include flying cars, a transparent maglev train tube, vertical gardens, and a park with bioluminescent plants. Show a diverse crowd on a sky bridge, AI robots assisting humans, and delivery drones. Incorporate renewable energy sources, an anti-gravity waterfall, and a localized rain shower. Add a street vendor with a “Galactic Flavors” sign in Hindi script selling alien fruits, a holographic “ZeroG Yoga” ad displaying an impossible pose, and a pet walking service with both robotic and organic pets. Emphasize warm sunset lighting, reflections, and neon signs.

The results:

In summary:

  • Midjourney 6.1 can be trusted to generate a hyper-realistic and cinematic aesthetic. While it struggles with fine-grained prompting and text, its output is generally the most impactful and can be further tuned with its advanced styling tools. Best for professionals.
  • Flux Pro 1.1 is also a good choice for professionals, particularly for creating highly realistic-looking images that still reflect subtle details. In this case, the model was able to include many of the more complex elements.
  • Ideogram 2.0 wins hands down on text. If you need words in your image, it’s the best option every time, and its aesthetic capabilities are improving.
  • OpenAI DALL-E 3 (ChatGPT) is showing its age. It is purposefully tuned to create somewhat cartoonish images, and while it was able to follow some of the detail, the composition was simplistic.
  • Adobe Firefly 3, in its image generation form, has been out for a while. Its output in this test was very realistic, but it wasn’t able to respond to the detail in the prompt.
  • Google’s Imagen 3 (Gemini) seemed to bear out the claims made by Google on its ability to reflect the prompt detail, but the aesthetics and composition may be a matter of taste.

Takeaways: Midjourney retains its position as the best all-round model; we can’t wait for Midjourney 7, which is expected to launch in the coming weeks. Flux 1.1 is a strong option that is unrestricted by content guidelines and, with some manipulation, could challenge Midjourney. For an end-to-end professional workflow, Adobe seems as strong as ever, but it will need to keep innovating as more of the creativity and manipulation is absorbed into these expanding models.