MidJourney has just announced its newest AI image generator model, the V6 base model, in the crowded race to rule the realm of digital creativity. Rolling out for alpha testing today, the development team says V6 features enhanced prompt accuracy, improved coherence, and, for the first time in MidJourney’s evolution, text generation capabilities.
Announced in an official Discord post, V6 is positioned as a major overhaul.
“Much more accurate prompt following as well as longer prompts, improved coherence, and model knowledge,” reveals the announcement, highlighting its advancement over the previous V5.1 model launched in May 2023. The V5 model, noted for its easy-to-use short prompts and aesthetic improvement, paved the way for the more sophisticated and detailed V6.
One of the most noteworthy components of V6 is its text-drawing ability. While it’s not the focal point of the model—the team says it’s still a “minor” feature—this capability puts MidJourney in direct competition with other leading models like Dall-E 3 and Ideogram. However, MidJourney’s approach to text generation is unique.
MidJourney describes it as a “minor text drawing ability,” adding: “You must write your text in ‘quotations’ and --style raw or lower --stylize values may help.”
Decrypt was able to test the model and compare it to Dall-E 3, known for its accuracy in text generation. MidJourney appears to prioritize style and aesthetics, sometimes at the cost of text precision. Most of the time it generated either inaccurate text or no text at all. But when it did render the text correctly, the images were on par with or even better than the ones generated by Dall-E 3, the text-to-image AI model powering ChatGPT and Microsoft Bing.
Comparing the text generations from MidJourney, Dall-E 3, SDXL with Harrlogos, and Ideogram AI, one oversimplified recommendation could be to use MidJourney if aesthetics are the priority, Dall-E 3 for ease of use and cartoon digital art aesthetics, SDXL for those with advanced knowledge of A1111, and Ideogram AI for results in which the text is more important than the aesthetics.
MidJourney and Dall-E 3 with ChatGPT currently cost money, whereas SDXL and Ideogram AI are free. Bing’s version of Dall-E 3 is free to use, but it only generates square images, and users can only edit their prompts rather than refine results through the natural conversation approach taken by OpenAI.
MidJourney V6 is also a bit slower and more expensive than V5, although the team emphasizes its focus on speeding the model up over time. The V6 model also boasts improved upscalers in ‘subtle’ and ‘creative’ modes, enhancing image resolution by 2x.
These features, coupled with a diverse range of supported arguments like --ar (to change the aspect ratio), --chaos (to change how much the generations vary from one another), and --stylize (to change how creative the model is), offer users a broad spectrum of creative possibilities. However, other features like inpainting, outpainting and image description are not yet available. They should come in an update next month, according to MidJourney.
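For readers who want to see how the quoted text and the parameters fit together in a single prompt, here is a minimal sketch in Python. The `build_prompt` helper and its argument names are hypothetical and exist only for illustration. It does not call any MidJourney API, it just assembles the prompt string a user would paste into Discord, using the arguments named in the announcement.

```python
def build_prompt(description, text=None, ar=None, chaos=None, stylize=None, raw=False):
    """Assemble a MidJourney-style prompt string (hypothetical helper, illustration only)."""
    parts = [description.strip()]
    if text:
        # Per the announcement, text to be drawn must be written in quotation marks.
        parts.append(f'with the text "{text}"')
    if ar:
        # Aspect ratio, written as width:height.
        parts.append(f"--ar {ar}")
    if chaos is not None:
        # How much the generations vary from one another.
        parts.append(f"--chaos {chaos}")
    if stylize is not None:
        # Lower values may help with text, according to the announcement.
        parts.append(f"--stylize {stylize}")
    if raw:
        parts.append("--style raw")
    return " ".join(parts)


prompt = build_prompt(
    "a neon diner sign at night",
    text="OPEN LATE",
    ar="16:9",
    stylize=50,
    raw=True,
)
print(prompt)
# a neon diner sign at night with the text "OPEN LATE" --ar 16:9 --stylize 50 --style raw
```

The specific values here (a 16:9 aspect ratio, a stylize value of 50) are arbitrary examples, not recommendations from MidJourney.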
The announcement calls for users to employ these “incredible powers with joy, wonder, responsibility, and respect,” which has always been part of MidJourney’s ethos. But don’t get too excited, as the team says it will be stricter about censoring content.
“Don’t be a jerk or create images to cause drama,” the announcement reads. Chances are, that blocks attempts to create digital waifus or political deepfakes.
Edited by Ryan Ozawa.
Source: https://decrypt.co/210637/midjourney-v6-base-model-upgrade-text-generation