Toronto-based start-up Ideogram released its state-of-the-art image generation model (Ideogram 1.0, https://ideogram.ai/) on the back of an $80-million seed round. There are some pretty smart people behind this tech. The cool thing? It does text pretty well. Hands too.
Back when DALL-E 2 (https://openai.com/dall-e-2) was released by OpenAI in early 2022, it was a marvel. It was followed quickly by Midjourney (https://midjourney.com/) and Stable Diffusion (https://stability.ai/stable-image) -- all of which gave anyone who could compose a creative prompt the freedom to splash digital ink on a pixel canvas and generate incredible artwork in, oh, about 30 seconds. In a remarkably short period of time, generative A.I. has opened the door to worlds previously trapped in our imaginations, raised curious questions (e.g., are these models truly "creative" in the same sense as a human artist?) and sparked inevitable controversy (namely in the form of class-action copyright infringement suits brought by those very human artists).
But these models are far from perfect. In fact, anyone who has played with them will know them to be oddly janky. Common mishaps abound, including weird hands with far too many fingers and a complete inability to compose text on images with any accuracy.
Enter Ideogram (https://ideogram.ai/), a Toronto-based start-up that launched in August 2023 with a team featuring experts from Google, UC Berkeley and the University of Toronto [1].
Ideogram 1.0 (their public model) was released on 28 February 2024 on the back of an $80 million seed round led by Andreessen Horowitz. The company claims the model makes close to half as many errors when rendering words as other models, such as DALL-E 2 [2]. As you can see from the above sample, that is a pretty impressive feat.
It isn't perfect, however -- see the rendering below of a man who is supposed to be holding a delicious can of NeuralFizz. The result suggests that the model is not exactly multi-modal: it doesn't appear to represent text as text. Instead, the text elements seem to be treated as visual patterns, much like the rest of the image.
On the plus side, the hand's not bad. I can just about make out four fingers and a thumb.