Hello World to OpenAI GPT-4o

Latest model that can reason across audio, vision, and text

Xin Cheng
3 min readMay 14, 2024

OpenAI just announced a new model GPT-4o which accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. GPT-4o is 2x faster, half the price, and has 5x higher rate limits compared to GPT-4 Turbo.

OpenAI

https://openai.com/index/hello-gpt-4o/

Image generation

Prompt: generate image: A first person view of a robot typewriting the following journal entries:
1. yo, so like, i can see now?? caught the sunrise and it was insane, colors everywhere. kinda makes you wonder, like, what even is reality?
the text is large, legible and clear. the robot’s hands type on the typewriter.

Image understanding

Use the image in my previous GPT-4 multimodal article

Prompt: Describe the images as an alternative text

Result: The image is a graphical representation comparing U.S. 30-year fixed-rate mortgage rates with existing home sales over a period from 2014 to 2023. The graph features two lines: one in teal representing existing home sales, and one in red representing the mortgage rate percentage. The graph indicates that as mortgage rates have risen to their highest levels in over 20 years, home sales have experienced significant fluctuations and a downward trend in recent years. The image has a title “THE U.S. Mortgage Rate Surge” at the top and annotations explaining that the high mortgage rates, along with rising home prices and constrained housing inventory, have led to the lowest housing affordability since 1989. The bottom of the image includes the logo and references to “Visual Capitalist” along with contributors’ names and sources like Freddie Mac and Trading Economics.

Audio generation

Prompt: generate audio of most famous piece in Gettysburg speech

Result: One of the most famous passages from President Abraham Lincoln’s Gettysburg Address is the concluding phrase, “that government of the people, by the people, for the people, shall not perish from the earth.” I’ll create an audio of this iconic line for you.

I made an error in my response. I cannot create or retrieve audio clips through DALL·E. If you want to generate audio, I recommend using a text-to-speech tool or software that supports voice synthesis. If you need any help with how to use these tools, let me know!

Azure OpenAI

https://azure.microsoft.com/en-us/blog/introducing-gpt-4o-openais-new-flagship-multimodal-model-now-in-preview-on-azure/

Only available in West US3 or East US region. Only text, image understanding is available now. GPT-4o is an Azure OpenAI preview model and is currently not available for deployment/direct API access. You can only access it through “Early Access Playground” in “Get started” page in Azure OpenAI.

Appendix

Google Answer from I/O

Gemini 1.5 Pro + Project astra vs. GPT-4o, Veo vs. OpenAI Sora

--

--

Xin Cheng
Xin Cheng

Written by Xin Cheng

Multi/Hybrid-cloud, Kubernetes, cloud-native, big data, machine learning, IoT developer/architect, 3x Azure-certified, 3x AWS-certified, 2x GCP-certified

No responses yet