Hello World to Multimodal GPT
Last year was the year of “GPT”. This year, “Multimodal” (beyond text, e.g. vision and audio) could be the next hot trend. Here is a quick way to use multimodal GPT to analyze an image.
TLDR: LlamaIndex and LangChain both integrate with Azure OpenAI GPT-4V for image analysis tasks.
Steps
- Create an Azure OpenAI resource
- Deploy a GPT-4V model in Azure OpenAI
Code
We will set up environment variables to configure Azure OpenAI. Create a configuration file called .env with the following content.
.env
OPENAI_API_TYPE=azure
AZURE_OPENAI_ENDPOINT=https://<resource name>.openai.azure.com/
AZURE_OPENAI_API_VERSION=2023-12-01-preview
AZURE_OPENAI_API_KEY=<api key>
AZURE_OPENAI_MODEL_NAME=gpt-4-vision-preview
AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4-vision-preview
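A missing or empty setting usually only surfaces later as an authentication or "deployment not found" error, so it can help to check the configuration up front. The following is a minimal sketch, not part of either library: the helper name missing_settings is hypothetical, and the variable names are taken from the .env file above.

```python
import os

# Variable names copied from the .env file above.
REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
]


def missing_settings(env=None):
    """Return the required variable names that are absent or empty.

    `env` defaults to os.environ; pass a plain dict to check other mappings.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling missing_settings() after load_dotenv() returns an empty list when everything is set; otherwise it lists the names to fix in .env.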
Make sure your Python environment has the required packages (the dotenv module is published on PyPI as python-dotenv)
pip install llama-index openai langchain python-dotenv requests
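To confirm the installation before running the examples, one option is to check that each module can be found. This is a small sketch using only the standard library; the helper name missing_modules is hypothetical, and the list holds the import names (which differ from some pip names, e.g. llama-index is imported as llama_index).

```python
from importlib.util import find_spec

# Import names (not pip names) for the packages used in this post.
MODULES = ["llama_index", "openai", "langchain", "dotenv", "requests"]


def missing_modules(names=MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]
```

An empty list from missing_modules() means all five modules are importable.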
LlamaIndex
Load the values from the .env config file into environment variables
from dotenv import load_dotenv
load_dotenv()
import os
os.environ["OPENAI_API_VERSION"] = os.environ['AZURE_OPENAI_API_VERSION']
Load image (U.S. 30-year fixed-rate mortgage from 2014–2023)
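import base64
import requests
from llama_index.schema import ImageDocument
image_url = "https://www.visualcapitalist.com/wp-content/uploads/2023/10/US_Mortgage_Rate_Surge-Sept-11-1.jpg"
# verify=False skips TLS certificate verification; remove it if verified HTTPS works on your network
response = requests.get(image_url, verify=False)
if response.status_code != 200:
    raise ValueError("Error: Could not retrieve image from URL.")
# ImageDocument takes the image as a base64-encoded string
base64str = base64.b64encode(response.content).decode("utf-8")
image_document = ImageDocument(image=base64str, image_mimetype="image/jpeg")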
Create LLM
from llama_index.multi_modal_llms.azure_openai import AzureOpenAIMultiModal
azure_openai_mm_llm = AzureOpenAIMultiModal(
    engine=os.environ['AZURE_OPENAI_DEPLOYMENT_NAME'],
    api_version=os.environ["OPENAI_API_VERSION"],
    model=os.environ['AZURE_OPENAI_MODEL_NAME'],
    max_new_tokens=300,
)
Analyze image
complete_response = azure_openai_mm_llm.complete(
    prompt="Describe the images as an alternative text",
    image_documents=[image_document],
)
print(complete_response)
Result
The image is a line graph showing the U.S. 30-year fixed-rate mortgage and existing home sales from 2014 to 2023. The mortgage rate is represented by a red line, while the home sales are represented by a blue line. The graph shows that the mortgage rate has reached its highest level in over 20 years, while home sales have fluctuated over time. There is also a note that the data is sourced from the U.S. Federal Reserve, Trading Economics, and Visual Capitalist.
LangChain
Load the values from the .env config file into environment variables
from dotenv import load_dotenv
load_dotenv()
import os
os.environ["OPENAI_API_VERSION"] = os.environ['AZURE_OPENAI_API_VERSION']
Load image to base64 string
import base64
import requests
image_url = "https://www.visualcapitalist.com/wp-content/uploads/2023/10/US_Mortgage_Rate_Surge-Sept-11-1.jpg"
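# verify=False skips TLS certificate verification; remove it if verified HTTPS works on your network
response = requests.get(image_url, verify=False)
if response.status_code != 200:
    raise ValueError("Error: Could not retrieve image from URL.")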
base64str = base64.b64encode(response.content).decode("utf-8")
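The request later in this section embeds the image as a data URL with a hardcoded image/jpeg type. If you want to reuse the code with other image formats, a helper along these lines could derive the type from the file extension. This is a sketch under assumptions: to_data_url is a hypothetical name, and falling back to image/jpeg when the extension is unknown is a choice made here, not something either library requires.

```python
import base64
import mimetypes


def to_data_url(image_bytes, source_url, default_mime="image/jpeg"):
    """Encode raw image bytes as a data URL.

    The MIME type is guessed from the extension in `source_url`;
    `default_mime` is used when no type can be guessed.
    """
    mime_type, _ = mimetypes.guess_type(source_url)
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type or default_mime};base64,{encoded}"
```

With the variables above, to_data_url(response.content, image_url) yields a string of the form data:image/jpeg;base64,... for the .jpg chart.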
Create LLM
from langchain.chat_models import AzureChatOpenAI
from langchain.schema import HumanMessage
chat = AzureChatOpenAI(
    azure_deployment=os.environ['AZURE_OPENAI_DEPLOYMENT_NAME'],
    openai_api_version=os.environ["OPENAI_API_VERSION"],
    max_tokens=256,
)
Analyze image
chat.invoke(
    [
        HumanMessage(
            content=[
                {"type": "text", "text": "Describe the images as an alternative text"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64str}",
                        "detail": "auto",
                    },
                },
            ]
        )
    ]
)
Result
AIMessage(content='The image is a graph titled "The U.S. Mortgage Rate Surge" showing the U.S. 30-year fixed-rate mortgage versus existing home sales from 2014 to 2023. The graph has two lines, one representing the mortgage rate (in percentage) and the other representing existing home sales (in millions). The mortgage rate line fluctuates between 2% and 8%, while the existing home sales line fluctuates between 3M and 7M. The graph indicates that in 2023, the U.S. 30-year fixed-rate mortgage has reached its highest level in over 20 years, with high mortgage rates, rising home prices, and a constrained housing inventory leading to U.S. housing affordability being at its lowest point since 1989. The source of the data is the National Association of Realtors, and the collaborators of the graph are Visual Capitalist with research and writing by Selin Oquz, art direction and design by Joyce Ma. The Visual Capitalist logo and social media icons are displayed at the bottom.')
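The message content in the LangChain call is a list with one text part and one image part. If you plan to send several prompts or images, it may be convenient to build that list with a small function. The sketch below mirrors the payload used above; build_image_content is a hypothetical helper, not a LangChain API, and it only returns plain Python dicts.

```python
def build_image_content(prompt, image_url, detail="auto"):
    """Return the two-part message content: a text prompt plus one image.

    `image_url` can be a data URL such as "data:image/jpeg;base64,..."
    as used in the call above.
    """
    return [
        {"type": "text", "text": prompt},
        {
            "type": "image_url",
            "image_url": {"url": image_url, "detail": detail},
        },
    ]
```

The returned list can be passed as HumanMessage(content=...). Since chat.invoke returns an AIMessage, reading its content attribute gives just the description text rather than the full object shown in the result.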