The Merger of Natural Language Processing Technology and Computer Vision
The joint launch of ChatGPT and GPT-Vision marks a major breakthrough in the field of artificial intelligence. This fusion of natural language processing technology and computer vision opens new perspectives and offers varied and profound applications. Discover how these technologies are transforming the way we interact with visual and textual data.
Exploring Applications
The synergy between ChatGPT and GPT-Vision unlocks new features. Here are some captivating examples that illustrate the diversity of possible applications.
- Modeling from an image
A simple image can be transformed into an impressive 3D model using these technologies, as shown in this example:
ChatGPT Vision starting to write Gcode (for a Haas) from prints pic.twitter.com/IgXeMEAS8e
— Aaron Slodov (@aphysicist) October 10, 2023
- Personalized strength training program according to your equipment
Thanks to ChatGPT Vision, it is possible to obtain a tailor-made strength training program based on your available equipment, as shown in this example:
ChatGPT Vision turned a picture of my home gym equipment into a full 8-week workout program.
This is better than 99% of any programs I’ve ever bought. pic.twitter.com/ToACYgzTyf
— Rowan Cheung (@rowancheung) October 11, 2023
You can also find other program ideas here:
ChatGPT Vision:
Fitness plan ideas based on limited equipment.
Adjust prompt, if you see mistakes in the recognition. pic.twitter.com/LslHBeDFlX
— Borriss (@_Borriss_) October 12, 2023
- Analysis and decoding of blurred documents
Thanks to ChatGPT-4V Multimodal, it is possible to reveal the secrets of a blurred document through in-depth analysis, as shown in this example:
ChatGPT-4V Multimodal decodes a Redacted government document on a UFO sighting released by NASA.
I have tested this on 100s of redacted documents and I can say we are in a new world. pic.twitter.com/aCKOm577TO
— Brian Roemmele (@BrianRoemmele) October 6, 2023
- Converting photos to text for a complex letter
These technologies make it possible to transform an image of a letter into editable text, as shown in this example:
???? ChatGPT Vision is fk’in nuts lol pic.twitter.com/Ccsl7tFgkD
– to fart! ???? (@pwang_szn) October 4, 2023
- Retrieving complex objects in an image
The technology makes it possible to identify and recover complex objects in an image, as shown in this example:
Power of ChatGPT vision capability ???? pic.twitter.com/cr1izVP9df
— Kashan Ahmed???????????? (@KashanAhmed) October 6, 2023
- Detection of images from Google Street View or satellites
Thanks to ChatGPT Vision, it is possible to precisely detect images from Google Street View or satellites, as shown in this example:
ChatGPT Vision pic.twitter.com/X619nlCdBW
— Anu Aakash (@anukaakash) October 11, 2023
- Detailed analysis of an x-ray
Thanks to ChatGPT, it is possible to quickly and accurately analyze an x-ray, as shown in this example:
ChatGPT: The doctor in your pocket ????
ChatGPT can now look at X-rays, prescriptions, or medical reports and answer any question in a matter of seconds.
Future of health talk – simple, snappy, and AI! pic.twitter.com/nXgEfEvEsn
— Shubham Saboo (@Saboo_Shubham_) October 6, 2023
- Complex image analysis
Dive into analyzing a highly complex image using these technologies, as shown in this example:
ChatGPT-4V Multimodal please decode this.
Thank you. pic.twitter.com/seOuma96QO
— Brian Roemmele (@BrianRoemmele) October 2, 2023
- Creation of scenarios from the analysis of several images
Using these technologies, four separate images can be transformed into a coherent storyline, as shown in this example:
I gave GPT-4V four “movie stills” I generated with Midjourney and asked it to construct a plotline tying them together.
A good example of how AI is more “creative” and surprising when given constraints, much like humans. Its not as creative as the best people, but interesting. pic.twitter.com/tzYJmMChsn
— Ethan Mollick (@emollick) October 2, 2023
- Analysis of a car engine
Thanks to ChatGPT, it is possible to thoroughly analyze a car engine. However, it is recommended to consult a professional for any repair:
6. Car Maintenance
Prompt: “Analyze the issue shown in this car photo, explain likely causes, and provide actionable DIY repairs or professional servicing recommendations.” pic.twitter.com/mSfUTp0j5n
— Bryan Marley (@_bryanmarley) October 9, 2023
- Code optimization
ChatGPT can also be used to optimize code, offering suggestions for improving performance, efficiency and compliance with best practices, as shown in this example:
8. Code Optimization
Prompt: “Analyze this code and suggest ways to improve performance, efficiency, conciseness, and adherence to best practices.” pic.twitter.com/4leeDoVf53
— Bryan Marley (@_bryanmarley) October 9, 2023
Notable Limitations
Despite the progress made, certain limitations must be taken into account. It is important to note that reading QR Codes and sharing conversations currently remains impossible with these technologies.
If you don’t see new features, you may need to refresh the page or log out/log back in. If the problem persists, you can try clearing the cache related to openai.com.
Here is a screenshot showing one of the user interfaces for these new features:
GPT-Vision video
I would like to credit Emile Dev’s YouTube channel (follow to stay informed on artificial intelligence news) which inspired this article. Here is the presentation video: