GOOGLE has taken significant strides in bolstering the capabilities of its Gemini AI, introducing new features aimed at enhancing its adaptability and user experience. However, despite these advancements, Gemini still lags behind OpenAI’s ChatGPT in generating photorealistic human images.
One of the standout additions is the introduction of “Gems,” customizable AI experts tailored to specific topics or goals. Available to Gemini Advanced, Business, and Enterprise users, Gems offer assistance with a wide array of tasks, from coding to career guidance.
Pre-made Gems include Learning Coach, which helps break down complex topics, making them easier to understand; Brainstormer, which offers fresh ideas for everything from a themed party to the perfect gift for an upcoming birthday; Career Guide, which helps develop career potential with detailed plans to refine your skills and achieve personal career goals; Writing Editor, which elevates writing through clear, constructive feedback on everything from grammar to structure; and Coding Partner, which levels up coding skills and can help “build projects and learn as you go.”
Through these Gems, Google provides specialized support across different domains.
Google has also upgraded its image generation capabilities with Imagen 3, now accessible across all Gemini tiers. This powerful model offers users a greater ability to generate and customize high-quality images.
However, to ensure responsible use, Imagen 3 incorporates safeguards, including SynthID for watermarking AI-generated content. Additionally, image generation of people is restricted to Gemini Advanced, Business, and Enterprise users, with prohibitions on photorealistic identifiable individuals, depictions of minors, or excessively graphic content.
Despite the advancements, Gemini’s Imagen 3 still struggles to match the proficiency of ChatGPT’s Dall-E in generating photorealistic human images. This limitation can be attributed to several factors:
Google’s cautious approach prioritizes safety and ethical considerations, especially when generating images of people. This caution can limit the model’s ability to produce highly realistic human figures.
Generating photorealistic human images remains a complex technical challenge, requiring the model to capture subtle nuances in facial expressions, body language, and skin tones.
However, Google says Imagen 3’s people generation capability is still under development and is being gradually rolled out to users. This staged approach allows Google to collect data, refine the model, and implement additional safety measures.
Google remains committed to improving Imagen 3’s capabilities and expanding its availability. As the model is refined and safety measures are strengthened, we can anticipate that Gemini will narrow the gap with Dall-E in generating photorealistic human figures.