Technology

Stable Video Diffusion Is Stability AI’s Gambit for the Future of Video

Published

November 23, 2023

Komal

Stability AI recently announced Stable Video Diffusion, a video generation tool with which it aims to carve out a share of the nascent generative video space. The release follows the company’s successful launch of a text-to-image model, the controversial launch of a text-to-music model, and the largely unnoticed launch of a text generation model.

Stability AI describes Stable Video Diffusion as “a latent video diffusion model for high-resolution state-of-the-art text-to-video and image-to-video generation.” In the official announcement, the company further states: “Spanning across modalities including image, language, audio, 3D, and code, our portfolio is a testament to Stability AI’s dedication to amplifying human intelligence.”

This flexibility, combined with open-source technology, opens up possibilities in advertising, education, and entertainment. Researchers claim that Stable Video Diffusion can “outperform image-based methods at a fraction of their compute budget.” It is currently available as a research preview.

The technical capabilities of Stable Video Diffusion are impressive. “Human preference studies reveal that the resulting model outperforms state-of-the-art image-to-video models,” according to the research paper. Stability also asserts that its model beats closed models in user preference studies, a sign of its confidence in the model’s ability to turn static images into dynamic video content.

Under the Stable Video Diffusion name, Stability AI has released two models: SVD and SVD-XT. The SVD model converts still images into 14-frame videos at 576 x 1024 resolution, while SVD-XT uses the same architecture but extends the output to 25 frames. Both models can generate video at frame rates from 3 to 30 frames per second, which places them at the forefront of open-source video generation.
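To put those figures in perspective, here is a minimal sketch of how frame count and frame rate translate into clip length. It uses only the 14-frame SVD figure and the 3 to 30 frames-per-second range quoted above; the function name `clip_duration_seconds` is a hypothetical helper for illustration and not part of any Stability AI tooling.

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Return the playback length in seconds of num_frames shown at fps."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    return num_frames / fps


# Figures quoted in the article for the SVD model.
SVD_FRAMES = 14
MIN_FPS, MAX_FPS = 3, 30

longest = clip_duration_seconds(SVD_FRAMES, MIN_FPS)
shortest = clip_duration_seconds(SVD_FRAMES, MAX_FPS)
print(f"{SVD_FRAMES} frames last {shortest:.2f}s at {MAX_FPS} fps "
      f"and {longest:.2f}s at {MIN_FPS} fps")
```

In other words, a single generation yields a clip of a few seconds at most, which is consistent with the research-preview positioning.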

In the quickly developing field of AI video generation, Stable Video Diffusion faces competition from cutting-edge models created by Pika Labs, Runway, and Meta. Meta’s recently announced Emu Video offers similar text-to-video capability and, though currently limited to 512×512 pixel videos, shows significant potential with its distinctive approach to image editing and video creation.

Despite its technological accomplishments, Stability AI still faces obstacles, such as ethical questions about the copyrighted data used for AI training. The model is “not intended for real-world or commercial applications at this stage,” the company emphasizes, adding that it is focused on improving the model in response to community feedback and safety concerns.

Building on the popularity of the most powerful open-source image generation models, SD 1.5 and SDXL, this new entry into the video generation space suggests a future in which the boundaries between the imagined and the real are not only blurred but elegantly redrawn.


Technology

AI Features of the Google Pixel 8a Leaked before the Device’s Planned Release

Published

April 27, 2024

Kajal Chavan

Google is expected to unveil a new smartphone at its I/O conference on May 14–15. The forthcoming device, dubbed the Pixel 8a, will be a toned-down version of the Pixel 8. Although it has been spotted online frequently, Google has not made any official announcement about it. Now, just weeks before the anticipated release, a leaked promotional video showcases the Pixel 8a’s AI features. Other leaks have revealed details of its software support and special features.

The promotional video for the Pixel 8a was leaked by tipster Steve Hemmerstoffer through MySmartPrice. The video demonstrates some of the Pixel-only features the forthcoming smartphone is expected to include. According to the video, the Pixel 8a will support Google’s Best Take feature, which draws on multiple group or burst photos to replace faces that have their eyes closed or show undesirable expressions.

The Pixel 8a will support Circle to Search, a feature currently available on some Pixel and Samsung Galaxy smartphones. The leaked video also implies that the smartphone will come with Google’s Audio Magic Eraser, an artificial intelligence (AI) tool for removing unwanted background noise from recorded videos. In addition, the video shows the Pixel 8a supporting live translation during voice calls.

According to the leaked teasers, the phone will have the Tensor G3 chip and “seven years of security updates.” It is unclear, though, whether it will get the same number of Android OS updates as the more expensive Pixel 8 series phones that use the same processor. Google is expected to disclose more information about the device in the days before its planned May 14 launch.

Technology

Apple Unveils a New Artificial Intelligence Model Compatible with Laptops and Phones

Published

April 27, 2024

Kajal Chavan

Every major tech company except Apple has made its generative AI models available for commercial use. The company is, nevertheless, actively working in this area. On Wednesday, its researchers released Open-source Efficient Language Models (OpenELM), a collection of four very compact language models, on the Hugging Face model library. According to the company, OpenELM works well for text-related tasks such as composing emails. The models are open source and ready for developers to use.

As noted, the models are very small compared with those from other tech giants such as Microsoft and Google. Apple’s latest models have 270 million, 450 million, 1.1 billion, and 3 billion parameters. By comparison, Google’s Gemma model has 2 billion parameters, and Microsoft’s Phi-3 model has 3.8 billion. Smaller models can run on phones and laptops and require less power to operate.
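A rough back-of-the-envelope sketch shows why parameter count matters for on-device use. It assumes each parameter is stored as a 16-bit value (2 bytes), which is an assumption for illustration and not something the article or Apple states. It counts only the model weights and ignores activations and runtime overhead. The helper name `weight_memory_gb` and the model labels are hypothetical.

```python
# Assumption: weights stored as 16-bit values, i.e. 2 bytes per parameter.
BYTES_PER_PARAM_FP16 = 2

# Parameter counts as quoted in the article.
PARAM_COUNTS = {
    "OpenELM (270 million)": 270_000_000,
    "OpenELM (450 million)": 450_000_000,
    "OpenELM (1.1 billion)": 1_100_000_000,
    "OpenELM (3 billion)": 3_000_000_000,
    "Gemma (2 billion)": 2_000_000_000,
    "Phi-3 (3.8 billion)": 3_800_000_000,
}


def weight_memory_gb(num_params: int,
                     bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Estimate the memory needed for model weights alone, in gigabytes."""
    return num_params * bytes_per_param / 1e9


for name, count in PARAM_COUNTS.items():
    print(f"{name}: about {weight_memory_gb(count):.2f} GB of weights")
```

Under this assumption, the smallest OpenELM model needs roughly half a gigabyte for its weights, while a 3.8-billion-parameter model needs several gigabytes, which helps explain why the smaller variants suit phones.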

Apple CEO Tim Cook hinted in February that generative AI features would be coming to Apple products, saying that Apple has been working on the project for a long time. No further details about the AI features are available, however.

Apple, meanwhile, has announced an event to introduce a few new products. Media invites to the “special Apple Event” on May 7 at 7 AM PT (7:30 PM IST) have already begun to arrive. The invite’s image, which shows an Apple Pencil, suggests that the event will focus primarily on iPads.

It seems that Apple will host the event entirely online, following in the footsteps of October’s “Scary Fast” event. Every invitation Apple has sent out implies that viewers will be able to watch the event online, and no invitations for an in-person event have been distributed so far.

OpenELM is not Apple’s first AI model release. The company previously released the MGIE image editing model, which lets users edit photos using prompts.

Technology

Google Expands the Availability of AI Support with Gemini AI to Android 10 and 11

Published

April 27, 2024

Kajal Chavan

Google’s Gemini AI, which was previously limited to Android 12 and above, is now compatible with Android 10 and 11. As noted by 9to5Google, the change greatly expands the pool of users who can take advantage of AI-powered support on their tablets and smartphones.

Thanks to a recent app update, Google has lowered the minimum requirement for Gemini, making its advanced AI features accessible to a wider range of users. The updated Gemini app, version v1.0.626720042, can be downloaded from the Google Play Store and allows the AI assistant to be installed and used on Android 10 devices.

The expansion, which reflects Google’s goal of making AI technology more inclusive, was first mentioned by Sumanta Das on X and then highlighted by Artem Russakovskii. When Gemini was first released earlier this year, it was compatible only with the most recent versions of Android.

According to testers using Android 10 devices, Gemini is fully operational once the Google app and Play Services have been updated. Tests conducted on a Google Pixel running Android 10 showed that Gemini functions seamlessly and offers a user experience akin to that on more recent models.

The wider compatibility matters for users with older Android devices, who will now have access to the same AI capabilities as those with more recent models. It also underlines Google’s stated commitment to making advanced AI accessible to a larger segment of the Android user base.

Users of Android 10 and 11 can now access Gemini and can anticipate regular updates and new features. The move marks a notable step in Google’s AI rollout and opens the door to future improvements in functionality and accessibility.