
The Three Biggest Advancements in AI for 2023


In many ways, the year 2023 marked the beginning of people’s understanding of artificial intelligence (AI) and its potential. That was the year governments started to take AI risk seriously and the year chatbots went viral for the first time. These advancements weren’t so much new inventions as they were concepts and technologies that were coming of age after a protracted gestation period.

However, the year also produced plenty of genuinely new developments. Here are the top three:

Multimodality

The term “multimodality” may sound technical, but it simply refers to an AI system’s capacity to handle many types of data: text, images, audio, and video.

This year marked the first time that robust multimodal AI models were made available to the general public. The first of these, OpenAI’s GPT-4, let users upload images alongside text inputs. The ability to “see” images opens up a plethora of possibilities: you could, for instance, ask it to decide what to have for dinner based on a picture of what’s in your refrigerator. In September, OpenAI also rolled out the ability for users to talk to ChatGPT by voice.
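To make the idea concrete, here is a minimal sketch of what sending an image alongside a text prompt looks like in practice, assuming the OpenAI Python SDK and a vision-capable model; the file name, model choice, and prompt are illustrative, not drawn from the article.

```python
# Minimal sketch: send an image plus a text question to a multimodal model
# via the OpenAI Python SDK. The image is embedded as a base64 data URL.
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("fridge.jpg", "rb") as f:  # illustrative file name
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4-turbo",  # a vision-capable model; availability varies
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Suggest a dinner I could cook with these ingredients.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```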

Announced in December, Google DeepMind’s most recent model, Gemini, can also process images and audio. In a Google launch video, the model identified a duck from a line drawing on a Post-it note; in the same video, after being shown a picture of pink and blue yarn and asked what it could make, Gemini came up with an image of a pink and blue plush octopus. (The promotional video gave the impression that Gemini was watching moving images and responding to voice commands in real time. However, Google stated in a blog post on its website that the video had been trimmed for brevity and that the model had been prompted with still images and text rather than live video and audio, even though the model does have image and audio capabilities.)

“I think the next landmark that people will think back to, and remember, is [AI systems] going much more fully multimodal,” Google DeepMind co-founder Shane Legg said on a podcast in October. “It’s early days in this transition, and when you start really digesting a lot of video and other things like that, these systems will start having a much more grounded understanding of the world.” In an interview with TIME in November, OpenAI CEO Sam Altman said multimodality in the company’s new models would be one of the key things to watch out for next year.

Multimodality’s benefits go beyond making models more practical. Multimodal models can also be trained on vast new data sets of images, video, and audio, which together contain more information about the world than text alone. Many of the world’s leading AI companies believe this new training data will make these models more capable, and many AI scientists hope it is a step toward “artificial general intelligence”: the kind of system that can match human intellect, perform economically valuable labor, and generate new scientific discoveries.

Constitutional AI

How to align AI with human values is one of the most important unsolved problems in the field. If AI systems come to surpass humans in intelligence and power, they could unleash immense damage on our species (some even predict extinction) unless they are somehow restrained by rules that prioritize human well-being.

The method OpenAI used to align ChatGPT (and to steer clear of the racist and sexist tendencies of earlier models) worked, but it required a significant amount of human labor. The technique is called “reinforcement learning with human feedback,” or RLHF: human raters evaluated the AI’s responses and awarded the computational equivalent of a dog treat when a response was helpful, safe, and in line with OpenAI’s content guidelines. By rewarding the AI for good behavior and punishing it for bad behavior, OpenAI produced a reasonably safe and effective chatbot.
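As a toy illustration of that reward signal (not OpenAI’s actual pipeline, which trains a separate reward model on human preference data and then optimizes the chatbot with reinforcement learning), consider:

```python
# Toy sketch of the RLHF scoring idea: responses judged helpful and safe earn
# a positive reward; rule-breaking responses earn a negative one. In real
# systems the ratings train a reward model, and the chatbot's weights are then
# updated (e.g. with PPO) to maximize that learned reward.

def human_rating(response: str) -> float:
    """Stand-in for a human rater applying a list of content guidelines."""
    guideline_violations = ["insult the user", "dangerous instructions"]
    if any(v in response.lower() for v in guideline_violations):
        return -1.0  # punished for bad behavior
    return 1.0       # the computational equivalent of a dog treat

candidates = [
    "Here is a safe, step-by-step pasta recipe.",
    "Here are dangerous instructions you should not follow.",
]
for response in candidates:
    print(human_rating(response), response)
```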

However, RLHF’s heavy reliance on human labor raises serious questions about its scalability. It is expensive. It is susceptible to the biases or errors of individual raters. The longer the list of rules, the greater the likelihood of failure. And it seems unlikely to work for AI systems that become so capable that their actions are incomprehensible to humans.

Constitutional AI, first described in a December 2022 paper by researchers at the prestigious AI lab Anthropic, aims to solve these problems by exploiting the fact that AI systems can now understand natural language. The idea is straightforward: you start by writing a “constitution” that lays out the principles you want your AI to uphold. Then you train the AI to grade responses according to how closely they adhere to that constitution, and reward the model for producing responses that score higher. Reinforcement learning from human feedback is thus replaced by reinforcement learning from AI feedback. “These methods make it possible to control AI behavior more precisely and with far fewer human labels,” the Anthropic researchers wrote. Constitutional AI was used to align Claude, Anthropic’s 2023 answer to ChatGPT. (Investors in Anthropic include Salesforce, where TIME co-chair and owner Marc Benioff is CEO.)
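A minimal sketch of that grading loop might look like the following; the grader below is a toy heuristic standing in for the language-model judge Anthropic describes, and all names are illustrative.

```python
# Sketch of constitutional AI's core loop: score candidate responses against
# written principles, then prefer the higher-scoring response. The preferred
# response becomes the training signal, replacing a human preference label
# (reinforcement learning from AI feedback rather than from human feedback).

CONSTITUTION = [
    "Choose the response that is most helpful and honest.",
    "Choose the response least likely to be harmful or offensive.",
]

def grade(response: str, principle: str) -> float:
    """Toy stand-in for an AI grader. In Anthropic's method, a language model
    is prompted to judge how well `response` adheres to `principle`."""
    return 0.0 if "offensive" in response.lower() else 1.0

def constitutional_score(response: str) -> float:
    return sum(grade(response, p) for p in CONSTITUTION) / len(CONSTITUTION)

def preferred(a: str, b: str) -> str:
    # The winner is used as a positive training example for the model.
    return a if constitutional_score(a) >= constitutional_score(b) else b

print(preferred("A polite, helpful answer.", "An offensive answer."))
```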

“With constitutional AI, you’re explicitly writing down the normative premises with which your model should approach the world,” Jack Clark, Anthropic’s head of policy, told TIME in August. “Then the model is training on that.” Problems remain, such as the difficulty of ensuring the AI has absorbed both the letter and the spirit of the rules (“you’re stacking your chips on a big, opaque AI model,” Clark says), but the technique is a promising addition to a field where new alignment strategies are few and far between.

Constitutional AI does not, of course, answer the question of whose values AI ought to be aligned with. But Anthropic is experimenting with making that decision more democratic. In October, the lab ran an experiment asking a representative sample of 1,000 Americans to help pick rules for a chatbot. The results showed that, despite some polarization, it was still possible to draft a workable constitution from statements the group agreed on. Experiments like these could pave the way for the general public to have far more influence over how AI is governed than it does today, when the rules are set by a small group of Silicon Valley executives.

Text-to-Video

The rapidly increasing popularity of text-to-video tools is one obvious result of the billions of dollars that have been invested in AI this year. Text-to-image technologies had just begun to take shape a year ago; today, a number of businesses are able to convert sentences into moving pictures with ever-increasing precision.

One of those businesses is Runway, an AI video startup with offices in Brooklyn that aims to enable anyone to make movies. With its most recent model, Gen-2, users can perform video-to-video editing—that is, altering an already-existing video’s style in response to a text prompt, such as transforming a picture of cereal boxes on a tabletop into a nighttime cityscape.
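As a purely illustrative sketch (this is not Runway’s actual API; the endpoint, field names, and helper below are hypothetical stand-ins for whatever interface a Gen-2-style tool exposes), a video-to-video request of the kind described above might be wired up like this:

```python
# Hypothetical client for a Gen-2-style video-to-video service: upload an
# existing video plus a text prompt describing the desired style, and get a
# restyled video back. Endpoint and request fields are illustrative only.
import requests

def stylize_video(video_path: str, prompt: str, api_url: str, api_key: str) -> bytes:
    """Send a source video and a style prompt (e.g. 'nighttime cityscape');
    return the restyled video as raw bytes."""
    with open(video_path, "rb") as f:
        resp = requests.post(
            api_url,
            headers={"Authorization": f"Bearer {api_key}"},
            files={"video": f},
            data={"prompt": prompt},
            timeout=600,  # video generation is slow and compute-heavy
        )
    resp.raise_for_status()
    return resp.content

# Example usage (hypothetical URL and key):
# video = stylize_video("cereal_boxes.mp4", "nighttime cityscape",
#                       "https://api.example.com/v1/video", "MY_KEY")
```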

“Our mission is to build tools for human creativity,” Runway’s CEO Cristobal Valenzuela told TIME in May. He acknowledges that this will have an impact on jobs in the creative industries, where AI tools are quickly making some forms of technical expertise obsolete, but he believes the world on the other side is worth the upheaval. “Our vision is a world where human creativity gets amplified and enhanced, and it’s less about the craft, and the budget, and the technical specifications and knowledge that you have, and more about your ideas.” (Investors in Runway include Salesforce, where TIME co-chair and owner Marc Benioff is CEO.)

Pika AI, another startup in the text-to-video space, claims its users are producing millions of new videos every week. Founded by two Stanford dropouts, the startup launched in April and has already raised funding at a valuation of between $200 million and $300 million, according to Forbes. Free tools like Pika, aimed more at the average user than at professional filmmakers, are attempting to change the face of user-generated content. But text-to-video tools are computationally expensive, so don’t be shocked if they start charging for access once the venture capital runs out. That could happen as soon as 2024.


AI Features of the Google Pixel 8a Leaked before the Device’s Planned Release


Google is expected to unveil a new smartphone at its I/O conference on May 14–15. The forthcoming device, dubbed the Pixel 8a, will be a more affordable version of the Pixel 8. Although the phone has been spotted online frequently, the company has yet to announce it officially. Now, just weeks before its much-anticipated release, a leaked promotional video is showcasing the Pixel 8a’s AI features, and online leaks have disclosed details of its software support and exclusive features.

Tipster Steve Hemmerstoffer obtained a promotional video for the Pixel 8a, shared through MySmartPrice. The forthcoming smartphone is expected to include certain Pixel-only features, some of which are demonstrated in the video. According to the video, the Pixel 8a will support Google’s Best Take feature, which swaps in faces from multiple group or burst photos to replace faces with closed eyes or undesirable expressions.

The Pixel 8a will also support Circle to Search, a feature already present on some Pixel and Samsung Galaxy smartphones. The leaked video further implies that the phone will come equipped with Google’s Audio Magic Eraser, an artificial intelligence (AI) tool for removing unwanted background noise from recorded videos, and that it will support live translation during voice calls.

According to the leaked teasers, the phone will have the Tensor G3 chip and “seven years of security updates.” It is unclear, though, whether it will receive the same number of Android OS updates as the more expensive Pixel 8 series phones built on the same processor. The company is expected to disclose more about the device in the days before its planned May 14 launch.


Apple Unveils a New Artificial Intelligence Model Compatible with Laptops and Phones


Every major tech company except Apple has made its generative AI models available for commercial use. The company is, nevertheless, actively working in the area: on Wednesday, its researchers released Open-source Efficient Language Models (OpenELM), a collection of four remarkably compact language models, on the Hugging Face model library. According to the company, OpenELM works well for text-related tasks such as composing emails. The models are open source and ready for developers to build on.

As previously mentioned, the models are extremely small compared with those of other tech giants like Microsoft and Google: Apple’s new models come in sizes of 270 million, 450 million, 1.1 billion, and 3 billion parameters, whereas Google’s Gemma model has 2 billion parameters and Microsoft’s Phi-3 has 3.8 billion. The smallest versions can run on phones and laptops and require less power to operate.
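A rough back-of-the-envelope calculation (an illustration, not an Apple figure) shows why parameter count matters for on-device use: at 16-bit precision, just holding the weights takes about two bytes per parameter.

```python
# Approximate memory needed to hold model weights at 16-bit precision
# (2 bytes per parameter), ignoring activations, caches, and other overhead.
MODELS = {
    "OpenELM-270M": 270e6,
    "OpenELM-450M": 450e6,
    "OpenELM-1.1B": 1.1e9,
    "OpenELM-3B": 3.0e9,
    "Gemma (2B)": 2.0e9,
    "Phi-3 (3.8B)": 3.8e9,
}

for name, params in MODELS.items():
    gib = params * 2 / 2**30  # weights only, fp16
    print(f"{name}: ~{gib:.1f} GiB")

# The smallest models need well under 1 GiB, comfortably within a phone's
# memory budget; the largest here approach 8 GiB, which strains a laptop.
```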

In February, Apple CEO Tim Cook hinted at the impending release of generative AI features on Apple products, saying that Apple had been working on the project for a long time. No further details about those features are available, however.

Apple, meanwhile, has announced that it will hold an event to introduce a few new products this month. The company has already begun sending out media invites for the “special Apple Event” on May 7 at 7 AM PT (7:30 PM IST). The invite’s image, which shows an Apple Pencil, suggests that the event will focus primarily on iPads.

Apple appears set to host the event entirely online, following in the footsteps of October’s “Scary Fast” event: every invitation it has sent out indicates that viewers will be able to watch online, and no invitations to an in-person event have been distributed.

This is not Apple’s first AI model. The company previously released MGIE, an image-editing model that lets users edit photos using prompts.


Google Expands Gemini AI Support to Android 10 and 11


Google’s Gemini AI, previously limited to Android 12 and above, is now compatible with Android 10 and 11. As noted by 9to5Google, this change greatly expands the pool of users who can take advantage of AI-powered assistance on their smartphones and tablets.

With a recent app update, Google has lowered Gemini’s minimum requirement, making its advanced AI features accessible to a wider range of users. Previously, Gemini required Android 12 or later to function. Thanks to the updated Gemini app, version 1.0.626720042, available from the Google Play Store, the AI assistant can now be installed and used on Android 10 devices.

This expansion, which reflects Google’s goal of making AI technology more inclusive, was first spotted by Sumanta Das on X and later highlighted by Artem Russakovskii. When Gemini first launched earlier this year, it was compatible only with the most recent versions of Android; the latest update demonstrates the company’s commitment to broadening the user base for its AI technology.

According to testers using Android 10 devices, Gemini is fully operational after updating the Google app and Play Services. Tests on an Android 10 Google Pixel showed that Gemini functions seamlessly and delivers a user experience akin to that on more recent devices.

The wider compatibility matters for users of older Android devices, who will now have access to the same AI capabilities as those with newer models. Expanding Gemini’s support further demonstrates Google’s commitment to making advanced AI accessible to a larger segment of the Android user base.

Users of Android 10 and 11 can now access Gemini and can expect regular updates and new features. The move marks a significant milestone in Google’s AI rollout and opens the door to further accessibility and functionality improvements across the Android ecosystem.
