
Google and Meta both made notable artificial intelligence (AI) announcements on Thursday, unveiling new models that mark significant advancements. The search giant revealed Gemini 1.5, an updated AI model that can understand long context across multiple modalities. Meanwhile, Meta announced its Video Joint Embedding Predictive Architecture (V-JEPA) model, a non-generative training method for advancing machine learning (ML) through visual media. Both products offer new ways to explore AI capabilities. Notably, OpenAI also launched Sora, its first text-to-video generation model, on Thursday.

Google Gemini 1.5 model details

Demis Hassabis, CEO of Google DeepMind, announced the release of Gemini 1.5 in a blog post. The latest model is built on the Transformer and Mixture of Experts (MoE) architectures. Multiple versions are expected, but so far only the Gemini 1.5 Pro model has been released for early testing. Hassabis said the mid-size multimodal model performs at a level comparable to the company’s largest generative model, Gemini 1.0 Ultra, which is available via the Gemini Advanced subscription under the Google One AI Premium plan.

The biggest improvement in Gemini 1.5 is its ability to handle long-context information. The standard Pro version ships with a 128,000-token context window; by comparison, Gemini 1.0 had a context window of 32,000 tokens. A token can be understood as a whole word, or a subsection of a word, image, video, audio, or code, that acts as a building block for the foundation model to process information. “The larger the model’s context window, the more information it can take in and process in a given prompt, making the output more consistent, relevant, and useful,” Hassabis explained.
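The idea of a token budget can be illustrated with a toy sketch. Note this is only an illustration: a real model uses a learned subword tokenizer rather than whitespace splitting, and `fits_in_context` is a hypothetical helper, not part of any Google API.

```python
# Toy illustration of a context window. Real models tokenize text with a
# learned subword vocabulary; splitting on whitespace is just a stand-in.
def fits_in_context(prompt: str, context_window: int) -> bool:
    tokens = prompt.split()  # hypothetical stand-in for real tokenization
    return len(tokens) <= context_window

# A three-"token" prompt fits easily inside a 128,000-token window.
print(fits_in_context("Summarise this paragraph", 128_000))  # True
```

A larger window simply means more such tokens can be packed into a single prompt before the model has to truncate or forget earlier context.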

Along with the standard Pro version, Google is also releasing a special model with a context window of up to 1 million tokens. It is available in a private preview to a limited group of developers and enterprise customers. There is no dedicated platform for it, but testers can experiment through Google’s AI Studio and Vertex AI, a cloud console tool for trying out generative AI models. Google says this version can process 1 hour of video, 11 hours of audio, a codebase of over 30,000 lines of code, or over 700,000 words at once.

Meta V-JEPA model details

Meta publicly announced V-JEPA in an X (formerly Twitter) post. Rather than a generative AI model, it is a training method that allows ML systems to understand and model the physical world by watching videos. The company called it an important step toward advanced machine intelligence (AMI), the vision of Yann LeCun, one of the three “Godfathers of AI”.

Essentially, it is a predictive analytics model that learns entirely from visual media. It can not only understand what is happening in a video but also predict what will happen next. To train it, the company says it used a new masking technique in which parts of the video are obscured in both time and space: some frames are removed entirely, while others have patches blacked out, forcing the model to predict both the missing parts of the current frame and what comes in subsequent frames. According to the company, the model was able to do both efficiently. At present, the model can predict and analyze videos up to 10 seconds long.
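The masking idea described above can be sketched roughly as follows. This is an illustrative toy, not Meta's implementation: the video is modeled as a plain list of 2D frames, and `mask_video` is a hypothetical helper. In the real method, the model's training objective would be to predict the masked content.

```python
import random

# A video as a list of frames, each frame a 2D grid of pixel values.
# We mask in time by dropping whole frames, and in space by blacking
# out (zeroing) a small patch inside the surviving frames.
def mask_video(video, drop_every=4, block=(2, 2), seed=0):
    rng = random.Random(seed)
    masked = []
    for t, frame in enumerate(video):
        if t % drop_every == 0:
            masked.append(None)            # temporal mask: frame removed
            continue
        h, w = len(frame), len(frame[0])
        r = rng.randrange(h - block[0] + 1)
        c = rng.randrange(w - block[1] + 1)
        out = [row[:] for row in frame]
        for i in range(block[0]):          # spatial mask: patch blacked out
            for j in range(block[1]):
                out[r + i][c + j] = 0
        masked.append(out)
    return masked

video = [[[1] * 4 for _ in range(4)] for _ in range(8)]  # 8 uniform 4x4 frames
masked = mask_video(video)
print(masked[0] is None)  # True: frame 0 was dropped entirely
```

A model trained on such data never sees the hidden frames or patches directly, so the only way to lower its prediction error is to learn how scenes evolve over time.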

“For example, if the model needs to be able to distinguish between someone putting down a pen, picking up a pen, and pretending to put down a pen without actually doing so, V-JEPA is significantly better than previous methods at this kind of fine-grained action recognition task,” Meta said in a blog post.

Currently, the V-JEPA model uses only visual data, meaning it does not take audio input from videos. Meta now plans to incorporate audio alongside video in its ML models. Another of the company’s goals is to improve the model’s performance on longer videos.


