How Meta is Building AI-Enhanced Video Experiences

What do the recent announcements at Meta Connect 2025, including the new Meta Ray-Ban Display Smart Glasses, mean for the future of video delivery and immersive experiences?

Meta’s announcements at the Connect 2025 conference highlighted a clear strategic focus on integrating artificial intelligence to create a more seamless, higher-quality user experience, especially for users with accessibility needs. Wowza’s recent announcements at the IBC conference share a central theme: enhancing viewers’ experiences through AI-powered video enhancements.

Breaking Down Meta’s “Perceptual Superpowers”

Meta offers a series of smart glasses with embedded cameras and sensors, but the newest and most advanced model, unveiled at Meta Connect 2025, is the Meta Ray-Ban Display. These smart glasses give consumers a personal AI companion accessible through a convenient heads-up display embedded within the lenses themselves. While other Meta smart glasses, like the Meta Ray-Bans and the Oakley Meta Vanguard, also have embedded cameras to capture high-quality point-of-view footage, only the new Meta Ray-Ban Display glasses have this heads-up display.

Meta is delivering capabilities, and researching technologies, that enable what it’s calling “perceptual superpowers.” These are tools that tap into contextual AI to process and analyze video captured in real time by the high-resolution cameras in its AI smart glasses, including the Meta Ray-Ban and new Meta Ray-Ban Display glasses. At Meta Connect this year, Meta announced new research and features that infuse these smart glasses with AI capabilities. Beyond an integrated AI assistant for texting, voice calls, and contextual analysis, similar to Siri or Gemini, Meta has been investing in:

  • Live speech-to-text transcription for real-time captions in a heads-up display
  • On-screen translations for English, Spanish, French, or Italian text seen in the world
  • Cutting-edge research into “enhanced hearing” capabilities with contextual AI

Live Transcription & Real-Time Captions in Meta Ray-Ban Display Smart Glasses

Notably, live speech-to-text transcription with on-screen caption generation is one of the key capabilities of these new smart glasses. The glasses leverage data captured by the embedded cameras and microphones, layer in AI analysis to isolate speech from background noise and identify individual speakers, and display the captions in a highly visible, yet unobtrusive, area within the lens.

Meta Ray-Ban Display Smart Glasses use live AI speech-to-text transcription to generate on-screen captions in real time.

Translating Text Instantly with Meta Ray-Ban Display Smart Glasses

A unique feature of these smart glasses is their text capability: the embedded cameras can view and analyze text in the wearer’s surroundings, translate it into English, Spanish, French, or Italian, and present the translation on the heads-up display.

This lays the groundwork for more languages, including those with special characters. The glasses also improve accessibility: the well-lit lens helps people with visual impairments make out difficult fonts, colors, or low-contrast signs.

Translate text into English, Spanish, French, or Italian using the embedded cameras and AI in Meta Ray-Ban Display Smart Glasses.
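Under the hood, this is an OCR-plus-translation pipeline. As a loose illustration of the same idea using open-source pieces (not Meta’s on-device stack), the sketch below recognizes French text in a photo with Tesseract and translates it with a Hugging Face model; the image file name and model choice are assumptions:

```python
import pytesseract                 # pip install pytesseract (requires Tesseract)
from PIL import Image              # pip install pillow
from transformers import pipeline  # pip install transformers

# Recognize French text in a photo of a sign (a stand-in for camera input).
recognized = pytesseract.image_to_string(Image.open("sign.jpg"), lang="fra")

# Translate French -> English with an open-source translation model.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
print(translator(recognized)[0]["translation_text"])
```

On the glasses, both steps would run against live camera frames under far tighter latency budgets, but the recognize-then-translate shape is the same.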

Researching Improvements for Enhanced Hearing & More Accessible Experiences

Looking to the future, Meta is investing in research and development for other features that augment the experiences of its users. This includes what it is calling “enhanced hearing”: capabilities that leverage AI with detailed telemetry data to draw focus and isolate desired audio. In other words, someone wearing these glasses could more easily hear a conversation in a crowded setting because the glasses dynamically filter out background noise based on where the user’s head is pointing. Similarly, these devices could use AI to analyze audio and filter out specific sounds, such as appliances, traffic, or other background noises. This reinforces Meta’s goal of enhancing and transforming how individuals communicate globally.
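The head-direction filtering described above resembles classic microphone-array beamforming. As a minimal sketch of that idea (not Meta’s implementation; the mic spacing and sample rate are assumptions), here is a delay-and-sum beamformer for a two-microphone array:

```python
import numpy as np

def delay_and_sum(mic_left: np.ndarray, mic_right: np.ndarray,
                  steer_angle_rad: float, mic_spacing_m: float = 0.15,
                  sample_rate_hz: int = 16_000,
                  speed_of_sound_ms: float = 343.0) -> np.ndarray:
    """Steer a two-mic array toward steer_angle_rad (0 = straight ahead).

    Sound from the steered direction reaches the two microphones a few
    samples apart; delaying one channel by that amount and summing
    reinforces the desired source while attenuating off-axis noise.
    """
    # Time difference of arrival for a source at the steering angle.
    delay_s = mic_spacing_m * np.sin(steer_angle_rad) / speed_of_sound_ms
    delay_samples = int(round(delay_s * sample_rate_hz))
    # Align the right channel with the left, then average the two.
    # (np.roll wraps at the edges, which is fine for an illustration.)
    aligned_right = np.roll(mic_right, -delay_samples)
    return (mic_left + aligned_right) / 2.0
```

In practice the steering angle would be updated continuously from the glasses’ head-orientation telemetry, which is what makes the filtering feel like “enhanced hearing.”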

How Wowza’s Intelligent Media Infrastructure Builds on These New Capabilities

While Meta provides these capabilities on a more consumer level, Wowza is bringing these capabilities into the enterprise, broadcasting, streaming, and surveillance markets. Recently, at the IBC 2025 conference in Amsterdam, Wowza announced a slew of new platform capabilities aimed at enhancing the overall experience for engineers and operations professionals building video systems in their organizations. From live and VOD clipping workflows to new AI enhancements, Wowza is delivering the flexibility and control modern teams need to optimize their media infrastructure.

AI Caption and Subtitle Generation in Wowza

Real-time captions help media providers meet accessibility requirements and reach new audiences: captions can be delivered in a viewer’s preferred language, and they make content usable for those with hearing impairments.

Creating accurate, low-latency captions requires a solid transcription workflow and reliable data for training the underlying AI models. Wowza has partnered with industry leaders like Azure, Whisper, and Verbit to offer this functionality right out of the box, and you can “Bring Your Own Model” for specialized needs.
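Wowza exposes this as out-of-the-box functionality; as a rough sketch of the underlying idea, the snippet below uses the open-source Whisper model to turn an audio file into WebVTT caption cues (the input file name and model size are illustrative assumptions):

```python
import whisper  # pip install openai-whisper

def to_vtt_timestamp(seconds: float) -> str:
    """Format seconds as an HH:MM:SS.mmm WebVTT timestamp."""
    ms = int(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

# "base" is a small, fast model; larger models trade latency for accuracy.
model = whisper.load_model("base")
result = model.transcribe("stream_audio.wav")  # hypothetical input file

print("WEBVTT\n")
for seg in result["segments"]:
    print(f"{to_vtt_timestamp(seg['start'])} --> {to_vtt_timestamp(seg['end'])}")
    print(seg["text"].strip() + "\n")
```

A production workflow would run this incrementally over live audio chunks to keep latency low, but the segment-to-cue mapping above is the core of it.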

Learn more in our recent blog.

AI Object Detection in Wowza

Wowza is integrating leading computer vision models into mission-critical media workflows to provide customizable object detection capabilities. These advanced AI workflows enable real-time monitoring and alerting for surveillance and security use cases, and they can streamline live clipping and editing workflows. Just as with AI caption and subtitle generation, Wowza gives users full “Bring Your Own Model” flexibility to run specialized or custom-trained vision models for bespoke monitoring applications (see the sketch after the list below).

  • Recognize specific objects or custom logos
  • Monitor elements and track evolving trends
  • Implement security and surveillance alerts
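As a minimal sketch of the detection loop itself (not Wowza’s API), here is how an off-the-shelf model such as YOLO, via the ultralytics package, could flag objects in a single frame; the frame path and alert threshold are assumptions:

```python
from ultralytics import YOLO  # pip install ultralytics

# Any detection model could be swapped in here ("Bring Your Own Model").
model = YOLO("yolov8n.pt")

# In a live pipeline this frame would come from the stream; a single
# image file stands in for it here.
results = model("frame.jpg")

for box in results[0].boxes:
    label = model.names[int(box.cls[0])]
    confidence = float(box.conf[0])
    if confidence >= 0.5:  # alert threshold, tuned per use case
        print(f"Detected {label} with confidence {confidence:.2f}")
```

Swapping in a custom-trained model changes only the weights file; the monitor-and-alert loop stays the same.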


Custom Wowza Agents via Model Context Protocol (MCP)

Wowza has always prioritized flexibility, whether it’s how you deploy Wowza (on-prem, in the cloud, or a hybrid approach) or how you integrate Wowza into your broader technology ecosystem. That same control extends to these AI capabilities, beyond flexible support for importing custom models.

By integrating Wowza Agents with your media technology stack, you can control complex media workflows with simple natural-language prompts. Simply tell your agent what you want to accomplish and let Wowza do the heavy lifting. Perfect for lean DevOps and media operations teams.
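Wowza hasn’t published its exact tool schema in this post, but MCP itself is built on JSON-RPC 2.0. As a sketch of what an agent’s tool invocation could look like, the request below follows the MCP “tools/call” envelope, with a hypothetical start_live_clip tool name and arguments:

```python
import json

# Hypothetical MCP "tools/call" request an agent might send to a Wowza
# MCP server. The method and envelope follow the MCP spec (JSON-RPC 2.0);
# the tool name and arguments are illustrative assumptions.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "start_live_clip",
        "arguments": {"stream": "myLiveStream", "duration_seconds": 30},
    },
}
print(json.dumps(request, indent=2))
```

The point of the protocol is that the agent, not the operator, composes requests like this from a plain-English prompt.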

What Perceptual Superpowers Mean for the Future of Video

Meta’s “perceptual superpowers” point to a future where video is understood as it’s captured: AI at the edge, accessibility by default, and real-time insights. The same approach is reshaping professional workflows.

Low-latency AI captioning makes streams more usable and compliant. Object detection provides operational awareness for live production and monitoring. Agent-driven control via the Model Context Protocol (MCP) reduces complex media workflows to simple natural-language instructions.

These patterns define the next era of streaming: model-agnostic, interoperable, and deployable across cloud, on-prem, and hybrid systems. Teams that adopt this approach won’t just deliver video; they’ll deliver continuously enriched experiences that scale.

About Ian Zenoni

Ian Zenoni has been in the video industry for over 20 years and at Wowza for over 10. While at Wowza, Ian has architected, built, and deployed solutions and services for live video streaming, both in the cloud and on premises. As Chief Architect, Ian researches the latest video streaming technology to integrate into Wowza’s products and services. He is also a co-organizer of the Denver Video meetup group, which meets quarterly in the Denver metro area.