Wowza Video Intelligence Framework

Transform video feeds into structured, actionable signals that power real-time response.

Organizations generate more live video than teams can monitor. Wowza’s Video Intelligence Framework (VIF) connects streams to AI, turning detections into metadata, alerts, overlays, and event signals to make video part of the workflow.


Intelligent Video Monitoring That Powers Real-Time Incident Response

No Camera Left Behind

Operationalize the video infrastructure you already have

Take footage from any camera, across any protocol or codec, and use it to power intelligent video monitoring workflows without replacing hardware or redesigning your network.

  • IP cameras, IoT sensors, PTZ cameras
  • Mobile devices, drones, dashcams, bodycams 
  • RTSP, RTMP/S, SRT, WebRTC, UDP, MPEG-TS 
  • Object detection, captioning, and event logging

Deploy On Your Terms

Run AI where your video lives

VIF is built for real-world environments, including regulated, air-gapped, edge, and hybrid deployments. Cut cloud costs, meet compliance requirements, and scale video intelligence on your terms.

  • Stable ingest across on-prem, air-gapped, and hybrid networks
  • Edge AI deployments for cost optimization at scale
  • Optimized for NVIDIA GPUs
  • Custom routing via APIs, SDKs, Java modules, MCP

Total Model Flexibility

Turn detections into usable operational outputs

VIF connects to pre-trained and custom AI models to identify key objects, behaviors, or conditions, then converts those detections into structured outputs for alerts, logging, automation, and workflows, without requiring you to rebuild your streaming infrastructure as models evolve.

  • Integrated with Roboflow’s RF-DETR model out of the box
  • Plug-and-play setup via Google Colab notebook with validated code
  • Generate structured JSON metadata to feed observability and logging platforms
  • Power webhook alerts with reliable low latency

Where Video Intelligence Framework Lives in Your Stack

Most AI video tools sit outside the stream. VIF runs inference inside your live pipeline, bringing AI directly to the source.

Respond In Seconds, Not Minutes

Real-World Applications for Wowza’s Video Intelligence Framework


Complete Flexibility For Any Workflow

We adapt to your architecture, not the other way around.

  • On-premises, hybrid, edge, private cloud, air-gapped
  • Supports modern + legacy cameras, all major protocols
  • Integrates with VMS, storage, analytics, IAM, and access control systems
  • Deep APIs, SDKs, Java modules, MCP for custom workflow automation
  • No lock-in; compatible with your existing ecosystem

Mission-Critical Reliability

When public safety is at stake, mistakes can’t happen.

  • 99.99% uptime across global deployments
  • Predictable latency and resilience under network pressure
  • Proven across nearly 10k mission-critical camera networks
  • SOC 2 Type II, encryption in transit/at rest, role-based access, air-gapped support
  • Implementation support, architecture reviews, performance tuning, and maintenance via Design Center

What is VIF?

Wowza’s Video Intelligence Framework (VIF) is a flexible intelligence layer for Wowza Streaming Engine that extracts and operationalizes data from video feeds. VIF takes frames from live video streams, routes them to AI models for inference, and converts the results into structured outputs that your existing operational systems can act on. It works with the cameras and infrastructure you already have, with no cloud dependency.

What types of cameras and video sources does VIF support?

VIF works with any video source that can push a stream to Wowza Streaming Engine, including IP cameras, PTZ cameras, drones, body cameras, dash cameras, mobile devices, IoT sensors, and broadcast encoders. Supported ingest protocols include RTMP, RTSP, SRT, and WebRTC. Because VIF operates at the streaming layer, it’s camera-agnostic, so there’s no need to replace existing hardware or standardize on a specific manufacturer.

Can VIF run entirely on-premises or in air-gapped environments?

Yes. VIF is fully deployable on-premises with zero cloud dependency. All inference runs locally on your hardware, and no video data needs to leave your network. This includes support for air-gapped environments common in defense, corrections, critical infrastructure, and government deployments. Docker images can be pre-loaded, and all five output channels remain operational without internet connectivity.

What AI models does VIF support?

VIF ships with support for RF-DETR, a transformer-based object detection model that recognizes 80 common object classes (people, vehicles, equipment, etc.) with inference latency starting at 10 milliseconds. It also includes an experimental scene analysis capability using a CLIP-based model for natural-language scene matching. Beyond the built-in models, VIF also supports custom-trained RF-DETR models. You can train a model on your own dataset and deploy it alongside the default model without changing your pipeline configuration.

What operational and monitoring platforms can VIF integrate with?

VIF connects to external platforms via standard webhooks and in-band metadata without requiring any proprietary connectors. Confirmed integrations include Datadog, Splunk, Elastic, PagerDuty, Opsgenie, and ServiceNow. VIF can also deliver alerts to traffic management systems, VMS platforms, SIEM tools, and any system that accepts HTTPS webhook payloads. The JSON event payload includes timestamp, object class, confidence score, bounding box coordinates, and tracking ID.

Can I train a model to detect objects specific to my environment?

Yes. Wowza provides a guided training workflow using Google Colab notebooks. You supply a labeled dataset (your own images or publicly available datasets via Roboflow), configure training parameters, and the notebook produces a custom model weights file you can deploy directly into VIF. A typical training cycle takes roughly half a day of active effort, with approximately five to six hours of GPU training time. You don’t need a massive dataset to start. A small, clean, well-labeled set can produce strong initial results using transfer learning from the pre-trained base model. It’s important to note that model performance is entirely dependent on the quality and quantity of your training data. Wowza provides the tooling and infrastructure for training and inference, but the accuracy of custom model outputs is the customer’s responsibility.

What happens after VIF detects an object or event?

Detection results are routed simultaneously across five independent output channels: timed metadata embedded in the video stream (ID3 tags), visual bounding-box overlays burned into a companion output stream, structured JSON log files written locally, webhook alerts dispatched to external systems via HTTP POST, and a custom Java listener API for building tailored integrations inside Wowza Streaming Engine. This means a single detection can simultaneously trigger an operator alert in your incident management platform, annotate the live video feed, and log the event for compliance without any additional configuration per channel.
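The fan-out pattern described above can be sketched as follows. This is an illustration of the routing model, not Wowza's implementation; the five sink functions are hypothetical stand-ins for the real output channels.

```python
from typing import Callable

# Hypothetical sinks standing in for VIF's five output channels.
def to_id3(evt): return ("id3", evt["class"])
def to_overlay(evt): return ("overlay", evt["bbox"])
def to_json_log(evt): return ("log", evt["class"])
def to_webhook(evt): return ("webhook", evt["class"])
def to_listener(evt): return ("listener", evt["track_id"])

CHANNELS: list[Callable] = [to_id3, to_overlay, to_json_log, to_webhook, to_listener]

def dispatch(event: dict) -> list:
    """Fan one detection out to every channel independently, so a failure
    in one sink never suppresses delivery to the others."""
    results = []
    for sink in CHANNELS:
        try:
            results.append(sink(event))
        except Exception:
            results.append(None)  # isolate per-channel failures
    return results

event = {"class": "person", "bbox": [10, 10, 50, 90], "track_id": 3}
out = dispatch(event)
```

The key design property is independence: each channel receives the same event, and per-channel errors are contained rather than propagated.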

How does VIF handle false positives?

VIF provides per-stream tuning controls for managing false positive rates. You can adjust confidence thresholds, set minimum consecutive frames before an object is tracked, configure tracking persistence, and control how detection events are batched or aggregated before delivery. These settings are accessible through the built-in management dashboard or via REST API and can be changed at runtime without restarting the streaming engine. This allows operators to calibrate sensitivity per camera based on real-world conditions. A high-traffic freeway camera and a fixed perimeter camera can run at different thresholds simultaneously.
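The "minimum consecutive frames" control described above is essentially a debounce. The following sketch shows the behavior conceptually; it is not VIF's internal code, and the parameter names are illustrative.

```python
class DetectionDebouncer:
    """Suppress flicker: only promote an object to a tracked event after it
    has appeared in N consecutive frames above a confidence threshold."""

    def __init__(self, threshold: float = 0.6, min_consecutive: int = 3):
        self.threshold = threshold
        self.min_consecutive = min_consecutive
        self._streak: dict[int, int] = {}  # track_id -> consecutive hits

    def observe(self, track_id: int, confidence: float) -> bool:
        """Return True once the detection is stable enough to emit an event."""
        if confidence < self.threshold:
            self._streak[track_id] = 0  # any dropped frame resets the streak
            return False
        self._streak[track_id] = self._streak.get(track_id, 0) + 1
        return self._streak[track_id] >= self.min_consecutive

deb = DetectionDebouncer(threshold=0.6, min_consecutive=3)
frames = [0.7, 0.8, 0.75]  # three consecutive confident frames
emitted = [deb.observe(track_id=1, confidence=c) for c in frames]
```

With `min_consecutive=3`, a one-frame glint from headlights or a reflection never becomes an alert, while a persistent object does.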

Is Wowza responsible for the accuracy of detection results?

Wowza provides the inference infrastructure, the training tooling, and a reference detection model, but the accuracy of any AI model is determined by the data it was trained on, the conditions it operates in, and how it has been tuned for a given environment. For custom-trained models, the customer owns the training data, controls the training process, and is responsible for validating that the model performs at an acceptable level before putting it into production. Wowza is not responsible for the accuracy or outputs of customer-trained models. We recommend running the built-in validation and inference test steps in the training notebook against real-world footage from your environment before deploying any custom model into a live workflow. Use VIF’s per-stream confidence thresholds and false positive controls to calibrate detection sensitivity for your operational requirements.
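Pre-deployment validation can be as simple as scoring predictions against a labeled holdout set. The sketch below is a simplified frame-level check, not part of the VIF training notebook (real object-detection validation typically matches boxes by IoU).

```python
def precision_recall(predicted: set, actual: set) -> tuple[float, float]:
    """Frame-level precision/recall against ground-truth labels.
    `predicted` and `actual` are sets of (frame_id, class) pairs -- a
    simplified stand-in for IoU-based bounding-box matching."""
    true_pos = len(predicted & actual)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical holdout footage: one missed detection, one false detection
actual = {(1, "person"), (2, "person"), (3, "vehicle")}
predicted = {(1, "person"), (2, "person"), (4, "vehicle")}
p, r = precision_recall(predicted, actual)
```

Running a check like this against real footage from your own cameras, under your own lighting and angles, is the cheapest way to catch a weak model before it reaches a live workflow.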

What happens to the video frames after inference?

VIF processes frames in memory during inference and does not retain video frames by default. The system produces structured event metadata (timestamp, camera ID, detection type, confidence score, and bounding box coordinates) which can be logged and audited independently. This architecture supports environments with strict data retention requirements where frame storage creates legal or compliance exposure.

Does VIF require a GPU? What are the hardware requirements?

Yes, VIF requires an NVIDIA GPU with CUDA 12.8+ and Turing architecture (SM 7.5) or newer. The minimum validated configuration is an NVIDIA T4 (16 GB), with an L4 (24 GB) recommended for production workloads. A baseline deployment supports approximately eight concurrent 720p streams at up to 15 frames per second. Additional hardware specs include 8 vCPU minimum (16 recommended), 32 GB RAM, and 32 GB of free storage (NVMe recommended).

What deployment options are available?

VIF supports four deployment topologies: an all-in-one Docker container (ideal for evaluation and lower-volume production), a split-server configuration with a dedicated GPU host (for independent scaling of inference), multi-instance deployments (where multiple streaming engines share inference resources or vice versa), and fully air-gapped on-premises deployments. You can start with the all-in-one Docker approach and upgrade to split-server without changing your stream configuration.

If the inference service goes offline, does it affect my video streams?

No. VIF is non-blocking by design and operates asynchronously. If the inference service becomes unavailable, your live streams continue uninterrupted. Detection coverage pauses until the inference service is restored, but video ingest, processing, and delivery are never impacted.
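The non-blocking hand-off described above can be sketched with a bounded queue: frames are offered to inference but the pipeline never waits. This is an illustration of the design property, not Wowza's implementation.

```python
import queue

# Bounded hand-off between the streaming pipeline and the inference service.
# maxsize is deliberately tiny here to demonstrate the drop behavior.
inference_queue: queue.Queue = queue.Queue(maxsize=2)

def submit_frame(frame) -> bool:
    """Try to hand a frame to inference; never block the stream."""
    try:
        inference_queue.put_nowait(frame)
        return True   # frame will be analyzed
    except queue.Full:
        return False  # dropped for analysis; stream delivery unaffected

# If inference stalls (or is down), later frames are skipped, not queued forever.
results = [submit_frame(f) for f in ("f1", "f2", "f3", "f4")]
```

When the inference service recovers and drains the queue, subsequent frames are analyzed again with no restart of the stream.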

Do I need to modify my existing hardware or Wowza Streaming Engine setup to use VIF?

VIF plugs directly into the Wowza Streaming Engine processing pipeline. Your existing ingest and delivery configurations remain untouched. Stream configuration for VIF, including model selection, confidence thresholds, output channels, and sampling rates, is managed through a dedicated panel in the Streaming Engine Manager or via REST API.

The same protocol and codec flexibility of Wowza Streaming Engine also applies to VIF, so any camera or device, in any deployment, can be modernized with VIF. Suggested minimum hardware requirements apply when running VIF locally on-premises; detailed technical specifications are available on our documentation page.

Featured Resources

FEATURED VIDEO

How To Set Up & Use VIF

FEATURED DATA SHEET

Video Intelligence Framework Data Sheet

FEATURED VIDEO

How To Train Custom Models for VIF


Power your monitoring and surveillance workflows with intelligent, dependable media infrastructure.