Load balance NVIDIA accelerated transcoding across GPUs with the Wowza Streaming Engine Java API

This article describes how to use the ITranscoderVideoLoadBalancer interface in the Wowza Streaming Engine Transcoder to balance NVIDIA-accelerated transcoding across multiple GPUs. The information applies to both the legacy NVIDIA and EVA transcoding pipelines. Load balancing is performed at the server level.

Notes:
  • Wowza Streaming Engine 4.8.0 has a known issue with GPU performance when using NVIDIA hardware-accelerated decoding. For more information, see Wowza Streaming Engine 4.8.0 Release Notes. To address this issue, use Wowza Streaming Engine 4.8.5 or later.
  • Wowza Streaming Engine™ 4.6.0 or later is required.

The ITranscoderVideoLoadBalancer interface is invoked during transcoder execution:

public interface ITranscoderVideoLoadBalancer
{
	public abstract void init(IServer server, TranscoderContextServer transcoderContextServer);
	public abstract void onHardwareInspection(TranscoderContextServer transcoderContextServer);
	public abstract void onTranscoderSessionCreate(LiveStreamTranscoder liveStreamTranscoder);
	public abstract void onTranscoderSessionInit(LiveStreamTranscoder liveStreamTranscoder);
	public abstract void onTranscoderSessionDestroy(LiveStreamTranscoder liveStreamTranscoder);
	public abstract void onTranscoderSessionLoadBalance(LiveStreamTranscoder liveStreamTranscoder);
}

Where:

  • init is invoked when the server first starts.
  • onHardwareInspection is invoked when the Transcoder is started, and the server hardware is inspected to determine whether accelerated hardware resources are available.
  • onTranscoderSessionCreate is invoked when the transcoder session is created.
  • onTranscoderSessionInit is invoked after the transcoder session is fully initialized and the transcoder template has been read.
  • onTranscoderSessionDestroy is invoked when the transcoder session is destroyed.
  • onTranscoderSessionLoadBalance is invoked when the transcoder session is in the state just before the decoder, scaler, and encoder are initialized.
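
One way to read this lifecycle is as a bookkeeping pattern: a session's estimated load is added to a GPU when onTranscoderSessionLoadBalance runs and removed again when onTranscoderSessionDestroy runs. The following is a minimal, self-contained sketch of that pattern only. It does not use the Wowza Streaming Engine API; the GpuLoadLedger class and its assign, release, and loadOf methods are hypothetical names introduced for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the Wowza Streaming Engine API.
class GpuLoadLedger
{
  private final Object lock = new Object();
  private final long[] currentLoad;                                // running load per GPU index
  private final Map<String, long[]> sessions = new HashMap<>();    // session key -> {gpuid, load}

  // countGPU is assumed to be at least 1.
  GpuLoadLedger(int countGPU)
  {
    this.currentLoad = new long[countGPU];
  }

  // Analogous to the work done in onTranscoderSessionLoadBalance:
  // pick the least-loaded GPU and remember what was added to it.
  int assign(String sessionKey, long load)
  {
    synchronized(lock)
    {
      int gpuid = 0;
      for(int i=1;i<currentLoad.length;i++)
      {
        if (currentLoad[i] < currentLoad[gpuid])
          gpuid = i;
      }
      currentLoad[gpuid] += load;
      sessions.put(sessionKey, new long[] { gpuid, load });
      return gpuid;
    }
  }

  // Analogous to the work done in onTranscoderSessionDestroy:
  // subtract exactly the load that was recorded for this session.
  void release(String sessionKey)
  {
    synchronized(lock)
    {
      long[] info = sessions.remove(sessionKey);
      if (info != null)
        currentLoad[(int)info[0]] -= info[1];
    }
  }

  long loadOf(int gpuid)
  {
    synchronized(lock)
    {
      return currentLoad[gpuid];
    }
  }
}
```

Both operations take the same lock because transcoder sessions can start and stop concurrently. A real implementation would do this work inside the interface callbacks rather than in a standalone class.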

Configure the ITranscoderVideoLoadBalancer interface

To use the ITranscoderVideoLoadBalancer interface, you must do the following:

  1. Create a custom class that extends TranscoderVideoLoadBalancerBase (which implements the ITranscoderVideoLoadBalancer interface described above) and overrides its methods to perform the load balancing. For more information, see Example class - TranscoderVideoLoadBalancerCUDASimple.
     
  2. Add the following server-level property, which points to your custom implementation, to the [install-dir]/conf/Server.xml file:
    <Property>
    	<Name>transcoderVideoLoadBalancerClass</Name>
    	<Value>[custom-class-path]</Value>
    </Property>

    Where [custom-class-path] is the fully qualified class name of your custom implementation. For example, to use the built-in TranscoderVideoLoadBalancerCUDASimple example class described below, add the following:
    <Property>
    	<Name>transcoderVideoLoadBalancerClass</Name>
    	<Value>com.wowza.wms.transcoder.model.TranscoderVideoLoadBalancerCUDASimple</Value>
    </Property>
  3. (Optional) If you're load balancing against GPUs with unequal capabilities, you can use the transcoderVideoLoadBalancerCUDASimpleGPUWeights property to specify weight factors for each card based on their capabilities relative to the highest-performing card. See Load balancing across unequal GPUs.

Example class - TranscoderVideoLoadBalancerCUDASimple

The TranscoderVideoLoadBalancerCUDASimple class is built into Wowza Streaming Engine (4.5.0.01 and later). It can be used without any additional coding to do simple load balancing of complete transcoder sessions across multiple GPUs. It load-balances complete sessions because accelerated scaling requires it: decoding, scaling, and encoding must be done on the same GPU so that frame data can stay on the device.

package com.wowza.wms.transcoder.model;

import java.util.List;
import java.util.Map;

import com.wowza.util.JSON;
import com.wowza.wms.application.WMSProperties;
import com.wowza.wms.logging.WMSLoggerFactory;
import com.wowza.wms.media.model.MediaCodecInfoVideo;
import com.wowza.wms.server.IServer;

public class TranscoderVideoLoadBalancerCUDASimple extends TranscoderVideoLoadBalancerBase
{
  private static final Class<TranscoderVideoLoadBalancerCUDASimple> CLASS = TranscoderVideoLoadBalancerCUDASimple.class;
  private static final String CLASSNAME = "TranscoderVideoLoadBalancerCUDASimple";
  
  public static final int DEFAULT_GPU_WEIGHT_SCALE = 1;
  public static final int DEFAULT_WEIGHT_FACTOR_ENCODE = 5;
  public static final int DEFAULT_WEIGHT_FACTOR_DECODE = 1;
  public static final int DEFAULT_WEIGHT_FACTOR_SCALE = 1;
  
  public static final int LOAD_MAG = 1000;
  
  public static final String PROPNAME_TRANSCODER_SESSION = "TranscoderVideoLoadBalancerCUDASimpleSessionInfo";
  
  class SessionInfo
  {
    private int gpuid = 0;
    private long load = 0;
    
    public SessionInfo(int gpuid, long load)
    {
      this.gpuid = gpuid;
      this.load = load;
    }
  }
  
  class GPUInfo
  {
    private int gpuid = 0;
    private long currentLoad = 0;
    private int weight = 0;
    
    private int getWeight()
    {
      return this.weight;
    }
    
    private long getUnWeightedLoad()
    {
      return currentLoad;
    }
    
    private long getWeightedLoad()
    {
      long load = 0;
      if (weight > 0)
        load = (currentLoad*gpuWeightScale)/weight;
      return load;
    }
  }
  
  private Object lock = new Object();
  private TranscoderContextServer transcoderContextServer = null;
  private boolean available = false;
  private int countGPU = 0;
  private int gpuWeightScale = DEFAULT_GPU_WEIGHT_SCALE;
  private int[] gpuWeights = null;
  private int weightFactorEncode = DEFAULT_WEIGHT_FACTOR_ENCODE;
  private int weightFactorDecode = DEFAULT_WEIGHT_FACTOR_DECODE;
  private int weightFactorScale = DEFAULT_WEIGHT_FACTOR_SCALE;
  private GPUInfo[] gpuInfos = null;

  @Override
  public void init(IServer server, TranscoderContextServer transcoderContextServer)
  {
    this.transcoderContextServer = transcoderContextServer;
    
    WMSProperties props = server.getProperties();
    
    this.weightFactorEncode = props.getPropertyInt("transcoderVideoLoadBalancerCUDASimpleWeightFactorEncode", this.weightFactorEncode);
    this.weightFactorDecode = props.getPropertyInt("transcoderVideoLoadBalancerCUDASimpleWeightFactorDecode", this.weightFactorDecode);
    this.weightFactorScale = props.getPropertyInt("transcoderVideoLoadBalancerCUDASimpleWeightFactorScale", this.weightFactorScale);

    String weightsStr = props.getPropertyStr("transcoderVideoLoadBalancerCUDASimpleGPUWeights", null);
    if (weightsStr != null)
    {
      String[] values = weightsStr.split(",");
      int maxWeight = 0;
      this.gpuWeights = new int[values.length];
      for(int i=0;i<values.length;i++)
      {
        String value = values[i].trim();
        if (value.length() <= 0)
        {
          this.gpuWeights[i] = -1;
          continue;
        }

        int weight = -1;
        try
        {
          weight = Integer.parseInt(value);
          if (weight < 0)
            weight = 0;
        }
        catch(Exception e)
        {
        }
        
        this.gpuWeights[i] = weight;
        if (weight > maxWeight)
          maxWeight = weight;
      }
      
      this.gpuWeightScale = maxWeight;
      for(int i=0;i<this.gpuWeights.length;i++)
      {
        if (this.gpuWeights[i] < 0)
          this.gpuWeights[i] = this.gpuWeightScale;
      }
    }
    
    WMSLoggerFactory.getLogger(CLASS).info(CLASSNAME+".init: weightFactorEncode:"+weightFactorEncode+" weightFactorDecode:"+weightFactorDecode+" weightFactorScale:"+weightFactorScale);
  }

  @Override
  public void onTranscoderSessionCreate(LiveStreamTranscoder liveStreamTranscoder)
  {
  }

  @Override
  public void onTranscoderSessionInit(LiveStreamTranscoder liveStreamTranscoder)
  {
  }

  @Override
  public void onTranscoderSessionDestroy(LiveStreamTranscoder liveStreamTranscoder)
  {    
    if (this.countGPU > 1)
    {
      WMSProperties props = liveStreamTranscoder.getProperties();
      Object sessionInfoObj = props.get(PROPNAME_TRANSCODER_SESSION);
      if (sessionInfoObj != null && sessionInfoObj instanceof SessionInfo)
      {
        SessionInfo sessionInfo = (SessionInfo)sessionInfoObj;
        
        if (sessionInfo.gpuid < gpuInfos.length)
        {
          // Capture the load before clearing it so the log line reports the amount actually removed.
          long removedLoad = 0;
          synchronized(this.lock)
          {
            removedLoad = sessionInfo.load;
            gpuInfos[sessionInfo.gpuid].currentLoad -= removedLoad;
            sessionInfo.load = 0;
          }
          
          WMSLoggerFactory.getLogger(CLASS).info(CLASSNAME+".onTranscoderSessionDestroy["+liveStreamTranscoder.getContextStr()+"]: Removing GPU session: gpuid:"+sessionInfo.gpuid+" load:"+removedLoad);
        }
      }
    }
  }
  
  @Override
  public void onHardwareInspection(TranscoderContextServer transcoderContextServer)
  {    
    //{"infoCUDA":{"availabe":true,"availableFlags":65651,"countGPU":1,"driverVersion":368.81,"cudaVersion":8000,"isCUDAOldH264WindowsAvailable":false,"gpuInfo":[{"name":"GeForce GTX 960M","versionMajor":5,"versionMinor":0,"clockRate":1097500,"multiprocessorCount":5,"totalMemory":2147483648,"coreCount":640,"isCUDANVCUVIDAvailable":true,"isCUDAH264EncodeAvailable":true,"isCUDAH265EncodeAvailable":false,"getCUDANVENCVersion":5}]},"infoQuickSync":{"availabe":true,"availableFlags":537,"versionMajor":1,"versionMinor":19,"isQuickSyncH264EncodeAvailable":true,"isQuickSyncH265EncodeAvailable":true,"isQuickSyncVP8EncodeAvailable":false,"isQuickSyncVP9EncodeAvailable":false,"isQuickSyncH264DecodeAvailable":true,"isQuickSyncH265DecodeAvailable":false,"isQuickSyncMP2DecodeAvailable":true,"isQuickSyncVP8DecodeAvailable":false,"isQuickSyncVP9DecodeAvailable":false},"infoVAAPI":{"available":false},"infoX264":{"available":false},"infoX265":{"available":false}}
    
    boolean available = false;
    int countGPU = 0;

    String jsonStr = transcoderContextServer.getHardwareInfoJSON();
    if (jsonStr != null)
    {
      try
      {
        JSON jsonData = new JSON(jsonStr);
                
        if (jsonData != null)
        {
          Map<String, Object> entries = jsonData.getEntrys();
          
          Map<String, Object> infoCUDA = (Map<String, Object>)entries.get("infoCUDA");
          if (infoCUDA != null)
          {
            
            Object availableObj = infoCUDA.get("availabe");
            if (availableObj != null && availableObj instanceof Boolean)
            {
              available = ((Boolean)availableObj).booleanValue();
            }
            
            if (available)
            {
              Object countGPUObj = infoCUDA.get("countGPU");
              if (countGPUObj != null && countGPUObj instanceof Integer)
              {
                countGPU = ((Integer)countGPUObj).intValue();
              }
            }
          }
        }
      }
      catch(Exception e)
      {
        WMSLoggerFactory.getLogger(CLASS).info(CLASSNAME+".onHardwareInspection: Parsing JSON: ", e);
      }
    }
        
    WMSLoggerFactory.getLogger(CLASS).info(CLASSNAME+".onHardwareInspection: CUDA available:"+available+" countGPU:"+countGPU);

    synchronized(lock)
    {
      this.available = available;
      this.countGPU = countGPU;
      
      if (this.countGPU > 1)
      {
        this.gpuInfos = new GPUInfo[this.countGPU];
        for(int i=0;i<this.gpuInfos.length;i++)
        {
          this.gpuInfos[i] = new GPUInfo();
          
          this.gpuInfos[i].gpuid = i;
          
          if (this.gpuWeights != null && i < this.gpuWeights.length)
            this.gpuInfos[i].weight = this.gpuWeights[i];
          else
            this.gpuInfos[i].weight = gpuWeightScale;
        }
      }
    }
  }
  
  @Override
  public void onTranscoderSessionLoadBalance(LiveStreamTranscoder liveStreamTranscoder)
  {    
    try
    {
      while(true)
      {
        if (this.gpuInfos == null)
          break;
        
        TranscoderStream transcoderStream = liveStreamTranscoder.getTranscodingStream();
        if (transcoderStream == null)
          break;
        
        TranscoderSession transcoderSession = liveStreamTranscoder.getTranscodingSession();
        if (transcoderSession == null)
          break;
        
        TranscoderSessionVideo transcoderSessionVideo = transcoderSession.getSessionVideo();
        if (transcoderSessionVideo == null)
          break;
        
        MediaCodecInfoVideo codecInfoVideo = null;
        if (transcoderSessionVideo.getCodecInfo() != null)
          codecInfoVideo = transcoderSession.getSessionVideo().getCodecInfo();

        long loadDecode = 0;
        long loadScale = 0;
        long loadEncode = 0;
        
        boolean isScalerCUDA = false;
        
        TranscoderStreamSourceVideo transcoderStreamSourceVideo = null;
        TranscoderStreamScaler transcoderStreamScaler = null;
        
        TranscoderStreamSource transcoderStreamSource = transcoderStream.getSource();
        if (transcoderStreamSource != null)
        {
          transcoderStreamSourceVideo = transcoderStreamSource.getVideo();
          if (transcoderStreamSourceVideo != null && codecInfoVideo != null && (transcoderStreamSourceVideo.isImplementationNVCUVID() || transcoderStreamSourceVideo.isImplementationCUDA() || transcoderStreamSourceVideo.isImplementationNVENCEVA()))
          {
            loadDecode = codecInfoVideo.getFrameWidth() * codecInfoVideo.getFrameHeight();
          }
          else
            transcoderStreamSourceVideo = null;
        }
        
        transcoderStreamScaler = transcoderStream.getScaler();
        if (transcoderStreamScaler != null)
        {
          isScalerCUDA = transcoderStreamScaler.isImplementationCUDA() || transcoderStreamScaler.isImplementationCUDAEVA();
        }
        
        List<TranscoderStreamDestination> destinations = transcoderStream.getDestinations();
        if (destinations == null)
          break;
        
        for(TranscoderStreamDestination destination : destinations)
        {
          if (!destination.isEnable())
            continue;

          TranscoderStreamDestinationVideo destinationVideo = destination.getVideo();
          
          if (destinationVideo == null)
            continue;
          
          if (destinationVideo.isPassThrough() || destinationVideo.isDisable())
            continue;
          
          TranscoderVideoFrameSizeHolder frameSizeHolder = destinationVideo.getFrameSizeHolder();
          if (frameSizeHolder == null)
            continue;
          
          if (isScalerCUDA)
            loadScale += frameSizeHolder.getActualWidth() * frameSizeHolder.getActualHeight();
          
          if (destinationVideo.isImplementationNVENC() || destinationVideo.isImplementationCUDA() || destinationVideo.isImplementationNVENCEVA())
            loadEncode += frameSizeHolder.getActualWidth() * frameSizeHolder.getActualHeight();
        }
        
        long totalLoad = (loadDecode*weightFactorDecode) + (loadScale*weightFactorScale) + (loadEncode*weightFactorEncode);
        if (totalLoad <= 0)
          break;
        
        totalLoad /= LOAD_MAG;
        
        if (totalLoad <= 0)
          totalLoad = 1;
        
        int gpuid = -1;
        
        synchronized(lock)
        {
          long leastLoad = Long.MAX_VALUE;
          
          for(int i=0;i<gpuInfos.length;i++)
          {
            if (gpuInfos[i].getWeightedLoad() < leastLoad)
            {
              leastLoad = gpuInfos[i].getWeightedLoad();
              gpuid = i;
            }
          }
          
          if (gpuid >= 0)
            gpuInfos[gpuid].currentLoad += totalLoad;
        }

        WMSLoggerFactory.getLogger(CLASS).info(CLASSNAME+".onTranscoderSessionLoadBalance["+liveStreamTranscoder.getContextStr()+"]: gpuid:"+gpuid+" load:"+totalLoad+" [decode:"+loadDecode+" scale:"+loadScale+" encode:"+loadEncode+"]");

        if (gpuid >= 0)
        {
          liveStreamTranscoder.getProperties().put(PROPNAME_TRANSCODER_SESSION, new SessionInfo(gpuid, totalLoad));

          if (transcoderStreamSourceVideo != null)
            transcoderStreamSourceVideo.setGPUID(gpuid);
          
          if (transcoderStreamScaler != null && isScalerCUDA)
            transcoderStreamScaler.setGPUID(gpuid);
          
          for(TranscoderStreamDestination destination : destinations)
          {
            if (!destination.isEnable())
              continue;

            TranscoderStreamDestinationVideo destinationVideo = destination.getVideo();
            
            if (destinationVideo == null)
              continue;
            
            if (destinationVideo.isPassThrough() || destinationVideo.isDisable())
              continue;
                                    
            if (destinationVideo.isImplementationNVENC() || destinationVideo.isImplementationCUDA() || destinationVideo.isImplementationNVENCEVA())
              destinationVideo.setGPUID(gpuid);
          }
        }
        break;
      }
    }
    catch(Exception e)
    {
      WMSLoggerFactory.getLogger(CLASS).info(CLASSNAME+".onTranscoderSessionLoadBalance: Error assigning GPU: ", e);
    }
  }  
}
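
To make the load estimate in onTranscoderSessionLoadBalance easier to follow, the arithmetic can be worked through on its own. The sketch below is self-contained and reuses only the default weight factors and LOAD_MAG value from the listing above; the LoadEstimateSketch class and its method names are illustrative and aren't part of Wowza Streaming Engine. It assumes that the decoder, scaler, and all encoders run on the GPU.

```java
// Illustrative only: mirrors the arithmetic of the listing above without the Wowza API.
class LoadEstimateSketch
{
  // Default values copied from TranscoderVideoLoadBalancerCUDASimple.
  static final int WEIGHT_FACTOR_ENCODE = 5;
  static final int WEIGHT_FACTOR_DECODE = 1;
  static final int WEIGHT_FACTOR_SCALE = 1;
  static final int LOAD_MAG = 1000;

  // Pixel area of one frame.
  static long pixels(int width, int height)
  {
    return (long)width * height;
  }

  // loadDecode: pixel area of the source frame (0 if decoding isn't on the GPU).
  // loadScale:  summed pixel area of all renditions (0 if scaling isn't on the GPU).
  // loadEncode: summed pixel area of the renditions encoded on the GPU.
  static long estimateLoad(long loadDecode, long loadScale, long loadEncode)
  {
    long totalLoad = (loadDecode*WEIGHT_FACTOR_DECODE) + (loadScale*WEIGHT_FACTOR_SCALE) + (loadEncode*WEIGHT_FACTOR_ENCODE);
    if (totalLoad <= 0)
      return 0; // nothing runs on the GPU; the listing skips GPU assignment in this case

    totalLoad /= LOAD_MAG;
    return (totalLoad <= 0) ? 1 : totalLoad; // never record less than 1 for an active session
  }

  public static void main(String[] args)
  {
    long decode = pixels(1920, 1080);
    long renditions = pixels(1280, 720) + pixels(640, 360);
    System.out.println(estimateLoad(decode, renditions, renditions));
  }
}
```

For a 1920x1080 source with 1280x720 and 640x360 renditions, the sketch prints 8985: 2,073,600 pixels for decoding, plus 1,152,000 for scaling, plus 5 x 1,152,000 for encoding, divided by 1000. With the default factors, encoding accounts for most of the estimate.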

Load balancing across unequal GPUs

The built-in TranscoderVideoLoadBalancerCUDASimple class can load-balance across NVIDIA CUDA-based cards that aren't equally capable. If your CUDA cards don't have the same capability, you can specify a weighting factor for each card using the server-level property transcoderVideoLoadBalancerCUDASimpleGPUWeights. This property takes a comma-separated list of weighting factors, one for each CUDA card. We recommend using a factor of 100 for the highest-performing card and lower weighting factors for lower-capacity cards. The lower weighting factors should be based on a card's capacity relative to the highest-performing card.

The CUDA card order is logged in the server logs on Wowza Streaming Engine startup, and the comma-separated list of weighting factors must be in the same order. For example, if you have an M5000 card (index 0) and an M2000 card (index 1) in your server, the transcoderVideoLoadBalancerCUDASimpleGPUWeights property might be set as follows:

<Property>
	<Name>transcoderVideoLoadBalancerCUDASimpleGPUWeights</Name>
	<Value>100,66</Value>
</Property>

This indicates that the M2000 card has 66 percent of the capacity of the M5000 card.
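
To see how the weights influence GPU selection, the following self-contained sketch mirrors the weight parsing in init and the weighted-load comparison in onTranscoderSessionLoadBalance from the example class. It doesn't use the Wowza Streaming Engine API; the GpuWeightSketch class and its method names are illustrative only.

```java
// Illustrative only: mirrors the weighting logic of the example class without the Wowza API.
class GpuWeightSketch
{
  // Parses a comma-separated weight list the way init does:
  // negative values clamp to 0, and blank or unparseable entries take the maximum weight.
  static int[] parseWeights(String weightsStr)
  {
    String[] values = weightsStr.split(",");
    int[] weights = new int[values.length];
    int maxWeight = 0;
    for(int i=0;i<values.length;i++)
    {
      int weight = -1; // -1 marks an entry that wasn't specified
      try
      {
        weight = Math.max(0, Integer.parseInt(values[i].trim()));
      }
      catch(NumberFormatException e)
      {
        // leave as -1 so it is replaced with the maximum weight below
      }
      weights[i] = weight;
      maxWeight = Math.max(maxWeight, weight);
    }
    for(int i=0;i<weights.length;i++)
    {
      if (weights[i] < 0)
        weights[i] = maxWeight;
    }
    return weights;
  }

  // Weighted load = currentLoad * scale / weight; a weight of 0 yields 0, as in the listing.
  static long weightedLoad(long currentLoad, int weight, int scale)
  {
    return (weight > 0) ? (currentLoad*scale)/weight : 0;
  }

  // Returns the index of the GPU with the smallest weighted load (first one wins on ties).
  static int pickGpu(long[] currentLoads, int[] weights)
  {
    int scale = 0;
    for(int w : weights)
      scale = Math.max(scale, w);

    int gpuid = -1;
    long leastLoad = Long.MAX_VALUE;
    for(int i=0;i<currentLoads.length;i++)
    {
      long load = weightedLoad(currentLoads[i], weights[i], scale);
      if (load < leastLoad)
      {
        leastLoad = load;
        gpuid = i;
      }
    }
    return gpuid;
  }

  public static void main(String[] args)
  {
    int[] weights = parseWeights("100,66");
    long[] loads = { 9000, 6000 };
    System.out.println(pickGpu(loads, weights));
  }
}
```

With weights of 100,66 and current loads of 9000 and 6000, the sketch prints 0. The M2000's raw load is lower, but its weighted load is 6000 x 100 / 66 = 9090, which is higher than the M5000's 9000, so the next session goes to the M5000.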