We recently started having problems where some streams accumulate progressively larger delays as they run, sometimes 15 minutes or more. This appears to be server-side, because whenever the broadcaster stops the stream, any remaining delayed time is lost and never recorded to the file.
It might be related to a recent server upgrade, in which case a setting somewhere could be affecting it, or it could be a glitch in the AWS software or hardware, or a bandwidth limitation.
For a long time, over six months, we used an Extra Large EC2 instance with Wowza 2 and DevPay as our primary server. This is the one that takes the inbound streams from broadcasters. We've stripped all other tasks from this server; it only ingests the stream, processes it, and sends it to our origin server, which handles the load balancing.
We were seeing very high CPU usage on Friday nights, when we have around 50 streams coming in, so we decided to upgrade it to a High-CPU instance. The process was our usual one: we launched the exact same AMI we had been using, upgraded to patch 7 by copying its files into the appropriate folders, and added our custom Java modules and some file-uploading scripts.
This last weekend, though, we found that certain streams were seeing the huge delays, but not all of them. At first glance, it was the ones using Tricaster or the VP6 codec for sending. Many of our broadcasts were perfectly fine, and those were sent using Flash Media Live Encoder with reasonable settings: H.264 and AAC, with "Drop Frames" enabled, so that if the encoder falls behind it adjusts by skipping frames. As far as we know, Tricaster has no option for dropping frames or otherwise adjusting for lost time.
The delay appeared to be server-side, because refreshing the video player did not catch up to the current time the way it does when a viewer has a poor connection. It would only catch up if the broadcaster disconnected and started streaming again, and the time lost in that delay would be missing from the recording. However, one user said the iOS stream (Cupertino/HLS) was actually caught up, apart from the normal 30-second HLS delay.
Possible problems and thoughts:
Bandwidth limitation.
On Friday nights, we have a large number of games being broadcast at the same time. According to our Scoutapp monitoring, during this window we averaged 2,395 KB/s in and 9,139 KB/s out, with peaks of 2,694 KB/s in and 11,591 KB/s out. If inbound bandwidth were being limited, it might explain why the events with "Drop Frames" enabled looked okay (if that was the actual factor), though they did stutter a bit during the live broadcast. The contradictory data point is that we had two recordings of a game on Saturday that each got 10 minutes cut off per half. That could be down to the internet connection where they were streaming, or some other cause, though.
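For anyone comparing against instance bandwidth limits, here is the rough math on the Scout numbers above (a sketch; the per-stream figure assumes the ~50 concurrent inbound streams mentioned earlier, and any actual EC2 bandwidth cap is an assumption, not something we've measured):

```python
def kbps_to_mbps(kilobytes_per_sec):
    """Convert KB/s (kilobytes) to Mbps (megabits)."""
    return kilobytes_per_sec * 8 / 1000

avg_in, avg_out = 2395, 9139    # KB/s averages from Scout monitoring
max_in, max_out = 2694, 11591   # KB/s peaks

print(f"avg in:   {kbps_to_mbps(avg_in):.1f} Mbps")   # 19.2 Mbps
print(f"avg out:  {kbps_to_mbps(avg_out):.1f} Mbps")  # 73.1 Mbps
print(f"peak out: {kbps_to_mbps(max_out):.1f} Mbps")  # 92.7 Mbps

# Average inbound bitrate per stream, assuming ~50 streams:
per_stream_kbps = kbps_to_mbps(avg_in) * 1000 / 50
print(f"~{per_stream_kbps:.0f} kbps per inbound stream")  # ~383 kbps
```

Inbound averages under 20 Mbps total, which seems modest for an Extra Large-class instance, so if throttling is happening it would more likely show on the ~73 Mbps outbound side.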
Disk IO limitation.
We record two versions of each broadcast: one full recording in FLV, and five-minute chunks in MP4. My first thought was that streams were getting delayed because Wowza couldn't write to disk fast enough, or there was some problem writing to our mounted EBS volume. I haven't ruled this out, but it's looking less likely. We also pull in VOD files to slip into live streams, and at least one game couldn't play its VOD file even though it was already cached on the server.
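One way to narrow down the disk IO theory is to measure raw sequential write speed on the recording volume. Here is a minimal sketch (the mount point is a placeholder for wherever the EBS volume is attached, the 64 MB size is arbitrary, and this only measures sequential writes, not Wowza's actual IO pattern):

```python
import os
import tempfile
import time

def measure_write_mb_per_sec(directory, total_mb=64, chunk_mb=4):
    """Write total_mb of zeroes to a temp file in `directory`,
    fsync, and return the observed throughput in MB/s."""
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.time()
        with os.fdopen(fd, "wb") as f:
            for _ in range(total_mb // chunk_mb):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # force the data to disk before timing stops
        return total_mb / (time.time() - start)
    finally:
        os.remove(path)

# Point this at the EBS mount where the recordings land:
print(f"{measure_write_mb_per_sec(tempfile.gettempdir()):.0f} MB/s")
```

If the volume sustains well above the combined FLV + MP4 recording rate (a few hundred KB/s per stream times the stream count), a raw write bottleneck becomes a much less likely explanation.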
Configuration settings.
With the server transition, some settings may not have been carried over correctly, or may need changing for the new instance type. This is the one I'm least sure about, so I'm pasting in what I have.
Streams.xml (unchanged by us, I believe):
VHost.xml (changed the HTTPProviders):
Our /live Application.xml (includes our custom module and properties):
These are the three explanations I'm working with right now, but there could be others. Any help on this would be appreciated.