I have about 200 hours of video in 750 files. I already have transcripts for all of them, but I need to align the transcript text to the audio so that each line gets a start and end time. I have spent pretty much the entire week trying out one piece of software after another.
There are quite a few open source projects that can do this, but I am having trouble putting the pieces together. Many of them use a combination of SoX, HTK, and CMU Sphinx. The ones I have found so far:
Kaldi (http://sourceforge.net/p/kaldi/discussion/1355348/thread/40dec03f/)
Voxforge (http://www.voxforge.org/home/dev/autoaudioseg)
https://github.com/srubin/p2fa-vislab
https://github.com/yzernik/radio-news-reader
https://github.com/netAction/transcript
I have also looked at half a dozen commercial products. Almost all of them are manual: they set up a 5-second loop, you type out those 5 seconds, and it keeps looping until you tell it to move on to the next segment. CaptionMaker by Telestream is by far the best, but the basic $1,100 version can’t batch process files; you have to buy the $11,000 version for that.
The other option is hiring a company to do it for me, but that costs over $1/minute, for a total of almost $15,000!
We have a constantly changing video library and could really use a solution that lets us do this in-house, so I am wondering what other people are using. I don’t need a ton of options. I just need to be able to spit out something like this (start timecode, end timecode, text):
00:00:01:01 00:00:06:29 Sweet so start by talking about carbohydrates if you
00:00:09:27 00:00:13:13 look at the word Carbo hydrate, why did they name
00:00:13:15 00:00:14:13 them carbohydrates?
Of course, I don’t really care what format the output is in, since I will manipulate the data once I have it. The big thing is just getting the timings.
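
To make the target concrete, here is a rough sketch in Python of the last step, turning word-level timings into lines like the sample above. It assumes things I have not confirmed: that the aligner gives me (word, start, end) tuples with times in seconds, and that the timecodes are HH:MM:SS:FF at 30 frames per second (a guess based on the frame field in the sample reaching 29). The names to_timecode, group_words, and format_captions are made up for illustration and do not come from any of the tools listed.

```python
FPS = 30  # assumption: the frame field in the sample goes up to 29


def to_timecode(seconds, fps=FPS):
    """Convert a time in seconds to a non-drop HH:MM:SS:FF timecode string."""
    total_frames = int(round(seconds * fps))
    frames = total_frames % fps
    total_seconds = total_frames // fps
    hours = total_seconds // 3600
    minutes = (total_seconds % 3600) // 60
    secs = total_seconds % 60
    return "%02d:%02d:%02d:%02d" % (hours, minutes, secs, frames)


def group_words(words, max_chars=50, max_gap=1.0):
    """Split (word, start, end) tuples into caption lines.

    A new line starts when adding the next word would exceed max_chars,
    or when the pause before it is longer than max_gap seconds.
    """
    lines = []
    current = []
    for word, start, end in words:
        if current:
            text_len = len(" ".join(w for w, _, _ in current)) + 1 + len(word)
            gap = start - current[-1][2]
            if text_len > max_chars or gap > max_gap:
                lines.append(current)
                current = []
        current.append((word, start, end))
    if current:
        lines.append(current)
    return lines


def format_captions(words, fps=FPS, max_chars=50, max_gap=1.0):
    """Return 'START END TEXT' strings, one per caption line."""
    out = []
    for line in group_words(words, max_chars, max_gap):
        start = to_timecode(line[0][1], fps)
        end = to_timecode(line[-1][2], fps)
        text = " ".join(w for w, _, _ in line)
        out.append("%s %s %s" % (start, end, text))
    return out


if __name__ == "__main__":
    # Hypothetical aligner output: (word, start_seconds, end_seconds)
    sample = [("them", 13.5, 13.8), ("carbohydrates?", 13.8, 14.4)]
    for caption in format_captions(sample):
        print(caption)
```

The line-breaking rule here (a character limit plus a pause threshold) is just one option. The part I am actually stuck on is the alignment that produces the word timings in the first place.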