By Tat Banerjee
At VideoTranslator, we do a lot of work in what is called the Internationalization And Localization industry.
That being said, we are not transcribers, translators, or voice over artists. We use AI to do transcription, translation and synthetic (AI) dubbing.
When do we do this work for our clients? Generally when a client needs a managed service. This happens when a client is looking to try our tech and/or better understand the value proposition, or has their own reasons for us to provide this service.
When this happens, the first task on our side is plain old transcription. That is what we are going to look at today: how to do a simple transcription.
How long does transcription generally take?
Google is our friend here. From the team at Opal Transcription Services, “The industry is four hours of transcription time for one hour of clear audio, or a 4:1 ratio — that is, one hour of transcription time for a 15-minute long recording.”
This is a pretty good way of thinking about it. Generally it will take about 4x the time of the video content.
Can you really do transcription 3x faster?
Maybe. We think so, but there are caveats. This is how we did our testing. For this demonstration, we will use our standard Your Money: Peter Switzer video.
Your Money was a short-lived channel, but Peter Switzer has a very distinctive Australian accent, so we use this clip as a standardised test bed for a number of different processes internally.
The below is how we tested our hunch.
Step 1: Create a new item and upload the video
Step 1 is the same every time; select the relevant template and upload the video.
Once the new item opens, upload the video — once uploaded it looks like below.
Step 2: Use Action -> Transcribe to transcribe your video
Click on Action -> Transcribe to use the AI to transcribe your content. We used Australian English here.
Depending on your file size, this can take time. The Your Money video is about 5 MB and takes milliseconds. Basically, the bigger your video, the longer it will take.
Step 3: Clean Up The Transcription — Post Editing
This is where the majority of the work takes place. Here is what you need to do:
- Scroll down and the video will pop out (Picture-In-Picture): this is point 3 in the image above. The text in yellow is a projection, so you can change the colour (point 1) for ease of transcription.
- Edit times and text: the editor works in real time (point 2), so make changes as you go. Realistically, simply hit play and edit to your heart’s content.
- Download the SRT or copy and paste the content: depending on what you are doing, you will either (a) add open captions and download the video, (b) download the SRT file, or (c) access the captions directly.
We recently added the ability to directly access the captions without the time stamps – option (c) above. Click the button highlighted in the below image to use this functionality.
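If you have already downloaded the SRT file (option (b)) and want the plain text afterwards, the same result can be approximated outside the app. The snippet below is a minimal sketch, not how our app does it: it assumes a conventional SRT layout (a cue number, a timing line, then the caption text, with blank lines between cues), and the function name `srt_to_text` is made up for illustration.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt: str) -> str:
    """Return the caption text only, dropping cue numbers and timing lines.

    Hypothetical helper: assumes cues are separated by blank lines.
    """
    lines = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        rows = [row.strip() for row in block.splitlines() if row.strip()]
        # Drop the cue number only when it is directly followed by a timing
        # line, so a caption that is just a number is not removed by mistake.
        if len(rows) >= 2 and rows[0].isdigit() and TIMING.match(rows[1]):
            rows = rows[1:]
        lines.extend(row for row in rows if not TIMING.match(row))
    return "\n".join(lines)


sample = (
    "1\n00:00:00,000 --> 00:00:02,000\nHello there.\n\n"
    "2\n00:00:02,500 --> 00:00:04,000\nWelcome to the show.\n"
)
print(srt_to_text(sample))
```

Each caption line ends up on its own line of output, which is usually enough for pasting into a document or a blog post.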
Did We Get To A 3x Efficiency Gain?
The above was not super scientific but we did get to the 3x gain. Effectively, the above means that the time it takes to transcribe the video is a little bit more than the video length itself.
This is due to stopping and starting the PIP while you correct the captions. We found that, for a 15-minute video, transcribing takes us about 20 minutes in total, give or take. To be precise, it took 18 and a bit, but this depends on how good the original AI-produced transcript is, which in turn depends on the kind of content in the video.
Assuming 20 minutes, since some video content will be faster and some slower, a 15-minute video takes about 20 minutes to transcribe, as opposed to 60 minutes for a fully human transcription at the original 4:1 ratio. Going from 60 minutes to 20 minutes is how we got to the 3x improvement.
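For anyone who wants to plug in their own numbers, the arithmetic can be written out as a small sketch. The function name `speedup` and its parameters are ours for illustration; the figures are the ones quoted above (a 15-minute video, the 4:1 manual ratio, and roughly 20 minutes with AI plus post-editing).

```python
def speedup(video_min: float, manual_ratio: float, assisted_min: float) -> float:
    """Return how many times faster the AI-assisted workflow is.

    video_min:    length of the video in minutes
    manual_ratio: manual transcription time per minute of video (4 for 4:1)
    assisted_min: measured time for AI transcription plus post-editing
    """
    manual_min = video_min * manual_ratio  # 15 * 4 = 60 minutes
    return manual_min / assisted_min


print(speedup(15, 4, 20))  # 3.0
```

Your own result will move with the quality of the AI transcript, so it is worth timing a couple of representative videos before relying on the 3x figure.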
Curious about how we could help your business? Check out our managed service, or try our app for free! Alternatively, drop us an email at email@example.com.