Codecpage - Subtitle generation

Subtitle generation with Subtitle Edit and Whisper

Images to Text

DVB transmissions often carry graphic subtitles that many video players cannot decode. Subtitle Edit can convert them to text (*.srt) files using a choice of OCR engines, automatically and almost error-free. Just record the DVB stream including subtitles and let Subtitle Edit do its job, then multiplex the .srt files into your video file using e.g. MKVToolNix. Note that Subtitle Edit has a great many more features; just explore!

Speech to Text

Subtitle Edit also offers audio to subtitle conversion, based either on Vosk or on Whisper.
Vosk is pretty fast (about 4x realtime on a good i5 CPU) but not perfect; it gives good results only with clearly spoken commentary.
Whisper, developed by OpenAI, is a lot slower on the CPU, but it also runs on a GPU. With a decent GPU such as a GTX980, it can achieve up to 20x realtime speed using the "small" language model.

Whisper in Subtitle Edit

(status September 2023; this refers to Subtitle edit 4.01)

As mentioned above, Whisper can be used with Subtitle Edit. But the only option really working with the GPU so far appears to be Const-me:

Get the latest version of Subtitle Edit, currently 4.01.
From the menu, select Video, Audio to Text, Whisper.
In the window that opens, select Const-me.
Download the Whisper model you want (small, medium, large...).

Most astonishingly, even the large model works here on a 4 GB GPU, and the conversion is very fast: 20 times realtime on a GTX980 with the small model and still 4x with the large one. All models even run on a comparatively tiny GTX730 GPU, the small model at about 2x realtime.

Subtitle Edit automatically corrects for Whisper timing errors. This correction is not yet available as a separate option in Subtitle Edit's batch processor. It also does not currently fix text that sometimes appears much too early and then stays on screen for a very long time.

There is also a tweak that is necessary for reliability:
The latest beta of Subtitle Edit has an "Advanced" button for passing additional parameters to Const-me, which is badly needed:

Enter --max-context 1 here.
Without it, conversions often go off the rails, suddenly repeating one subtitle line for several minutes and subsequently failing to deliver good subtitles. You may experiment with values above 1, but I always had some glitches with that. I guess it's the same issue that requires --condition_on_previous_text False with the Python/GIT version of Whisper.
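To spot a conversion that went off the rails without reading the whole file, one could scan the generated .srt for runs of identical subtitle lines. The following is only a minimal sketch in Python; the function names and the threshold of 3 repeats are my own assumptions, not anything Subtitle Edit or Whisper provides.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500"
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(srt_text):
    """Return a list of (start, end, text) tuples from SRT-formatted text."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                # Everything after the timing line is the subtitle text
                text = " ".join(part.strip() for part in lines[i + 1:])
                cues.append((match.group(1), match.group(2), text))
                break
    return cues


def find_repeated_runs(cues, min_run=3):
    """Return (start_time, text, count) for each run of identical cues.

    min_run is a hypothetical threshold: how many identical cues in a
    row are treated as suspicious.
    """
    runs = []
    i = 0
    while i < len(cues):
        j = i
        while j + 1 < len(cues) and cues[j + 1][2] == cues[i][2]:
            j += 1
        count = j - i + 1
        if count >= min_run and cues[i][2]:
            runs.append((cues[i][0], cues[i][2], count))
        i = j + 1
    return runs
```

Read the .srt file into a string, pass it through parse_srt, and hand the result to find_repeated_runs; an empty list means no suspicious repetition was found.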

You may also test Const-me from a command prompt (see "How to get a Command prompt" further down), which displays the generated text in real time and also lists all available parameters. To do this, open a command prompt in C:\Users\(your user name)\AppData\Roaming\Subtitle Edit\Whisper\Const-me.

The quality gain of the medium and large models over the small one is not always obvious, but the large model knows a lot more things, which eliminates many spelling errors and sometimes produces results so good that it's almost uncanny. Sometimes it gets weird, though, e.g. when a line saying "Copyright xxx" appears where there is not the slightest spoken text like that, nor any connection to xxx whatsoever.

I would guess that the larger models are also better for rare languages.

Post processing Whisper's srt files:

Last minute notes:

From Subtitle Edit 4.03 on, there is a "break long lines" option directly in the Whisper start window, so additional batch processing is no longer necessary for this!

Subtitle Edit 4.05 beta now has a working version of Purfview's Faster Whisper. It's actually not faster than Const-me: it takes 1.5 times as long with the small model and a lot longer with the larger ones, as it puts more load on the CPU when graphics memory runs short. But it does a pretty good job with the small model, the most carefree option there is so far. Just keep the advanced settings at their defaults. More on this may follow once I have done more jobs with it.

You may want to post-process srt files that were generated by Whisper alone, for two crucial purposes:
splitting long lines (Whisper makes many very long ones), and ensuring the right character encoding by saving the files as UTF-8 with BOM. This step is not necessary if the srt files were generated from within Subtitle Edit with the right options.

Subtitle Edit's batch function serves for this. The following images show some options to use for better line splitting, and some options useful for batch processing.

srt split options

srt batch options

Note: Auto balance lines may fail quite often, so for reliability it may be better to leave it out.
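For those who prefer a script, the two purposes named above (splitting long lines and saving as UTF-8 with BOM) can also be sketched in Python. This is only an illustration under my own assumptions: the 42-character limit and the function names are hypothetical, and unlike Subtitle Edit it only re-wraps the text inside each subtitle; it does not split one subtitle into several with new timings.

```python
import re
import textwrap


def rewrap_srt(srt_text, max_chars=42):
    """Re-wrap the text of every subtitle so no line exceeds max_chars.

    max_chars=42 is an assumed limit, not a Subtitle Edit default.
    """
    out_blocks = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        if len(lines) < 3:
            # Not a complete subtitle (number, timing, text): keep as is
            out_blocks.append(block)
            continue
        head = lines[:2]                    # number line and timing line
        text = " ".join(lines[2:])          # join text, then wrap it anew
        wrapped = textwrap.wrap(text, width=max_chars) or [""]
        out_blocks.append("\n".join(head + wrapped))
    return "\n\n".join(out_blocks) + "\n"


def to_utf8_bom_bytes(srt_text):
    """Encode text as UTF-8 with BOM and Windows line endings."""
    return srt_text.replace("\n", "\r\n").encode("utf-8-sig")


def save_srt(path, srt_text):
    """Write the subtitle text to path as UTF-8 with BOM."""
    with open(path, "wb") as handle:
        handle.write(to_utf8_bom_bytes(srt_text))
```

The "utf-8-sig" codec is what makes Python write the BOM in front of the text.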

You may also want to have the subtitles overlaid on the video in the preview window, as when playing them back on a TV. For this, just download mpv lib and then set your font size:

Showing subs overlaid in video window: download mpv lib

Joining the subtitles with the videos

Finally, we want to merge the subtitles with the videos.
Manually, this is done with MKVToolNix. It's quite self-explanatory, so I won't go into more detail here.
But you may also want to use the batch processing tricks described here, and in this case you can use the mkvmerge.exe program that comes with MKVToolNix and is found in its program folder. Run it in a command window and you'll get its help text describing the available parameters.
I recommend adding a path entry for MKVToolNix's program folder so you can use the tools from anywhere in a command window.
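As an illustration of the mkvmerge call, here is a small Python sketch that merges each video in a folder with the .srt file of the same name. It assumes mkvmerge is reachable via the path; the "_subs" output suffix, the default language code and the function names are my own choices for this example.

```python
import subprocess
from pathlib import Path


def mkvmerge_command(video, subtitle, output, language="eng"):
    """Build the argument list for one mkvmerge call.

    --language 0:<code> applies to track 0 of the file that follows it,
    here the subtitle file.
    """
    return [
        "mkvmerge", "-o", str(output),
        str(video),
        "--language", f"0:{language}", str(subtitle),
    ]


def pair_videos_with_subtitles(folder, video_ext=".mkv"):
    """Return (video, srt, output) paths for videos that have a matching .srt."""
    pairs = []
    for video in sorted(Path(folder).glob(f"*{video_ext}")):
        srt = video.with_suffix(".srt")
        if srt.exists():
            output = video.with_name(video.stem + "_subs.mkv")  # assumed naming
            pairs.append((video, srt, output))
    return pairs


def merge_all(folder):
    """Run mkvmerge for every video/subtitle pair found in folder."""
    for video, srt, output in pair_videos_with_subtitles(folder):
        result = subprocess.run(mkvmerge_command(video, srt, output))
        # mkvmerge uses exit code 1 for warnings and 2 for errors
        if result.returncode >= 2:
            print(f"mkvmerge failed for {video}")
```

Passing the arguments as a list avoids any quoting trouble with blanks in file names.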

How to get a Command prompt:

Just open a folder, type "cmd" into its address line, and press Enter.

Now you are in a command prompt window in the same folder.

In Windows versions before 10, a command prompt could also be obtained by holding Shift and right-clicking on a folder, then selecting 'open command prompt here'. In Windows 10/11, only PowerShell is offered. There, the equivalent of a command prompt can be obtained by selecting 'open PowerShell window' and then entering "cmd" within the PowerShell window.

A genuine command prompt can also be opened from the Start Menu (under Windows, System). Right-click on it for options such as Run as Administrator. Then navigate to specific folders with cd <directory_name>. Entering cd.. takes you one level up.

A more convenient way may be to install OpenCommandPromptHere from 4dots-Software, letting you choose if you want the prompt in normal or admin mode, or FileMenuTools from Lopesoft, the latter one coming with lots more right-click tools.

If you want to build yourself a genuine command window option for any directory in Windows 10/11, as it existed in earlier Windows versions, see here and here.
Or, in short (beware, only try this if you know what you're doing!):

Start regedit and go to the key HKEY_CLASSES_ROOT\Directory\shell\cmd
Open the permissions of the cmd key, choose Advanced, change the owner to your username, then confirm with OK and OK.
Then add your username to the groups/usernames list and allow it full access.
Rename HideBasedOnVelocityId to ShowBasedOnVelocityId
Perform the same steps with HKEY_CLASSES_ROOT\Directory\Background\shell\cmd

Now you can get the command prompt here option by right-clicking on a folder or on the background of an opened folder.

Preparing cmd batches:

Sometimes it may be convenient to do batch operations in the command window.
If you are subtitling many files at once, it may be especially useful to merge subtitles into large numbers of video files automatically.
Here are some tricks for batch subtitle generation:

Open a command prompt in the folder containing your files.
Enter dir /b (or dir /b /s to include subdirectories) in the window that opens, then copy the whole listing out of the command window (select the text, then press Ctrl-C). You can then paste it into another application for further processing with Ctrl-V.
Excel or a similar spreadsheet comes in handy here: select a column, then paste the dir listing with Ctrl-V.
- Fill the columns before and after it with the other parts of the commands you want executed, plus some " characters to allow for filenames containing blanks.
- Then select the entire range and copy it with Ctrl-C, paste it into an editor window (the editor from Windows Accessories) with Ctrl-V, and save the text to a file. Re-open the file, select one of the many tabs (wide spaces) in the text and use replace-all to replace them with nothing. Now all command lines are ready.
- Select all, copy and paste them into a command window opened in the directory where your files sit, and off you go!

Why all that window copy/pasting? It avoids problems with special characters in file names.
These techniques also allow for making batch files processing through entire directory trees.
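If Python is installed anyway, the spreadsheet step can be replaced by a few lines of script that list the files and put the command parts and quotes around each name. This is just a sketch of the same idea; the function names are made up for this example, and the printed lines are still meant to be copied into the command window as described above.

```python
from pathlib import Path


def list_files(folder, pattern="*.mkv", recursive=False):
    """List matching files, like dir /b (or dir /b /s when recursive)."""
    base = Path(folder)
    found = base.rglob(pattern) if recursive else base.glob(pattern)
    return sorted(path for path in found if path.is_file())


def build_command_lines(files, before, after=""):
    """Wrap each file name in quotes and add the command parts around it.

    'before' and 'after' correspond to the columns filled in before
    and after the file name column in the spreadsheet.
    """
    return [f'{before} "{path}" {after}'.strip() for path in files]


def print_command_lines(folder, before, after="", recursive=False):
    """Print one ready-made command line per file found in folder."""
    for line in build_command_lines(list_files(folder, recursive=recursive),
                                    before, after):
        print(line)
```

For example, print_command_lines(".", "whisper", "--model small --language English --device cuda") prints one whisper call per .mkv file in the current folder.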

That's about all. The following paragraphs are left here for back reference only:

Whisper with Python and GIT (not recommended anymore)

Whisper can be used as a stand-alone app with Python and GIT. This is no longer my recommendation, as Const-me, which comes with Subtitle Edit, works much faster and needs no cumbersome installation. Yet if you want to waste some time, do this:

Install Nvidia CUDA 12. Even though the current torch version is built for 11.6, that's no problem. Only if CUDA 12 says your hardware isn't fit, use 11.6.
- You need to sign up as a developer to download CUDA, which is easily done.
- Do a custom install of CUDA, selecting only the runtime part and the display driver (if yours is not yet up to date).

Download CUDNN and copy its files into a new folder, C:\Program Files\NVIDIA\CUDNN\v8.x\
zLib is a data compression software library that is needed by CUDNN.
- Download and extract the zLib package.
- Copy the zlibwapi.dll from it into C:\Program Files\NVIDIA\CUDNN\v8.x\bin\
Add a path entry C:\Program Files\NVIDIA\CUDNN\v8.x\bin

How to add a path entry: right-click on Computer, select Properties, Advanced Options, Environment Variables. There are two path strings you can edit, one just for your user and one for the system. Use the system one for hardware-related things like CUDA. Click on the one intended, then "add new entry".

We also need ffmpeg command line version in order to extract audio from various file formats:
- Download ffmpeg essential.
- Then create a folder C:\Program Files\ffmpeg\ and copy all files into it.
- Add a path entry C:\Program Files\ffmpeg\bin\
Download GIT, 2.39 or newer, and install it, all options default, but you may want to deselect adding to context menu.
- Note: if you had GIT installed before, in particular without CUDA, then first uninstall it, and also delete the GIT folder in Program Files. Maybe you need to do so with Python as well.
- Add path entries C:\Program Files\Git\bin\git.exe and C:\Program Files\Git\cmd (the latter should already be there)
Install Python 3.8 (or a version below 3.11), choose "add to environment variables" and "install pylauncher".
Reboot in order to make sure the new path entries are loaded.
Get a command window in admin mode (Start menu, under Windows, System: right-click on it, run as admin) and enter the following commands:
- pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
This will download approx. 2.4 GB!
Note that it's crucial to get exactly this version, 116; others won't work!
- python.exe -m pip install --upgrade pip
- pip install git+https://github.com/openai/whisper.git --use-pep517
(pep517 avoids some complaint about the Wheel package not being installed)
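Before the first test run, it may be worth checking that the path entries from the steps above have taken effect and that torch can see the GPU. The following Python sketch does this under my own assumptions: the list of tool names and the function names are only examples.

```python
import shutil

# Tools that the steps above are supposed to make available via the path
REQUIRED_TOOLS = ["python", "pip", "git", "ffmpeg"]


def find_tools(names=REQUIRED_TOOLS):
    """Map each tool name to its full path, or None if it is not on the path."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names=REQUIRED_TOOLS):
    """Return the names of the tools that could not be found."""
    return [name for name, path in find_tools(names).items() if path is None]


def cuda_status():
    """Report whether torch is installed and whether it can use CUDA."""
    try:
        import torch  # only present after the pip install step
    except ImportError:
        return "torch is not installed"
    if torch.cuda.is_available():
        return "CUDA available: " + torch.cuda.get_device_name(0)
    return "torch is installed but CUDA is not available"
```

If missing_tools() returns an empty list and cuda_status() names your graphics card, the installation should be ready for a test.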

Now you may test in a command window, supposing you have a file named test.mkv and have opened a command window in the same directory (see "How to get a Command prompt" above):

whisper "test.mkv" --model small --language English --device cuda

This will download the small language model, then generate text and subtitle files for test.mkv or whatever file you specified.

Some texts are difficult enough to send Whisper into error loops; you will notice this when some output lines are repeated many times instead of the subsequent text being generated. In this case, the option

--condition_on_previous_text False

will help. It may also speed up the process in these cases, and it reduces memory usage (e.g. only 2.4 GB of GPU memory instead of 3).
It is not always better, however, and may sometimes result in phrases from one line being repeated in the next one.
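The same settings can also be used from a Python script instead of the command line. The sketch below assumes the whisper package installed above; load_model and transcribe are its documented calls, while the helper names and the simple srt writer are my own illustration, not part of Whisper.

```python
def format_timestamp(seconds):
    """Convert seconds to the srt time format HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Turn Whisper-style segments (start, end, text) into srt text."""
    blocks = []
    for number, seg in enumerate(segments, start=1):
        timing = f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}"
        blocks.append(f"{number}\n{timing}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def transcribe_to_srt(media_path, model_name="small", device="cuda"):
    """Transcribe a media file with condition_on_previous_text switched off."""
    import whisper  # imported here so the helpers above work without it

    model = whisper.load_model(model_name, device=device)
    result = model.transcribe(
        media_path,
        language="en",
        condition_on_previous_text=False,  # same as the command line option
    )
    return segments_to_srt(result["segments"])
```

Calling transcribe_to_srt("test.mkv") returns the subtitle text as one string, which can then be saved as an .srt file.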

Let's now explore the specific requirements and results for CPU and GPU processing.

Whisper model sizes, speed, memory usage
This compares an i5 CPU@4x4GHz and a GTX980 graphics card.

model	file size (MB)	CPU RAM (GB)	CPU RAM, c.o.p.t. (GB)	GPU RAM (GB)	GPU RAM, c.o.p.t. (GB)	CPU speed factor	GPU speed factor
small	472	1.8	1.8	2.5	3.4	0.45	5
medium	1492	4.7	4.5	6.5?	8.5?	0.18	2?
large	3015	8.8	9.4	12?	18?	0.1	1?

c.o.p.t. means --condition_on_previous_text True (the default setting)

Note that in addition to the above numbers, approx. 3 GB of extra CPU memory may be necessary with Windows 10, provided no other memory-intensive apps are running.

With GPU, memory is absolutely critical, the process stops if even only a bit is missing. So most currently installed graphics cards may not be able to run the large models.
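To see what the numbers in the table mean in practice, here is a small Python sketch that estimates the processing time for a recording and lists the models that would fit into a given amount of graphics memory. The values are copied from the table (those marked with ? there are uncertain); the dictionary layout and function names are only my own illustration.

```python
# GPU figures from the table above; entries marked "?" there are uncertain
MODELS = {
    "small":  {"gpu_ram": 2.5,  "gpu_ram_copt": 3.4,  "cpu_speed": 0.45, "gpu_speed": 5.0},
    "medium": {"gpu_ram": 6.5,  "gpu_ram_copt": 8.5,  "cpu_speed": 0.18, "gpu_speed": 2.0},
    "large":  {"gpu_ram": 12.0, "gpu_ram_copt": 18.0, "cpu_speed": 0.1,  "gpu_speed": 1.0},
}


def processing_minutes(duration_minutes, speed_factor):
    """Processing time for a recording; a speed factor of 1 means realtime."""
    return duration_minutes / speed_factor


def models_fitting_gpu(vram_gb, copt=True):
    """Names of the models whose GPU memory need is within vram_gb.

    copt=True uses the figures for --condition_on_previous_text True.
    """
    key = "gpu_ram_copt" if copt else "gpu_ram"
    return [name for name, data in MODELS.items() if data[key] <= vram_gb]
```

For a 90 minute recording, the small model needs 90 / 0.45 = 200 minutes on the CPU but only 90 / 5 = 18 minutes on the GPU, and with 4 GB of graphics memory only the small model fits.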

Note that the Const-me option in Subtitle Edit is not only 4 times faster, but also allows using even the large model efficiently with only 4 GB of graphics RAM!

Copyright (C) 2023; all rights reserved. All materials in these pages are presented for scientific evaluation of video technologies only. They may not be copied from here and used for entertainment or commercial activities of any kind.
We do not have any relation to and do not take any responsibility for any software and links mentioned on this site. This website does not contain any illegal software for download. If we, at all, take up any 3rd party software here, it's with the explicit permission of the author(s) and regarding all possible licensing and copyright issues, as to our best knowledge. All external download links go to the legal providers of the software concerned, as to our best knowledge.
Any trademarks mentioned here are the property of their owners. To our knowledge no trademark or patent infringement exists in these documents; any such infringement would be purely unintentional.
If you have any questions or objections about materials posted here, please e-mail us immediately.
You may use the information presented herein at your own risk and responsibility only. We do also not guarantee the correctness of any information on this site or others and do not encourage or recommend any use of it.
One further remark: These pages are covering only some aspects of PC video and are not intended to be a complete overview or an introduction for beginners.