On Windows it seems to default to some encoding other than UTF-8 (possibly UTF-16 or CP-1252). Since the chapter file is saved as UTF-8, this breaks characters outside the ASCII range, such as ø and æ.
The default is still SubRip SRT, but you can now change the output format to almost any of the available Codec options. There is no option yet to leave the subtitle format as-is, e.g., if there are both an SRT and a WebVTT subtitle, to leave them both untouched.
As always, you can configure a default in your config file, e.g.,
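To illustrate the symptom, here is a minimal sketch of what happens when UTF-8 bytes are decoded as CP-1252. It only demonstrates the mismatch; which component actually performs the wrong decoding is not stated above, and the variable names are illustrative.

```python
# Illustration only: UTF-8 bytes read back with the wrong codec.
text = "ø æ"

utf8_bytes = text.encode("utf-8")

# Decoding with the encoding the file was written in round-trips cleanly.
correct = utf8_bytes.decode("utf-8")

# Decoding the same bytes as CP-1252 turns each non-ASCII character
# into two unrelated characters (mojibake).
misread = utf8_bytes.decode("cp1252")

print(repr(correct))
print(repr(misread))
```

Being explicit about UTF-8 on both the writing and the reading side avoids depending on the platform default.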
```yaml
dl:
  sub_format: vtt
```
Note though that SSA, SSAv4, fTTML, and fVTT are not yet supported. There are no plans to support fTTML or fVTT.
Chardet was detecting a mixture of mostly cp1252 and MacRoman encoding, when the box should just be left as-is when parsing. The actual text within it may need to go through `try_ensure_utf8` when parsed, but not the entire box.
* Add option for automatic subtitle character encoding normalization
The rationale behind this function is that some services use ISO-8859-1
(latin1) or Windows-1252 (CP-1252) instead of UTF-8 encoding, whether
intentionally or accidentally. Some services even stream subtitles with
malformed/mixed encoding (each segment has a different encoding).
* Remove Subtitle parameter `auto_fix_encoding`
Just always attempt to fix encoding. If the subtitle is neither UTF-8 nor CP-1252, then it should realistically error out instead of producing garbage Subtitle data anyway.
* Move Subtitle encoding fixing code out of if drm tree
* Use chardet as a last ditch effort fixing Subs, or return original data
* Move Subtitle.fix_encoding method to utilities as try_ensure_utf8
* Add Shivelight as a contributor
---------
Co-authored-by: rlaphoenix <rlaphoenix@pm.me>
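The fallback order described in this commit (UTF-8, then CP-1252, then chardet as a last resort, else the original data) could be sketched roughly as follows. This is not the actual `try_ensure_utf8` implementation; the function name `try_ensure_utf8_sketch` is hypothetical, and chardet is treated as an optional third-party dependency.

```python
def try_ensure_utf8_sketch(data: bytes) -> bytes:
    """Return data as UTF-8 bytes if its encoding can be worked out,
    otherwise return the original bytes unchanged."""
    # 1. Already valid UTF-8: nothing to do.
    try:
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        pass

    # 2. Try CP-1252, the most common non-UTF-8 case named above.
    try:
        return data.decode("cp1252").encode("utf-8")
    except UnicodeDecodeError:
        pass

    # 3. Last-ditch effort: ask chardet for a guess, if it is installed.
    try:
        import chardet  # optional third-party package
    except ImportError:
        return data

    guess = chardet.detect(data).get("encoding")
    if not guess:
        return data
    try:
        return data.decode(guess).encode("utf-8")
    except (UnicodeDecodeError, LookupError):
        # Wrong guess or unknown codec name: give the data back untouched.
        return data
```

For streams with a different encoding per segment, a function like this would need to be applied to each segment separately rather than to the merged file.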
Note: There are some breaking changes here. If you manually worked with the Enum names here, then some of them have changed to better reflect how the code points are used.
Generally speaking it should not affect service code.
I'm not happy with the approach used here to make portable installations of Devine, therefore for now I will remove the information relating to portable installations.
For aria2c I've simplified the operation by offloading most of the work for creating a cookie header by just re-doing what Python-requests does. This results in the exact same cookies Python-requests would have used in a requests.get() call or such. It supports multiple cookies with the same name under different domains/paths, based on the URI of the mock request.
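As a rough illustration of the idea, the sketch below builds a `Cookie` header for a given URL using only the standard library's `http.cookiejar`, which is the mechanism Python-requests builds on. It is not the code used here; `make_cookie` and `cookie_header_for` are hypothetical helpers, and the domains are placeholders.

```python
from http.cookiejar import Cookie, CookieJar
from typing import Optional
from urllib.request import Request


def make_cookie(name: str, value: str, domain: str, path: str = "/") -> Cookie:
    """Build a simple session cookie (hypothetical helper for this sketch)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def cookie_header_for(jar: CookieJar, url: str) -> Optional[str]:
    """Return the Cookie header the jar would send to url, or None."""
    mock_request = Request(url)
    # The jar applies its domain/path matching rules to the mock request.
    jar.add_cookie_header(mock_request)
    return mock_request.get_header("Cookie")


jar = CookieJar()
# Two cookies with the same name, scoped to different domains.
jar.set_cookie(make_cookie("session", "aaa", "a.example.com"))
jar.set_cookie(make_cookie("session", "bbb", "b.example.com"))

header_a = cookie_header_for(jar, "https://a.example.com/video/seg1.ts")
header_b = cookie_header_for(jar, "https://b.example.com/video/seg1.ts")
```

Because the header is computed per URL, each download gets only the cookies that match its own domain and path.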
It also looks for the "expected 2 but parsed 1" error which is likely an error while parsing the WVD version field. If this happens, it will inform the user to use `pywidevine migrate`.
This is mainly to lessen confusion around service name typos, or for new users getting used to the CLI.
It also changes the Exceptions on the methods of Service from ClickException to KeyError, since they are intended to be used in the core codebase outside of the context of Click.
This used to be used even before devine was public, but it was constantly changed back and forth between an urljoin(), another form of urljoin (something custom or something I can't remember), and an if check + addition.
However, I can confirm that a simple if check will not work as the Base URI might not even be in the same relative root. The if checks have also been inconsistent with some checking if it starts with http(s)://, and some checking if it does not have the base URI at the start of the string.
The if check method does not work as well as urljoin() has the potential to. Using urljoin() also fixes some services, as some HLS playlists would have the m3u8 URL on a completely different root, subdomain, or even domain, causing it to completely break when trying to download segments.
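The cases an if check tends to miss can be shown with the standard library's `urljoin()`. This is only a sketch; `resolve_segment_url` is a hypothetical wrapper and the URLs are placeholders.

```python
from urllib.parse import urljoin


def resolve_segment_url(base_uri: str, segment_uri: str) -> str:
    """Resolve a playlist entry against the URI of the playlist itself."""
    return urljoin(base_uri, segment_uri)


base = "https://cdn.example.com/a/b/master.m3u8"

# Plain relative path: resolved next to the playlist.
relative = resolve_segment_url(base, "seg1.ts")

# Root-relative path: keeps the host but replaces the whole path.
root_relative = resolve_segment_url(base, "/other/seg1.ts")

# Parent-relative path: walks up from the playlist's directory.
parent_relative = resolve_segment_url(base, "../x/seg1.ts")

# Absolute URL on a different domain: returned unchanged.
absolute = resolve_segment_url(base, "https://media.example.net/v/seg1.ts")
```

A `startswith("http")` check would only handle the last case, and prepending the base URI would get the root-relative and parent-relative cases wrong.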
We cannot actually do this check. The Content-Length value will be the size after being further encoded or compressed. While we can find out what it was compressed with via the Content-Encoding header, we cannot match the downloaded length with the Content-Length header as requests will automatically decompress/decode according to the Content-Encoding header.
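The mismatch can be reproduced without a network, using gzip as a stand-in for whatever Content-Encoding the server applies. This is a sketch with made-up subtitle data, not the downloader's code.

```python
import gzip

# What the server has on disk (made-up, repetitive subtitle data).
body = b"WEBVTT\n\n" + b"00:00:01.000 --> 00:00:02.000\nHello\n\n" * 50

# What travels over the wire with "Content-Encoding: gzip".
wire_bytes = gzip.compress(body)

# Content-Length describes the encoded payload on the wire...
content_length = len(wire_bytes)

# ...but the client only sees the data after it has been decoded.
downloaded_length = len(gzip.decompress(wire_bytes))

print(content_length, downloaded_length)
```

The two numbers differ, so comparing them would flag a perfectly good download as broken.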
On new installs, or where the `WVDs` folder has not been created yet, shutil.move() treats the destination as a file path and moves the `.wvd` file to the WVDs folder path, as a file. If the folder existed but was empty, this error wouldn't have occurred.
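The behaviour is easy to reproduce in a temporary directory. The sketch below shows the failure mode and one possible fix (creating the folder first); `move_into_dir` and the file names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Tuple


def move_into_dir(src: Path, dest_dir: Path) -> Path:
    """Move src into dest_dir, creating dest_dir first if it is missing."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(src), str(dest_dir)))


def demo() -> Tuple[bool, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)

        # Failure mode: the destination folder does not exist, so the
        # source file is simply renamed to the folder's path.
        first = root / "first.wvd"
        first.write_bytes(b"demo")
        missing_dir = root / "missing_WVDs"
        shutil.move(str(first), str(missing_dir))
        became_file = missing_dir.is_file()

        # With the folder created up front, the file lands inside it.
        second = root / "second.wvd"
        second.write_bytes(b"demo")
        moved = move_into_dir(second, root / "WVDs")
        landed_inside = moved.is_file() and moved.parent.is_dir()

        return became_file, landed_inside


print(demo())
```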
DASH and normal URL downloads now both decrypt one large single or merged file after all downloads are finished. This leaves a bit of a "pause" between progress bar movement which looks a bit odd. So mark the track as being in a Decrypting state.
Since DASH doesn't have the ability to change keys dynamically per-track (Representation), there's no need for the DASH downloader to decrypt segments as they are downloaded (like HLS).
This halves the number of processes needing to be opened as well as the I/O usage. It may result in noticeably lower CPU usage. Since the IOPS is lowered, you may even see an increase in download speed if downloading to something like a meh HDD.
This also fixes decryption in some weird edge-cases where decrypting each segment individually resulted in timestamp anomalies causing shaka to fail.
Some servers may not respond with the Content-Length header, even if it's from segmented media, e.g., if it's a subtitle URL. The requests downloader required the header to be present as it downloads each URL, which cannot be guaranteed.
Now it tries to get it if possible, and verifies the download size with the Content-Length value if it could be obtained.
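A check of this kind, which is skipped when the header is absent, might look roughly like the following. It is a sketch, not the downloader's code; `expected_size` and `size_matches` are hypothetical names, and a plain mapping stands in for the response headers.

```python
from typing import Mapping, Optional


def expected_size(headers: Mapping[str, str]) -> Optional[int]:
    """Return the Content-Length as an int, or None if absent or invalid."""
    value = headers.get("Content-Length")
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def size_matches(headers: Mapping[str, str], downloaded: int) -> bool:
    """Verify the size only when the server told us what to expect."""
    expected = expected_size(headers)
    return expected is None or expected == downloaded
```

With this shape, a missing header never causes a failure; only a present header with a different value does.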
HEAD requests were made to sum a total file size of the download operation. However, the downloader may be used on URLs where the content is not segmented media. Therefore, the server may not support or respond with the Content-Length header, which causes the requests downloader to crash before it even gets a chance to begin downloading.
Even so, this total size value isn't really necessary, and would cause possibly hundreds of HEAD requests (in quick succession) on segmented sources. It would also add up-front delay before it actually starts to download.