Commit Graph

412 Commits

Author SHA1 Message Date
rlaphoenix 167b45475e Only decode text direction entities in Sub files
Previously, all entities were decoded in Subtitle files because of a problem with SubtitleEdit and its /ReverseRtlStartEnd option not being entity-aware.

It actually ends up reversing the `;` of `&rlm;`, instead of the actual value of `&rlm;`. Therefore, I decoded all entities before SubtitleEdit could process the Subtitle, but this has caused problems with more advanced formats like TTML and WebVTT, as `&lt;` would decode to `<` and cause syntax errors, as would other problematic characters.

According to the TTML and WebVTT specs, HTML entity encoding is allowed, which makes sense, or you wouldn't be able to use `<` etc. Any failure of a player to show the decoded character would be a player problem and out of scope for Devine.
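
A minimal sketch of the narrower decoding described above, in Python. The function name and the exact set of entities (`&lrm;` and `&rlm;`) are assumptions for illustration, not necessarily what Devine's code does:

```python
import re

# Assumed set of text direction entities; the real list may differ.
DIRECTION_ENTITIES = {
    "&lrm;": "\u200e",  # LEFT-TO-RIGHT MARK
    "&rlm;": "\u200f",  # RIGHT-TO-LEFT MARK
}

_PATTERN = re.compile("|".join(re.escape(e) for e in DIRECTION_ENTITIES))


def decode_direction_entities(text: str) -> str:
    """Decode only text direction entities, leaving e.g. `&lt;` untouched."""
    return _PATTERN.sub(lambda m: DIRECTION_ENTITIES[m.group(0)], text)
```

Entities such as `&lt;` and `&amp;` pass through unchanged, so TTML and WebVTT markup stays valid.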
2024-02-05 12:37:21 +00:00
rlaphoenix 568cb616df Use /ConvertColorsToDialog when converting subs to SRT format
This is because SubtitleEdit keeps color-related information when converting to SRT from WebVTT, TTML, and such formats. Why? Not 100% sure. Maybe some players support colors, but generally if you are using SubRip, it's because you either only want basic text subs, or your player doesn't support these "fancy" ooh-la-la colors.

This is a better solution than just stripping out the information. As the option name suggests, it isn't just removing the color information but rather using it to detect different speakers, then appropriately "dialogify" the captions when needed. I.e., start each speaker's sentence with `- `, and separate them with a new line.

The dash-style dialog formatting is quite vital to know if a caption is all spoken by one speaker versus multiple. Not particularly necessary for non-SDH captioning, but would be wanted for SDH subtitles.
2024-02-05 12:10:33 +00:00
rlaphoenix 3b62b50e25 Add support for SegmentBase and BaseURL-only DASH Manifests 2024-02-05 10:22:40 +00:00
rlaphoenix c06ea4cea8 Rework Chapter System, add `Chapters` class
Overall this commit is to just make working with Chapters a lot less manual and convoluted. The current system has you specify information that can easily be automated, like Chapter order and numbers, which is one of the main changes in this commit.

Note: This is a Breaking change and requires updates to your Service code. The `get_chapters()` method must be updated. For more information see the updated doc-string for `Service.get_chapters()`.

- Added new Chapters class which automatically sorts Chapters by timestamp.
- Chapter class has been significantly reworked to be much more generic. Most operations have been moved to the new Chapters class.
- Chapter objects can no longer specify a Chapter number. The number is now set automatically based on its sorted order in the Chapters object.
- Chapter objects can now provide a timestamp in more formats. Timestamps are now verified more efficiently.
- A Chapter object's ID is now a crc32 hash of its timestamp and name instead of basically just its number.
- The Chapters object now also has an ID which is also a crc32 hash of all of the Chapter IDs it holds. This ID can be used for stuff like temp paths.
- `Service.get_chapters()` must now return a Chapters object. The Chapters object may be empty. The Chapters object must hold Chapter objects.
- Using `Chapter {N}` or `Act {N}` Chapters and so on is no longer permitted. You should instead leave the name blank if there's no descriptive name to use for it.
- If you or a user wants `Chapter {N}` names, then they can use the config option `chapter_fallback_name` set to `"Chapter {i:02}"`. See the config documentation for more info.
- Do not add a `00:00:00.000` Chapter, at all. This is automatically added for you if there's at least 1 Chapter with a timestamp after `00:00:00.000`.
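
The behaviour listed above could be sketched roughly as follows. This is an illustrative approximation only: the class layout, the `items` attribute, the `names()` helper, and the exact bytes fed to crc32 are assumptions, and the real `Chapter`/`Chapters` classes accept more timestamp formats than the fixed-width string assumed here.

```python
from zlib import crc32

ZERO = "00:00:00.000"


class Chapter:
    def __init__(self, timestamp: str, name: str = ""):
        # Assumes a fixed-width "HH:MM:SS.mmm" string, so string order is time order.
        self.timestamp = timestamp
        self.name = name

    @property
    def id(self) -> str:
        data = f"{self.timestamp}{self.name}".encode("utf8")
        return f"{crc32(data):08x}"


class Chapters:
    def __init__(self, chapters=()):
        items = sorted(chapters, key=lambda c: c.timestamp)
        # Add the implicit opening Chapter when the earliest Chapter starts after zero.
        if items and items[0].timestamp > ZERO:
            items.insert(0, Chapter(ZERO))
        self.items = items

    @property
    def id(self) -> str:
        data = "".join(c.id for c in self.items).encode("utf8")
        return f"{crc32(data):08x}"

    def names(self, fallback: str = "") -> list:
        # Numbers come from sorted position; unnamed Chapters use the fallback format.
        return [c.name or fallback.format(i=i) for i, c in enumerate(self.items, start=1)]


chapters = Chapters([Chapter("00:10:00.000", "Ending"), Chapter("00:01:30.000")])
```

Here `chapters.names("Chapter {i:02}")` mirrors what the `chapter_fallback_name` config option is described as doing for unnamed Chapters.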
2024-02-05 01:42:43 +00:00
rlaphoenix 2affb62ad0 Fix SegmentList source/media join with Base URL in DASH download_track() 2024-02-03 05:26:52 +00:00
rlaphoenix 30abe26321 Improve caching of keys to vaults log 2024-01-29 17:02:30 +00:00
rlaphoenix 3dbe0caa52 Fix Cookie update at the end of dl command 2024-01-29 16:28:40 +00:00
rlaphoenix 837061cf91 Rework Profile/Authentication System
- Removed `devine auth` command and sub-commands due to lack of support, risk of data, and general quirks of it.
- Removed `profiles` config data, you must now specify which profile you wish to use each time with -p/--profile. If you use a specific profile a lot more than others, you should make it the default. See below.
- Added a `default` key to each service mapping in `credentials` that will be used if -p/--profile is not specified.
- Each service mapping in `credentials` is no longer forced to use profiles. You can now simply specify `Service: username:password` if you only use one credential.
- Auth-less Services now simply have to specify no credential and have no cookie file.
- There is no longer an error for not having a cookie and/or credential for the chosen profile, as a profile no longer has to be chosen.
- Cookies are now checked for in 3 different locations in the following order:
1. `/Cookies/{Service Name}.txt`
2. `/Cookies/Service Name/{profile}.txt`
3. `/Cookies/Service Name/default.txt`
This means you now have more options on organization and layout of Cookie files, similarly to the new Credentials config.
Note: `/Cookies/Service Name/.txt` also works as an alternative to `default.txt`. The benefit of this is `.txt` will always be at the top of your folder.
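
The lookup order above might be sketched like this. The function names are hypothetical, and two details are assumptions: that the profile-specific path is skipped when no profile is given, and that `.txt` is tried after `default.txt`.

```python
from pathlib import Path
from typing import List, Optional


def cookie_candidates(cookies_dir: Path, service: str, profile: Optional[str] = None) -> List[Path]:
    """Return candidate cookie file paths in the order they would be checked."""
    candidates = [cookies_dir / f"{service}.txt"]
    if profile:
        candidates.append(cookies_dir / service / f"{profile}.txt")
    candidates.append(cookies_dir / service / "default.txt")
    candidates.append(cookies_dir / service / ".txt")  # alternative to default.txt
    return candidates


def find_cookie_file(cookies_dir: Path, service: str, profile: Optional[str] = None) -> Optional[Path]:
    """Return the first candidate that exists, or None for auth-less use."""
    return next((p for p in cookie_candidates(cookies_dir, service, profile) if p.is_file()), None)
```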
2024-01-29 06:34:22 +00:00
rlaphoenix 1c6e91b6f9 Rename --group to --tag 2024-01-29 03:54:17 +00:00
rlaphoenix e9dc53735c Fix BaseURLs starting with `../` in DASH download_track() 2024-01-29 03:26:15 +00:00
rlaphoenix e967c7c8d1 Add custom RESTful Vault API Interface 2024-01-24 20:09:59 +00:00
rlaphoenix c08c45fc16 Prioritize loading configs next to devine over other locations 2024-01-24 18:44:01 +00:00
rlaphoenix 3b788c221a Look for a config file in 2 more locations
This is to aid using Devine in a portable folder by trying to load configs next to Devine's code.
2024-01-24 18:41:24 +00:00
rlaphoenix 21687e6649 No longer create an empty config in the user configs folder 2024-01-24 18:39:36 +00:00
rlaphoenix de7122a179 Add basic control file to Requests and Curl-Impersonate downloaders 2024-01-23 10:06:42 +00:00
rlaphoenix c53330046c Improve Dependencies list in README 2024-01-23 09:57:04 +00:00
rlaphoenix 6450d4d447 Change default downloader from aria2c to requests
This is to reduce the number of required dependencies by not strictly requiring aria2c out of the box. You can always change the downloader back to aria2c in the config.
2024-01-23 09:56:25 +00:00
rlaphoenix 5e858e1259 Delete file on failure in Requests and Curl-Impersonate downloaders 2024-01-23 09:46:24 +00:00
rlaphoenix ba93c78b99 Add missing while loop to Curl-Impersonate downloader 2024-01-23 09:45:31 +00:00
rlaphoenix 172ab64017 Add missing while loop to Requests downloader 2024-01-21 18:47:19 +00:00
rlaphoenix 2056e056a4 Unescape HTML Entities in Subtitles after Downloading
This fixes some Subtitles having e.g., `&amp;` instead of just `&`, but especially for special entities like `&rlm;` which enables Right-to-Left mode on Hebrew and Arabic Subtitles.
2024-01-18 16:25:39 +00:00
rlaphoenix 26d067915f Fix output directory and filename for single-URL aria2c downloads 2024-01-17 04:49:37 +00:00
rlaphoenix 746c55d188 Fix progress total on single-URL requests downloads
Previously, it would show the download as fully complete after the first 1024-byte chunk was downloaded, as the Progress Bar total value was set to the number of URLs. This is because it assumed there would be multiple URLs to download at once, and would advance the progress bar each time one of the downloads completed instead.

This changes it so that if there's only one URL to download, then it calculates the total number of chunks to download, which makes the progress bar advance correctly.
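
As a rough sketch of that calculation (the function name and parameters are hypothetical, and it assumes the size is known up front, e.g. from a Content-Length header):

```python
import math


def progress_total(urls: list, content_length: int, chunk_size: int = 1024) -> int:
    """Pick the progress bar total: chunks for a single URL, otherwise one step per URL."""
    if len(urls) == 1 and content_length > 0:
        return math.ceil(content_length / chunk_size)
    return len(urls)
```

With a 2500-byte file and 1024-byte chunks, a single-URL download gets a total of 3 steps rather than 1.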
2024-01-14 01:24:51 +00:00
rlaphoenix 0493d28914 Manually specify the output format with Shaka-Packager
It normally auto-detects the format from the file extension. The supported formats are "MP4" and "WEBM". The input files to shaka-packager are currently always ".mp4", so this isn't particularly an issue.

However, I want to add this just as a precaution in case it isn't. This isn't an issue if the input file is another format, like WEBM, as this only controls the output format, the format devine wants, not the input and output format.
2024-01-12 01:17:18 +00:00
rlaphoenix 0116c278af Absorb original file and path in Decrypt, Repack, & Range Operations
To possibly support download resuming in the future, the file names for the decrypt, repack, and change range functions were simplified and once output has finished it then deletes the original input file and re-uses the original input file path.

The file names were changed to just append `_repack`, `_decrypted`, `_full_range` etc. to the filename rather than using a duplex extension (`.repack.mp4`, `.decrypted.mp4`, `.range0.mp4`).

This is all so that code to check if the file was already downloaded can be simpler. Instead of having to check whether any of four different possible file names for a completed download exist, it checks one.
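
A small sketch of the naming and "absorb" steps described above. Both helper names are hypothetical, and `Path.replace()` is used here as one way to delete the original and re-use its path in a single step:

```python
from pathlib import Path


def suffixed_path(path: Path, tag: str) -> Path:
    """Append `_tag` to the file name, e.g. `video.mp4` -> `video_decrypted.mp4`."""
    return path.with_name(f"{path.stem}_{tag}{path.suffix}")


def absorb(original: Path, output: Path) -> Path:
    """Replace the original input file with the finished output file."""
    output.replace(original)  # overwrites `original`, so only one path remains
    return original
```

After `absorb()`, only the original path exists, so a resume check only needs to look at one file name.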
2024-01-12 01:11:47 +00:00
rlaphoenix ee56bc87c2 Use new Subtitle.convert() in dl command for --sub-format 2024-01-12 00:51:06 +00:00
rlaphoenix e76bc7201d Add convert() method to Subtitle class 2024-01-12 00:50:27 +00:00
rlaphoenix f4d8bc8dd0 Add support for parsing SubRip (SRT) in Subtitle.parse() 2024-01-12 00:37:22 +00:00
rlaphoenix 14ebe4ee1b Ensure input is UTF-8 when parsing TTML and WebVTT Subtitles
This fixes some conversion errors when working with non-Latin scripts like Russian (Cyrillic) and Arabic.
2024-01-12 00:36:43 +00:00
rlaphoenix 96f1cbb260 Remove empty caption lists post-parsing in Subtitle.parse()
This issue is common with Now TV where it for some reason parses into "two" languages, "en" and "eng". This results in one empty caption list and one non-empty caption list. The empty caption list tends to be first.

This issue causes a multitude of snowballing problems later down the codebase. For example, converting to SRT will result in a "MULTI-LANGUAGE SRT" header, which most programs, like mkvmerge, do not recognize, causing a mux failure.
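
The clean-up step can be illustrated with a plain dictionary standing in for the parsed caption set. This is only a sketch: the real code works on the subtitle library's own caption objects, and the function name is hypothetical.

```python
def drop_empty_caption_lists(captions_by_language: dict) -> dict:
    """Keep only languages that actually contain captions."""
    return {lang: caps for lang, caps in captions_by_language.items() if caps}


parsed = {"en": [], "eng": ["Hello.", "How are you?"]}
cleaned = drop_empty_caption_lists(parsed)
```

With the empty "en" list removed, only one language remains, which is the condition under which the multi-language header would no longer be written.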
2024-01-12 00:30:52 +00:00
rlaphoenix 9683c34337 Improve readability of Subtitle.parse() method 2024-01-12 00:27:19 +00:00
rlaphoenix c6c2e9ca51 Add Curl-Impersonate Downloader via curl_cffi project
The browser to imitate can be set in the config:

For example,
```yaml
curl_impersonate:
    browser: chrome110
```

It will default to using chrome110 if no value is set in the config.

The available Browsers are listed here: https://github.com/yifeikong/curl_cffi#sessions
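
The default described above could be resolved along these lines. The helper name and the dictionary-shaped config are assumptions for illustration; only the `curl_impersonate.browser` key and the `chrome110` default come from this message.

```python
DEFAULT_BROWSER = "chrome110"


def browser_from_config(config: dict) -> str:
    """Read curl_impersonate.browser from a loaded config, falling back to the default."""
    section = config.get("curl_impersonate") or {}
    return section.get("browser") or DEFAULT_BROWSER
```

The resulting value would then be handed to curl_cffi as the browser to impersonate.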
2024-01-11 22:29:49 +00:00
rlaphoenix a9de9748ec Remove saldl from downloaders config docs 2024-01-09 22:35:45 +00:00
rlaphoenix e8e3d4a90f Remove 5-attempt loop from DASH and HLS Downloads
These are unnecessary now as all downloaders have retry functionality built-in.
2024-01-09 13:00:39 +00:00
rlaphoenix cc4900a2ed Remove uses of the downloader's silent arg in DASH and HLS
This was originally done to prevent *all* aria2c logs unless on the last attempt, at which point, if all attempts had failed, it would let aria2c log the error.

However, that's bad practice as aria2c may produce errors or warnings on say the 3rd attempt, and the 3rd attempt may have otherwise succeeded, with warnings or errors. It also generally shouldn't be necessary.
2024-01-09 12:54:27 +00:00
rlaphoenix 009a880371 Silence at the log_buffer not the stdout in aria2c
This is so we can still obtain progress data while calling aria2c silently
2024-01-09 12:52:14 +00:00
rlaphoenix 9f04676b5c Get Cookie Header for each URL in aria2c 2024-01-09 12:41:15 +00:00
rlaphoenix 552a0f13a5 Add retry attempts to Requests downloader 2024-01-09 12:09:21 +00:00
rlaphoenix fa3cee11b7 Move Download Cancel/Skip Events to constants 2024-01-09 11:55:05 +00:00
rlaphoenix ce457df151 Change wording from Download Stopped to Download Cancelled 2024-01-09 11:38:58 +00:00
rlaphoenix d566aa2547 Show Licensing and Licensed Messages via Rich 2024-01-09 11:34:14 +00:00
rlaphoenix 09edb696ba Change to safer default values for -j, -x, and -s in aria2c
The original values would cause blocks by some Services. Therefore, it is better to default to safer values. The new values match the defaults used by aria2c as listed in their docs.
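
A sketch of how such defaults might be applied when building the aria2c argument list. The config key names and the function are hypothetical; the default values shown (5, 1, 5) are what I understand aria2c's documentation to list for these options, so treat them as an assumption.

```python
# Assumed aria2c defaults for -j, -x, and -s respectively.
ARIA2C_DEFAULTS = {
    "max_concurrent_downloads": 5,   # -j
    "max_connection_per_server": 1,  # -x
    "split": 5,                      # -s
}


def aria2c_connection_args(config: dict) -> list:
    """Build the connection-related arguments, letting the config override defaults."""
    values = {**ARIA2C_DEFAULTS, **config}
    return [
        f"--max-concurrent-downloads={values['max_concurrent_downloads']}",
        f"--max-connection-per-server={values['max_connection_per_server']}",
        f"--split={values['split']}",
    ]
```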
2024-01-09 10:22:28 +00:00
rlaphoenix a7bbac7bcc Get -j, -x, and -s from aria2c config, default to 16 2024-01-09 10:18:52 +00:00
rlaphoenix dbfefc1d97 Pretty up and improve readability of aria2c arguments 2024-01-09 10:05:03 +00:00
rlaphoenix 316f8f0530 Set Referer & User-Agent via dedicated options instead of Header in aria2c 2024-01-09 09:57:31 +00:00
rlaphoenix 347c31d717 No longer retrieve timestamp of downloads in aria2c
For downloads by devine, there's generally no reason to retrieve this information when it will be decrypted, repacked, remuxed, and so on anyway. Requesting the timestamp will just mean more requests being made, perhaps slowing down the download.
2024-01-09 09:56:15 +00:00
rlaphoenix e54d4b4f41 Move unsupported proxy check to start of aria2c function 2024-01-09 09:55:12 +00:00
rlaphoenix 484338cf50 Remove unnecessary --min-split-size from aria2c downloader
This was added by another team member a long time ago, seemingly for the purposes of preventing a split on DASH/HLS segment files, as they would be already quite small.

However, just because they are small, it isn't exactly a problem to have them split, and a segment would only split if its file size fits the default split size of 20M at least twice. I.e., if the segment is 45M, it will split into two. If the segment is 25M, it actually won't split at all. You may think 25M will split by 20M into two downloads, but actually the split size must fit in full for it to split. So for 2 downloads it will need to be 40M in size, then 60M for 3, then 80M for 4, and so on.
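
The arithmetic in that paragraph, following the rule exactly as this message describes it (not verified against aria2c itself, and ignoring any cap from `-s`), works out as:

```python
def split_count(size_mb: int, split_size_mb: int = 20) -> int:
    """Number of parallel pieces: how many times the split size fits in full, at least 1."""
    return max(1, size_mb // split_size_mb)
```

So 25M stays as one download, 45M becomes two, and 60M becomes three.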

A 40M or bigger segment file does in my opinion deserve to be split as it may genuinely reap speed benefits.
2024-01-09 09:52:22 +00:00
rlaphoenix a3ab971132 Fix infinite loop in Track.get_init_segment
If the Server returns a Content-Length Header with a value of 0, then the code shortly after it would end up looping over streamed response chunks of 0 length, which would go on forever.
2024-01-09 02:45:10 +00:00
rlaphoenix 58cb00b18b Implement --no-proxy to disable all uses of proxies and proxy providers
This prevents a service from setting a proxy if geofenced, and also discards any manually provided proxy from `--proxy`.
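
In outline, the precedence this implies might look like the following. The function and parameter names are hypothetical, and preferring a manual `--proxy` value over a service-chosen one is an assumption:

```python
from typing import Optional


def resolve_proxy(no_proxy: bool, cli_proxy: Optional[str], service_proxy: Optional[str]) -> Optional[str]:
    """Return the proxy to use, or None when --no-proxy disables all proxies."""
    if no_proxy:
        return None  # discards both the manual proxy and any service-chosen proxy
    return cli_proxy or service_proxy
```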
2024-01-09 02:40:49 +00:00