Compare commits


127 Commits

Author SHA1 Message Date
Ninja Jiraiya 2d7440ed95 Merge pull request 'master' (#3) from DRMTalks/devine:master into master
Reviewed-on: #3
2024-03-14 03:25:35 +00:00
rlaphoenix e7294c95d1 fix(requests): Block until connection freed if too many connections 2024-03-13 17:15:13 +00:00
rlaphoenix 36b070f729 fix(requests): Manually compute default max_workers or pool size is None 2024-03-13 17:12:06 +00:00
Ninja Jiraiya d072190b11 Merge pull request 'master' (#2) from DRMTalks/devine:master into master
Reviewed-on: #2
2024-03-12 14:01:23 +00:00
rlaphoenix 458ad70fae fix(Video): Delete original file after using remove_eia_cc() 2024-03-12 11:08:15 +00:00
rlaphoenix 9fce56cc66 fix(Video): Delete original file after using change_color_range() 2024-03-12 11:07:40 +00:00
Ninja Jiraiya 905f5706eb Merge pull request 'master' (#1) from DRMTalks/devine:master into master
Reviewed-on: #1
2024-03-12 02:57:18 +00:00
rlaphoenix 1bff87bd70 fix(requests): Set HTTP pool connections/maxsize to max workers
This allows requests to open and save/cache up to *max_workers* TCP connections. In most situations it will still only save and re-use one TCP connection, since it always tries to re-use an existing connection when one is available.

However, in situations where downloads are made from more than 10 Host/Port combinations (the default pool connections/maxsize), this will improve download speeds.
2024-03-12 01:06:42 +00:00
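For reference, this pooling behaviour is configured through requests' `HTTPAdapter`; a minimal sketch of mounting a larger, blocking pool (the worker count of 16 here is an illustrative assumption):

```python
import requests
from requests.adapters import HTTPAdapter

max_workers = 16  # illustrative; devine derives this from its max-workers setting

session = requests.Session()
# pool_connections = number of host/port pools, pool_maxsize = connections cached per pool,
# pool_block=True = wait for a free connection instead of opening (and warning about) extras.
adapter = HTTPAdapter(pool_connections=max_workers, pool_maxsize=max_workers, pool_block=True)
session.mount("https://", adapter)
session.mount("http://", adapter)
```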
rlaphoenix 5376e4c042 refactor(Service): Go back to the default pool_maxsize in Session
The pool_maxsize value here wasn't actually doing much. It should have also been applied to pool_connections. What we realistically needed was just pool_block, to prevent opening too many connections (which caused a warning). The default pool_connections=10 and pool_maxsize=10 is fine. The downloader doesn't currently use this value.
2024-03-12 00:59:30 +00:00
rlaphoenix c77d521a42 refactor(Track): Default the track name to its lang's script/territory
This allows you to override the whole track name instead of just prefixing text before the script/territory. If you want no track name at all, you can set the track name to an empty string.

The script "Zzzz" (placeholder?) and territory "ZZ" (placeholder?) are not used. The script/territory values are only used if available and if necessary. I.e., fr-CA will use "Canada" but fr-FR will NOT use "France", it will be blank.
2024-03-10 15:19:39 +00:00
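For illustration, the script/territory names come from language data such as that exposed by the langcodes library (whether devine calls these exact methods internally is an assumption):

```python
from langcodes import Language

print(Language.get("fr-CA").territory_name())  # "Canada" - used, as it disambiguates
print(Language.get("fr-FR").territory_name())  # "France" - available, but deliberately not used
```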
rlaphoenix f0b589c8a5 refactor(Track): Remove TERRITORY_MAP constant, trim SAR China manually
e.g., Hong Kong SAR China, Macao SAR China
2024-03-10 15:13:01 +00:00
rlaphoenix 4f79550301 fix(Track): Fix order of operation mistake in get_track_name 2024-03-09 19:56:41 +00:00
rlaphoenix 73d9bc4f94 fix(HLS): Remove save dir even if final merge wasn't needed 2024-03-09 19:44:40 +00:00
rlaphoenix 35501bdb9c fix(DASH): Fix merge regression from recent commit
An else tree was used in 4d6c72ba30 when it shouldn't have been.

Fixes #81
2024-03-09 17:52:50 +00:00
rlaphoenix 1d5d4fd347 fix(dl): Use click.command() instead of click.group() 2024-03-09 01:40:21 +00:00
rlaphoenix 4d6c72ba30 fix(DASH/HLS): Don't merge folders, skip final merge if only 1 segment 2024-03-09 01:37:55 +00:00
rlaphoenix 77e663ebee feat(search): New Search command, Service method, SearchResult Class 2024-03-08 21:32:55 +00:00
rlaphoenix 10a01b0b47 fix(Track): Compute Track ID from the `this` variable, not `self` 2024-03-08 19:22:33 +00:00
rlaphoenix 4c395edc53 fix(dl): Add single mux job if there's no video tracks
Fixes regression from v3.1.0 with --audio-only, --subs-only and --chapters-only.
2024-03-08 19:06:21 +00:00
rlaphoenix eeccdc37cf fix(MultipleChoice): Simplify super() call and value types
It was using the wrong instance for the super() call, leaving the convert() method to seemingly default to str() for the returned chosen value types (or something; I don't really see why this works).
2024-03-08 17:09:20 +00:00
rlaphoenix 423ff289db feat(Track): Allow Track to choose downloader to use
The downloader property must be a Callable with the same signature as the aria2c, curl_impersonate, and requests downloader functions. You can pass it one of those functions by importing it, or a custom function with a matching signature.

Note: The chosen downloader will still be overridden with a fallback one in the case where the aria2c downloader is used but the download requires the HTTP Range header.

Closes #70
2024-03-08 16:48:44 +00:00
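A sketch of what assigning a custom downloader might look like. The parameter names follow the downloader signature described in the v3.0.0 changelog below (`urls`, `output_dir`, `filename`), and progress is yielded in rich's `Progress.update()` convention, but the exact keyword set here is an assumption:

```python
from pathlib import Path
from typing import Any, Generator, Union


def my_downloader(
    urls: Union[str, list, dict],
    output_dir: Path,
    filename: str,
    **kwargs: Any,
) -> Generator[dict, None, None]:
    """Toy downloader with the assumed matching signature."""
    url_list = urls if isinstance(urls, list) else [urls]
    yield {"total": len(url_list)}
    for i, url in enumerate(url_list):
        # ... fetch `url` and write it to output_dir / filename.format(i=i, ext=".mp4") ...
        yield {"advance": 1}


# track.downloader = my_downloader  # assigned on any Track instance
```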
rlaphoenix ba801739fe fix(aria2c): Support aria2(c) 1.37.0 by handling upstream regression
From aria2c's changelog (2007-09-02):

```
Now *.aria2 contorol file is first saved to *.aria2__temp and if it is successful, then renamed to *.aria2.
This prevents *.aria2 file from being truncated or corrupted when file system becomes out of space.
```

It seems something went wrong in 1.37.0, resulting in these files sometimes not being renamed to `.aria2` and being left there for good. The fix for devine is to simply detect `.aria2__temp` files and delete them once all segments finish downloading. My only worry here is the root cause of the failure to rename. Did the download actually complete without error? According to aria2c's RPC, no errors occurred. There's no way to support aria2(c) 1.37.0 without this sort of change, as the files seem to download correctly regardless of not being renamed and deleted.

Fixes #71
2024-03-08 16:15:50 +00:00
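The fix described is small enough to sketch; a hedged approximation of the cleanup (the actual devine code may differ):

```python
from pathlib import Path


def clean_aria2_temp_files(save_dir: Path) -> None:
    # aria2c 1.37.0 can leave *.aria2__temp control files behind even after a
    # successful download; they are safe to delete once all segments are done.
    for temp_file in save_dir.glob("*.aria2__temp"):
        temp_file.unlink(missing_ok=True)
```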
rlaphoenix 79506dda75 chore(HLS): Remove commented-out code from get_supported_key()
This is code I forgot to remove while testing the HLS rework which released in v3.0.0.
2024-03-08 15:48:39 +00:00
rlaphoenix ccac55897c refactor(ClearKey): Only use User-Agent if none set in from_m3u_key 2024-03-08 15:45:52 +00:00
rlaphoenix e0aa0e37d3 feat(ClearKey): Pass session not proxy str in from_m3u_key method
This reduces the number of connections being made by quite a bit for playlists that constantly change keys, or have new key data for every single segment (e.g., Pluto sometimes).

It also allows you to pass headers and cookies, while still also being able to supply a proxy.
2024-03-08 15:44:41 +00:00
rlaphoenix c974a41b6d fix(dl): Include chapters when muxing
This is a regression from the newer mux-job code that was brought in alongside the multiple `-r/--range` mux jobs feature in v3.1.0.

Fixes #79
2024-03-08 15:30:36 +00:00
rlaphoenix 2bbe033efb fix(Tracks): Improve constructor typing, add Chapter(s) to typing 2024-03-08 15:20:40 +00:00
rlaphoenix 5950a4d4fa docs(changelog): Add v3.1.0 Changes 2024-03-05 17:11:47 +00:00
rlaphoenix 8d44920120 docs(version): Bump to v3.1.0 2024-03-05 17:09:34 +00:00
rlaphoenix f8871c1ef0 docs(changelog): Add git-cliff configuration
Conventional Commit scopes don't seem entirely compatible with Keep a Changelog's sections/headers, so I have abandoned the Keep a Changelog sections/headers for custom ones that more accurately represent the commit's scope.
2024-03-05 17:08:26 +00:00
rlaphoenix f7f974529b build: Explicitly use marisa-trie==1.1.0 for Python 3.12 wheels
The current version of langcodes (v3.3.0) is quite old and doesn't have explicit support for Python 3.11+ yet. It does work on Python 3.12, but one of its dependencies, marisa-trie==0.7.8, does not have wheels for Python 3.12.

By explicitly using the pre-release version of one of langcodes' dependencies, language-data (which is what depends on marisa-trie), we can upgrade to marisa-trie==1.1.0, which does have a wheel for Python 3.12.
2024-03-05 16:31:25 +00:00
rlaphoenix 0201c41feb feat(dl): Support multiple -r/--range and mux ranges separately
Multiple -r/--range values can be used with multiple -q/--quality values.

Closes #63
2024-03-04 13:11:43 +00:00
rlaphoenix 6e8efc3f63 fix(HLS): Use filtered out segment key info
Also simplifies calculation of wanted segment range when decrypting. Instead of storing the starting segment index number with the encryption_data variable, we just grab the first segment that isn't already merged.

Fixes #77
2024-03-04 12:51:00 +00:00
rlaphoenix 499fc67ea0 feat(cli): Implement MultipleChoice click param based on Choice param
This can be used in place of click.Choice() when you want to choose multiple values. Values must be separated by the `,` character, which does mean the `,` character cannot appear within a choice value.
2024-03-04 11:06:56 +00:00
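A hedged usage sketch, assuming MultipleChoice accepts the same constructor arguments as click.Choice (the import path below is also an assumption):

```python
import click

from devine.core.utils.click_types import MultipleChoice  # assumed import path


@click.command()
@click.option("-r", "--range", "range_", type=MultipleChoice(["SDR", "HDR10", "DV"]))
def dl(range_: list) -> None:
    # e.g. `dl -r SDR,HDR10` -> range_ == ["SDR", "HDR10"]
    click.echo(range_)
```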
rlaphoenix b7b88f66ce feat(dl): Change --vcodec default to None, use any codec 2024-03-04 10:41:07 +00:00
rlaphoenix 1adc551926 refactor(dl): Remove unused `get_profiles()` method 2024-03-04 09:31:15 +00:00
rlaphoenix 77976c7e74 feat(Subtitle): Convert from fTTML->TTML & fVTT->WebVTT post-download 2024-03-02 15:37:12 +00:00
rlaphoenix cae47017dc refactor: Move dl command's download_track() to Track.download() 2024-03-02 15:08:22 +00:00
rlaphoenix f510095bcf feat(dl): Skip video lang filter if --v-lang unused & only 1 video lang
This hopefully improves the user experience for anyone using Devine mainly for content outside the English language. For example, if you use `-l it` and only English video tracks are available, then there's really no need to filter by language and fail.

However, it still attempts filtering if you explicitly used --v-lang. If the user expected all episodes to be French by using `--v-lang fr`, and the service had one random episode in English, then the user would very likely want to be informed so they can verify and decide how to deal with it if it really was English.
2024-03-02 12:54:17 +00:00
rlaphoenix a7c2210f0b fix(version): The `__version__` variable forgot to be updated 2024-03-02 06:10:01 +00:00
rlaphoenix 76dc54fc13 fix(dl): Have --sub-format default to None to keep original sub format 2024-03-01 05:18:46 +00:00
rlaphoenix c516f54a07 refactor(DASH): Change how Video FPS is gotten to remove FutureWarning log 2024-03-01 05:15:47 +00:00
rlaphoenix 289808b80c refactor(DASH): Move data values from track url to track data property 2024-03-01 05:08:59 +00:00
rlaphoenix 90c544966a refactor(Track): Rename extra to data, enforce type as dict
Setting data as a dictionary allows more places in the code (including DASH, HLS, Services, etc.) to get/set what they want by key instead of by index (list/tuple). Tuples or lists typically ended up in services because DASH and HLS stored needed data as a tuple, and services did not want to disturb or remove that data, even though doing so would have been fine.
2024-03-01 04:29:45 +00:00
rlaphoenix a6a5699577 refactor(Track): Move delete and move methods near start of Class 2024-03-01 04:15:46 +00:00
rlaphoenix 866de402fb refactor(Track): Return new path on move(), raise exceptions on errors 2024-03-01 04:14:44 +00:00
rlaphoenix 3ceabd0c74 feat(Track): Add a name property to use for the Track Name 2024-03-01 04:11:53 +00:00
rlaphoenix 2a6fb96c3d fix(Track): Don't use fallback values "Zzzz"/"ZZ" for track name 2024-03-01 04:11:53 +00:00
rlaphoenix c14b37a696 fix(Track): Don't modify lang when getting name 2024-03-01 04:11:53 +00:00
rlaphoenix 5b7c72d270 refactor(Track): Move the path class instance variable with the rest 2024-03-01 04:11:52 +00:00
rlaphoenix 3358c4d203 refactor(Track): Remove unnecessary bool casting 2024-03-01 04:11:52 +00:00
rlaphoenix 6e9f977642 docs(Track): Remove unnecessary comments 2024-03-01 04:11:52 +00:00
rlaphoenix bd90bd6dca feat(Track): Make ID optional, Automatically compute one if not provided 2024-03-01 04:11:52 +00:00
rlaphoenix fa9db335d6 refactor(Track): Rename Descriptor's M3U & MPD to HLS & DASH 2024-03-01 04:11:52 +00:00
rlaphoenix ec5bd39c1b refactor(Track): Remove unused DRM enum 2024-03-01 04:11:52 +00:00
rlaphoenix ba693e214b refactor(Track): Remove swap() method and its uses
Re-using the same track path and file name for a different output file is not ideal, as the file contents are different and the target file name indicates what processing was done on it, which is useful when browsing the temp directory during debugging.
2024-03-01 03:04:07 +00:00
rlaphoenix 470e051100 refactor(Track): Add type checks, improve typing 2024-03-01 02:43:43 +00:00
rlaphoenix 944cfb0273 ci(pre-commit): Add a conventional-commit hook 2024-03-01 02:17:41 +00:00
rlaphoenix 27b3693cc1 docs(changelog): Add v3.0.0 changes 2024-03-01 00:03:09 +00:00
rlaphoenix 9aeab18dc3 Bump to v3.0.0 2024-03-01 00:01:33 +00:00
rlaphoenix a5fb5d33f1 Update default curl-impersonate browser to chrome120 2024-02-29 23:58:14 +00:00
rlaphoenix a55f4f6ac7 Update dependencies 2024-02-29 23:57:57 +00:00
rlaphoenix 1039de021b Update the copyright year and project description 2024-02-29 23:25:23 +00:00
rlaphoenix be0ed0b0fb Simplify `Tracks.__add__` method, support Chapter(s) & Track objects 2024-02-29 23:19:05 +00:00
rlaphoenix 97efb59e5f Only decode text direction entities in Sub files (cont.)
Already did this for HLS, but somehow forgot to do it for DASH and direct URLs.
2024-02-29 22:06:57 +00:00
rlaphoenix 4073cefc74 Remove Subtitle.remove_multi_lang_srt_header()
The root cause of the error which required calling this function was identified and fixed in this release.
2024-02-29 22:06:02 +00:00
Arias800 75641bc8ee Add default shaka-packager build name (#74)
If the user builds Shaka-Packager manually, the default binary name will be "packager".
Adding it to the list will ensure that Devine detects the app in this situation.
2024-02-27 22:48:54 +00:00
rlaphoenix 0c20160ddc Implement `__add__` to Tracks class 2024-02-20 22:06:39 +00:00
rlaphoenix eef397f2e8 HLS: Don't include map data if discontinuity/end of playlist was decrypted
The decrypt() call just before it would have already included the map data, as it was needed to decrypt. Therefore, it does not need to be added again when merge_discontinuity() is called. In some cases, re-adding the map data can cause playback or final-merge failure.
2024-02-20 20:12:09 +00:00
rlaphoenix b829ea5c5e DASH: Detect SDH subtitles via AudioPurposeCS:2007=2 2024-02-20 19:29:21 +00:00
rlaphoenix 7f898cf2df HLS: Fix map data exists check when merging segments
`map_data` may evaluate truthy while `map_data[1]` itself is None, resulting in `None` being written to the stream.
2024-02-20 02:14:58 +00:00
rlaphoenix 2635d06d58 Set stop event & mark track failed if new HLS DRM fails to license 2024-02-20 01:46:47 +00:00
rlaphoenix 8de3a95c6b Flush file buffers when merging DASH or HLS segments 2024-02-20 01:35:58 +00:00
rlaphoenix 1259a26b14 Create and use new utility to get file extension from URLs/Paths
Fixes #73
2024-02-19 18:14:50 +00:00
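Getting an extension from a URL differs from a filesystem path because query strings and fragments must be ignored; a hedged sketch of such a utility (the real devine helper may differ):

```python
from pathlib import Path
from urllib.parse import urlparse


def get_extension(value: str) -> str:
    # Parse as a URL first so query strings/fragments don't pollute the suffix,
    # e.g. "https://cdn.example.com/seg.mp4?token=abc" -> ".mp4"
    path = urlparse(str(value)).path or str(value)
    return Path(path).suffix
```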
rlaphoenix c826a702ab DASH: Fix URL concatenation in some edge cases
In some of the urljoin() calls the result would end with `/None`, e.g., `http://.../some_base_value/None`, when it should have joined with the base value only.
2024-02-19 17:45:40 +00:00
rlaphoenix 1b76e8ee28 Aria2c: Fix shutdown condition edge condition when URLs > 1000
`stopped_downloads` is capped at just 1000 objects even though 999999 downloads were requested, so if aria2c is downloading more than 1000 URLs, the count of stopped downloads will never match the count of download URLs and the loop will never stop.
2024-02-17 23:33:52 +00:00
rlaphoenix d65d29efa3 Remove unnecessary LANGUAGE_MUX_MAP
This language tag/code mapping table is no longer needed as of MKVToolNix v67, which has been the minimum supported version for some time now already.
2024-02-17 23:19:07 +00:00
rlaphoenix 81dca063fa Consolidate typing of Requests/MozillaCookieJar typing to CookieJar 2024-02-16 21:02:06 +00:00
rlaphoenix 9e0515609f HLS: Ignore possible folders when doing naive final merge 2024-02-16 18:41:05 +00:00
rlaphoenix 323577a5fd HLS: Update first segment of EXT-X-KEY state data on discontinuity 2024-02-16 18:21:21 +00:00
rlaphoenix e26e55caf3 HLS: Don't reset EXT-X-KEY state data on discontinuity 2024-02-16 16:50:12 +00:00
rlaphoenix 506ba0f615 HLS: Only merge relevant segments on discontinuity 2024-02-16 16:49:42 +00:00
rlaphoenix 2388c85894 HLS: Ensure all segments to decrypt in range exist 2024-02-16 16:49:13 +00:00
rlaphoenix 7587243aa2 HLS: Don't decrypt on key change if there were no prior segments 2024-02-16 16:48:38 +00:00
rlaphoenix 6a37fe9d1b HLS: Don't merge on discontinuity, if it's the first segment
How the m3u8 parser handles/groups #EXT-X tags into segment objects means the #EXT-X-DISCONTINUITY (`discontinuity` property) is tied to whatever segment is below its line. Therefore, there's never a scenario where we need to merge+decrypt at the very first segment of the for loop, as there are no segments before it.

This can happen with slightly off-spec playlists (can't blame it), but also when the OnSegmentFilter filters out all segments before the first EXT-X-DISCONTINUITY. This commonly happens when filtering out bumpers/intros.
2024-02-16 00:15:36 +00:00
rlaphoenix eac5ed5b61 Aria2c: Fix completed progress information
For some reason aria2c has something like 700 internal "download" structs per actual URL it was downloading, probably something to do with multiple connections/splitting; don't know, don't care, as this way works just fine.
2024-02-15 23:54:10 +00:00
rlaphoenix a8a89aab9c Aria2c: Fallback to an empty list if stopped_downloads is None
This was fine during most testing in the `for` loop below it, but there's also a len() call a bit below that.
2024-02-15 23:45:44 +00:00
rlaphoenix 837015b4ea HLS: Fix incorrect last segment i when decrypting first segment 2024-02-15 23:44:00 +00:00
rlaphoenix 1f11ed258b DASH: Update progress bar when merging segments 2024-02-15 20:06:42 +00:00
rlaphoenix 4e12b867f1 Aria2c: Improve download progress and error handling 2024-02-15 19:19:37 +00:00
rlaphoenix e8b07bf03a DASH: Don't set Range Header if no bytes range value
This caused an HTTP 501 Not Implemented response on some CDNs.
2024-02-15 19:10:52 +00:00
rlaphoenix 630a9906ce Rework the Aria2c Downloader
- Downloads are now multithreaded directly in the downloader.
- Now reuses connections instead of having to close and reopen connections for every single download.
- Progress updates are now yielded back to the caller instead of drilling down a progress callable.
- Instead of parsing download progress information in a very hacky way from the stdout stream, use aria2's RPC interface.
- Added a new utility, get_free_port, which is needed to choose aria2's RPC port, as I do not want to use the default port in case the user is already using it for another tool or reason. It also tries to mitigate port-scanning attacks that target aria2 RPC ports.
- The config entry `aria2c.max_concurrent_downloads` is now actually used by aria2c when downloading.
- The `--max-concurrent-downloads` option and config value now defaults to `min(32,(cpu_count+4))` (usually around 16 for above average systems) instead of 5.
- The automated pproxy proxy re-router is now run via subprocess instead of re-doing what the pproxy entry point does for us; less code, less trouble, and ultimately easier to implement.
2024-02-15 17:26:39 +00:00
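The standard trick for a `get_free_port` utility is binding to port 0 and letting the OS choose; a minimal sketch (devine's exact implementation is an assumption):

```python
import socket


def get_free_port() -> int:
    # Bind to port 0 so the OS assigns an unused port. Note the port may be
    # taken again by the time the caller actually uses it (an inherent race).
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]
```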
rlaphoenix 2b7fc929f6 Rework the HLS downloader, add support for new downloaders
- It now downloads all segment files multi-threaded first, before any decryption or merging operations (excluding init data, which is downloaded in sequence/order after all the segments are downloaded).
- Once all segments are downloaded, it then goes through and does any merging/decryption/init data work afterwards.
- Segments are no longer decrypted one by one. If segments use the same EXT-X-KEY data, then they will be merged together and then decrypted. This should see a noticeable speed increase for Widevine DRM.
2024-02-15 17:26:39 +00:00
rlaphoenix e5a330df7e Add support for the new Downloaders to direct URLs 2024-02-15 17:26:39 +00:00
rlaphoenix a1ed083b74 Add support for the new Downloaders to DASH 2024-02-15 17:26:39 +00:00
rlaphoenix 0e96d18af6 Rework the Requests and Curl-Impersonate Downloaders
- Downloads are now multithreaded directly in the downloader.
- Requests and Curl-Impersonate use one singular Session for all downloads, keeping connections alive and cached so it doesn't have to close and reopen connections for every single download.
- Progress updates are now yielded back to the caller instead of drilling down a progress callable.
2024-02-15 17:26:39 +00:00
rlaphoenix 709901176e Use CRC32 instead of MD5 for Track IDs in DASH/HLS 2024-02-15 10:56:51 +00:00
rlaphoenix bd185126b6 HLS: Skip merging continuity if all segments were skipped
If all segments of a continuity are skipped, e.g. by OnSegmentFilter, then this code fails as the folder wouldn't exist.
2024-02-13 17:03:42 +00:00
rlaphoenix cd194e3192 Add new Track Event, OnSegmentDownloaded
Like OnDownloaded but called every time a DASH or HLS segment is downloaded. The path to the downloaded segment file is passed to the callable.
2024-02-10 18:10:09 +00:00
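A hedged sketch of hooking the event; per the "No longer pass the track through track events" commit below, the callable receives only the segment path, and the attribute-style assignment is an assumption:

```python
from pathlib import Path


def on_segment(path: Path) -> None:
    # Called once per downloaded DASH/HLS segment with that segment's file path.
    print(f"segment saved: {path}")


# track.OnSegmentDownloaded = on_segment  # set on any Track instance
```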
rlaphoenix 87779f4e7d Move Track OnDownloaded event before decryption 2024-02-10 18:05:35 +00:00
rlaphoenix a98d1d98ac Add a new Subtitle Track Event, OnConverted
This runs after a Subtitle has been converted to another format, and only if it was converted. It is passed the new subtitle format codec value.
2024-02-10 18:05:35 +00:00
rlaphoenix c18fe5706b Pass DRM and Segment objects to Track OnDecrypted event 2024-02-10 17:48:26 +00:00
rlaphoenix 439e376b38 No longer pass the track through track events
If you are setting a callable onto a track event, then you have access to the track variable, so just include/use that in your lambda/callable.
2024-02-10 17:47:12 +00:00
rlaphoenix 7be24a130d Give some documentation on Track events 2024-02-10 17:19:48 +00:00
rlaphoenix 8bf6e4d87e Improve typing of Chapters constructor 2024-02-10 12:47:14 +00:00
rlaphoenix 92e00ed667 Fix OGM Chapter Regex patterns in Chapters class 2024-02-10 12:42:17 +00:00
rlaphoenix 66edf577f9 Allow Chapter Timestamp to be float, fix typing 2024-02-10 12:35:02 +00:00
rlaphoenix a544b1e867 Merge HLS segments first by discontinuity then via FFmpeg
HLS playlists where each segment is in an MP4 container seem to corrupt when the EXT-X-MAP is changed out, unless you first merge segments by discontinuity and then merge the merges via FFmpeg (which demuxes all the merged segment continuities and then concatenates them together, probably giving it new init data too).
2024-02-09 08:33:17 +00:00
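The final merge described here maps onto FFmpeg's concat demuxer; a hedged sketch of that step with illustrative file names:

```python
import subprocess
from pathlib import Path

discontinuities = [Path("disc_0.mp4"), Path("disc_1.mp4")]  # illustrative inputs

# The concat demuxer reads a file list and re-muxes without re-encoding (-c copy),
# regenerating container-level data that raw byte concatenation would corrupt.
concat_list = Path("concat.txt")
concat_list.write_text("\n".join(f"file '{p.resolve()}'" for p in discontinuities))

subprocess.run(
    ["ffmpeg", "-f", "concat", "-safe", "0", "-i", str(concat_list), "-c", "copy", "merged.mp4"],
    check=True,
)
```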
rlaphoenix 167b45475e Only decode text direction entities in Sub files
Previously, all entities were decoded in Subtitle files because of a problem with SubtitleEdit and its /ReverseRtlStartEnd option not being entity-aware.

It actually ends up reversing the `;` of `&rlm;`, instead of the actual value of `&rlm;`. Therefore, I decoded all entities before SubtitleEdit could process the Subtitle, but this caused problems with more advanced formats like TTML and WebVTT, as `&lt;` would decode to `<`, causing syntax errors, among other problematic characters.

According to the TTML and WebVTT specs, HTML entity encoding is allowed, and that makes sense, or you wouldn't be able to use `<` etc. Any failure by players to show the decoded character would be a player problem and out of scope for Devine.
2024-02-05 12:37:21 +00:00
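Decoding only the two text-direction entities amounts to a targeted replacement rather than a full html.unescape(); a hedged sketch:

```python
def decode_direction_entities(text: str) -> str:
    # Decode only the text-direction marks, leaving structural entities
    # like &lt; and &amp; (which TTML/WebVTT rely on) untouched.
    return (
        text
        .replace("&lrm;", "\u200e")  # LEFT-TO-RIGHT MARK
        .replace("&rlm;", "\u200f")  # RIGHT-TO-LEFT MARK
    )
```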
rlaphoenix 568cb616df Use /ConvertColorsToDialog when converting subs to SRT format
This is because SubtitleEdit keeps color-related information when converting to SRT from WebVTT, TTML, and such formats. Why? Not 100% sure. Maybe some players support colors, but generally if you are using SubRip it's because you either only want basic text subs, or your player doesn't support these "fancy" ooh-la-la colors.

This is a better solution than just stripping out the information. As the option name suggests, it isn't just removing the color information but rather using it to detect different speakers, then appropriately "dialogify" the captions when needed; i.e., start each speaker's sentence with `- ` and separate them with a new line.

The dash-style dialog formatting is quite vital for knowing whether a caption is spoken by one speaker or by multiple. It's not particularly necessary for non-SDH captioning, but is wanted for SDH subtitles.
2024-02-05 12:10:33 +00:00
rlaphoenix 3b62b50e25 Add support for SegmentBase and BaseURL-only DASH Manifests 2024-02-05 10:22:40 +00:00
rlaphoenix c06ea4cea8 Rework Chapter System, add `Chapters` class
Overall this commit is to just make working with Chapters a lot less manual and convoluted. The previous system had you specify information that can easily be automated, like Chapter order and numbers, which is one of the main changes in this commit.

Note: This is a Breaking change and requires updates to your Service code. The `get_chapters()` method must be updated. For more information see the updated doc-string for `Service.get_chapters()`.

- Added new Chapters class which automatically sorts Chapters by timestamp.
- The Chapter class has been significantly reworked to be much more generic. Most operations have been moved to the new Chapters class.
- Chapter objects can no longer specify a Chapter number. The number is now automatically set based on the Chapter's sorted order in the Chapters object.
- Chapter objects can now provide a timestamp in more formats. Timestamps are now verified more efficiently.
- A Chapter object's ID is now a CRC32 hash of its timestamp and name, instead of just basically its number.
- The Chapters object now also has an ID which is also a crc32 hash of all of the Chapter IDs it holds. This ID can be used for stuff like temp paths.
- `Service.get_chapters()` must now return a Chapters object. The Chapters object may be empty. The Chapters object must hold Chapter objects.
- Using `Chapter {N}` or `Act {N}` Chapters and so on is no longer permitted. You should instead leave the name blank if there's no descriptive name to use for it.
- If you or a user wants `Chapter {N}` names, then they can use the config option `chapter_fallback_name` set to `"Chapter {i:02}"`. See the config documentation for more info.
- Do not add a `00:00:00.000` Chapter, at all. This is automatically added for you if there's at least 1 Chapter with a timestamp after `00:00:00.000`.
2024-02-05 01:42:43 +00:00
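A hedged sketch of the reworked usage from a Service's perspective; the `Chapter`/`Chapters` constructor arguments, the import path, and the exact CRC32 input are assumptions based on the description above:

```python
import zlib

from devine.core.tracks import Chapter, Chapters  # assumed import path


def get_chapters(self) -> Chapters:
    # No numbers, no 00:00:00.000 entry, blank names where nothing descriptive exists;
    # ordering and numbering are handled by the Chapters object itself.
    return Chapters([
        Chapter("00:04:10.500", name="Intro"),
        Chapter(1800.25),  # timestamps in more formats, e.g. float seconds
    ])


# The described ID scheme, roughly: a CRC32 over the Chapter's representation.
chapter_id = zlib.crc32(repr(Chapter("00:04:10.500", name="Intro")).encode())
```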
rlaphoenix 2affb62ad0 Fix SegmentList source/media join with Base URL in DASH download_track() 2024-02-03 05:26:52 +00:00
rlaphoenix 30abe26321 Improve caching of keys to vaults log 2024-01-29 17:02:30 +00:00
rlaphoenix 3dbe0caa52 Fix Cookie update at the end of dl command 2024-01-29 16:28:40 +00:00
rlaphoenix 837061cf91 Rework Profile/Authentication System
- Removed `devine auth` command and sub-commands due to lack of support, risk of data, and general quirks of it.
- Removed the `profiles` config data; you must now specify which profile you wish to use each time with -p/--profile. If you use a specific profile a lot more than others, you should make it the default. See below.
- Added a `default` key to each service mapping in `credentials` that will be used if -p/--profile is not specified.
- Each service mapping in `credentials` is no longer forced to use profiles. You can now simply specify `Service: username:password` if you only use one credential.
- Auth-less Services now simply have to specify no credential and have no cookie file.
- There is no longer an error for not having a cookie and/or credential for the chosen profile, as a profile no longer has to be chosen.
- Cookies are now checked for in 3 different locations in the following order:
1. `/Cookies/{Service Name}.txt`
2. `/Cookies/Service Name/{profile}.txt`
3. `/Cookies/Service Name/default.txt`
This means you now have more options on organization and layout of Cookie files, similarly to the new Credentials config.
Note: `/Cookies/Service Name/.txt` also works as an alternative to `default.txt`. The benefit of this is `.txt` will always be at the top of your folder.
2024-01-29 06:34:22 +00:00
rlaphoenix 1c6e91b6f9 Rename --group to --tag 2024-01-29 03:54:17 +00:00
rlaphoenix e9dc53735c Fix BaseURLs starting with `../` in DASH download_track() 2024-01-29 03:26:15 +00:00
rlaphoenix e967c7c8d1 Add custom RESTful Vault API Interface 2024-01-24 20:09:59 +00:00
rlaphoenix c08c45fc16 Prioritize loading configs next to devine over other locations 2024-01-24 18:44:01 +00:00
rlaphoenix 3b788c221a Look for a config file in 2 more locations
This is to aid using Devine in a portable folder by trying to load configs next to Devine's code.
2024-01-24 18:41:24 +00:00
rlaphoenix 21687e6649 No longer create an empty config in the user configs folder 2024-01-24 18:39:36 +00:00
rlaphoenix de7122a179 Add basic control file to Requests and Curl-Impersonate downloaders 2024-01-23 10:06:42 +00:00
rlaphoenix c53330046c Improve Dependencies list in README 2024-01-23 09:57:04 +00:00
rlaphoenix 6450d4d447 Change default downloader from aria2c to requests
This is to reduce the amount of required dependencies by not strictly requiring aria2c out of the box. You can always change the downloader back to aria2c in the config.
2024-01-23 09:56:25 +00:00
rlaphoenix 5e858e1259 Delete file on failure in Requests and Curl-Impersonate downloaders 2024-01-23 09:46:24 +00:00
rlaphoenix ba93c78b99 Add missing while loop to Curl-Impersonate downloader 2024-01-23 09:45:31 +00:00
34 changed files with 4068 additions and 2574 deletions

.pre-commit-config.yaml

@@ -2,6 +2,11 @@
# See https://pre-commit.com/hooks.html for more hooks
repos:
  - repo: https://github.com/compilerla/conventional-pre-commit
    rev: v3.1.0
    hooks:
      - id: conventional-pre-commit
        stages: [commit-msg]
  - repo: https://github.com/mtkennerly/pre-commit-hooks
    rev: v0.3.0
    hooks:

CHANGELOG.md

@@ -2,8 +2,265 @@
All notable changes to this project will be documented in this file.
This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
Versions [3.0.0] and older use a format based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
but versions thereafter use a custom changelog format using [git-cliff](https://git-cliff.org).
## [3.1.0] - 2024-03-05
### Features
- *cli*: Implement MultipleChoice click param based on Choice param
- *dl*: Skip video lang filter if --v-lang unused & only 1 video lang
- *dl*: Change --vcodec default to None, use any codec
- *dl*: Support multiple -r/--range and mux ranges separately
- *Subtitle*: Convert from fTTML->TTML & fVTT->WebVTT post-download
- *Track*: Make ID optional, Automatically compute one if not provided
- *Track*: Add a name property to use for the Track Name
### Bug Fixes
- *dl*: Have --sub-format default to None to keep original sub format
- *HLS*: Use filtered out segment key info
- *Track*: Don't modify lang when getting name
- *Track*: Don't use fallback values "Zzzz"/"ZZ" for track name
- *version*: The `__version__` variable forgot to be updated
### Changes
- Move dl command's download_track() to Track.download()
- *dl*: Remove unused `get_profiles()` method
- *DASH*: Move data values from track url to track data property
- *DASH*: Change how Video FPS is gotten to remove FutureWarning log
- *Track*: Add type checks, improve typing
- *Track*: Remove swap() method and its uses
- *Track*: Remove unused DRM enum
- *Track*: Rename Descriptor's M3U & MPD to HLS & DASH
- *Track*: Remove unnecessary bool casting
- *Track*: Move the path class instance variable with the rest
- *Track*: Return new path on move(), raise exceptions on errors
- *Track*: Move delete and move methods near start of Class
- *Track*: Rename extra to data, enforce type as dict
### Builds
- Explicitly use marisa-trie==1.1.0 for Python 3.12 wheels
## [3.0.0] - 2024-03-01
### Added
- Support for Python 3.12.
- Audio track's Codec Enum now has [FLAC](https://en.wikipedia.org/wiki/FLAC) defined.
- The Downloader to use can now be set in the config under the [downloader key](CONFIG.md#downloader-str).
- New Multi-Threaded Downloader, `requests`, that makes HTTP(S) calls using [Python-requests](https://requests.readthedocs.io).
- New Multi-Threaded Downloader, `curl_impersonate`, that makes HTTP(S) calls using [Curl-Impersonate](https://github.com/yifeikong/curl-impersonate) via [Curl_CFFI](https://github.com/yifeikong/curl_cffi).
- HLS manifests specifying a Byte range value without starting offsets are now supported.
- HLS segments that use `EXT-X-DISCONTINUITY` are now supported.
- DASH manifests with SegmentBase or only BaseURL are now supported.
- Subtitle tracks from DASH manifests now automatically marked as SDH if `urn:tva:metadata:cs:AudioPurposeCS:2007 = 2`.
- The `--audio-only/--subs-only/--chapters-only` flags can now be used simultaneously. For example, `--subs-only`
with `--chapters-only` will get just Subtitles and Chapters.
- Added `--video-only` flag, which can also still be simultaneously used with the other "only" flags. Using all four
  of these flags will have the same effect as not using any of them.
- Added `--no-proxy` flag, disabling all uses of proxies, even if `--proxy` is set.
- Added `--sub-format` option, which sets the wanted output subtitle format, defaulting to SubRip (SRT).
- Added `Subtitle.reverse_rtl()` method to use SubtitleEdit's `/ReverseRtlStartEnd` functionality.
- Added `Subtitle.convert()` method to convert the loaded Subtitle to another format. Note that you cannot convert to
fTTML or fVTT, but you can convert from them. SubtitleEdit will be used in precedence over pycaption if available.
Converting to SubStationAlphav4 requires SubtitleEdit, but you may want to manually alter the Canvas resolution after
the download.
- Added support for SubRip (SRT) format subtitles in `Subtitle.parse()` via pycaption.
- Added `API` Vault Client aiming for a RESTful like API.
- Added `Chapters` Class to hold the new reworked `Chapter` objects, automatically handling stuff like order of the
Chapters, Chapter numbers, loading from a chapter file or string, and saving to a chapter file or string.
- Added new `chapter_fallback_name` config option allowing you to set a Chapter Name Template used when muxing Chapters
into an MKV Container with MKVMerge. Do note, it defaults to no Chapter Fallback Name at all, but MKVMerge will force
`Chapter {i:02}` at least for me on Windows with the program language set to English. You may want to instead use
`Chapter {j:02}` which will do `Chapter 01, Intro, Chapter 02` instead of `Chapter 01, Intro, Chapter 03` (an Intro
is not a Chapter of story, but it is the 2nd Chapter marker, so It's up to you how you want to interpret it).
- Added new `Track.OnSegmentDownloaded` Event, called any time one of the Track's segments were downloaded.
- Added new `Subtitle.OnConverted` Event, called any time that Subtitle is converted.
- Implemented `__add__` method to `Tracks` class, allowing you to add to the first Tracks object. For example, making
it handy to merge HLS video tracks with DASH tracks, `tracks = dash_tracks + hls_tracks.videos`, or for iterating:
`for track in dash.videos + hls.videos: ...`.
- Added new utility `get_free_port()` to get a free local port to use, though it may be taken by the time it's used.
### Changed
- Moved from my forked release of pymp4 (`rlaphoenix-pymp4`) back to the original `pymp4` release as it is
now up-to-date with some of my needed fixes.
- The DASH manifest is now stored in the Track `url` property to be reused by `DASH.download_track()`.
- Encrypted DASH streams are now downloaded in full and then decrypted, instead of downloading and decrypting
each individual segment. Unlike HLS, DASH cannot dynamically switch out the DRM/Protection information.
This brings both CPU and Disk IOPS improvements, as well as fixing rare weird decryption anomalies like broken
or odd timestamps, decryption failures, or broken a/v continuity.
- When a track is being decrypted, it now displays "Decrypting" and afterward "Decrypted" in place of the download
speed.
- When a track finishes downloading, it now displays "Downloaded" in place of the download speed.
- When licensing is needed and fails, the track will display "FAILED" in place of the download speed. The track
download will cancel and all other track downloads will be skipped/cancelled; downloading will end.
- The fancy smart quotes (`“` and `”`) are now stripped from filenames.
- All available services are now listed if you provide an invalid service tag/alias.
- If a WVD file fails to load and looks to be in the older unsupported v1 format, then instructions on migrating to
v2 will be displayed.
- If Shaka-Packager prints an error (i.e., `:ERROR:` log message) it will now raise a `subprocess.CalledProcessError`
exception, even if the process return code is 0.
- The Video classes' Primaries, Transfer, and Matrix classes had changes to their enum names to better represent their
values and uses. See the changed names in the [commit](https://github.com/devine-dl/devine/commit/c159672181ee3bd07b06612f256fa8590d61795c).
- SubRip (SRT) Subtitles no longer have the `MULTI-LANGUAGE SRT` header forcefully removed. The root cause of the error
was identified and fixed in this release.
- Since `Range.Transfer.SDR_BT_601_625 = 5` has been removed, `Range.from_cicp()` now internally remaps CICP transfer
values of `5` to `6` (which is now `Range.Transfer.BT_601 = 6`).
- Referer and User-Agent header values passed to the aria2(c) downloader are now set via the dedicated `--referer` and
  `--user-agent` options respectively, instead of `--header`.
- The aria2(c) `-j`, `-x`, and `-s` option values can now be set by the config under the `aria2c` key in the options'
full names.
- The aria2(c) `-x` and `-s` option values now use aria2(c)'s own default values instead of `16`. The `-j`
  option value defaults to ThreadPoolExecutor's algorithm of `min(32,(cpu_count+4))`.
- The download progress bar now states `LICENSING` on the speed text when licensing DRM, and `LICENSED` once finished.
- The download progress bar now states `CANCELLING`/`CANCELLED` on the speed text when cancelling downloads. This is to
make it more clear that it didn't just stop, but stopped as it was cancelled.
- The download cancel/skip events were moved to `constants.py` so they can be used across the codebase more easily
  without argument drilling. `DL_POOL_STOP` was renamed to `DOWNLOAD_CANCELLED` and `DL_POOL_SKIP` to `DOWNLOAD_LICENCE_ONLY`.
- The Cookie header is now calculated for each URL passed to the aria2(c) downloader based on the URL. Instead of
passing every single cookie, which could include two cookies with the same name aimed at different host names, we now
pass only cookies intended for the URL.
- The aria2(c) process no longer prints output to the terminal directly. Devine now only prints contents of the
  captured log messages to the terminal. This allows filtering out errors and warnings that aren't a problem.
- DASH and HLS no longer download segments silencing errors on all but the last retry as the downloader rework makes
this unnecessary. The errors will only be printed on the final retry regardless.
- `Track.repackage()` now saves as `{name}_repack.{ext}` instead of `{name}.repack.{ext}`.
- `Video.change_color_range()` now saves as `{name}_{limited|full}_range.{ext}` instead of `{name}.range{0|1}.{ext}`.
- `Widevine.decrypt()` now saves as `{name}_decrypted.{ext}` instead of `{name}.decrypted.{ext}`.
- Files starting with the save path's name and using the save path's extension, but not the save path, are no longer
deleted on download finish/stop/failure.
- The output container format is now explicitly specified as `MP4` when calling `shaka-packager`.
- The default downloader is now `requests` instead of `aria2c` to reduce required external dependencies.
- Reworked the `Chapter` class to only hold a timestamp and name value with an ID automatically generated as a CRC32 of
the Chapter representation.
- The `--group` option has been renamed to `--tag`.
- The config file is now read from three more locations in the following order:
1) The Devine Namespace Folder (e.g., `%appdata%/Python/Python311/site-packages/devine/devine.yaml`).
2) The Parent Folder to the Devine Namespace Folder (e.g., `%appdata%/Python/Python311/site-packages/devine.yaml`).
3) The AppDirs User Config Folder (e.g., `%localappdata%/devine/devine.yaml`).
Location 2 allows having a config at the root of a portable folder.
- An empty config file is no longer created when no config file is found.
- You can now set a default cookie file for a Service, [see README](README.md#cookies--credentials).
- You can now set a default credential for a Service, [see config](CONFIG.md#credentials-dictstr-strlistdict).
- Services are now auth-less by default and the error for not having at least a cookie or credential is removed.
Cookies/Credentials will only be loaded if a default one for the service is available, or if you use `-p/--profile`
and the profile exists.
- Subtitles when converting to SubRip (SRT) via SubtitleEdit will now use the `/ConvertColorsToDialog` option.
- HLS segments are now merged by discontinuity instead of all at once. The merged discontinuities are then finally
merged to one file using `ffmpeg`. Doing the final merge by byte concatenation did not work for some playlists.
- The Track is no longer passed through Event Callables. If you are able to set a function on an Event Callable, then
  you should have access to the track reference to call it directly if needed.
- The Track.OnDecrypted event callable is now passed the DRM and Segment objects used to Decrypt. The segment object is
only passed from HLS downloads.
- The Track.OnDownloaded event callable is now called BEFORE decryption, right after downloading, not after decryption.
- All generated Track ID values across the codebase have moved from MD5 to CRC32 values, as code processors complain
  about MD5's use surrounding security, and its length is too large for our use case anyway.
- HLS segments are now downloaded multi-threaded first and then processed in sequence thereafter.
- HLS segments are no longer decrypted one-by-one, requiring a lot of shaka-packager processes to run and close.
They are now merged and decrypted in groups based on their EXT-X-KEY, before being merged per discontinuity.
- The DASH and HLS downloaders now pass multiple URLs to the downloader instead of one-by-one, heavily increasing speed
and reliability as connections are kept alive and re-used.
- Downloaders now yield back progress information in the same convention used by `rich`'s `Progress.update()` method.
DASH and HLS now pass the yielded information to their progress callable instead of passing the progress callable to
the downloader.
- The aria2(c) downloader now uses the aria2(c) JSON-RPC interface to query for download progress updates instead of
parsing the stdout data in an extremely hacky way.
- The aria2(c) downloader now re-routes non-HTTP proxies via `pproxy` by a subprocess instead of the now-removed
`start_pproxy` utility. This way has proven to be easier, more reliable, and prevents pproxy from messing with rich's
terminal output in strange ways.
- All downloader functions have an altered but ultimately similar signature. `uri` became `urls`, `out` (path) was
  removed, and we now calculate the save path by passing an `output_dir` and `filename`. The `silent`, `segmented`,
  and `progress` parameters were completely removed.
- All downloader `urls` can now be a string or a dictionary containing extra URL-specific options to use, like
  URL-specific headers. It can also be a list of either type of URL, to download multi-threaded.
- All downloader `filenames` can be a static string, or a filename string template with a few variables to use. The
template system used is f-string, e.g., `"file_{i:03}{ext}"` (ext starts with `.` if there's an extension).
- DASH now updates the progress bar when merging segments.
- The `Widevine.decrypt()` method now also searches for shaka-packager as just `packager` as it is the default build
name. (#74)
### Removed
- The `devine auth` command and sub-commands due to lack of support, risk of data, and general quirks with it.
- Removed the `profiles` config; you must now specify which profile you wish to use each time with `-p/--profile`. If you
use a specific profile a lot more than others, you should make it the default.
- The `saldl` downloader has been removed as their binary distribution is whack and development has seemed to stall.
It was only used as an alternative to what was at the time the only downloader, aria2(c), as it did not support any
form of Byte Range, but `saldl` did, which was crucial for resuming extremely large downloads or complex playlists.
However, now we have the requests downloader which does support the Range header.
- The `Track.needs_proxy` property was removed for a few design architectural reasons.
1) Design-wise it isn't valid to have --proxy (or via config/otherwise) set a proxy, then unpredictably have it
bypassed or disabled. If I specify `--proxy 127.0.0.1:8080`, I would expect it to use that proxy for all
communication indefinitely, not switch in and out depending on the track or service.
2) With reason 1, it's also a security problem. The only reason I implemented it in the first place was so I could
download faster on my home connection. This means I would authenticate and call APIs under a proxy, then suddenly
download manifests and segments etc. under my home connection. A competent service could see that as an indicator
of bad play and flag you.
3) Maintaining this setup across the codebase is extremely annoying, especially because of how proxies are set up/used
by Requests in the Session. There's no way to tell a requests Session to temporarily disable the proxy and turn it
back on later without getting the proxy from the session (in an annoying way), storing it, removing it, making the
calls, and then, assuming you're still in the same function, adding it back. If you're not in the same function,
well, time for some spaghetti code.
- The `Range.Transfer.SDR_BT_601_625 = 5` key and value has been removed as I cannot find any official source to verify
it as the correct use. However, usually a `transfer` value of `5` would be PAL SD material so it better matches `6`,
which is (now named) `Range.Transfer.BT_601 = 6`. If you have something specifying transfer=5, just remap it to 6.
- The warning log `There's no ... Audio Tracks, likely part of an invariant playlist, continuing...` message has been
removed. So long as your playlist is expecting no audio tracks, or the audio is part of the video transport, then
this wouldn't be a problem whatsoever. Therefore, having it log this annoying warning all the time is pointless.
- The `--min-split-size` argument to the aria2(c) downloader as it was only used to disable splitting on
segmented downloads, but the newer downloader system wouldn't really need or want this to be done. If aria2 has
decided based on its other settings to have split a segment file, then it likely would benefit from doing so.
- The `--remote-time` argument from the aria2(c) downloader as it may need to do a GET and a HEAD request to
get the remote time information, slowing the download down. We don't need this information anyway as it will likely
be repacked with `ffmpeg` or multiplexed with `mkvmerge`, discarding/losing that information.
- DASH and HLS's 5-attempt retry loop as the downloaders will retry for us.
- The `start_pproxy` utility has been removed as all uses of it now call `pproxy` via subprocess instead.
- The `LANGUAGE_MUX_MAP` constant and its usage have been removed as it is no longer necessary as of MKVToolNix v54.
### Fixed
- Uses of `__ALL__` with Class objects have been corrected to `__all__` with string objects, following PEP8.
- Fixed value of URL passed to `Track.get_key_id()` as it was a tuple rather than the URL string.
- The `--skip-dl` flag now works again after breaking in v[1.3.0].
- Move WVD file to correct location on new installations in the `wvd add` command.
- Cookie data is now passed to downloaders and applied per-URL based on the URI it will be used for, just like a browser.
- Failure to get FPS in DASH when SegmentBase isn't used.
- An error message is now returned if a WVD file fails to load instead of raising an exception.
- Track language information within M3U playlists is now validated with langcodes before use. Some manifests use the
property for arbitrary data that their apps/players use for their own purposes.
- Attempt to fix non-UTF-8 and mixed-encoding Subtitle downloads by automatically converting to UTF-8. (#43)
Decoding is attempted in the following order: UTF-8, CP-1252, then finally chardet detection. If it's neither UTF-8
nor CP-1252 and chardet could not detect the encoding, then it is left as-is. Conversion is done per-segment if the
Subtitle is segmented, unless it's the fVTT or fTTML formats which are binary.
- Chapter Character Encoding is now explicitly set to UTF-8 when muxing to an MKV container as Windows seems to default
to latin1 or something, breaking Chapter names with any sort of special character within.
- Subtitles passed through SubtitleEdit now explicitly use UTF-8 character encoding, as it usually defaulted to UTF-8
  with Byte Order Marks (aka UTF-8-SIG/UTF-8-BOM).
- Subtitles passed through SubtitleEdit now use the same output format as the subtitle being processed instead of SRT.
- Fixed rare infinite loop when the Server hosting the init/header data/segment file responds with a `Content-Length`
header with a value of `0` or smaller.
- Removed empty caption lists/languages when parsing Subtitles with `Subtitle.parse()`. This stopped conversions to SRT
containing the `MULTI-LANGUAGE SRT` header when there were multiple caption lists, even though only one of them
actually contained captions.
- Text-based Subtitle formats now try to automatically convert to UTF-8 when run through `Subtitle.parse()`.
- Text-based Subtitle formats now have `&lrm;` and `&rlm;` HTML entities unescaped post-download, as some rendering
  libraries seem to not decode them for us. SubtitleEdit also has problems with `/ReverseRtlStartEnd` unless it's
already decoded.
- Fixed two concatenation errors surrounding DASH's BaseURL, sourceURL, and media values that start with or use `../`.
- Fixed the number values in the `Newly added to x/y Vaults` log, which now states `Cached n Key(s) to x/y Vaults`.
- File write handler now flushes after appending a new segment to the final save path or checkpoint file, reducing
memory usage by quite a bit in some scenarios.
### New Contributors
- [Shivelight](https://github.com/Shivelight)
## [2.2.0] - 2023-04-23

@@ -428,6 +685,8 @@ This release brings a huge change to the fundamentals of Devine's logging, UI, a

Initial public release under the name Devine.

[3.1.0]: https://github.com/devine-dl/devine/releases/tag/v3.1.0
[3.0.0]: https://github.com/devine-dl/devine/releases/tag/v3.0.0
[2.2.0]: https://github.com/devine-dl/devine/releases/tag/v2.2.0
[2.1.0]: https://github.com/devine-dl/devine/releases/tag/v2.1.0
[2.0.1]: https://github.com/devine-dl/devine/releases/tag/v2.0.1

CONFIG.md

@@ -11,13 +11,12 @@ which does not keep comments.

## aria2c (dict)

- `max_concurrent_downloads`
  Maximum number of parallel downloads. Default: `min(32,(cpu_count+4))`
  Note: Overrides the `max_workers` parameter of the aria2(c) downloader function.
- `max_connection_per_server`
  Maximum number of connections to one server for each download. Default: `1`
- `split`
  Split a file into N chunks and download each chunk on its own connection. Default: `5`
- `file_allocation`
  Specify file allocation method. Default: `"prealloc"`
@@ -67,25 +66,45 @@ DSNP:
  default: chromecdm_903_l3
```

## chapter_fallback_name (str)

The Chapter Name to use when exporting a Chapter without a Name.
The default is no fallback name at all and no Chapter name will be set.

The fallback name can use the following variables in f-string style:

- `{i}`: The Chapter number starting at 1.
  E.g., `"Chapter {i}"`: "Chapter 1", "Intro", "Chapter 3".
- `{j}`: A number starting at 1 that increments any time a Chapter has no title.
  E.g., `"Chapter {j}"`: "Chapter 1", "Intro", "Chapter 2".

These are formatted with f-strings, directives are supported.
For example, `"Chapter {i:02}"` will result in `"Chapter 01"`.

## credentials (dict[str, str|list|dict])

Specify login credentials to use for each Service, and optionally per-profile.

For example,

```yaml
ALL4: jane@gmail.com:LoremIpsum100 # directly
AMZN: # or per-profile, optionally with a default
  default: jane@example.tld:LoremIpsum99 # <-- used by default if -p/--profile is not used
  james: james@gmail.com:TheFriend97
  john: john@example.tld:LoremIpsum98
NF: # the `default` key is not necessary, but no credential will be used by default
  john: john@gmail.com:TheGuyWhoPaysForTheNetflix69420
```

The value should be in string form, i.e. `john@gmail.com:password123` or `john:password123`.
Any arbitrary values can be used on the left (username/password/phone) and right (password/secret).
You can also specify these in list form, i.e., `["john@gmail.com", ":PasswordWithAColon"]`.

If you specify multiple credentials with keys like the `AMZN` and `NF` example above, then you should
use a `default` key or no credential will be loaded automatically unless you use `-p/--profile`. You
do not have to use a `default` key at all.

Please be aware that this information is sensitive and to keep it safe. Do not share your config.
## curl_impersonate (dict)
@@ -141,7 +160,7 @@ AMZN:
  bitrate: CVBR
```

or to change the output subtitle format from the default (original format) to WebVTT,

```yaml
sub_format: vtt
```

@@ -153,8 +172,8 @@ Choose what software to use to download data throughout Devine where needed.

Options:

- `requests` (default) - https://github.com/psf/requests
- `aria2c` - https://github.com/aria2/aria2
- `curl_impersonate` - https://github.com/yifeikong/curl-impersonate (via https://github.com/yifeikong/curl_cffi)

Note that aria2c can reach the highest speeds as it utilizes threading and more connections than the other
@@ -188,12 +207,28 @@ provide the same Key ID and CEK for both Video and Audio, as well as for multipl

You can have as many Key Vaults as you would like. It's nice to share Key Vaults or use a unified Vault on
Teams as sharing CEKs immediately can help reduce License calls drastically.

Three types of Vaults are in the Core codebase, API, SQLite and MySQL. API makes HTTP requests to a RESTful API,
whereas SQLite and MySQL directly connect to an SQLite or MySQL Database.

Note: SQLite and MySQL vaults have to connect directly to the Host/IP. It cannot be in front of a PHP API or such.
Beware that some Hosting Providers do not let you access the MySQL server outside their intranet and may not be
accessible outside their hosting platform.
### Using an API Vault
API vaults use a specific HTTP request format, therefore API or HTTP Key Vault APIs from other projects or services may
not work in Devine. The API format can be seen in the [API Vault Code](devine/vaults/API.py).
```yaml
- type: API
name: "John#0001's Vault" # arbitrary vault name
uri: "https://key-vault.example.com" # api base uri (can also be an IP or IP:Port)
# uri: "127.0.0.1:80/key-vault"
# uri: "https://api.example.com/key-vault"
token: "random secret key" # authorization token
```
### Using a MySQL Vault
MySQL vaults can be either MySQL or MariaDB servers. I recommend MariaDB.
A MySQL Vault can be on a local or remote network, but I recommend SQLite for local Vaults.
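A MySQL vault entry could be sketched roughly as follows; note that beyond `type` and `name`, the field names here (host, database, and so on) are assumptions and should be verified against the MySQL Vault code:

```yaml
- type: MySQL
  name: "Team Vault"         # arbitrary vault name
  host: "vault.example.com"  # assumed field name
  database: "key_vault"      # assumed field name
  username: "devine"         # assumed field name
  password: "secret"         # assumed field name
```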
@ -219,7 +254,7 @@ make tables yourself.
- You may give trusted users CREATE permission so devine can create tables if needed.
- Other users should only be given SELECT and INSERT permissions.

-### Connecting to an SQLite Vault
+### Using an SQLite Vault

SQLite Vaults are usually only used for locally stored vaults. This vault may be stored on a mounted Cloud storage
drive, but I recommend using SQLite exclusively as an offline-only vault. Effectively this is your backup vault in
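An SQLite vault entry might be sketched like this; the `path` field name is an assumption to verify against the SQLite Vault code:

```yaml
- type: SQLite
  name: "Local Backup Vault"  # arbitrary vault name
  path: "C:/Users/John/devine/key_vault.db"  # assumed field name
```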
@ -244,34 +279,6 @@ together.
- `set_title`
  Set the container title to `Show SXXEXX Episode Name` or `Movie (Year)`. Default: `true`
-## profiles (dict)
-
-Pre-define Profiles to use Per-Service.
-
-For example,
-
-```yaml
-AMZN: jane
-DSNP: john
-```
-
-You can also specify a fallback value to pre-define if a match was not made.
-This can be done using the `default` key, which can help reduce redundancy in your specifications.
-
-```yaml
-AMZN: jane
-DSNP: john
-default: james
-```
-
-If a Service doesn't require a profile (as it does not require Credentials or Authorization of any kind), you can
-disable the profile checks by specifying `false` as the profile for the Service.
-
-```yaml
-ALL4: false
-CTV: false
-```
## proxy_providers (dict)

Enable external proxy provider services.
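For instance, a sketch using the `basic` and `nordvpn` keys that the proxy-loading code looks up; the inner fields are assumptions to verify against each provider's code:

```yaml
basic:
  us: "http://user:pass@proxy.example.com:8080"  # assumed: country -> proxy URI
nordvpn:
  username: "service-username"  # assumed field name
  password: "service-password"  # assumed field name
```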
README.md
@ -2,7 +2,7 @@
<img src="https://user-images.githubusercontent.com/17136956/216880837-478f3ec7-6af6-4cca-8eef-5c98ff02104c.png">
<a href="https://github.com/devine-dl/devine">Devine</a>
<br/>
-<sup><em>Open-Source Movie, TV, and Music Downloading Solution</em></sup>
+<sup><em>Modular Movie, TV, and Music Archival Software</em></sup>
<br/>
<a href="https://discord.gg/34K2MGDrBN">
<img src="https://img.shields.io/discord/841055398240059422?label=&logo=discord&logoColor=ffffff&color=7289DA&labelColor=7289DA" alt="Discord">
@ -59,19 +59,23 @@ A command-line interface is now available, try `devine --help`.
### Dependencies

-The following is a list of programs that need to be installed manually. I recommend installing these with [winget],
-[chocolatey] or such where possible as it automatically adds them to your `PATH` environment variable and will be
-easier to update in the future.
+The following is a list of programs that need to be installed by you manually.

-- [aria2(c)] for downloading streams and large manifests.
- [CCExtractor] for extracting Closed Caption data like EIA-608 from video streams and converting as SRT.
- [FFmpeg] (and ffprobe) for repacking/remuxing streams on specific services, and evaluating stream data.
- [MKVToolNix] v54+ for muxing individual streams to an `.mkv` file.
- [shaka-packager] for decrypting CENC-CTR and CENC-CBCS video and audio streams.
+- (optional) [aria2(c)] to use as a [downloader](CONFIG.md#downloader-str).

-For portable downloads, make sure you put them in your current working directory, in the installation directory,
-or put the directory path in your `PATH` environment variable. If you do not do this then their binaries will not be
-able to be found.
+> [!TIP]
+> You should install these from a Package Repository if you can, including winget/chocolatey on Windows. They will
+> automatically add the binary's path to your `PATH` environment variable and will be easier to update in the future.
+
+> [!IMPORTANT]
+> Most of these dependencies are portable utilities and therefore do not use installers. If you do not install them
+> from a package repository like winget/choco/pacman, then make sure you either put them in your current working
+> directory, in Devine's installation directory, or add the binary's path to your `PATH` environment variable. If you
+> do not do this then Devine will not be able to find the binaries.

[winget]: <https://winget.run>
[chocolatey]: <https://chocolatey.org>
@ -248,22 +252,33 @@ sure that the version of devine you have locally is supported by the Service cod
> automatically download. Python importing the files triggers the download to begin. However, it may cause a delay on
> startup.
## Profiles (Cookies & Credentials) ## Cookies & Credentials
Just like a streaming service, devine associates both a cookie and/or credential as a Profile. You can associate up to Devine can authenticate with Services using Cookies and/or Credentials. Credentials are stored in the config, and
one cookie and one credential per-profile, depending on which (or both) are needed by the Service. This system allows Cookies are stored in the data directory which can be found by running `devine env info`.
you to configure multiple accounts per-service and choose which to use at any time.
Credentials are stored in the config, and Cookies are stored in the data directory. You can find the location of these To add a Credential to a Service, take a look at the [Credentials Config](CONFIG.md#credentials-dictstr-strlistdict)
by running `devine env info`. However, you can manage profiles with `devine auth --help`. E.g. to add a new John for information on setting up one or more credentials per-service. You can add one or more Credential per-service and
profile to Netflix with a Cookie and Credential, take a look at the following CLI call, use `-p/--profile` to choose which Credential to use.
`devine auth add John NF --cookie "C:\Users\John\Downloads\netflix.com.txt --credential "john@gmail.com:pass123"`
You can also delete a credential with `devine auth delete`. E.g., to delete the cookie for John that we just added, run To add a Cookie to a Service, use a Cookie file extension to make a `cookies.txt` file and move it into the Cookies
`devine auth delete John --cookie`. Take a look at `devine auth delete --help` for more information. directory. You must rename the `cookies.txt` file to that of the Service tag (case-sensitive), e.g., `NF.txt`. You can
also place it in a Service Cookie folder, e.g., `/Cookies/NF/default.txt` or `/Cookies/NF/.txt`.
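For reference, the cookie locations described above would sit on disk like this (using Netflix's `NF` tag as the example):

```
/Cookies/NF.txt          # a direct Service Cookie file
/Cookies/NF/default.txt  # loaded by default when -p/--profile is not used
/Cookies/NF/sam.txt      # loaded with -p/--profile sam
```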
+You can add multiple Cookies to the `/Cookies/NF/` folder, each with their own unique name, and then use `-p/--profile`
+to choose which one to use. E.g., `/Cookies/NF/sam.txt` and then use it with `--profile sam`. If you make a Service
+Cookie folder without a `.txt` or `default.txt`, but with another file, then no Cookies will be loaded unless you use
+`-p/--profile` as shown. This allows you to opt in to authentication at whim.

-> __Note__ Profile names are case-sensitive and unique per-service. They also have no arbitrary character or length
-> limit, but for convenience I don't recommend using any special characters as your terminal may get confused.
+> [!TIP]
+> - If your Service does not require Authentication, then do not define any Credential or Cookie for that Service.
+> - You can use both Cookies and Credentials at the same time, so long as your Service takes and uses both.
+> - If you are using profiles, then make sure you use the same name for the Credential and the Cookie file when
+>   using `-p/--profile`.
+
+> [!WARNING]
+> Profile names are case-sensitive and unique per-service. They have no arbitrary character or length limit, but for
+> convenience's sake I don't recommend using any special characters, as your terminal may get confused.
### Cookie file format and Extensions
@ -334,4 +349,4 @@ You can find a copy of the license in the LICENSE file in the root folder.
* * *

-© rlaphoenix 2019-2023
+© rlaphoenix 2019-2024
cliff.toml Normal file
@ -0,0 +1,71 @@
# git-cliff ~ default configuration file
# https://git-cliff.org/docs/configuration
[changelog]
header = """
# Changelog\n
All notable changes to this project will be documented in this file.
This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
Versions [3.0.0] and older use a format based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
but versions thereafter use a custom changelog format using [git-cliff](https://git-cliff.org).\n
"""
body = """
{% if version -%}
## [{{ version | trim_start_matches(pat="v") }}] - {{ timestamp | date(format="%Y-%m-%d") }}
{% else -%}
## [Unreleased]
{% endif -%}
{% for group, commits in commits | group_by(attribute="group") %}
### {{ group | striptags | trim | upper_first }}
{% for commit in commits %}
- {% if commit.scope %}*{{ commit.scope }}*: {% endif %}\
{% if commit.breaking %}[**breaking**] {% endif %}\
{{ commit.message | upper_first }}\
{% endfor %}
{% endfor %}\n
"""
footer = """
{% for release in releases -%}
{% if release.version -%}
{% if release.previous.version -%}
[{{ release.version | trim_start_matches(pat="v") }}]: \
https://github.com/{{ remote.github.owner }}/{{ remote.github.repo }}\
/compare/{{ release.previous.version }}..{{ release.version }}
{% endif -%}
{% else -%}
[unreleased]: https://github.com/{{ remote.github.owner }}/{{ remote.github.repo }}\
/compare/{{ release.previous.version }}..HEAD
{% endif -%}
{% endfor %}
"""
trim = true
postprocessors = [
# { pattern = '<REPO>', replace = "https://github.com/orhun/git-cliff" }, # replace repository URL
]
[git]
conventional_commits = true
filter_unconventional = true
split_commits = false
commit_preprocessors = []
commit_parsers = [
{ message = "^feat", group = "<!-- 0 -->Features" },
{ message = "^fix|revert", group = "<!-- 1 -->Bug Fixes" },
{ message = "^docs", group = "<!-- 2 -->Documentation" },
{ message = "^style", skip = true },
{ message = "^refactor", group = "<!-- 3 -->Changes" },
{ message = "^perf", group = "<!-- 4 -->Performance Improvements" },
{ message = "^test", skip = true },
{ message = "^build", group = "<!-- 5 -->Builds" },
{ message = "^ci", skip = true },
{ message = "^chore", skip = true },
]
protect_breaking_commits = false
filter_commits = false
# tag_pattern = "v[0-9].*"
# skip_tags = ""
# ignore_tags = ""
topo_order = false
sort_commits = "oldest"
devine/commands/auth.py
@ -1,266 +0,0 @@
import logging
import shutil
import sys
import tkinter.filedialog
from collections import defaultdict
from pathlib import Path
from typing import Optional
import click
from ruamel.yaml import YAML
from devine.core.config import Config, config
from devine.core.constants import context_settings
from devine.core.credential import Credential
@click.group(
short_help="Manage cookies and credentials for profiles of services.",
context_settings=context_settings)
@click.pass_context
def auth(ctx: click.Context) -> None:
"""Manage cookies and credentials for profiles of services."""
ctx.obj = logging.getLogger("auth")
@auth.command(
name="list",
short_help="List profiles and their state for a service or all services.",
context_settings=context_settings)
@click.argument("service", type=str, required=False)
@click.pass_context
def list_(ctx: click.Context, service: Optional[str] = None) -> None:
"""
List profiles and their state for a service or all services.
\b
Profile and Service names are case-insensitive.
"""
log = ctx.obj
service_f = service
auth_data: dict[str, dict[str, list]] = defaultdict(lambda: defaultdict(list))
if config.directories.cookies.exists():
for cookie_dir in config.directories.cookies.iterdir():
service = cookie_dir.name
for cookie in cookie_dir.glob("*.txt"):
if cookie.stem not in auth_data[service]:
auth_data[service][cookie.stem].append("Cookie")
for service, credentials in config.credentials.items():
for profile in credentials:
auth_data[service][profile].append("Credential")
for service, profiles in dict(sorted(auth_data.items())).items(): # type:ignore
if service_f and service != service_f.upper():
continue
log.info(service)
for profile, authorizations in dict(sorted(profiles.items())).items():
log.info(f' "{profile}": {", ".join(authorizations)}')
@auth.command(
short_help="View profile cookies and credentials for a service.",
context_settings=context_settings)
@click.argument("profile", type=str)
@click.argument("service", type=str)
@click.pass_context
def view(ctx: click.Context, profile: str, service: str) -> None:
"""
View profile cookies and credentials for a service.
\b
Profile and Service names are case-sensitive.
"""
log = ctx.obj
service_f = service
profile_f = profile
found = False
for cookie_dir in config.directories.cookies.iterdir():
if cookie_dir.name == service_f:
for cookie in cookie_dir.glob("*.txt"):
if cookie.stem == profile_f:
log.info(f"Cookie: {cookie}")
log.debug(cookie.read_text(encoding="utf8").strip())
found = True
break
for service, credentials in config.credentials.items():
if service == service_f:
for profile, credential in credentials.items():
if profile == profile_f:
log.info(f"Credential: {':'.join(list(credential))}")
found = True
break
if not found:
raise click.ClickException(
f"Could not find Profile '{profile_f}' for Service '{service_f}'."
f"\nThe profile and service values are case-sensitive."
)
@auth.command(
short_help="Check what profile is used by services.",
context_settings=context_settings)
@click.argument("service", type=str, required=False)
@click.pass_context
def status(ctx: click.Context, service: Optional[str] = None) -> None:
"""
Check what profile is used by services.
\b
Service names are case-sensitive.
"""
log = ctx.obj
found_profile = False
for service_, profile in config.profiles.items():
if not service or service_.upper() == service.upper():
log.info(f"{service_}: {profile or '--'}")
found_profile = True
if not found_profile:
log.info(f"No profile has been explicitly set for {service}")
default = config.profiles.get("default", "not set")
log.info(f"The default profile is {default}")
@auth.command(
short_help="Delete a profile and all of its authorization from a service.",
context_settings=context_settings)
@click.argument("profile", type=str)
@click.argument("service", type=str)
@click.option("--cookie", is_flag=True, default=False, help="Only delete the cookie.")
@click.option("--credential", is_flag=True, default=False, help="Only delete the credential.")
@click.pass_context
def delete(ctx: click.Context, profile: str, service: str, cookie: bool, credential: bool):
"""
Delete a profile and all of its authorization from a service.
\b
By default this does remove both Cookies and Credentials.
You may remove only one of them with --cookie or --credential.
\b
Profile and Service names are case-sensitive.
Comments may be removed from config!
"""
log = ctx.obj
service_f = service
profile_f = profile
found = False
if not credential:
for cookie_dir in config.directories.cookies.iterdir():
if cookie_dir.name == service_f:
for cookie_ in cookie_dir.glob("*.txt"):
if cookie_.stem == profile_f:
cookie_.unlink()
log.info(f"Deleted Cookie: {cookie_}")
found = True
break
if not cookie:
for key, credentials in config.credentials.items():
if key == service_f:
for profile, credential_ in credentials.items():
if profile == profile_f:
config_path = Config._Directories.user_configs / Config._Filenames.root_config
yaml, data = YAML(), None
yaml.default_flow_style = False
data = yaml.load(config_path)
del data["credentials"][key][profile_f]
yaml.dump(data, config_path)
log.info(f"Deleted Credential: {credential_}")
found = True
break
if not found:
raise click.ClickException(
f"Could not find Profile '{profile_f}' for Service '{service_f}'."
f"\nThe profile and service values are case-sensitive."
)
@auth.command(
short_help="Add a Credential and/or Cookies to an existing or new profile for a service.",
context_settings=context_settings)
@click.argument("profile", type=str)
@click.argument("service", type=str)
@click.option("--cookie", type=str, default=None, help="Direct path to Cookies to add.")
@click.option("--credential", type=str, default=None, help="Direct Credential string to add.")
@click.pass_context
def add(ctx: click.Context, profile: str, service: str, cookie: Optional[str] = None, credential: Optional[str] = None):
"""
Add a Credential and/or Cookies to an existing or new profile for a service.
\b
Cancel the Open File dialogue when presented if you do not wish to provide
cookies. The Credential should be in `Username:Password` form. The username
may be an email. If you do not wish to add a Credential, just hit enter.
\b
Profile and Service names are case-sensitive!
Comments may be removed from config!
"""
log = ctx.obj
service = service.upper()
profile = profile.lower()
if cookie:
cookie = Path(cookie)
if not cookie.is_file():
log.error(f"No such file or directory: {cookie}.")
sys.exit(1)
else:
print("Opening File Dialogue, select a Cookie file to import.")
cookie = tkinter.filedialog.askopenfilename(
title="Select a Cookie file (Cancel to skip)",
filetypes=[("Cookies", "*.txt"), ("All files", "*.*")]
)
if cookie:
cookie = Path(cookie)
else:
log.info("Skipped adding a Cookie...")
if credential:
try:
credential = Credential.loads(credential)
except ValueError as e:
raise click.ClickException(str(e))
else:
credential = input("Credential: ")
if credential:
try:
credential = Credential.loads(credential)
except ValueError as e:
raise click.ClickException(str(e))
else:
log.info("Skipped adding a Credential...")
if cookie:
final_path = (config.directories.cookies / service / profile).with_suffix(".txt")
final_path.parent.mkdir(parents=True, exist_ok=True)
if final_path.exists():
log.error(f"A Cookie file for the Profile {profile} on {service} already exists.")
sys.exit(1)
shutil.move(cookie, final_path)
log.info(f"Moved Cookie file to: {final_path}")
if credential:
config_path = Config._Directories.user_configs / Config._Filenames.root_config
yaml, data = YAML(), None
yaml.default_flow_style = False
data = yaml.load(config_path)
if not data:
data = {}
if "credentials" not in data:
data["credentials"] = {}
if service not in data["credentials"]:
data["credentials"][service] = {}
data["credentials"][service][profile] = credential.dumps()
yaml.dump(data, config_path)
log.info(f"Added Credential: {credential}")
devine/commands/dl.py
@ -13,8 +13,8 @@ from concurrent import futures
from concurrent.futures import ThreadPoolExecutor
from copy import deepcopy
from functools import partial
-from http.cookiejar import MozillaCookieJar
-from itertools import zip_longest
+from http.cookiejar import CookieJar, MozillaCookieJar
+from itertools import product
from pathlib import Path
from threading import Lock
from typing import Any, Callable, Optional
@ -32,7 +32,7 @@ from rich.console import Group
from rich.live import Live
from rich.padding import Padding
from rich.panel import Panel
-from rich.progress import BarColumn, Progress, SpinnerColumn, TextColumn, TimeRemainingColumn
+from rich.progress import BarColumn, Progress, SpinnerColumn, TaskID, TextColumn, TimeRemainingColumn
from rich.rule import Rule
from rich.table import Table
from rich.text import Text
@ -40,26 +40,24 @@ from rich.tree import Tree
from devine.core.config import config
from devine.core.console import console
-from devine.core.constants import DOWNLOAD_CANCELLED, AnyTrack, context_settings
+from devine.core.constants import DOWNLOAD_LICENCE_ONLY, AnyTrack, context_settings
from devine.core.credential import Credential
-from devine.core.downloaders import downloader
from devine.core.drm import DRM_T, Widevine
-from devine.core.manifests import DASH, HLS
from devine.core.proxies import Basic, Hola, NordVPN
from devine.core.service import Service
from devine.core.services import Services
from devine.core.titles import Movie, Song, Title_T
from devine.core.titles.episode import Episode
from devine.core.tracks import Audio, Subtitle, Tracks, Video
-from devine.core.utilities import get_binary_path, is_close_match, time_elapsed_since, try_ensure_utf8
+from devine.core.utilities import get_binary_path, is_close_match, time_elapsed_since
-from devine.core.utils.click_types import LANGUAGE_RANGE, QUALITY_LIST, SEASON_RANGE, ContextData
+from devine.core.utils.click_types import LANGUAGE_RANGE, QUALITY_LIST, SEASON_RANGE, ContextData, MultipleChoice
from devine.core.utils.collections import merge_dict
from devine.core.utils.subprocess import ffprobe
from devine.core.vaults import Vaults
class dl:
-    @click.group(
+    @click.command(
        short_help="Download, Decrypt, and Mux tracks for titles from a Service.",
        cls=Services,
        context_settings=dict(
@ -68,12 +66,12 @@ class dl:
            **context_settings,
            token_normalize_func=Services.get_tag
        ))
    @click.option("-p", "--profile", type=str, default=None,
-                  help="Profile to use for Credentials and Cookies (if available). Overrides profile set by config.")
+                  help="Profile to use for Credentials and Cookies (if available).")
    @click.option("-q", "--quality", type=QUALITY_LIST, default=[],
                  help="Download Resolution(s), defaults to the best available resolution.")
    @click.option("-v", "--vcodec", type=click.Choice(Video.Codec, case_sensitive=False),
-                  default=Video.Codec.AVC,
-                  help="Video Codec to download, defaults to H.264.")
+                  default=None,
+                  help="Video Codec to download, defaults to any codec.")
    @click.option("-a", "--acodec", type=click.Choice(Audio.Codec, case_sensitive=False),
                  default=None,
                  help="Audio Codec to download, defaults to any codec.")
@ -83,9 +81,9 @@ class dl:
@click.option("-ab", "--abitrate", type=int, @click.option("-ab", "--abitrate", type=int,
default=None, default=None,
help="Audio Bitrate to download (in kbps), defaults to highest available.") help="Audio Bitrate to download (in kbps), defaults to highest available.")
@click.option("-r", "--range", "range_", type=click.Choice(Video.Range, case_sensitive=False), @click.option("-r", "--range", "range_", type=MultipleChoice(Video.Range, case_sensitive=False),
default=Video.Range.SDR, default=[Video.Range.SDR],
help="Video Color Range, defaults to SDR.") help="Video Color Range(s) to download, defaults to SDR.")
@click.option("-c", "--channels", type=float, @click.option("-c", "--channels", type=float,
default=None, default=None,
help="Audio Channel(s) to download. Matches sub-channel layouts like 5.1 with 6.0 implicitly.") help="Audio Channel(s) to download. Matches sub-channel layouts like 5.1 with 6.0 implicitly.")
@ -99,10 +97,10 @@ class dl:
help="Language wanted for Subtitles.") help="Language wanted for Subtitles.")
@click.option("--proxy", type=str, default=None, @click.option("--proxy", type=str, default=None,
help="Proxy URI to use. If a 2-letter country is provided, it will try get a proxy from the config.") help="Proxy URI to use. If a 2-letter country is provided, it will try get a proxy from the config.")
@click.option("--group", type=str, default=None, @click.option("--tag", type=str, default=None,
help="Set the Group Tag to be used, overriding the one in config if any.") help="Set the Group Tag to be used, overriding the one in config if any.")
@click.option("--sub-format", type=click.Choice(Subtitle.Codec, case_sensitive=False), @click.option("--sub-format", type=click.Choice(Subtitle.Codec, case_sensitive=False),
default=Subtitle.Codec.SubRip, default=None,
help="Set Output Subtitle Format, only converting if necessary.") help="Set Output Subtitle Format, only converting if necessary.")
@click.option("-V", "--video-only", is_flag=True, default=False, @click.option("-V", "--video-only", is_flag=True, default=False,
help="Only download video tracks.") help="Only download video tracks.")
@ -119,6 +117,8 @@ class dl:
help="Skip downloading and list available tracks and what tracks would have been downloaded.") help="Skip downloading and list available tracks and what tracks would have been downloaded.")
@click.option("--list-titles", is_flag=True, default=False, @click.option("--list-titles", is_flag=True, default=False,
help="Skip downloading, only list available titles that would have been downloaded.") help="Skip downloading, only list available titles that would have been downloaded.")
@click.option("--skip-dl", is_flag=True, default=False,
help="Skip downloading while still retrieving the decryption keys.")
@click.option("--export", type=Path, @click.option("--export", type=Path,
help="Export Decryption Keys as you obtain them to a JSON file.") help="Export Decryption Keys as you obtain them to a JSON file.")
@click.option("--cdm-only/--vaults-only", is_flag=True, default=None, @click.option("--cdm-only/--vaults-only", is_flag=True, default=None,
@ -143,7 +143,7 @@ class dl:
        no_proxy: bool,
        profile: Optional[str] = None,
        proxy: Optional[str] = None,
-        group: Optional[str] = None,
+        tag: Optional[str] = None,
        *_: Any,
        **__: Any
    ):
@ -153,17 +153,14 @@ class dl:
self.log = logging.getLogger("download") self.log = logging.getLogger("download")
self.service = Services.get_tag(ctx.invoked_subcommand) self.service = Services.get_tag(ctx.invoked_subcommand)
with console.status("Preparing Service and Profile Authentication...", spinner="dots"):
if profile:
self.profile = profile self.profile = profile
self.log.info(f"Profile: '{self.profile}' from the --profile argument")
else:
self.profile = self.get_profile(self.service)
self.log.info(f"Profile: '{self.profile}' from the config")
if self.profile:
self.log.info(f"Using profile: '{self.profile}'")
with console.status("Loading Service Config...", spinner="dots"):
service_config_path = Services.get_path(self.service) / config.filenames.config service_config_path = Services.get_path(self.service) / config.filenames.config
if service_config_path.is_file(): if service_config_path.exists():
self.service_config = yaml.safe_load(service_config_path.read_text(encoding="utf8")) self.service_config = yaml.safe_load(service_config_path.read_text(encoding="utf8"))
self.log.info("Service Config loaded") self.log.info("Service Config loaded")
else: else:
@ -242,8 +239,8 @@ class dl:
            profile=self.profile
        )

-        if group:
-            config.tag = group
+        if tag:
+            config.tag = tag
    # needs to be added this way instead of @cli.result_callback to be
    # able to keep `self` as the first positional
@ -253,23 +250,24 @@ class dl:
        self,
        service: Service,
        quality: list[int],
-        vcodec: Video.Codec,
+        vcodec: Optional[Video.Codec],
        acodec: Optional[Audio.Codec],
        vbitrate: int,
        abitrate: int,
-        range_: Video.Range,
+        range_: list[Video.Range],
        channels: float,
        wanted: list[str],
        lang: list[str],
        v_lang: list[str],
        s_lang: list[str],
-        sub_format: Subtitle.Codec,
+        sub_format: Optional[Subtitle.Codec],
        video_only: bool,
        audio_only: bool,
        subs_only: bool,
        chapters_only: bool,
        slow: bool, list_: bool,
        list_titles: bool,
+        skip_dl: bool,
        export: Optional[Path],
        cdm_only: Optional[bool],
        no_proxy: bool,
@ -286,14 +284,11 @@ class dl:
        else:
            vaults_only = not cdm_only

-        if self.profile:
-            with console.status("Authenticating with Service...", spinner="dots"):
-                cookies = self.get_cookie_jar(self.service, self.profile)
-                credential = self.get_credentials(self.service, self.profile)
-                if not cookies and not credential:
-                    self.log.error(f"The Profile '{self.profile}' has no Cookies or Credentials, Check for typos")
-                    sys.exit(1)
-                service.authenticate(cookies, credential)
-                self.log.info("Authenticated with Service")
+        with console.status("Authenticating with Service...", spinner="dots"):
+            cookies = self.get_cookie_jar(self.service, self.profile)
+            credential = self.get_credentials(self.service, self.profile)
+            service.authenticate(cookies, credential)
+            if cookies or credential:
+                self.log.info("Authenticated with Service")

        with console.status("Fetching Title Metadata...", spinner="dots"):
@ -330,7 +325,7 @@ class dl:
with console.status("Getting tracks...", spinner="dots"): with console.status("Getting tracks...", spinner="dots"):
title.tracks.add(service.get_tracks(title), warn_only=True) title.tracks.add(service.get_tracks(title), warn_only=True)
title.tracks.add(service.get_chapters(title)) title.tracks.chapters = service.get_chapters(title)
# strip SDH subs to non-SDH if no equivalent same-lang non-SDH is available # strip SDH subs to non-SDH if no equivalent same-lang non-SDH is available
# uses a loose check, e.g, wont strip en-US SDH sub if a non-SDH en-GB is available # uses a loose check, e.g, wont strip en-US SDH sub if a non-SDH en-GB is available
@ -343,14 +338,13 @@ class dl:
                    non_sdh_sub = deepcopy(subtitle)
                    non_sdh_sub.id += "_stripped"
                    non_sdh_sub.sdh = False
-                    non_sdh_sub.OnMultiplex = lambda x: x.strip_hearing_impaired()
+                    non_sdh_sub.OnMultiplex = lambda: non_sdh_sub.strip_hearing_impaired()
                    title.tracks.add(non_sdh_sub)

            with console.status("Sorting tracks by language and bitrate...", spinner="dots"):
                title.tracks.sort_videos(by_language=v_lang or lang)
                title.tracks.sort_audio(by_language=lang)
                title.tracks.sort_subtitles(by_language=s_lang)
-                title.tracks.sort_chapters()

            if list_:
                available_tracks, _ = title.tracks.tree()
@ -363,14 +357,17 @@ class dl:
with console.status("Selecting tracks...", spinner="dots"): with console.status("Selecting tracks...", spinner="dots"):
if isinstance(title, (Movie, Episode)): if isinstance(title, (Movie, Episode)):
# filter video tracks # filter video tracks
if vcodec:
title.tracks.select_video(lambda x: x.codec == vcodec) title.tracks.select_video(lambda x: x.codec == vcodec)
if not title.tracks.videos: if not title.tracks.videos:
self.log.error(f"There's no {vcodec.name} Video Track...") self.log.error(f"There's no {vcodec.name} Video Track...")
sys.exit(1) sys.exit(1)
title.tracks.select_video(lambda x: x.range == range_) if range_:
if not title.tracks.videos: title.tracks.select_video(lambda x: x.range in range_)
self.log.error(f"There's no {range_.name} Video Track...") for color_range in range_:
if not any(x.range == color_range for x in title.tracks.videos):
self.log.error(f"There's no {color_range.name} Video Tracks...")
sys.exit(1) sys.exit(1)
if vbitrate: if vbitrate:
@ -387,7 +384,7 @@ class dl:
                            sys.exit(1)

                    if quality:
-                        title.tracks.by_resolutions(quality, per_resolution=1)
+                        title.tracks.by_resolutions(quality)
                        missing_resolutions = []
                        for resolution in quality:
                            if any(video.height == resolution for video in title.tracks.videos):
@ -403,8 +400,27 @@ class dl:
plural = "s" if len(missing_resolutions) > 1 else "" plural = "s" if len(missing_resolutions) > 1 else ""
self.log.error(f"There's no {res_list} Video Track{plural}...") self.log.error(f"There's no {res_list} Video Track{plural}...")
sys.exit(1) sys.exit(1)
else:
title.tracks.videos = [title.tracks.videos[0]] # choose best track by range and quality
title.tracks.videos = [
track
for resolution, color_range in product(
quality or [None],
range_ or [None]
)
for track in [next(
t
for t in title.tracks.videos
if (not resolution and not color_range) or
(
(not resolution or (
(t.height == resolution) or
(int(t.width * (9 / 16)) == resolution)
))
and (not color_range or t.range == color_range)
)
)]
]
# filter subtitle tracks # filter subtitle tracks
if s_lang and "all" not in s_lang: if s_lang and "all" not in s_lang:
@ -459,6 +475,9 @@ class dl:
            dl_start_time = time.time()

+            if skip_dl:
+                DOWNLOAD_LICENCE_ONLY.set()

            try:
                with Live(
                    Padding(
@ -471,9 +490,8 @@ class dl:
                    with ThreadPoolExecutor(workers) as pool:
                        for download in futures.as_completed((
                            pool.submit(
-                                self.download_track,
-                                service=service,
-                                track=track,
+                                track.download,
+                                session=service.session,
                                prepare_drm=partial(
                                    partial(
                                        self.prepare_drm,
@ -523,6 +541,9 @@ class dl:
                ))
                return

+            if skip_dl:
+                console.log("Skipped downloads as --skip-dl was used...")
+            else:
                dl_time = time_elapsed_since(dl_start_time)
                console.print(Padding(
                    f"Track downloads finished in [progress.elapsed]{dl_time}[/]",
@ -570,6 +591,7 @@ class dl:
                        break
                    video_track_n += 1

+            if sub_format:
                with console.status(f"Converting Subtitles to {sub_format.name}..."):
                    for subtitle in title.tracks.subtitles:
                        if subtitle.codec != sub_format:
@ -582,7 +604,7 @@ class dl:
                        track.repackage()
                        has_repacked = True
                        if callable(track.OnRepacked):
-                            track.OnRepacked(track)
+                            track.OnRepacked()
                if has_repacked:
                    # we don't want to fill up the log with "Repacked x track"
                    self.log.info("Repacked one or more tracks with FFMPEG")
@ -598,95 +620,33 @@ class dl:
                    TimeRemainingColumn(compact=True, elapsed_when_finished=True),
                    console=console
                )

-                multi_jobs = len(title.tracks.videos) > 1
-                tasks = [
-                    progress.add_task(
-                        f"Multiplexing{f' {x.height}p' if multi_jobs else ''}...",
-                        total=None,
-                        start=False
-                    )
-                    for x in title.tracks.videos or [None]
-                ]
-
-                if cc:
-                    # will not appear in track listings as it's added after all times it lists
-                    title.tracks.add(cc)
-                    self.log.info(f"Extracted a Closed Caption from Video track {video_track_n + 1}")
-                else:
-                    self.log.info(f"No Closed Captions were found in Video track {video_track_n + 1}")
-            except EnvironmentError:
-                self.log.error(
-                    "Cannot extract Closed Captions as the ccextractor executable was not found..."
-                )
-                break
-            video_track_n += 1
-
-        with console.status(f"Converting Subtitles to {sub_format.name}..."):
-            for subtitle in title.tracks.subtitles:
-                if subtitle.codec != sub_format:
-                    writer = {
-                        Subtitle.Codec.SubRip: pycaption.SRTWriter,
-                        Subtitle.Codec.SubStationAlpha: None,
-                        Subtitle.Codec.SubStationAlphav4: None,
-                        Subtitle.Codec.TimedTextMarkupLang: pycaption.DFXPWriter,
-                        Subtitle.Codec.WebVTT: pycaption.WebVTTWriter,
-                        # MPEG-DASH box-encapsulated subtitle formats
-                        Subtitle.Codec.fTTML: None,
-                        Subtitle.Codec.fVTT: None,
-                    }[sub_format]
-                    if writer is None:
-                        self.log.error(f"Cannot yet convert {subtitle.codec} to {sub_format.name}...")
-                        sys.exit(1)
-
-                    caption_set = subtitle.parse(subtitle.path.read_bytes(), subtitle.codec)
-                    subtitle.merge_same_cues(caption_set)
-                    subtitle_text = writer().write(caption_set)
-                    subtitle.path.write_text(subtitle_text, encoding="utf8")
-
-                    subtitle.codec = sub_format
-                    subtitle.move(subtitle.path.with_suffix(f".{sub_format.value.lower()}"))
-
-        with console.status("Repackaging tracks with FFMPEG..."):
-            has_repacked = False
-            for track in title.tracks:
-                if track.needs_repack:
-                    track.repackage()
-                    has_repacked = True
-                    if callable(track.OnRepacked):
-                        track.OnRepacked(track)
-            if has_repacked:
-                # we don't want to fill up the log with "Repacked x track"
-                self.log.info("Repacked one or more tracks with FFMPEG")
-
-        muxed_paths = []
-
-        if isinstance(title, (Movie, Episode)):
-            progress = Progress(
-                TextColumn("[progress.description]{task.description}"),
-                SpinnerColumn(finished_text=""),
-                BarColumn(),
-                "",
-                TimeRemainingColumn(compact=True, elapsed_when_finished=True),
-                console=console
-            )
-            multi_jobs = len(title.tracks.videos) > 1
-            tasks = [
-                progress.add_task(
-                    f"Multiplexing{f' {x.height}p' if multi_jobs else ''}...",
-                    total=None,
-                    start=False
-                )
-                for x in title.tracks.videos or [None]
-            ]
+                multiplex_tasks: list[tuple[TaskID, Tracks]] = []
+                for video_track in title.tracks.videos or [None]:
+                    task_description = "Multiplexing"
+                    if video_track:
+                        if len(quality) > 1:
+                            task_description += f" {video_track.height}p"
+                        if len(range_) > 1:
+                            task_description += f" {video_track.range.name}"
+                    task_id = progress.add_task(f"{task_description}...", total=None, start=False)
+
+                    task_tracks = Tracks(title.tracks) + title.tracks.chapters
+                    if video_track:
+                        task_tracks.videos = [video_track]
+
+                    multiplex_tasks.append((task_id, task_tracks))

                with Live(
                    Padding(progress, (0, 5, 1, 5)),
                    console=console
                ):
-                    for task, video_track in zip_longest(tasks, title.tracks.videos, fillvalue=None):
-                        if video_track:
-                            title.tracks.videos = [video_track]
-                        progress.start_task(task)  # TODO: Needed?
-                        muxed_path, return_code = title.tracks.mux(
+                    for task_id, task_tracks in multiplex_tasks:
+                        progress.start_task(task_id)  # TODO: Needed?
+                        muxed_path, return_code = task_tracks.mux(
                            str(title),
-                            progress=partial(progress.update, task_id=task),
+                            progress=partial(progress.update, task_id=task_id),
                            delete=False
                        )
                        muxed_paths.append(muxed_path)
@ -695,7 +655,7 @@ class dl:
                    elif return_code >= 2:
                        self.log.error(f"Failed to Mux video to Matroska file ({return_code})")
                        sys.exit(1)
-                    if video_track:
+                    for video_track in task_tracks.videos:
                        video_track.delete()
            for track in title.tracks:
                track.delete()
@ -723,13 +683,9 @@ class dl:
        ))

        # update cookies
-        cookie_file = config.directories.cookies / service.__class__.__name__ / f"{self.profile}.txt"
-        if cookie_file.exists():
-            cookie_jar = MozillaCookieJar(cookie_file)
-            cookie_jar.load()
-            for cookie in service.session.cookies:
-                cookie_jar.set_cookie(cookie)
-            cookie_jar.save(ignore_discard=True)
+        cookie_file = self.get_cookie_path(self.service, self.profile)
+        if cookie_file:
+            self.save_cookies(cookie_file, service.session.cookies)

        dl_time = time_elapsed_since(start_time)
@ -830,8 +786,11 @@ class dl:
                    # So we re-add the keys from vaults earlier overwriting blanks or removed KIDs data.
                    drm.content_keys.update(from_vaults)

-                    cached_keys = self.vaults.add_keys(drm.content_keys)
-                    self.log.info(f" + Newly added to {cached_keys}/{len(drm.content_keys)} Vaults")
+                    successful_caches = self.vaults.add_keys(drm.content_keys)
+                    self.log.info(
+                        f"Cached {len(drm.content_keys)} Key{'' if len(drm.content_keys) == 1 else 's'} to "
+                        f"{successful_caches}/{len(self.vaults)} Vaults"
+                    )

                    break  # licensing twice will be unnecessary

                if track_kid and track_kid not in drm.content_keys:
@ -856,159 +815,25 @@ class dl:
                keys[str(title)][str(track)].update(drm.content_keys)
            export.write_text(jsonpickle.dumps(keys, indent=4), encoding="utf8")
+    @staticmethod
+    def get_cookie_path(service: str, profile: Optional[str]) -> Optional[Path]:
+        """Get Service Cookie File Path for Profile."""
+        direct_cookie_file = config.directories.cookies / f"{service}.txt"
+        profile_cookie_file = config.directories.cookies / service / f"{profile}.txt"
+        default_cookie_file = config.directories.cookies / service / "default.txt"
+
+        if direct_cookie_file.exists():
+            return direct_cookie_file
+        elif profile_cookie_file.exists():
+            return profile_cookie_file
+        elif default_cookie_file.exists():
+            return default_cookie_file

-    def download_track(
-        self,
-        service: Service,
-        track: AnyTrack,
-        prepare_drm: Callable,
-        progress: partial
-    ):
-        if DOWNLOAD_CANCELLED.is_set():
-            progress(downloaded="[yellow]CANCELLED")
-            return
-
-        proxy = next(iter(service.session.proxies.values()), None)
-
-        save_path = config.directories.temp / f"{track.__class__.__name__}_{track.id}.mp4"
-        if isinstance(track, Subtitle):
-            save_path = save_path.with_suffix(f".{track.codec.extension}")
if track.descriptor != track.Descriptor.URL:
save_dir = save_path.with_name(save_path.name + "_segments")
else:
save_dir = save_path.parent
def cleanup():
# track file (e.g., "foo.mp4")
save_path.unlink(missing_ok=True)
# aria2c control file (e.g., "foo.mp4.aria2")
save_path.with_suffix(f"{save_path.suffix}.aria2").unlink(missing_ok=True)
if save_dir.exists() and save_dir.name.endswith("_segments"):
shutil.rmtree(save_dir)
if config.directories.temp.is_file():
self.log.error(f"Temp Directory '{config.directories.temp}' must be a Directory, not a file")
sys.exit(1)
config.directories.temp.mkdir(parents=True, exist_ok=True)
# Delete any pre-existing temp files matching this track.
# We can't re-use or continue downloading these tracks as they do not use a
# lock file. Or at least the majority don't. Even if they did I've encountered
# corruptions caused by sudden interruptions to the lock file.
cleanup()
try:
if track.descriptor == track.Descriptor.M3U:
HLS.download_track(
track=track,
save_path=save_path,
save_dir=save_dir,
progress=progress,
session=service.session,
proxy=proxy,
license_widevine=prepare_drm
)
elif track.descriptor == track.Descriptor.MPD:
DASH.download_track(
track=track,
save_path=save_path,
save_dir=save_dir,
progress=progress,
session=service.session,
proxy=proxy,
license_widevine=prepare_drm
)
# no else-if as DASH may convert the track to URL descriptor
if track.descriptor == track.Descriptor.URL:
try:
if not track.drm and isinstance(track, (Video, Audio)):
# the service might not have explicitly defined the `drm` property
# try find widevine DRM information from the init data of URL
try:
track.drm = [Widevine.from_track(track, service.session)]
except Widevine.Exceptions.PSSHNotFound:
# it might not have Widevine DRM, or might not have found the PSSH
self.log.warning("No Widevine PSSH was found for this track, is it DRM free?")
if track.drm:
track_kid = track.get_key_id(session=service.session)
drm = track.drm[0] # just use the first supported DRM system for now
if isinstance(drm, Widevine):
# license and grab content keys
if not prepare_drm:
raise ValueError("prepare_drm func must be supplied to use Widevine DRM")
progress(downloaded="LICENSING")
prepare_drm(drm, track_kid=track_kid)
progress(downloaded="[yellow]LICENSED")
else:
drm = None
downloader(
uri=track.url,
out=save_path,
headers=service.session.headers,
cookies=service.session.cookies,
proxy=proxy,
progress=progress
)
track.path = save_path
if drm:
progress(downloaded="Decrypting", completed=0, total=100)
drm.decrypt(save_path)
track.drm = None
if callable(track.OnDecrypted):
track.OnDecrypted(track)
progress(downloaded="Decrypted", completed=100)
if isinstance(track, Subtitle):
track_data = track.path.read_bytes()
track_data = try_ensure_utf8(track_data)
track_data = html.unescape(track_data.decode("utf8")).encode("utf8")
track.path.write_bytes(track_data)
progress(downloaded="Downloaded")
except KeyboardInterrupt:
DOWNLOAD_CANCELLED.set()
progress(downloaded="[yellow]CANCELLED")
raise
except Exception:
DOWNLOAD_CANCELLED.set()
progress(downloaded="[red]FAILED")
raise
except (Exception, KeyboardInterrupt):
cleanup()
raise
if DOWNLOAD_CANCELLED.is_set():
# we stopped during the download, let's exit
return
if track.path.stat().st_size <= 3: # Empty UTF-8 BOM == 3 bytes
raise IOError("Download failed, the downloaded file is empty.")
if callable(track.OnDownloaded):
track.OnDownloaded(track)
    @staticmethod
-    def get_profile(service: str) -> Optional[str]:
-        """Get profile for Service from config."""
-        profile = config.profiles.get(service)
-        if profile is False:
-            return None  # auth-less service if `false` in config
-        if not profile:
-            profile = config.profiles.get("default")
-        if not profile:
-            raise ValueError(f"No profile has been defined for '{service}' in the config.")
-        return profile
-
-    @staticmethod
-    def get_cookie_jar(service: str, profile: str) -> Optional[MozillaCookieJar]:
-        """Get Profile's Cookies as Mozilla Cookie Jar if available."""
-        cookie_file = config.directories.cookies / service / f"{profile}.txt"
-        if cookie_file.is_file():
+    def get_cookie_jar(service: str, profile: Optional[str]) -> Optional[MozillaCookieJar]:
+        """Get Service Cookies for Profile."""
+        cookie_file = dl.get_cookie_path(service, profile)
+        if cookie_file:
            cookie_jar = MozillaCookieJar(cookie_file)
            cookie_data = html.unescape(cookie_file.read_text("utf8")).splitlines(keepends=False)
            for i, line in enumerate(cookie_data):
@ -1023,17 +848,29 @@ class dl:
            cookie_file.write_text(cookie_data, "utf8")
            cookie_jar.load(ignore_discard=True, ignore_expires=True)
            return cookie_jar
+        return None

    @staticmethod
-    def get_credentials(service: str, profile: str) -> Optional[Credential]:
-        """Get Profile's Credential if available."""
-        cred = config.credentials.get(service, {}).get(profile)
-        if cred:
-            if isinstance(cred, list):
-                return Credential(*cred)
-            return Credential.loads(cred)
-        return None
+    def save_cookies(path: Path, cookies: CookieJar):
+        cookie_jar = MozillaCookieJar(path)
+        cookie_jar.load()
+        for cookie in cookies:
+            cookie_jar.set_cookie(cookie)
+        cookie_jar.save(ignore_discard=True)
+
+    @staticmethod
+    def get_credentials(service: str, profile: Optional[str]) -> Optional[Credential]:
+        """Get Service Credentials for Profile."""
+        credentials = config.credentials.get(service)
+        if credentials:
+            if isinstance(credentials, dict):
+                if profile:
+                    credentials = credentials.get(profile) or credentials.get("default")
+                else:
+                    credentials = credentials.get("default")
+            if credentials:
+                if isinstance(credentials, list):
+                    return Credential(*credentials)
+                return Credential.loads(credentials)  # type: ignore
    @staticmethod
    def get_cdm(service: str, profile: Optional[str] = None) -> WidevineCdm:
devine/commands/search.py Normal file
@ -0,0 +1,166 @@
from __future__ import annotations
import logging
import re
import sys
from typing import Any, Optional
import click
import yaml
from rich.padding import Padding
from rich.rule import Rule
from rich.tree import Tree
from devine.commands.dl import dl
from devine.core.config import config
from devine.core.console import console
from devine.core.constants import context_settings
from devine.core.proxies import Basic, Hola, NordVPN
from devine.core.service import Service
from devine.core.services import Services
from devine.core.utilities import get_binary_path
from devine.core.utils.click_types import ContextData
from devine.core.utils.collections import merge_dict
@click.command(
short_help="Search for titles from a Service.",
cls=Services,
context_settings=dict(
**context_settings,
token_normalize_func=Services.get_tag
))
@click.option("-p", "--profile", type=str, default=None,
help="Profile to use for Credentials and Cookies (if available).")
@click.option("--proxy", type=str, default=None,
help="Proxy URI to use. If a 2-letter country is provided, it will try get a proxy from the config.")
@click.option("--no-proxy", is_flag=True, default=False,
help="Force disable all proxy use.")
@click.pass_context
def search(
ctx: click.Context,
no_proxy: bool,
profile: Optional[str] = None,
proxy: Optional[str] = None
):
if not ctx.invoked_subcommand:
raise ValueError("A subcommand to invoke was not specified, the main code cannot continue.")
log = logging.getLogger("search")
service = Services.get_tag(ctx.invoked_subcommand)
profile = profile
if profile:
log.info(f"Using profile: '{profile}'")
with console.status("Loading Service Config...", spinner="dots"):
service_config_path = Services.get_path(service) / config.filenames.config
if service_config_path.exists():
service_config = yaml.safe_load(service_config_path.read_text(encoding="utf8"))
log.info("Service Config loaded")
else:
service_config = {}
merge_dict(config.services.get(service), service_config)
proxy_providers = []
if no_proxy:
ctx.params["proxy"] = None
else:
with console.status("Loading Proxy Providers...", spinner="dots"):
if config.proxy_providers.get("basic"):
proxy_providers.append(Basic(**config.proxy_providers["basic"]))
if config.proxy_providers.get("nordvpn"):
proxy_providers.append(NordVPN(**config.proxy_providers["nordvpn"]))
if get_binary_path("hola-proxy"):
proxy_providers.append(Hola())
for proxy_provider in proxy_providers:
log.info(f"Loaded {proxy_provider.__class__.__name__}: {proxy_provider}")
if proxy:
requested_provider = None
if re.match(r"^[a-z]+:.+$", proxy, re.IGNORECASE):
# requesting proxy from a specific proxy provider
requested_provider, proxy = proxy.split(":", maxsplit=1)
if re.match(r"^[a-z]{2}(?:\d+)?$", proxy, re.IGNORECASE):
proxy = proxy.lower()
with console.status(f"Getting a Proxy to {proxy}...", spinner="dots"):
if requested_provider:
proxy_provider = next((
x
for x in proxy_providers
if x.__class__.__name__.lower() == requested_provider
), None)
if not proxy_provider:
log.error(f"The proxy provider '{requested_provider}' was not recognised.")
sys.exit(1)
proxy_uri = proxy_provider.get_proxy(proxy)
if not proxy_uri:
log.error(f"The proxy provider {requested_provider} had no proxy for {proxy}")
sys.exit(1)
proxy = ctx.params["proxy"] = proxy_uri
log.info(f"Using {proxy_provider.__class__.__name__} Proxy: {proxy}")
else:
for proxy_provider in proxy_providers:
proxy_uri = proxy_provider.get_proxy(proxy)
if proxy_uri:
proxy = ctx.params["proxy"] = proxy_uri
log.info(f"Using {proxy_provider.__class__.__name__} Proxy: {proxy}")
break
else:
log.info(f"Using explicit Proxy: {proxy}")
ctx.obj = ContextData(
config=service_config,
cdm=None,
proxy_providers=proxy_providers,
profile=profile
)
@search.result_callback()
def result(service: Service, profile: Optional[str] = None, **_: Any) -> None:
log = logging.getLogger("search")
service_tag = service.__class__.__name__
with console.status("Authenticating with Service...", spinner="dots"):
cookies = dl.get_cookie_jar(service_tag, profile)
credential = dl.get_credentials(service_tag, profile)
service.authenticate(cookies, credential)
if cookies or credential:
log.info("Authenticated with Service")
search_results = Tree("Search Results", hide_root=True)
with console.status("Searching...", spinner="dots"):
for result in service.search():
result_text = f"[bold text]{result.title}[/]"
if result.url:
result_text = f"[link={result.url}]{result_text}[/link]"
if result.label:
result_text += f" [pink]{result.label}[/]"
if result.description:
result_text += f"\n[text2]{result.description}[/]"
result_text += f"\n[bright_black]id: {result.id}[/]"
search_results.add(result_text + "\n")
# update cookies
cookie_file = dl.get_cookie_path(service_tag, profile)
if cookie_file:
dl.save_cookies(cookie_file, service.session.cookies)
console.print(Padding(
Rule(f"[rule.text]{len(search_results.children)} Search Results"),
(1, 2)
))
if search_results.children:
console.print(Padding(
search_results,
(0, 5)
))
else:
console.print(Padding(
"[bold text]No matches[/]\n[bright_black]Please check spelling and search again....[/]",
(0, 5)
))
@ -1 +1 @@
__version__ = "2.1.0" __version__ = "3.1.0"
@ -27,7 +27,7 @@ LOGGING_PATH = None
@click.option("--log", "log_path", type=Path, default=config.directories.logs / config.filenames.log, @click.option("--log", "log_path", type=Path, default=config.directories.logs / config.filenames.log,
help="Log path (or filename). Path can contain the following f-string args: {name} {time}.") help="Log path (or filename). Path can contain the following f-string args: {name} {time}.")
def main(version: bool, debug: bool, log_path: Path) -> None: def main(version: bool, debug: bool, log_path: Path) -> None:
"""Devine—Open-Source Movie, TV, and Music Downloading Solution.""" """Devine—Modular Movie, TV, and Music Archival Software."""
logging.basicConfig( logging.basicConfig(
level=logging.DEBUG if debug else logging.INFO, level=logging.DEBUG if debug else logging.INFO,
format="%(message)s", format="%(message)s",

View File

@@ -2,7 +2,7 @@ from __future__ import annotations

 import tempfile
 from pathlib import Path
-from typing import Any
+from typing import Any, Optional

 import yaml
 from appdirs import AppDirs
@@ -39,6 +39,7 @@ class Config:
         self.dl: dict = kwargs.get("dl") or {}
         self.aria2c: dict = kwargs.get("aria2c") or {}
         self.cdm: dict = kwargs.get("cdm") or {}
+        self.chapter_fallback_name: str = kwargs.get("chapter_fallback_name") or ""
         self.curl_impersonate: dict = kwargs.get("curl_impersonate") or {}
         self.remote_cdm: list[dict] = kwargs.get("remote_cdm") or []
         self.credentials: dict = kwargs.get("credentials") or {}
@@ -50,7 +51,7 @@ class Config:
                 continue
             setattr(self.directories, name, Path(path).expanduser())

-        self.downloader = kwargs.get("downloader") or "aria2c"
+        self.downloader = kwargs.get("downloader") or "requests"

         self.filenames = self._Filenames()
         for name, filename in (kwargs.get("filenames") or {}).items():
@@ -60,7 +61,6 @@ class Config:
         self.key_vaults: list[dict[str, Any]] = kwargs.get("key_vaults", [])
         self.muxing: dict = kwargs.get("muxing") or {}
         self.nordvpn: dict = kwargs.get("nordvpn") or {}
-        self.profiles: dict = kwargs.get("profiles") or {}
         self.proxy_providers: dict = kwargs.get("proxy_providers") or {}
         self.serve: dict = kwargs.get("serve") or {}
         self.services: dict = kwargs.get("services") or {}
@@ -76,11 +76,35 @@ class Config:
         return cls(**yaml.safe_load(path.read_text(encoding="utf8")) or {})


-# noinspection PyProtectedMember
-config_path = Config._Directories.user_configs / Config._Filenames.root_config
-if not config_path.is_file():
-    Config._Directories.user_configs.mkdir(parents=True, exist_ok=True)
-    config_path.write_text("")
-config = Config.from_yaml(config_path)
+def get_config_path() -> Optional[Path]:
+    """
+    Get Path to Config from various locations.
+
+    Looks for a config file in the following folders in order:
+
+    1. The Devine Namespace Folder (e.g., %appdata%/Python/Python311/site-packages/devine)
+    2. The Parent Folder to the Devine Namespace Folder (e.g., %appdata%/Python/Python311/site-packages)
+    3. The AppDirs User Config Folder (e.g., %localappdata%/devine)
+
+    Returns None if no config file could be found.
+    """
+    # noinspection PyProtectedMember
+    path = Config._Directories.namespace_dir / Config._Filenames.root_config
+    if not path.exists():
+        # noinspection PyProtectedMember
+        path = Config._Directories.namespace_dir.parent / Config._Filenames.root_config
+    if not path.exists():
+        # noinspection PyProtectedMember
+        path = Config._Directories.user_configs / Config._Filenames.root_config
+    if not path.exists():
+        path = None
+    return path
+
+
+config_path = get_config_path()
+if config_path:
+    config = Config.from_yaml(config_path)
+else:
+    config = Config()

 __all__ = ("config",)

View File

@@ -5,21 +5,6 @@ DOWNLOAD_CANCELLED = Event()
 DOWNLOAD_LICENCE_ONLY = Event()

 DRM_SORT_MAP = ["ClearKey", "Widevine"]
-LANGUAGE_MUX_MAP = {
-    # List of language tags that cannot be used by mkvmerge and need replacements.
-    # Try get the replacement to be as specific locale-wise as possible.
-    # A bcp47 as the replacement is recommended.
-    "cmn": "zh",
-    "cmn-Hant": "zh-Hant",
-    "cmn-Hans": "zh-Hans",
-    "none": "und",
-    "yue": "zh-yue",
-    "yue-Hant": "zh-yue-Hant",
-    "yue-Hans": "zh-yue-Hans"
-}
-TERRITORY_MAP = {
-    "Hong Kong SAR China": "Hong Kong"
-}
 LANGUAGE_MAX_DISTANCE = 5  # this is max to be considered "same", e.g., en, en-US, en-AU
 VIDEO_CODEC_MAP = {
     "AVC": "H.264",

View File

@@ -1,15 +1,5 @@
-import asyncio
-
-from ..config import config
 from .aria2c import aria2c
 from .curl_impersonate import curl_impersonate
 from .requests import requests

-downloader = {
-    "aria2c": lambda *args, **kwargs: asyncio.run(aria2c(*args, **kwargs)),
-    "curl_impersonate": curl_impersonate,
-    "requests": requests
-}[config.downloader]
-
-__all__ = ("downloader", "aria2c", "curl_impersonate", "requests")
+__all__ = ("aria2c", "curl_impersonate", "requests")

View File

@@ -1,84 +1,149 @@
-import asyncio
+import os
 import subprocess
 import textwrap
+import time
 from functools import partial
 from http.cookiejar import CookieJar
 from pathlib import Path
-from typing import MutableMapping, Optional, Union
+from typing import Any, Callable, Generator, MutableMapping, Optional, Union
+from urllib.parse import urlparse

 import requests
-from requests.cookies import RequestsCookieJar, cookiejar_from_dict, get_cookie_header
+from Crypto.Random import get_random_bytes
+from requests import Session
+from requests.cookies import cookiejar_from_dict, get_cookie_header
+from rich import filesize
 from rich.text import Text

 from devine.core.config import config
 from devine.core.console import console
-from devine.core.utilities import get_binary_path, start_pproxy
+from devine.core.constants import DOWNLOAD_CANCELLED
+from devine.core.utilities import get_binary_path, get_extension, get_free_port


-async def aria2c(
-    uri: Union[str, list[str]],
-    out: Path,
-    headers: Optional[dict] = None,
-    cookies: Optional[Union[MutableMapping[str, str], RequestsCookieJar]] = None,
+def rpc(caller: Callable, secret: str, method: str, params: Optional[list[Any]] = None) -> Any:
+    """Make a call to Aria2's JSON-RPC API."""
+    try:
+        rpc_res = caller(
+            json={
+                "jsonrpc": "2.0",
+                "id": get_random_bytes(16).hex(),
+                "method": method,
+                "params": [f"token:{secret}", *(params or [])]
+            }
+        ).json()
+        if rpc_res.get("code"):
+            # wrap to console width - padding - '[Aria2c]: '
+            error_pretty = "\n          ".join(textwrap.wrap(
+                f"RPC Error: {rpc_res['message']} ({rpc_res['code']})".strip(),
+                width=console.width - 20,
+                initial_indent=""
+            ))
+            console.log(Text.from_ansi("\n[Aria2c]: " + error_pretty))
+        return rpc_res["result"]
+    except requests.exceptions.ConnectionError:
+        # absorb, process likely ended as it was calling RPC
+        return
+
+
+def download(
+    urls: Union[str, list[str], dict[str, Any], list[dict[str, Any]]],
+    output_dir: Path,
+    filename: str,
+    headers: Optional[MutableMapping[str, Union[str, bytes]]] = None,
+    cookies: Optional[Union[MutableMapping[str, str], CookieJar]] = None,
     proxy: Optional[str] = None,
-    silent: bool = False,
-    segmented: bool = False,
-    progress: Optional[partial] = None,
-    *args: str
-) -> int:
-    """
-    Download files using Aria2(c).
-    https://aria2.github.io
-
-    If multiple URLs are provided they will be downloaded in the provided order
-    to the output directory. They will not be merged together.
-    """
-    if not isinstance(uri, list):
-        uri = [uri]
-
-    if cookies and not isinstance(cookies, CookieJar):
-        cookies = cookiejar_from_dict(cookies)
+    max_workers: Optional[int] = None
+) -> Generator[dict[str, Any], None, None]:
+    if not urls:
+        raise ValueError("urls must be provided and not empty")
+    elif not isinstance(urls, (str, dict, list)):
+        raise TypeError(f"Expected urls to be {str} or {dict} or a list of one of them, not {type(urls)}")
+    if not output_dir:
+        raise ValueError("output_dir must be provided")
+    elif not isinstance(output_dir, Path):
+        raise TypeError(f"Expected output_dir to be {Path}, not {type(output_dir)}")
+    if not filename:
+        raise ValueError("filename must be provided")
+    elif not isinstance(filename, str):
+        raise TypeError(f"Expected filename to be {str}, not {type(filename)}")
+    if not isinstance(headers, (MutableMapping, type(None))):
+        raise TypeError(f"Expected headers to be {MutableMapping}, not {type(headers)}")
+    if not isinstance(cookies, (MutableMapping, CookieJar, type(None))):
+        raise TypeError(f"Expected cookies to be {MutableMapping} or {CookieJar}, not {type(cookies)}")
+    if not isinstance(proxy, (str, type(None))):
+        raise TypeError(f"Expected proxy to be {str}, not {type(proxy)}")
+    if not max_workers:
+        max_workers = min(32, (os.cpu_count() or 1) + 4)
+    elif not isinstance(max_workers, int):
+        raise TypeError(f"Expected max_workers to be {int}, not {type(max_workers)}")
+
+    if not isinstance(urls, list):
+        urls = [urls]

     executable = get_binary_path("aria2c", "aria2")
     if not executable:
         raise EnvironmentError("Aria2c executable not found...")

-    if proxy and proxy.lower().split(":")[0] != "http":
-        # HTTPS proxies are not supported by aria2(c).
-        # Proxy the proxy via pproxy to access it as an HTTP proxy.
-        async with start_pproxy(proxy) as pproxy_:
-            return await aria2c(uri, out, headers, cookies, pproxy_, silent, segmented, progress, *args)
+    if proxy and not proxy.lower().startswith("http://"):
+        raise ValueError("Only HTTP proxies are supported by aria2(c)")

-    multiple_urls = len(uri) > 1
+    if cookies and not isinstance(cookies, CookieJar):
+        cookies = cookiejar_from_dict(cookies)

     url_files = []
-    for i, url in enumerate(uri):
-        url_text = url
-        if multiple_urls:
-            url_text += f"\n\tdir={out}"
-            url_text += f"\n\tout={i:08}.mp4"
-        else:
-            url_text += f"\n\tdir={out.parent}"
-            url_text += f"\n\tout={out.name}"
+    for i, url in enumerate(urls):
+        if isinstance(url, str):
+            url_data = {
+                "url": url
+            }
+        else:
+            url_data: dict[str, Any] = url
+        url_filename = filename.format(
+            i=i,
+            ext=get_extension(url_data["url"])
+        )
+        url_text = url_data["url"]
+        url_text += f"\n\tdir={output_dir}"
+        url_text += f"\n\tout={url_filename}"
         if cookies:
-            mock_request = requests.Request(url=url)
+            mock_request = requests.Request(url=url_data["url"])
             cookie_header = get_cookie_header(cookies, mock_request)
             if cookie_header:
                 url_text += f"\n\theader=Cookie: {cookie_header}"
+        for key, value in url_data.items():
+            if key == "url":
+                continue
+            if key == "headers":
+                for header_name, header_value in value.items():
+                    url_text += f"\n\theader={header_name}: {header_value}"
+            else:
+                url_text += f"\n\t{key}={value}"
         url_files.append(url_text)
     url_file = "\n".join(url_files)

-    max_concurrent_downloads = int(config.aria2c.get("max_concurrent_downloads", 5))
+    rpc_port = get_free_port()
+    rpc_secret = get_random_bytes(16).hex()
+    rpc_uri = f"http://127.0.0.1:{rpc_port}/jsonrpc"
+    rpc_session = Session()
+
+    max_concurrent_downloads = int(config.aria2c.get("max_concurrent_downloads", max_workers))
     max_connection_per_server = int(config.aria2c.get("max_connection_per_server", 1))
     split = int(config.aria2c.get("split", 5))
     file_allocation = config.aria2c.get("file_allocation", "prealloc")
-    if segmented:
+    if len(urls) > 1:
         split = 1
         file_allocation = "none"

     arguments = [
         # [Basic Options]
         "--input-file", "-",
-        "--out", out.name,
         "--all-proxy", proxy or "",
         "--continue=true",
         # [Connection Options]
@@ -92,11 +157,13 @@ async def aria2c(
         "--allow-overwrite=true",
         "--auto-file-renaming=false",
         "--console-log-level=warn",
-        f"--download-result={'default' if progress else 'hide'}",
+        "--download-result=default",
         f"--file-allocation={file_allocation}",
         "--summary-interval=0",
-        # [Extra Options]
-        *args
+        # [RPC Options]
+        "--enable-rpc=true",
+        f"--rpc-listen-port={rpc_port}",
+        f"--rpc-secret={rpc_secret}"
     ]

     for header, value in (headers or {}).items():
@@ -114,67 +181,72 @@ async def aria2c(
             continue
         arguments.extend(["--header", f"{header}: {value}"])

+    yield dict(total=len(urls))
+
     try:
-        p = await asyncio.create_subprocess_exec(
-            executable,
-            *arguments,
+        p = subprocess.Popen(
+            [
+                executable,
+                *arguments
+            ],
             stdin=subprocess.PIPE,
-            stdout=subprocess.PIPE
+            stdout=subprocess.DEVNULL
         )

         p.stdin.write(url_file.encode())
-        await p.stdin.drain()
         p.stdin.close()

-        if p.stdout:
-            is_dl_summary = False
-            log_buffer = ""
-            while True:
-                try:
-                    chunk = await p.stdout.readuntil(b"\r")
-                except asyncio.IncompleteReadError as e:
-                    chunk = e.partial
-                if not chunk:
-                    break
-                for line in chunk.decode().strip().splitlines():
-                    if not line:
-                        continue
-                    if line.startswith("Download Results"):
-                        # we know it's 100% downloaded, but let's use the avg dl speed value
-                        is_dl_summary = True
-                    elif line.startswith("[") and line.endswith("]"):
-                        if progress and "%" in line:
-                            # id, dledMiB/totalMiB(x%), CN:xx, DL:xxMiB, ETA:Xs
-                            # eta may not always be available
-                            data_parts = line[1:-1].split()
-                            perc_parts = data_parts[1].split("(")
-                            if len(perc_parts) == 2:
-                                # might otherwise be e.g., 0B/0B, with no % symbol provided
-                                progress(
-                                    total=100,
-                                    completed=int(perc_parts[1][:-2]),
-                                    downloaded=f"{data_parts[3].split(':')[1]}/s"
-                                )
-                    elif is_dl_summary and "OK" in line and "|" in line:
-                        gid, status, avg_speed, path_or_uri = line.split("|")
-                        progress(total=100, completed=100, downloaded=avg_speed.strip())
-                    elif not is_dl_summary:
-                        if "aria2 will resume download if the transfer is restarted" in line:
-                            continue
-                        if "If there are any errors, then see the log file" in line:
-                            continue
-                        log_buffer += f"{line.strip()}\n"
-
-            if log_buffer and not silent:
-                # wrap to console width - padding - '[Aria2c]: '
-                log_buffer = "\n          ".join(textwrap.wrap(
-                    log_buffer.rstrip(),
-                    width=console.width - 20,
-                    initial_indent=""
-                ))
-                console.log(Text.from_ansi("\n[Aria2c]: " + log_buffer))
-
-        await p.wait()
+        while p.poll() is None:
+            global_stats: dict[str, Any] = rpc(
+                caller=partial(rpc_session.post, url=rpc_uri),
+                secret=rpc_secret,
+                method="aria2.getGlobalStat"
+            ) or {}
+
+            number_stopped = int(global_stats.get("numStoppedTotal", 0))
+            download_speed = int(global_stats.get("downloadSpeed", -1))
+
+            if number_stopped:
+                yield dict(completed=number_stopped)
+            if download_speed != -1:
+                yield dict(downloaded=f"{filesize.decimal(download_speed)}/s")
+
+            stopped_downloads: list[dict[str, Any]] = rpc(
+                caller=partial(rpc_session.post, url=rpc_uri),
+                secret=rpc_secret,
+                method="aria2.tellStopped",
+                params=[0, 999999]
+            ) or []
+
+            for dl in stopped_downloads:
+                if dl["status"] == "error":
+                    used_uri = next(
+                        uri["uri"]
+                        for file in dl["files"]
+                        if file["selected"] == "true"
+                        for uri in file["uris"]
+                        if uri["status"] == "used"
+                    )
+                    error = f"Download Error (#{dl['gid']}): {dl['errorMessage']} ({dl['errorCode']}), {used_uri}"
+                    error_pretty = "\n          ".join(textwrap.wrap(
+                        error,
+                        width=console.width - 20,
+                        initial_indent=""
+                    ))
+                    console.log(Text.from_ansi("\n[Aria2c]: " + error_pretty))
+                    raise ValueError(error)
+
+            if number_stopped == len(urls):
+                rpc(
+                    caller=partial(rpc_session.post, url=rpc_uri),
+                    secret=rpc_secret,
+                    method="aria2.shutdown"
+                )
+                break
+
+            time.sleep(1)
+
+        p.wait()

         if p.returncode != 0:
             raise subprocess.CalledProcessError(p.returncode, arguments)
@@ -187,8 +259,96 @@ async def aria2c(
             # 0xC000013A is when it never got the chance to
             raise KeyboardInterrupt()
         raise
+    except KeyboardInterrupt:
+        DOWNLOAD_CANCELLED.set()  # skip pending track downloads
+        yield dict(downloaded="[yellow]CANCELLED")
+        raise
+    except Exception:
+        DOWNLOAD_CANCELLED.set()  # skip pending track downloads
+        yield dict(downloaded="[red]FAILED")
+        raise
+    finally:
+        rpc(
+            caller=partial(rpc_session.post, url=rpc_uri),
+            secret=rpc_secret,
+            method="aria2.shutdown"
+        )

-    return p.returncode
+
+def aria2c(
+    urls: Union[str, list[str], dict[str, Any], list[dict[str, Any]]],
+    output_dir: Path,
+    filename: str,
+    headers: Optional[MutableMapping[str, Union[str, bytes]]] = None,
+    cookies: Optional[Union[MutableMapping[str, str], CookieJar]] = None,
+    proxy: Optional[str] = None,
+    max_workers: Optional[int] = None
+) -> Generator[dict[str, Any], None, None]:
+    """
+    Download files using Aria2(c).
+    https://aria2.github.io
+
+    Yields the following download status updates while chunks are downloading:
+
+    - {total: 100} (100% download total)
+    - {completed: 1} (1% download progress out of 100%)
+    - {downloaded: "10.1 MB/s"} (currently downloading at a rate of 10.1 MB/s)
+
+    The data is in the same format accepted by rich's progress.update() function.
+
+    Parameters:
+        urls: Web URL(s) to file(s) to download. You can use a dictionary with the key
+            "url" for the URI, and other keys for extra arguments to use per-URL.
+        output_dir: The folder to save the file into. If the save path's directory does
+            not exist then it will be made automatically.
+        filename: The filename or filename template to use for each file. The variables
+            you can use are `i` for the URL index and `ext` for the URL extension.
+        headers: A mapping of HTTP Header Key/Values to use for all downloads.
+        cookies: A mapping of Cookie Key/Values or a Cookie Jar to use for all downloads.
+        proxy: An optional proxy URI to route connections through for all downloads.
+        max_workers: The maximum amount of threads to use for downloads. Defaults to
+            min(32,(cpu_count+4)). Used for the --max-concurrent-downloads option.
+    """
+    if proxy and not proxy.lower().startswith("http://"):
+        # Only HTTP proxies are supported by aria2(c)
+        proxy = urlparse(proxy)
+
+        port = get_free_port()
+        username, password = get_random_bytes(8).hex(), get_random_bytes(8).hex()
+        local_proxy = f"http://{username}:{password}@localhost:{port}"
+
+        scheme = {
+            "https": "http+ssl",
+            "socks5h": "socks"
+        }.get(proxy.scheme, proxy.scheme)
+
+        remote_server = f"{scheme}://{proxy.hostname}"
+        if proxy.port:
+            remote_server += f":{proxy.port}"
+        if proxy.username or proxy.password:
+            remote_server += "#"
+        if proxy.username:
+            remote_server += proxy.username
+        if proxy.password:
+            remote_server += f":{proxy.password}"
+
+        p = subprocess.Popen(
+            [
+                "pproxy",
+                "-l", f"http://:{port}#{username}:{password}",
+                "-r", remote_server
+            ],
+            stdout=subprocess.DEVNULL,
+            stderr=subprocess.DEVNULL
+        )
+
+        try:
+            yield from download(urls, output_dir, filename, headers, cookies, local_proxy, max_workers)
+        finally:
+            p.kill()
+            p.wait()
+        return
+
+    yield from download(urls, output_dir, filename, headers, cookies, proxy, max_workers)

 __all__ = ("aria2c",)

View File

@@ -1,49 +1,217 @@
+import math
 import time
-from functools import partial
+from concurrent import futures
+from concurrent.futures.thread import ThreadPoolExecutor
+from http.cookiejar import CookieJar
 from pathlib import Path
-from typing import Any, MutableMapping, Optional, Union
+from typing import Any, Generator, MutableMapping, Optional, Union

 from curl_cffi.requests import Session
-from requests.cookies import RequestsCookieJar
 from rich import filesize

 from devine.core.config import config
 from devine.core.constants import DOWNLOAD_CANCELLED
+from devine.core.utilities import get_extension

 MAX_ATTEMPTS = 5
 RETRY_WAIT = 2
-BROWSER = config.curl_impersonate.get("browser", "chrome110")
+CHUNK_SIZE = 1024
+PROGRESS_WINDOW = 5
+BROWSER = config.curl_impersonate.get("browser", "chrome120")


-def curl_impersonate(
-    uri: Union[str, list[str]],
-    out: Path,
-    headers: Optional[dict] = None,
-    cookies: Optional[Union[MutableMapping[str, str], RequestsCookieJar]] = None,
-    proxy: Optional[str] = None,
-    progress: Optional[partial] = None,
-    *_: Any,
-    **__: Any
-) -> int:
+def download(
+    url: str,
+    save_path: Path,
+    session: Optional[Session] = None,
+    **kwargs: Any
+) -> Generator[dict[str, Any], None, None]:
     """
     Download files using Curl Impersonate.
     https://github.com/lwthiker/curl-impersonate

-    If multiple URLs are provided they will be downloaded in the provided order
-    to the output directory. They will not be merged together.
-    """
-    if isinstance(uri, list) and len(uri) == 1:
-        uri = uri[0]
-
-    if isinstance(uri, list):
-        if out.is_file():
-            raise ValueError("Expecting out to be a Directory path not a File as multiple URLs were provided")
-        uri = [
-            (url, out / f"{i:08}.mp4")
-            for i, url in enumerate(uri)
-        ]
-    else:
-        uri = [(uri, out.parent / out.name)]
+    Yields the following download status updates while chunks are downloading:
+
+    - {total: 123} (there are 123 chunks to download)
+    - {total: None} (there are an unknown number of chunks to download)
+    - {advance: 1} (one chunk was downloaded)
+    - {downloaded: "10.1 MB/s"} (currently downloading at a rate of 10.1 MB/s)
+    - {file_downloaded: Path(...), written: 1024} (download finished, has the save path and size)
+
+    The data is in the same format accepted by rich's progress.update() function. The
+    `downloaded` key is custom and is not natively accepted by all rich progress bars.
+
+    Parameters:
+        url: Web URL of a file to download.
+        save_path: The path to save the file to. If the save path's directory does not
+            exist then it will be made automatically.
+        session: The Requests or Curl-Impersonate Session to make HTTP requests with.
+            Useful to set Header, Cookie, and Proxy data. Connections are saved and
+            re-used with the session so long as the server keeps the connection alive.
+        kwargs: Any extra keyword arguments to pass to the session.get() call. Use this
+            for one-time request changes like a header, cookie, or proxy. For example,
+            to request Byte-ranges use e.g., `headers={"Range": "bytes=0-128"}`.
+    """
+    if not session:
+        session = Session(impersonate=BROWSER)
+
+    save_dir = save_path.parent
+    control_file = save_path.with_name(f"{save_path.name}.!dev")
+
+    save_dir.mkdir(parents=True, exist_ok=True)
+
+    if control_file.exists():
+        # consider the file corrupt if the control file exists
+        save_path.unlink(missing_ok=True)
+        control_file.unlink()
+    elif save_path.exists():
+        # if it exists, and no control file, then it should be safe
+        yield dict(
+            file_downloaded=save_path,
+            written=save_path.stat().st_size
+        )
+
+    # TODO: Design a control file format so we know how much of the file is missing
+    control_file.write_bytes(b"")
+
+    attempts = 1
+    try:
+        while True:
+            written = 0
+            download_sizes = []
+            last_speed_refresh = time.time()
+            try:
+                stream = session.get(url, stream=True, **kwargs)
+                stream.raise_for_status()
+
+                try:
+                    content_length = int(stream.headers.get("Content-Length", "0"))
+                except ValueError:
+                    content_length = 0
+
+                if content_length > 0:
+                    yield dict(total=math.ceil(content_length / CHUNK_SIZE))
+                else:
+                    # we have no data to calculate total chunks
+                    yield dict(total=None)  # indeterminate mode
+
+                with open(save_path, "wb") as f:
+                    for chunk in stream.iter_content(chunk_size=CHUNK_SIZE):
+                        download_size = len(chunk)
+                        f.write(chunk)
+                        written += download_size
+                        yield dict(advance=1)
+
+                        now = time.time()
+                        time_since = now - last_speed_refresh
+
+                        download_sizes.append(download_size)
+                        if time_since > PROGRESS_WINDOW or download_size < CHUNK_SIZE:
+                            data_size = sum(download_sizes)
+                            download_speed = math.ceil(data_size / (time_since or 1))
+                            yield dict(downloaded=f"{filesize.decimal(download_speed)}/s")
+                            last_speed_refresh = now
+                            download_sizes.clear()
+
+                yield dict(
+                    file_downloaded=save_path,
+                    written=written
+                )
+                break
+            except Exception as e:
+                save_path.unlink(missing_ok=True)
+                if DOWNLOAD_CANCELLED.is_set() or attempts == MAX_ATTEMPTS:
+                    raise e
+                time.sleep(RETRY_WAIT)
+                attempts += 1
+    finally:
+        control_file.unlink()
+
+
+def curl_impersonate(
+    urls: Union[str, list[str], dict[str, Any], list[dict[str, Any]]],
+    output_dir: Path,
+    filename: str,
+    headers: Optional[MutableMapping[str, Union[str, bytes]]] = None,
+    cookies: Optional[Union[MutableMapping[str, str], CookieJar]] = None,
+    proxy: Optional[str] = None,
+    max_workers: Optional[int] = None
+) -> Generator[dict[str, Any], None, None]:
+    """
+    Download files using Curl Impersonate.
+    https://github.com/lwthiker/curl-impersonate
+
+    Yields the following download status updates while chunks are downloading:
+
+    - {total: 123} (there are 123 chunks to download)
+    - {total: None} (there are an unknown number of chunks to download)
+    - {advance: 1} (one chunk was downloaded)
+    - {downloaded: "10.1 MB/s"} (currently downloading at a rate of 10.1 MB/s)
+    - {file_downloaded: Path(...), written: 1024} (download finished, has the save path and size)
+
+    The data is in the same format accepted by rich's progress.update() function.
+    However, The `downloaded`, `file_downloaded` and `written` keys are custom and not
+    natively accepted by rich progress bars.
+
+    Parameters:
+        urls: Web URL(s) to file(s) to download. You can use a dictionary with the key
+            "url" for the URI, and other keys for extra arguments to use per-URL.
+        output_dir: The folder to save the file into. If the save path's directory does
+            not exist then it will be made automatically.
+        filename: The filename or filename template to use for each file. The variables
+            you can use are `i` for the URL index and `ext` for the URL extension.
+        headers: A mapping of HTTP Header Key/Values to use for all downloads.
+        cookies: A mapping of Cookie Key/Values or a Cookie Jar to use for all downloads.
+        proxy: An optional proxy URI to route connections through for all downloads.
+        max_workers: The maximum amount of threads to use for downloads. Defaults to
+            min(32,(cpu_count+4)).
+    """
+    if not urls:
+        raise ValueError("urls must be provided and not empty")
+    elif not isinstance(urls, (str, dict, list)):
+        raise TypeError(f"Expected urls to be {str} or {dict} or a list of one of them, not {type(urls)}")
+    if not output_dir:
+        raise ValueError("output_dir must be provided")
+    elif not isinstance(output_dir, Path):
+        raise TypeError(f"Expected output_dir to be {Path}, not {type(output_dir)}")
+    if not filename:
+        raise ValueError("filename must be provided")
+    elif not isinstance(filename, str):
+        raise TypeError(f"Expected filename to be {str}, not {type(filename)}")
+    if not isinstance(headers, (MutableMapping, type(None))):
+        raise TypeError(f"Expected headers to be {MutableMapping}, not {type(headers)}")
+    if not isinstance(cookies, (MutableMapping, CookieJar, type(None))):
+        raise TypeError(f"Expected cookies to be {MutableMapping} or {CookieJar}, not {type(cookies)}")
+    if not isinstance(proxy, (str, type(None))):
+        raise TypeError(f"Expected proxy to be {str}, not {type(proxy)}")
+    if not isinstance(max_workers, (int, type(None))):
+        raise TypeError(f"Expected max_workers to be {int}, not {type(max_workers)}")
+
+    if not isinstance(urls, list):
+        urls = [urls]
+
+    urls = [
+        dict(
+            save_path=save_path,
+            **url
+        ) if isinstance(url, dict) else dict(
+            url=url,
+            save_path=save_path
+        )
+        for i, url in enumerate(urls)
+        for save_path in [output_dir / filename.format(
+            i=i,
+            ext=get_extension(url["url"] if isinstance(url, dict) else url)
+        )]
+    ]

     session = Session(impersonate=BROWSER)
     if headers:
@@ -57,49 +225,65 @@ def curl_impersonate(
         session.cookies.update(cookies)
     if proxy:
         session.proxies.update({
-            "http": proxy,
-            "https": proxy
+            "http": proxy.replace("https://", "http://"),
+            "https": proxy.replace("https://", "http://")
         })

-    if progress:
-        progress(total=len(uri))
+    yield dict(total=len(urls))

     download_sizes = []
     last_speed_refresh = time.time()

-    for url, out_path in uri:
-        out_path.parent.mkdir(parents=True, exist_ok=True)
-        attempts = 1
-        while True:
-            try:
-                stream = session.get(url, stream=True)
-                stream.raise_for_status()
-                with open(out_path, "wb") as f:
-                    written = 0
-                    for chunk in stream.iter_content(chunk_size=1024):
-                        download_size = len(chunk)
-                        f.write(chunk)
-                        written += download_size
-                        if progress:
-                            progress(advance=1)
-
-                        now = time.time()
-                        time_since = now - last_speed_refresh
-
-                        download_sizes.append(download_size)
-                        if time_since > 5 or download_size < 1024:
-                            data_size = sum(download_sizes)
-                            download_speed = data_size / (time_since or 1)
-                            progress(downloaded=f"{filesize.decimal(download_speed)}/s")
-                            last_speed_refresh = now
-                            download_sizes.clear()
-                break
-            except Exception as e:
-                if DOWNLOAD_CANCELLED.is_set() or attempts == MAX_ATTEMPTS:
-                    raise e
-                time.sleep(RETRY_WAIT)
-                attempts += 1
-
-    return 0
+    with ThreadPoolExecutor(max_workers=max_workers) as pool:
+        for i, future in enumerate(futures.as_completed((
+            pool.submit(
+                download,
+                session=session,
+                **url
+            )
+            for url in urls
+        ))):
+            file_path, download_size = None, None
+            try:
+                for status_update in future.result():
+                    if status_update.get("file_downloaded") and status_update.get("written"):
+                        file_path = status_update["file_downloaded"]
+                        download_size = status_update["written"]
+                    elif len(urls) == 1:
+                        # these are per-chunk updates, only useful if it's one big file
+                        yield status_update
+            except KeyboardInterrupt:
+                DOWNLOAD_CANCELLED.set()  # skip pending track downloads
+                yield dict(downloaded="[yellow]CANCELLING")
+                pool.shutdown(wait=True, cancel_futures=True)
+                yield dict(downloaded="[yellow]CANCELLED")
+                # tell dl that it was cancelled
+                # the pool is already shut down, so exiting loop is fine
+                raise
+            except Exception:
+                DOWNLOAD_CANCELLED.set()  # skip pending track downloads
+                yield dict(downloaded="[red]FAILING")
+                pool.shutdown(wait=True, cancel_futures=True)
+                yield dict(downloaded="[red]FAILED")
+                # tell dl that it failed
+                # the pool is already shut down, so exiting loop is fine
+                raise
+            else:
+                yield dict(file_downloaded=file_path)
+                yield dict(advance=1)
+
+                now = time.time()
+                time_since = now - last_speed_refresh
+
+                if download_size:  # no size == skipped dl
+                    download_sizes.append(download_size)
+
+                if download_sizes and (time_since > PROGRESS_WINDOW or i == len(urls)):
+                    data_size = sum(download_sizes)
+                    download_speed = math.ceil(data_size / (time_since or 1))
+                    yield dict(downloaded=f"{filesize.decimal(download_speed)}/s")
+                    last_speed_refresh = now
+                    download_sizes.clear()

 __all__ = ("curl_impersonate",)

View File

@@ -1,50 +1,228 @@
 import math
+import os
 import time
-from functools import partial
+from concurrent import futures
+from concurrent.futures.thread import ThreadPoolExecutor
+from http.cookiejar import CookieJar
 from pathlib import Path
-from typing import Any, MutableMapping, Optional, Union
+from typing import Any, Generator, MutableMapping, Optional, Union

 from requests import Session
-from requests.cookies import RequestsCookieJar
+from requests.adapters import HTTPAdapter
 from rich import filesize

 from devine.core.constants import DOWNLOAD_CANCELLED
+from devine.core.utilities import get_extension

 MAX_ATTEMPTS = 5
 RETRY_WAIT = 2
+CHUNK_SIZE = 1024
+PROGRESS_WINDOW = 5
+
+
+def download(
+    url: str,
+    save_path: Path,
+    session: Optional[Session] = None,
+    **kwargs: Any
+) -> Generator[dict[str, Any], None, None]:
+    """
+    Download a file using Python Requests.
+    https://requests.readthedocs.io
+
+    Yields the following download status updates while chunks are downloading:
+
+    - {total: 123} (there are 123 chunks to download)
+    - {total: None} (there are an unknown number of chunks to download)
+    - {advance: 1} (one chunk was downloaded)
+    - {downloaded: "10.1 MB/s"} (currently downloading at a rate of 10.1 MB/s)
+    - {file_downloaded: Path(...), written: 1024} (download finished, has the save path and size)
+
+    The data is in the same format accepted by rich's progress.update() function. The
+    `downloaded` key is custom and is not natively accepted by all rich progress bars.
+
+    Parameters:
+        url: Web URL of a file to download.
+        save_path: The path to save the file to. If the save path's directory does not
+            exist then it will be made automatically.
+        session: The Requests Session to make HTTP requests with. Useful to set Header,
+            Cookie, and Proxy data. Connections are saved and re-used with the session
+            so long as the server keeps the connection alive.
+        kwargs: Any extra keyword arguments to pass to the session.get() call. Use this
+            for one-time request changes like a header, cookie, or proxy. For example,
+            to request Byte-ranges use e.g., `headers={"Range": "bytes=0-128"}`.
+    """
+    session = session or Session()
+
+    save_dir = save_path.parent
+    control_file = save_path.with_name(f"{save_path.name}.!dev")
+
+    save_dir.mkdir(parents=True, exist_ok=True)
+
+    if control_file.exists():
+        # consider the file corrupt if the control file exists
+        save_path.unlink(missing_ok=True)
+        control_file.unlink()
+    elif save_path.exists():
+        # if it exists, and no control file, then it should be safe
+        yield dict(
+            file_downloaded=save_path,
+            written=save_path.stat().st_size
+        )
+
+    # TODO: Design a control file format so we know how much of the file is missing
+    control_file.write_bytes(b"")
+
+    attempts = 1
+    try:
+        while True:
+            written = 0
+            download_sizes = []
+            last_speed_refresh = time.time()
+            try:
+                stream = session.get(url, stream=True, **kwargs)
+                stream.raise_for_status()
+
+                try:
+                    content_length = int(stream.headers.get("Content-Length", "0"))
+                except ValueError:
+                    content_length = 0
+
+                if content_length > 0:
+                    yield dict(total=math.ceil(content_length / CHUNK_SIZE))
+                else:
+                    # we have no data to calculate total chunks
+                    yield dict(total=None)  # indeterminate mode
+
+                with open(save_path, "wb") as f:
+                    for chunk in stream.iter_content(chunk_size=CHUNK_SIZE):
+                        download_size = len(chunk)
+                        f.write(chunk)
+                        written += download_size
+                        yield dict(advance=1)
+
+                        now = time.time()
+                        time_since = now - last_speed_refresh
+
+                        download_sizes.append(download_size)
+                        if time_since > PROGRESS_WINDOW or download_size < CHUNK_SIZE:
+                            data_size = sum(download_sizes)
+                            download_speed = math.ceil(data_size / (time_since or 1))
+                            yield dict(downloaded=f"{filesize.decimal(download_speed)}/s")
+                            last_speed_refresh = now
+                            download_sizes.clear()
+
+                yield dict(
+                    file_downloaded=save_path,
+                    written=written
+                )
+                break
+            except Exception as e:
+                save_path.unlink(missing_ok=True)
+                if DOWNLOAD_CANCELLED.is_set() or attempts == MAX_ATTEMPTS:
+                    raise e
+                time.sleep(RETRY_WAIT)
+                attempts += 1
+    finally:
+        control_file.unlink()


 def requests(
-    uri: Union[str, list[str]],
-    out: Path,
-    headers: Optional[dict] = None,
-    cookies: Optional[Union[MutableMapping[str, str], RequestsCookieJar]] = None,
+    urls: Union[str, list[str], dict[str, Any], list[dict[str, Any]]],
+    output_dir: Path,
+    filename: str,
+    headers: Optional[MutableMapping[str, Union[str, bytes]]] = None,
+    cookies: Optional[Union[MutableMapping[str, str], CookieJar]] = None,
     proxy: Optional[str] = None,
-    progress: Optional[partial] = None,
-    *_: Any,
-    **__: Any
-) -> int:
+    max_workers: Optional[int] = None
+) -> Generator[dict[str, Any], None, None]:
     """
-    Download files using Python Requests.
+    Download a file using Python Requests.
     https://requests.readthedocs.io

-    If multiple URLs are provided they will be downloaded in the provided order
-    to the output directory. They will not be merged together.
-    """
-    if isinstance(uri, list) and len(uri) == 1:
-        uri = uri[0]
-
-    if isinstance(uri, list):
-        if out.is_file():
-            raise ValueError("Expecting out to be a Directory path not a File as multiple URLs were provided")
-        uri = [
-            (url, out / f"{i:08}.mp4")
-            for i, url in enumerate(uri)
-        ]
-    else:
-        uri = [(uri, out.parent / out.name)]
+    Yields the following download status updates while chunks are downloading:
+
+    - {total: 123} (there are 123 chunks to download)
+    - {total: None} (there are an unknown number of chunks to download)
+    - {advance: 1} (one chunk was downloaded)
+    - {downloaded: "10.1 MB/s"} (currently downloading at a rate of 10.1 MB/s)
+    - {file_downloaded: Path(...), written: 1024} (download finished, has the save path and size)
+
+    The data is in the same format accepted by rich's progress.update() function.
+    However, The `downloaded`, `file_downloaded` and `written` keys are custom and not
+    natively accepted by rich progress bars.
+
+    Parameters:
+        urls: Web URL(s) to file(s) to download. You can use a dictionary with the key
+            "url" for the URI, and other keys for extra arguments to use per-URL.
+        output_dir: The folder to save the file into. If the save path's directory does
+            not exist then it will be made automatically.
+        filename: The filename or filename template to use for each file. The variables
+            you can use are `i` for the URL index and `ext` for the URL extension.
+        headers: A mapping of HTTP Header Key/Values to use for all downloads.
+        cookies: A mapping of Cookie Key/Values or a Cookie Jar to use for all downloads.
+        proxy: An optional proxy URI to route connections through for all downloads.
+        max_workers: The maximum amount of threads to use for downloads. Defaults to
+            min(32,(cpu_count+4)).
+    """
+    if not urls:
+        raise ValueError("urls must be provided and not empty")
+    elif not isinstance(urls, (str, dict, list)):
+        raise TypeError(f"Expected urls to be {str} or {dict} or a list of one of them, not {type(urls)}")
+    if not output_dir:
+        raise ValueError("output_dir must be provided")
+    elif not isinstance(output_dir, Path):
+        raise TypeError(f"Expected output_dir to be {Path}, not {type(output_dir)}")
+    if not filename:
+        raise ValueError("filename must be provided")
+    elif not isinstance(filename, str):
+        raise TypeError(f"Expected filename to be {str}, not {type(filename)}")
+    if not isinstance(headers, (MutableMapping, type(None))):
+        raise TypeError(f"Expected headers to be {MutableMapping}, not {type(headers)}")
+    if not isinstance(cookies, (MutableMapping, CookieJar, type(None))):
+        raise TypeError(f"Expected cookies to be {MutableMapping} or {CookieJar}, not {type(cookies)}")
+    if not isinstance(proxy, (str, type(None))):
+        raise TypeError(f"Expected proxy to be {str}, not {type(proxy)}")
+    if not isinstance(max_workers, (int, type(None))):
+        raise TypeError(f"Expected max_workers to be {int}, not {type(max_workers)}")
+
+    if not isinstance(urls, list):
+        urls = [urls]
+
+    if not max_workers:
+        max_workers = min(32, (os.cpu_count() or 1) + 4)
+
+    urls = [
+        dict(
+            save_path=save_path,
+            **url
+        ) if isinstance(url, dict) else dict(
+            url=url,
+            save_path=save_path
+        )
+        for i, url in enumerate(urls)
+        for save_path in [output_dir / filename.format(
+            i=i,
+            ext=get_extension(url["url"] if isinstance(url, dict) else url)
+        )]
+    ]

     session = Session()
+    session.mount("https://", HTTPAdapter(
+        pool_connections=max_workers,
+        pool_maxsize=max_workers,
+        pool_block=True
+    ))
+    session.mount("http://", session.adapters["https://"])
     if headers:
         headers = {
             k: v
@@ -57,53 +235,61 @@ def requests(
     if proxy:
         session.proxies.update({"all": proxy})

-    if progress:
-        progress(total=len(uri))
+    yield dict(total=len(urls))

     download_sizes = []
     last_speed_refresh = time.time()

-    for url, out_path in uri:
-        out_path.parent.mkdir(parents=True, exist_ok=True)
-        attempts = 1
-
-        while True:
+    with ThreadPoolExecutor(max_workers=max_workers) as pool:
+        for i, future in enumerate(futures.as_completed((
+            pool.submit(
+                download,
+                session=session,
+                **url
+            )
+            for url in urls
+        ))):
+            file_path, download_size = None, None
             try:
-                stream = session.get(url, stream=True)
-                stream.raise_for_status()
-
-                if len(uri) == 1 and progress:
-                    content_length = int(stream.headers.get("Content-Length", "0"))
-                    if content_length > 0:
-                        progress(total=math.ceil(content_length / 1024))
-
-                with open(out_path, "wb") as f:
-                    written = 0
-                    for chunk in stream.iter_content(chunk_size=1024):
-                        download_size = len(chunk)
-                        f.write(chunk)
-                        written += download_size
-                        if progress:
-                            progress(advance=1)
-
-                        now = time.time()
-                        time_since = now - last_speed_refresh
-
-                        download_sizes.append(download_size)
-                        if time_since > 5 or download_size < 1024:
-                            data_size = sum(download_sizes)
-                            download_speed = data_size / (time_since or 1)
-                            progress(downloaded=f"{filesize.decimal(download_speed)}/s")
-                            last_speed_refresh = now
-                            download_sizes.clear()
-                break
-            except Exception as e:
-                if DOWNLOAD_CANCELLED.is_set() or attempts == MAX_ATTEMPTS:
-                    raise e
-                time.sleep(RETRY_WAIT)
-                attempts += 1
-
-    return 0
+                for status_update in future.result():
+                    if status_update.get("file_downloaded") and status_update.get("written"):
+                        file_path = status_update["file_downloaded"]
+                        download_size = status_update["written"]
+                    elif len(urls) == 1:
+                        # these are per-chunk updates, only useful if it's one big file
+                        yield status_update
+            except KeyboardInterrupt:
+                DOWNLOAD_CANCELLED.set()  # skip pending track downloads
+                yield dict(downloaded="[yellow]CANCELLING")
+                pool.shutdown(wait=True, cancel_futures=True)
+                yield dict(downloaded="[yellow]CANCELLED")
+                # tell dl that it was cancelled
+                # the pool is already shut down, so exiting loop is fine
+                raise
+            except Exception:
+                DOWNLOAD_CANCELLED.set()  # skip pending track downloads
+                yield dict(downloaded="[red]FAILING")
+                pool.shutdown(wait=True, cancel_futures=True)
+                yield dict(downloaded="[red]FAILED")
+                # tell dl that it failed
+                # the pool is already shut down, so exiting loop is fine
+                raise
+            else:
+                yield dict(file_downloaded=file_path, written=download_size)
+                yield dict(advance=1)
+
+                now = time.time()
+                time_since = now - last_speed_refresh
+
+                if download_size:  # no size == skipped dl
+                    download_sizes.append(download_size)
+
+                if download_sizes and (time_since > PROGRESS_WINDOW or i == len(urls)):
+                    data_size = sum(download_sizes)
+                    download_speed = math.ceil(data_size / (time_since or 1))
+                    yield dict(downloaded=f"{filesize.decimal(download_speed)}/s")
+                    last_speed_refresh = now
+                    download_sizes.clear()

 __all__ = ("requests",)

View File

@@ -6,10 +6,10 @@ from pathlib import Path
 from typing import Optional, Union
 from urllib.parse import urljoin

-import requests
 from Cryptodome.Cipher import AES
 from Cryptodome.Util.Padding import pad, unpad
 from m3u8.model import Key
+from requests import Session


 class ClearKey:
@@ -58,14 +58,33 @@ class ClearKey:
         shutil.move(decrypted_path, path)

     @classmethod
-    def from_m3u_key(cls, m3u_key: Key, proxy: Optional[str] = None) -> ClearKey:
+    def from_m3u_key(cls, m3u_key: Key, session: Optional[Session] = None) -> ClearKey:
+        """
+        Load a ClearKey from an M3U(8) Playlist's EXT-X-KEY.
+
+        Parameters:
+            m3u_key: A Key object parsed from a m3u(8) playlist using
+                the `m3u8` library.
+            session: Optional session used to request external URIs with.
+                Useful to set headers, proxies, cookies, and so forth.
+        """
         if not isinstance(m3u_key, Key):
             raise ValueError(f"Provided M3U Key is in an unexpected type {m3u_key!r}")
+        if not isinstance(session, (Session, type(None))):
+            raise TypeError(f"Expected session to be a {Session}, not a {type(session)}")
         if not m3u_key.method.startswith("AES"):
             raise ValueError(f"Provided M3U Key is not an AES Clear Key, {m3u_key.method}")
         if not m3u_key.uri:
             raise ValueError("No URI in M3U Key, unable to get Key.")

+        if not session:
+            session = Session()
+
+        if not session.headers.get("User-Agent"):
+            # commonly needed default for HLS playlists
+            session.headers["User-Agent"] = "smartexoplayer/1.1.0 (Linux;Android 8.0.0) ExoPlayerLib/2.13.3"
+
         if m3u_key.uri.startswith("data:"):
             media_types, data = m3u_key.uri[5:].split(",")
             media_types = media_types.split(";")
@@ -74,13 +93,7 @@ class ClearKey:
             key = data
         else:
             url = urljoin(m3u_key.base_uri, m3u_key.uri)
-            res = requests.get(
-                url=url,
-                headers={
-                    "User-Agent": "smartexoplayer/1.1.0 (Linux;Android 8.0.0) ExoPlayerLib/2.13.3"
-                },
-                proxies={"all": proxy} if proxy else None
-            )
+            res = session.get(url)
             res.raise_for_status()
             if not res.content:
                 raise EOFError("Unexpected Empty Response by M3U Key URI.")

View File

@@ -78,7 +78,7 @@ class Widevine:
         pssh_boxes: list[Container] = []
         tenc_boxes: list[Container] = []

-        if track.descriptor == track.Descriptor.M3U:
+        if track.descriptor == track.Descriptor.HLS:
             m3u_url = track.url
             master = m3u8.loads(session.get(m3u_url).text, uri=m3u_url)
             pssh_boxes.extend(
@@ -224,7 +224,7 @@ class Widevine:
             raise ValueError("Cannot decrypt a Track without any Content Keys...")

         platform = {"win32": "win", "darwin": "osx"}.get(sys.platform, sys.platform)
-        executable = get_binary_path("shaka-packager", f"packager-{platform}", f"packager-{platform}-x64")
+        executable = get_binary_path("shaka-packager", "packager", f"packager-{platform}", f"packager-{platform}-x64")
         if not executable:
             raise EnvironmentError("Shaka Packager executable not found but is required.")
         if not path or not path.exists():

View File

@@ -6,28 +6,22 @@ import logging
 import math
 import re
 import sys
-import time
-from concurrent import futures
-from concurrent.futures import ThreadPoolExecutor
 from copy import copy
 from functools import partial
-from hashlib import md5
 from pathlib import Path
-from typing import Any, Callable, MutableMapping, Optional, Union
+from typing import Any, Callable, Optional, Union
 from urllib.parse import urljoin, urlparse
 from uuid import UUID
+from zlib import crc32

 import requests
 from langcodes import Language, tag_is_valid
-from lxml.etree import Element
+from lxml.etree import Element, ElementTree
 from pywidevine.cdm import Cdm as WidevineCdm
 from pywidevine.pssh import PSSH
 from requests import Session
-from requests.cookies import RequestsCookieJar
-from rich import filesize

-from devine.core.constants import DOWNLOAD_CANCELLED, AnyTrack
-from devine.core.downloaders import downloader
+from devine.core.constants import DOWNLOAD_CANCELLED, DOWNLOAD_LICENCE_ONLY, AnyTrack
 from devine.core.downloaders import requests as requests_downloader
 from devine.core.drm import Widevine
 from devine.core.tracks import Audio, Subtitle, Tracks, Video
@@ -119,6 +113,7 @@ class DASH:
             for rep in adaptation_set.findall("Representation"):
                 get = partial(self._get, adaptation_set=adaptation_set, representation=rep)
                 findall = partial(self._findall, adaptation_set=adaptation_set, representation=rep, both=True)
+                segment_base = rep.find("SegmentBase")

                 codecs = get("codecs")
                 content_type = get("contentType")
@@ -146,6 +141,10 @@ class DASH:
                 if content_type == "video":
                     track_type = Video
                     track_codec = Video.Codec.from_codecs(codecs)
+                    track_fps = get("frameRate")
+                    if not track_fps and segment_base is not None:
+                        track_fps = segment_base.get("timescale")
+
                     track_args = dict(
                         range_=self.get_video_range(
                             codecs,
@@ -155,7 +154,7 @@ class DASH:
                         bitrate=get("bandwidth") or None,
                         width=get("width") or 0,
                         height=get("height") or 0,
-                        fps=get("frameRate") or (rep.find("SegmentBase") or {}).get("timescale") or None
+                        fps=track_fps or None
                     )
                 elif content_type == "audio":
                     track_type = Audio
@@ -173,8 +172,9 @@ class DASH:
                     track_type = Subtitle
                     track_codec = Subtitle.Codec.from_codecs(codecs or "vtt")
                     track_args = dict(
-                        forced=self.is_forced(adaptation_set),
-                        cc=self.is_closed_caption(adaptation_set)
+                        cc=self.is_closed_caption(adaptation_set),
+                        sdh=self.is_sdh(adaptation_set),
+                        forced=self.is_forced(adaptation_set)
                     )
                 elif content_type == "image":
                     # we don't want what's likely thumbnails for the seekbar
@@ -195,23 +195,30 @@ class DASH:
                 # a good and actually unique track ID, sometimes because of the lang
                 # dialect not being represented in the id, or the bitrate, or such.
                 # this combines all of them as one and hashes it to keep it small(ish).
-                track_id = md5("{codec}-{lang}-{bitrate}-{base_url}-{ids}-{track_args}".format(
+                track_id = hex(crc32("{codec}-{lang}-{bitrate}-{base_url}-{ids}-{track_args}".format(
                     codec=codecs,
                     lang=track_lang,
                     bitrate=get("bitrate"),
                     base_url=(rep.findtext("BaseURL") or "").split("?")[0],
                     ids=[get("audioTrackId"), get("id"), period.get("id")],
                     track_args=track_args
-                ).encode()).hexdigest()
+                ).encode()))[2:]

                 tracks.add(track_type(
                     id_=track_id,
-                    url=(self.url, self.manifest, rep, adaptation_set, period),
+                    url=self.url,
                     codec=track_codec,
                     language=track_lang,
                     is_original_lang=language and is_close_match(track_lang, [language]),
-                    descriptor=Video.Descriptor.MPD,
-                    extra=(rep, adaptation_set),
+                    descriptor=Video.Descriptor.DASH,
+                    data={
+                        "dash": {
+                            "manifest": self.manifest,
+                            "period": period,
+                            "adaptation_set": adaptation_set,
+                            "representation": rep
+                        }
+                    },
                     **track_args
                 ))
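The md5-to-crc32 switch keeps the ID short while still folding in every distinguishing field; the construction in isolation, with illustrative values throughout:

    from zlib import crc32

    track_id = hex(crc32("{codec}-{lang}-{bitrate}-{base_url}-{ids}-{track_args}".format(
        codec="avc1.640028",  # illustrative values, not from any real manifest
        lang="en",
        bitrate="4800000",
        base_url="https://example.com/video.mp4",
        ids=["a1", "rep-1", "p0"],
        track_args={}
    ).encode()))[2:]  # short hex string, e.g. "9bd7f5c1", without the "0x" prefix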
@@ -242,18 +249,21 @@ class DASH:
         log = logging.getLogger("DASH")

-        manifest_url, manifest, representation, adaptation_set, period = track.url
+        manifest: ElementTree = track.data["dash"]["manifest"]
+        period: Element = track.data["dash"]["period"]
+        adaptation_set: Element = track.data["dash"]["adaptation_set"]
+        representation: Element = track.data["dash"]["representation"]

         track.drm = DASH.get_drm(
             representation.findall("ContentProtection") +
             adaptation_set.findall("ContentProtection")
         )

-        manifest_url_query = urlparse(manifest_url).query
-
         manifest_base_url = manifest.findtext("BaseURL")
-        if not manifest_base_url or not re.match("^https?://", manifest_base_url, re.IGNORECASE):
-            manifest_base_url = urljoin(manifest_url, "./", manifest_base_url)
+        if not manifest_base_url:
+            manifest_base_url = track.url
+        elif not re.match("^https?://", manifest_base_url, re.IGNORECASE):
+            manifest_base_url = urljoin(track.url, f"./{manifest_base_url}")
         period_base_url = urljoin(manifest_base_url, period.findtext("BaseURL"))
         rep_base_url = urljoin(period_base_url, representation.findtext("BaseURL"))
@@ -268,13 +278,10 @@ class DASH:
         if segment_list is None:
             segment_list = adaptation_set.find("SegmentList")

-        if segment_template is None and segment_list is None and rep_base_url:
-            # If there's no SegmentTemplate and no SegmentList, then SegmentBase is used or just BaseURL
-            # Regardless which of the two is used, we can just directly grab the BaseURL
-            # Players would normally calculate segments via Byte-Ranges, but we don't care
-            track.url = rep_base_url
-            track.descriptor = track.Descriptor.URL
-        else:
+        segment_base = representation.find("SegmentBase")
+        if segment_base is None:
+            segment_base = adaptation_set.find("SegmentBase")
+
         segments: list[tuple[str, Optional[str]]] = []
         track_kid: Optional[UUID] = None
@@ -291,7 +298,9 @@ class DASH:
                 if not rep_base_url:
                     raise ValueError("Resolved Segment URL is not absolute, and no Base URL is available.")
                 value = urljoin(rep_base_url, value)
-                if not urlparse(value).query and manifest_url_query:
+                if not urlparse(value).query:
+                    manifest_url_query = urlparse(track.url).query
+                    if manifest_url_query:
                         value += f"?{manifest_url_query}"
                 segment_template.set(item, value)
@@ -350,8 +359,10 @@ class DASH:
             initialization = segment_list.find("Initialization")
             if initialization is not None:
                 source_url = initialization.get("sourceURL")
-                if source_url is None:
+                if not source_url:
                     source_url = rep_base_url
+                elif not re.match("^https?://", source_url, re.IGNORECASE):
+                    source_url = urljoin(rep_base_url, f"./{source_url}")

                 if initialization.get("range"):
                     init_range_header = {"Range": f"bytes={initialization.get('range')}"}
@@ -366,16 +377,45 @@ class DASH:
             segment_urls = segment_list.findall("SegmentURL")
             for segment_url in segment_urls:
                 media_url = segment_url.get("media")
-                if media_url is None:
+                if not media_url:
                     media_url = rep_base_url
+                elif not re.match("^https?://", media_url, re.IGNORECASE):
+                    media_url = urljoin(rep_base_url, f"./{media_url}")

                 segments.append((
                     media_url,
                     segment_url.get("mediaRange")
                 ))
+        elif segment_base is not None:
+            media_range = None
+            init_data = None
+            initialization = segment_base.find("Initialization")
+            if initialization is not None:
+                if initialization.get("range"):
+                    init_range_header = {"Range": f"bytes={initialization.get('range')}"}
+                else:
+                    init_range_header = None
+
+                res = session.get(url=rep_base_url, headers=init_range_header)
+                res.raise_for_status()
+                init_data = res.content
+                track_kid = track.get_key_id(init_data)
+
+                total_size = res.headers.get("Content-Range", "").split("/")[-1]
+                if total_size:
+                    media_range = f"{len(init_data)}-{total_size}"
+
+            segments.append((
+                rep_base_url,
+                media_range
+            ))
+        elif rep_base_url:
+            segments.append((
+                rep_base_url,
+                None
+            ))
         else:
             log.error("Could not find a way to get segments from this MPD manifest.")
-            log.debug(manifest_url)
+            log.debug(track.url)
             sys.exit(1)

         if not track.drm and isinstance(track, (Video, Audio)):
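The new SegmentBase branch derives the remaining byte range from the init request's Content-Range header; the same trick in isolation, with a placeholder URL:

    import requests

    res = requests.get(
        "https://example.com/stream.mp4",  # placeholder
        headers={"Range": "bytes=0-1023"}  # init segment range from the manifest
    )
    res.raise_for_status()
    init_data = res.content

    # "Content-Range: bytes 0-1023/4194304" -> total size "4194304"
    total_size = res.headers.get("Content-Range", "").split("/")[-1]
    if total_size:
        media_range = f"{len(init_data)}-{total_size}"  # the rest of the file as one segment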
@@ -405,64 +445,58 @@ class DASH:
         else:
             drm = None

+        if DOWNLOAD_LICENCE_ONLY.is_set():
+            progress(downloaded="[yellow]SKIPPED")
+            return
+
         progress(total=len(segments))

-        download_sizes = []
-        download_speed_window = 5
-        last_speed_refresh = time.time()
-
-        with ThreadPoolExecutor(max_workers=16) as pool:
-            for i, download in enumerate(futures.as_completed((
-                pool.submit(
-                    DASH.download_segment,
-                    url=url,
-                    out_path=(save_dir / str(n).zfill(len(str(len(segments))))).with_suffix(".mp4"),
-                    track=track,
-                    proxy=proxy,
-                    headers=session.headers,
-                    cookies=session.cookies,
-                    bytes_range=bytes_range
-                )
-                for n, (url, bytes_range) in enumerate(segments)
-            ))):
-                try:
-                    download_size = download.result()
-                except KeyboardInterrupt:
-                    DOWNLOAD_CANCELLED.set()  # skip pending track downloads
-                    progress(downloaded="[yellow]CANCELLING")
-                    pool.shutdown(wait=True, cancel_futures=True)
-                    progress(downloaded="[yellow]CANCELLED")
-                    # tell dl that it was cancelled
-                    # the pool is already shut down, so exiting loop is fine
-                    raise
-                except Exception:
-                    DOWNLOAD_CANCELLED.set()  # skip pending track downloads
-                    progress(downloaded="[red]FAILING")
-                    pool.shutdown(wait=True, cancel_futures=True)
-                    progress(downloaded="[red]FAILED")
-                    # tell dl that it failed
-                    # the pool is already shut down, so exiting loop is fine
-                    raise
-                else:
-                    progress(advance=1)
-
-                    now = time.time()
-                    time_since = now - last_speed_refresh
-
-                    if download_size:  # no size == skipped dl
-                        download_sizes.append(download_size)
-
-                    if download_sizes and (time_since > download_speed_window or i == len(segments)):
-                        data_size = sum(download_sizes)
-                        download_speed = data_size / (time_since or 1)
-                        progress(downloaded=f"DASH {filesize.decimal(download_speed)}/s")
-                        last_speed_refresh = now
-                        download_sizes.clear()
+        downloader = track.downloader
+        if downloader.__name__ == "aria2c" and any(bytes_range is not None for url, bytes_range in segments):
+            # aria2(c) is shit and doesn't support the Range header, fallback to the requests downloader
+            downloader = requests_downloader
+
+        for status_update in downloader(
+            urls=[
+                {
+                    "url": url,
+                    "headers": {
+                        "Range": f"bytes={bytes_range}"
+                    } if bytes_range else {}
+                }
+                for url, bytes_range in segments
+            ],
+            output_dir=save_dir,
+            filename="{i:0%d}.mp4" % (len(str(len(segments)))),
+            headers=session.headers,
+            cookies=session.cookies,
+            proxy=proxy,
+            max_workers=16
+        ):
+            file_downloaded = status_update.get("file_downloaded")
+            if file_downloaded and callable(track.OnSegmentDownloaded):
+                track.OnSegmentDownloaded(file_downloaded)
+            else:
+                downloaded = status_update.get("downloaded")
+                if downloaded and downloaded.endswith("/s"):
+                    status_update["downloaded"] = f"DASH {downloaded}"
+                progress(**status_update)
+
+        # see https://github.com/devine-dl/devine/issues/71
+        for control_file in save_dir.glob("*.aria2__temp"):
+            control_file.unlink()
+
+        segments_to_merge = [
+            x
+            for x in sorted(save_dir.iterdir())
+            if x.is_file()
+        ]
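Note: the per-segment thread pool is replaced by downloader generators that yield keyword dicts for the Rich progress bar, plus a `file_downloaded` event per completed file. A toy generator showing the protocol this loop appears to consume (my reading of the diff, not devine's actual downloader implementation):

    from pathlib import Path
    import requests

    def toy_downloader(urls: list[dict], output_dir: Path, filename: str, **kwargs):
        # Yield progress updates as dicts; yield "file_downloaded" per finished file.
        output_dir.mkdir(parents=True, exist_ok=True)
        yield {"total": len(urls)}
        for i, url in enumerate(urls):
            res = requests.get(url["url"], headers=url.get("headers") or {})
            res.raise_for_status()
            out_path = output_dir / filename.format(i=i, ext=".mp4")
            out_path.write_bytes(res.content)
            yield {"advance": 1}
            yield {"file_downloaded": out_path}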
         with open(save_path, "wb") as f:
             if init_data:
                 f.write(init_data)
-            for segment_file in sorted(save_dir.iterdir()):
+            if len(segments_to_merge) > 1:
+                progress(downloaded="Merging", completed=0, total=len(segments_to_merge))
+            for segment_file in segments_to_merge:
                 segment_data = segment_file.read_bytes()
                 # TODO: fix encoding after decryption?
                 if (
@@ -470,86 +504,31 @@ class DASH:
                     track.codec not in (Subtitle.Codec.fVTT, Subtitle.Codec.fTTML)
                 ):
                     segment_data = try_ensure_utf8(segment_data)
-                    segment_data = html.unescape(segment_data.decode("utf8")).encode("utf8")
+                    segment_data = segment_data.decode("utf8"). \
+                        replace("&lrm;", html.unescape("&lrm;")). \
+                        replace("&rlm;", html.unescape("&rlm;")). \
+                        encode("utf8")
                 f.write(segment_data)
+                f.flush()
                 segment_file.unlink()
+                progress(advance=1)
+
+        track.path = save_path
+        if callable(track.OnDownloaded):
+            track.OnDownloaded()

         if drm:
             progress(downloaded="Decrypting", completed=0, total=100)
             drm.decrypt(save_path)
             track.drm = None
             if callable(track.OnDecrypted):
-                track.OnDecrypted(track)
-            progress(downloaded="Decrypted", completed=100)
+                track.OnDecrypted(drm)
+            progress(downloaded="Decrypting", advance=100)

-        track.path = save_path
         save_dir.rmdir()

         progress(downloaded="Downloaded")
-    @staticmethod
-    def download_segment(
-        url: str,
-        out_path: Path,
-        track: AnyTrack,
-        proxy: Optional[str] = None,
-        headers: Optional[MutableMapping[str, str | bytes]] = None,
-        cookies: Optional[Union[MutableMapping[str, str], RequestsCookieJar]] = None,
-        bytes_range: Optional[str] = None
-    ) -> int:
-        """
-        Download a DASH Media Segment.
-
-        Parameters:
-            url: Full HTTP(S) URL to the Segment you want to download.
-            out_path: Path to save the downloaded Segment file to.
-            track: The Track object of which this Segment is for. Currently only used to
-                fix an invalid value in the TFHD box of Audio Tracks.
-            proxy: Proxy URI to use when downloading the Segment file.
-            headers: HTTP Headers to send when requesting the Segment file.
-            cookies: Cookies to send when requesting the Segment file. The actual cookies sent
-                will be resolved based on the URI among other parameters. Multiple cookies with
-                the same name but a different domain/path are resolved.
-            bytes_range: Download only specific bytes of the Segment file using the Range header.
-
-        Returns the file size of the downloaded Segment in bytes.
-        """
-        if DOWNLOAD_CANCELLED.is_set():
-            raise KeyboardInterrupt()
-
-        if bytes_range:
-            # aria2(c) doesn't support byte ranges, use python-requests
-            downloader_ = requests_downloader
-            headers_ = dict(**headers, Range=f"bytes={bytes_range}")
-        else:
-            downloader_ = downloader
-            headers_ = headers
-
-        downloader_(
-            uri=url,
-            out=out_path,
-            headers=headers_,
-            cookies=cookies,
-            proxy=proxy,
-            segmented=True
-        )
-
-        # fix audio decryption on ATVP by fixing the sample description index
-        # TODO: Should this be done in the video data or the init data?
-        if isinstance(track, Audio):
-            with open(out_path, "rb+") as f:
-                segment_data = f.read()
-                fixed_segment_data = re.sub(
-                    b"(tfhd\x00\x02\x00\x1a\x00\x00\x00\x01\x00\x00\x00)\x02",
-                    b"\\g<1>\x01",
-                    segment_data
-                )
-                if fixed_segment_data != segment_data:
-                    f.seek(0)
-                    f.write(fixed_segment_data)
-
-        return out_path.stat().st_size
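Note on the removed TFHD fix: as I read the regex, it matches a `tfhd` box (version/flags bytes, track_ID 1, then the high bytes of the 32-bit sample_description_index) and patches the final index byte from 2 back to 1, so decrypted audio samples reference the first sample entry. A standalone form of the same patch:

    import re

    def fix_sample_description_index(data: bytes) -> bytes:
        # tfhd FourCC + version/flags + track_ID + high bytes of the
        # 32-bit sample_description_index, then the low byte 0x02 -> 0x01.
        return re.sub(
            b"(tfhd\x00\x02\x00\x1a\x00\x00\x00\x01\x00\x00\x00)\x02",
            b"\\g<1>\x01",
            data
        )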
     @staticmethod
     def _get(
         item: str,
@@ -685,6 +664,14 @@ class DASH:
             for x in adaptation_set.findall("Role")
         )

+    @staticmethod
+    def is_sdh(adaptation_set: Element) -> bool:
+        """Check if contents of Adaptation Set is for the Hearing Impaired."""
+        return any(
+            (x.get("schemeIdUri"), x.get("value")) == ("urn:tva:metadata:cs:AudioPurposeCS:2007", "2")
+            for x in adaptation_set.findall("Accessibility")
+        )
+
     @staticmethod
     def is_closed_caption(adaptation_set: Element) -> bool:
         """Check if contents of Adaptation Set is a Closed Caption Subtitle."""

@@ -2,18 +2,14 @@ from __future__ import annotations

 import html
 import logging
-import re
+import shutil
+import subprocess
 import sys
-import time
-from concurrent import futures
-from concurrent.futures import ThreadPoolExecutor
 from functools import partial
-from hashlib import md5
 from pathlib import Path
-from queue import Queue
-from threading import Lock
 from typing import Any, Callable, Optional, Union
 from urllib.parse import urljoin
+from zlib import crc32

 import m3u8
 import requests
@@ -22,14 +18,12 @@ from m3u8 import M3U8
 from pywidevine.cdm import Cdm as WidevineCdm
 from pywidevine.pssh import PSSH
 from requests import Session
-from rich import filesize

-from devine.core.constants import DOWNLOAD_CANCELLED, AnyTrack
-from devine.core.downloaders import downloader
+from devine.core.constants import DOWNLOAD_CANCELLED, DOWNLOAD_LICENCE_ONLY, AnyTrack
 from devine.core.downloaders import requests as requests_downloader
 from devine.core.drm import DRM_T, ClearKey, Widevine
 from devine.core.tracks import Audio, Subtitle, Tracks, Video
-from devine.core.utilities import is_close_match, try_ensure_utf8
+from devine.core.utilities import get_binary_path, get_extension, is_close_match, try_ensure_utf8


 class HLS:
@@ -93,7 +87,7 @@ class HLS:
         All Track objects' URL will be to another M3U(8) document. However, these documents
         will be Invariant Playlists and contain the list of segments URIs among other metadata.
         """
-        session_drm = HLS.get_drm(self.manifest.session_keys)
+        session_drm = HLS.get_all_drm(self.manifest.session_keys)

         audio_codecs_by_group_id: dict[str, Audio.Codec] = {}
         tracks = Tracks()
@@ -113,15 +107,19 @@ class HLS:
             primary_track_type = Video

         tracks.add(primary_track_type(
-            id_=md5(str(playlist).encode()).hexdigest()[0:7],  # 7 chars only for filename length
+            id_=hex(crc32(str(playlist).encode()))[2:],
             url=urljoin(playlist.base_uri, playlist.uri),
             codec=primary_track_type.Codec.from_codecs(playlist.stream_info.codecs),
             language=language,  # HLS manifests do not seem to have language info
             is_original_lang=True,  # TODO: All we can do is assume Yes
             bitrate=playlist.stream_info.average_bandwidth or playlist.stream_info.bandwidth,
-            descriptor=Video.Descriptor.M3U,
+            descriptor=Video.Descriptor.HLS,
             drm=session_drm,
-            extra=playlist,
+            data={
+                "hls": {
+                    "playlist": playlist
+                }
+            },
             # video track args
             **(dict(
                 range_=Video.Range.DV if any(
@@ -164,14 +162,18 @@ class HLS:
             raise ValueError(msg)

         tracks.add(track_type(
-            id_=md5(str(media).encode()).hexdigest()[0:6],  # 6 chars only for filename length
+            id_=hex(crc32(str(media).encode()))[2:],
             url=urljoin(media.base_uri, media.uri),
             codec=codec,
             language=track_lang,  # HLS media may not have language info, fallback if needed
             is_original_lang=language and is_close_match(track_lang, [language]),
-            descriptor=Audio.Descriptor.M3U,
+            descriptor=Audio.Descriptor.HLS,
             drm=session_drm if media.type == "AUDIO" else None,
-            extra=media,
+            data={
+                "hls": {
+                    "media": media
+                }
+            },
             # audio track args
             **(dict(
                 bitrate=0,  # TODO: M3U doesn't seem to state bitrate?
@@ -236,287 +238,442 @@ class HLS:
         else:
             session_drm = None

-        progress(total=len(master.segments))
-
-        download_sizes = []
-        download_speed_window = 5
-        last_speed_refresh = time.time()
-
-        segment_key = Queue(maxsize=1)
-        segment_key.put((session_drm, None))
-        init_data = Queue(maxsize=1)
-        init_data.put(None)
-        range_offset = Queue(maxsize=1)
-        range_offset.put(0)
-        drm_lock = Lock()
-
-        with ThreadPoolExecutor(max_workers=16) as pool:
-            for i, download in enumerate(futures.as_completed((
-                pool.submit(
-                    HLS.download_segment,
-                    segment=segment,
-                    out_path=(save_dir / str(n).zfill(len(str(len(master.segments))))).with_suffix(".mp4"),
-                    track=track,
-                    init_data=init_data,
-                    segment_key=segment_key,
-                    range_offset=range_offset,
-                    drm_lock=drm_lock,
-                    progress=progress,
-                    license_widevine=license_widevine,
-                    session=session,
-                    proxy=proxy
-                )
-                for n, segment in enumerate(master.segments)
-            ))):
-                try:
-                    download_size = download.result()
-                except KeyboardInterrupt:
-                    DOWNLOAD_CANCELLED.set()  # skip pending track downloads
-                    progress(downloaded="[yellow]CANCELLING")
-                    pool.shutdown(wait=True, cancel_futures=True)
-                    progress(downloaded="[yellow]CANCELLED")
-                    # tell dl that it was cancelled
-                    # the pool is already shut down, so exiting loop is fine
-                    raise
-                except Exception as e:
-                    DOWNLOAD_CANCELLED.set()  # skip pending track downloads
-                    progress(downloaded="[red]FAILING")
-                    pool.shutdown(wait=True, cancel_futures=True)
-                    progress(downloaded="[red]FAILED")
-                    # tell dl that it failed
-                    # the pool is already shut down, so exiting loop is fine
-                    raise e
-                else:
-                    # it successfully downloaded, and it was not cancelled
+        unwanted_segments = [
+            segment for segment in master.segments
+            if callable(track.OnSegmentFilter) and track.OnSegmentFilter(segment)
+        ]
+
+        total_segments = len(master.segments) - len(unwanted_segments)
+        progress(total=total_segments)
+
+        downloader = track.downloader
+
+        urls: list[dict[str, Any]] = []
+        range_offset = 0
+        for segment in master.segments:
+            if segment in unwanted_segments:
+                continue
+
+            if segment.byterange:
+                if downloader.__name__ == "aria2c":
+                    # aria2(c) is shit and doesn't support the Range header, fallback to the requests downloader
+                    downloader = requests_downloader
+                byte_range = HLS.calculate_byte_range(segment.byterange, range_offset)
+                range_offset = byte_range.split("-")[0]
+            else:
+                byte_range = None
+
+            urls.append({
+                "url": urljoin(segment.base_uri, segment.uri),
+                "headers": {
+                    "Range": f"bytes={byte_range}"
+                } if byte_range else {}
+            })
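Note: `EXT-X-BYTERANGE` values take the form `<n>[@<o>]` (n bytes, optionally starting at offset o); when the offset is omitted the range continues from the previous one, which is why a running `range_offset` is threaded through this loop. The body of `HLS.calculate_byte_range` isn't shown in this diff, but per RFC 8216 the math is along these lines (a sketch, not the verified implementation):

    def calculate_byte_range(m3u_range: str, fallback_offset: int = 0) -> str:
        # "1000@2000" -> "2000-2999"; "500" with fallback_offset 3000 -> "3000-3499"
        parts = [int(x) for x in m3u_range.split("@")]
        if len(parts) == 2:
            length, offset = parts
        else:
            length, offset = parts[0], fallback_offset
        return f"{offset}-{offset + length - 1}"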
+        segment_save_dir = save_dir / "segments"
+
+        for status_update in downloader(
+            urls=urls,
+            output_dir=segment_save_dir,
+            filename="{i:0%d}{ext}" % len(str(len(urls))),
+            headers=session.headers,
+            cookies=session.cookies,
+            proxy=proxy,
+            max_workers=16
+        ):
+            file_downloaded = status_update.get("file_downloaded")
+            if file_downloaded and callable(track.OnSegmentDownloaded):
+                track.OnSegmentDownloaded(file_downloaded)
+            else:
+                downloaded = status_update.get("downloaded")
+                if downloaded and downloaded.endswith("/s"):
+                    status_update["downloaded"] = f"HLS {downloaded}"
+                progress(**status_update)
+
+        # see https://github.com/devine-dl/devine/issues/71
+        for control_file in segment_save_dir.glob("*.aria2__temp"):
+            control_file.unlink()
+
+        progress(total=total_segments, completed=0, downloaded="Merging")
+
+        name_len = len(str(total_segments))
+        discon_i = 0
+        range_offset = 0
+        map_data: Optional[tuple[m3u8.model.InitializationSection, bytes]] = None
+        if session_drm:
+            encryption_data: Optional[tuple[Optional[m3u8.Key], DRM_T]] = (None, session_drm)
+        else:
+            encryption_data: Optional[tuple[Optional[m3u8.Key], DRM_T]] = None
+
+        i = -1
+        for real_i, segment in enumerate(master.segments):
+            if segment not in unwanted_segments:
+                i += 1
+
+            is_last_segment = (real_i + 1) == len(master.segments)
+
+            def merge(to: Path, via: list[Path], delete: bool = False, include_map_data: bool = False):
+                """
+                Merge all files to a given path, optionally including map data.
+
+                Parameters:
+                    to: The output file with all merged data.
+                    via: List of files to merge, in sequence.
+                    delete: Delete the file once it's been merged.
+                    include_map_data: Whether to include the init map data.
+                """
+                with open(to, "wb") as x:
+                    if include_map_data and map_data and map_data[1]:
+                        x.write(map_data[1])
+                    for file in via:
+                        x.write(file.read_bytes())
+                        x.flush()
+                        if delete:
+                            file.unlink()
+
+            def decrypt(include_this_segment: bool) -> Path:
+                """
+                Decrypt all segments that use the currently set DRM.
+
+                All segments that will be decrypted with this DRM will be merged together
+                in sequence, prefixed with the init data (if any), and then deleted. Once
+                merged they will be decrypted. The merged and decrypted file names state
+                the range of segments that were used.
+
+                Parameters:
+                    include_this_segment: Whether to include the current segment in the
+                        list of segments to merge and decrypt. This should be False if
+                        decrypting on EXT-X-KEY changes, or True when decrypting on the
+                        last segment.
+
+                Returns the decrypted path.
+                """
+                drm = encryption_data[1]
+                first_segment_i = next(
+                    int(file.stem)
+                    for file in sorted(segment_save_dir.iterdir())
+                    if file.stem.isdigit()
+                )
+                last_segment_i = max(0, i - int(not include_this_segment))
+                range_len = (last_segment_i - first_segment_i) + 1
+
+                segment_range = f"{str(first_segment_i).zfill(name_len)}-{str(last_segment_i).zfill(name_len)}"
+                merged_path = segment_save_dir / f"{segment_range}{get_extension(master.segments[last_segment_i].uri)}"
+                decrypted_path = segment_save_dir / f"{merged_path.stem}_decrypted{merged_path.suffix}"
+
+                files = [
+                    file
+                    for file in sorted(segment_save_dir.iterdir())
+                    if file.stem.isdigit() and first_segment_i <= int(file.stem) <= last_segment_i
+                ]
+                if not files:
+                    raise ValueError(f"None of the segment files for {segment_range} exist...")
+                elif len(files) != range_len:
+                    raise ValueError(f"Missing {range_len - len(files)} segment files for {segment_range}...")
+
+                merge(
+                    to=merged_path,
+                    via=files,
+                    delete=True,
+                    include_map_data=True
+                )
+
+                drm.decrypt(merged_path)
+                merged_path.rename(decrypted_path)
+
+                if callable(track.OnDecrypted):
+                    track.OnDecrypted(drm, decrypted_path)
+
+                return decrypted_path
+
+            def merge_discontinuity(include_this_segment: bool, include_map_data: bool = True):
+                """
+                Merge all segments of the discontinuity.
+
+                All segment files for this discontinuity must already be downloaded and
+                already decrypted (if it needs to be decrypted).
+
+                Parameters:
+                    include_this_segment: Whether to include the current segment in the
+                        list of segments to merge and decrypt. This should be False if
+                        decrypting on EXT-X-KEY changes, or True when decrypting on the
+                        last segment.
+                    include_map_data: Whether to prepend the init map data before the
+                        segment files when merging.
+                """
+                last_segment_i = max(0, i - int(not include_this_segment))
+
+                files = [
+                    file
+                    for file in sorted(segment_save_dir.iterdir())
+                    if int(file.stem.replace("_decrypted", "").split("-")[-1]) <= last_segment_i
+                ]
+                if files:
+                    to_dir = segment_save_dir.parent
+                    to_path = to_dir / f"{str(discon_i).zfill(name_len)}{files[-1].suffix}"
+                    merge(
+                        to=to_path,
+                        via=files,
+                        delete=True,
+                        include_map_data=include_map_data
+                    )
+
+            if segment not in unwanted_segments:
+                if isinstance(track, Subtitle):
+                    segment_file_ext = get_extension(segment.uri)
+                    segment_file_path = segment_save_dir / f"{str(i).zfill(name_len)}{segment_file_ext}"
+                    segment_data = try_ensure_utf8(segment_file_path.read_bytes())
+                    if track.codec not in (Subtitle.Codec.fVTT, Subtitle.Codec.fTTML):
+                        segment_data = segment_data.decode("utf8"). \
+                            replace("&lrm;", html.unescape("&lrm;")). \
+                            replace("&rlm;", html.unescape("&rlm;")). \
+                            encode("utf8")
+                    segment_file_path.write_bytes(segment_data)
+
+                if segment.discontinuity and i != 0:
+                    if encryption_data:
+                        decrypt(include_this_segment=False)
+                    merge_discontinuity(
+                        include_this_segment=False,
+                        include_map_data=not encryption_data or not encryption_data[1]
+                    )
+
+                    discon_i += 1
+                    range_offset = 0  # TODO: Should this be reset or not?
+                    map_data = None
+                    if encryption_data:
+                        encryption_data = (encryption_data[0], encryption_data[1])
+
+                if segment.init_section and (not map_data or segment.init_section != map_data[0]):
+                    if segment.init_section.byterange:
+                        init_byte_range = HLS.calculate_byte_range(
+                            segment.init_section.byterange,
+                            range_offset
+                        )
+                        range_offset = init_byte_range.split("-")[0]
+                        init_range_header = {
+                            "Range": f"bytes={init_byte_range}"
+                        }
+                    else:
+                        init_range_header = {}
+
+                    res = session.get(
+                        url=urljoin(segment.init_section.base_uri, segment.init_section.uri),
+                        headers=init_range_header
+                    )
+                    res.raise_for_status()
+                    map_data = (segment.init_section, res.content)
+
+            if segment.keys:
+                key = HLS.get_supported_key(segment.keys)
+                if encryption_data and encryption_data[0] != key and i != 0 and segment not in unwanted_segments:
+                    decrypt(include_this_segment=False)
+
+                if key is None:
+                    encryption_data = None
+                elif not encryption_data or encryption_data[0] != key:
+                    drm = HLS.get_drm(key, session)
+                    if isinstance(drm, Widevine):
+                        try:
+                            if map_data:
+                                track_kid = track.get_key_id(map_data[1])
+                            else:
+                                track_kid = None
+                            progress(downloaded="LICENSING")
+                            license_widevine(drm, track_kid=track_kid)
+                            progress(downloaded="[yellow]LICENSED")
+                        except Exception:  # noqa
+                            DOWNLOAD_CANCELLED.set()  # skip pending track downloads
+                            progress(downloaded="[red]FAILED")
+                            raise
+                    encryption_data = (key, drm)
+
+            # TODO: This won't work as we already downloaded
+            if DOWNLOAD_LICENCE_ONLY.is_set():
+                continue
+
+            if is_last_segment:
+                # required as it won't end with EXT-X-DISCONTINUITY nor a new key
+                if encryption_data:
+                    decrypt(include_this_segment=True)
+                merge_discontinuity(
+                    include_this_segment=True,
+                    include_map_data=not encryption_data or not encryption_data[1]
+                )
             progress(advance=1)

-            now = time.time()
-            time_since = now - last_speed_refresh
-
-            if download_size:  # no size == skipped dl
-                download_sizes.append(download_size)
-
-            if download_sizes and (time_since > download_speed_window or i == len(master.segments)):
-                data_size = sum(download_sizes)
-                download_speed = data_size / (time_since or 1)
-                progress(downloaded=f"HLS {filesize.decimal(download_speed)}/s")
-                last_speed_refresh = now
-                download_sizes.clear()
+        # TODO: Again still won't work, we've already downloaded
+        if DOWNLOAD_LICENCE_ONLY.is_set():
+            return
+
+        segment_save_dir.rmdir()
+
+        # finally merge all the discontinuity save files together to the final path
+        segments_to_merge = [
+            x
+            for x in sorted(save_dir.iterdir())
+            if x.is_file()
+        ]
+        if len(segments_to_merge) == 1:
+            shutil.move(segments_to_merge[0], save_path)
+        else:
+            progress(downloaded="Merging")
+            if isinstance(track, (Video, Audio)):
+                HLS.merge_segments(
+                    segments=segments_to_merge,
+                    save_path=save_path
+                )
+            else:
                 with open(save_path, "wb") as f:
-                    for segment_file in sorted(save_dir.iterdir()):
-                        segment_data = segment_file.read_bytes()
-                        if isinstance(track, Subtitle):
-                            segment_data = try_ensure_utf8(segment_data)
-                            if track.codec not in (Subtitle.Codec.fVTT, Subtitle.Codec.fTTML):
-                                segment_data = html.unescape(segment_data.decode("utf8")).encode("utf8")
-                        f.write(segment_data)
-                        segment_file.unlink()
+                    for discontinuity_file in segments_to_merge:
+                        discontinuity_data = discontinuity_file.read_bytes()
+                        f.write(discontinuity_data)
+                        f.flush()

+        save_dir.rmdir()
+
         progress(downloaded="Downloaded")

         track.path = save_path
-        save_dir.rmdir()
+        if callable(track.OnDownloaded):
+            track.OnDownloaded()
     @staticmethod
-    def download_segment(
-        segment: m3u8.Segment,
-        out_path: Path,
-        track: AnyTrack,
-        init_data: Queue,
-        segment_key: Queue,
-        range_offset: Queue,
-        drm_lock: Lock,
-        progress: partial,
-        license_widevine: Optional[Callable] = None,
-        session: Optional[Session] = None,
-        proxy: Optional[str] = None
-    ) -> int:
+    def merge_segments(segments: list[Path], save_path: Path) -> int:
         """
-        Download (and Decrypt) an HLS Media Segment.
-
-        Note: Make sure all Queue objects passed are appropriately initialized with
-        a starting value or this function may get permanently stuck.
-
-        Parameters:
-            segment: The m3u8.Segment Object to Download.
-            out_path: Path to save the downloaded Segment file to.
-            track: The Track object of which this Segment is for. Currently used to fix an
-                invalid value in the TFHD box of Audio Tracks, for the OnSegmentFilter, and
-                for DRM-related operations like getting the Track ID and Decryption.
-            init_data: Queue for saving and loading the most recent init section data.
-            segment_key: Queue for saving and loading the most recent DRM object, and its
-                adjacent Segment.Key object.
-            range_offset: Queue for saving and loading the most recent Segment Bytes Range.
-            drm_lock: Prevent more than one Download from doing anything DRM-related at the
-                same time. Make sure all calls to download_segment() use the same Lock object.
-            progress: Rich Progress bar to provide progress updates to.
-            license_widevine: Function used to license Widevine DRM objects. It must be passed
-                if the Segment's DRM uses Widevine.
-            proxy: Proxy URI to use when downloading the Segment file.
-            session: Python-Requests Session used when requesting init data.
-            stop_event: Prematurely stop the Download from beginning. Useful if ran from
-                a Thread Pool. It will raise a KeyboardInterrupt if set.
-
-        Returns the file size of the downloaded Segment in bytes.
+        Concatenate Segments by first demuxing with FFmpeg.
+
+        Returns the file size of the merged file.
         """
-        if DOWNLOAD_CANCELLED.is_set():
-            raise KeyboardInterrupt()
-
-        if callable(track.OnSegmentFilter) and track.OnSegmentFilter(segment):
-            return 0
+        ffmpeg = get_binary_path("ffmpeg")
+        if not ffmpeg:
+            raise EnvironmentError("FFmpeg executable was not found but is required to merge HLS segments.")
+
+        demuxer_file = segments[0].parent / "ffmpeg_concat_demuxer.txt"
+        demuxer_file.write_text("\n".join([
+            f"file '{segment}'"
+            for segment in segments
+        ]))
+
+        subprocess.check_call([
+            ffmpeg, "-hide_banner",
+            "-loglevel", "panic",
+            "-f", "concat",
+            "-safe", "0",
+            "-i", demuxer_file,
+            "-map", "0",
+            "-c", "copy",
+            save_path
+        ])
+        demuxer_file.unlink()
+
+        return save_path.stat().st_size
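Note: `merge_segments` drives FFmpeg's concat demuxer, which reads a text file listing each input. Usage sketch (paths hypothetical):

    from pathlib import Path

    segments = sorted(Path("/tmp/track/segments").glob("*.mp4"))
    merged_size = HLS.merge_segments(segments, Path("/tmp/track/merged.mp4"))
    # The demuxer list it writes looks like:
    #   file '/tmp/track/segments/000.mp4'
    #   file '/tmp/track/segments/001.mp4'
    # and the call is equivalent to:
    #   ffmpeg -f concat -safe 0 -i ffmpeg_concat_demuxer.txt -map 0 -c copy merged.mp4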
"Range": f"bytes={byte_range}"
} @staticmethod
def get_supported_key(keys: list[Union[m3u8.model.SessionKey, m3u8.model.Key]]) -> Optional[m3u8.Key]:
"""
Get a support Key System from a list of Key systems.
Note that the key systems are chosen in an opinionated order.
Returns None if one of the key systems is method=NONE, which means all segments
from hence forth should be treated as plain text until another key system is
encountered, unless it's also method=NONE.
Raises NotImplementedError if none of the key systems are supported.
"""
if any(key.method == "NONE" for key in keys):
return None
unsupported_systems = []
for key in keys:
if not key:
continue
# TODO: Add a way to specify which supported key system to use
# TODO: Add support for 'SAMPLE-AES', 'AES-CTR', 'AES-CBC', 'ClearKey'
elif key.method == "AES-128":
return key
elif key.method == "ISO-23001-7":
return key
elif key.keyformat and key.keyformat.lower() == WidevineCdm.urn:
return key
else: else:
range_header = {} unsupported_systems.append(key.method + (f" ({key.keyformat})" if key.keyformat else ""))
res = session.get(
url=urljoin(segment.init_section.base_uri, segment.init_section.uri),
headers=range_header
)
res.raise_for_status()
newest_init_data = res.content
finally:
init_data.put(newest_init_data)
# handle segment key changes
with drm_lock:
newest_segment_key = segment_key.get()
try:
if segment.keys and newest_segment_key[1] != segment.keys:
drm = HLS.get_drm(
keys=segment.keys,
proxy=proxy
)
if drm:
track.drm = drm
# license and grab content keys
# TODO: What if we don't want to use the first DRM system?
drm = drm[0]
if isinstance(drm, Widevine):
track_kid = track.get_key_id(newest_init_data)
if not license_widevine:
raise ValueError("license_widevine func must be supplied to use Widevine DRM")
progress(downloaded="LICENSING")
license_widevine(drm, track_kid=track_kid)
progress(downloaded="[yellow]LICENSED")
newest_segment_key = (drm, segment.keys)
finally:
segment_key.put(newest_segment_key)
headers_ = session.headers
if segment.byterange:
# aria2(c) doesn't support byte ranges, use python-requests
downloader_ = requests_downloader
previous_range_offset = range_offset.get()
byte_range = HLS.calculate_byte_range(segment.byterange, previous_range_offset)
range_offset.put(byte_range.split("-")[0])
headers_["Range"] = f"bytes={byte_range}"
else: else:
downloader_ = downloader raise NotImplementedError(f"None of the key systems are supported: {', '.join(unsupported_systems)}")
downloader_(
uri=urljoin(segment.base_uri, segment.uri),
out=out_path,
headers=headers_,
cookies=session.cookies,
proxy=proxy,
segmented=True
)
download_size = out_path.stat().st_size
# fix audio decryption on ATVP by fixing the sample description index
# TODO: Should this be done in the video data or the init data?
if isinstance(track, Audio):
with open(out_path, "rb+") as f:
segment_data = f.read()
fixed_segment_data = re.sub(
b"(tfhd\x00\x02\x00\x1a\x00\x00\x00\x01\x00\x00\x00)\x02",
b"\\g<1>\x01",
segment_data
)
if fixed_segment_data != segment_data:
f.seek(0)
f.write(fixed_segment_data)
# prepend the init data to be able to decrypt
if newest_init_data:
with open(out_path, "rb+") as f:
segment_data = f.read()
f.seek(0)
f.write(newest_init_data)
f.write(segment_data)
# decrypt segment if encrypted
if newest_segment_key[0]:
newest_segment_key[0].decrypt(out_path)
track.drm = None
if callable(track.OnDecrypted):
track.OnDecrypted(track)
return download_size
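Note: in playlist terms, the supported cases map to EXT-X-KEY lines like the following (attribute values illustrative). A quick check using the `m3u8` library, assuming `HLS` is imported from devine.core.manifests:

    import m3u8

    playlist = m3u8.loads(
        '#EXTM3U\n'
        '#EXT-X-KEY:METHOD=AES-128,URI="https://keys.example/1",IV=0x00000000000000000000000000000001\n'
        '#EXTINF:6.0,\n'
        'seg0.ts\n'
    )
    key = HLS.get_supported_key(playlist.keys)  # returns the AES-128 key object
    # METHOD=NONE would instead return None (clear segments from here on);
    # KEYFORMAT="urn:uuid:edef8ba9-79d6-4ace-a3c8-27dcd51d21ed" selects Widevine.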
     @staticmethod
     def get_drm(
+        key: Union[m3u8.model.SessionKey, m3u8.model.Key],
+        session: Optional[requests.Session] = None
+    ) -> DRM_T:
+        """
+        Convert HLS EXT-X-KEY data to an initialized DRM object.
+
+        Parameters:
+            key: m3u8 key system (EXT-X-KEY) object.
+            session: Optional session used to request AES-128 URIs.
+                Useful to set headers, proxies, cookies, and so forth.
+
+        Raises a NotImplementedError if the key system is not supported.
+        """
+        if not isinstance(session, (Session, type(None))):
+            raise TypeError(f"Expected session to be a {Session}, not {type(session)}")
+        if not session:
+            session = Session()
+
+        # TODO: Add support for 'SAMPLE-AES', 'AES-CTR', 'AES-CBC', 'ClearKey'
+        if key.method == "AES-128":
+            drm = ClearKey.from_m3u_key(key, session)
+        elif key.method == "ISO-23001-7":
+            drm = Widevine(
+                pssh=PSSH.new(
+                    key_ids=[key.uri.split(",")[-1]],
+                    system_id=PSSH.SystemId.Widevine
+                )
+            )
+        elif key.keyformat and key.keyformat.lower() == WidevineCdm.urn:
+            drm = Widevine(
+                pssh=PSSH(key.uri.split(",")[-1]),
+                **key._extra_params  # noqa
+            )
+        else:
+            raise NotImplementedError(f"The key system is not supported: {key}")
+
+        return drm
+    @staticmethod
+    def get_all_drm(
         keys: list[Union[m3u8.model.SessionKey, m3u8.model.Key]],
         proxy: Optional[str] = None
     ) -> list[DRM_T]:
         """
         Convert HLS EXT-X-KEY data to initialized DRM objects.
-
-        You can supply key data for a single segment or for the entire manifest.
-        This lets you narrow the results down to each specific segment's DRM status.
-
-        Returns an empty list if there were no supplied EXT-X-KEY data, or if all the
-        EXT-X-KEY's were of blank data. An empty list signals a DRM-free stream or segment.
-
-        Will raise a NotImplementedError if EXT-X-KEY data was supplied and none of them
-        were supported. A DRM-free track will never raise NotImplementedError.
+
+        Parameters:
+            keys: m3u8 key system (EXT-X-KEY) objects.
+            proxy: Optional proxy string used for requesting AES-128 URIs.
+
+        Raises a NotImplementedError if none of the key systems are supported.
         """
-        drm = []
-        unsupported_systems = []
+        unsupported_keys: list[m3u8.Key] = []
+        drm_objects: list[DRM_T] = []
+
+        if any(key.method == "NONE" for key in keys):
+            return []
+
         for key in keys:
-            if not key:
-                continue
-            # TODO: Add support for 'SAMPLE-AES', 'AES-CTR', 'AES-CBC', 'ClearKey'
-            if key.method == "NONE":
-                return []
-            elif key.method == "AES-128":
-                drm.append(ClearKey.from_m3u_key(key, proxy))
-            elif key.method == "ISO-23001-7":
-                drm.append(Widevine(
-                    pssh=PSSH.new(
-                        key_ids=[key.uri.split(",")[-1]],
-                        system_id=PSSH.SystemId.Widevine
-                    )
-                ))
-            elif key.keyformat and key.keyformat.lower() == WidevineCdm.urn:
-                drm.append(Widevine(
-                    pssh=PSSH(key.uri.split(",")[-1]),
-                    **key._extra_params  # noqa
-                ))
-            else:
-                unsupported_systems.append(key.method + (f" ({key.keyformat})" if key.keyformat else ""))
-
-        if not drm and unsupported_systems:
-            raise NotImplementedError(f"No support for any of the key systems: {', '.join(unsupported_systems)}")
-
-        return drm
+            try:
+                drm = HLS.get_drm(key, proxy)
+                drm_objects.append(drm)
+            except NotImplementedError:
+                unsupported_keys.append(key)
+
+        if not drm_objects and unsupported_keys:
+            raise NotImplementedError(f"None of the key systems are supported: {unsupported_keys}")
+
+        return drm_objects

     @staticmethod
     def calculate_byte_range(m3u_range: str, fallback_offset: int = 0) -> str:

@@ -0,0 +1,44 @@
+from typing import Optional, Union
+
+
+class SearchResult:
+    def __init__(
+        self,
+        id_: Union[str, int],
+        title: str,
+        description: Optional[str] = None,
+        label: Optional[str] = None,
+        url: Optional[str] = None
+    ):
+        """
+        A Search Result for any supported Title Type.
+
+        Parameters:
+            id_: The search result's Title ID.
+            title: The primary display text, e.g., the Title's Name.
+            description: The secondary display text, e.g., the Title's Description or
+                further title information.
+            label: The tertiary display text. This will typically be used to display
+                an informative label or tag to the result. E.g., "unavailable", the
+                title's price tag, region, etc.
+            url: A hyperlink to the search result or title's page.
+        """
+        if not isinstance(id_, (str, int)):
+            raise TypeError(f"Expected id_ to be a {str} or {int}, not {type(id_)}")
+        if not isinstance(title, str):
+            raise TypeError(f"Expected title to be a {str}, not {type(title)}")
+        if not isinstance(description, (str, type(None))):
+            raise TypeError(f"Expected description to be a {str}, not {type(description)}")
+        if not isinstance(label, (str, type(None))):
+            raise TypeError(f"Expected label to be a {str}, not {type(label)}")
+        if not isinstance(url, (str, type(None))):
+            raise TypeError(f"Expected url to be a {str}, not {type(url)}")
+
+        self.id = id_
+        self.title = title
+        self.description = description
+        self.label = label
+        self.url = url
+
+
+__all__ = ("SearchResult",)

@@ -1,7 +1,8 @@
 import base64
 import logging
 from abc import ABCMeta, abstractmethod
-from http.cookiejar import CookieJar, MozillaCookieJar
+from collections.abc import Generator
+from http.cookiejar import CookieJar
 from typing import Optional, Union
 from urllib.parse import urlparse
@@ -16,8 +17,9 @@ from devine.core.config import config
 from devine.core.console import console
 from devine.core.constants import AnyTrack
 from devine.core.credential import Credential
+from devine.core.search_result import SearchResult
 from devine.core.titles import Title_T, Titles_T
-from devine.core.tracks import Chapter, Tracks
+from devine.core.tracks import Chapters, Tracks
 from devine.core.utilities import get_ip_info
@@ -96,15 +98,12 @@ class Service(metaclass=ABCMeta):
                 backoff_factor=0.2,
                 status_forcelist=[429, 500, 502, 503, 504]
             ),
-            # 16 connections is used for byte-ranged downloads
-            # double it to allow for 16 non-related connections
-            pool_maxsize=16 * 2,
             pool_block=True
         ))
         session.mount("http://", session.adapters["https://"])
         return session

-    def authenticate(self, cookies: Optional[MozillaCookieJar] = None, credential: Optional[Credential] = None) -> None:
+    def authenticate(self, cookies: Optional[CookieJar] = None, credential: Optional[Credential] = None) -> None:
         """
         Authenticate the Service with Cookies and/or Credentials (Email/Username and Password).
@@ -120,9 +119,20 @@ class Service(metaclass=ABCMeta):
         """
         if cookies is not None:
             if not isinstance(cookies, CookieJar):
-                raise TypeError(f"Expected cookies to be a {MozillaCookieJar}, not {cookies!r}.")
+                raise TypeError(f"Expected cookies to be a {CookieJar}, not {cookies!r}.")
             self.session.cookies.update(cookies)

+    def search(self) -> Generator[SearchResult, None, None]:
+        """
+        Search by query for titles from the Service.
+
+        The query must be taken as a CLI argument by the Service class.
+        Ideally just re-use the title ID argument (i.e. self.title).
+
+        Search results will be displayed in the order yielded.
+        """
+        raise NotImplementedError(f"Search functionality has not been implemented by {self.__class__.__name__}")
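Note: a hypothetical Service override, to show the intended shape (endpoint and response fields are invented for illustration):

    from devine.core.search_result import SearchResult

    class MyService(Service):  # hypothetical example service
        def search(self):
            # re-using the title ID CLI argument as the query, per the docstring
            results = self.session.get(
                "https://api.example.com/search",  # hypothetical endpoint
                params={"q": self.title}
            ).json()
            for result in results["items"]:
                yield SearchResult(
                    id_=result["id"],
                    title=result["name"],
                    description=result.get("synopsis"),
                    label="unavailable" if result.get("geo_blocked") else None,
                    url=result.get("page_url")
                )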
     def get_widevine_service_certificate(self, *, challenge: bytes, title: Title_T, track: AnyTrack) \
             -> Union[bytes, str]:
         """
@@ -207,24 +217,22 @@ class Service(metaclass=ABCMeta):
         """

     @abstractmethod
-    def get_chapters(self, title: Title_T) -> list[Chapter]:
+    def get_chapters(self, title: Title_T) -> Chapters:
         """
-        Get Chapter objects of the Title.
-
-        Return a list of Chapter objects. This will be run after get_tracks. If there's anything
-        from the get_tracks that may be needed, e.g. "device_id" or a-like, store it in the class
-        via `self` and re-use the value in get_chapters.
-
-        How it's used is generally the same as get_titles. These are only separated as to reduce
-        function complexity and keep them focused on simple tasks.
-
-        You do not need to sort or order the chapters in any way. However, you do need to filter
-        and alter them as needed by the service. No modification is made after get_chapters is
-        ran. So that means ensure that the Chapter objects returned have consistent Chapter Titles
-        and Chapter Numbers.
-
-        :param title: The current `Title` from get_titles that is being executed.
-        :return: List of Chapter objects, if available, empty list otherwise.
+        Get Chapters for the Title.
+
+        Parameters:
+            title: The current Title from `get_titles` that is being processed.
+
+        You must return a Chapters object containing 0 or more Chapter objects.
+
+        You do not need to set a Chapter number or sort/order the chapters in any way as
+        the Chapters class automatically handles all of that for you. If there's no
+        descriptive name for a Chapter then do not set a name at all.
+
+        You must not set Chapter names to "Chapter {n}" or such. If you (or the user)
+        wants "Chapter {n}" style Chapter names (or similar) then they can use the config
+        option `chapter_fallback_name`. For example, `"Chapter {i:02}"` for "Chapter 01".
         """

@@ -1,8 +1,9 @@
 from .audio import Audio
 from .chapter import Chapter
+from .chapters import Chapters
 from .subtitle import Subtitle
 from .track import Track
 from .tracks import Tracks
 from .video import Video

-__all__ = ("Audio", "Chapter", "Subtitle", "Track", "Tracks", "Video")
+__all__ = ("Audio", "Chapter", "Chapters", "Subtitle", "Track", "Tracks", "Video")

@@ -1,95 +1,82 @@
 from __future__ import annotations

 import re
-from pathlib import Path
 from typing import Optional, Union
+from zlib import crc32
+
+TIMESTAMP_FORMAT = re.compile(r"^(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})(?P<ms>.\d{3}|)$")


 class Chapter:
-    line_1 = re.compile(r"^CHAPTER(?P<number>\d+)=(?P<timecode>[\d\\.]+)$")
-    line_2 = re.compile(r"^CHAPTER(?P<number>\d+)NAME=(?P<title>[\d\\.]+)$")
-
-    def __init__(self, number: int, timecode: str, title: Optional[str] = None):
-        self.id = f"chapter-{number}"
-        self.number = number
-        self.timecode = timecode
-        self.title = title
-
-        if "." not in self.timecode:
-            self.timecode += ".000"
-
-    def __bool__(self) -> bool:
-        return self.number and self.number >= 0 and self.timecode
+    def __init__(self, timestamp: Union[str, int, float], name: Optional[str] = None):
+        """
+        Create a new Chapter with a Timestamp and optional name.
+
+        The timestamp may be in the following formats:
+        - "HH:MM:SS" string, e.g., `25:05:23`.
+        - "HH:MM:SS.mss" string, e.g., `25:05:23.120`.
+        - a timecode integer in milliseconds, e.g., `90323120` is `25:05:23.120`.
+        - a timecode float in seconds, e.g., `90323.12` is `25:05:23.120`.
+
+        If you have a timecode integer in seconds, just multiply it by 1000.
+        If you have a timecode float in milliseconds (no decimal value), just convert
+        it to an integer.
+        """
+        if timestamp is None:
+            raise ValueError("The timestamp must be provided.")
+
+        if not isinstance(timestamp, (str, int, float)):
+            raise TypeError(f"Expected timestamp to be {str}, {int} or {float}, not {type(timestamp)}")
+        if not isinstance(name, (str, type(None))):
+            raise TypeError(f"Expected name to be {str}, not {type(name)}")
+
+        if not isinstance(timestamp, str):
+            if isinstance(timestamp, int):  # ms
+                hours, remainder = divmod(timestamp, 1000 * 60 * 60)
+                minutes, remainder = divmod(remainder, 1000 * 60)
+                seconds, ms = divmod(remainder, 1000)
+            elif isinstance(timestamp, float):  # seconds.ms
+                hours, remainder = divmod(timestamp, 60 * 60)
+                minutes, remainder = divmod(remainder, 60)
+                seconds, ms = divmod(int(remainder * 1000), 1000)
+            else:
+                raise TypeError
+            timestamp = f"{hours:02}:{minutes:02}:{seconds:02}.{str(ms).zfill(3)[:3]}"
+
+        timestamp_m = TIMESTAMP_FORMAT.match(timestamp)
+        if not timestamp_m:
+            raise ValueError(f"The timestamp format is invalid: {timestamp}")
+
+        hour, minute, second, ms = timestamp_m.groups()
+        if not ms:
+            timestamp += ".000"
+
+        self.timestamp = timestamp
+        self.name = name
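Note: the divmod chain normalizes numeric timecodes to the "HH:MM:SS.mss" string form. Working through the docstring's own example value, mirroring the constructor's integer branch:

    timestamp = 90323120  # int, therefore milliseconds
    hours, remainder = divmod(timestamp, 1000 * 60 * 60)  # 25, 323120
    minutes, remainder = divmod(remainder, 1000 * 60)     # 5, 23120
    seconds, ms = divmod(remainder, 1000)                 # 23, 120
    assert f"{hours:02}:{minutes:02}:{seconds:02}.{ms:03}" == "25:05:23.120"
    # so Chapter(90323120).timestamp == Chapter("25:05:23.120").timestamp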
     def __repr__(self) -> str:
-        """
-        OGM-based Simple Chapter Format intended for use with MKVToolNix.
-
-        This format is not officially part of the Matroska spec. This was a format
-        designed for OGM tools that MKVToolNix has since re-used. More Information:
-        https://mkvtoolnix.download/doc/mkvmerge.html#mkvmerge.chapters.simple
-        """
-        return "CHAPTER{num}={time}\nCHAPTER{num}NAME={name}".format(
-            num=f"{self.number:02}",
-            time=self.timecode,
-            name=self.title or ""
+        return "{name}({items})".format(
+            name=self.__class__.__name__,
+            items=", ".join([f"{k}={repr(v)}" for k, v in self.__dict__.items()])
         )

     def __str__(self) -> str:
         return " | ".join(filter(bool, [
             "CHP",
-            f"[{self.number:02}]",
-            self.timecode,
-            self.title
+            self.timestamp,
+            self.name
         ]))

+    @property
+    def id(self) -> str:
+        """Compute an ID from the Chapter data."""
+        checksum = crc32(str(self).encode("utf8"))
+        return hex(checksum)
+
     @property
     def named(self) -> bool:
         """Check if Chapter is named."""
-        return bool(self.title)
-
-    @classmethod
-    def loads(cls, data: str) -> Chapter:
-        """Load chapter data from a string."""
-        lines = [x.strip() for x in data.strip().splitlines(keepends=False)]
-        if len(lines) > 2:
-            return cls.loads("\n".join(lines))
-        one, two = lines
-
-        one_m = cls.line_1.match(one)
-        two_m = cls.line_2.match(two)
-        if not one_m or not two_m:
-            raise SyntaxError(f"An unexpected syntax error near:\n{one}\n{two}")
-
-        one_str, timecode = one_m.groups()
-        two_str, title = two_m.groups()
-        one_num, two_num = int(one_str.lstrip("0")), int(two_str.lstrip("0"))
-
-        if one_num != two_num:
-            raise SyntaxError(f"The chapter numbers ({one_num},{two_num}) does not match.")
-        if not timecode:
-            raise SyntaxError("The timecode is missing.")
-        if not title:
-            title = None
-
-        return cls(number=one_num, timecode=timecode, title=title)
-
-    @classmethod
-    def load(cls, path: Union[Path, str]) -> Chapter:
-        """Load chapter data from a file."""
-        if isinstance(path, str):
-            path = Path(path)
-        return cls.loads(path.read_text(encoding="utf8"))
-
-    def dumps(self) -> str:
-        """Return chapter data as a string."""
-        return repr(self)
-
-    def dump(self, path: Union[Path, str]) -> int:
-        """Write chapter data to a file."""
-        if isinstance(path, str):
-            path = Path(path)
-        return path.write_text(self.dumps(), encoding="utf8")
+        return bool(self.name)


 __all__ = ("Chapter",)

@@ -0,0 +1,156 @@
+from __future__ import annotations
+
+import re
+from abc import ABC
+from pathlib import Path
+from typing import Any, Iterable, Optional, Union
+from zlib import crc32
+
+from sortedcontainers import SortedKeyList
+
+from devine.core.tracks import Chapter
+
+OGM_SIMPLE_LINE_1_FORMAT = re.compile(r"^CHAPTER(?P<number>\d+)=(?P<timestamp>\d{2,}:\d{2}:\d{2}\.\d{3})$")
+OGM_SIMPLE_LINE_2_FORMAT = re.compile(r"^CHAPTER(?P<number>\d+)NAME=(?P<name>.*)$")
+
+
+class Chapters(SortedKeyList, ABC):
+    def __init__(self, iterable: Optional[Iterable[Chapter]] = None):
+        super().__init__(key=lambda x: x.timestamp or 0)
+        for chapter in iterable or []:
+            self.add(chapter)
+
+    def __repr__(self) -> str:
+        return "{name}({items})".format(
+            name=self.__class__.__name__,
+            items=", ".join([f"{k}={repr(v)}" for k, v in self.__dict__.items()])
+        )
+
+    def __str__(self) -> str:
+        return "\n".join([
+            " | ".join(filter(bool, [
+                "CHP",
+                f"[{i:02}]",
+                chapter.timestamp,
+                chapter.name
+            ]))
+            for i, chapter in enumerate(self, start=1)
+        ])
+
+    @classmethod
+    def loads(cls, data: str) -> Chapters:
+        """Load chapter data from a string."""
+        lines = [
+            line.strip()
+            for line in data.strip().splitlines(keepends=False)
+        ]
+        if len(lines) % 2 != 0:
+            raise ValueError("The number of chapter lines must be even.")
+
+        chapters = []
+        for line_1, line_2 in zip(lines[::2], lines[1::2]):
+            line_1_match = OGM_SIMPLE_LINE_1_FORMAT.match(line_1)
+            if not line_1_match:
+                raise SyntaxError(f"An unexpected syntax error occurred on: {line_1}")
+            line_2_match = OGM_SIMPLE_LINE_2_FORMAT.match(line_2)
+            if not line_2_match:
+                raise SyntaxError(f"An unexpected syntax error occurred on: {line_2}")
+
+            line_1_number, timestamp = line_1_match.groups()
+            line_2_number, name = line_2_match.groups()
+
+            if line_1_number != line_2_number:
+                raise SyntaxError(
+                    f"The chapter numbers {line_1_number} and {line_2_number} do not match on:\n{line_1}\n{line_2}")
+
+            if not timestamp:
+                raise SyntaxError(f"The timestamp is missing on: {line_1}")
+
+            chapters.append(Chapter(timestamp, name))
+
+        return cls(chapters)
+
+    @classmethod
+    def load(cls, path: Union[Path, str]) -> Chapters:
+        """Load chapter data from a file."""
+        if isinstance(path, str):
+            path = Path(path)
+        return cls.loads(path.read_text(encoding="utf8"))
+
+    def dumps(self, fallback_name: str = "") -> str:
+        """
+        Return chapter data in OGM-based Simple Chapter format.
+        https://mkvtoolnix.download/doc/mkvmerge.html#mkvmerge.chapters.simple
+
+        Parameters:
+            fallback_name: Name used for Chapters without a Name set.
+
+        The fallback name can use the following variables in f-string style:
+
+        - {i}: The Chapter number starting at 1.
+          E.g., `"Chapter {i}"`: "Chapter 1", "Intro", "Chapter 3".
+        - {j}: A number starting at 1 that increments any time a Chapter has no name.
+          E.g., `"Chapter {j}"`: "Chapter 1", "Intro", "Chapter 2".
+
+        These are formatted with f-strings, directives are supported.
+        For example, `"Chapter {i:02}"` will result in `"Chapter 01"`.
+        """
+        chapters = []
+        j = 0
+
+        for i, chapter in enumerate(self, start=1):
+            if not chapter.name:
+                j += 1
+            chapters.append("CHAPTER{num}={time}\nCHAPTER{num}NAME={name}".format(
+                num=f"{i:02}",
+                time=chapter.timestamp,
+                name=chapter.name or fallback_name.format(
+                    i=i,
+                    j=j
+                )
+            ))
+
+        return "\n".join(chapters)
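Note: a quick example of the `{i}`/`{j}` distinction in `dumps` (only unnamed chapters advance `j`):

    chapters = Chapters([
        Chapter("00:00:00.000"),
        Chapter("00:01:30.000", name="Intro"),
        Chapter("00:05:00.000"),
    ])
    print(chapters.dumps(fallback_name="Chapter {j:02}"))
    # CHAPTER01=00:00:00.000
    # CHAPTER01NAME=Chapter 01
    # CHAPTER02=00:01:30.000
    # CHAPTER02NAME=Intro
    # CHAPTER03=00:05:00.000
    # CHAPTER03NAME=Chapter 02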
+    def dump(self, path: Union[Path, str], *args: Any, **kwargs: Any) -> int:
+        """
+        Write chapter data in OGM-based Simple Chapter format to a file.
+
+        Parameters:
+            path: The file path to write the Chapter data to, overwriting
+                any existing data.
+
+        See `Chapters.dumps` for more parameter documentation.
+        """
+        if isinstance(path, str):
+            path = Path(path)
+        path.parent.mkdir(parents=True, exist_ok=True)
+        ogm_text = self.dumps(*args, **kwargs)
+        return path.write_text(ogm_text, encoding="utf8")
+
+    def add(self, value: Chapter) -> None:
+        if not isinstance(value, Chapter):
+            raise TypeError(f"Can only add {Chapter} objects, not {type(value)}")
+
+        if any(chapter.timestamp == value.timestamp for chapter in self):
+            raise ValueError(f"A Chapter with the Timestamp {value.timestamp} already exists")
+
+        super().add(value)
+
+        if not any(chapter.timestamp == "00:00:00.000" for chapter in self):
+            self.add(Chapter(0))
+
+    @property
+    def id(self) -> str:
+        """Compute an ID from the Chapter data."""
+        checksum = crc32("\n".join([
+            chapter.id
+            for chapter in self
+        ]).encode("utf8"))
+        return hex(checksum)
+
+
+__all__ = ("Chapters", "Chapter")

@@ -4,11 +4,13 @@ import re
 import subprocess
 from collections import defaultdict
 from enum import Enum
+from functools import partial
 from io import BytesIO
 from pathlib import Path
-from typing import Any, Iterable, Optional
+from typing import Any, Callable, Iterable, Optional

 import pycaption
+import requests
 from construct import Container
 from pycaption import Caption, CaptionList, CaptionNode, WebVTTReader
 from pycaption.geometry import Layout
@@ -134,6 +136,9 @@ class Subtitle(Track):
         if (self.cc or self.sdh) and self.forced:
             raise ValueError("A text track cannot be CC/SDH as well as Forced.")

+        # Called after Track has been converted to another format
+        self.OnConverted: Optional[Callable[[Subtitle.Codec], None]] = None
+
     def get_track_name(self) -> Optional[str]:
         """Return the base Track Name."""
         track_name = super().get_track_name() or ""
@@ -144,6 +149,21 @@ class Subtitle(Track):
             track_name += flag
         return track_name or None

+    def download(
+        self,
+        session: requests.Session,
+        prepare_drm: partial,
+        progress: Optional[partial] = None
+    ):
+        super().download(session, prepare_drm, progress)
+        if not self.path:
+            return
+
+        if self.codec == Subtitle.Codec.fTTML:
+            self.convert(Subtitle.Codec.TimedTextMarkupLang)
+        elif self.codec == Subtitle.Codec.fVTT:
+            self.convert(Subtitle.Codec.WebVTT)
+
     def convert(self, codec: Subtitle.Codec) -> Path:
         """
         Convert this Subtitle to another Format.
@@ -181,14 +201,16 @@ class Subtitle(Track):
                 Subtitle.Codec.SubStationAlphav4: "AdvancedSubStationAlpha",
                 Subtitle.Codec.TimedTextMarkupLang: "TimedText1.0"
             }.get(codec, codec.name)

-            subprocess.run(
-                [
+            sub_edit_args = [
                 sub_edit_executable,
                 "/Convert", self.path, sub_edit_format,
                 f"/outputfilename:{output_path.name}",
+                f"/outputfolder:{output_path.parent}",
                 "/encoding:utf8"
-                ],
+            ]
+            if codec == Subtitle.Codec.SubRip:
+                sub_edit_args.append("/ConvertColorsToDialog")
+
+            subprocess.run(
+                sub_edit_args,
                 check=True,
                 stdout=subprocess.DEVNULL,
                 stderr=subprocess.DEVNULL
@@ -209,9 +231,12 @@ class Subtitle(Track):
         output_path.write_text(subtitle_text, encoding="utf8")

-        self.swap(output_path)
+        self.path = output_path
         self.codec = codec
+
+        if callable(self.OnConverted):
+            self.OnConverted(codec)

         return output_path

     @staticmethod
@@ -499,27 +524,6 @@ class Subtitle(Track):
             stdout=subprocess.DEVNULL
         )

-    def remove_multi_lang_srt_header(self) -> None:
-        """
-        Remove Multi-Language SRT Header from Subtitle.
-
-        Sometimes a SubRip (SRT) format Subtitle has a "MULTI-LANGUAGE SRT" line,
-        when it shouldn't. This can cause Subtitle format/syntax errors in some
-        programs including mkvmerge/MKVToolNix.
-
-        This should only be used if it truly is a normal SubRip (SRT) subtitle
-        just with this line added by mistake.
-        """
-        if not self.path or not self.path.exists():
-            raise ValueError("You must download the subtitle track first.")
-        if self.codec != Subtitle.Codec.SubRip:
-            raise ValueError("Only SubRip (SRT) format Subtitles have the 'MULTI-LANGUAGE SRT' header.")
-
-        srt_text = self.path.read_text("utf8")
-        fixed_srt_text = srt_text.replace("MULTI-LANGUAGE SRT\n", "")
-        self.path.write_text(fixed_srt_text, "utf8")
-
     def __str__(self) -> str:
         return " | ".join(filter(bool, [
             "SUB",

@@ -1,65 +1,137 @@
 import base64
+import html
+import logging
 import re
 import shutil
 import subprocess
+from copy import copy
 from enum import Enum
+from functools import partial
 from pathlib import Path
 from typing import Any, Callable, Iterable, Optional, Union
 from uuid import UUID
+from zlib import crc32

-import requests
+import m3u8
 from langcodes import Language
+from requests import Session

-from devine.core.constants import TERRITORY_MAP
-from devine.core.drm import DRM_T
-from devine.core.utilities import get_binary_path, get_boxes
+from devine.core.config import config
+from devine.core.constants import DOWNLOAD_CANCELLED, DOWNLOAD_LICENCE_ONLY
+from devine.core.downloaders import aria2c, curl_impersonate, requests
+from devine.core.drm import DRM_T, Widevine
+from devine.core.utilities import get_binary_path, get_boxes, try_ensure_utf8
 from devine.core.utils.subprocess import ffprobe


 class Track:
-    class DRM(Enum):
-        pass
-
     class Descriptor(Enum):
         URL = 1  # Direct URL, nothing fancy
-        M3U = 2  # https://en.wikipedia.org/wiki/M3U (and M3U8)
-        MPD = 3  # https://en.wikipedia.org/wiki/Dynamic_Adaptive_Streaming_over_HTTP
+        HLS = 2  # https://en.wikipedia.org/wiki/HTTP_Live_Streaming
+        DASH = 3  # https://en.wikipedia.org/wiki/Dynamic_Adaptive_Streaming_over_HTTP

     def __init__(
         self,
-        id_: str,
         url: Union[str, list[str]],
         language: Union[Language, str],
         is_original_lang: bool = False,
         descriptor: Descriptor = Descriptor.URL,
         needs_repack: bool = False,
+        name: Optional[str] = None,
         drm: Optional[Iterable[DRM_T]] = None,
         edition: Optional[str] = None,
-        extra: Optional[Any] = None
+        downloader: Optional[Callable] = None,
+        data: Optional[dict] = None,
+        id_: Optional[str] = None,
     ) -> None:
-        self.id = id_
-        self.url = url
-        # required basic metadata
-        self.language = Language.get(language)
-        self.is_original_lang = bool(is_original_lang)
-        # optional io metadata
-        self.descriptor = descriptor
-        self.needs_repack = bool(needs_repack)
-        # drm
-        self.drm = drm
-        # extra data
-        self.edition: str = edition
-        self.extra: Any = extra or {}  # allow anything for extra, but default to a dict
-
-        # events
-        self.OnSegmentFilter: Optional[Callable] = None
-        self.OnDownloaded: Optional[Callable] = None
-        self.OnDecrypted: Optional[Callable] = None
-        self.OnRepacked: Optional[Callable] = None
-        self.OnMultiplex: Optional[Callable] = None
+        if not isinstance(url, (str, list)):
+            raise TypeError(f"Expected url to be a {str}, or list of {str}, not {type(url)}")
+        if not isinstance(language, (Language, str)):
+            raise TypeError(f"Expected language to be a {Language} or {str}, not {type(language)}")
+        if not isinstance(is_original_lang, bool):
+            raise TypeError(f"Expected is_original_lang to be a {bool}, not {type(is_original_lang)}")
+        if not isinstance(descriptor, Track.Descriptor):
+            raise TypeError(f"Expected descriptor to be a {Track.Descriptor}, not {type(descriptor)}")
+        if not isinstance(needs_repack, bool):
+            raise TypeError(f"Expected needs_repack to be a {bool}, not {type(needs_repack)}")
+        if not isinstance(name, (str, type(None))):
+            raise TypeError(f"Expected name to be a {str}, not {type(name)}")
+        if not isinstance(id_, (str, type(None))):
+            raise TypeError(f"Expected id_ to be a {str}, not {type(id_)}")
+        if not isinstance(edition, (str, type(None))):
+            raise TypeError(f"Expected edition to be a {str}, not {type(edition)}")
+        if not isinstance(downloader, (Callable, type(None))):
+            raise TypeError(f"Expected downloader to be a {Callable}, not {type(downloader)}")
+        if not isinstance(data, (dict, type(None))):
+            raise TypeError(f"Expected data to be a {dict}, not {type(data)}")
+
+        invalid_urls = ", ".join(set(type(x) for x in url if not isinstance(x, str)))
+        if invalid_urls:
+            raise TypeError(f"Expected all items in url to be a {str}, but found {invalid_urls}")
+
+        if drm is not None:
+            try:
+                iter(drm)
+            except TypeError:
+                raise TypeError(f"Expected drm to be an iterable, not {type(drm)}")
+
+        if downloader is None:
+            downloader = {
+                "aria2c": aria2c,
+                "curl_impersonate": curl_impersonate,
+                "requests": requests
+            }[config.downloader]
+
+        # should only be set internally
         self.path: Optional[Path] = None
+
+        self.url = url
+        self.language = Language.get(language)
+        self.is_original_lang = is_original_lang
+        self.descriptor = descriptor
+        self.needs_repack = needs_repack
+        self.name = name
+        self.drm = drm
+        self.edition: str = edition
+        self.downloader = downloader
+        self.data = data or {}
+
+        if self.name is None:
+            lang = Language.get(self.language)
+            if (lang.language or "").lower() == (lang.territory or "").lower():
+                lang.territory = None  # e.g. en-en, de-DE
+            reduced = lang.simplify_script()
+            extra_parts = []
+            if reduced.script is not None:
+                script = reduced.script_name(max_distance=25)
+                if script and script != "Zzzz":
+                    extra_parts.append(script)
+            if reduced.territory is not None:
+                territory = reduced.territory_name(max_distance=25)
+                if territory and territory != "ZZ":
+                    territory = territory.removesuffix(" SAR China")
+                    extra_parts.append(territory)
+            self.name = ", ".join(extra_parts) or None
+
+        if not id_:
+            this = copy(self)
+            this.url = self.url.rsplit("?", maxsplit=1)[0]
+            checksum = crc32(repr(this).encode("utf8"))
+            id_ = hex(checksum)[2:]
+        self.id = id_
# TODO: Currently using OnFoo event naming, change to just segment_filter
self.OnSegmentFilter: Optional[Callable] = None
# Called after one of the Track's segments have downloaded
self.OnSegmentDownloaded: Optional[Callable[[Path], None]] = None
# Called after the Track has downloaded
self.OnDownloaded: Optional[Callable] = None
# Called after the Track or one of its segments have been decrypted
self.OnDecrypted: Optional[Callable[[DRM_T, Optional[m3u8.Segment]], None]] = None
# Called after the Track has been repackaged
self.OnRepacked: Optional[Callable] = None
# Called before the Track is multiplexed
self.OnMultiplex: Optional[Callable] = None
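A note on the derived track ID above: when no id_ is given, the constructor hashes a copy of the track whose URL has any query string stripped, so transient per-request tokens do not change the ID between runs. A minimal sketch of the same derivation (the example URL is hypothetical, and this assumes a single string URL rather than a list of segment URLs):

    from copy import copy
    from zlib import crc32

    track = Track(url="https://cdn.example.com/video.mp4?token=abc123", language="en")
    this = copy(track)
    this.url = this.url.rsplit("?", maxsplit=1)[0]  # -> "https://cdn.example.com/video.mp4"
    track_id = hex(crc32(repr(this).encode("utf8")))[2:]  # e.g. "1b2e3c4d"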
     def __repr__(self) -> str:
         return "{name}({items})".format(
@@ -67,23 +139,208 @@ class Track:
             items=", ".join([f"{k}={repr(v)}" for k, v in self.__dict__.items()])
         )

-    def __eq__(self, other: object) -> bool:
+    def __eq__(self, other: Any) -> bool:
         return isinstance(other, Track) and self.id == other.id
+    def download(
+        self,
+        session: Session,
+        prepare_drm: partial,
+        progress: Optional[partial] = None
+    ):
+        """Download and optionally Decrypt this Track."""
+        from devine.core.manifests import DASH, HLS
+
+        if DOWNLOAD_LICENCE_ONLY.is_set():
+            progress(downloaded="[yellow]SKIPPING")
+
+        if DOWNLOAD_CANCELLED.is_set():
+            progress(downloaded="[yellow]SKIPPED")
+            return
+
+        log = logging.getLogger("track")
+
+        proxy = next(iter(session.proxies.values()), None)
+
+        track_type = self.__class__.__name__
+        save_path = config.directories.temp / f"{track_type}_{self.id}.mp4"
+        if track_type == "Subtitle":
+            save_path = save_path.with_suffix(f".{self.codec.extension}")
+
+        if self.descriptor != self.Descriptor.URL:
+            save_dir = save_path.with_name(save_path.name + "_segments")
+        else:
+            save_dir = save_path.parent
+
+        def cleanup():
+            # track file (e.g., "foo.mp4")
+            save_path.unlink(missing_ok=True)
+            # aria2c control file (e.g., "foo.mp4.aria2" or "foo.mp4.aria2__temp")
+            save_path.with_suffix(f"{save_path.suffix}.aria2").unlink(missing_ok=True)
+            save_path.with_suffix(f"{save_path.suffix}.aria2__temp").unlink(missing_ok=True)
+            if save_dir.exists() and save_dir.name.endswith("_segments"):
+                shutil.rmtree(save_dir)
+
+        if not DOWNLOAD_LICENCE_ONLY.is_set():
+            if config.directories.temp.is_file():
+                raise ValueError(f"Temp Directory '{config.directories.temp}' must be a Directory, not a file")
+
+            config.directories.temp.mkdir(parents=True, exist_ok=True)
+
+            # Delete any pre-existing temp files matching this track.
+            # We can't re-use or continue downloading these tracks as they do not use a
+            # lock file. Or at least the majority don't. Even if they did I've encountered
+            # corruptions caused by sudden interruptions to the lock file.
+            cleanup()
+
+        try:
+            if self.descriptor == self.Descriptor.HLS:
+                HLS.download_track(
+                    track=self,
+                    save_path=save_path,
+                    save_dir=save_dir,
+                    progress=progress,
+                    session=session,
+                    proxy=proxy,
+                    license_widevine=prepare_drm
+                )
+            elif self.descriptor == self.Descriptor.DASH:
+                DASH.download_track(
+                    track=self,
+                    save_path=save_path,
+                    save_dir=save_dir,
+                    progress=progress,
+                    session=session,
+                    proxy=proxy,
+                    license_widevine=prepare_drm
+                )
+            elif self.descriptor == self.Descriptor.URL:
+                try:
+                    if not self.drm and track_type in ("Video", "Audio"):
+                        # the service might not have explicitly defined the `drm` property
+                        # try find widevine DRM information from the init data of URL
+                        try:
+                            self.drm = [Widevine.from_track(self, session)]
+                        except Widevine.Exceptions.PSSHNotFound:
+                            # it might not have Widevine DRM, or might not have found the PSSH
+                            log.warning("No Widevine PSSH was found for this track, is it DRM free?")
+
+                    if self.drm:
+                        track_kid = self.get_key_id(session=session)
+                        drm = self.drm[0]  # just use the first supported DRM system for now
+                        if isinstance(drm, Widevine):
+                            # license and grab content keys
+                            if not prepare_drm:
+                                raise ValueError("prepare_drm func must be supplied to use Widevine DRM")
+                            progress(downloaded="LICENSING")
+                            prepare_drm(drm, track_kid=track_kid)
+                            progress(downloaded="[yellow]LICENSED")
+                    else:
+                        drm = None
+
+                    if DOWNLOAD_LICENCE_ONLY.is_set():
+                        progress(downloaded="[yellow]SKIPPED")
+                    else:
+                        for status_update in self.downloader(
+                            urls=self.url,
+                            output_dir=save_path.parent,
+                            filename=save_path.name,
+                            headers=session.headers,
+                            cookies=session.cookies,
+                            proxy=proxy
+                        ):
+                            file_downloaded = status_update.get("file_downloaded")
+                            if not file_downloaded:
+                                progress(**status_update)
+
+                        # see https://github.com/devine-dl/devine/issues/71
+                        save_path.with_suffix(f"{save_path.suffix}.aria2__temp").unlink(missing_ok=True)
+
+                        self.path = save_path
+
+                        if callable(self.OnDownloaded):
+                            self.OnDownloaded()
+
+                        if drm:
+                            progress(downloaded="Decrypting", completed=0, total=100)
+                            drm.decrypt(save_path)
+                            self.drm = None
+                            if callable(self.OnDecrypted):
+                                self.OnDecrypted(drm)
+                            progress(downloaded="Decrypted", completed=100)
+
+                        if track_type == "Subtitle" and self.codec.name not in ("fVTT", "fTTML"):
+                            track_data = self.path.read_bytes()
+                            track_data = try_ensure_utf8(track_data)
+                            track_data = track_data.decode("utf8"). \
+                                replace("&lrm;", html.unescape("&lrm;")). \
+                                replace("&rlm;", html.unescape("&rlm;")). \
+                                encode("utf8")
+                            self.path.write_bytes(track_data)
+
+                        progress(downloaded="Downloaded")
+                except KeyboardInterrupt:
+                    DOWNLOAD_CANCELLED.set()
+                    progress(downloaded="[yellow]CANCELLED")
+                    raise
+                except Exception:
+                    DOWNLOAD_CANCELLED.set()
+                    progress(downloaded="[red]FAILED")
+                    raise
+        except (Exception, KeyboardInterrupt):
+            if not DOWNLOAD_LICENCE_ONLY.is_set():
+                cleanup()
+            raise
+
+        if DOWNLOAD_CANCELLED.is_set():
+            # we stopped during the download, let's exit
+            return
+
+        if not DOWNLOAD_LICENCE_ONLY.is_set():
+            if self.path.stat().st_size <= 3:  # Empty UTF-8 BOM == 3 bytes
+                raise IOError("Download failed, the downloaded file is empty.")
+
+            if callable(self.OnDownloaded):
+                self.OnDownloaded(self)
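The progress callable above is only ever invoked with keyword updates such as downloaded=, completed= and total=. A minimal sketch of satisfying that contract with a rich progress bar (the Progress wiring is an assumption for illustration, not code from this diff; track, session and prepare_drm are presumed to already exist):

    from functools import partial
    from rich.progress import Progress

    with Progress() as pbar:
        task_id = pbar.add_task("Video", total=None)
        # rich's Progress.update() accepts completed=/total= plus arbitrary
        # custom fields, so it can absorb the downloaded="..." status strings
        progress = partial(pbar.update, task_id)
        track.download(session=session, prepare_drm=prepare_drm, progress=progress)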
+    def delete(self) -> None:
+        if self.path:
+            self.path.unlink()
+            self.path = None
+
+    def move(self, target: Union[Path, str]) -> Path:
+        """
+        Move the Track's file from current location, to target location.
+        This will overwrite anything at the target path.
+
+        Raises:
+            TypeError: If the target argument is not the expected type.
+            ValueError: If track has no file to move, or the target does not exist.
+            OSError: If the file somehow failed to move.
+
+        Returns the new location of the track.
+        """
+        if not isinstance(target, (str, Path)):
+            raise TypeError(f"Expected {target} to be a {Path} or {str}, not {type(target)}")
+
+        if not self.path:
+            raise ValueError("Track has no file to move")
+
+        if not isinstance(target, Path):
+            target = Path(target)
+
+        if not target.exists():
+            raise ValueError(f"Target file {repr(target)} does not exist")
+
+        moved_to = Path(shutil.move(self.path, target))
+        if moved_to.resolve() != target.resolve():
+            raise OSError(f"Failed to move {self.path} to {target}")
+
+        self.path = target
+        return target
     def get_track_name(self) -> Optional[str]:
-        """Return the base Track Name. This may be enhanced in sub-classes."""
-        if (self.language.language or "").lower() == (self.language.territory or "").lower():
-            self.language.territory = None  # e.g. en-en, de-DE
-        if self.language.territory == "US":
-            self.language.territory = None
-        reduced = self.language.simplify_script()
-        extra_parts = []
-        if reduced.script is not None:
-            extra_parts.append(reduced.script_name(max_distance=25))
-        if reduced.territory is not None:
-            territory = reduced.territory_name(max_distance=25)
-            extra_parts.append(TERRITORY_MAP.get(territory, territory))
-        return ", ".join(extra_parts) or None
+        """Get the Track Name."""
+        return self.name
     def get_key_id(self, init_data: Optional[bytes] = None, *args, **kwargs) -> Optional[UUID]:
         """
@@ -109,7 +366,6 @@ class Track:
         if not isinstance(init_data, bytes):
             raise TypeError(f"Expected init_data to be bytes, not {init_data!r}")

-        # try get via ffprobe, needed for non mp4 data e.g. WEBM from Google Play
         probe = ffprobe(init_data)
         if probe:
             for stream in probe.get("streams") or []:
@@ -117,14 +373,12 @@ class Track:
                 if enc_key_id:
                     return UUID(bytes=base64.b64decode(enc_key_id))

-        # look for track encryption mp4 boxes
         for tenc in get_boxes(init_data, b"tenc"):
             if tenc.key_ID.int != 0:
                 return tenc.key_ID

-        # look for UUID mp4 boxes holding track encryption mp4 boxes
         for uuid_box in get_boxes(init_data, b"uuid"):
-            if uuid_box.extended_type == UUID("8974dbce-7be7-4c51-84f9-7148f9882554"):
+            if uuid_box.extended_type == UUID("8974dbce-7be7-4c51-84f9-7148f9882554"):  # tenc
                 tenc = uuid_box.data
                 if tenc.key_ID.int != 0:
                     return tenc.key_ID
@@ -134,7 +388,7 @@ class Track:
         maximum_size: int = 20000,
         url: Optional[str] = None,
         byte_range: Optional[str] = None,
-        session: Optional[requests.Session] = None
+        session: Optional[Session] = None
     ) -> bytes:
         """
         Get the Track's Initial Segment Data Stream.
@@ -158,20 +412,24 @@ class Track:
             byte_range: Range of bytes to download from the explicit or implicit URL.
             session: Session context, e.g., authorization and headers.
         """
-        if not session:
-            session = requests.Session()
-
-        if self.descriptor != self.Descriptor.URL and not url:
-            # We cannot know which init map from the HLS or DASH playlist is actually used.
-            # For DASH this could be from any adaptation set, any period, e.t.c.
-            # For HLS we could make some assumptions, but it's best that it is explicitly provided.
-            raise ValueError(
-                f"An explicit URL to an init map or file must be provided for {self.descriptor.name} tracks."
-            )
-
-        url = url or self.url
-        if not url:
-            raise ValueError("The track must have an URL to point towards it's data.")
+        if not isinstance(maximum_size, int):
+            raise TypeError(f"Expected maximum_size to be an {int}, not {type(maximum_size)}")
+        if not isinstance(url, (str, type(None))):
+            raise TypeError(f"Expected url to be a {str}, not {type(url)}")
+        if not isinstance(byte_range, (str, type(None))):
+            raise TypeError(f"Expected byte_range to be a {str}, not {type(byte_range)}")
+        if not isinstance(session, (Session, type(None))):
+            raise TypeError(f"Expected session to be a {Session}, not {type(session)}")
+
+        if not url:
+            if self.descriptor != self.Descriptor.URL:
+                raise ValueError(f"An explicit URL must be provided for {self.descriptor.name} tracks")
+            if not self.url:
+                raise ValueError("An explicit URL must be provided as the track has no URL")
+            url = self.url
+
+        if not session:
+            session = Session()

         content_length = maximum_size
@@ -188,7 +446,6 @@ class Track:
         if "Content-Length" in size_test.headers:
             content_length_header = int(size_test.headers["Content-Length"])
             if content_length_header > 0:
-                # use whichever is smaller in case this is a large file
                 content_length = min(content_length_header, maximum_size)

         range_test = session.head(url, headers={"Range": "bytes=0-1"})
         if range_test.status_code == 206:
@@ -204,8 +461,6 @@ class Track:
             res.raise_for_status()
             init_data = res.content
         else:
-            # Take advantage of streaming support to take just the first n bytes
-            # This is a hacky alternative to HTTP's Range on unsupported servers
             init_data = None
             with session.get(url, stream=True) as s:
                 for chunk in s.iter_content(content_length):
@@ -216,11 +471,6 @@ class Track:

         return init_data
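The two helpers compose: fetch the init segment once, then probe it for the encryption KID. A minimal usage sketch (the URL and session are hypothetical):

    from requests import Session

    session = Session()
    init_data = track.get_init_segment(
        url="https://cdn.example.com/init.mp4",  # explicit init map, required for HLS/DASH tracks
        session=session
    )
    kid = track.get_key_id(init_data=init_data)  # None if no tenc box or KID is present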
-    def delete(self) -> None:
-        if self.path:
-            self.path.unlink()
-            self.path = None
-
     def repackage(self) -> None:
         if not self.path or not self.path.exists():
             raise ValueError("Cannot repackage a Track that has not been downloaded.")
@@ -259,36 +509,7 @@ class Track:
             else:
                 raise

-        self.swap(output_path)
-        self.move(original_path)
-
-    def move(self, target: Union[str, Path]) -> bool:
-        """
-        Move the Track's file from current location, to target location.
-        This will overwrite anything at the target path.
-        """
-        if not self.path:
-            return False
-        target = Path(target)
-        ok = Path(shutil.move(self.path, target)).resolve() == target.resolve()
-        if ok:
-            self.path = target
-        return ok
-
-    def swap(self, target: Union[str, Path]) -> bool:
-        """
-        Swaps the Track's file with the Target file. The current Track's file is deleted.
-        Returns False if the Track is not yet downloaded, or the target path does not exist.
-        """
-        target = Path(target)
-        if not target.exists() or not self.path:
-            return False
-        self.path.unlink()
-        ok = Path(shutil.move(target, self.path)).resolve() == self.path.resolve()
-        if not ok:
-            return False
-        return self.move(target)
+        self.path = output_path


 __all__ = ("Track",)
devine/core/tracks/tracks.py
@@ -6,7 +6,6 @@ from functools import partial
 from pathlib import Path
 from typing import Callable, Iterator, Optional, Sequence, Union

-from Cryptodome.Random import get_random_bytes
 from langcodes import Language, closest_supported_match
 from rich.progress import BarColumn, Progress, SpinnerColumn, TextColumn, TimeRemainingColumn
 from rich.table import Table
@@ -14,9 +13,9 @@ from rich.tree import Tree

 from devine.core.config import config
 from devine.core.console import console
-from devine.core.constants import LANGUAGE_MAX_DISTANCE, LANGUAGE_MUX_MAP, AnyTrack, TrackT
+from devine.core.constants import LANGUAGE_MAX_DISTANCE, AnyTrack, TrackT
 from devine.core.tracks.audio import Audio
-from devine.core.tracks.chapter import Chapter
+from devine.core.tracks.chapters import Chapter, Chapters
 from devine.core.tracks.subtitle import Subtitle
 from devine.core.tracks.track import Track
 from devine.core.tracks.video import Video
@@ -37,11 +36,11 @@ class Tracks:
         Chapter: 3
     }

-    def __init__(self, *args: Union[Tracks, list[Track], Track]):
+    def __init__(self, *args: Union[Tracks, Sequence[Union[AnyTrack, Chapter, Chapters]], Track, Chapter, Chapters]):
         self.videos: list[Video] = []
         self.audio: list[Audio] = []
         self.subtitles: list[Subtitle] = []
-        self.chapters: list[Chapter] = []
+        self.chapters = Chapters()

         if args:
             self.add(args)
@@ -52,6 +51,13 @@ class Tracks:
     def __len__(self) -> int:
         return len(self.videos) + len(self.audio) + len(self.subtitles)

+    def __add__(
+        self,
+        other: Union[Tracks, Sequence[Union[AnyTrack, Chapter, Chapters]], Track, Chapter, Chapters]
+    ) -> Tracks:
+        self.add(other)
+        return self
+
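The new __add__ makes += work for anything add() accepts; note that it mutates and returns self rather than building a new collection. A small illustration (the track and chapter objects are hypothetical):

    tracks = Tracks()
    tracks += video_track             # a single Track
    tracks += [audio_en, audio_de]    # a sequence of tracks
    tracks += Chapters([chapter_one]) # a Chapters container, merged into tracks.chapters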
     def __repr__(self) -> str:
         return "{name}({items})".format(
             name=self.__class__.__name__,
@@ -137,7 +143,7 @@ class Tracks:
     def add(
         self,
-        tracks: Union[Tracks, Sequence[Union[AnyTrack, Chapter]], Track, Chapter],
+        tracks: Union[Tracks, Sequence[Union[AnyTrack, Chapter, Chapters]], Track, Chapter, Chapters],
         warn_only: bool = False
     ) -> None:
         """Add a provided track to its appropriate array and ensuring it's not a duplicate."""
@@ -166,7 +172,7 @@ class Tracks:
         elif isinstance(track, Subtitle):
             self.subtitles.append(track)
         elif isinstance(track, Chapter):
-            self.chapters.append(track)
+            self.chapters.add(track)
         else:
             raise ValueError("Track type was not set or is invalid.")
@@ -243,13 +249,6 @@ class Tracks:
                 continue
         self.subtitles.sort(key=lambda x: is_close_match(language, [x.language]), reverse=True)

-    def sort_chapters(self) -> None:
-        """Sort chapter tracks by chapter number."""
-        if not self.chapters:
-            return
-        # number
-        self.chapters.sort(key=lambda x: x.number)
-
     def select_video(self, x: Callable[[Video], bool]) -> None:
         self.videos = list(filter(x, self.videos))
@@ -289,16 +288,6 @@ class Tracks:
         ][:per_language or None])
         return selected

-    def export_chapters(self, to_file: Optional[Union[Path, str]] = None) -> str:
-        """Export all chapters in order to a string or file."""
-        self.sort_chapters()
-        data = "\n".join(map(repr, self.chapters))
-        if to_file:
-            to_file = Path(to_file)
-            to_file.parent.mkdir(parents=True, exist_ok=True)
-            to_file.write_text(data, encoding="utf8")
-        return data
-
     def mux(self, title: str, delete: bool = True, progress: Optional[partial] = None) -> tuple[Path, int]:
         """
         Multiplex all the Tracks into a Matroska Container file.
@@ -322,11 +311,9 @@ class Tracks:
             if not vt.path or not vt.path.exists():
                 raise ValueError("Video Track must be downloaded before muxing...")
             if callable(vt.OnMultiplex):
-                vt.OnMultiplex(vt)
+                vt.OnMultiplex()
             cl.extend([
-                "--language", "0:{}".format(LANGUAGE_MUX_MAP.get(
-                    str(vt.language), str(vt.language)
-                )),
+                "--language", f"0:{vt.language}",
                 "--default-track", f"0:{i == 0}",
                 "--original-flag", f"0:{vt.is_original_lang}",
                 "--compression", "0:none",  # disable extra compression
@@ -337,12 +324,10 @@ class Tracks:
             if not at.path or not at.path.exists():
                 raise ValueError("Audio Track must be downloaded before muxing...")
             if callable(at.OnMultiplex):
-                at.OnMultiplex(at)
+                at.OnMultiplex()
             cl.extend([
                 "--track-name", f"0:{at.get_track_name() or ''}",
-                "--language", "0:{}".format(LANGUAGE_MUX_MAP.get(
-                    str(at.language), str(at.language)
-                )),
+                "--language", f"0:{at.language}",
                 "--default-track", f"0:{i == 0}",
                 "--visual-impaired-flag", f"0:{at.descriptive}",
                 "--original-flag", f"0:{at.is_original_lang}",
@@ -354,13 +339,11 @@ class Tracks:
             if not st.path or not st.path.exists():
                 raise ValueError("Text Track must be downloaded before muxing...")
             if callable(st.OnMultiplex):
-                st.OnMultiplex(st)
+                st.OnMultiplex()
             default = bool(self.audio and is_close_match(st.language, [self.audio[0].language]) and st.forced)
             cl.extend([
                 "--track-name", f"0:{st.get_track_name() or ''}",
-                "--language", "0:{}".format(LANGUAGE_MUX_MAP.get(
-                    str(st.language), str(st.language)
-                )),
+                "--language", f"0:{st.language}",
                 "--sub-charset", "0:UTF-8",
                 "--forced-track", f"0:{st.forced}",
                 "--default-track", f"0:{default}",
@@ -373,9 +356,9 @@ class Tracks:
         if self.chapters:
             chapters_path = config.directories.temp / config.filenames.chapters.format(
                 title=sanitize_filename(title),
-                random=get_random_bytes(16).hex()
+                random=self.chapters.id
             )
-            self.export_chapters(chapters_path)
+            self.chapters.dump(chapters_path, fallback_name=config.chapter_fallback_name)
             cl.extend(["--chapter-charset", "UTF-8", "--chapters", str(chapters_path)])
         else:
             chapters_path = None
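The fallback_name handed to Chapters.dump() supplies a title template for chapters that have no name of their own. A minimal sketch of the call (the "Chapter {i:02}" template is an assumption about a typical config value, not taken from this diff):

    # dump chapters to the chapter file mkvmerge expects, naming any
    # unnamed entries from the fallback template
    tracks.chapters.dump(chapters_path, fallback_name="Chapter {i:02}")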
devine/core/tracks/video.py
@@ -200,8 +200,8 @@ class Video(Track):
             str(output_path)
         ], check=True)

-        self.swap(output_path)
-        self.move(original_path)
+        self.path = output_path
+        original_path.unlink()

     def ccextractor(
         self, track_id: Any, out_path: Union[Path, str], language: Language, original: bool = False
@@ -321,11 +321,12 @@ class Video(Track):
         i = file.index(b"x264")
         encoding_settings = file[i: i + file[i:].index(b"\x00")].replace(b":", br"\\:").replace(b",", br"\,").decode()

-        cleaned_path = self.path.with_suffix(f".cleaned{self.path.suffix}")
+        original_path = self.path
+        cleaned_path = original_path.with_suffix(f".cleaned{original_path.suffix}")
         subprocess.run([
             executable, "-hide_banner",
             "-loglevel", "panic",
-            "-i", self.path,
+            "-i", original_path,
             "-map_metadata", "-1",
             "-fflags", "bitexact",
             "-bsf:v", f"filter_units=remove_types=6,h264_metadata=sei_user_data={uuid}+{encoding_settings}",
@@ -335,7 +336,8 @@ class Video(Track):
         log.info(" + Removed")

-        self.swap(cleaned_path)
+        self.path = cleaned_path
+        original_path.unlink()

         return True
devine/core/utilities.py
@@ -1,8 +1,10 @@
 import ast
 import contextlib
 import importlib.util
+import os
 import re
 import shutil
+import socket
 import sys
 import time
 import unicodedata
@@ -10,11 +12,10 @@ from collections import defaultdict
 from datetime import datetime
 from pathlib import Path
 from types import ModuleType
-from typing import AsyncIterator, Optional, Sequence, Union
-from urllib.parse import urlparse
+from typing import Optional, Sequence, Union
+from urllib.parse import ParseResult, urlparse

 import chardet
-import pproxy
 import requests
 from construct import ValidationError
 from langcodes import Language, closest_match
@@ -244,35 +245,36 @@ def try_ensure_utf8(data: bytes) -> bytes:
             return data


-@contextlib.asynccontextmanager
-async def start_pproxy(proxy: str) -> AsyncIterator[str]:
-    proxy = urlparse(proxy)
-
-    scheme = {
-        "https": "http+ssl",
-        "socks5h": "socks"
-    }.get(proxy.scheme, proxy.scheme)
-
-    remote_server = f"{scheme}://{proxy.hostname}"
-    if proxy.port:
-        remote_server += f":{proxy.port}"
-    if proxy.username or proxy.password:
-        remote_server += "#"
-    if proxy.username:
-        remote_server += proxy.username
-    if proxy.password:
-        remote_server += f":{proxy.password}"
-
-    server = pproxy.Server("http://localhost:0")  # random port
-    remote = pproxy.Connection(remote_server)
-    handler = await server.start_server({"rserver": [remote]})
-
-    try:
-        port = handler.sockets[0].getsockname()[1]
-        yield f"http://localhost:{port}"
-    finally:
-        handler.close()
-        await handler.wait_closed()
+def get_free_port() -> int:
+    """
+    Get an available port to use between a-b (inclusive).
+
+    The port is freed as soon as this has returned, therefore, it
+    is possible for the port to be taken before you try to use it.
+    """
+    with contextlib.closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as s:
+        s.bind(("", 0))
+        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
+        return s.getsockname()[1]
+
+
+def get_extension(value: Union[str, Path, ParseResult]) -> Optional[str]:
+    """
+    Get a URL or Path file extension/suffix.
+
+    Note: The returned value will begin with `.`.
+    """
+    if isinstance(value, ParseResult):
+        value_parsed = value
+    elif isinstance(value, (str, Path)):
+        value_parsed = urlparse(str(value))
+    else:
+        raise TypeError(f"Expected {str}, {Path}, or {ParseResult}, got {type(value)}")
+
+    if value_parsed.path:
+        ext = os.path.splitext(value_parsed.path)[1]
+        if ext and ext != ".":
+            return ext


 class FPS(ast.NodeVisitor):
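Quick usage sketch for the two new helpers (the values shown are illustrative):

    port = get_free_port()  # e.g. 51342; the port may be re-taken before you bind it
    get_extension("https://example.com/video.mp4?token=abc")  # ".mp4" (query string is ignored)
    get_extension(Path("subs/episode.en.srt"))                # ".srt"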
devine/core/utils/click_types.py
@@ -1,7 +1,8 @@
 import re
-from typing import Optional, Union
+from typing import Any, Optional, Union

 import click
+from click.shell_completion import CompletionItem
 from pywidevine.cdm import Cdm as WidevineCdm
@@ -122,6 +123,62 @@ class QualityList(click.ParamType):
         return sorted(resolutions, reverse=True)


+class MultipleChoice(click.Choice):
+    """
+    The multiple choice type allows multiple values to be checked against
+    a fixed set of supported values.
+
+    It internally uses and is based off of click.Choice.
+    """
+
+    name = "multiple_choice"
+
+    def __repr__(self) -> str:
+        return f"MultipleChoice({list(self.choices)})"
+
+    def convert(
+        self,
+        value: Any,
+        param: Optional[click.Parameter] = None,
+        ctx: Optional[click.Context] = None
+    ) -> list[Any]:
+        if not value:
+            return []
+        if isinstance(value, str):
+            values = value.split(",")
+        elif isinstance(value, list):
+            values = value
+        else:
+            self.fail(
+                f"{value!r} is not a supported value.",
+                param,
+                ctx
+            )
+
+        chosen_values: list[Any] = []
+        for value in values:
+            chosen_values.append(super().convert(value, param, ctx))
+
+        return chosen_values
+
+    def shell_complete(
+        self,
+        ctx: click.Context,
+        param: click.Parameter,
+        incomplete: str
+    ) -> list[CompletionItem]:
+        """
+        Complete choices that start with the incomplete value.
+
+        Parameters:
+            ctx: Invocation context for this command.
+            param: The parameter that is requesting completion.
+            incomplete: Value being completed. May be empty.
+        """
+        incomplete = incomplete.rsplit(",")[-1]
+        return super().shell_complete(ctx, param, incomplete)
+
+
 SEASON_RANGE = SeasonRange()
 LANGUAGE_RANGE = LanguageRange()
 QUALITY_LIST = QualityList()
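A usage sketch for MultipleChoice as a click option type (the command and option names are made up for illustration):

    import click

    @click.command()
    @click.option("--vcodec", type=MultipleChoice(["H264", "H265", "VP9", "AV1"]), default=[])
    def dl(vcodec: list[str]) -> None:
        click.echo(vcodec)

    # `dl --vcodec H264,AV1` converts to ["H264", "AV1"]; shell completion
    # only completes the segment after the last comma.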
devine/vaults/API.py (new file, 214 lines)

@@ -0,0 +1,214 @@
+from typing import Iterator, Optional, Union
+from uuid import UUID
+
+from requests import Session
+
+from devine.core import __version__
+from devine.core.vault import Vault
+
+
+class API(Vault):
+    """Key Vault using a simple RESTful HTTP API call."""
+
+    def __init__(self, name: str, uri: str, token: str):
+        super().__init__(name)
+        self.uri = uri.rstrip("/")
+        self.session = Session()
+        self.session.headers.update({
+            "User-Agent": f"Devine v{__version__}"
+        })
+        self.session.headers.update({
+            "Authorization": f"Bearer {token}"
+        })
+
+    def get_key(self, kid: Union[UUID, str], service: str) -> Optional[str]:
+        if isinstance(kid, UUID):
+            kid = kid.hex
+
+        data = self.session.get(
+            url=f"{self.uri}/{service.lower()}/{kid}",
+            headers={
+                "Accept": "application/json"
+            }
+        ).json()
+
+        code = int(data.get("code", 0))
+        message = data.get("message")
+        error = {
+            0: None,
+            1: Exceptions.AuthRejected,
+            2: Exceptions.TooManyRequests,
+            3: Exceptions.ServiceTagInvalid,
+            4: Exceptions.KeyIdInvalid
+        }.get(code, ValueError)
+        if error:
+            raise error(f"{message} ({code})")
+
+        content_key = data.get("content_key")
+        if not content_key:
+            return None
+
+        if not isinstance(content_key, str):
+            raise ValueError(f"Expected {content_key} to be {str}, was {type(content_key)}")
+
+        return content_key
+
+    def get_keys(self, service: str) -> Iterator[tuple[str, str]]:
+        page = 1
+
+        while True:
+            data = self.session.get(
+                url=f"{self.uri}/{service.lower()}",
+                params={
+                    "page": page,
+                    "total": 10
+                },
+                headers={
+                    "Accept": "application/json"
+                }
+            ).json()
+
+            code = int(data.get("code", 0))
+            message = data.get("message")
+            error = {
+                0: None,
+                1: Exceptions.AuthRejected,
+                2: Exceptions.TooManyRequests,
+                3: Exceptions.PageInvalid,
+                4: Exceptions.ServiceTagInvalid,
+            }.get(code, ValueError)
+            if error:
+                raise error(f"{message} ({code})")
+
+            content_keys = data.get("content_keys")
+            if content_keys:
+                if not isinstance(content_keys, dict):
+                    raise ValueError(f"Expected {content_keys} to be {dict}, was {type(content_keys)}")
+                for key_id, key in content_keys.items():
+                    yield key_id, key
+
+            pages = int(data["pages"])
+            if pages <= page:
+                break
+
+            page += 1
+
+    def add_key(self, service: str, kid: Union[UUID, str], key: str) -> bool:
+        if isinstance(kid, UUID):
+            kid = kid.hex
+
+        data = self.session.post(
+            url=f"{self.uri}/{service.lower()}/{kid}",
+            json={
+                "content_key": key
+            },
+            headers={
+                "Accept": "application/json"
+            }
+        ).json()
+
+        code = int(data.get("code", 0))
+        message = data.get("message")
+        error = {
+            0: None,
+            1: Exceptions.AuthRejected,
+            2: Exceptions.TooManyRequests,
+            3: Exceptions.ServiceTagInvalid,
+            4: Exceptions.KeyIdInvalid,
+            5: Exceptions.ContentKeyInvalid
+        }.get(code, ValueError)
+        if error:
+            raise error(f"{message} ({code})")
+
+        # the kid:key was new to the vault (optional)
+        added = bool(data.get("added"))
+        # the key for kid was changed/updated (optional)
+        updated = bool(data.get("updated"))
+
+        return added or updated
+
+    def add_keys(self, service: str, kid_keys: dict[Union[UUID, str], str]) -> int:
+        data = self.session.post(
+            url=f"{self.uri}/{service.lower()}",
+            json={
+                "content_keys": {
+                    str(kid).replace("-", ""): key
+                    for kid, key in kid_keys.items()
+                }
+            },
+            headers={
+                "Accept": "application/json"
+            }
+        ).json()
+
+        code = int(data.get("code", 0))
+        message = data.get("message")
+        error = {
+            0: None,
+            1: Exceptions.AuthRejected,
+            2: Exceptions.TooManyRequests,
+            3: Exceptions.ServiceTagInvalid,
+            4: Exceptions.KeyIdInvalid,
+            5: Exceptions.ContentKeyInvalid
+        }.get(code, ValueError)
+        if error:
+            raise error(f"{message} ({code})")
+
+        # each kid:key that was new to the vault (optional)
+        added = int(data.get("added"))
+        # each key for a kid that was changed/updated (optional)
+        updated = int(data.get("updated"))
+
+        return added + updated
+
+    def get_services(self) -> Iterator[str]:
+        data = self.session.post(
+            url=self.uri,
+            headers={
+                "Accept": "application/json"
+            }
+        ).json()
+
+        code = int(data.get("code", 0))
+        message = data.get("message")
+        error = {
+            0: None,
+            1: Exceptions.AuthRejected,
+            2: Exceptions.TooManyRequests,
+        }.get(code, ValueError)
+        if error:
+            raise error(f"{message} ({code})")
+
+        service_list = data.get("service_list", [])
+        if not isinstance(service_list, list):
+            raise ValueError(f"Expected {service_list} to be {list}, was {type(service_list)}")
+
+        for service in service_list:
+            yield service
+
+
+class Exceptions:
+    class AuthRejected(Exception):
+        """Authentication Error Occurred, is your token valid? Do you have permission to make this call?"""
+
+    class TooManyRequests(Exception):
+        """Rate Limited; Sent too many requests in a given amount of time."""
+
+    class PageInvalid(Exception):
+        """Requested page does not exist."""
+
+    class ServiceTagInvalid(Exception):
+        """The Service Tag is invalid."""
+
+    class KeyIdInvalid(Exception):
+        """The Key ID is invalid."""
+
+    class ContentKeyInvalid(Exception):
+        """The Content Key is invalid."""
poetry.lock (generated, 1413 lines): diff suppressed because it is too large

pyproject.toml
@@ -4,8 +4,8 @@ build-backend = "poetry.core.masonry.api"

 [tool.poetry]
 name = "devine"
-version = "2.2.0"
-description = "Open-Source Movie, TV, and Music Downloading Solution."
+version = "3.1.0"
+description = "Modular Movie, TV, and Music Archival Software."
 license = "GPL-3.0-only"
 authors = ["rlaphoenix <rlaphoenix@pm.me>"]
 readme = "README.md"
@@ -39,39 +39,43 @@ Brotli = "^1.1.0"
 click = "^8.1.7"
 construct = "^2.8.8"
 crccheck = "^1.3.0"
-jsonpickle = "^3.0.2"
+jsonpickle = "^3.0.3"
 langcodes = { extras = ["data"], version = "^3.3.0" }
-lxml = "^4.9.3"
-pproxy = "^2.7.8"
-protobuf = "^4.24.4"
-pycaption = "^2.2.0"
-pycryptodomex = "^3.19.0"
+lxml = "^5.1.0"
+pproxy = "^2.7.9"
+protobuf = "^4.25.3"
+pycaption = "^2.2.4"
+pycryptodomex = "^3.20.0"
 pyjwt = "^2.8.0"
 pymediainfo = "^6.1.0"
 pymp4 = "^1.4.0"
 pymysql = "^1.1.0"
-pywidevine = { extras = ["serve"], version = "^1.7.0" }
+pywidevine = { extras = ["serve"], version = "^1.8.0" }
 PyYAML = "^6.0.1"
 requests = { extras = ["socks"], version = "^2.31.0" }
-rich = "^13.7.0"
+rich = "^13.7.1"
 "rlaphoenix.m3u8" = "^3.4.0"
-"ruamel.yaml" = "^0.17.40"
+"ruamel.yaml" = "^0.18.6"
 sortedcontainers = "^2.4.0"
 subtitle-filter = "^1.4.8"
-Unidecode = "^1.3.7"
-urllib3 = "^2.1.0"
+Unidecode = "^1.3.8"
+urllib3 = "^2.2.1"
 chardet = "^5.2.0"
-curl-cffi = "^0.5.10"
+curl-cffi = "^0.6.1"
+# Temporary explicit versions of these langcodes dependencies as language-data v1.1
+# uses marisa-trie v0.7.8 which doesn't have Python 3.12 wheels.
+language-data = "^1.2.0.dev3"
+marisa-trie = "^1.1.0"

 [tool.poetry.dev-dependencies]
-pre-commit = "^3.5.0"
-mypy = "^1.7.1"
+pre-commit = "^3.6.2"
+mypy = "^1.8.0"
 mypy-protobuf = "^3.5.0"
-types-protobuf = "^4.24.0.4"
+types-protobuf = "^4.24.0.20240129"
 types-PyMySQL = "^1.1.0.1"
-types-requests = "^2.31.0.10"
-isort = "^5.12.0"
-ruff = "~0.1.6"
+types-requests = "^2.31.0.20240218"
+isort = "^5.13.2"
+ruff = "~0.3.0"

 [tool.poetry.scripts]
 devine = "devine.core.__main__:main"
@@ -79,6 +83,8 @@ devine = "devine.core.__main__:main"

 [tool.ruff]
 force-exclude = true
 line-length = 120
+
+[tool.ruff.lint]
 select = ["E4", "E7", "E9", "F", "W"]

 [tool.isort]