mirror of https://github.com/yt-dlp/yt-dlp.git synced 2025-07-20 09:56:36 +00:00

Compare commits


85 Commits

Author SHA1 Message Date
R0hanW
1a8474c3ca
[ie/PlayerFm] Add extractor (#13016)
Closes #4518
Authored by: R0hanW
2025-07-19 01:38:52 +02:00
bashonly
09982bc33e
[ie/dangalplay] Support other login regions (#13768)
Authored by: bashonly
2025-07-18 23:24:52 +00:00
Víctor Schmidt
c8329fc572
[ie/rai] Fix formats extraction (#13572)
Closes #13548
Authored by: moonshinerd, seproDev

Co-authored-by: sepro <sepro@sepr0.com>
2025-07-18 22:43:04 +00:00
bashonly
1f27a9f8ba
[core] Warn when skipping formats (#13090)
Authored by: bashonly
2025-07-18 21:59:50 +00:00
bashonly
4919051e44
[core] Don't let format testing alter the return code (#13767)
Closes #13750
Authored by: bashonly
2025-07-18 21:55:02 +00:00
bashonly
5f951ce929
[ie/aenetworks] Support new URL formats (#13747)
Closes #13745
Authored by: bashonly
2025-07-18 20:06:02 +00:00
bashonly
28bf46b7da
[utils] urlhandle_detect_ext: Use x-amz-meta-file-type headers (#13749)
Authored by: bashonly
2025-07-18 19:46:06 +00:00
bashonly
b8abd255e4
[utils] mimetype2ext: Always parse flac from audio/flac (#13748)
Authored by: bashonly
2025-07-18 19:43:40 +00:00
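The two `[utils]` commits above both concern picking a file extension from HTTP response metadata. The sketch below shows that kind of lookup. It assumes that an `x-amz-meta-file-type` header carries a bare extension such as `flac`; the commit does not show the header's format. It also uses a small, hypothetical MIME table. This is not yt-dlp's `urlhandle_detect_ext` or `mimetype2ext`.

```python
# Hypothetical sketch, not yt-dlp's urlhandle_detect_ext() or mimetype2ext().
from email.message import Message

# Illustrative subset of a MIME-type table (assumed, not exhaustive)
MIME_TO_EXT = {
    'audio/flac': 'flac',
    'audio/x-flac': 'flac',
    'audio/mpeg': 'mp3',
    'video/mp4': 'mp4',
}


def mimetype_to_ext(mimetype):
    """Map a Content-Type value to an extension, ignoring parameters and case."""
    if not mimetype:
        return None
    base = mimetype.split(';', 1)[0].strip().lower()
    return MIME_TO_EXT.get(base)


def detect_ext(headers):
    """Pick an extension from response headers (an email.message.Message)."""
    # Assumption: the S3 metadata header holds a bare extension such as 'flac'
    file_type = headers.get('x-amz-meta-file-type')  # lookup is case-insensitive
    if file_type:
        return file_type.strip().lower().lstrip('.')
    # Otherwise fall back to the Content-Type mapping
    return mimetype_to_ext(headers.get('Content-Type'))


headers = Message()
headers['Content-Type'] = 'audio/flac; charset=binary'
print(detect_ext(headers))  # flac
```

Checking the custom header first means an explicit per-object hint wins over a generic `Content-Type` such as `application/octet-stream`.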
bashonly
c1ac543c81
[ie/soundcloud] Always extract original format extension (#13746)
Closes #13743
Authored by: bashonly
2025-07-16 23:19:58 +00:00
flanter21
dcc4cba39e
[ie/blackboardcollaborate] Support subtitles and authwalled videos (#12473)
Authored by: flanter21
2025-07-16 23:17:48 +00:00
Nikolay Fedorov
3a84be9d16
[ie/TheHighWire] Add extractor (#13505)
Closes #13364
Authored by: swayll
2025-07-14 19:01:53 +00:00
rdamas
d42a6ff0c4
[ie/archive.org] Fix extractor (#13706)
Closes #13704
Authored by: rdamas
2025-07-14 18:55:52 +00:00
bashonly
ade876efb3
[ie/francetv] Improve error handling (#13726)
Closes #13324
Authored by: bashonly
2025-07-14 17:25:45 +00:00
bashonly
7e0af2b1f0
[ie/hotstar] Improve error handling (#13727)
Authored by: bashonly
2025-07-14 17:24:52 +00:00
doe1080
d57a0b5aa7
[ie/noovo] Remove extractor (#13429)
Authored by: doe1080
2025-07-14 01:12:00 +02:00
doe1080
6fb3947c0d
[ie/bellmedia] Remove extractor (#13429)
Authored by: doe1080
2025-07-14 01:12:00 +02:00
doe1080
9f54ea3898
[ie/ctv] Remove extractor (#13429)
Authored by: doe1080
2025-07-14 01:12:00 +02:00
chauhantirth
07d1d85f63
[ie/hotstar] Fix support for free accounts (#13700)
Fixes b5bd057fe86550f3aa67f2fc8790d1c6a251c57b

Closes #13600
Authored by: chauhantirth
2025-07-13 22:35:26 +00:00
doe1080
5d693446e8
[ie/limelight] Remove extractors (#13267)
Authored by: doe1080
2025-07-14 00:10:59 +02:00
doe1080
23e9389f93
[ie/bandaichannel] Remove extractor (#13152)
Closes #8829
Authored by: doe1080
2025-07-13 23:53:47 +02:00
doe1080
6d39c420f7
[ie/JoqrAg] Remove extractor (#13152)
Authored by: doe1080
2025-07-13 23:53:47 +02:00
barsnick
85c3fa1925
[ie/RaiSudtirol] Support alternative domain (#13718)
Authored by: barsnick
2025-07-13 23:35:10 +02:00
Povilas Balzaravičius
b4b4486eff
[ie/LRTRadio] Fix extractor (#13717)
Authored by: Pawka
2025-07-13 21:24:37 +00:00
Frank Cai
630f3389c3
[ie/UnitedNationsWebTv] Add extractor (#13538)
Closes #2675
Authored by: averageFOSSenjoyer
2025-07-13 23:16:01 +02:00
bashonly
a6db1d297a
[ie/vimeo] Handle age-restricted videos (#13719)
Closes #13716
Authored by: bashonly
2025-07-13 21:09:39 +00:00
ShockedPlot7560
0f33950c77
[ie/mixlr] Add extractors (#13561)
Authored by: ShockedPlot7560, seproDev

Co-authored-by: sepro <sepro@sepr0.com>
2025-07-13 01:35:51 +02:00
bashonly
b5fea53f20
[ie] Rework _search_nextjs_v13_data helper (#13711)
Fix 5245231e4a39ecd5595d4337d46d85e150e2430a

Authored by: bashonly
2025-07-12 23:12:05 +00:00
bashonly
5245231e4a
[ie] Add _search_nextjs_v13_data helper (#13398)
* Fixes FranceTVSiteIE livestream extraction
* Fixes GoPlayIE metadata extraction

Authored by: bashonly
2025-07-12 22:12:46 +00:00
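The `_search_nextjs_v13_data` helper targets Next.js app-router pages. These pages stream React Server Component data through `self.__next_f.push([1, "..."])` script calls, as the test HTML later in this diff shows. The sketch below is a simplified version that collects those type-1 string chunks and splits them into `id:payload` records. The function name is hypothetical. The test's expected output suggests the real helper goes further and collects the props of `"$L18"`-style element references, which this sketch does not do.

```python
import json
import re

# Matches push([1,"<JSON string literal>"]) calls; other chunk types are ignored
_PUSH_RE = re.compile(r'self\.__next_f\.push\(\[1,\s*("(?:[^"\\]|\\.)*")\]\)')
# Each record line has the form "<hex id>:<payload>"
_RECORD_RE = re.compile(r'([0-9a-f]+):(.*)')


def parse_next_f_records(html):
    """Return {record_id: parsed_json} for records whose payload is plain JSON."""
    # Decode each JSON string literal and join the chunks in document order,
    # so a record split across two push() calls is reassembled
    stream = ''.join(json.loads(m.group(1)) for m in _PUSH_RE.finditer(html))
    records = {}
    for line in stream.splitlines():
        match = _RECORD_RE.fullmatch(line.strip())
        if not match:
            continue
        try:
            records[match.group(1)] = json.loads(match.group(2))
        except json.JSONDecodeError:
            # Non-JSON payloads such as module references ("I[...]") are skipped
            continue
    return records


html = r'''<script>self.__next_f.push([1,"a:[\"$\",\"div\",null,{\"x\":1}]\n"])</script>
<script>self.__next_f.push([1,"b:I[1,[]]\n"])</script>'''
print(parse_next_f_records(html))  # {'a': ['$', 'div', None, {'x': 1}]}
```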
Lyuben Ivanov
3ae61e0f31
[ie/BTVPlus] Add extractor (#13541)
Authored by: bubo
2025-07-12 21:56:11 +02:00
bashonly
a5d697f62d
[ie/vimeo] Fix extractor (#13692)
Closes #13180, Closes #13689
Authored by: bashonly
2025-07-12 19:23:22 +00:00
coletdjnz
6e5bee418b
[ie/youtube] Ensure context params are consistent for web clients (#13701)
Authored by: coletdjnz
2025-07-12 13:44:27 +12:00
coletdjnz
5b57b72c1a
[ie/youtube] Do not require PO Token for premium accounts (#13640)
Authored by: coletdjnz
2025-07-11 18:54:01 +12:00
doe1080
2aaf1aa71d
[ie/newspicks] Fix extractor (#13612)
Closes #10472
Authored by: doe1080
2025-07-09 22:21:47 +00:00
Nikolay Fedorov
7b4c96e089
[ie/mir24.tv] Add extractor (#13651)
Closes #13365
Authored by: swayll
2025-07-09 22:16:33 +00:00
bashonly
0b359b184d
[ie/9gag] Support browser impersonation (#13678)
Closes #10837
Authored by: bashonly
2025-07-09 21:58:19 +00:00
bashonly
805519bfaa
[jsinterp] Fix undefined variable name caching (#13677)
Fix b342d27f3f82d913976509ddf5bff539ad8567ec

Authored by: bashonly
2025-07-09 20:45:47 +00:00
coletdjnz
aa9f1f4d57
[ie/youtube] Log bad playability statuses of player responses (#13647)
Authored by: coletdjnz
2025-07-09 18:29:54 +12:00
InvalidUsernameException
fd36b8f31b
[test:download] Support playlist_maxcount (#13433)
Authored by: InvalidUsernameException
2025-07-08 04:19:03 +00:00
barsnick
99093e96fd
[devscripts] Fix filename/directory Bash completions (#13620)
Closes #13619
Authored by: barsnick
2025-07-08 04:18:15 +00:00
garret1317
7c49a93788
[ie/NhkRadiru] Fix metadata extraction (#12708)
Authored by: garret1317
2025-07-08 03:55:19 +00:00
bashonly
884f35d54a
[ie/BiliBiliBangumi] Fix geo-block detection (#13667)
Closes #13634
Authored by: bashonly
2025-07-08 03:54:27 +00:00
bashonly
c23d837b65
[ie/youtube:tab] Fix subscriptions feed extraction (#13665)
Adds support for LOCKUP_CONTENT_TYPE_VIDEO view models

Closes #13658
Authored by: bashonly
2025-07-07 20:25:34 +00:00
bashonly
a7113722ec
[fd/hls] Do not fall back to ffmpeg when native is required (#13655)
Authored by: bashonly
2025-07-06 22:14:22 +00:00
bashonly
0e68332bcb
[ie/youtube] Fix subtitles extraction (#13659)
Fixes regression introduced in 2ba5391cd68ed4f2415c827d2cecbcbc75ace10b

Closes #13654
Authored by: bashonly
2025-07-06 22:07:21 +00:00
bashonly
422cc8cb2f
[ie/twitch] Improve error handling (#13618)
Authored by: bashonly
2025-07-06 22:03:34 +00:00
bashonly
fca94ac5d6
[ie/youtube] Extract global nsig helper functions (#13639)
Authored by: bashonly, seproDev

Co-authored-by: sepro <sepro@sepr0.com>
2025-07-05 18:23:15 -05:00
bashonly
b342d27f3f
[jsinterp] Cache undefined variable names (#13639)
Authored by: bashonly
2025-07-05 18:23:15 -05:00
bashonly
b6328ca050
[jsinterp] Fix variable scoping (#13639)
Authored by: bashonly, seproDev

Co-authored-by: sepro <sepro@sepr0.com>
2025-07-05 18:23:15 -05:00
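The three `#13639` jsinterp commits change how an interpreter resolves variable names across nested scopes. One common way to model a JavaScript-style scope chain in Python is `collections.ChainMap`. The sketch below makes two assumptions:

* Assigning to an existing name updates the nearest scope that already defines it.
* A declaration always binds in the current scope.

It illustrates the general technique. It is not the code in `yt_dlp/jsinterp.py`.

```python
import collections


class Scope(collections.ChainMap):
    """Scope chain: maps[0] is the innermost scope, maps[-1] the outermost."""

    def __setitem__(self, key, value):
        # Assignment updates the nearest scope that already defines the name
        for scope in self.maps:
            if key in scope:
                scope[key] = value
                return
        # Assumption: an undeclared name is bound in the innermost scope
        self.maps[0][key] = value

    def declare(self, key, value=None):
        # A `var`/`let`-style declaration always binds in the current scope
        self.maps[0][key] = value


global_scope = Scope({'x': 1})
inner = global_scope.new_child()  # new_child() returns another Scope
inner['x'] = 2                    # updates the outer binding
inner.declare('y', 3)             # visible only in the inner scope
print(global_scope['x'], 'y' in global_scope)  # 2 False
```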
bashonly
0b41746964
[ie/sproutvideo] Fix extractor (#13610)
Closes #13606
Authored by: bashonly
2025-07-02 13:21:06 +00:00
Simon Sawicki
c316416b97
[rh:requests] Do not allocate 2GB on read (#13603)
Fixes c2ff2dbaec7929015373fe002e9bd4849931a4ce

Authored by: Grub4K
2025-07-02 01:42:00 +02:00
Simon Sawicki
e99c0b838a
[ie] Detect invalid m3u8 playlist data (#13601)
Authored by: Grub4K
2025-07-02 00:32:32 +02:00
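RFC 8216 (HTTP Live Streaming) requires every playlist to begin with the `#EXTM3U` tag. A cheap sanity check for invalid m3u8 playlist data is therefore to inspect the first line. One example of such data is the 1024 NUL bytes that the `/fake.m3u8` test handler in this diff serves. The sketch below shows this check. The function name is hypothetical, it is deliberately lenient about a UTF-8 BOM and leading whitespace, and yt-dlp's actual check may differ.

```python
def looks_like_m3u8(data: bytes) -> bool:
    """Return True if the bytes plausibly start an HLS playlist."""
    try:
        text = data.decode('utf-8-sig')  # tolerate a UTF-8 byte-order mark
    except UnicodeDecodeError:
        return False
    # RFC 8216: #EXTM3U must be the first line of every playlist
    first_line = text.lstrip().split('\n', 1)[0].strip()
    return first_line == '#EXTM3U'


print(looks_like_m3u8(1024 * b'\x00'))  # False
```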
Simon Sawicki
c2ff2dbaec
[rh:requests] Work around partial read dropping data (#13599)
Authored by: Grub4K
2025-07-02 00:12:43 +02:00
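The two `[rh:requests]` commits concern reading response bodies. One avoids a huge up-front allocation; the other avoids losing data when the underlying stream returns fewer bytes than requested. The general technique is to read in bounded chunks and loop until the requested amount has been read or EOF is reached. The sketch below works over any file-like object. The chunk size is an arbitrary assumption, and this is not the code from those commits.

```python
import io

CHUNK_SIZE = 64 * 1024  # assumed cap on each underlying read() call


def read_up_to(stream, amt=None):
    """Read up to `amt` bytes (everything remaining if None) in bounded chunks.

    Loops because a single read() may legitimately return fewer bytes than
    requested before EOF; only an empty result signals EOF.
    """
    parts = []
    remaining = amt
    while remaining is None or remaining > 0:
        size = CHUNK_SIZE if remaining is None else min(CHUNK_SIZE, remaining)
        chunk = stream.read(size)
        if not chunk:  # EOF
            break
        parts.append(chunk)
        if remaining is not None:
            remaining -= len(chunk)
    return b''.join(parts)


print(read_up_to(io.BytesIO(b'hello world'), 5))  # b'hello'
```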
sepro
ca5cce5b07
[cleanup] Bump ruff to 0.12.x (#13596)
Authored by: seproDev
2025-07-01 21:17:11 +02:00
sepro
f3008bc5f8
No longer enable --mtime by default (#12781)
Closes #12780
Authored by: seproDev
2025-07-01 13:23:53 +02:00
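After the commit above, yt-dlp no longer applies the server's modification time by default. `--mtime`, which uses the `Last-Modified` response header, or `--compat-options mtime-by-default` turns it back on. The effect of that option can be sketched with the standard library: parse the header and set the result as the file's mtime. The helper name is hypothetical, and this is not yt-dlp's implementation.

```python
import os
from email.utils import parsedate_to_datetime


def apply_last_modified(path, last_modified):
    """Set `path`'s mtime from an HTTP Last-Modified value; return the timestamp."""
    if not last_modified:
        return None
    try:
        timestamp = parsedate_to_datetime(last_modified).timestamp()
    except (TypeError, ValueError):
        # Malformed header (older Pythons raise TypeError): leave the file as-is
        return None
    # Keep the current access time and replace only the modification time
    os.utime(path, (os.stat(path).st_atime, timestamp))
    return timestamp
```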
github-actions[bot]
30fa54280b
Release 2025.06.30
Created by: bashonly

:ci skip all
2025-06-30 23:47:20 +00:00
bashonly
b018784498
[cleanup] Misc (#13590)
Authored by: bashonly
2025-06-30 23:44:42 +00:00
bashonly
11b9416e10
[ie/sproutvideo] Support browser impersonation (#13589)
Closes #13576
Authored by: bashonly
2025-06-30 23:37:56 +00:00
Clark
35fc33fbc5
[ie/sauceplus] Add extractor (#13567)
Authored by: ceandreasen, bashonly

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
2025-06-30 23:25:28 +00:00
helpimnotdrowning
b16722ede8
[ie/kick] Support subscriber-only content (#13550)
Closes #13442
Authored by: helpimnotdrowning
2025-06-30 23:24:04 +00:00
bashonly
500761e41a
[ie] Fix m3u8 playlist data corruption (#13588)
Revert 7b81634fb1d15999757e7a9883daa6ef09ea785b

Closes #13581
Authored by: bashonly
2025-06-30 23:06:22 +00:00
bashonly
2ba5391cd6
[ie/youtube] Fix premium formats extraction (#13586)
Fix ff6f94041aeee19c5559e1c1cd693960a1c1dd14

Closes #13545
Authored by: bashonly
2025-06-30 23:02:59 +00:00
bashonly
e9f157669e
[ie/hotstar] Fix formats extraction (#13585)
Fix b5bd057fe86550f3aa67f2fc8790d1c6a251c57b

Authored by: bashonly
2025-06-30 19:19:43 +00:00
sepro
958153a226
[jsinterp] Fix extract_object (#13580)
Fixes sig extraction for YouTube player `e12fbea4`

Authored by: seproDev
2025-06-30 15:50:33 +02:00
bashonly
1b88384634
[ci] Add signature tests (#13582)
Authored by: bashonly
2025-06-30 13:05:52 +00:00
Simon Sawicki
7b81634fb1
[ie] Detect invalid m3u8 playlist data (#13563)
Authored by: Grub4K
2025-06-29 18:49:27 +02:00
bashonly
7e2504f941
[ie/jiocinema] Remove extractors (#13565)
Closes #10123, Closes #10144, Closes #10225, Closes #10240, Closes #10508
Authored by: bashonly
2025-06-28 23:32:21 +00:00
bashonly
4bd9a7ade7
[ie/hotstar:series] Fix extractor (#13564)
* Removes HotStarSeasonIE and HotStarPlaylistIE

Authored by: bashonly
2025-06-28 23:30:51 +00:00
chauhantirth
b5bd057fe8
[ie/hotstar] Fix extractor (#13530)
Closes #11195
Authored by: chauhantirth, bashonly

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
2025-06-28 02:29:43 +00:00
bashonly
5e292baad6
[ie/hotstar] Raise for login required (#10405)
Closes #10366
Authored by: bashonly
2025-06-27 22:31:06 +00:00
bashonly
0a6b104489
[ie/hotstar] Fix metadata extraction (#13560)
Closes #7946
Authored by: bashonly
2025-06-27 22:29:37 +00:00
doe1080
06c1a8cdff
[ie/niconico:live] Fix extractor and downloader (#13158)
Authored by: doe1080
2025-06-26 17:45:03 +00:00
c-basalt
99b85ac102
[ie/BilibiliSpaceVideo] Extract hidden-mode collections as playlists (#13533)
Closes #13435
Authored by: c-basalt
2025-06-26 17:42:41 +00:00
github-actions[bot]
eff0759705
Release 2025.06.25
Created by: bashonly

:ci skip all
2025-06-25 23:53:38 +00:00
Anton Larionov
1838a1ce5d
[ie/mave] Add extractor (#13380)
Authored by: anlar
2025-06-25 23:51:20 +00:00
doe1080
2600849bad
[ie/huya:live] Fix extractor (#13520)
Authored by: doe1080
2025-06-25 23:37:49 +00:00
D Trombett
3bd3029160
[ie/tv8.it] Support slugless URLs (#13478)
Authored by: DTrombett
2025-06-25 23:26:23 +00:00
D Trombett
a4ce4327c9
[ie/SportDeutschland] Fix extractor (#13519)
Closes #13518
Authored by: DTrombett
2025-06-25 23:24:39 +00:00
Cæsim
c57412d1f9
[ie/lsm] Fix extractors (#13126)
Closes #12298
Authored by: Caesim404
2025-06-25 19:24:20 +00:00
bashonly
5b559d0072
[ie/sproutvideo] Fix extractor (#13544)
Closes #13540
Authored by: bashonly
2025-06-25 19:02:37 +00:00
bashonly
8f94b76cbf
[ie/youtube] Check any ios m3u8 formats prior to download (#13524)
Closes #13511
Authored by: bashonly
2025-06-25 18:32:57 +00:00
bashonly
ff6f94041a
[ie/youtube] Improve player context payloads (#13539)
Closes #12563
Authored by: bashonly
2025-06-25 17:10:00 +00:00
Simon Sawicki
73bf102116
[test] traversal: Fix morsel tests for Python 3.14 (#13471)
Authored by: Grub4K
2025-06-17 09:45:19 +02:00
doe1080
1722c55400
[ie/hypergryph] Improve metadata extraction (#13415)
Closes #13384
Authored by: doe1080, eason1478

Co-authored-by: eason1478 <134664337+eason1478@users.noreply.github.com>
2025-06-12 23:25:08 +00:00
doe1080
e6bd4a3da2
[ie/brightcove:new] Improve metadata extraction (#13461)
Authored by: doe1080
2025-06-12 23:16:48 +00:00
bashonly
51887484e4
[ie] Add _search_nuxt_json helper (#13386)
* Adds InfoExtractor._search_nuxt_json for webpage extraction
* Adds InfoExtractor._resolve_nuxt_array for direct use with payload JSON
* Adds yt_dlp.utils.jslib module for Python solutions to common JavaScript libraries
* Adds devalue.parse and devalue.parse_iter to jslib utils

Ref:
* 9e503be0f2
* f3fd2aa93d/src/parse.js

Authored by: bashonly, Grub4K

Co-authored-by: Simon Sawicki <contact@grub4k.dev>
2025-06-12 22:15:01 +00:00
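Nuxt's `__NUXT_DATA__` payload, which the `_search_nuxt_json` commit targets, is serialized in the devalue format. This is a flat JSON array in which:

* index 0 is the root;
* container values are integer indices into other array slots;
* tagged arrays such as `["Reactive", 8]` wrap a value.

The resolver below is deliberately simplified. It assumes only plain objects and arrays plus the tags handled in the code. It does not handle cycles, the negative sentinel indices, or the other types the devalue format defines. It is not `yt_dlp.utils.jslib.devalue`.

```python
# Hypothetical simplified resolver, not yt_dlp.utils.jslib.devalue.
import json

# Tags whose single argument is the index of the wrapped value (assumed subset)
WRAPPER_TAGS = {'Reactive', 'ShallowReactive', 'Ref', 'ShallowRef'}


def resolve_flat_payload(values):
    """Rebuild a nested structure from a devalue-style flat array."""
    def resolve(index):
        value = values[index]
        if isinstance(value, dict):
            # Object: every property value is an index into `values`
            return {key: resolve(i) for key, i in value.items()}
        if isinstance(value, list):
            if value and isinstance(value[0], str):
                tag, *args = value
                if tag in WRAPPER_TAGS:
                    return resolve(args[0])
                if tag == 'Set':
                    return [resolve(i) for i in args]
                if tag == 'Map':
                    # Map arguments alternate: key index, value index
                    return {resolve(k): resolve(v) for k, v in zip(args[::2], args[1::2])}
                raise ValueError(f'unsupported tag: {tag!r}')
            # Plain array: every element is an index into `values`
            return [resolve(i) for i in value]
        return value  # primitive stored inline
    return resolve(0)


payload = json.loads('[{"data":1},["Reactive",2],{"title":3,"ids":4},"Episode",[5,6],1,2]')
print(resolve_flat_payload(payload))  # {'data': {'title': 'Episode', 'ids': [1, 2]}}
```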
81 changed files with 3539 additions and 2181 deletions

.github/workflows/signature-tests.yml (new file)

@@ -0,0 +1,41 @@
name: Signature Tests
on:
  push:
    paths:
      - .github/workflows/signature-tests.yml
      - test/test_youtube_signature.py
      - yt_dlp/jsinterp.py
  pull_request:
    paths:
      - .github/workflows/signature-tests.yml
      - test/test_youtube_signature.py
      - yt_dlp/jsinterp.py
permissions:
  contents: read

concurrency:
  group: signature-tests-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}

jobs:
  tests:
    name: Signature Tests
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, windows-latest]
        python-version: ['3.9', '3.10', '3.11', '3.12', '3.13', pypy-3.10, pypy-3.11]
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install test requirements
        run: python3 ./devscripts/install_deps.py --only-optional --include test
      - name: Run tests
        timeout-minutes: 15
        run: |
          python3 -m yt_dlp -v || true # Print debug head
          python3 ./devscripts/run_tests.py test/test_youtube_signature.py
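In the signature-tests workflow above:

* `strategy.matrix` fans out into one job per combination of `os` and `python-version`.
* `fail-fast: false` keeps the remaining jobs running when one fails.
* `concurrency.group` keys runs by pull-request number, falling back to the ref for pushes.

The Python sketch below reproduces that expansion, with values copied from the YAML. The function names are hypothetical, and plain arguments stand in for the GitHub context objects.

```python
from itertools import product

# Values copied from the workflow's strategy.matrix
MATRIX = {
    'os': ['ubuntu-latest', 'windows-latest'],
    'python-version': ['3.9', '3.10', '3.11', '3.12', '3.13', 'pypy-3.10', 'pypy-3.11'],
}


def expand_matrix(matrix):
    """Return one dict per job, like a matrix without include/exclude entries."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


def concurrency_group(pr_number, ref):
    # Mirrors `github.event.pull_request.number || github.ref`: push events
    # have no PR number, so the ref is used instead
    return f'signature-tests-{pr_number or ref}'


jobs = expand_matrix(MATRIX)
print(len(jobs))  # 14
print(concurrency_group(None, 'refs/heads/master'))  # signature-tests-refs/heads/master
```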

CONTRIBUTORS

@@ -779,3 +779,8 @@ brian6932
iednod55
maxbin123
nullpos
anlar
eason1478
ceandreasen
chauhantirth
helpimnotdrowning

Changelog.md

@@ -4,6 +4,48 @@
# To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master
-->
### 2025.06.30
#### Core changes
- **jsinterp**: [Fix `extract_object`](https://github.com/yt-dlp/yt-dlp/commit/958153a226214c86879e36211ac191bf78289578) ([#13580](https://github.com/yt-dlp/yt-dlp/issues/13580)) by [seproDev](https://github.com/seproDev)
#### Extractor changes
- **bilibilispacevideo**: [Extract hidden-mode collections as playlists](https://github.com/yt-dlp/yt-dlp/commit/99b85ac102047446e6adf5b62bfc3c8d80b53778) ([#13533](https://github.com/yt-dlp/yt-dlp/issues/13533)) by [c-basalt](https://github.com/c-basalt)
- **hotstar**
    - [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/b5bd057fe86550f3aa67f2fc8790d1c6a251c57b) ([#13530](https://github.com/yt-dlp/yt-dlp/issues/13530)) by [bashonly](https://github.com/bashonly), [chauhantirth](https://github.com/chauhantirth) (With fixes in [e9f1576](https://github.com/yt-dlp/yt-dlp/commit/e9f157669e24953a88d15ce22053649db7a8e81e) by [bashonly](https://github.com/bashonly))
    - [Fix metadata extraction](https://github.com/yt-dlp/yt-dlp/commit/0a6b1044899f452cd10b6c7a6b00fa985a9a8b97) ([#13560](https://github.com/yt-dlp/yt-dlp/issues/13560)) by [bashonly](https://github.com/bashonly)
    - [Raise for login required](https://github.com/yt-dlp/yt-dlp/commit/5e292baad62c749b6c340621ab2d0f904165ddfb) ([#10405](https://github.com/yt-dlp/yt-dlp/issues/10405)) by [bashonly](https://github.com/bashonly)
    - series: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/4bd9a7ade7e0508b9795b3e72a69eeb40788b62b) ([#13564](https://github.com/yt-dlp/yt-dlp/issues/13564)) by [bashonly](https://github.com/bashonly)
- **jiocinema**: [Remove extractors](https://github.com/yt-dlp/yt-dlp/commit/7e2504f941a11ea2b0dba00de3f0295cdc253e79) ([#13565](https://github.com/yt-dlp/yt-dlp/issues/13565)) by [bashonly](https://github.com/bashonly)
- **kick**: [Support subscriber-only content](https://github.com/yt-dlp/yt-dlp/commit/b16722ede83377f77ea8352dcd0a6ca8e83b8f0f) ([#13550](https://github.com/yt-dlp/yt-dlp/issues/13550)) by [helpimnotdrowning](https://github.com/helpimnotdrowning)
- **niconico**: live: [Fix extractor and downloader](https://github.com/yt-dlp/yt-dlp/commit/06c1a8cdffe14050206683253726875144192ef5) ([#13158](https://github.com/yt-dlp/yt-dlp/issues/13158)) by [doe1080](https://github.com/doe1080)
- **sauceplus**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/35fc33fbc51c7f5392fb2300f65abf6cf107ef90) ([#13567](https://github.com/yt-dlp/yt-dlp/issues/13567)) by [bashonly](https://github.com/bashonly), [ceandreasen](https://github.com/ceandreasen)
- **sproutvideo**: [Support browser impersonation](https://github.com/yt-dlp/yt-dlp/commit/11b9416e10cff7513167d76d6c47774fcdd3e26a) ([#13589](https://github.com/yt-dlp/yt-dlp/issues/13589)) by [bashonly](https://github.com/bashonly)
- **youtube**: [Fix premium formats extraction](https://github.com/yt-dlp/yt-dlp/commit/2ba5391cd68ed4f2415c827d2cecbcbc75ace10b) ([#13586](https://github.com/yt-dlp/yt-dlp/issues/13586)) by [bashonly](https://github.com/bashonly)
#### Misc. changes
- **ci**: [Add signature tests](https://github.com/yt-dlp/yt-dlp/commit/1b883846347addeab12663fd74317fd544341a1c) ([#13582](https://github.com/yt-dlp/yt-dlp/issues/13582)) by [bashonly](https://github.com/bashonly)
- **cleanup**: Miscellaneous: [b018784](https://github.com/yt-dlp/yt-dlp/commit/b0187844988e557c7e1e6bb1aabd4c1176768d86) by [bashonly](https://github.com/bashonly)
### 2025.06.25
#### Extractor changes
- [Add `_search_nuxt_json` helper](https://github.com/yt-dlp/yt-dlp/commit/51887484e46ab6015c041cb1ab626a55f25a03bd) ([#13386](https://github.com/yt-dlp/yt-dlp/issues/13386)) by [bashonly](https://github.com/bashonly), [Grub4K](https://github.com/Grub4K)
- **brightcove**: new: [Improve metadata extraction](https://github.com/yt-dlp/yt-dlp/commit/e6bd4a3da295b760ab20b39c18ce8934d312c2bf) ([#13461](https://github.com/yt-dlp/yt-dlp/issues/13461)) by [doe1080](https://github.com/doe1080)
- **huya**: live: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/2600849badb0d08c55b58dcc77a13af6ba423da6) ([#13520](https://github.com/yt-dlp/yt-dlp/issues/13520)) by [doe1080](https://github.com/doe1080)
- **hypergryph**: [Improve metadata extraction](https://github.com/yt-dlp/yt-dlp/commit/1722c55400ff30bb5aee5dd7a262f0b7e9ce2f0e) ([#13415](https://github.com/yt-dlp/yt-dlp/issues/13415)) by [doe1080](https://github.com/doe1080), [eason1478](https://github.com/eason1478)
- **lsm**: [Fix extractors](https://github.com/yt-dlp/yt-dlp/commit/c57412d1f9cf0124adc972a47858ac42b740c61d) ([#13126](https://github.com/yt-dlp/yt-dlp/issues/13126)) by [Caesim404](https://github.com/Caesim404)
- **mave**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/1838a1ce5d4ade80770ba9162eaffc9a1607dc70) ([#13380](https://github.com/yt-dlp/yt-dlp/issues/13380)) by [anlar](https://github.com/anlar)
- **sportdeutschland**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/a4ce4327c9836691d3b6b00e44a90b6741601ed8) ([#13519](https://github.com/yt-dlp/yt-dlp/issues/13519)) by [DTrombett](https://github.com/DTrombett)
- **sproutvideo**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/5b559d0072b7164daf06bacdc41c6f11283452c8) ([#13544](https://github.com/yt-dlp/yt-dlp/issues/13544)) by [bashonly](https://github.com/bashonly)
- **tv8.it**: [Support slugless URLs](https://github.com/yt-dlp/yt-dlp/commit/3bd30291601c47fa4a257983473884103ecab0c7) ([#13478](https://github.com/yt-dlp/yt-dlp/issues/13478)) by [DTrombett](https://github.com/DTrombett)
- **youtube**
    - [Check any `ios` m3u8 formats prior to download](https://github.com/yt-dlp/yt-dlp/commit/8f94b76cbf7bbd9dfd8762c63cdea04f90f1297f) ([#13524](https://github.com/yt-dlp/yt-dlp/issues/13524)) by [bashonly](https://github.com/bashonly)
    - [Improve player context payloads](https://github.com/yt-dlp/yt-dlp/commit/ff6f94041aeee19c5559e1c1cd693960a1c1dd14) ([#13539](https://github.com/yt-dlp/yt-dlp/issues/13539)) by [bashonly](https://github.com/bashonly)
#### Misc. changes
- **test**: `traversal`: [Fix morsel tests for Python 3.14](https://github.com/yt-dlp/yt-dlp/commit/73bf10211668e4a59ccafd790e06ee82d9fea9ea) ([#13471](https://github.com/yt-dlp/yt-dlp/issues/13471)) by [Grub4K](https://github.com/Grub4K)
### 2025.06.09
#### Extractor changes

README.md

@@ -1156,15 +1156,15 @@ You can configure yt-dlp by placing any supported command line option in a confi
* `/etc/yt-dlp/config`
* `/etc/yt-dlp/config.txt`
E.g. with the following configuration file, yt-dlp will always extract the audio, not copy the mtime, use a proxy and save all videos under `YouTube` directory in your home directory:
E.g. with the following configuration file, yt-dlp will always extract the audio, copy the mtime, use a proxy and save all videos under `YouTube` directory in your home directory:
```
# Lines starting with # are comments
# Always extract audio
-x
# Do not copy the mtime
--no-mtime
# Copy the mtime
--mtime
# Use this proxy
--proxy 127.0.0.1:3128
@@ -1799,6 +1799,7 @@ The following extractors use this feature:
* `skip`: One or more of `hls`, `dash` or `translated_subs` to skip extraction of the m3u8 manifests, dash manifests and [auto-translated subtitles](https://github.com/yt-dlp/yt-dlp/issues/4090#issuecomment-1158102032) respectively
* `player_client`: Clients to extract video data from. The currently available clients are `web`, `web_safari`, `web_embedded`, `web_music`, `web_creator`, `mweb`, `ios`, `android`, `android_vr`, `tv`, `tv_simply` and `tv_embedded`. By default, `tv,ios,web` is used, or `tv,web` is used when authenticating with cookies. The `web_music` client is added for `music.youtube.com` URLs when logged-in cookies are used. The `web_embedded` client is added for age-restricted videos but only works if the video is embeddable. The `tv_embedded` and `web_creator` clients are added for age-restricted videos if account age-verification is required. Some clients, such as `web` and `web_music`, require a `po_token` for their formats to be downloadable. Some clients, such as `web_creator`, will only work with authentication. Not all clients support authentication via cookies. You can use `default` for the default clients, or you can use `all` for all clients (not recommended). You can prefix a client with `-` to exclude it, e.g. `youtube:player_client=default,-ios`
* `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player), `initial_data` (skip initial data/next ep request). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause issues such as missing formats or metadata. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) and [#12826](https://github.com/yt-dlp/yt-dlp/issues/12826) for more details
* `webpage_skip`: Skip extraction of embedded webpage data. One or both of `player_response`, `initial_data`. These options are for testing purposes and don't skip any network requests
* `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp.
* `player_js_variant`: The player javascript variant to use for signature and nsig deciphering. The known variants are: `main`, `tce`, `tv`, `tv_es6`, `phone`, `tablet`. Only `main` is recommended as a possible workaround; the others are for debugging purposes. The default is to use what is prescribed by the site, and can be selected with `actual`
* `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side)
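The `player_client` extractor argument described above accepts a comma-separated list:

* `default` and `all` expand to groups of clients;
* a leading `-` excludes a client.

The sketch below resolves such a list. It assumes the unauthenticated default `tv,ios,web` and the client names quoted in the README text. It ignores the cookie-dependent and age-restriction additions the README also mentions. The function is hypothetical, not yt-dlp's implementation.

```python
# Client names and defaults as quoted in the README text above
ALL_CLIENTS = ['web', 'web_safari', 'web_embedded', 'web_music', 'web_creator', 'mweb',
               'ios', 'android', 'android_vr', 'tv', 'tv_simply', 'tv_embedded']
DEFAULT_CLIENTS = ['tv', 'ios', 'web']  # unauthenticated default per the README


def resolve_player_clients(value):
    """Expand `default`/`all` and apply `-client` exclusions, preserving order."""
    selected, excluded = [], set()
    for item in (part.strip() for part in value.split(',')):
        if not item:
            continue
        if item.startswith('-'):
            excluded.add(item[1:])
        elif item == 'default':
            selected.extend(DEFAULT_CLIENTS)
        elif item == 'all':
            selected.extend(ALL_CLIENTS)
        else:
            selected.append(item)
    result = []
    for client in selected:
        # Skip excluded clients and duplicates introduced by group expansion
        if client not in excluded and client not in result:
            result.append(client)
    return result


print(resolve_player_clients('default,-ios'))  # ['tv', 'web']
```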
@@ -1900,6 +1901,10 @@ The following extractors use this feature:
#### tver
* `backend`: Backend API to use for extraction - one of `streaks` (default) or `brightcove` (deprecated)
#### vimeo
* `client`: Client to extract video data from. One of `android` (default), `ios` or `web`. The `ios` client only works with previously cached OAuth tokens. The `web` client only works when authenticated with credentials or account cookies
* `original_format_policy`: Policy for when to try extracting original formats. One of `always`, `never`, or `auto`. The default `auto` policy tries to avoid exceeding the API rate-limit by only making an extra request when Vimeo publicizes the video's downloadability
**Note**: These options may be changed/removed in the future without concern for backward compatibility
<!-- MANPAGE: MOVE "INSTALLATION" SECTION HERE -->
@@ -2262,6 +2267,7 @@ Some of yt-dlp's default options are different from that of youtube-dl and youtu
* yt-dlp uses modern http client backends such as `requests`. Use `--compat-options prefer-legacy-http-handler` to prefer the legacy http handler (`urllib`) to be used for standard http requests.
* The sub-modules `swfinterp`, `casefold` are removed.
* Passing `--simulate` (or calling `extract_info` with `download=False`) no longer alters the default format selection. See [#9843](https://github.com/yt-dlp/yt-dlp/issues/9843) for details.
* yt-dlp no longer applies the server modified time to downloaded files by default. Use `--mtime` or `--compat-options mtime-by-default` to revert this.
For ease of use, a few more compat options are available:
@@ -2271,7 +2277,7 @@ For ease of use, a few more compat options are available:
* `--compat-options 2021`: Same as `--compat-options 2022,no-certifi,filename-sanitization`
* `--compat-options 2022`: Same as `--compat-options 2023,playlist-match-filter,no-external-downloader-progress,prefer-legacy-http-handler,manifest-filesize-approx`
* `--compat-options 2023`: Same as `--compat-options 2024,prefer-vp9-sort`
* `--compat-options 2024`: Currently does nothing. Use this to enable all future compat options
* `--compat-options 2024`: Same as `--compat-options mtime-by-default`. Use this to enable all future compat options
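The year-based `--compat-options` aliases listed above are defined in terms of one another, so each year transitively enables everything from the later years. The sketch below expands them recursively. It uses the definitions from this hunk, including the new `2024` = `mtime-by-default`. The dictionary and function are illustrative, not yt-dlp's option parser.

```python
# Alias definitions as listed in the README hunk (post-change values)
COMPAT_ALIASES = {
    '2021': ['2022', 'no-certifi', 'filename-sanitization'],
    '2022': ['2023', 'playlist-match-filter', 'no-external-downloader-progress',
             'prefer-legacy-http-handler', 'manifest-filesize-approx'],
    '2023': ['2024', 'prefer-vp9-sort'],
    '2024': ['mtime-by-default'],
}


def expand_compat(option):
    """Recursively replace alias names with the concrete options they enable."""
    if option not in COMPAT_ALIASES:
        return {option}
    expanded = set()
    for item in COMPAT_ALIASES[option]:
        expanded |= expand_compat(item)
    return expanded


print(sorted(expand_compat('2023')))  # ['mtime-by-default', 'prefer-vp9-sort']
```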
The following compat options restore vulnerable behavior from before security patches:

devscripts/bash-completion.in

@@ -10,9 +10,13 @@ __yt_dlp()
    diropts="--cache-dir"

    if [[ ${prev} =~ ${fileopts} ]]; then
        local IFS=$'\n'
        type compopt &>/dev/null && compopt -o filenames
        COMPREPLY=( $(compgen -f -- ${cur}) )
        return 0
    elif [[ ${prev} =~ ${diropts} ]]; then
        local IFS=$'\n'
        type compopt &>/dev/null && compopt -o dirnames
        COMPREPLY=( $(compgen -d -- ${cur}) )
        return 0
    fi

devscripts/changelog_override.json

@@ -254,5 +254,13 @@
    {
        "action": "remove",
        "when": "d596824c2f8428362c072518856065070616e348"
    },
    {
        "action": "remove",
        "when": "7b81634fb1d15999757e7a9883daa6ef09ea785b"
    },
    {
        "action": "remove",
        "when": "500761e41acb96953a5064e951d41d190c287e46"
    }
]
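The override entries above tell the changelog generator to drop specific commits by hash. This is why neither `7b81634fb1` (Detect invalid m3u8 playlist data) nor its revert `500761e41a` appears in the 2025.06.30 changelog. The sketch below applies `remove` overrides to a list of commits. The commit dictionaries and the exact-hash matching are assumptions, not the logic of yt-dlp's changelog script.

```python
def apply_remove_overrides(commits, overrides):
    """Drop commits whose hash equals a `remove` override's `when` value."""
    # Assumption: `when` is compared as an exact, full commit hash
    removed = {o['when'] for o in overrides if o.get('action') == 'remove'}
    return [commit for commit in commits if commit['hash'] not in removed]


OVERRIDES = [
    {'action': 'remove', 'when': '7b81634fb1d15999757e7a9883daa6ef09ea785b'},
    {'action': 'remove', 'when': '500761e41acb96953a5064e951d41d190c287e46'},
]
COMMITS = [
    {'hash': '958153a226214c86879e36211ac191bf78289578', 'message': '[jsinterp] Fix extract_object'},
    {'hash': '500761e41acb96953a5064e951d41d190c287e46', 'message': '[ie] Fix m3u8 playlist data corruption'},
    {'hash': '7b81634fb1d15999757e7a9883daa6ef09ea785b', 'message': '[ie] Detect invalid m3u8 playlist data'},
]
kept = apply_remove_overrides(COMMITS, OVERRIDES)
print([c['message'] for c in kept])  # ['[jsinterp] Fix extract_object']
```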

pyproject.toml

@@ -75,7 +75,7 @@ dev = [
]
static-analysis = [
    "autopep8~=2.0",
    "ruff~=0.11.0",
    "ruff~=0.12.0",
]
test = [
    "pytest~=8.1",
@@ -210,10 +210,12 @@ ignore = [
    "TD001", # invalid-todo-tag
    "TD002", # missing-todo-author
    "TD003", # missing-todo-link
    "PLC0415", # import-outside-top-level
    "PLE0604", # invalid-all-object (false positives)
    "PLE0643", # potential-index-error (false positives)
    "PLW0603", # global-statement
    "PLW1510", # subprocess-run-without-check
    "PLW1641", # eq-without-hash
    "PLW2901", # redefined-loop-name
    "RUF001", # ambiguous-unicode-character-string
    "RUF012", # mutable-class-default

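The dependency bump above changes `ruff~=0.11.0` to `ruff~=0.12.0`. Under PEP 440, `~=X.Y.Z` is a "compatible release" clause equivalent to `>=X.Y.Z, ==X.Y.*`. The new pin therefore accepts any 0.12.x patch release but not 0.13. The sketch below implements that rule for purely numeric versions. Pre-releases, epochs and local versions are ignored; real tooling should rely on the `packaging` library.

```python
def _parse(version):
    return [int(part) for part in version.split('.')]


def is_compatible_release(version, spec):
    """PEP 440 `~=spec` check for purely numeric versions such as '0.12.3'."""
    v, s = _parse(version), _parse(spec)
    if len(s) < 2:
        raise ValueError('~= requires at least two release components')
    # Pad with zeros so that 0.12 and 0.12.0 compare as equal
    width = max(len(v), len(s))
    v += [0] * (width - len(v))
    s_padded = s + [0] * (width - len(s))
    # Drop the last spec component: the remaining prefix must match exactly
    prefix = s[:-1]
    return v >= s_padded and v[:len(prefix)] == prefix


print(is_compatible_release('0.12.5', '0.12.0'))  # True
print(is_compatible_release('0.13.0', '0.12.0'))  # False
```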
supportedsites.md

@@ -575,9 +575,7 @@ The only reliable way to check if a site is supported is to try it.
- **HollywoodReporterPlaylist**
- **Holodex**
- **HotNewHipHop**: (**Currently broken**)
- **hotstar**
- **hotstar:playlist**
- **hotstar:season**
- **hotstar**: JioHotstar
- **hotstar:series**
- **hrfernsehen**
- **HRTi**: [*hrti*](## "netrc machine")
@@ -590,7 +588,7 @@ The only reliable way to check if a site is supported is to try it.
- **Hungama**
- **HungamaAlbumPlaylist**
- **HungamaSong**
- **huya:live**: huya.com
- **huya:live**: 虎牙直播
- **huya:video**: 虎牙视频
- **Hypem**
- **Hytale**
@@ -647,8 +645,6 @@ The only reliable way to check if a site is supported is to try it.
- **Jamendo**
- **JamendoAlbum**
- **JeuxVideo**: (**Currently broken**)
- **jiocinema**: [*jiocinema*](## "netrc machine")
- **jiocinema:series**: [*jiocinema*](## "netrc machine")
- **jiosaavn:album**
- **jiosaavn:artist**
- **jiosaavn:playlist**
@@ -776,6 +772,7 @@ The only reliable way to check if a site is supported is to try it.
- **massengeschmack.tv**
- **Masters**
- **MatchTV**
- **Mave**
- **MBN**: mbn.co.kr (매일방송)
- **MDR**: MDR.DE
- **MedalTV**
@@ -832,7 +829,7 @@ The only reliable way to check if a site is supported is to try it.
- **Mojevideo**: mojevideo.sk
- **Mojvideo**
- **Monstercat**
- **MonsterSirenHypergryphMusic**
- **monstersiren**: 塞壬唱片
- **Motherless**
- **MotherlessGallery**
- **MotherlessGroup**
@@ -1298,6 +1295,7 @@ The only reliable way to check if a site is supported is to try it.
- **SampleFocus**
- **Sangiin**: 参議院インターネット審議中継 (archive)
- **Sapo**: SAPO Vídeos
- **SaucePlus**: Sauce+
- **SBS**: sbs.com.au
- **sbs.co.kr**
- **sbs.co.kr:allvod_program**

test/test_InfoExtractor.py

@@ -36,6 +36,18 @@ class InfoExtractorTestRequestHandler(http.server.BaseHTTPRequestHandler):
            self.send_header('Content-Type', 'text/html; charset=utf-8')
            self.end_headers()
            self.wfile.write(TEAPOT_RESPONSE_BODY.encode())
        elif self.path == '/fake.m3u8':
            self.send_response(200)
            self.send_header('Content-Length', '1024')
            self.end_headers()
            self.wfile.write(1024 * b'\x00')
        elif self.path == '/bipbop.m3u8':
            with open('test/testdata/m3u8/bipbop_16x9.m3u8', 'rb') as f:
                data = f.read()
            self.send_response(200)
            self.send_header('Content-Length', str(len(data)))
            self.end_headers()
            self.wfile.write(data)
        else:
            assert False
@@ -1947,6 +1959,208 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
        with self.assertWarns(DeprecationWarning):
            self.assertEqual(self.ie._search_nextjs_data('', None, default='{}'), {})

    def test_search_nextjs_v13_data(self):
        HTML = R'''
            <script>(self.__next_f=self.__next_f||[]).push([0])</script>
            <script>self.__next_f.push([2,"0:[\"$\",\"$L0\",null,{\"do_not_add_this\":\"fail\"}]\n"])</script>
            <script>self.__next_f.push([1,"1:I[46975,[],\"HTTPAccessFallbackBoundary\"]\n2:I[32630,[\"8183\",\"static/chunks/8183-768193f6a9e33cdd.js\"]]\n"])</script>
            <script nonce="abc123">self.__next_f.push([1,"e:[false,[\"$\",\"div\",null,{\"children\":[\"$\",\"$L18\",null,{\"foo\":\"bar\"}]}],false]\n "])</script>
            <script>self.__next_f.push([1,"2a:[[\"$\",\"div\",null,{\"className\":\"flex flex-col\",\"children\":[]}],[\"$\",\"$L16\",null,{\"meta\":{\"dateCreated\":1730489700,\"uuid\":\"40cac41d-8d29-4ef5-aa11-75047b9f0907\"}}]]\n"])</script>
            <script>self.__next_f.push([1,"df:[\"$undefined\",[\"$\",\"div\",null,{\"children\":[\"$\",\"$L17\",null,{}],\"do_not_include_this_field\":\"fail\"}],[\"$\",\"div\",null,{\"children\":[[\"$\",\"$L19\",null,{\"duplicated_field_name\":{\"x\":1}}],[\"$\",\"$L20\",null,{\"duplicated_field_name\":{\"y\":2}}]]}],\"$undefined\"]\n"])</script>
            <script>self.__next_f.push([3,"MzM6WyIkIiwiJEwzMiIsbnVsbCx7ImRlY29kZWQiOiJzdWNjZXNzIn1d"])</script>
        '''
        EXPECTED = {
            '18': {
                'foo': 'bar',
            },
            '16': {
                'meta': {
                    'dateCreated': 1730489700,
                    'uuid': '40cac41d-8d29-4ef5-aa11-75047b9f0907',
                },
            },
            '19': {
                'duplicated_field_name': {'x': 1},
            },
            '20': {
                'duplicated_field_name': {'y': 2},
            },
        }
        self.assertEqual(self.ie._search_nextjs_v13_data(HTML, None), EXPECTED)
        self.assertEqual(self.ie._search_nextjs_v13_data('', None, fatal=False), {})
        self.assertEqual(self.ie._search_nextjs_v13_data(None, None, fatal=False), {})

    def test_search_nuxt_json(self):
        HTML_TMPL = '<script data-ssr="true" id="__NUXT_DATA__" type="application/json">[{}]</script>'
        VALID_DATA = '''
            ["ShallowReactive",1],
            {"data":2,"state":21,"once":25,"_errors":28,"_server_errors":30},
            ["ShallowReactive",3],
            {"$abcdef123456":4},
            {"podcast":5,"activeEpisodeData":7},
            {"podcast":6,"seasons":14},
            {"title":10,"id":11},
            ["Reactive",8],
            {"episode":9,"creators":18,"empty_list":20},
            {"title":12,"id":13,"refs":34,"empty_refs":35},
            "Series Title",
            "podcast-id-01",
            "Episode Title",
            "episode-id-99",
            [15,16,17],
            1,
            2,
            3,
            [19],
            "Podcast Creator",
            [],
            {"$ssite-config":22},
            {"env":23,"name":24,"map":26,"numbers":14},
            "production",
            "podcast-website",
            ["Set"],
            ["Reactive",27],
            ["Map"],
            ["ShallowReactive",29],
            {},
            ["NuxtError",31],
            {"status":32,"message":33},
            503,
            "Service Unavailable",
            [36,37],
            [38,39],
            ["Ref",40],
            ["ShallowRef",41],
            ["EmptyRef",42],
            ["EmptyShallowRef",43],
            "ref",
            "shallow_ref",
            "{\\"ref\\":1}",
            "{\\"shallow_ref\\":2}"
        '''
        PAYLOAD = {
            'data': {
                '$abcdef123456': {
                    'podcast': {
                        'podcast': {
                            'title': 'Series Title',
                            'id': 'podcast-id-01',
                        },
                        'seasons': [1, 2, 3],
                    },
                    'activeEpisodeData': {
                        'episode': {
                            'title': 'Episode Title',
                            'id': 'episode-id-99',
                            'refs': ['ref', 'shallow_ref'],
                            'empty_refs': [{'ref': 1}, {'shallow_ref': 2}],
                        },
                        'creators': ['Podcast Creator'],
                        'empty_list': [],
                    },
                },
            },
            'state': {
                '$ssite-config': {
                    'env': 'production',
                    'name': 'podcast-website',
                    'map': [],
                    'numbers': [1, 2, 3],
                },
            },
            'once': [],
            '_errors': {},
            '_server_errors': {
                'status': 503,
                'message': 'Service Unavailable',
            },
        }
        PARTIALLY_INVALID = [(
            '''
                {"data":1},
                {"invalid_raw_list":2},
                [15,16,17]
            ''',
            {'data': {'invalid_raw_list': [None, None, None]}},
        ), (
            '''
                {"data":1},
                ["EmptyRef",2],
                "not valid JSON"
            ''',
            {'data': None},
        ), (
            '''
                {"data":1},
                ["EmptyShallowRef",2],
                "not valid JSON"
            ''',
            {'data': None},
        )]
        INVALID = [
            '''
                []
            ''',
            '''
                ["unsupported",1],
                {"data":2},
                {}
            ''',
        ]
        DEFAULT = object()
        self.assertEqual(self.ie._search_nuxt_json(HTML_TMPL.format(VALID_DATA), None), PAYLOAD)
        self.assertEqual(self.ie._search_nuxt_json('', None, fatal=False), {})
        self.assertIs(self.ie._search_nuxt_json('', None, default=DEFAULT), DEFAULT)
        for data, expected in PARTIALLY_INVALID:
            self.assertEqual(
                self.ie._search_nuxt_json(HTML_TMPL.format(data), None, fatal=False), expected)
        for data in INVALID:
            self.assertIs(
                self.ie._search_nuxt_json(HTML_TMPL.format(data), None, default=DEFAULT), DEFAULT)


class TestInfoExtractorNetwork(unittest.TestCase):
    def setUp(self, /):
        self.httpd = http.server.HTTPServer(
            ('127.0.0.1', 0), InfoExtractorTestRequestHandler)
        self.port = http_server_port(self.httpd)
        self.server_thread = threading.Thread(target=self.httpd.serve_forever)
        self.server_thread.daemon = True
        self.server_thread.start()
        self.called = False

        def require_warning(*args, **kwargs):
            self.called = True
self.ydl = FakeYDL()
self.ydl.report_warning = require_warning
self.ie = DummyIE(self.ydl)
def tearDown(self, /):
self.ydl.close()
self.httpd.shutdown()
self.httpd.server_close()
self.server_thread.join(1)
def test_extract_m3u8_formats(self):
formats, subtitles = self.ie._extract_m3u8_formats_and_subtitles(
f'http://127.0.0.1:{self.port}/bipbop.m3u8', None, fatal=False)
self.assertFalse(self.called)
self.assertTrue(formats)
self.assertTrue(subtitles)
def test_extract_m3u8_formats_warning(self):
formats, subtitles = self.ie._extract_m3u8_formats_and_subtitles(
f'http://127.0.0.1:{self.port}/fake.m3u8', None, fatal=False)
self.assertTrue(self.called, 'Warning was not issued for binary m3u8 file')
self.assertFalse(formats)
self.assertFalse(subtitles)
if __name__ == '__main__':
unittest.main()
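The `_search_nextjs_v13_data` test expects the props of lazily referenced React elements (`"$L<id>"`) to be collected from the `self.__next_f.push([1, "..."])` flight chunks. Each entry is keyed by the id after `$L`. Plain elements such as `div` and empty prop objects are skipped. Below is a minimal sketch of that idea, not yt-dlp's implementation. The function name `collect_lazy_component_props`, the regex, and the choice to drop `children` from the collected props are all assumptions. The sketch also only reads type-1 chunks.

```python
import json
import re

# Hypothetical pattern for the JSON string argument of self.__next_f.push([1, "..."])
_PUSH_RE = re.compile(r'self\.__next_f\.push\(\[1,\s*("(?:[^"\\]|\\.)*")\]\)')


def collect_lazy_component_props(html):
    """Map "$L<id>" element ids to their non-empty props from Next.js flight data."""
    # Concatenate the decoded type-1 chunks; rows inside them are newline-separated
    flight = ''.join(json.loads(m.group(1)) for m in _PUSH_RE.finditer(html))
    result = {}

    def walk(node):
        if isinstance(node, list):
            # A serialized React element looks like ["$", type, key, props]
            if (len(node) == 4 and node[0] == '$' and isinstance(node[1], str)
                    and node[1].startswith('$L') and isinstance(node[3], dict)):
                props = {k: v for k, v in node[3].items() if k != 'children'}
                if props:
                    result[node[1][2:]] = props
            for child in node:
                walk(child)
        elif isinstance(node, dict):
            for value in node.values():
                walk(value)

    for row in flight.splitlines():
        # Each row is "<id>:<payload>"; rows without a JSON payload are skipped
        _, sep, payload = row.partition(':')
        if not sep:
            continue
        try:
            tree = json.loads(payload)
        except json.JSONDecodeError:
            continue
        walk(tree)
    return result
```

Building the test HTML with `json.dumps` produces the same escaped string literal that appears inside the `<script>` tags above.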

test/test_devalue.py (new file)

@ -0,0 +1,235 @@
#!/usr/bin/env python3
# Allow direct execution
import os
import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import datetime as dt
import json
import math
import re
import unittest
from yt_dlp.utils.jslib import devalue
TEST_CASES_EQUALS = [{
'name': 'int',
'unparsed': [-42],
'parsed': -42,
}, {
'name': 'str',
'unparsed': ['woo!!!'],
'parsed': 'woo!!!',
}, {
'name': 'Number',
'unparsed': [['Object', 42]],
'parsed': 42,
}, {
'name': 'String',
'unparsed': [['Object', 'yar']],
'parsed': 'yar',
}, {
'name': 'Infinity',
'unparsed': -4,
'parsed': math.inf,
}, {
'name': 'negative Infinity',
'unparsed': -5,
'parsed': -math.inf,
}, {
'name': 'negative zero',
'unparsed': -6,
'parsed': -0.0,
}, {
'name': 'RegExp',
'unparsed': [['RegExp', 'regexp', 'gim']], # XXX: flags are ignored
'parsed': re.compile('regexp'),
}, {
'name': 'Date',
'unparsed': [['Date', '2001-09-09T01:46:40.000Z']],
'parsed': dt.datetime.fromtimestamp(1e9, tz=dt.timezone.utc),
}, {
'name': 'Array',
'unparsed': [[1, 2, 3], 'a', 'b', 'c'],
'parsed': ['a', 'b', 'c'],
}, {
'name': 'Array (empty)',
'unparsed': [[]],
'parsed': [],
}, {
'name': 'Array (sparse)',
'unparsed': [[-2, 1, -2], 'b'],
'parsed': [None, 'b', None],
}, {
'name': 'Object',
'unparsed': [{'foo': 1, 'x-y': 2}, 'bar', 'z'],
'parsed': {'foo': 'bar', 'x-y': 'z'},
}, {
'name': 'Set',
'unparsed': [['Set', 1, 2, 3], 1, 2, 3],
'parsed': [1, 2, 3],
}, {
'name': 'Map',
'unparsed': [['Map', 1, 2], 'a', 'b'],
'parsed': [['a', 'b']],
}, {
'name': 'BigInt',
'unparsed': [['BigInt', '1']],
'parsed': 1,
}, {
'name': 'Uint8Array',
'unparsed': [['Uint8Array', 'AQID']],
'parsed': [1, 2, 3],
}, {
'name': 'ArrayBuffer',
'unparsed': [['ArrayBuffer', 'AQID']],
'parsed': [1, 2, 3],
}, {
'name': 'str (repetition)',
'unparsed': [[1, 1], 'a string'],
'parsed': ['a string', 'a string'],
}, {
'name': 'None (repetition)',
'unparsed': [[1, 1], None],
'parsed': [None, None],
}, {
'name': 'dict (repetition)',
'unparsed': [[1, 1], {}],
'parsed': [{}, {}],
}, {
'name': 'Object without prototype',
'unparsed': [['null']],
'parsed': {},
}, {
'name': 'cross-realm POJO',
'unparsed': [{}],
'parsed': {},
}]
TEST_CASES_IS = [{
'name': 'bool',
'unparsed': [True],
'parsed': True,
}, {
'name': 'Boolean',
'unparsed': [['Object', False]],
'parsed': False,
}, {
'name': 'undefined',
'unparsed': -1,
'parsed': None,
}, {
'name': 'null',
'unparsed': [None],
'parsed': None,
}, {
'name': 'NaN',
'unparsed': -3,
'parsed': math.nan,
}]
TEST_CASES_INVALID = [{
'name': 'empty string',
'unparsed': '',
'error': ValueError,
'pattern': r'expected int or list as input',
}, {
'name': 'hole',
'unparsed': -2,
'error': ValueError,
'pattern': r'invalid integer input',
}, {
'name': 'string',
'unparsed': 'hello',
'error': ValueError,
'pattern': r'expected int or list as input',
}, {
'name': 'number',
'unparsed': 42,
'error': ValueError,
'pattern': r'invalid integer input',
}, {
'name': 'boolean',
'unparsed': True,
'error': ValueError,
'pattern': r'expected int or list as input',
}, {
'name': 'null',
'unparsed': None,
'error': ValueError,
'pattern': r'expected int or list as input',
}, {
'name': 'object',
'unparsed': {},
'error': ValueError,
'pattern': r'expected int or list as input',
}, {
'name': 'empty array',
'unparsed': [],
'error': ValueError,
'pattern': r'expected a non-empty list as input',
}, {
'name': 'Python negative indexing',
'unparsed': [[1, 2, 3, 4, 5, 6, 7, -7], 1, 2, 3, 4, 5, 6, 7],
'error': IndexError,
'pattern': r'invalid index: -7',
}]
class TestDevalue(unittest.TestCase):
def test_devalue_parse_equals(self):
for tc in TEST_CASES_EQUALS:
self.assertEqual(devalue.parse(tc['unparsed']), tc['parsed'], tc['name'])
def test_devalue_parse_is(self):
for tc in TEST_CASES_IS:
self.assertIs(devalue.parse(tc['unparsed']), tc['parsed'], tc['name'])
def test_devalue_parse_invalid(self):
for tc in TEST_CASES_INVALID:
with self.assertRaisesRegex(tc['error'], tc['pattern'], msg=tc['name']):
devalue.parse(tc['unparsed'])
def test_devalue_parse_cyclical(self):
name = 'Map (cyclical)'
result = devalue.parse([['Map', 1, 0], 'self'])
self.assertEqual(result[0][0], 'self', name)
self.assertIs(result, result[0][1], name)
name = 'Set (cyclical)'
result = devalue.parse([['Set', 0, 1], 42])
self.assertEqual(result[1], 42, name)
self.assertIs(result, result[0], name)
result = devalue.parse([[0]])
self.assertIs(result, result[0], 'Array (cyclical)')
name = 'Object (cyclical)'
result = devalue.parse([{'self': 0}])
self.assertIs(result, result['self'], name)
name = 'Object with null prototype (cyclical)'
result = devalue.parse([['null', 'self', 0]])
self.assertIs(result, result['self'], name)
name = 'Objects (cyclical)'
result = devalue.parse([[1, 2], {'second': 2}, {'first': 1}])
self.assertIs(result[0], result[1]['first'], name)
self.assertIs(result[1], result[0]['second'], name)
def test_devalue_parse_revivers(self):
self.assertEqual(
devalue.parse([['indirect', 1], {'a': 2}, 'b'], revivers={'indirect': lambda x: x}),
{'a': 'b'}, 'revivers (indirect)')
self.assertEqual(
devalue.parse([['parse', 1], '{"a":0}'], revivers={'parse': lambda x: json.loads(x)}),
{'a': 0}, 'revivers (parse)')
if __name__ == '__main__':
unittest.main()
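The test cases above pin down devalue's wire format. The input is a flat list whose entries refer to one another by index. Negative integers encode special values: -1 is undefined, -2 an array hole, -3 NaN, -4 Infinity, -5 -Infinity and -6 negative zero. A list that starts with a string is a tagged type such as `Set` or `Map`, or a custom type handled by a reviver.

The following is a minimal sketch that decodes a subset of that format. It is not the `yt_dlp.utils.jslib.devalue` module, and the name `devalue_parse` and its error messages are assumptions. The reviver names in the usage lines follow the Nuxt payload wrappers from the `_search_nuxt_json` test.

```python
import json
import math

# Negative "indices" that devalue uses for values without a slot in the list
_SPECIALS = {-1: None, -3: math.nan, -4: math.inf, -5: -math.inf, -6: -0.0}
_HOLE = -2  # only meaningful inside an array


def devalue_parse(values, revivers=None):
    """Decode a subset of devalue: primitives, arrays, objects, Set, Map and revivers."""
    if isinstance(values, int) and not isinstance(values, bool):
        if values in _SPECIALS:
            return _SPECIALS[values]
        raise ValueError(f'invalid integer input: {values}')
    if not isinstance(values, list) or not values:
        raise ValueError('expected a non-empty list as input')
    revivers = revivers or {}
    resolved = {}  # index -> decoded value; registering containers first makes cycles work

    def hydrate(index):
        if index in _SPECIALS:
            return _SPECIALS[index]
        if index < 0:
            raise IndexError(f'invalid index: {index}')
        if index in resolved:
            return resolved[index]
        value = values[index]
        if isinstance(value, dict):
            result = resolved[index] = {}
            for key, child in value.items():
                result[key] = hydrate(child)
            return result
        if not isinstance(value, list):
            resolved[index] = value  # str, number, bool or None
            return value
        if value and isinstance(value[0], str):
            tag, args = value[0], value[1:]
            if tag in revivers:
                resolved[index] = revivers[tag](hydrate(args[0]))
                return resolved[index]
            result = resolved[index] = []
            if tag == 'Set':
                result.extend(hydrate(child) for child in args)
            elif tag == 'Map':
                result.extend([hydrate(k), hydrate(v)] for k, v in zip(args[::2], args[1::2]))
            else:
                raise ValueError(f'unsupported type: {tag}')
            return result
        result = resolved[index] = []
        result.extend(None if child == _HOLE else hydrate(child) for child in value)
        return result

    return hydrate(0)


# Nuxt-style wrappers: reactive wrappers are unwrapped, EmptyRef holds a JSON string
nuxt_revivers = {'ShallowReactive': lambda x: x, 'Reactive': lambda x: x, 'EmptyRef': json.loads}
payload = devalue_parse(
    [['ShallowReactive', 1], {'data': 2, 'ref': 4}, ['Reactive', 3], [5, 6], ['EmptyRef', 7], 'a', 'b', '{"x":1}'],
    revivers=nuxt_revivers)
```

Registering each container in `resolved` before hydrating its children is what lets self-referencing inputs such as `[[0]]` decode to a list that contains itself.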

@ -14,6 +14,7 @@ import json
from test.helper import (
assertGreaterEqual,
assertLessEqual,
expect_info_dict,
expect_warnings,
get_params,
@ -121,10 +122,13 @@ def generator(test_case, tname):
params = get_params(test_case.get('params', {}))
params['outtmpl'] = tname + '_' + params['outtmpl']
if is_playlist and 'playlist' not in test_case:
params.setdefault('extract_flat', 'in_playlist')
params.setdefault('playlistend', test_case.get(
'playlist_mincount', test_case.get('playlist_count', -2) + 1))
params.setdefault('playlistend', max(
test_case.get('playlist_mincount', -1),
test_case.get('playlist_count', -2) + 1,
test_case.get('playlist_maxcount', -2) + 1))
params.setdefault('skip_download', True)
if 'playlist_duration_sum' not in test_case:
params.setdefault('extract_flat', 'in_playlist')
ydl = YoutubeDL(params, auto_init=False)
ydl.add_default_info_extractors()
@ -159,6 +163,7 @@ def generator(test_case, tname):
try_rm(os.path.splitext(tc_filename)[0] + '.info.json')
try_rm_tcs_files()
try:
test_url = test_case['url']
try_num = 1
while True:
try:
@ -166,7 +171,7 @@ def generator(test_case, tname):
# for outside error handling, and returns the exit code
# instead of the result dict.
res_dict = ydl.extract_info(
test_case['url'],
test_url,
force_generic_extractor=params.get('force_generic_extractor', False))
except (DownloadError, ExtractorError) as err:
# Check if the exception is not a network related one
@ -194,23 +199,23 @@ def generator(test_case, tname):
self.assertTrue('entries' in res_dict)
expect_info_dict(self, res_dict, test_case.get('info_dict', {}))
num_entries = len(res_dict.get('entries', []))
if 'playlist_mincount' in test_case:
mincount = test_case['playlist_mincount']
assertGreaterEqual(
self,
len(res_dict['entries']),
test_case['playlist_mincount'],
'Expected at least %d in playlist %s, but got only %d' % (
test_case['playlist_mincount'], test_case['url'],
len(res_dict['entries'])))
self, num_entries, mincount,
f'Expected at least {mincount} entries in playlist {test_url}, but got only {num_entries}')
if 'playlist_count' in test_case:
count = test_case['playlist_count']
got = num_entries if num_entries <= count else 'more'
self.assertEqual(
len(res_dict['entries']),
test_case['playlist_count'],
'Expected %d entries in playlist %s, but got %d.' % (
test_case['playlist_count'],
test_case['url'],
len(res_dict['entries']),
))
num_entries, count,
f'Expected exactly {count} entries in playlist {test_url}, but got {got}')
if 'playlist_maxcount' in test_case:
maxcount = test_case['playlist_maxcount']
assertLessEqual(
self, num_entries, maxcount,
f'Expected at most {maxcount} entries in playlist {test_url}, but got more')
if 'playlist_duration_sum' in test_case:
got_duration = sum(e['duration'] for e in res_dict['entries'])
self.assertEqual(

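The `test_download.py` hunk derives a default `playlistend` from whichever count keys a test defines. It fetches one entry beyond `playlist_count` or `playlist_maxcount`, so an oversized playlist is reported as "more" rather than silently truncated. Below is a standalone sketch of that arithmetic and the resulting checks. `default_playlistend` and `playlist_count_errors` are hypothetical helpers, not functions in the test suite.

```python
import math


def default_playlistend(test_case):
    """Entries to fetch: enough for mincount, plus one beyond count/maxcount."""
    return max(
        test_case.get('playlist_mincount', -1),
        test_case.get('playlist_count', -2) + 1,
        test_case.get('playlist_maxcount', -2) + 1)


def playlist_count_errors(test_case, num_entries):
    """Return readable failures for the three playlist count constraints."""
    errors = []
    mincount = test_case.get('playlist_mincount', 0)
    if num_entries < mincount:
        errors.append(f'expected at least {mincount} entries, got only {num_entries}')
    if 'playlist_count' in test_case and num_entries != test_case['playlist_count']:
        count = test_case['playlist_count']
        # With playlistend = count + 1, any overshoot shows up as count + 1 entries
        got = num_entries if num_entries <= count else 'more'
        errors.append(f'expected exactly {count} entries, got {got}')
    if num_entries > test_case.get('playlist_maxcount', math.inf):
        errors.append(f'expected at most {test_case["playlist_maxcount"]} entries, got more')
    return errors


# A playlist with 10 real entries, truncated to playlistend before counting
case = {'playlist_count': 3}
fetched = min(10, default_playlistend(case))
```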

@ -478,6 +478,10 @@ class TestJSInterpreter(unittest.TestCase):
func = jsi.extract_function('c', {'e': 10}, {'f': 100, 'g': 1000})
self.assertEqual(func([1]), 1111)
def test_extract_object(self):
jsi = JSInterpreter('var a={};a.xy={};var xy;var zxy={};xy={z:function(){return "abc"}};')
self.assertTrue('z' in jsi.extract_object('xy', None))
def test_increment_decrement(self):
self._test('function f() { var x = 1; return ++x; }', 2)
self._test('function f() { var x = 1; return x++; }', 1)
@ -486,6 +490,57 @@ class TestJSInterpreter(unittest.TestCase):
self._test('function f() { var a = "test--"; return a; }', 'test--')
self._test('function f() { var b = 1; var a = "b--"; return a; }', 'b--')
def test_nested_function_scoping(self):
self._test(R'''
function f() {
var g = function() {
var P = 2;
return P;
};
var P = 1;
g();
return P;
}
''', 1)
self._test(R'''
function f() {
var x = function() {
for (var w = 1, M = []; w < 2; w++) switch (w) {
case 1:
M.push("a");
case 2:
M.push("b");
}
return M
};
var w = "c";
var M = "d";
var y = x();
y.push(w);
y.push(M);
return y;
}
''', ['a', 'b', 'c', 'd'])
self._test(R'''
function f() {
var P, Q;
var z = 100;
var g = function() {
var P, Q; P = 2; Q = 15;
z = 0;
return P+Q;
};
P = 1; Q = 10;
var x = g(), y = 3;
return P+Q+x+y+z;
}
''', 31)
def test_undefined_varnames(self):
jsi = JSInterpreter('function f(){ var a; return [a, b]; }')
self._test(jsi, [JS_Undefined, JS_Undefined])
self.assertEqual(jsi._undefined_varnames, {'b'})
if __name__ == '__main__':
unittest.main()
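The new `test_nested_function_scoping` cases check two rules. First, a `var` declared inside a nested function shadows the outer variable of the same name. Second, assigning to a name the inner function never declares (`z` in the third case) updates the enclosing function's binding.

The sketch below models that lookup rule with `collections.ChainMap` and replays the third case. It illustrates JavaScript function scoping only; it is not how `JSInterpreter` stores its locals, and the `assign` helper is hypothetical.

```python
from collections import ChainMap


def assign(scope, name, value):
    """Assign to the innermost mapping that declares `name`, else to the outermost one."""
    for mapping in scope.maps:
        if name in mapping:
            mapping[name] = value
            return
    scope.maps[-1][name] = value


# f's own `var` declarations: var P, Q; var z = 100;
outer = ChainMap({'P': None, 'Q': None, 'z': 100})
assign(outer, 'P', 1)
assign(outer, 'Q', 10)

# Calling g pushes a new scope holding g's own `var P, Q`
inner = outer.new_child({'P': None, 'Q': None})
assign(inner, 'P', 2)
assign(inner, 'Q', 15)
assign(inner, 'z', 0)  # g never declares z, so f's z is updated
x = inner['P'] + inner['Q']  # g() returns 17

y = 3
total = outer['P'] + outer['Q'] + x + y + outer['z']  # P+Q+x+y+z inside f
```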

@ -22,7 +22,6 @@ import ssl
import tempfile
import threading
import time
import urllib.error
import urllib.request
import warnings
import zlib
@ -223,10 +222,7 @@ class HTTPTestRequestHandler(http.server.BaseHTTPRequestHandler):
if encoding == 'br' and brotli:
payload = brotli.compress(payload)
elif encoding == 'gzip':
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode='wb') as f:
f.write(payload)
payload = buf.getvalue()
payload = gzip.compress(payload, mtime=0)
elif encoding == 'deflate':
payload = zlib.compress(payload)
elif encoding == 'unsupported':
@ -729,6 +725,17 @@ class TestHTTPRequestHandler(TestRequestHandlerBase):
assert 'X-test-heaDer: test' in res
def test_partial_read_then_full_read(self, handler):
with handler() as rh:
for encoding in ('', 'gzip', 'deflate'):
res = validate_and_send(rh, Request(
f'http://127.0.0.1:{self.http_port}/content-encoding',
headers={'ytdl-encoding': encoding}))
assert res.headers.get('Content-Encoding') == encoding
assert res.read(6) == b'<html>'
assert res.read(0) == b''
assert res.read() == b'<video src="/vid.mp4" /></html>'
@pytest.mark.parametrize('handler', ['Urllib', 'Requests', 'CurlCFFI'], indirect=True)
class TestClientCertificate:

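`test_partial_read_then_full_read` requires a response to decode `gzip` and `deflate` bodies incrementally. `read(6)` returns the first six decoded bytes, `read(0)` returns nothing, and a final `read()` returns the rest. Below is a minimal sketch of such a reader built on `zlib.decompressobj`. The `DecodingReader` class is hypothetical and is not how the Urllib, Requests or CurlCFFI handlers implement decoding. Like the updated test server, it uses `gzip.compress(..., mtime=0)` to get a reproducible gzip body.

```python
import gzip
import io
import zlib

# wbits for zlib.decompressobj: 16 + MAX_WBITS expects a gzip header,
# MAX_WBITS expects a zlib-wrapped stream as produced by zlib.compress()
_WBITS = {'gzip': 16 + zlib.MAX_WBITS, 'deflate': zlib.MAX_WBITS}


class DecodingReader:
    """File-like reader that decodes a Content-Encoding body on demand."""

    def __init__(self, raw, encoding):
        self._raw = raw
        wbits = _WBITS.get(encoding)
        self._decomp = zlib.decompressobj(wbits) if wbits is not None else None
        self._buffer = b''
        self._eof = False

    def _fill(self, size):
        # Decode raw chunks until `size` bytes are buffered (size < 0: until EOF)
        while not self._eof and (size < 0 or len(self._buffer) < size):
            chunk = self._raw.read(16)
            if chunk:
                self._buffer += self._decomp.decompress(chunk) if self._decomp else chunk
            else:
                self._eof = True
                if self._decomp:
                    self._buffer += self._decomp.flush()

    def read(self, size=-1):
        self._fill(size)
        if size < 0:
            size = len(self._buffer)
        data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data


body = b'<html><video src="/vid.mp4" /></html>'
reader = DecodingReader(io.BytesIO(gzip.compress(body, mtime=0)), 'gzip')
```

The small 16-byte read size is deliberate: it forces several `decompress` calls, so a partial read really does stop in the middle of the compressed stream.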

@ -416,18 +416,8 @@ class TestTraversal:
'`any` should allow further branching'
def test_traversal_morsel(self):
values = {
'expires': 'a',
'path': 'b',
'comment': 'c',
'domain': 'd',
'max-age': 'e',
'secure': 'f',
'httponly': 'g',
'version': 'h',
'samesite': 'i',
}
morsel = http.cookies.Morsel()
values = dict(zip(morsel, 'abcdefghijklmnop'))
morsel.set('item_key', 'item_value', 'coded_value')
morsel.update(values)
values['key'] = 'item_key'

@ -133,6 +133,11 @@ _SIG_TESTS = [
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'IAOAOq0QJ8wRAAgXmPlOPSBkkUs1bYFYlJCfe29xx8j7v1pDL0QwbdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_E2u-m37KtXJoOySqa0',
),
(
'https://www.youtube.com/s/player/e12fbea4/player_ias.vflset/en_US/base.js',
'gN7a-hudCuAuPH6fByOk1_GNXN0yNMHShjZXS2VOgsEItAJz0tipeavEOmNdYN-wUtcEqD3bCXjc0iyKfAyZxCBGgIARwsSdQfJ2CJtt',
'JC2JfQdSswRAIgGBCxZyAfKyi0cjXCb3DqEctUw-NYdNmOEvaepit0zJAtIEsgOV2SXZjhSHMNy0NXNG_1kOyBf6HPuAuCduh-a',
),
]
_NSIG_TESTS = [
@ -328,6 +333,50 @@ _NSIG_TESTS = [
'https://www.youtube.com/s/player/fc2a56a5/tv-player-ias.vflset/tv-player-ias.js',
'qTKWg_Il804jd2kAC', 'OtUAm2W6gyzJjB9u',
),
(
'https://www.youtube.com/s/player/a74bf670/player_ias_tce.vflset/en_US/base.js',
'kM5r52fugSZRAKHfo3', 'hQP7k1hA22OrNTnq',
),
(
'https://www.youtube.com/s/player/6275f73c/player_ias_tce.vflset/en_US/base.js',
'kM5r52fugSZRAKHfo3', '-I03XF0iyf6I_X0A',
),
(
'https://www.youtube.com/s/player/20c72c18/player_ias_tce.vflset/en_US/base.js',
'kM5r52fugSZRAKHfo3', '-I03XF0iyf6I_X0A',
),
(
'https://www.youtube.com/s/player/9fe2e06e/player_ias_tce.vflset/en_US/base.js',
'kM5r52fugSZRAKHfo3', '6r5ekNIiEMPutZy',
),
(
'https://www.youtube.com/s/player/680f8c75/player_ias_tce.vflset/en_US/base.js',
'kM5r52fugSZRAKHfo3', '0ml9caTwpa55Jf',
),
(
'https://www.youtube.com/s/player/14397202/player_ias_tce.vflset/en_US/base.js',
'kM5r52fugSZRAKHfo3', 'ozZFAN21okDdJTa',
),
(
'https://www.youtube.com/s/player/5dcb2c1f/player_ias_tce.vflset/en_US/base.js',
'kM5r52fugSZRAKHfo3', 'p7iTbRZDYAF',
),
(
'https://www.youtube.com/s/player/a10d7fcc/player_ias_tce.vflset/en_US/base.js',
'kM5r52fugSZRAKHfo3', '9Zue7DDHJSD',
),
(
'https://www.youtube.com/s/player/8e20cb06/player_ias_tce.vflset/en_US/base.js',
'kM5r52fugSZRAKHfo3', '5-4tTneTROTpMzba',
),
(
'https://www.youtube.com/s/player/e12fbea4/player_ias_tce.vflset/en_US/base.js',
'kM5r52fugSZRAKHfo3', 'XkeRfXIPOkSwfg',
),
(
'https://www.youtube.com/s/player/ef259203/player_ias_tce.vflset/en_US/base.js',
'rPqBC01nJpqhhi2iA2U', 'hY7dbiKFT51UIA',
),
]

@ -482,7 +482,8 @@ class YoutubeDL:
The following options do not work when used through the API:
filename, abort-on-error, multistreams, no-live-chat,
format-sort, no-clean-infojson, no-playlist-metafiles,
no-keep-subs, no-attach-info-json, allow-unsafe-ext, prefer-vp9-sort.
no-keep-subs, no-attach-info-json, allow-unsafe-ext, prefer-vp9-sort,
mtime-by-default.
Refer __init__.py for their implementation
progress_template: Dictionary of templates for progress outputs.
Allowed keys are 'download', 'postprocess',
@ -2194,7 +2195,7 @@ class YoutubeDL:
return op(actual_value, comparison_value)
return _filter
def _check_formats(self, formats):
def _check_formats(self, formats, warning=True):
for f in formats:
working = f.get('__working')
if working is not None:
@ -2207,6 +2208,9 @@ class YoutubeDL:
continue
temp_file = tempfile.NamedTemporaryFile(suffix='.tmp', delete=False, dir=path or None)
temp_file.close()
# If FragmentFD fails when testing a fragment, it will wrongly set a non-zero return code.
# Save the actual return code for later. See https://github.com/yt-dlp/yt-dlp/issues/13750
original_retcode = self._download_retcode
try:
success, _ = self.dl(temp_file.name, f, test=True)
except (DownloadError, OSError, ValueError, *network_exceptions):
@ -2217,11 +2221,18 @@ class YoutubeDL:
os.remove(temp_file.name)
except OSError:
self.report_warning(f'Unable to delete temporary file "{temp_file.name}"')
# Restore the actual return code
self._download_retcode = original_retcode
f['__working'] = success
if success:
f.pop('__needs_testing', None)
yield f
else:
self.to_screen('[info] Unable to download format {}. Skipping...'.format(f['format_id']))
msg = f'Unable to download format {f["format_id"]}. Skipping...'
if warning:
self.report_warning(msg)
else:
self.to_screen(f'[info] {msg}')
def _select_formats(self, formats, selector):
return list(selector({
@ -2947,7 +2958,7 @@ class YoutubeDL:
)
if self.params.get('check_formats') is True:
formats = LazyList(self._check_formats(formats[::-1]), reverse=True)
formats = LazyList(self._check_formats(formats[::-1], warning=False), reverse=True)
if not formats or formats[0] is not info_dict:
# only set the 'formats' fields if the original info_dict list them
@ -3963,6 +3974,7 @@ class YoutubeDL:
self._format_out('UNSUPPORTED', self.Styles.BAD_FORMAT) if f.get('ext') in ('f4f', 'f4m') else None,
(self._format_out('Maybe DRM', self.Styles.WARNING) if f.get('has_drm') == 'maybe'
else self._format_out('DRM', self.Styles.BAD_FORMAT) if f.get('has_drm') else None),
self._format_out('Untested', self.Styles.WARNING) if f.get('__needs_testing') else None,
format_field(f, 'format_note'),
format_field(f, 'container', ignore=(None, f.get('ext'))),
delim=', '), delim=' '),

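In the `_check_formats` hunk, a failed test download could leave `self._download_retcode` non-zero even though the format is only skipped (issue #13750). The fix saves the value before the test download and restores it afterwards. Skipped formats are reported as warnings unless `warning=False`, which is what the format-selection path now passes.

The sketch below shows the same save-and-restore pattern as a context manager. It uses `try`/`finally`, so the value is also restored if the test raises. `preserved_attribute`, `FakeDownloader` and `check_formats` are hypothetical names, not yt-dlp APIs.

```python
import contextlib


@contextlib.contextmanager
def preserved_attribute(obj, name):
    """Restore obj.<name> on exit, whatever the wrapped block did to it."""
    saved = getattr(obj, name)
    try:
        yield
    finally:
        setattr(obj, name, saved)


class FakeDownloader:
    """Hypothetical stand-in for YoutubeDL with only the state this sketch needs."""

    def __init__(self):
        self._download_retcode = 0
        self.messages = []

    def test_download(self, fmt):
        if not fmt['ok']:
            self._download_retcode = 1  # a failed test download sets a non-zero exit code
        return fmt['ok']


def check_formats(ydl, formats, warning=True):
    """Yield formats whose test download succeeds, without leaking failures into the exit code."""
    for fmt in formats:
        with preserved_attribute(ydl, '_download_retcode'):
            success = ydl.test_download(fmt)
        if success:
            yield fmt
        else:
            level = 'warning' if warning else 'info'
            ydl.messages.append((level, f'Unable to download format {fmt["id"]}. Skipping...'))
```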

@ -159,6 +159,12 @@ def set_compat_opts(opts):
elif 'prefer-vp9-sort' in opts.compat_opts:
opts.format_sort.extend(FormatSorter._prefer_vp9_sort)
if 'mtime-by-default' in opts.compat_opts:
if opts.updatetime is None:
opts.updatetime = True
else:
_unused_compat_opt('mtime-by-default')
_video_multistreams_set = set_default_compat('multistreams', 'allow_multiple_video_streams', False, remove_compat=False)
_audio_multistreams_set = set_default_compat('multistreams', 'allow_multiple_audio_streams', False, remove_compat=False)
if _video_multistreams_set is False and _audio_multistreams_set is False:

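With this change, the downloaders read `updatetime` with no default (`self.params.get('updatetime')`). File modification times are therefore only set when the option is truthy. The new `mtime-by-default` compat option restores the old behaviour when the user has not set the option either way. Below is a sketch of that resolution as pure functions. `resolve_updatetime` and `should_set_mtime` are hypothetical helpers, and returning the ignored option's name stands in for the `_unused_compat_opt` call.

```python
def resolve_updatetime(updatetime, compat_opts):
    """Return (effective updatetime, name of an ignored compat option or None)."""
    if 'mtime-by-default' in compat_opts:
        if updatetime is None:
            return True, None
        # An explicitly set value wins, so the compat option has no effect
        return updatetime, 'mtime-by-default'
    return updatetime, None


def should_set_mtime(params):
    # No default: a missing or None 'updatetime' means "do not touch the mtime"
    return bool(params.get('updatetime'))
```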

@ -435,7 +435,7 @@ def sub_bytes_inv(data):
def rotate(data):
return data[1:] + [data[0]]
return [*data[1:], data[0]]
def key_schedule_core(data, rcon_iteration):

@ -302,7 +302,7 @@ class FragmentFD(FileDownloader):
elif to_file:
self.try_rename(ctx['tmpfilename'], ctx['filename'])
filetime = ctx.get('fragment_filetime')
if self.params.get('updatetime', True) and filetime:
if self.params.get('updatetime') and filetime:
with contextlib.suppress(Exception):
os.utime(ctx['filename'], (time.time(), filetime))

@ -94,12 +94,19 @@ class HlsFD(FragmentFD):
can_download, message = self.can_download(s, info_dict, self.params.get('allow_unplayable_formats')), None
if can_download:
has_ffmpeg = FFmpegFD.available()
no_crypto = not Cryptodome.AES and '#EXT-X-KEY:METHOD=AES-128' in s
if no_crypto and has_ffmpeg:
can_download, message = False, 'The stream has AES-128 encryption and pycryptodomex is not available'
elif no_crypto:
message = ('The stream has AES-128 encryption and neither ffmpeg nor pycryptodomex are available; '
'Decryption will be performed natively, but will be extremely slow')
if not Cryptodome.AES and '#EXT-X-KEY:METHOD=AES-128' in s:
# Even if pycryptodomex isn't available, force HlsFD for m3u8s that won't work with ffmpeg
ffmpeg_can_dl = not traverse_obj(info_dict, ((
'extra_param_to_segment_url', 'extra_param_to_key_url',
'hls_media_playlist_data', ('hls_aes', ('uri', 'key', 'iv')),
), any))
message = 'The stream has AES-128 encryption and {} available'.format(
'neither ffmpeg nor pycryptodomex are' if ffmpeg_can_dl and not has_ffmpeg else
'pycryptodomex is not')
if has_ffmpeg and ffmpeg_can_dl:
can_download = False
else:
message += '; decryption will be performed natively, but will be extremely slow'
elif info_dict.get('extractor_key') == 'Generic' and re.search(r'(?m)#EXT-X-MEDIA-SEQUENCE:(?!0$)', s):
install_ffmpeg = '' if has_ffmpeg else 'install ffmpeg and '
message = ('Live HLS streams are not supported by the native downloader. If this is a livestream, '

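The rewritten `HlsFD` check decides how to handle an AES-128 stream when pycryptodomex is missing. Handing the stream to ffmpeg is preferred only if ffmpeg is installed and the info dict carries none of the fields listed in the `traverse_obj` call, such as `extra_param_to_segment_url` or `hls_aes`; the source comment says such m3u8s won't work with ffmpeg. Otherwise the native downloader is kept and decrypts slowly.

The decision is restated below as a pure function. `aes128_decision` is a hypothetical name, and it takes the three conditions as booleans instead of inspecting real objects.

```python
def aes128_decision(has_pycryptodomex, has_ffmpeg, ffmpeg_can_dl):
    """Return (use_native_downloader, message) for an HLS stream with AES-128 keys."""
    if has_pycryptodomex:
        return True, None  # the check in the diff only applies when pycryptodomex is missing
    message = 'The stream has AES-128 encryption and {} available'.format(
        'neither ffmpeg nor pycryptodomex are' if ffmpeg_can_dl and not has_ffmpeg
        else 'pycryptodomex is not')
    if has_ffmpeg and ffmpeg_can_dl:
        return False, message  # hand the stream to ffmpeg instead
    return True, message + '; decryption will be performed natively, but will be extremely slow'
```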

@ -348,7 +348,7 @@ class HttpFD(FileDownloader):
self.try_rename(ctx.tmpfilename, ctx.filename)
# Update file modification time
if self.params.get('updatetime', True):
if self.params.get('updatetime'):
info_dict['filetime'] = self.try_utime(ctx.filename, ctx.data.headers.get('last-modified', None))
self._hook_progress({

@ -5,47 +5,46 @@ import time
from .common import FileDownloader
from .external import FFmpegFD
from ..networking import Request
from ..utils import DownloadError, str_or_none, try_get
from ..networking.websocket import WebSocketResponse
from ..utils import DownloadError, str_or_none, truncate_string
from ..utils.traversal import traverse_obj
class NiconicoLiveFD(FileDownloader):
""" Downloads niconico live without being stopped """
def real_download(self, filename, info_dict):
video_id = info_dict['video_id']
ws_url = info_dict['url']
ws_extractor = info_dict['ws']
ws_origin_host = info_dict['origin']
live_quality = info_dict.get('live_quality', 'high')
live_latency = info_dict.get('live_latency', 'high')
video_id = info_dict['id']
opts = info_dict['downloader_options']
quality, ws_extractor, ws_url = opts['max_quality'], opts['ws'], opts['ws_url']
dl = FFmpegFD(self.ydl, self.params or {})
new_info_dict = info_dict.copy()
new_info_dict.update({
'protocol': 'm3u8',
})
new_info_dict['protocol'] = 'm3u8'
def communicate_ws(reconnect):
if reconnect:
ws = self.ydl.urlopen(Request(ws_url, headers={'Origin': f'https://{ws_origin_host}'}))
# Support --load-info-json as if it is a reconnect attempt
if reconnect or not isinstance(ws_extractor, WebSocketResponse):
ws = self.ydl.urlopen(Request(
ws_url, headers={'Origin': 'https://live.nicovideo.jp'}))
if self.ydl.params.get('verbose', False):
self.to_screen('[debug] Sending startWatching request')
self.write_debug('Sending startWatching request')
ws.send(json.dumps({
'type': 'startWatching',
'data': {
'reconnect': True,
'room': {
'commentable': True,
'protocol': 'webSocket',
},
'stream': {
'quality': live_quality,
'protocol': 'hls+fmp4',
'latency': live_latency,
'accessRightMethod': 'single_cookie',
'chasePlay': False,
'latency': 'high',
'protocol': 'hls',
'quality': quality,
},
'room': {
'protocol': 'webSocket',
'commentable': True,
},
'reconnect': True,
},
'type': 'startWatching',
}))
else:
ws = ws_extractor
@ -58,7 +57,6 @@ class NiconicoLiveFD(FileDownloader):
if not data or not isinstance(data, dict):
continue
if data.get('type') == 'ping':
# pong back
ws.send(r'{"type":"pong"}')
ws.send(r'{"type":"keepSeat"}')
elif data.get('type') == 'disconnect':
@ -66,12 +64,10 @@ class NiconicoLiveFD(FileDownloader):
return True
elif data.get('type') == 'error':
self.write_debug(data)
message = try_get(data, lambda x: x['body']['code'], str) or recv
message = traverse_obj(data, ('body', 'code', {str_or_none}), default=recv)
return DownloadError(message)
elif self.ydl.params.get('verbose', False):
if len(recv) > 100:
recv = recv[:100] + '...'
self.to_screen(f'[debug] Server said: {recv}')
self.write_debug(f'Server response: {truncate_string(recv, 100)}')
def ws_main():
reconnect = False
@ -81,7 +77,8 @@ class NiconicoLiveFD(FileDownloader):
if ret is True:
return
except BaseException as e:
self.to_screen('[{}] {}: Connection error occured, reconnecting after 10 seconds: {}'.format('niconico:live', video_id, str_or_none(e)))
self.to_screen(
f'[niconico:live] {video_id}: Connection error occured, reconnecting after 10 seconds: {e}')
time.sleep(10)
continue
finally:

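After this change, `NiconicoLiveFD` reads `max_quality`, `ws` and `ws_url` from `downloader_options` and sends a `startWatching` request over the WebSocket. It answers each `ping` with `pong` and `keepSeat`, and turns an `error` message into a `DownloadError` carrying `body.code` (or the raw message).

The sketch below separates the message handling from the socket so it can be exercised offline. `start_watching_message` and `handle_server_message` are hypothetical helpers. The `disconnect` branch is simplified because part of it falls outside the hunk.

```python
import json


def start_watching_message(quality):
    """Serialize a startWatching request with the field values shown in the diff."""
    return json.dumps({
        'data': {
            'reconnect': True,
            'room': {'commentable': True, 'protocol': 'webSocket'},
            'stream': {
                'accessRightMethod': 'single_cookie',
                'chasePlay': False,
                'latency': 'high',
                'protocol': 'hls',
                'quality': quality,
            },
        },
        'type': 'startWatching',
    })


def handle_server_message(raw):
    """Return (messages_to_send, outcome); outcome is None, True (disconnect) or an error string."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return [], None
    if not data or not isinstance(data, dict):
        return [], None
    msg_type = data.get('type')
    if msg_type == 'ping':
        return ['{"type":"pong"}', '{"type":"keepSeat"}'], None
    if msg_type == 'disconnect':
        return [], True
    if msg_type == 'error':
        body = data.get('body')
        code = body.get('code') if isinstance(body, dict) else None
        return [], str(code) if code is not None else raw
    return [], None
```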

@ -201,7 +201,6 @@ from .banbye import (
BanByeChannelIE,
BanByeIE,
)
from .bandaichannel import BandaiChannelIE
from .bandcamp import (
BandcampAlbumIE,
BandcampIE,
@ -229,7 +228,6 @@ from .beatbump import (
from .beatport import BeatportIE
from .beeg import BeegIE
from .behindkink import BehindKinkIE
from .bellmedia import BellMediaIE
from .berufetv import BerufeTVIE
from .bet import BetIE
from .bfi import BFIPlayerIE
@ -275,7 +273,10 @@ from .bitchute import (
BitChuteChannelIE,
BitChuteIE,
)
from .blackboardcollaborate import BlackboardCollaborateIE
from .blackboardcollaborate import (
BlackboardCollaborateIE,
BlackboardCollaborateLaunchIE,
)
from .bleacherreport import (
BleacherReportCMSIE,
BleacherReportIE,
@ -309,6 +310,7 @@ from .brilliantpala import (
BrilliantpalaClassesIE,
BrilliantpalaElearnIE,
)
from .btvplus import BTVPlusIE
from .bundesliga import BundesligaIE
from .bundestag import BundestagIE
from .bunnycdn import BunnyCdnIE
@ -446,7 +448,6 @@ from .cspan import (
CSpanIE,
)
from .ctsnews import CtsNewsIE
from .ctv import CTVIE
from .ctvnews import CTVNewsIE
from .cultureunplugged import CultureUnpluggedIE
from .curiositystream import (
@ -805,9 +806,7 @@ from .holodex import HolodexIE
from .hotnewhiphop import HotNewHipHopIE
from .hotstar import (
HotStarIE,
HotStarPlaylistIE,
HotStarPrefixIE,
HotStarSeasonIE,
HotStarSeriesIE,
)
from .hrefli import HrefLiRedirectIE
@ -921,10 +920,6 @@ from .japandiet import (
ShugiinItvVodIE,
)
from .jeuxvideo import JeuxVideoIE
from .jiocinema import (
JioCinemaIE,
JioCinemaSeriesIE,
)
from .jiosaavn import (
JioSaavnAlbumIE,
JioSaavnArtistIE,
@ -934,7 +929,6 @@ from .jiosaavn import (
JioSaavnSongIE,
)
from .joj import JojIE
from .joqrag import JoqrAgIE
from .jove import JoveIE
from .jstream import JStreamIE
from .jtbc import (
@ -1037,11 +1031,6 @@ from .likee import (
LikeeIE,
LikeeUserIE,
)
from .limelight import (
LimelightChannelIE,
LimelightChannelListIE,
LimelightMediaIE,
)
from .linkedin import (
LinkedInEventsIE,
LinkedInIE,
@ -1107,6 +1096,7 @@ from .markiza import (
from .massengeschmacktv import MassengeschmackTVIE
from .masters import MastersIE
from .matchtv import MatchTVIE
from .mave import MaveIE
from .mbn import MBNIE
from .mdr import MDRIE
from .medaltv import MedalTVIE
@ -1152,6 +1142,7 @@ from .minds import (
MindsIE,
)
from .minoto import MinotoIE
from .mir24tv import Mir24TvIE
from .mirrativ import (
MirrativIE,
MirrativUserIE,
@ -1172,6 +1163,10 @@ from .mixcloud import (
MixcloudPlaylistIE,
MixcloudUserIE,
)
from .mixlr import (
MixlrIE,
MixlrRecoringIE,
)
from .mlb import (
MLBIE,
MLBTVIE,
@ -1382,7 +1377,6 @@ from .nobelprize import NobelPrizeIE
from .noice import NoicePodcastIE
from .nonktube import NonkTubeIE
from .noodlemagazine import NoodleMagazineIE
from .noovo import NoovoIE
from .nosnl import NOSNLArticleIE
from .nova import (
NovaEmbedIE,
@ -1563,6 +1557,7 @@ from .platzi import (
PlatziCourseIE,
PlatziIE,
)
from .playerfm import PlayerFmIE
from .playplustv import PlayPlusTVIE
from .playsuisse import PlaySuisseIE
from .playtvak import PlaytvakIE
@ -1829,6 +1824,7 @@ from .safari import (
from .saitosan import SaitosanIE
from .samplefocus import SampleFocusIE
from .sapo import SapoIE
from .sauceplus import SaucePlusIE
from .sbs import SBSIE
from .sbscokr import (
SBSCoKrAllvodProgramIE,
@ -2100,6 +2096,7 @@ from .theguardian import (
TheGuardianPodcastIE,
TheGuardianPodcastPlaylistIE,
)
from .thehighwire import TheHighWireIE
from .theholetv import TheHoleTvIE
from .theintercept import TheInterceptIE
from .theplatform import (
@ -2288,6 +2285,7 @@ from .uliza import (
)
from .umg import UMGDeIE
from .unistra import UnistraIE
from .unitednations import UnitedNationsWebTvIE
from .unity import UnityIE
from .unsupported import (
KnownDRMIE,

@ -111,11 +111,9 @@ class AENetworksIE(AENetworksBaseIE):
IE_NAME = 'aenetworks'
IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault'
_VALID_URL = AENetworksBaseIE._BASE_URL_REGEX + r'''(?P<id>
shows/[^/]+/season-\d+/episode-\d+|
(?:
(?:movie|special)s/[^/]+|
(?:shows/[^/]+/)?videos
)/[^/?#&]+
shows/[^/?#]+/season-\d+/episode-\d+|
(?P<type>movie|special)s/[^/?#]+(?P<extra>/[^/?#]+)?|
(?:shows/[^/?#]+/)?videos/[^/?#]+
)'''
_TESTS = [{
'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
@ -128,7 +126,7 @@ class AENetworksIE(AENetworksBaseIE):
'upload_date': '20120529',
'uploader': 'AENE-NEW',
'duration': 2592.0,
'thumbnail': r're:^https?://.*\.jpe?g$',
'thumbnail': r're:https?://.+/.+\.jpg',
'chapters': 'count:5',
'tags': 'count:14',
'categories': ['Mountain Men'],
@ -139,10 +137,7 @@ class AENetworksIE(AENetworksBaseIE):
'series': 'Mountain Men',
'age_limit': 0,
},
'params': {
# m3u8 download
'skip_download': True,
},
'params': {'skip_download': 'm3u8'},
'add_ie': ['ThePlatform'],
'skip': 'Geo-restricted - This content is not available in your location.',
}, {
@ -156,7 +151,7 @@ class AENetworksIE(AENetworksBaseIE):
'upload_date': '20160112',
'uploader': 'AENE-NEW',
'duration': 1277.695,
'thumbnail': r're:^https?://.*\.jpe?g$',
'thumbnail': r're:https?://.+/.+\.jpg',
'chapters': 'count:4',
'tags': 'count:23',
'episode': 'Inlawful Entry',
@ -166,10 +161,53 @@ class AENetworksIE(AENetworksBaseIE):
'series': 'Duck Dynasty',
'age_limit': 0,
},
'params': {
# m3u8 download
'skip_download': True,
'params': {'skip_download': 'm3u8'},
'add_ie': ['ThePlatform'],
}, {
'url': 'https://play.mylifetime.com/movies/v-c-andrews-web-of-dreams',
'info_dict': {
'id': '1590627395981',
'ext': 'mp4',
'title': 'VC Andrews\' Web of Dreams',
'description': 'md5:2a8ba13ae64271c79eb65c0577d312ce',
'uploader': 'AENE-NEW',
'age_limit': 14,
'duration': 5253.665,
'thumbnail': r're:https?://.+/.+\.jpg',
'chapters': 'count:8',
'tags': ['lifetime', 'mylifetime', 'lifetime channel', "VC Andrews' Web of Dreams"],
'series': '',
'season': 'Season 0',
'season_number': 0,
'episode': 'VC Andrews\' Web of Dreams',
'episode_number': 0,
'timestamp': 1566489703.0,
'upload_date': '20190822',
},
'params': {'skip_download': 'm3u8'},
'add_ie': ['ThePlatform'],
}, {
'url': 'https://www.aetv.com/specials/hunting-jonbenets-killer-the-untold-story',
'info_dict': {
'id': '1488235587551',
'ext': 'mp4',
'title': 'Hunting JonBenet\'s Killer: The Untold Story',
'description': 'md5:209869425ee392d74fe29201821e48b4',
'uploader': 'AENE-NEW',
'age_limit': 14,
'duration': 5003.903,
'thumbnail': r're:https?://.+/.+\.jpg',
'chapters': 'count:10',
'tags': 'count:11',
'series': '',
'season': 'Season 0',
'season_number': 0,
'episode': 'Hunting JonBenet\'s Killer: The Untold Story',
'episode_number': 0,
'timestamp': 1554987697.0,
'upload_date': '20190411',
},
'params': {'skip_download': 'm3u8'},
'add_ie': ['ThePlatform'],
}, {
'url': 'http://www.fyi.tv/shows/tiny-house-nation/season-1/episode-8',
@ -198,7 +236,9 @@ class AENetworksIE(AENetworksBaseIE):
}]
def _real_extract(self, url):
domain, canonical = self._match_valid_url(url).groups()
domain, canonical, url_type, extra = self._match_valid_url(url).group('domain', 'id', 'type', 'extra')
if url_type in ('movie', 'special') and not extra:
canonical += f'/full-{url_type}'
return self._extract_aetn_info(domain, 'canonical', '/' + canonical, url)
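The new `_real_extract` branch appends a `/full-movie` or `/full-special` suffix when a movie or special URL carries no extra path segment. The sketch below isolates only that branch. `canonical_path` is a hypothetical helper, and it assumes the `id`, `type` and `extra` groups have already been pulled from the URL (the updated `_VALID_URL` is not part of this hunk).

```python
def canonical_path(canonical, url_type=None, extra=None):
    """Hypothetical helper mirroring the movie/special branch of AENetworksIE._real_extract."""
    # Bare movie/special slugs are rewritten to their "full" variant,
    # e.g. 'specials/<slug>' -> 'specials/<slug>/full-special'
    if url_type in ('movie', 'special') and not extra:
        canonical += f'/full-{url_type}'
    # The extractor hands the path to _extract_aetn_info with a leading slash
    return '/' + canonical
```

Shows and URLs that already have an extra segment pass through unchanged, so only bare movie and special slugs are redirected to the full-length asset.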

@@ -16,6 +16,7 @@ from ..utils import (
dict_get,
extract_attributes,
get_element_by_id,
get_element_text_and_html_by_tag,
int_or_none,
join_nonempty,
js_to_json,
@@ -72,6 +73,7 @@ class ArchiveOrgIE(InfoExtractor):
'display_id': 'Cops-v2.mp4',
'thumbnail': r're:https://archive\.org/download/.*\.jpg',
'duration': 1091.96,
'track': 'Cops-v2',
},
}, {
'url': 'http://archive.org/embed/XD300-23_68HighlightsAResearchCntAugHumanIntellect',
@@ -86,6 +88,7 @@ class ArchiveOrgIE(InfoExtractor):
'thumbnail': r're:https://archive\.org/download/.*\.jpg',
'duration': 59.77,
'display_id': 'Commercial-JFK1960ElectionAdCampaignJingle.mpg',
'track': 'Commercial-JFK1960ElectionAdCampaignJingle',
},
}, {
'url': 'https://archive.org/details/Election_Ads/Commercial-Nixon1960ElectionAdToughonDefense.mpg',
@@ -102,6 +105,7 @@ class ArchiveOrgIE(InfoExtractor):
'duration': 59.51,
'license': 'http://creativecommons.org/licenses/publicdomain/',
'thumbnail': r're:https://archive\.org/download/.*\.jpg',
'track': 'Commercial-Nixon1960ElectionAdToughonDefense',
},
}, {
'url': 'https://archive.org/details/gd1977-05-08.shure57.stevenson.29303.flac16',
@@ -182,6 +186,7 @@ class ArchiveOrgIE(InfoExtractor):
'duration': 130.46,
'thumbnail': 'https://archive.org/download/irelandthemakingofarepublic/irelandthemakingofarepublic.thumbs/irelandthemakingofarepublicreel1_01_000117.jpg',
'display_id': 'irelandthemakingofarepublicreel1_01.mov',
'track': 'irelandthemakingofarepublicreel1 01',
},
}, {
'md5': '67335ee3b23a0da930841981c1e79b02',
@@ -192,6 +197,7 @@ class ArchiveOrgIE(InfoExtractor):
'title': 'irelandthemakingofarepublicreel1_02.mov',
'display_id': 'irelandthemakingofarepublicreel1_02.mov',
'thumbnail': 'https://archive.org/download/irelandthemakingofarepublic/irelandthemakingofarepublic.thumbs/irelandthemakingofarepublicreel1_02_001374.jpg',
'track': 'irelandthemakingofarepublicreel1 02',
},
}, {
'md5': 'e470e86787893603f4a341a16c281eb5',
@@ -202,6 +208,7 @@ class ArchiveOrgIE(InfoExtractor):
'title': 'irelandthemakingofarepublicreel2.mov',
'thumbnail': 'https://archive.org/download/irelandthemakingofarepublic/irelandthemakingofarepublic.thumbs/irelandthemakingofarepublicreel2_001554.jpg',
'display_id': 'irelandthemakingofarepublicreel2.mov',
'track': 'irelandthemakingofarepublicreel2',
},
},
],
@@ -229,15 +236,8 @@ class ArchiveOrgIE(InfoExtractor):
@staticmethod
def _playlist_data(webpage):
element = re.findall(r'''(?xs)
<input
(?:\s+[a-zA-Z0-9:._-]+(?:=[a-zA-Z0-9:._-]*|="[^"]*"|='[^']*'|))*?
\s+class=['"]?js-play8-playlist['"]?
(?:\s+[a-zA-Z0-9:._-]+(?:=[a-zA-Z0-9:._-]*|="[^"]*"|='[^']*'|))*?
\s*/>
''', webpage)[0]
return json.loads(extract_attributes(element)['value'])
element = get_element_text_and_html_by_tag('play-av', webpage)[1]
return json.loads(extract_attributes(element)['playlist'])
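The replacement `_playlist_data` no longer regex-matches a hidden `<input class="js-play8-playlist">`; it reads the JSON-encoded `playlist` attribute of a `<play-av>` element instead. As a rough standard-library equivalent (swapping yt-dlp's `get_element_text_and_html_by_tag` and `extract_attributes` for `html.parser.HTMLParser`), one might write the following. `PlayAvPlaylistParser` and `playlist_data` are hypothetical names, and the sample markup in the test is invented.

```python
import json
from html.parser import HTMLParser


class PlayAvPlaylistParser(HTMLParser):
    """Hypothetical parser: keeps the decoded 'playlist' attribute of the first <play-av>."""

    def __init__(self):
        super().__init__()
        self.playlist = None

    def handle_starttag(self, tag, attrs):
        if tag == 'play-av' and self.playlist is None:
            # HTMLParser has already unescaped entities such as &quot; in attribute values
            value = dict(attrs).get('playlist')
            if value is not None:
                self.playlist = json.loads(value)


def playlist_data(webpage):
    parser = PlayAvPlaylistParser()
    parser.feed(webpage)
    parser.close()
    if parser.playlist is None:
        raise ValueError('no <play-av playlist=...> element found')
    return parser.playlist
```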
def _real_extract(self, url):
video_id = urllib.parse.unquote_plus(self._match_id(url))

@@ -1,33 +0,0 @@
from .brightcove import BrightcoveNewBaseIE
from ..utils import extract_attributes
class BandaiChannelIE(BrightcoveNewBaseIE):
IE_NAME = 'bandaichannel'
_VALID_URL = r'https?://(?:www\.)?b-ch\.com/titles/(?P<id>\d+/\d+)'
_TESTS = [{
'url': 'https://www.b-ch.com/titles/514/001',
'md5': 'a0f2d787baa5729bed71108257f613a4',
'info_dict': {
'id': '6128044564001',
'ext': 'mp4',
'title': 'メタルファイターMIKU 第1話',
'timestamp': 1580354056,
'uploader_id': '5797077852001',
'upload_date': '20200130',
'duration': 1387.733,
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
attrs = extract_attributes(self._search_regex(
r'(<video-js[^>]+\bid="bcplayer"[^>]*>)', webpage, 'player'))
bc = self._download_json(
'https://pbifcd.b-ch.com/v1/playbackinfo/ST/70/' + attrs['data-info'],
video_id, headers={'X-API-KEY': attrs['data-auth'].strip()})['bc']
return self._parse_brightcove_metadata(bc, bc['id'])

@@ -1,91 +0,0 @@
from .common import InfoExtractor
class BellMediaIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://(?:www\.)?
(?P<domain>
(?:
ctv|
tsn|
bnn(?:bloomberg)?|
thecomedynetwork|
discovery|
discoveryvelocity|
sciencechannel|
investigationdiscovery|
animalplanet|
bravo|
mtv|
space|
etalk|
marilyn
)\.ca|
(?:much|cp24)\.com
)/.*?(?:\b(?:vid(?:eoid)?|clipId)=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
_TESTS = [{
'url': 'https://www.bnnbloomberg.ca/video/david-cockfield-s-top-picks~1403070',
'md5': '3e5b8e38370741d5089da79161646635',
'info_dict': {
'id': '1403070',
'ext': 'flv',
'title': 'David Cockfield\'s Top Picks',
'description': 'md5:810f7f8c6a83ad5b48677c3f8e5bb2c3',
'upload_date': '20180525',
'timestamp': 1527288600,
'season_id': '73997',
'season': '2018',
'thumbnail': 'http://images2.9c9media.com/image_asset/2018_5_25_baf30cbd-b28d-4a18-9903-4bb8713b00f5_PNG_956x536.jpg',
'tags': [],
'categories': ['ETFs'],
'season_number': 8,
'duration': 272.038,
'series': 'Market Call Tonight',
},
}, {
'url': 'http://www.thecomedynetwork.ca/video/player?vid=923582',
'only_matching': True,
}, {
'url': 'http://www.tsn.ca/video/expectations-high-for-milos-raonic-at-us-open~939549',
'only_matching': True,
}, {
'url': 'http://www.bnn.ca/video/berman-s-call-part-two-viewer-questions~939654',
'only_matching': True,
}, {
'url': 'http://www.ctv.ca/YourMorning/Video/S1E6-Monday-August-29-2016-vid938009',
'only_matching': True,
}, {
'url': 'http://www.much.com/shows/atmidnight/episode948007/tuesday-september-13-2016',
'only_matching': True,
}, {
'url': 'http://www.much.com/shows/the-almost-impossible-gameshow/928979/episode-6',
'only_matching': True,
}, {
'url': 'http://www.ctv.ca/DCs-Legends-of-Tomorrow/Video/S2E11-Turncoat-vid1051430',
'only_matching': True,
}, {
'url': 'http://www.etalk.ca/video?videoid=663455',
'only_matching': True,
}, {
'url': 'https://www.cp24.com/video?clipId=1982548',
'only_matching': True,
}]
_DOMAINS = {
'thecomedynetwork': 'comedy',
'discoveryvelocity': 'discvel',
'sciencechannel': 'discsci',
'investigationdiscovery': 'invdisc',
'animalplanet': 'aniplan',
'etalk': 'ctv',
'bnnbloomberg': 'bnn',
'marilyn': 'ctv_marilyn',
}
def _real_extract(self, url):
domain, video_id = self._match_valid_url(url).groups()
domain = domain.split('.')[0]
return {
'_type': 'url_transparent',
'id': video_id,
'url': f'9c9media:{self._DOMAINS.get(domain, domain)}_web:{video_id}',
'ie_key': 'NineCNineMedia',
}

@@ -900,7 +900,9 @@ class BiliBiliBangumiIE(BilibiliBaseIE):
headers=headers))
geo_blocked = traverse_obj(play_info, (
'raw', 'data', 'plugins', lambda _, v: v['name'] == 'AreaLimitPanel', 'config', 'is_block', {bool}, any))
('result', ('raw', 'data')), 'plugins',
lambda _, v: v['name'] == 'AreaLimitPanel',
'config', 'is_block', {bool}, any))
premium_only = play_info.get('code') == -10403
video_info = traverse_obj(play_info, (('result', ('raw', 'data')), 'video_info', {dict}, any)) or {}
@@ -914,7 +916,7 @@ class BiliBiliBangumiIE(BilibiliBaseIE):
if traverse_obj(play_info, ((
('result', 'play_check', 'play_detail'), # 'PLAY_PREVIEW' vs 'PLAY_WHOLE'
('raw', 'data', 'play_video_type'), # 'preview' vs 'whole'
(('result', ('raw', 'data')), 'play_video_type'), # 'preview' vs 'whole' vs 'none'
), any, {lambda x: x in ('PLAY_PREVIEW', 'preview')})):
self.report_warning(
'Only preview format is available, '
@@ -1226,6 +1228,26 @@ class BilibiliSpaceVideoIE(BilibiliSpaceBaseIE):
'id': '313580179',
},
'playlist_mincount': 92,
}, {
# Hidden-mode collection
'url': 'https://space.bilibili.com/3669403/video',
'info_dict': {
'id': '3669403',
},
'playlist': [{
'info_dict': {
'_type': 'playlist',
'id': '3669403_3958082',
'title': '合集·直播回放',
'description': '',
'uploader': '月路Yuel',
'uploader_id': '3669403',
'timestamp': int,
'upload_date': str,
'thumbnail': str,
},
}],
'params': {'playlist_items': '7'},
}]
def _real_extract(self, url):
@@ -1282,8 +1304,14 @@ class BilibiliSpaceVideoIE(BilibiliSpaceBaseIE):
}
def get_entries(page_data):
for entry in traverse_obj(page_data, ('list', 'vlist')) or []:
yield self.url_result(f'https://www.bilibili.com/video/{entry["bvid"]}', BiliBiliIE, entry['bvid'])
for entry in traverse_obj(page_data, ('list', 'vlist', ..., {dict})):
if traverse_obj(entry, ('meta', 'attribute')) == 156:
# hidden-mode collection doesn't show its videos in uploads; extract as playlist instead
yield self.url_result(
f'https://space.bilibili.com/{entry["mid"]}/lists/{entry["meta"]["id"]}?type=season',
BilibiliCollectionListIE, f'{entry["mid"]}_{entry["meta"]["id"]}')
else:
yield self.url_result(f'https://www.bilibili.com/video/{entry["bvid"]}', BiliBiliIE, entry['bvid'])
metadata, paged_list = self._extract_playlist(fetch_page, get_metadata, get_entries)
return self.playlist_result(paged_list, playlist_id)
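In `get_entries`, an upload entry whose `meta.attribute` equals `156` is treated as a hidden-mode collection and enqueued as a collection-list URL rather than a single video. That routing decision can be isolated as a small function. `entry_url` is a hypothetical name; the field names are copied from the hunk and the sample entries in the test are invented.

```python
def entry_url(entry):
    """Hypothetical helper: return (url, id) for one item of the space 'vlist'."""
    meta = entry.get('meta') or {}
    if meta.get('attribute') == 156:
        # Hidden-mode collections are not listed as uploads; point at the collection list
        return (
            f'https://space.bilibili.com/{entry["mid"]}/lists/{meta["id"]}?type=season',
            f'{entry["mid"]}_{meta["id"]}')
    # Ordinary uploads resolve to the single-video page
    return f'https://www.bilibili.com/video/{entry["bvid"]}', entry['bvid']
```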

@@ -1,16 +1,27 @@
from .common import InfoExtractor
from ..utils import parse_iso8601
from ..utils import (
UnsupportedError,
float_or_none,
int_or_none,
join_nonempty,
jwt_decode_hs256,
mimetype2ext,
parse_iso8601,
parse_qs,
url_or_none,
)
from ..utils.traversal import traverse_obj
class BlackboardCollaborateIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?P<region>[a-z-]+)\.bbcollab\.com/
(?P<region>[a-z]+)(?:-lti)?\.bbcollab\.com/
(?:
collab/ui/session/playback/load|
recording
)/
(?P<id>[^/]+)'''
(?P<id>[^/?#]+)'''
_TESTS = [
{
'url': 'https://us-lti.bbcollab.com/collab/ui/session/playback/load/0a633b6a88824deb8c918f470b22b256',
@@ -19,9 +30,55 @@ class BlackboardCollaborateIE(InfoExtractor):
'id': '0a633b6a88824deb8c918f470b22b256',
'title': 'HESI A2 Information Session - Thursday, May 6, 2021 - recording_1',
'ext': 'mp4',
'duration': 1896000,
'timestamp': 1620331399,
'duration': 1896,
'timestamp': 1620333295,
'upload_date': '20210506',
'subtitles': {
'live_chat': 'mincount:1',
},
},
},
{
'url': 'https://eu.bbcollab.com/collab/ui/session/playback/load/4bde2dee104f40289a10f8e554270600',
'md5': '108db6a8f83dcb0c2a07793649581865',
'info_dict': {
'id': '4bde2dee104f40289a10f8e554270600',
'title': 'Meeting - Azerbaycanca erize formasi',
'ext': 'mp4',
'duration': 880,
'timestamp': 1671176868,
'upload_date': '20221216',
},
},
{
'url': 'https://eu.bbcollab.com/recording/f83be390ecff46c0bf7dccb9dddcf5f6',
'md5': 'e3b0b88ddf7847eae4b4c0e2d40b83a5',
'info_dict': {
'id': 'f83be390ecff46c0bf7dccb9dddcf5f6',
'title': 'Keynote lecture by Laura Carvalho - recording_1',
'ext': 'mp4',
'duration': 5506,
'timestamp': 1662721705,
'upload_date': '20220909',
'subtitles': {
'live_chat': 'mincount:1',
},
},
},
{
'url': 'https://eu.bbcollab.com/recording/c3e1e7c9e83d4cd9981c93c74888d496',
'md5': 'fdb2d8c43d66fbc0b0b74ef5e604eb1f',
'info_dict': {
'id': 'c3e1e7c9e83d4cd9981c93c74888d496',
'title': 'International Ally User Group - recording_18',
'ext': 'mp4',
'duration': 3479,
'timestamp': 1721919621,
'upload_date': '20240725',
'subtitles': {
'en': 'mincount:1',
'live_chat': 'mincount:1',
},
},
},
{
@@ -42,22 +99,81 @@ class BlackboardCollaborateIE(InfoExtractor):
},
]
def _call_api(self, region, video_id, path=None, token=None, note=None, fatal=False):
# Ref: https://github.com/blackboard/BBDN-Collab-Postman-REST
return self._download_json(
join_nonempty(f'https://{region}.bbcollab.com/collab/api/csa/recordings', video_id, path, delim='/'),
video_id, note or 'Downloading JSON metadata', fatal=fatal,
headers={'Authorization': f'Bearer {token}'} if token else None)
def _real_extract(self, url):
mobj = self._match_valid_url(url)
region = mobj.group('region')
video_id = mobj.group('id')
info = self._download_json(
f'https://{region}.bbcollab.com/collab/api/csa/recordings/{video_id}/data', video_id)
duration = info.get('duration')
title = info['name']
upload_date = info.get('created')
streams = info['streams']
formats = [{'format_id': k, 'url': url} for k, url in streams.items()]
token = parse_qs(url).get('authToken', [None])[-1]
video_info = self._call_api(region, video_id, path='data/secure', token=token, note='Trying auth token')
if video_info:
video_extra = self._call_api(region, video_id, token=token, note='Retrieving extra attributes')
else:
video_info = self._call_api(region, video_id, path='data', note='Trying fallback', fatal=True)
video_extra = {}
formats = traverse_obj(video_info, ('extStreams', lambda _, v: url_or_none(v['streamUrl']), {
'url': 'streamUrl',
'ext': ('contentType', {mimetype2ext}),
'aspect_ratio': ('aspectRatio', {float_or_none}),
}))
if filesize := traverse_obj(video_extra, ('storageSize', {int_or_none})):
for fmt in formats:
fmt['filesize'] = filesize
subtitles = {}
for subs in traverse_obj(video_info, ('subtitles', lambda _, v: url_or_none(v['url']))):
subtitles.setdefault(subs.get('lang') or 'und', []).append({
'name': traverse_obj(subs, ('label', {str})),
'url': subs['url'],
})
for live_chat_url in traverse_obj(video_info, ('chats', ..., 'url', {url_or_none})):
subtitles.setdefault('live_chat', []).append({'url': live_chat_url})
return {
'duration': duration,
**traverse_obj(video_info, {
'title': ('name', {str}),
'timestamp': ('created', {parse_iso8601}),
'duration': ('duration', {int_or_none(scale=1000)}),
}),
'formats': formats,
'id': video_id,
'timestamp': parse_iso8601(upload_date),
'title': title,
'subtitles': subtitles,
}
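The rewritten `_real_extract` tries the `data/secure` endpoint with an optional `authToken` query parameter, falls back to the public `data` endpoint, and builds subtitles from two sources: caption tracks keyed by language (`und` when the language is missing) and chat logs gathered under `live_chat`. Duration is now reported in seconds via `int_or_none(scale=1000)`, which is why the first test moves from `1896000` to `1896`. A plain-Python sketch of the subtitle grouping and duration conversion follows. The helper names are hypothetical, `is_http_url` is only a rough stand-in for `url_or_none`, and the sample data is invented.

```python
def is_http_url(value):
    # Hypothetical, rough stand-in for yt-dlp's url_or_none (which accepts more schemes)
    return isinstance(value, str) and value.startswith(('http://', 'https://'))


def ms_to_seconds(value):
    """Hypothetical helper approximating int_or_none(scale=1000): ms to whole seconds."""
    try:
        return int(value) // 1000
    except (TypeError, ValueError):
        return None


def build_subtitles(video_info):
    """Hypothetical helper: group captions by language; chat logs go under 'live_chat'."""
    subtitles = {}
    for sub in video_info.get('subtitles') or []:
        if not isinstance(sub, dict) or not is_http_url(sub.get('url')):
            continue
        label = sub.get('label')
        subtitles.setdefault(sub.get('lang') or 'und', []).append({
            'name': label if isinstance(label, str) else None,
            'url': sub['url'],
        })
    for chat in video_info.get('chats') or []:
        if isinstance(chat, dict) and is_http_url(chat.get('url')):
            subtitles.setdefault('live_chat', []).append({'url': chat['url']})
    return subtitles
```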
class BlackboardCollaborateLaunchIE(InfoExtractor):
_VALID_URL = r'https?://[a-z]+\.bbcollab\.com/launch/(?P<id>[^/?#]+)'
_TESTS = [
{
'url': 'https://au.bbcollab.com/launch/eyJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJiYkNvbGxhYkFwaSIsInN1YiI6ImJiQ29sbGFiQXBpIiwiZXhwIjoxNzQwNDE2NDgzLCJpYXQiOjE3NDA0MTYxODMsInJlc291cmNlQWNjZXNzVGlja2V0Ijp7InJlc291cmNlSWQiOiI3MzI4YzRjZTNmM2U0ZTcwYmY3MTY3N2RkZTgzMzk2NSIsImNvbnN1bWVySWQiOiJhM2Q3NGM0Y2QyZGU0MGJmODFkMjFlODNlMmEzNzM5MCIsInR5cGUiOiJSRUNPUkRJTkciLCJyZXN0cmljdGlvbiI6eyJ0eXBlIjoiVElNRSIsImV4cGlyYXRpb25Ib3VycyI6MCwiZXhwaXJhdGlvbk1pbnV0ZXMiOjUsIm1heFJlcXVlc3RzIjotMX0sImRpc3Bvc2l0aW9uIjoiTEFVTkNIIiwibGF1bmNoVHlwZSI6bnVsbCwibGF1bmNoQ29tcG9uZW50IjpudWxsLCJsYXVuY2hQYXJhbUtleSI6bnVsbH19.xuELw4EafEwUMoYcCHidGn4Tw9O1QCbYHzYGJUl0kKk',
'only_matching': True,
},
{
'url': 'https://us.bbcollab.com/launch/eyJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJiYkNvbGxhYkFwaSIsInN1YiI6ImJiQ29sbGFiQXBpIiwiZXhwIjoxNjk0NDgxOTc3LCJpYXQiOjE2OTQ0ODE2NzcsInJlc291cmNlQWNjZXNzVGlja2V0Ijp7InJlc291cmNlSWQiOiI3YWU0MTFhNTU3NjU0OWFiOTZlYjVmMTM1YmY3MWU5MCIsImNvbnN1bWVySWQiOiJBRUU2MEI4MDI2QzM3ODU2RjMwMzNEN0ZEOTQzMTFFNSIsInR5cGUiOiJSRUNPUkRJTkciLCJyZXN0cmljdGlvbiI6eyJ0eXBlIjoiVElNRSIsImV4cGlyYXRpb25Ib3VycyI6MCwiZXhwaXJhdGlvbk1pbnV0ZXMiOjUsIm1heFJlcXVlc3RzIjotMX0sImRpc3Bvc2l0aW9uIjoiTEFVTkNIIiwibGF1bmNoVHlwZSI6bnVsbCwibGF1bmNoQ29tcG9uZW50IjpudWxsLCJsYXVuY2hQYXJhbUtleSI6bnVsbH19.yOhRZNaIjXYoMYMpcTzgjZJCnIFaYf2cAzbco8OAxlY',
'only_matching': True,
},
{
'url': 'https://eu.bbcollab.com/launch/eyJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJiYkNvbGxhYkFwaSIsInN1YiI6ImJiQ29sbGFiQXBpIiwiZXhwIjoxNzUyNjgyODYwLCJpYXQiOjE3NTI2ODI1NjAsInJlc291cmNlQWNjZXNzVGlja2V0Ijp7InJlc291cmNlSWQiOiI4MjQzYjFiODg2Nzk0NTZkYjkwN2NmNDZmZmE1MmFhZiIsImNvbnN1bWVySWQiOiI5ZTY4NzYwZWJiNzM0MzRiYWY3NTQyZjA1YmJkOTMzMCIsInR5cGUiOiJSRUNPUkRJTkciLCJyZXN0cmljdGlvbiI6eyJ0eXBlIjoiVElNRSIsImV4cGlyYXRpb25Ib3VycyI6MCwiZXhwaXJhdGlvbk1pbnV0ZXMiOjUsIm1heFJlcXVlc3RzIjotMX0sImRpc3Bvc2l0aW9uIjoiTEFVTkNIIiwibGF1bmNoVHlwZSI6bnVsbCwibGF1bmNoQ29tcG9uZW50IjpudWxsLCJsYXVuY2hQYXJhbUtleSI6bnVsbH19.Xj4ymojYLwZ1vKPKZ-KxjpqQvFXoJekjRaG0npngwWs',
'only_matching': True,
},
]
def _real_extract(self, url):
token = self._match_id(url)
video_id = jwt_decode_hs256(token)['resourceAccessTicket']['resourceId']
redirect_url = self._request_webpage(url, video_id).url
if self.suitable(redirect_url):
raise UnsupportedError(redirect_url)
return self.url_result(redirect_url, BlackboardCollaborateIE, video_id)
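`BlackboardCollaborateLaunchIE` derives the recording ID from the launch token's payload before following the redirect. The sketch below performs that lookup with the standard library only. It decodes the base64url payload segment without checking the signature, since only an ID is read from it. `jwt_payload` and `resource_id_from_launch_token` are hypothetical names, and the token built in the test is synthetic.

```python
import base64
import json


def jwt_payload(token):
    """Hypothetical helper: decode a JWT's payload segment WITHOUT verifying its signature."""
    payload_b64 = token.split('.')[1]
    # JWT segments are base64url with '=' padding stripped; restore it before decoding
    payload_b64 += '=' * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def resource_id_from_launch_token(token):
    # Same claim path the extractor reads: resourceAccessTicket -> resourceId
    return jwt_payload(token)['resourceAccessTicket']['resourceId']
```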

@@ -495,8 +495,6 @@ class BrightcoveLegacyIE(InfoExtractor):
class BrightcoveNewBaseIE(AdobePassIE):
def _parse_brightcove_metadata(self, json_data, video_id, headers={}):
title = json_data['name'].strip()
formats, subtitles = [], {}
sources = json_data.get('sources') or []
for source in sources:
@@ -600,16 +598,18 @@ class BrightcoveNewBaseIE(AdobePassIE):
return {
'id': video_id,
'title': title,
'description': clean_html(json_data.get('description')),
'thumbnails': thumbnails,
'duration': duration,
'timestamp': parse_iso8601(json_data.get('published_at')),
'uploader_id': json_data.get('account_id'),
'formats': formats,
'subtitles': subtitles,
'tags': json_data.get('tags', []),
'is_live': is_live,
**traverse_obj(json_data, {
'title': ('name', {clean_html}),
'description': ('description', {clean_html}),
'tags': ('tags', ..., {str}, filter, all, filter),
'timestamp': ('published_at', {parse_iso8601}),
'uploader_id': ('account_id', {str}),
}),
}
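As I read yt-dlp's traversal rules, the new path `('tags', ..., {str}, filter, all, filter)` keeps only non-empty string tags and then drops the field entirely when the resulting list is empty, which matches the removal of the `'tags': []` expectations from the tests further down. A plain-Python equivalent of that reading, with `clean_tags` as a hypothetical name:

```python
def clean_tags(json_data):
    """Hypothetical helper: non-empty string tags, or None so the field is omitted."""
    raw = json_data.get('tags')
    if not isinstance(raw, (list, tuple)):
        return None
    # {str} drops non-strings, the first `filter` drops empty strings
    tags = [tag for tag in raw if isinstance(tag, str) and tag]
    # `all` collects into a list; the final `filter` turns [] into None
    return tags or None
```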
@@ -645,10 +645,7 @@ class BrightcoveNewIE(BrightcoveNewBaseIE):
'uploader_id': '4036320279001',
'formats': 'mincount:39',
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': '404 Not Found',
}, {
# playlist stream
'url': 'https://players.brightcove.net/1752604059001/S13cJdUBz_default/index.html?playlistId=5718313430001',
@@ -709,7 +706,6 @@ class BrightcoveNewIE(BrightcoveNewBaseIE):
'ext': 'mp4',
'title': 'TGD_01-032_5',
'thumbnail': r're:^https?://.*\.jpg$',
'tags': [],
'timestamp': 1646078943,
'uploader_id': '1569565978001',
'upload_date': '20220228',
@@ -721,7 +717,6 @@ class BrightcoveNewIE(BrightcoveNewBaseIE):
'ext': 'mp4',
'title': 'TGD 01-087 (Airs 05.25.22)_Segment 5',
'thumbnail': r're:^https?://.*\.jpg$',
'tags': [],
'timestamp': 1651604591,
'uploader_id': '1569565978001',
'upload_date': '20220503',

@@ -0,0 +1,73 @@
from .common import InfoExtractor
from ..utils import (
bug_reports_message,
clean_html,
get_element_by_class,
js_to_json,
mimetype2ext,
strip_or_none,
url_or_none,
urljoin,
)
from ..utils.traversal import traverse_obj
class BTVPlusIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?btvplus\.bg/produkt/(?:predavaniya|seriali|novini)/(?P<id>\d+)'
_TESTS = [{
'url': 'https://btvplus.bg/produkt/predavaniya/67271/btv-reporterite/btv-reporterite-12-07-2025-g',
'info_dict': {
'ext': 'mp4',
'id': '67271',
'title': 'bTV Репортерите - 12.07.2025 г.',
'thumbnail': 'https://cdn.btv.bg/media/images/940x529/Jul2025/2113606319.jpg',
},
}, {
'url': 'https://btvplus.bg/produkt/seriali/66942/sezon-2/plen-sezon-2-epizod-55',
'info_dict': {
'ext': 'mp4',
'id': '66942',
'title': 'Плен - сезон 2, епизод 55',
'thumbnail': 'https://cdn.btv.bg/media/images/940x529/Jun2025/2113595104.jpg',
},
}, {
'url': 'https://btvplus.bg/produkt/novini/67270/btv-novinite-centralna-emisija-12-07-2025',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
player_url = self._search_regex(
r'var\s+videoUrl\s*=\s*[\'"]([^\'"]+)[\'"]',
webpage, 'player URL')
player_config = self._download_json(
urljoin('https://btvplus.bg', player_url), video_id)['config']
videojs_data = self._search_json(
r'videojs\(["\'][^"\']+["\'],', player_config, 'videojs data',
video_id, transform_source=js_to_json)
formats = []
subtitles = {}
for src in traverse_obj(videojs_data, ('sources', lambda _, v: url_or_none(v['src']))):
ext = mimetype2ext(src.get('type'))
if ext == 'm3u8':
fmts, subs = self._extract_m3u8_formats_and_subtitles(
src['src'], video_id, 'mp4', m3u8_id='hls', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
else:
self.report_warning(f'Unknown format type {ext}{bug_reports_message()}')
return {
'id': video_id,
'formats': formats,
'subtitles': subtitles,
'title': (
strip_or_none(self._og_search_title(webpage, default=None))
or clean_html(get_element_by_class('product-title', webpage))),
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'description': self._og_search_description(webpage, default=None),
}
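`BTVPlusIE` first finds a `var videoUrl = '...'` assignment in the page and resolves it against `https://btvplus.bg` before downloading the player config. The following stdlib sketch covers that first step. `player_config_url` is a hypothetical helper, and the `/lbin/...` path in the test is invented (the real path does not appear in the source).

```python
import re
from urllib.parse import urljoin

# Same pattern as the extractor's _search_regex call
VIDEO_URL_RE = re.compile(r'var\s+videoUrl\s*=\s*[\'"]([^\'"]+)[\'"]')


def player_config_url(webpage, base='https://btvplus.bg'):
    """Hypothetical helper: absolute URL of the player-config JSON referenced by the page."""
    match = VIDEO_URL_RE.search(webpage)
    if not match:
        raise ValueError('videoUrl variable not found')
    # urljoin keeps absolute URLs as-is and resolves site-relative ones against base
    return urljoin(base, match.group(1))
```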

@@ -11,7 +11,7 @@ from ..utils.traversal import traverse_obj
class CloudyCDNIE(InfoExtractor):
_VALID_URL = r'(?:https?:)?//embed\.cloudycdn\.services/(?P<site_id>[^/?#]+)/media/(?P<id>[\w-]+)'
_VALID_URL = r'(?:https?:)?//embed\.(?P<domain>cloudycdn\.services|backscreen\.com)/(?P<site_id>[^/?#]+)/media/(?P<id>[\w-]+)'
_EMBED_REGEX = [rf'<iframe[^>]+\bsrc=[\'"](?P<url>{_VALID_URL})']
_TESTS = [{
'url': 'https://embed.cloudycdn.services/ltv/media/46k_d23-6000-105?',
@@ -23,7 +23,7 @@ class CloudyCDNIE(InfoExtractor):
'duration': 1442,
'upload_date': '20231121',
'title': 'D23-6000-105_cetstud',
'thumbnail': 'https://store.cloudycdn.services/tmsp00060/assets/media/660858/placeholder1700589200.jpg',
'thumbnail': 'https://store.bstrm.net/tmsp00060/assets/media/660858/placeholder1700589200.jpg',
},
}, {
'url': 'https://embed.cloudycdn.services/izm/media/26e_lv-8-5-1',
@@ -33,7 +33,7 @@ class CloudyCDNIE(InfoExtractor):
'ext': 'mp4',
'title': 'LV-8-5-1',
'timestamp': 1669767167,
'thumbnail': 'https://store.cloudycdn.services/tmsp00120/assets/media/488306/placeholder1679423604.jpg',
'thumbnail': 'https://store.bstrm.net/tmsp00120/assets/media/488306/placeholder1679423604.jpg',
'duration': 1205,
'upload_date': '20221130',
},
@@ -48,9 +48,21 @@ class CloudyCDNIE(InfoExtractor):
'duration': 1673,
'title': 'D24-6000-074-cetstud',
'timestamp': 1718902233,
'thumbnail': 'https://store.cloudycdn.services/tmsp00060/assets/media/788392/placeholder1718903938.jpg',
'thumbnail': 'https://store.bstrm.net/tmsp00060/assets/media/788392/placeholder1718903938.jpg',
},
'params': {'format': 'bv'},
}, {
'url': 'https://embed.backscreen.com/ltv/media/32j_z25-0600-127?',
'md5': '9b6fa09ac1a4de53d4f42b94affc3b42',
'info_dict': {
'id': '32j_z25-0600-127',
'ext': 'mp4',
'title': 'Z25-0600-127-DZ',
'duration': 1906,
'thumbnail': 'https://store.bstrm.net/tmsp00060/assets/media/977427/placeholder1746633646.jpg',
'timestamp': 1746632402,
'upload_date': '20250507',
},
}]
_WEBPAGE_TESTS = [{
'url': 'https://www.tavaklase.lv/video/es-esmu-mina-um-2/',
@@ -60,17 +72,17 @@ class CloudyCDNIE(InfoExtractor):
'ext': 'mp4',
'upload_date': '20230223',
'duration': 629,
'thumbnail': 'https://store.cloudycdn.services/tmsp00120/assets/media/518407/placeholder1678748124.jpg',
'thumbnail': 'https://store.bstrm.net/tmsp00120/assets/media/518407/placeholder1678748124.jpg',
'timestamp': 1677181513,
'title': 'LIB-2',
},
}]
def _real_extract(self, url):
site_id, video_id = self._match_valid_url(url).group('site_id', 'id')
domain, site_id, video_id = self._match_valid_url(url).group('domain', 'site_id', 'id')
data = self._download_json(
f'https://player.cloudycdn.services/player/{site_id}/media/{video_id}/',
f'https://player.{domain}/player/{site_id}/media/{video_id}/',
video_id, data=urlencode_postdata({
'version': '6.4.0',
'referer': url,
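With the new `domain` group, the player API host follows whichever embed domain matched, so `embed.backscreen.com` URLs query `player.backscreen.com`. The standalone sketch below uses the regex from the hunk; `player_api_url` is a hypothetical name.

```python
import re

# _VALID_URL from the updated CloudyCDNIE
VALID_URL = re.compile(
    r'(?:https?:)?//embed\.(?P<domain>cloudycdn\.services|backscreen\.com)'
    r'/(?P<site_id>[^/?#]+)/media/(?P<id>[\w-]+)')


def player_api_url(embed_url):
    """Hypothetical helper: build the player API endpoint for an embed URL."""
    match = VALID_URL.match(embed_url)
    if not match:
        raise ValueError(f'unsupported URL: {embed_url}')
    domain, site_id, video_id = match.group('domain', 'site_id', 'id')
    return f'https://player.{domain}/player/{site_id}/media/{video_id}/'
```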

@@ -1,5 +1,6 @@
import base64
import collections
import contextlib
import functools
import getpass
import http.client
@@ -101,6 +102,7 @@ from ..utils import (
xpath_with_ns,
)
from ..utils._utils import _request_dump_filename
from ..utils.jslib import devalue
class InfoExtractor:
@@ -262,6 +264,9 @@ class InfoExtractor:
* http_chunk_size Chunk size for HTTP downloads
* ffmpeg_args Extra arguments for ffmpeg downloader (input)
* ffmpeg_args_out Extra arguments for ffmpeg downloader (output)
* ws (NiconicoLiveFD only) WebSocketResponse
* ws_url (NiconicoLiveFD only) Websockets URL
* max_quality (NiconicoLiveFD only) Max stream quality string
* is_dash_periods Whether the format is a result of merging
multiple DASH periods.
RTMP formats can also have the additional fields: page_url,
@@ -1778,6 +1783,59 @@ class InfoExtractor:
r'<script[^>]+id=[\'"]__NEXT_DATA__[\'"][^>]*>', webpage, 'next.js data',
video_id, end_pattern='</script>', fatal=fatal, default=default, **kw)
def _search_nextjs_v13_data(self, webpage, video_id, fatal=True):
"""Parses Next.js app router flight data that was introduced in Next.js v13"""
nextjs_data = {}
if not fatal and not isinstance(webpage, str):
return nextjs_data
def flatten(flight_data):
if not isinstance(flight_data, list):
return
if len(flight_data) == 4 and flight_data[0] == '$':
_, name, _, data = flight_data
if not isinstance(data, dict):
return
children = data.pop('children', None)
if data and isinstance(name, str) and re.fullmatch(r'\$L[0-9a-f]+', name):
# It is useful hydration JSON data
nextjs_data[name[2:]] = data
flatten(children)
return
for f in flight_data:
flatten(f)
flight_text = ''
# The pattern for the surrounding JS/tag should be strict as it's a hardcoded string in the next.js source
# Ref: https://github.com/vercel/next.js/blob/5a4a08fdc/packages/next/src/server/app-render/use-flight-response.tsx#L189
for flight_segment in re.findall(r'<script\b[^>]*>self\.__next_f\.push\((\[.+?\])\)</script>', webpage):
segment = self._parse_json(flight_segment, video_id, fatal=fatal, errnote=None if fatal else False)
# Some earlier versions of next.js "optimized" away this array structure; this is unsupported
# Ref: https://github.com/vercel/next.js/commit/0123a9d5c9a9a77a86f135b7ae30b46ca986d761
if not isinstance(segment, list) or len(segment) != 2:
self.write_debug(
f'{video_id}: Unsupported next.js flight data structure detected', only_once=True)
continue
# Only use the relevant payload type (1 == data)
# Ref: https://github.com/vercel/next.js/blob/5a4a08fdc/packages/next/src/server/app-render/use-flight-response.tsx#L11-L14
payload_type, chunk = segment
if payload_type == 1:
flight_text += chunk
for f in flight_text.splitlines():
prefix, _, body = f.lstrip().partition(':')
if not re.fullmatch(r'[0-9a-f]+', prefix):
continue
# The body still isn't guaranteed to be valid JSON, so parsing should always be non-fatal
if body.startswith('[') and body.endswith(']'):
flatten(self._parse_json(body, video_id, fatal=False, errnote=False))
elif body.startswith('{') and body.endswith('}'):
data = self._parse_json(body, video_id, fatal=False, errnote=False)
if data is not None:
nextjs_data[prefix] = data
return nextjs_data
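`_search_nextjs_v13_data` works in two passes: it concatenates the string chunks of `self.__next_f.push([1, "..."])` script tags, then splits the result into `hex_id:body` rows. The reduced stdlib sketch below reproduces those two passes but keeps only JSON-object rows; the recursive flattening of `["$", name, _, props]` array rows is deliberately left out. The helper names are hypothetical and the sample page is synthetic.

```python
import json
import re

# Same script-tag pattern as _search_nextjs_v13_data
PUSH_RE = re.compile(r'<script\b[^>]*>self\.__next_f\.push\((\[.+?\])\)</script>')


def collect_flight_text(webpage):
    """Hypothetical helper: concatenate the data chunks of [1, "..."] flight segments."""
    text = ''
    for raw in PUSH_RE.findall(webpage):
        try:
            segment = json.loads(raw)
        except json.JSONDecodeError:
            continue
        # Payload type 1 carries data; other segment shapes are skipped
        if (isinstance(segment, list) and len(segment) == 2
                and segment[0] == 1 and isinstance(segment[1], str)):
            text += segment[1]
    return text


def parse_flight_objects(flight_text):
    """Hypothetical helper: map hex row IDs to JSON-object bodies (array rows ignored)."""
    rows = {}
    for line in flight_text.splitlines():
        prefix, _, body = line.lstrip().partition(':')
        if not re.fullmatch(r'[0-9a-f]+', prefix):
            continue
        if body.startswith('{') and body.endswith('}'):
            try:
                rows[prefix] = json.loads(body)
            except json.JSONDecodeError:
                pass  # bodies are not guaranteed to be valid JSON
    return rows
```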
def _search_nuxt_data(self, webpage, video_id, context_name='__NUXT__', *, fatal=True, traverse=('data', 0)):
"""Parses Nuxt.js metadata. This works as long as the function __NUXT__ invokes is a pure function"""
rectx = re.escape(context_name)
@@ -1795,6 +1853,63 @@ class InfoExtractor:
ret = self._parse_json(js, video_id, transform_source=functools.partial(js_to_json, vars=args), fatal=fatal)
return traverse_obj(ret, traverse) or {}
def _resolve_nuxt_array(self, array, video_id, *, fatal=True, default=NO_DEFAULT):
"""Resolves Nuxt rich JSON payload arrays"""
# Ref: https://github.com/nuxt/nuxt/commit/9e503be0f2a24f4df72a3ccab2db4d3e63511f57
# https://github.com/nuxt/nuxt/pull/19205
if default is not NO_DEFAULT:
fatal = False
if not isinstance(array, list) or not array:
error_msg = 'Unable to resolve Nuxt JSON data: invalid input'
if fatal:
raise ExtractorError(error_msg, video_id=video_id)
elif default is NO_DEFAULT:
self.report_warning(error_msg, video_id=video_id)
return {} if default is NO_DEFAULT else default
def indirect_reviver(data):
return data
def json_reviver(data):
return json.loads(data)
gen = devalue.parse_iter(array, revivers={
'NuxtError': indirect_reviver,
'EmptyShallowRef': json_reviver,
'EmptyRef': json_reviver,
'ShallowRef': indirect_reviver,
'ShallowReactive': indirect_reviver,
'Ref': indirect_reviver,
'Reactive': indirect_reviver,
})
while True:
try:
error_msg = f'Error resolving Nuxt JSON: {gen.send(None)}'
if fatal:
raise ExtractorError(error_msg, video_id=video_id)
elif default is NO_DEFAULT:
self.report_warning(error_msg, video_id=video_id, only_once=True)
else:
self.write_debug(f'{video_id}: {error_msg}', only_once=True)
except StopIteration as error:
return error.value or ({} if default is NO_DEFAULT else default)
def _search_nuxt_json(self, webpage, video_id, *, fatal=True, default=NO_DEFAULT):
"""Parses metadata from Nuxt rich JSON payloads embedded in HTML"""
passed_default = default is not NO_DEFAULT
array = self._search_json(
r'<script\b[^>]+\bid="__NUXT_DATA__"[^>]*>', webpage,
'Nuxt JSON data', video_id, contains_pattern=r'\[(?s:.+)\]',
fatal=fatal, default=NO_DEFAULT if not passed_default else None)
if not array:
return default if passed_default else {}
return self._resolve_nuxt_array(array, video_id, fatal=fatal, default=default)
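`_resolve_nuxt_array` delegates to yt-dlp's `devalue.parse_iter`, whose code is not shown here. To illustrate the payload format it consumes, here is a deliberately simplified resolver (a sketch, not the library implementation). Each array slot is a primitive, an object whose values are indices into the array, a plain array of indices, or a `[type, index]` pair for the reviver types listed in the hunk. Special indices other than `-1` (undefined) and built-in devalue types such as `Date`, `Map` or `Set` are not handled. `resolve_nuxt_array` is a hypothetical name.

```python
import json

# Reviver types registered in _resolve_nuxt_array
PASSTHROUGH_TYPES = {'NuxtError', 'ShallowRef', 'ShallowReactive', 'Ref', 'Reactive'}
JSON_STRING_TYPES = {'EmptyRef', 'EmptyShallowRef'}


def resolve_nuxt_array(values):
    """Hypothetical, simplified devalue unflatten: rebuild the value stored at index 0."""
    cache = {}

    def hydrate(index):
        if index < 0:
            if index == -1:  # devalue's marker for `undefined`
                return None
            raise ValueError(f'unsupported special index {index}')
        if index in cache:
            return cache[index]
        value = values[index]
        if isinstance(value, dict):
            result = cache[index] = {}  # cache first so shared references reuse one object
            for key, child in value.items():
                result[key] = hydrate(child)
            return result
        if isinstance(value, list):
            if value and isinstance(value[0], str):  # typed value: [type_name, index]
                type_name = value[0]
                if type_name in PASSTHROUGH_TYPES:
                    result = hydrate(value[1])
                elif type_name in JSON_STRING_TYPES:
                    result = json.loads(hydrate(value[1]))
                else:
                    raise ValueError(f'unsupported devalue type {type_name!r}')
                cache[index] = result
                return result
            result = cache[index] = []
            result.extend(hydrate(child) for child in value)
            return result
        cache[index] = value  # primitives are stored inline
        return value

    return hydrate(0)
```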
@staticmethod
def _hidden_inputs(html):
html = re.sub(r'<!--(?:(?!<!--).)*-->', '', html)
@@ -2068,21 +2183,33 @@ class InfoExtractor:
raise ExtractorError(errnote, video_id=video_id)
self.report_warning(f'{errnote}{bug_reports_message()}')
return [], {}
res = self._download_webpage_handle(
m3u8_url, video_id,
note='Downloading m3u8 information' if note is None else note,
errnote='Failed to download m3u8 information' if errnote is None else errnote,
if note is None:
note = 'Downloading m3u8 information'
if errnote is None:
errnote = 'Failed to download m3u8 information'
response = self._request_webpage(
m3u8_url, video_id, note=note, errnote=errnote,
fatal=fatal, data=data, headers=headers, query=query)
if res is False:
if response is False:
return [], {}
m3u8_doc, urlh = res
m3u8_url = urlh.url
with contextlib.closing(response):
prefix = response.read(512)
if not prefix.startswith(b'#EXTM3U'):
msg = 'Response data has no m3u header'
if fatal:
raise ExtractorError(msg, video_id=video_id)
self.report_warning(f'{msg}{bug_reports_message()}', video_id=video_id)
return [], {}
content = self._webpage_read_content(
response, m3u8_url, video_id, note=note, errnote=errnote,
fatal=fatal, prefix=prefix, data=data)
if content is False:
return [], {}
return self._parse_m3u8_formats_and_subtitles(
m3u8_doc, m3u8_url, ext=ext, entry_protocol=entry_protocol,
content, response.url, ext=ext, entry_protocol=entry_protocol,
preference=preference, quality=quality, m3u8_id=m3u8_id,
note=note, errnote=errnote, fatal=fatal, live=live, data=data,
headers=headers, query=query, video_id=video_id)
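The reworked `_extract_m3u8_formats_and_subtitles` reads the first 512 bytes of the response and gives up unless they begin with `#EXTM3U`; otherwise it reads the rest with that prefix re-attached. The same check can be sketched against any binary file-like object. `read_m3u8_text` is a hypothetical helper, and UTF-8 decoding is an assumption (the extractor leaves decoding to `_webpage_read_content`).

```python
import io


def read_m3u8_text(stream, prefix_size=512):
    """Hypothetical helper: playlist text, or None if the body lacks the #EXTM3U header."""
    prefix = stream.read(prefix_size)
    if not prefix.startswith(b'#EXTM3U'):
        # e.g. an HTML error page served with a 200 status
        return None
    # Re-attach the already-consumed prefix before decoding the remainder
    return (prefix + stream.read()).decode('utf-8')


sample = io.BytesIO(b'#EXTM3U\n#EXT-X-VERSION:3\n')
playlist_text = read_m3u8_text(sample)
```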

@@ -1,49 +0,0 @@
from .common import InfoExtractor
class CTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ctv\.ca/(?P<id>(?:show|movie)s/[^/]+/[^/?#&]+)'
_TESTS = [{
'url': 'https://www.ctv.ca/shows/your-morning/wednesday-december-23-2020-s5e88',
'info_dict': {
'id': '2102249',
'ext': 'flv',
'title': 'Wednesday, December 23, 2020',
'thumbnail': r're:^https?://.*\.jpg$',
'description': 'Your Morning delivers original perspectives and unique insights into the headlines of the day.',
'timestamp': 1608732000,
'upload_date': '20201223',
'series': 'Your Morning',
'season': '2020-2021',
'season_number': 5,
'episode_number': 88,
'tags': ['Your Morning'],
'categories': ['Talk Show'],
'duration': 7467.126,
},
}, {
'url': 'https://www.ctv.ca/movies/adam-sandlers-eight-crazy-nights/adam-sandlers-eight-crazy-nights',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
content = self._download_json(
'https://www.ctv.ca/space-graphql/graphql', display_id, query={
'query': '''{
resolvedPath(path: "/%s") {
lastSegment {
content {
... on AxisContent {
axisId
videoPlayerDestCode
}
}
}
}
}''' % display_id, # noqa: UP031
})['data']['resolvedPath']['lastSegment']['content']
video_id = content['axisId']
return self.url_result(
'9c9media:{}:{}'.format(content['videoPlayerDestCode'], video_id),
'NineCNineMedia', video_id)

@@ -11,8 +11,14 @@ from ..utils.traversal import traverse_obj
class DangalPlayBaseIE(InfoExtractor):
_NETRC_MACHINE = 'dangalplay'
_REGION = 'IN'
_OTV_USER_ID = None
_LOGIN_HINT = 'Pass credentials as -u "token" -p "USER_ID" where USER_ID is the `otv_user_id` in browser local storage'
_LOGIN_HINT = (
'Pass credentials as -u "token" -p "USER_ID" '
'(where USER_ID is the value of "otv_user_id" in your browser local storage). '
'Your login region can be optionally suffixed to the username as @REGION '
'(where REGION is the two-letter "region" code found in your browser local storage), '
'e.g.: -u "token@IN" -p "USER_ID"')
_API_BASE = 'https://ottapi.dangalplay.com'
_AUTH_TOKEN = 'jqeGWxRKK7FK5zEk3xCM' # from https://www.dangalplay.com/main.48ad19e24eb46acccef3.js
_SECRET_KEY = 'f53d31a4377e4ef31fa0' # same as above
@@ -20,8 +26,12 @@ class DangalPlayBaseIE(InfoExtractor):
def _perform_login(self, username, password):
if self._OTV_USER_ID:
return
if username != 'token' or not re.fullmatch(r'[\da-f]{32}', password):
mobj = re.fullmatch(r'token(?:@(?P<region>[A-Z]{2}))?', username)
if not mobj or not re.fullmatch(r'[\da-f]{32}', password):
raise ExtractorError(self._LOGIN_HINT, expected=True)
if region := mobj.group('region'):
self._REGION = region
self.write_debug(f'Setting login region to "{self._REGION}"')
self._OTV_USER_ID = password
def _real_initialize(self):
@@ -52,7 +62,7 @@ class DangalPlayBaseIE(InfoExtractor):
f'{self._API_BASE}/{path}', display_id, note, fatal=fatal,
headers={'Accept': 'application/json'}, query={
'auth_token': self._AUTH_TOKEN,
'region': 'IN',
'region': self._REGION,
**query,
})
@@ -106,7 +116,7 @@ class DangalPlayIE(DangalPlayBaseIE):
'catalog_id': catalog_id,
'content_id': content_id,
'category': '',
'region': 'IN',
'region': self._REGION,
'auth_token': self._AUTH_TOKEN,
'id': self._OTV_USER_ID,
'md5': hashlib.md5(unhashed.encode()).hexdigest(),
@@ -129,11 +139,14 @@ class DangalPlayIE(DangalPlayBaseIE):
except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 422:
error_info = traverse_obj(e.cause.response.read().decode(), ({json.loads}, 'error', {dict})) or {}
if error_info.get('code') == '1016':
error_code = error_info.get('code')
if error_code == '1016':
self.raise_login_required(
f'Your token has expired or is invalid. {self._LOGIN_HINT}', method=None)
elif msg := error_info.get('message'):
raise ExtractorError(msg)
elif error_code == '4028':
self.raise_login_required(
f'Your login region is unspecified or incorrect. {self._LOGIN_HINT}', method=None)
raise ExtractorError(join_nonempty(error_code, error_info.get('message'), delim=': '))
raise
m3u8_url = traverse_obj(details, (

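With this change, the DangalPlay username may carry an optional `@REGION` suffix. Its value replaces the previously hardcoded `'IN'` region in API queries. You can try the credential check in isolation with the same two regular expressions that `_perform_login` uses. `parse_credentials` is a hypothetical helper, not part of yt-dlp, and it raises `ValueError` where the extractor raises `ExtractorError`:

```python
import re

# Same patterns as DangalPlayBaseIE._perform_login
_USERNAME_RE = re.compile(r'token(?:@(?P<region>[A-Z]{2}))?')
_USER_ID_RE = re.compile(r'[\da-f]{32}')


def parse_credentials(username, password, default_region='IN'):
    """Return (region, otv_user_id) or raise ValueError on malformed input."""
    mobj = _USERNAME_RE.fullmatch(username)
    if not mobj or not _USER_ID_RE.fullmatch(password):
        raise ValueError('expected -u "token[@REGION]" -p "<32 lowercase hex chars>"')
    # An omitted @REGION suffix keeps the default region
    return mobj.group('region') or default_region, password
```

A lowercase suffix such as `token@in` is rejected, because the pattern only accepts `[A-Z]{2}`.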

@@ -17,8 +17,140 @@ from ..utils import (
from ..utils.traversal import traverse_obj
class FloatplaneIE(InfoExtractor):
class FloatplaneBaseIE(InfoExtractor):
def _real_extract(self, url):
post_id = self._match_id(url)
post_data = self._download_json(
f'{self._BASE_URL}/api/v3/content/post', post_id, query={'id': post_id},
note='Downloading post data', errnote='Unable to download post data',
impersonate=self._IMPERSONATE_TARGET)
if not any(traverse_obj(post_data, ('metadata', ('hasVideo', 'hasAudio')))):
raise ExtractorError('Post does not contain a video or audio track', expected=True)
uploader_url = format_field(
post_data, [('creator', 'urlname')], f'{self._BASE_URL}/channel/%s/home') or None
common_info = {
'uploader_url': uploader_url,
'channel_url': urljoin(f'{uploader_url}/', traverse_obj(post_data, ('channel', 'urlname'))),
'availability': self._availability(needs_subscription=True),
**traverse_obj(post_data, {
'uploader': ('creator', 'title', {str}),
'uploader_id': ('creator', 'id', {str}),
'channel': ('channel', 'title', {str}),
'channel_id': ('channel', 'id', {str}),
'release_timestamp': ('releaseDate', {parse_iso8601}),
}),
}
items = []
for media in traverse_obj(post_data, (('videoAttachments', 'audioAttachments'), ...)):
media_id = media['id']
media_typ = media.get('type') or 'video'
metadata = self._download_json(
f'{self._BASE_URL}/api/v3/content/{media_typ}', media_id, query={'id': media_id},
note=f'Downloading {media_typ} metadata', impersonate=self._IMPERSONATE_TARGET)
stream = self._download_json(
f'{self._BASE_URL}/api/v2/cdn/delivery', media_id, query={
'type': 'vod' if media_typ == 'video' else 'aod',
'guid': metadata['guid'],
}, note=f'Downloading {media_typ} stream data',
impersonate=self._IMPERSONATE_TARGET)
path_template = traverse_obj(stream, ('resource', 'uri', {str}))
def format_path(params):
path = path_template
for i, val in (params or {}).items():
path = path.replace(f'{{qualityLevelParams.{i}}}', val)
return path
formats = []
for quality in traverse_obj(stream, ('resource', 'data', 'qualityLevels', ...)):
url = urljoin(stream['cdn'], format_path(traverse_obj(
stream, ('resource', 'data', 'qualityLevelParams', quality['name'], {dict}))))
format_id = traverse_obj(quality, ('name', {str}))
hls_aes = {}
m3u8_data = None
# If we need impersonation for the API, then we need it for HLS keys too: extract in advance
if self._IMPERSONATE_TARGET is not None:
m3u8_data = self._download_webpage(
url, media_id, fatal=False, impersonate=self._IMPERSONATE_TARGET, headers=self._HEADERS,
note=join_nonempty('Downloading', format_id, 'm3u8 information', delim=' '),
errnote=join_nonempty('Failed to download', format_id, 'm3u8 information', delim=' '))
if not m3u8_data:
continue
key_url = self._search_regex(
r'#EXT-X-KEY:METHOD=AES-128,URI="(https?://[^"]+)"',
m3u8_data, 'HLS AES key URI', default=None)
if key_url:
urlh = self._request_webpage(
key_url, media_id, fatal=False, impersonate=self._IMPERSONATE_TARGET, headers=self._HEADERS,
note=join_nonempty('Downloading', format_id, 'HLS AES key', delim=' '),
errnote=join_nonempty('Failed to download', format_id, 'HLS AES key', delim=' '))
if urlh:
hls_aes['key'] = urlh.read().hex()
formats.append({
**traverse_obj(quality, {
'format_note': ('label', {str}),
'width': ('width', {int}),
'height': ('height', {int}),
}),
**parse_codecs(quality.get('codecs')),
'url': url,
'ext': determine_ext(url.partition('/chunk.m3u8')[0], 'mp4'),
'format_id': format_id,
'hls_media_playlist_data': m3u8_data,
'hls_aes': hls_aes or None,
})
items.append({
**common_info,
'id': media_id,
**traverse_obj(metadata, {
'title': ('title', {str}),
'duration': ('duration', {int_or_none}),
'thumbnail': ('thumbnail', 'path', {url_or_none}),
}),
'formats': formats,
})
post_info = {
**common_info,
'id': post_id,
'display_id': post_id,
**traverse_obj(post_data, {
'title': ('title', {str}),
'description': ('text', {clean_html}),
'like_count': ('likes', {int_or_none}),
'dislike_count': ('dislikes', {int_or_none}),
'comment_count': ('comments', {int_or_none}),
'thumbnail': ('thumbnail', 'path', {url_or_none}),
}),
'http_headers': self._HEADERS,
}
if len(items) > 1:
return self.playlist_result(items, **post_info)
post_info.update(items[0])
return post_info
class FloatplaneIE(FloatplaneBaseIE):
_VALID_URL = r'https?://(?:(?:www|beta)\.)?floatplane\.com/post/(?P<id>\w+)'
_BASE_URL = 'https://www.floatplane.com'
_IMPERSONATE_TARGET = None
_HEADERS = {
'Origin': _BASE_URL,
'Referer': f'{_BASE_URL}/',
}
_TESTS = [{
'url': 'https://www.floatplane.com/post/2Yf3UedF7C',
'info_dict': {
@@ -170,105 +302,9 @@ class FloatplaneIE(InfoExtractor):
}]
def _real_initialize(self):
if not self._get_cookies('https://www.floatplane.com').get('sails.sid'):
if not self._get_cookies(self._BASE_URL).get('sails.sid'):
self.raise_login_required()
def _real_extract(self, url):
post_id = self._match_id(url)
post_data = self._download_json(
'https://www.floatplane.com/api/v3/content/post', post_id, query={'id': post_id},
note='Downloading post data', errnote='Unable to download post data')
if not any(traverse_obj(post_data, ('metadata', ('hasVideo', 'hasAudio')))):
raise ExtractorError('Post does not contain a video or audio track', expected=True)
uploader_url = format_field(
post_data, [('creator', 'urlname')], 'https://www.floatplane.com/channel/%s/home') or None
common_info = {
'uploader_url': uploader_url,
'channel_url': urljoin(f'{uploader_url}/', traverse_obj(post_data, ('channel', 'urlname'))),
'availability': self._availability(needs_subscription=True),
**traverse_obj(post_data, {
'uploader': ('creator', 'title', {str}),
'uploader_id': ('creator', 'id', {str}),
'channel': ('channel', 'title', {str}),
'channel_id': ('channel', 'id', {str}),
'release_timestamp': ('releaseDate', {parse_iso8601}),
}),
}
items = []
for media in traverse_obj(post_data, (('videoAttachments', 'audioAttachments'), ...)):
media_id = media['id']
media_typ = media.get('type') or 'video'
metadata = self._download_json(
f'https://www.floatplane.com/api/v3/content/{media_typ}', media_id, query={'id': media_id},
note=f'Downloading {media_typ} metadata')
stream = self._download_json(
'https://www.floatplane.com/api/v2/cdn/delivery', media_id, query={
'type': 'vod' if media_typ == 'video' else 'aod',
'guid': metadata['guid'],
}, note=f'Downloading {media_typ} stream data')
path_template = traverse_obj(stream, ('resource', 'uri', {str}))
def format_path(params):
path = path_template
for i, val in (params or {}).items():
path = path.replace(f'{{qualityLevelParams.{i}}}', val)
return path
formats = []
for quality in traverse_obj(stream, ('resource', 'data', 'qualityLevels', ...)):
url = urljoin(stream['cdn'], format_path(traverse_obj(
stream, ('resource', 'data', 'qualityLevelParams', quality['name'], {dict}))))
formats.append({
**traverse_obj(quality, {
'format_id': ('name', {str}),
'format_note': ('label', {str}),
'width': ('width', {int}),
'height': ('height', {int}),
}),
**parse_codecs(quality.get('codecs')),
'url': url,
'ext': determine_ext(url.partition('/chunk.m3u8')[0], 'mp4'),
})
items.append({
**common_info,
'id': media_id,
**traverse_obj(metadata, {
'title': ('title', {str}),
'duration': ('duration', {int_or_none}),
'thumbnail': ('thumbnail', 'path', {url_or_none}),
}),
'formats': formats,
})
post_info = {
**common_info,
'id': post_id,
'display_id': post_id,
**traverse_obj(post_data, {
'title': ('title', {str}),
'description': ('text', {clean_html}),
'like_count': ('likes', {int_or_none}),
'dislike_count': ('dislikes', {int_or_none}),
'comment_count': ('comments', {int_or_none}),
'thumbnail': ('thumbnail', 'path', {url_or_none}),
}),
}
if len(items) > 1:
return self.playlist_result(items, **post_info)
post_info.update(items[0])
return post_info
class FloatplaneChannelIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www|beta)\.)?floatplane\.com/channel/(?P<id>[\w-]+)/home(?:/(?P<channel>[\w-]+))?'

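The refactored `FloatplaneBaseIE` builds each format URL by filling the `{qualityLevelParams.<name>}` placeholders in the delivery API's URI template. When impersonation is required, it also pre-fetches the AES-128 key named in the playlist's `#EXT-X-KEY` tag and stores the key bytes as hex in `hls_aes['key']`. The sketch below reproduces both string operations on their own. The template, CDN host and key URL are made up for illustration, because the real API response shape is only known from the code above.

```python
import re
from urllib.parse import urljoin

KEY_URI_RE = re.compile(r'#EXT-X-KEY:METHOD=AES-128,URI="(https?://[^"]+)"')


def fill_uri_template(template, params):
    # Same substitution as format_path(): each "{qualityLevelParams.<key>}"
    # placeholder is replaced by the matching value from params
    path = template
    for key, value in (params or {}).items():
        path = path.replace(f'{{qualityLevelParams.{key}}}', value)
    return path


def find_aes_key_uri(m3u8_text):
    # Return the first AES-128 key URI in a media playlist, or None
    mobj = KEY_URI_RE.search(m3u8_text)
    return mobj.group(1) if mobj else None


# Hypothetical template and parameters
template = '/Videos/abc/{qualityLevelParams.2}.mp4/chunk.m3u8?token={qualityLevelParams.4}'
url = urljoin('https://cdn.example.com', fill_uri_template(template, {'2': '1080', '4': 'xyz'}))

playlist = '#EXTM3U\n#EXT-X-KEY:METHOD=AES-128,URI="https://example.com/key?id=1"\n'
key_uri = find_aes_key_uri(playlist)
```

Here `url` becomes `https://cdn.example.com/Videos/abc/1080.mp4/chunk.m3u8?token=xyz`. The extractor splits on `/chunk.m3u8` before calling `determine_ext`, which leaves `.../1080.mp4` and gives the format an `mp4` extension.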

@@ -1,4 +1,3 @@
import json
import re
import urllib.parse
@@ -19,7 +18,11 @@ from ..utils import (
unsmuggle_url,
url_or_none,
)
from ..utils.traversal import find_element, traverse_obj
from ..utils.traversal import (
find_element,
get_first,
traverse_obj,
)
class FranceTVBaseInfoExtractor(InfoExtractor):
@@ -121,9 +124,10 @@ class FranceTVIE(InfoExtractor):
elif code := traverse_obj(dinfo, ('code', {int})):
if code == 2009:
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
elif code in (2015, 2017):
elif code in (2015, 2017, 2019):
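# 2015: L'accès à cette vidéo est impossible. ("Access to this video is impossible."; DRM-only)
# 2017: Cette vidéo n'est pas disponible depuis le site web mobile ("This video is not available from the mobile website"; b/c DRM)
# 2019: L'accès à cette vidéo est incompatible avec votre configuration. ("Access to this video is incompatible with your configuration."; DRM-only)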
drm_formats = True
continue
self.report_warning(
@@ -258,7 +262,7 @@ class FranceTVSiteIE(FranceTVBaseInfoExtractor):
_TESTS = [{
'url': 'https://www.france.tv/france-2/13h15-le-dimanche/140921-les-mysteres-de-jesus.html',
'info_dict': {
'id': 'ec217ecc-0733-48cf-ac06-af1347b849d1', # old: c5bda21d-2c6f-4470-8849-3d8327adb2ba'
'id': 'b2cf9fd8-e971-4757-8651-848f2772df61', # old: ec217ecc-0733-48cf-ac06-af1347b849d1
'ext': 'mp4',
'title': '13h15, le dimanche... - Les mystères de Jésus',
'timestamp': 1502623500,
@@ -269,7 +273,7 @@ class FranceTVSiteIE(FranceTVBaseInfoExtractor):
'params': {
'skip_download': True,
},
'add_ie': [FranceTVIE.ie_key()],
'skip': 'Unfortunately, this video is no longer available',
}, {
# geo-restricted
'url': 'https://www.france.tv/enfants/six-huit-ans/foot2rue/saison-1/3066387-duel-au-vieux-port.html',
@@ -287,7 +291,7 @@ class FranceTVSiteIE(FranceTVBaseInfoExtractor):
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 1441,
},
'skip': 'No longer available',
'skip': 'Unfortunately, this video is no longer available',
}, {
# geo-restricted livestream (workflow == 'token-akamai')
'url': 'https://www.france.tv/france-4/direct.html',
@@ -308,6 +312,19 @@ class FranceTVSiteIE(FranceTVBaseInfoExtractor):
'live_status': 'is_live',
},
'params': {'skip_download': 'livestream'},
}, {
# Not geo-restricted
'url': 'https://www.france.tv/france-2/la-maison-des-maternelles/5574051-nous-sommes-amis-et-nous-avons-fait-un-enfant-ensemble.html',
'info_dict': {
'id': 'b448bfe4-9fe7-11ee-97d8-2ba3426fa3df',
'ext': 'mp4',
'title': 'Nous sommes amis et nous avons fait un enfant ensemble - Émission du jeudi 21 décembre 2023',
'duration': 1065,
'thumbnail': r're:https?://.+/.+\.jpg',
'timestamp': 1703147921,
'upload_date': '20231221',
},
'params': {'skip_download': 'm3u8'},
}, {
# france3
'url': 'https://www.france.tv/france-3/des-chiffres-et-des-lettres/139063-emission-du-mardi-9-mai-2017.html',
@@ -342,30 +359,16 @@ class FranceTVSiteIE(FranceTVBaseInfoExtractor):
'only_matching': True,
}]
# XXX: For parsing next.js v15+ data; see also yt_dlp.extractor.goplay
def _find_json(self, s):
return self._search_json(
r'\w+\s*:\s*', s, 'next js data', None, contains_pattern=r'\[(?s:.+)\]', default=None)
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
nextjs_data = self._search_nextjs_v13_data(webpage, display_id)
nextjs_data = traverse_obj(
re.findall(r'<script[^>]*>\s*self\.__next_f\.push\(\s*(\[.+?\])\s*\);?\s*</script>', webpage),
(..., {json.loads}, ..., {self._find_json}, ..., 'children', ..., ..., 'children', ..., ..., 'children'))
if traverse_obj(nextjs_data, (..., ..., 'children', ..., 'isLive', {bool}, any)):
if get_first(nextjs_data, ('isLive', {bool})):
# For livestreams we need the id of the stream instead of the currently airing episode id
video_id = traverse_obj(nextjs_data, (
..., ..., 'children', ..., 'children', ..., 'children', ..., 'children', ..., ...,
'children', ..., ..., 'children', ..., ..., 'children', (..., (..., ...)),
'options', 'id', {str}, any))
video_id = get_first(nextjs_data, ('options', 'id', {str}))
else:
video_id = traverse_obj(nextjs_data, (
..., ..., ..., 'children',
lambda _, v: v['video']['url'] == urllib.parse.urlparse(url).path,
'video', ('playerReplayId', 'siId'), {str}, any))
video_id = get_first(nextjs_data, ('video', ('playerReplayId', 'siId'), {str}))
if not video_id:
raise ExtractorError('Unable to extract video ID')

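FranceTV and GoPlay now both delegate to `_search_nextjs_v13_data` instead of hand-walking the flight data that Next.js 13+ pages push through `self.__next_f.push(...)` script tags. Both also use `get_first` to take the first match found anywhere in the result. The sketch below is a simplified, stdlib-only model of that idea and not yt-dlp's implementation. It assumes every payload line has the form `<id>:<json array>`, and it ignores other line types and payloads split across several `push` calls. The helper names are hypothetical.

```python
import json
import re

# Matches inline <script> tags that feed the Next.js flight stream
PUSH_RE = re.compile(r'<script[^>]*>\s*self\.__next_f\.push\(\s*(\[.+?\])\s*\);?\s*</script>')


def iter_flight_arrays(webpage):
    # Each push() argument is a JSON array such as [1, "<payload>"]; payload
    # lines look like "<id>:<json>", and only JSON-array values are decoded
    for raw in PUSH_RE.findall(webpage):
        try:
            chunks = json.loads(raw)
        except json.JSONDecodeError:
            continue
        for chunk in chunks:
            if not isinstance(chunk, str):
                continue
            for line in chunk.splitlines():
                _, sep, value = line.partition(':')
                if not sep or not value.startswith('['):
                    continue
                try:
                    yield json.loads(value)
                except json.JSONDecodeError:
                    continue


def find_first(obj, key):
    # Depth-first search for the first non-None value stored under `key`
    if isinstance(obj, dict):
        if obj.get(key) is not None:
            return obj[key]
        children = list(obj.values())
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_first(child, key)
        if found is not None:
            return found
    return None
```

This mirrors the new livestream branch, which reads `isLive` and then `options.id` without depending on how deeply the page nests them.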

@@ -1481,30 +1481,6 @@ class GenericIE(InfoExtractor):
},
'add_ie': ['SenateISVP'],
},
{
# Limelight embeds (1 channel embed + 4 media embeds)
'url': 'http://www.sedona.com/FacilitatorTraining2017',
'info_dict': {
'id': 'FacilitatorTraining2017',
'title': 'Facilitator Training 2017',
},
'playlist_mincount': 5,
},
{
# Limelight embed (LimelightPlayerUtil.embed)
'url': 'https://tv5.ca/videos?v=xuu8qowr291ri',
'info_dict': {
'id': '95d035dc5c8a401588e9c0e6bd1e9c92',
'ext': 'mp4',
'title': '07448641',
'timestamp': 1499890639,
'upload_date': '20170712',
},
'params': {
'skip_download': True,
},
'add_ie': ['LimelightMedia'],
},
{
'url': 'http://kron4.com/2017/04/28/standoff-with-walnut-creek-murder-suspect-ends-with-arrest/',
'info_dict': {

@@ -5,16 +5,11 @@ import hashlib
import hmac
import json
import os
import re
import urllib.parse
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
remove_end,
traverse_obj,
)
from ..utils import ExtractorError, int_or_none
from ..utils.traversal import get_first, traverse_obj
class GoPlayIE(InfoExtractor):
@@ -27,10 +22,10 @@ class GoPlayIE(InfoExtractor):
'info_dict': {
'id': '2baa4560-87a0-421b-bffc-359914e3c387',
'ext': 'mp4',
'title': 'S22 - Aflevering 1',
'title': 'De Slimste Mens ter Wereld - S22 - Aflevering 1',
'description': r're:In aflevering 1 nemen Daan Alferink, Tess Elst en Xander De Rycke .{66}',
'series': 'De Slimste Mens ter Wereld',
'episode': 'Episode 1',
'episode': 'Wordt aangekondigd',
'season_number': 22,
'episode_number': 1,
'season': 'Season 22',
@@ -52,7 +47,7 @@ class GoPlayIE(InfoExtractor):
'info_dict': {
'id': 'ecb79672-92b9-4cd9-a0d7-e2f0250681ee',
'ext': 'mp4',
'title': 'S11 - Aflevering 1',
'title': 'De Mol - S11 - Aflevering 1',
'description': r're:Tien kandidaten beginnen aan hun verovering van Amerika en ontmoeten .{102}',
'episode': 'Episode 1',
'series': 'De Mol',
@@ -75,21 +70,13 @@ class GoPlayIE(InfoExtractor):
if not self._id_token:
raise self.raise_login_required(method='password')
# XXX: For parsing next.js v15+ data; see also yt_dlp.extractor.francetv
def _find_json(self, s):
return self._search_json(
r'\w+\s*:\s*', s, 'next js data', None, contains_pattern=r'\[(?s:.+)\]', default=None)
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
nextjs_data = traverse_obj(
re.findall(r'<script[^>]*>\s*self\.__next_f\.push\(\s*(\[.+?\])\s*\);?\s*</script>', webpage),
(..., {json.loads}, ..., {self._find_json}, ...))
meta = traverse_obj(nextjs_data, (
..., ..., 'children', ..., ..., 'children',
lambda _, v: v['video']['path'] == urllib.parse.urlparse(url).path, 'video', any))
nextjs_data = self._search_nextjs_v13_data(webpage, display_id)
meta = get_first(nextjs_data, (
lambda k, v: k in ('video', 'meta') and v['path'] == urllib.parse.urlparse(url).path))
video_id = meta['uuid']
info_dict = traverse_obj(meta, {
@@ -98,19 +85,18 @@ class GoPlayIE(InfoExtractor):
})
if traverse_obj(meta, ('program', 'subtype')) != 'movie':
for season_data in traverse_obj(nextjs_data, (..., 'children', ..., 'playlists', ...)):
episode_data = traverse_obj(
season_data, ('videos', lambda _, v: v['videoId'] == video_id, any))
for season_data in traverse_obj(nextjs_data, (..., 'playlists', ..., {dict})):
episode_data = traverse_obj(season_data, ('videos', lambda _, v: v['videoId'] == video_id, any))
if not episode_data:
continue
episode_title = traverse_obj(
episode_data, 'contextualTitle', 'episodeTitle', expected_type=str)
season_number = traverse_obj(season_data, ('season', {int_or_none}))
info_dict.update({
'title': episode_title or info_dict.get('title'),
'series': remove_end(info_dict.get('title'), f' - {episode_title}'),
'season_number': traverse_obj(season_data, ('season', {int_or_none})),
'episode': traverse_obj(episode_data, ('episodeTitle', {str})),
'episode_number': traverse_obj(episode_data, ('episodeNumber', {int_or_none})),
'season_number': season_number,
'series': self._search_regex(
fr'^(.+)? - S{season_number} - ', info_dict.get('title'), 'series', default=None),
})
break

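GoPlay titles now include the series name, as in `De Mol - S11 - Aflevering 1`. The series is therefore recovered with the pattern `^(.+)? - S{season_number} - ` instead of stripping the episode title with `remove_end`. A minimal sketch of that extraction, using a hypothetical helper name:

```python
import re


def series_from_title(title, season_number):
    # Everything before " - S<n> - " is treated as the series name
    if not title or season_number is None:
        return None
    mobj = re.search(rf'^(.+)? - S{season_number} - ', title)
    return mobj.group(1) if mobj else None
```

The season number is part of the pattern, so a title that does not mention that exact season yields `None` instead of a wrong series name.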

@@ -1,3 +1,4 @@
import functools
import hashlib
import hmac
import json
@@ -9,77 +10,125 @@ from .common import InfoExtractor
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
OnDemandPagedList,
determine_ext,
filter_dict,
int_or_none,
join_nonempty,
jwt_decode_hs256,
parse_iso8601,
str_or_none,
traverse_obj,
url_or_none,
)
from ..utils.traversal import require, traverse_obj
class HotStarBaseIE(InfoExtractor):
_TOKEN_NAME = 'userUP'
_BASE_URL = 'https://www.hotstar.com'
_API_URL = 'https://api.hotstar.com'
_API_URL_V2 = 'https://apix.hotstar.com/v2'
_AKAMAI_ENCRYPTION_KEY = b'\x05\xfc\x1a\x01\xca\xc9\x4b\xc4\x12\xfc\x53\x12\x07\x75\xf9\xee'
_FREE_HEADERS = {
'user-agent': 'Hotstar;in.startv.hotstar/25.06.30.0.11580 (Android/12)',
'x-hs-client': 'platform:android;app_id:in.startv.hotstar;app_version:25.06.30.0;os:Android;os_version:12;schema_version:0.0.1523',
'x-hs-platform': 'android',
}
_SUB_HEADERS = {
'user-agent': 'Disney+;in.startv.hotstar.dplus.tv/23.08.14.4.2915 (Android/13)',
'x-hs-client': 'platform:androidtv;app_id:in.startv.hotstar.dplus.tv;app_version:23.08.14.4;os:Android;os_version:13;schema_version:0.0.970',
'x-hs-platform': 'androidtv',
}
def _has_active_subscription(self, cookies, server_time):
expiry = traverse_obj(cookies, (
self._TOKEN_NAME, 'value', {jwt_decode_hs256}, 'sub', {json.loads},
'subscriptions', 'in', ..., 'expiry', {parse_iso8601}, all, {max})) or 0
return expiry > server_time
def _call_api_v1(self, path, *args, **kwargs):
return self._download_json(
f'{self._API_URL}/o/v1/{path}', *args, **kwargs,
headers={'x-country-code': 'IN', 'x-platform-code': 'PCTV'})
def _call_api_impl(self, path, video_id, query, st=None, cookies=None):
def _call_api_impl(self, path, video_id, query, cookies=None, st=None):
st = int_or_none(st) or int(time.time())
exp = st + 6000
auth = f'st={st}~exp={exp}~acl=/*'
auth += '~hmac=' + hmac.new(self._AKAMAI_ENCRYPTION_KEY, auth.encode(), hashlib.sha256).hexdigest()
if cookies and cookies.get('userUP'):
token = cookies.get('userUP').value
else:
token = self._download_json(
f'{self._API_URL}/um/v3/users',
video_id, note='Downloading token',
data=json.dumps({'device_ids': [{'id': str(uuid.uuid4()), 'type': 'device_id'}]}).encode(),
headers={
'hotstarauth': auth,
'x-hs-platform': 'PCTV', # or 'web'
'Content-Type': 'application/json',
})['user_identity']
response = self._download_json(
f'{self._API_URL}/{path}', video_id, query=query,
headers={
f'{self._API_URL_V2}/{path}', video_id, query=query,
headers=filter_dict({
**(self._SUB_HEADERS if self._has_active_subscription(cookies, st) else self._FREE_HEADERS),
'hotstarauth': auth,
'x-hs-appversion': '6.72.2',
'x-hs-platform': 'web',
'x-hs-usertoken': token,
})
'x-hs-usertoken': traverse_obj(cookies, (self._TOKEN_NAME, 'value')),
'x-hs-device-id': traverse_obj(cookies, ('deviceId', 'value')) or str(uuid.uuid4()),
'content-type': 'application/json',
}))
if response['message'] != "Playback URL's fetched successfully":
raise ExtractorError(
response['message'], expected=True)
return response['data']
if not traverse_obj(response, ('success', {dict})):
raise ExtractorError('API call was unsuccessful')
return response['success']
def _call_api_v2(self, path, video_id, st=None, cookies=None):
return self._call_api_impl(
f'{path}/content/{video_id}', video_id, st=st, cookies=cookies, query={
'desired-config': 'audio_channel:stereo|container:fmp4|dynamic_range:hdr|encryption:plain|ladder:tv|package:dash|resolution:fhd|subs-tag:HotstarVIP|video_codec:h265',
'device-id': cookies.get('device_id').value if cookies.get('device_id') else str(uuid.uuid4()),
'os-name': 'Windows',
'os-version': '10',
})
def _call_api_v2(self, path, video_id, content_type, cookies=None, st=None):
return self._call_api_impl(f'{path}', video_id, query={
'content_id': video_id,
'filters': f'content_type={content_type}',
'client_capabilities': json.dumps({
'package': ['dash', 'hls'],
'container': ['fmp4', 'fmp4br', 'ts'],
'ads': ['non_ssai', 'ssai'],
'audio_channel': ['stereo', 'dolby51', 'atmos'],
'encryption': ['plain', 'widevine'], # wv only so we can raise appropriate error
'video_codec': ['h264', 'h265'],
'video_codec_non_secure': ['h264', 'h265', 'vp9'],
'ladder': ['phone', 'tv', 'full'],
'resolution': ['hd', '4k'],
'true_resolution': ['hd', '4k'],
'dynamic_range': ['sdr', 'hdr'],
}, separators=(',', ':')),
'drm_parameters': json.dumps({
'widevine_security_level': ['SW_SECURE_DECODE', 'SW_SECURE_CRYPTO'],
'hdcp_version': ['HDCP_V2_2', 'HDCP_V2_1', 'HDCP_V2', 'HDCP_V1'],
}, separators=(',', ':')),
}, cookies=cookies, st=st)
def _playlist_entries(self, path, item_id, root=None, **kwargs):
results = self._call_api_v1(path, item_id, **kwargs)['body']['results']
for video in traverse_obj(results, (('assets', None), 'items', ...)):
if video.get('contentId'):
yield self.url_result(
HotStarIE._video_url(video['contentId'], root=root), HotStarIE, video['contentId'])
@staticmethod
def _parse_metadata_v1(video_data):
return traverse_obj(video_data, {
'id': ('contentId', {str}),
'title': ('title', {str}),
'description': ('description', {str}),
'duration': ('duration', {int_or_none}),
'timestamp': (('broadcastDate', 'startDate'), {int_or_none}, any),
'release_year': ('year', {int_or_none}),
'channel': ('channelName', {str}),
'channel_id': ('channelId', {int}, {str_or_none}),
'series': ('showName', {str}),
'season': ('seasonName', {str}),
'season_number': ('seasonNo', {int_or_none}),
'season_id': ('seasonId', {int}, {str_or_none}),
'episode': ('title', {str}),
'episode_number': ('episodeNo', {int_or_none}),
})
def _fetch_page(self, path, item_id, name, query, root, page):
results = self._call_api_v1(
path, item_id, note=f'Downloading {name} page {page + 1} JSON', query={
**query,
'tao': page * self._PAGE_SIZE,
'tas': self._PAGE_SIZE,
})['body']['results']
for video in traverse_obj(results, (('assets', None), 'items', lambda _, v: v['contentId'])):
yield self.url_result(
HotStarIE._video_url(video['contentId'], root=root), HotStarIE, **self._parse_metadata_v1(video))
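`_call_api_impl` signs every API request with a `hotstarauth` value. The value is the string `st=<start>~exp=<start + 6000>~acl=/*` followed by its HMAC-SHA256 under `_AKAMAI_ENCRYPTION_KEY`. The start time preferably comes from the server time in the `x-origin-date` response header (see issue #396), with local time as the fallback. Below is a standalone sketch of the signing step that reuses the key from the code above. The function name and its `ttl` parameter are hypothetical.

```python
import hashlib
import hmac
import time

# Key copied from HotStarBaseIE._AKAMAI_ENCRYPTION_KEY
AKAMAI_ENCRYPTION_KEY = b'\x05\xfc\x1a\x01\xca\xc9\x4b\xc4\x12\xfc\x53\x12\x07\x75\xf9\xee'


def _int_or_none(value):
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def make_hotstarauth(st=None, ttl=6000):
    # Prefer the server-supplied start time; fall back to local time
    st = _int_or_none(st) or int(time.time())
    auth = f'st={st}~exp={st + ttl}~acl=/*'
    # Sign the unsigned string and append the hex digest
    digest = hmac.new(AKAMAI_ENCRYPTION_KEY, auth.encode(), hashlib.sha256).hexdigest()
    return f'{auth}~hmac={digest}'
```

The token is derived entirely from the start time, so a skewed local clock gives a different `exp` window. That is the reason the extractor asks the server for its time first.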
class HotStarIE(HotStarBaseIE):
IE_NAME = 'hotstar'
IE_DESC = 'JioHotstar'
_VALID_URL = r'''(?x)
https?://(?:www\.)?hotstar\.com(?:/in)?/(?!in/)
(?:
@@ -114,15 +163,16 @@ class HotStarIE(HotStarBaseIE):
'upload_date': '20190501',
'duration': 1219,
'channel': 'StarPlus',
'channel_id': '3',
'channel_id': '821',
'series': 'Ek Bhram - Sarvagun Sampanna',
'season': 'Chapter 1',
'season_number': 1,
'season_id': '6771',
'season_id': '1260004607',
'episode': 'Janhvi Targets Suman',
'episode_number': 8,
},
}, {
'params': {'skip_download': 'm3u8'},
}, { # Metadata call gets HTTP Error 504 with tas=10000
'url': 'https://www.hotstar.com/in/shows/anupama/1260022017/anupama-anuj-share-a-moment/1000282843',
'info_dict': {
'id': '1000282843',
@@ -134,14 +184,14 @@ class HotStarIE(HotStarBaseIE):
'channel': 'StarPlus',
'series': 'Anupama',
'season_number': 1,
'season_id': '7399',
'season_id': '1260022018',
'upload_date': '20230307',
'episode': 'Anupama, Anuj Share a Moment',
'episode_number': 853,
'duration': 1272,
'channel_id': '3',
'duration': 1266,
'channel_id': '821',
},
'skip': 'HTTP Error 504: Gateway Time-out', # XXX: Investigate 504 errors on some episodes
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.hotstar.com/in/shows/kana-kaanum-kaalangal/1260097087/back-to-school/1260097320',
'info_dict': {
@@ -154,14 +204,15 @@ class HotStarIE(HotStarBaseIE):
'channel': 'Hotstar Specials',
'series': 'Kana Kaanum Kaalangal',
'season_number': 1,
'season_id': '9441',
'season_id': '1260097089',
'upload_date': '20220421',
'episode': 'Back To School',
'episode_number': 1,
'duration': 1810,
'channel_id': '54',
'channel_id': '1260003991',
},
}, {
'params': {'skip_download': 'm3u8'},
}, { # Metadata call gets HTTP Error 504 with tas=10000
'url': 'https://www.hotstar.com/in/clips/e3-sairat-kahani-pyaar-ki/1000262286',
'info_dict': {
'id': '1000262286',
@@ -173,6 +224,7 @@ class HotStarIE(HotStarBaseIE):
'timestamp': 1622943900,
'duration': 5395,
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.hotstar.com/in/movies/premam/1000091195',
'info_dict': {
@@ -180,12 +232,13 @@ class HotStarIE(HotStarBaseIE):
'ext': 'mp4',
'title': 'Premam',
'release_year': 2015,
'description': 'md5:d833c654e4187b5e34757eafb5b72d7f',
'description': 'md5:096cd8aaae8dab56524823dc19dfa9f7',
'timestamp': 1462149000,
'upload_date': '20160502',
'episode': 'Premam',
'duration': 8994,
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.hotstar.com/movies/radha-gopalam/1000057157',
'only_matching': True,
@@ -208,6 +261,13 @@ class HotStarIE(HotStarBaseIE):
None: 'content',
}
_CONTENT_TYPE = {
'movie': 'MOVIE',
'episode': 'EPISODE',
'match': 'SPORT',
'content': 'CLIPS',
}
_IGNORE_MAP = {
'res': 'resolution',
'vcodec': 'video_codec',
@@ -229,38 +289,50 @@ class HotStarIE(HotStarBaseIE):
def _real_extract(self, url):
video_id, video_type = self._match_valid_url(url).group('id', 'type')
video_type = self._TYPE.get(video_type, video_type)
video_type = self._TYPE[video_type]
cookies = self._get_cookies(url) # Cookies before any request
if not cookies or not cookies.get(self._TOKEN_NAME):
self.raise_login_required()
video_data = traverse_obj(
self._call_api_v1(
f'{video_type}/detail', video_id, fatal=False, query={'tas': 10000, 'contentId': video_id}),
('body', 'results', 'item', {dict})) or {}
if not self.get_param('allow_unplayable_formats') and video_data.get('drmProtected'):
self._call_api_v1(f'{video_type}/detail', video_id, fatal=False, query={
'tas': 5, # See https://github.com/yt-dlp/yt-dlp/issues/7946
'contentId': video_id,
}), ('body', 'results', 'item', {dict})) or {}
if video_data.get('drmProtected'):
self.report_drm(video_id)
# See https://github.com/yt-dlp/yt-dlp/issues/396
st = self._download_webpage_handle(f'{self._BASE_URL}/in', video_id)[1].headers.get('x-origin-date')
geo_restricted = False
formats, subs = [], {}
formats, subs, has_drm = [], {}, False
headers = {'Referer': f'{self._BASE_URL}/in'}
content_type = traverse_obj(video_data, ('contentType', {str})) or self._CONTENT_TYPE[video_type]
# change to v2 in the future
playback_sets = self._call_api_v2('play/v1/playback', video_id, st=st, cookies=cookies)['playBackSets']
for playback_set in playback_sets:
if not isinstance(playback_set, dict):
continue
tags = str_or_none(playback_set.get('tagsCombination')) or ''
# See https://github.com/yt-dlp/yt-dlp/issues/396
st = self._request_webpage(
f'{self._BASE_URL}/in', video_id, 'Fetching server time').get_header('x-origin-date')
watch = self._call_api_v2('pages/watch', video_id, content_type, cookies, st)
player_config = traverse_obj(watch, (
'page', 'spaces', 'player', 'widget_wrappers', lambda _, v: v['template'] == 'PlayerWidget',
'widget', 'data', 'player_config', {dict}, any, {require('player config')}))
for playback_set in traverse_obj(player_config, (
('media_asset', 'media_asset_v2'),
('primary', 'fallback'),
all, lambda _, v: url_or_none(v['content_url']),
)):
tags = str_or_none(playback_set.get('playback_tags')) or ''
if any(f'{prefix}:{ignore}' in tags
for key, prefix in self._IGNORE_MAP.items()
for ignore in self._configuration_arg(key)):
continue
format_url = url_or_none(playback_set.get('playbackUrl'))
if not format_url:
tag_dict = dict((*t.split(':', 1), None)[:2] for t in tags.split(';'))
if tag_dict.get('encryption') not in ('plain', None):
has_drm = True
continue
format_url = re.sub(r'(?<=//staragvod)(\d)', r'web\1', format_url)
format_url = re.sub(r'(?<=//staragvod)(\d)', r'web\1', playback_set['content_url'])
ext = determine_ext(format_url)
current_formats, current_subs = [], {}
@@ -280,14 +352,12 @@ class HotStarIE(HotStarBaseIE):
'height': int_or_none(playback_set.get('height')),
}]
except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 403:
if isinstance(e.cause, HTTPError) and e.cause.status in (403, 474):
geo_restricted = True
else:
self.write_debug(e)
continue
tag_dict = dict((*t.split(':', 1), None)[:2] for t in tags.split(';'))
if tag_dict.get('encryption') not in ('plain', None):
for f in current_formats:
f['has_drm'] = True
for f in current_formats:
for k, v in self._TAG_FIELDS.items():
if not f.get(k):
@@ -299,6 +369,11 @@ class HotStarIE(HotStarBaseIE):
'stereo': 2,
'dolby51': 6,
}.get(tag_dict.get('audio_channel'))
if (
'Audio_Description' in f['format_id']
or 'Audio Description' in (f.get('format_note') or '')
):
f['source_preference'] = -99 + (f.get('source_preference') or -1)
f['format_note'] = join_nonempty(
tag_dict.get('ladder'),
tag_dict.get('audio_channel') if f.get('acodec') != 'none' else None,
@@ -308,29 +383,22 @@ class HotStarIE(HotStarBaseIE):
formats.extend(current_formats)
subs = self._merge_subtitles(subs, current_subs)
if not formats and geo_restricted:
self.raise_geo_restricted(countries=['IN'], metadata_available=True)
if not formats:
if geo_restricted:
self.raise_geo_restricted(countries=['IN'], metadata_available=True)
elif has_drm:
self.report_drm(video_id)
elif not self._has_active_subscription(cookies, st):
self.raise_no_formats('Your account does not have access to this content', expected=True)
self._remove_duplicate_formats(formats)
for f in formats:
f.setdefault('http_headers', {}).update(headers)
return {
**self._parse_metadata_v1(video_data),
'id': video_id,
'title': video_data.get('title'),
'description': video_data.get('description'),
'duration': int_or_none(video_data.get('duration')),
'timestamp': int_or_none(traverse_obj(video_data, 'broadcastDate', 'startDate')),
'release_year': int_or_none(video_data.get('year')),
'formats': formats,
'subtitles': subs,
'channel': video_data.get('channelName'),
'channel_id': str_or_none(video_data.get('channelId')),
'series': video_data.get('showName'),
'season': video_data.get('seasonName'),
'season_number': int_or_none(video_data.get('seasonNo')),
'season_id': str_or_none(video_data.get('seasonId')),
'episode': video_data.get('title'),
'episode_number': int_or_none(video_data.get('episodeNo')),
}
@@ -371,64 +439,6 @@ class HotStarPrefixIE(InfoExtractor):
return self.url_result(HotStarIE._video_url(video_id, video_type), HotStarIE, video_id)
class HotStarPlaylistIE(HotStarBaseIE):
IE_NAME = 'hotstar:playlist'
_VALID_URL = r'https?://(?:www\.)?hotstar\.com(?:/in)?/(?:tv|shows)(?:/[^/]+){2}/list/[^/]+/t-(?P<id>\w+)'
_TESTS = [{
'url': 'https://www.hotstar.com/tv/savdhaan-india/s-26/list/popular-clips/t-3_2_26',
'info_dict': {
'id': '3_2_26',
},
'playlist_mincount': 20,
}, {
'url': 'https://www.hotstar.com/shows/savdhaan-india/s-26/list/popular-clips/t-3_2_26',
'only_matching': True,
}, {
'url': 'https://www.hotstar.com/tv/savdhaan-india/s-26/list/extras/t-2480',
'only_matching': True,
}, {
'url': 'https://www.hotstar.com/in/tv/karthika-deepam/15457/list/popular-clips/t-3_2_1272',
'only_matching': True,
}]
def _real_extract(self, url):
id_ = self._match_id(url)
return self.playlist_result(
self._playlist_entries('tray/find', id_, query={'tas': 10000, 'uqId': id_}), id_)
class HotStarSeasonIE(HotStarBaseIE):
IE_NAME = 'hotstar:season'
_VALID_URL = r'(?P<url>https?://(?:www\.)?hotstar\.com(?:/in)?/(?:tv|shows)/[^/]+/\w+)/seasons/[^/]+/ss-(?P<id>\w+)'
_TESTS = [{
'url': 'https://www.hotstar.com/tv/radhakrishn/1260000646/seasons/season-2/ss-8028',
'info_dict': {
'id': '8028',
},
'playlist_mincount': 35,
}, {
'url': 'https://www.hotstar.com/in/tv/ishqbaaz/9567/seasons/season-2/ss-4357',
'info_dict': {
'id': '4357',
},
'playlist_mincount': 30,
}, {
'url': 'https://www.hotstar.com/in/tv/bigg-boss/14714/seasons/season-4/ss-8208/',
'info_dict': {
'id': '8208',
},
'playlist_mincount': 19,
}, {
'url': 'https://www.hotstar.com/in/shows/bigg-boss/14714/seasons/season-4/ss-8208/',
'only_matching': True,
}]
def _real_extract(self, url):
url, season_id = self._match_valid_url(url).groups()
return self.playlist_result(self._playlist_entries(
'season/asset', season_id, url, query={'tao': 0, 'tas': 0, 'size': 10000, 'id': season_id}), season_id)
class HotStarSeriesIE(HotStarBaseIE):
IE_NAME = 'hotstar:series'
_VALID_URL = r'(?P<url>https?://(?:www\.)?hotstar\.com(?:/in)?/(?:tv|shows)/[^/]+/(?P<id>\d+))/?(?:[#?]|$)'
@@ -443,25 +453,29 @@ class HotStarSeriesIE(HotStarBaseIE):
'info_dict': {
'id': '1260050431',
},
'playlist_mincount': 43,
'playlist_mincount': 42,
}, {
'url': 'https://www.hotstar.com/in/tv/mahabharat/435/',
'info_dict': {
'id': '435',
},
'playlist_mincount': 267,
}, {
}, { # HTTP Error 504 with tas=10000 (possibly because total size is over 1000 items?)
'url': 'https://www.hotstar.com/in/shows/anupama/1260022017/',
'info_dict': {
'id': '1260022017',
},
'playlist_mincount': 940,
'playlist_mincount': 1601,
}]
_PAGE_SIZE = 100
def _real_extract(self, url):
url, series_id = self._match_valid_url(url).groups()
id_ = self._call_api_v1(
url, series_id = self._match_valid_url(url).group('url', 'id')
eid = self._call_api_v1(
'show/detail', series_id, query={'contentId': series_id})['body']['results']['item']['id']
return self.playlist_result(self._playlist_entries(
'tray/g/1/items', series_id, url, query={'tao': 0, 'tas': 10000, 'etid': 0, 'eid': id_}), series_id)
entries = OnDemandPagedList(functools.partial(
self._fetch_page, 'tray/g/1/items', series_id,
'series', {'etid': 0, 'eid': eid}, url), self._PAGE_SIZE)
return self.playlist_result(entries, series_id)
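
HotStarSeriesIE now pages through `tray/g/1/items` with `OnDemandPagedList` and `_PAGE_SIZE = 100` instead of issuing one `tas=10000` request, which the updated test comment associates with HTTP 504 errors on large series. The sketch below illustrates lazy, cached paging in plain Python. `LazyPagedList` and `fetch_page` are hypothetical stand-ins, not yt-dlp's `OnDemandPagedList` or `_fetch_page`, and zero-based page numbers are an assumption of the sketch.

```python
import functools

PAGE_SIZE = 100  # mirrors HotStarSeriesIE._PAGE_SIZE in the diff above


def fetch_page(total_items, page_num):
    """Hypothetical stand-in for one paged 'tray/g/1/items' request.

    Returns the item indices on the zero-based page `page_num`; the real
    request parameters built by _fetch_page are not shown in this diff.
    """
    start = page_num * PAGE_SIZE
    return list(range(start, min(start + PAGE_SIZE, total_items)))


class LazyPagedList:
    """Minimal on-demand paged list: a page is fetched only when a slice
    touches it, and every fetched page is cached."""

    def __init__(self, pagefunc, page_size):
        self._pagefunc = pagefunc
        self._page_size = page_size
        self._cache = {}
        self.calls = 0  # number of simulated API requests made so far

    def _page(self, page_num):
        if page_num not in self._cache:
            self.calls += 1
            self._cache[page_num] = self._pagefunc(page_num)
        return self._cache[page_num]

    def getslice(self, start, end):
        """Return items[start:end], fetching only the pages that cover it."""
        if end <= start:
            return []
        first_page = start // self._page_size
        last_page = (end - 1) // self._page_size
        items = []
        for page_num in range(first_page, last_page + 1):
            items.extend(self._page(page_num))
        offset = start - first_page * self._page_size
        return items[offset:offset + (end - start)]


# 1601 items, matching the playlist_mincount of the Anupama test case
entries = LazyPagedList(functools.partial(fetch_page, 1601), PAGE_SIZE)
print(entries.getslice(150, 155))  # [150, 151, 152, 153, 154]
print(entries.calls)               # 1: only page 1 was requested
```

Selecting a few entries (for example with `--playlist-items`) then costs a handful of small requests rather than one very large one.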

@@ -7,12 +7,13 @@ import urllib.parse
from .common import InfoExtractor
from ..utils import (
ExtractorError,
clean_html,
int_or_none,
parse_duration,
str_or_none,
try_get,
unescapeHTML,
unified_strdate,
update_url,
update_url_query,
url_or_none,
)
@@ -22,8 +23,8 @@ from ..utils.traversal import traverse_obj
class HuyaLiveIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.|m\.)?huya\.com/(?!(?:video/play/))(?P<id>[^/#?&]+)(?:\D|$)'
IE_NAME = 'huya:live'
IE_DESC = 'huya.com'
TESTS = [{
IE_DESC = '虎牙直播'
_TESTS = [{
'url': 'https://www.huya.com/572329',
'info_dict': {
'id': '572329',
@@ -149,63 +150,94 @@ class HuyaVideoIE(InfoExtractor):
'id': '1002412640',
'ext': 'mp4',
'title': '8月3日',
'thumbnail': r're:https?://.*\.jpg',
'duration': 14,
'categories': ['主机游戏'],
'duration': 14.0,
'uploader': '虎牙-ATS欧卡车队青木',
'uploader_id': '1564376151',
'upload_date': '20240803',
'view_count': int,
'comment_count': int,
'like_count': int,
'thumbnail': r're:https?://.+\.jpg',
'timestamp': 1722675433,
},
},
{
}, {
'url': 'https://www.huya.com/video/play/556054543.html',
'info_dict': {
'id': '556054543',
'ext': 'mp4',
'title': '我不挑事 也不怕事',
'thumbnail': r're:https?://.*\.jpg',
'duration': 1864,
'categories': ['英雄联盟'],
'description': 'md5:58184869687d18ce62dc7b4b2ad21201',
'duration': 1864.0,
'uploader': '卡尔',
'uploader_id': '367138632',
'upload_date': '20210811',
'view_count': int,
'comment_count': int,
'like_count': int,
'tags': 'count:4',
'thumbnail': r're:https?://.+\.jpg',
'timestamp': 1628675950,
},
}, {
# Only m3u8 available
'url': 'https://www.huya.com/video/play/1063345618.html',
'info_dict': {
'id': '1063345618',
'ext': 'mp4',
'title': '峡谷第一中黑铁上钻石顶级教学对抗elo',
'categories': ['英雄联盟'],
'comment_count': int,
'duration': 21603.0,
'like_count': int,
'thumbnail': r're:https?://.+\.jpg',
'timestamp': 1749668803,
'upload_date': '20250611',
'uploader': '北枫CC',
'uploader_id': '2183525275',
'view_count': int,
},
}]
def _real_extract(self, url: str):
video_id = self._match_id(url)
video_data = self._download_json(
'https://liveapi.huya.com/moment/getMomentContent', video_id,
query={'videoId': video_id})['data']['moment']['videoInfo']
moment = self._download_json(
'https://liveapi.huya.com/moment/getMomentContent',
video_id, query={'videoId': video_id})['data']['moment']
formats = []
for definition in traverse_obj(video_data, ('definitions', lambda _, v: url_or_none(v['url']))):
formats.append({
'url': definition['url'],
**traverse_obj(definition, {
'format_id': ('defName', {str}),
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),
for definition in traverse_obj(moment, (
'videoInfo', 'definitions', lambda _, v: url_or_none(v['m3u8']),
)):
fmts = self._extract_m3u8_formats(definition['m3u8'], video_id, 'mp4', fatal=False)
for fmt in fmts:
fmt.update(**traverse_obj(definition, {
'filesize': ('size', {int_or_none}),
}),
})
'format_id': ('defName', {str}),
'height': ('height', {int_or_none}),
'quality': ('definition', {int_or_none}),
'width': ('width', {int_or_none}),
}))
formats.extend(fmts)
return {
'id': video_id,
'formats': formats,
**traverse_obj(video_data, {
**traverse_obj(moment, {
'comment_count': ('commentCount', {int_or_none}),
'description': ('content', {clean_html}, filter),
'like_count': ('favorCount', {int_or_none}),
'timestamp': ('cTime', {int_or_none}),
}),
**traverse_obj(moment, ('videoInfo', {
'title': ('videoTitle', {str}),
'thumbnail': ('videoCover', {url_or_none}),
'categories': ('category', {str}, filter, all, filter),
'duration': ('videoDuration', {parse_duration}),
'tags': ('tags', ..., {str}, filter, all, filter),
'thumbnail': (('videoBigCover', 'videoCover'), {url_or_none}, {update_url(query=None)}, any),
'uploader': ('nickName', {str}),
'uploader_id': ('uid', {str_or_none}),
'upload_date': ('videoUploadTime', {unified_strdate}),
'view_count': ('videoPlayNum', {int_or_none}),
'comment_count': ('videoCommentNum', {int_or_none}),
'like_count': ('favorCount', {int_or_none}),
}),
})),
}
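
The reworked `HuyaVideoIE._real_extract` no longer reads each definition's direct `url`. It parses the definition's `m3u8` playlist and overlays `defName`, `height`, `width`, `size` and `definition` onto every format that playlist yields. A plain-Python sketch of that merge follows; `url_or_none` and `int_or_none` are simplified local versions rather than yt-dlp's helpers, and `fake_m3u8_extractor` is a hypothetical stand-in for `_extract_m3u8_formats`.

```python
from urllib.parse import urlparse


def url_or_none(url):
    """Simplified check: keep only absolute http(s) URLs."""
    if not isinstance(url, str):
        return None
    parsed = urlparse(url)
    return url if parsed.scheme in ('http', 'https') and parsed.netloc else None


def int_or_none(value):
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def build_formats(definitions, extract_m3u8):
    """Expand each definition's m3u8 into formats and copy the
    definition-level metadata onto every resulting format."""
    formats = []
    for definition in definitions:
        m3u8_url = url_or_none(definition.get('m3u8'))
        if not m3u8_url:
            continue  # mirrors the lambda filter on v['m3u8']
        name = definition.get('defName')
        extra = {
            'filesize': int_or_none(definition.get('size')),
            'format_id': name if isinstance(name, str) else None,
            'height': int_or_none(definition.get('height')),
            'quality': int_or_none(definition.get('definition')),
            'width': int_or_none(definition.get('width')),
        }
        # Drop missing values so they do not overwrite playlist-derived fields
        extra = {k: v for k, v in extra.items() if v is not None}
        for fmt in extract_m3u8(m3u8_url):
            fmt.update(extra)
            formats.append(fmt)
    return formats


def fake_m3u8_extractor(m3u8_url):
    """Hypothetical stand-in for _extract_m3u8_formats: one variant per playlist."""
    return [{'url': m3u8_url, 'ext': 'mp4', 'protocol': 'm3u8_native'}]


definitions = [
    {'m3u8': 'https://example.com/video_1080.m3u8', 'defName': 'HD',
     'height': '1080', 'width': 1920, 'size': '52428800', 'definition': 3},
    {'m3u8': None, 'defName': 'broken'},  # skipped: no usable m3u8 URL
]
print(build_formats(definitions, fake_m3u8_extractor))
```

The sample definitions are invented for illustration; the field names are the ones read in the diff.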

@@ -1,32 +1,66 @@
from .common import InfoExtractor
from ..utils import js_to_json, traverse_obj
from ..utils import (
ExtractorError,
clean_html,
url_or_none,
)
from ..utils.traversal import subs_list_to_dict, traverse_obj
class MonsterSirenHypergryphMusicIE(InfoExtractor):
IE_NAME = 'monstersiren'
IE_DESC = '塞壬唱片'
_API_BASE = 'https://monster-siren.hypergryph.com/api'
_VALID_URL = r'https?://monster-siren\.hypergryph\.com/music/(?P<id>\d+)'
_TESTS = [{
'url': 'https://monster-siren.hypergryph.com/music/514562',
'info_dict': {
'id': '514562',
'ext': 'wav',
'artists': ['塞壬唱片-MSR'],
'album': 'Flame Shadow',
'title': 'Flame Shadow',
'album': 'Flame Shadow',
'artists': ['塞壬唱片-MSR'],
'description': 'md5:19e2acfcd1b65b41b29e8079ab948053',
'thumbnail': r're:https?://web\.hycdn\.cn/siren/pic/.+\.jpg',
},
}, {
'url': 'https://monster-siren.hypergryph.com/music/514518',
'info_dict': {
'id': '514518',
'ext': 'wav',
'title': 'Heavenly Me (Instrumental)',
'album': 'Heavenly Me',
'artists': ['塞壬唱片-MSR', 'AIYUE blessed : 理名'],
'description': 'md5:ce790b41c932d1ad72eb791d1d8ae598',
'thumbnail': r're:https?://web\.hycdn\.cn/siren/pic/.+\.jpg',
},
}]
def _real_extract(self, url):
audio_id = self._match_id(url)
webpage = self._download_webpage(url, audio_id)
json_data = self._search_json(
r'window\.g_initialProps\s*=', webpage, 'data', audio_id, transform_source=js_to_json)
song = self._download_json(f'{self._API_BASE}/song/{audio_id}', audio_id)
if traverse_obj(song, 'code') != 0:
msg = traverse_obj(song, ('msg', {str}, filter))
raise ExtractorError(
msg or 'API returned an error response', expected=bool(msg))
album = None
if album_id := traverse_obj(song, ('data', 'albumCid', {str})):
album = self._download_json(
f'{self._API_BASE}/album/{album_id}/detail', album_id, fatal=False)
return {
'id': audio_id,
'title': traverse_obj(json_data, ('player', 'songDetail', 'name')),
'url': traverse_obj(json_data, ('player', 'songDetail', 'sourceUrl')),
'ext': 'wav',
'vcodec': 'none',
'artists': traverse_obj(json_data, ('player', 'songDetail', 'artists', ...)),
'album': traverse_obj(json_data, ('musicPlay', 'albumDetail', 'name')),
**traverse_obj(song, ('data', {
'title': ('name', {str}),
'artists': ('artists', ..., {str}),
'subtitles': ({'url': 'lyricUrl'}, all, {subs_list_to_dict(lang='en')}),
'url': ('sourceUrl', {url_or_none}),
})),
**traverse_obj(album, ('data', {
'album': ('name', {str}),
'description': ('intro', {clean_html}),
'thumbnail': ('coverUrl', {url_or_none}),
})),
}
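
The rewritten `MonsterSirenHypergryphMusicIE` calls `{_API_BASE}/song/{audio_id}` directly and treats any `code` other than 0 as a failure, flagging the error as expected only when the API supplies a non-empty `msg`. That check is sketched below without yt-dlp; `SirenAPIError` is a hypothetical stand-in for `ExtractorError`, and the sample payload is invented.

```python
class SirenAPIError(Exception):
    """Hypothetical error type; `expected` mimics ExtractorError(expected=...)."""

    def __init__(self, msg, expected):
        super().__init__(msg)
        self.expected = expected


def check_song_response(song):
    """Return song['data'] when code == 0, otherwise raise.

    A non-empty string 'msg' from the API is treated as an expected
    (user-facing) error; without one, a generic message is used and the
    error is flagged as unexpected.
    """
    if song.get('code') != 0:
        msg = song.get('msg')
        msg = msg if isinstance(msg, str) and msg else None
        raise SirenAPIError(msg or 'API returned an error response', expected=bool(msg))
    return song['data']


ok = {'code': 0, 'msg': '', 'data': {'name': 'Flame Shadow'}}
print(check_song_response(ok)['name'])  # Flame Shadow
```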

@@ -1,408 +0,0 @@
import base64
import itertools
import json
import random
import re
import string
import time
from .common import InfoExtractor
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
jwt_decode_hs256,
parse_age_limit,
try_call,
url_or_none,
)
from ..utils.traversal import traverse_obj
class JioCinemaBaseIE(InfoExtractor):
_NETRC_MACHINE = 'jiocinema'
_GEO_BYPASS = False
_ACCESS_TOKEN = None
_REFRESH_TOKEN = None
_GUEST_TOKEN = None
_USER_ID = None
_DEVICE_ID = None
_API_HEADERS = {'Origin': 'https://www.jiocinema.com', 'Referer': 'https://www.jiocinema.com/'}
_APP_NAME = {'appName': 'RJIL_JioCinema'}
_APP_VERSION = {'appVersion': '5.0.0'}
_API_SIGNATURES = 'o668nxgzwff'
_METADATA_API_BASE = 'https://content-jiovoot.voot.com/psapi'
_ACCESS_HINT = 'the `accessToken` from your browser local storage'
_LOGIN_HINT = (
'Log in with "-u phone -p <PHONE_NUMBER>" to authenticate with OTP, '
f'or use "-u token -p <ACCESS_TOKEN>" to log in with {_ACCESS_HINT}. '
'If you have previously logged in with yt-dlp and your session '
'has been cached, you can use "-u device -p <DEVICE_ID>"')
def _cache_token(self, token_type):
assert token_type in ('access', 'refresh', 'all')
if token_type in ('access', 'all'):
self.cache.store(
JioCinemaBaseIE._NETRC_MACHINE, f'{JioCinemaBaseIE._DEVICE_ID}-access', JioCinemaBaseIE._ACCESS_TOKEN)
if token_type in ('refresh', 'all'):
self.cache.store(
JioCinemaBaseIE._NETRC_MACHINE, f'{JioCinemaBaseIE._DEVICE_ID}-refresh', JioCinemaBaseIE._REFRESH_TOKEN)
def _call_api(self, url, video_id, note='Downloading API JSON', headers={}, data={}):
return self._download_json(
url, video_id, note, data=json.dumps(data, separators=(',', ':')).encode(), headers={
'Content-Type': 'application/json',
'Accept': 'application/json',
**self._API_HEADERS,
**headers,
}, expected_status=(400, 403, 474))
def _call_auth_api(self, service, endpoint, note, headers={}, data={}):
return self._call_api(
f'https://auth-jiocinema.voot.com/{service}service/apis/v4/{endpoint}',
None, note=note, headers=headers, data=data)
def _refresh_token(self):
if not JioCinemaBaseIE._REFRESH_TOKEN or not JioCinemaBaseIE._DEVICE_ID:
raise ExtractorError('User token has expired', expected=True)
response = self._call_auth_api(
'token', 'refreshtoken', 'Refreshing token',
headers={'accesstoken': self._ACCESS_TOKEN}, data={
**self._APP_NAME,
'deviceId': self._DEVICE_ID,
'refreshToken': self._REFRESH_TOKEN,
**self._APP_VERSION,
})
refresh_token = response.get('refreshTokenId')
if refresh_token and refresh_token != JioCinemaBaseIE._REFRESH_TOKEN:
JioCinemaBaseIE._REFRESH_TOKEN = refresh_token
self._cache_token('refresh')
JioCinemaBaseIE._ACCESS_TOKEN = response['authToken']
self._cache_token('access')
def _fetch_guest_token(self):
JioCinemaBaseIE._DEVICE_ID = ''.join(random.choices(string.digits, k=10))
guest_token = self._call_auth_api(
'token', 'guest', 'Downloading guest token', data={
**self._APP_NAME,
'deviceType': 'phone',
'os': 'ios',
'deviceId': self._DEVICE_ID,
'freshLaunch': False,
'adId': self._DEVICE_ID,
**self._APP_VERSION,
})
self._GUEST_TOKEN = guest_token['authToken']
self._USER_ID = guest_token['userId']
def _call_login_api(self, endpoint, guest_token, data, note):
return self._call_auth_api(
'user', f'loginotp/{endpoint}', note, headers={
**self.geo_verification_headers(),
'accesstoken': self._GUEST_TOKEN,
**self._APP_NAME,
**traverse_obj(guest_token, 'data', {
'deviceType': ('deviceType', {str}),
'os': ('os', {str}),
})}, data=data)
def _is_token_expired(self, token):
return (try_call(lambda: jwt_decode_hs256(token)['exp']) or 0) <= int(time.time() - 180)
def _perform_login(self, username, password):
if self._ACCESS_TOKEN and not self._is_token_expired(self._ACCESS_TOKEN):
return
UUID_RE = r'[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12}'
if username.lower() == 'token':
if try_call(lambda: jwt_decode_hs256(password)):
JioCinemaBaseIE._ACCESS_TOKEN = password
refresh_hint = 'the `refreshToken` UUID from your browser local storage'
refresh_token = self._configuration_arg('refresh_token', [''], ie_key=JioCinemaIE)[0]
if not refresh_token:
self.to_screen(
'To extend the life of your login session, in addition to your access token, '
'you can pass --extractor-args "jiocinema:refresh_token=REFRESH_TOKEN" '
f'where REFRESH_TOKEN is {refresh_hint}')
elif re.fullmatch(UUID_RE, refresh_token):
JioCinemaBaseIE._REFRESH_TOKEN = refresh_token
else:
self.report_warning(f'Invalid refresh_token value. Use {refresh_hint}')
else:
raise ExtractorError(
f'The password given could not be decoded as a token; use {self._ACCESS_HINT}', expected=True)
elif username.lower() == 'device' and re.fullmatch(rf'(?:{UUID_RE}|\d+)', password):
JioCinemaBaseIE._REFRESH_TOKEN = self.cache.load(JioCinemaBaseIE._NETRC_MACHINE, f'{password}-refresh')
JioCinemaBaseIE._ACCESS_TOKEN = self.cache.load(JioCinemaBaseIE._NETRC_MACHINE, f'{password}-access')
if not JioCinemaBaseIE._REFRESH_TOKEN or not JioCinemaBaseIE._ACCESS_TOKEN:
raise ExtractorError(f'Failed to load cached tokens for device ID "{password}"', expected=True)
elif username.lower() == 'phone' and re.fullmatch(r'\+?\d+', password):
self._fetch_guest_token()
guest_token = jwt_decode_hs256(self._GUEST_TOKEN)
initial_data = {
'number': base64.b64encode(password.encode()).decode(),
**self._APP_VERSION,
}
response = self._call_login_api('send', guest_token, initial_data, 'Requesting OTP')
if not traverse_obj(response, ('OTPInfo', {dict})):
raise ExtractorError('There was a problem with the phone number login attempt')
is_iphone = guest_token.get('os') == 'ios'
response = self._call_login_api('verify', guest_token, {
'deviceInfo': {
'consumptionDeviceName': 'iPhone' if is_iphone else 'Android',
'info': {
'platform': {'name': 'iPhone OS' if is_iphone else 'Android'},
'androidId': self._DEVICE_ID,
'type': 'iOS' if is_iphone else 'Android',
},
},
**initial_data,
'otp': self._get_tfa_info('the one-time password sent to your phone'),
}, 'Submitting OTP')
if traverse_obj(response, 'code') == 1043:
raise ExtractorError('Wrong OTP', expected=True)
JioCinemaBaseIE._REFRESH_TOKEN = response['refreshToken']
JioCinemaBaseIE._ACCESS_TOKEN = response['authToken']
else:
raise ExtractorError(self._LOGIN_HINT, expected=True)
user_token = jwt_decode_hs256(JioCinemaBaseIE._ACCESS_TOKEN)['data']
JioCinemaBaseIE._USER_ID = user_token['userId']
JioCinemaBaseIE._DEVICE_ID = user_token['deviceId']
if JioCinemaBaseIE._REFRESH_TOKEN and username != 'device':
self._cache_token('all')
if self.get_param('cachedir') is not False:
self.to_screen(
f'NOTE: For subsequent logins you can use "-u device -p {JioCinemaBaseIE._DEVICE_ID}"')
elif not JioCinemaBaseIE._REFRESH_TOKEN:
JioCinemaBaseIE._REFRESH_TOKEN = self.cache.load(
JioCinemaBaseIE._NETRC_MACHINE, f'{JioCinemaBaseIE._DEVICE_ID}-refresh')
if JioCinemaBaseIE._REFRESH_TOKEN:
self._cache_token('access')
self.to_screen(f'Logging in as device ID "{JioCinemaBaseIE._DEVICE_ID}"')
if self._is_token_expired(JioCinemaBaseIE._ACCESS_TOKEN):
self._refresh_token()
class JioCinemaIE(JioCinemaBaseIE):
IE_NAME = 'jiocinema'
_VALID_URL = r'https?://(?:www\.)?jiocinema\.com/?(?:movies?/[^/?#]+/|tv-shows/(?:[^/?#]+/){3})(?P<id>\d{3,})'
_TESTS = [{
'url': 'https://www.jiocinema.com/tv-shows/agnisakshi-ek-samjhauta/1/pradeep-to-stop-the-wedding/3759931',
'info_dict': {
'id': '3759931',
'ext': 'mp4',
'title': 'Pradeep to stop the wedding?',
'description': 'md5:75f72d1d1a66976633345a3de6d672b1',
'episode': 'Pradeep to stop the wedding?',
'episode_number': 89,
'season': 'Agnisakshi…Ek Samjhauta-S1',
'season_number': 1,
'series': 'Agnisakshi Ek Samjhauta',
'duration': 1238.0,
'thumbnail': r're:https?://.+\.jpg',
'age_limit': 13,
'season_id': '3698031',
'upload_date': '20230606',
'timestamp': 1686009600,
'release_date': '20230607',
'genres': ['Drama'],
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.jiocinema.com/movies/bhediya/3754021/watch',
'info_dict': {
'id': '3754021',
'ext': 'mp4',
'title': 'Bhediya',
'description': 'md5:a6bf2900371ac2fc3f1447401a9f7bb0',
'episode': 'Bhediya',
'duration': 8500.0,
'thumbnail': r're:https?://.+\.jpg',
'age_limit': 13,
'upload_date': '20230525',
'timestamp': 1685026200,
'release_date': '20230524',
'genres': ['Comedy'],
},
'params': {'skip_download': 'm3u8'},
}]
def _extract_formats_and_subtitles(self, playback, video_id):
m3u8_url = traverse_obj(playback, (
'data', 'playbackUrls', lambda _, v: v['streamtype'] == 'hls', 'url', {url_or_none}, any))
if not m3u8_url: # DRM-only content only serves dash urls
self.report_drm(video_id)
formats, subtitles = self._extract_m3u8_formats_and_subtitles(m3u8_url, video_id, m3u8_id='hls')
self._remove_duplicate_formats(formats)
return {
# '/_definst_/smil:vod/' m3u8 manifests claim to have 720p+ formats but max out at 480p
'formats': traverse_obj(formats, (
lambda _, v: '/_definst_/smil:vod/' not in v['url'] or v['height'] <= 480)),
'subtitles': subtitles,
}
def _real_extract(self, url):
video_id = self._match_id(url)
if not self._ACCESS_TOKEN and self._is_token_expired(self._GUEST_TOKEN):
self._fetch_guest_token()
elif self._ACCESS_TOKEN and self._is_token_expired(self._ACCESS_TOKEN):
self._refresh_token()
playback = self._call_api(
f'https://apis-jiovoot.voot.com/playbackjv/v3/{video_id}', video_id,
'Downloading playback JSON', headers={
**self.geo_verification_headers(),
'accesstoken': self._ACCESS_TOKEN or self._GUEST_TOKEN,
**self._APP_NAME,
'deviceid': self._DEVICE_ID,
'uniqueid': self._USER_ID,
'x-apisignatures': self._API_SIGNATURES,
'x-platform': 'androidweb',
'x-platform-token': 'web',
}, data={
'4k': False,
'ageGroup': '18+',
'appVersion': '3.4.0',
'bitrateProfile': 'xhdpi',
'capability': {
'drmCapability': {
'aesSupport': 'yes',
'fairPlayDrmSupport': 'none',
'playreadyDrmSupport': 'none',
'widevineDRMSupport': 'none',
},
'frameRateCapability': [{
'frameRateSupport': '30fps',
'videoQuality': '1440p',
}],
},
'continueWatchingRequired': False,
'dolby': False,
'downloadRequest': False,
'hevc': False,
'kidsSafe': False,
'manufacturer': 'Windows',
'model': 'Windows',
'multiAudioRequired': True,
'osVersion': '10',
'parentalPinValid': True,
'x-apisignatures': self._API_SIGNATURES,
})
status_code = traverse_obj(playback, ('code', {int}))
if status_code == 474:
self.raise_geo_restricted(countries=['IN'])
elif status_code == 1008:
error_msg = 'This content is only available for premium users'
if self._ACCESS_TOKEN:
raise ExtractorError(error_msg, expected=True)
self.raise_login_required(f'{error_msg}. {self._LOGIN_HINT}', method=None)
elif status_code == 400:
raise ExtractorError('The requested content is not available', expected=True)
elif status_code is not None and status_code != 200:
raise ExtractorError(
f'JioCinema says: {traverse_obj(playback, ("message", {str})) or status_code}')
metadata = self._download_json(
f'{self._METADATA_API_BASE}/voot/v1/voot-web/content/query/asset-details',
video_id, fatal=False, query={
'ids': f'include:{video_id}',
'responseType': 'common',
'devicePlatformType': 'desktop',
})
return {
'id': video_id,
'http_headers': self._API_HEADERS,
**self._extract_formats_and_subtitles(playback, video_id),
**traverse_obj(playback, ('data', {
# fallback metadata
'title': ('name', {str}),
'description': ('fullSynopsis', {str}),
'series': ('show', 'name', {str}, filter),
'season': ('tournamentName', {str}, {lambda x: x if x != 'Season 0' else None}),
'season_number': ('episode', 'season', {int_or_none}, filter),
'episode': ('fullTitle', {str}),
'episode_number': ('episode', 'episodeNo', {int_or_none}, filter),
'age_limit': ('ageNemonic', {parse_age_limit}),
'duration': ('totalDuration', {float_or_none}),
'thumbnail': ('images', {url_or_none}),
})),
**traverse_obj(metadata, ('result', 0, {
'title': ('fullTitle', {str}),
'description': ('fullSynopsis', {str}),
'series': ('showName', {str}, filter),
'season': ('seasonName', {str}, filter),
'season_number': ('season', {int_or_none}),
'season_id': ('seasonId', {str}, filter),
'episode': ('fullTitle', {str}),
'episode_number': ('episode', {int_or_none}),
'timestamp': ('uploadTime', {int_or_none}),
'release_date': ('telecastDate', {str}),
'age_limit': ('ageNemonic', {parse_age_limit}),
'duration': ('duration', {float_or_none}),
'genres': ('genres', ..., {str}),
'thumbnail': ('seo', 'ogImage', {url_or_none}),
})),
}
class JioCinemaSeriesIE(JioCinemaBaseIE):
IE_NAME = 'jiocinema:series'
_VALID_URL = r'https?://(?:www\.)?jiocinema\.com/tv-shows/(?P<slug>[\w-]+)/(?P<id>\d{3,})'
_TESTS = [{
'url': 'https://www.jiocinema.com/tv-shows/naagin/3499917',
'info_dict': {
'id': '3499917',
'title': 'naagin',
},
'playlist_mincount': 120,
}, {
'url': 'https://www.jiocinema.com/tv-shows/mtv-splitsvilla-x5/3499820',
'info_dict': {
'id': '3499820',
'title': 'mtv-splitsvilla-x5',
},
'playlist_mincount': 310,
}]
def _entries(self, series_id):
seasons = traverse_obj(self._download_json(
f'{self._METADATA_API_BASE}/voot/v1/voot-web/view/show/{series_id}', series_id,
'Downloading series metadata JSON', query={'responseType': 'common'}), (
'trays', lambda _, v: v['trayId'] == 'season-by-show-multifilter',
'trayTabs', lambda _, v: v['id']))
for season_num, season in enumerate(seasons, start=1):
season_id = season['id']
label = season.get('label') or season_num
for page_num in itertools.count(1):
episodes = traverse_obj(self._download_json(
f'{self._METADATA_API_BASE}/voot/v1/voot-web/content/generic/series-wise-episode',
season_id, f'Downloading season {label} page {page_num} JSON', query={
'sort': 'episode:asc',
'id': season_id,
'responseType': 'common',
'page': page_num,
}), ('result', lambda _, v: v['id'] and url_or_none(v['slug'])))
if not episodes:
break
for episode in episodes:
yield self.url_result(
episode['slug'], JioCinemaIE, **traverse_obj(episode, {
'video_id': 'id',
'video_title': ('fullTitle', {str}),
'season_number': ('season', {int_or_none}),
'episode_number': ('episode', {int_or_none}),
}))
def _real_extract(self, url):
slug, series_id = self._match_valid_url(url).group('slug', 'id')
return self.playlist_result(self._entries(series_id), series_id, slug)
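
The removed `JioCinemaBaseIE._is_token_expired` decodes the token's `exp` claim with `jwt_decode_hs256` and compares it with `int(time.time() - 180)`. Because the 180-second margin is subtracted from the current time, a token only counts as expired once `exp` is at least three minutes in the past; a missing or undecodable token falls back to `exp = 0` and is always treated as expired. The standalone sketch below reproduces that comparison. `jwt_payload` and `make_unsigned_jwt` are hypothetical helpers, and the sketch does not verify the signature.

```python
import base64
import json
import time


def jwt_payload(token):
    """Decode a JWT's payload segment without verifying its signature."""
    payload_b64 = token.split('.')[1]
    payload_b64 += '=' * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def is_token_expired(token, now=None, margin=180):
    """Same comparison as the removed _is_token_expired: exp <= int(now - margin)."""
    now = time.time() if now is None else now
    try:
        exp = jwt_payload(token)['exp']
    except (AttributeError, IndexError, KeyError, TypeError, ValueError):
        exp = 0  # missing or malformed token: treat as long expired
    return exp <= int(now - margin)


def make_unsigned_jwt(payload):
    """Build a test token with a dummy signature (illustration only)."""
    def b64(obj):
        return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b'=').decode()
    header = b64({'alg': 'HS256', 'typ': 'JWT'})
    return f'{header}.{b64(payload)}.sig'


token = make_unsigned_jwt({'exp': 1_000_000})
print(is_token_expired(token, now=1_000_100))  # False: only 100 s past exp
print(is_token_expired(token, now=1_000_200))  # True: 200 s past exp
```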

@@ -1,112 +0,0 @@
import datetime as dt
import urllib.parse
from .common import InfoExtractor
from ..utils import (
clean_html,
datetime_from_str,
unified_timestamp,
urljoin,
)
class JoqrAgIE(InfoExtractor):
IE_DESC = '超!A&G+ 文化放送 (f.k.a. AGQR) Nippon Cultural Broadcasting, Inc. (JOQR)'
_VALID_URL = [r'https?://www\.uniqueradio\.jp/agplayer5/(?:player|inc-player-hls)\.php',
r'https?://(?:www\.)?joqr\.co\.jp/ag/',
r'https?://(?:www\.)?joqr\.co\.jp/qr/ag(?:daily|regular)program/?(?:$|[#?])']
_TESTS = [{
'url': 'https://www.uniqueradio.jp/agplayer5/player.php',
'info_dict': {
'id': 'live',
'title': str,
'channel': '超!A&G+',
'description': str,
'live_status': 'is_live',
'release_timestamp': int,
},
'params': {
'skip_download': True,
'ignore_no_formats_error': True,
},
}, {
'url': 'https://www.uniqueradio.jp/agplayer5/inc-player-hls.php',
'only_matching': True,
}, {
'url': 'https://www.joqr.co.jp/ag/article/103760/',
'only_matching': True,
}, {
'url': 'http://www.joqr.co.jp/qr/agdailyprogram/',
'only_matching': True,
}, {
'url': 'http://www.joqr.co.jp/qr/agregularprogram/',
'only_matching': True,
}]
def _extract_metadata(self, variable, html):
return clean_html(urllib.parse.unquote_plus(self._search_regex(
rf'var\s+{variable}\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
html, 'metadata', group='value', default=''))) or None
def _extract_start_timestamp(self, video_id, is_live):
def extract_start_time_from(date_str):
dt_ = datetime_from_str(date_str) + dt.timedelta(hours=9)
date = dt_.strftime('%Y%m%d')
start_time = self._search_regex(
r'<h3[^>]+\bclass="dailyProgram-itemHeaderTime"[^>]*>[\s\d:]+\s*(\d{1,2}:\d{1,2})',
self._download_webpage(
f'https://www.joqr.co.jp/qr/agdailyprogram/?date={date}', video_id,
note=f'Downloading program list of {date}', fatal=False,
errnote=f'Failed to download program list of {date}') or '',
'start time', default=None)
if start_time:
return unified_timestamp(f'{dt_.strftime("%Y/%m/%d")} {start_time} +09:00')
return None
start_timestamp = extract_start_time_from('today')
if not start_timestamp:
return None
if not is_live or start_timestamp < datetime_from_str('now').timestamp():
return start_timestamp
else:
return extract_start_time_from('yesterday')
def _real_extract(self, url):
video_id = 'live'
metadata = self._download_webpage(
'https://www.uniqueradio.jp/aandg', video_id,
note='Downloading metadata', errnote='Failed to download metadata')
title = self._extract_metadata('Program_name', metadata)
if not title or title == '放送休止':
formats = []
live_status = 'is_upcoming'
release_timestamp = self._extract_start_timestamp(video_id, False)
msg = 'This stream is not currently live'
if release_timestamp:
msg += (' and will start at '
+ dt.datetime.fromtimestamp(release_timestamp).strftime('%Y-%m-%d %H:%M:%S'))
self.raise_no_formats(msg, expected=True)
else:
m3u8_path = self._search_regex(
r'<source\s[^>]*\bsrc="([^"]+)"',
self._download_webpage(
'https://www.uniqueradio.jp/agplayer5/inc-player-hls.php', video_id,
note='Downloading player data', errnote='Failed to download player data'),
'm3u8 url')
formats = self._extract_m3u8_formats(
urljoin('https://www.uniqueradio.jp/', m3u8_path), video_id)
live_status = 'is_live'
release_timestamp = self._extract_start_timestamp(video_id, True)
return {
'id': video_id,
'title': title,
'channel': '超!A&G+',
'description': self._extract_metadata('Program_text', metadata),
'formats': formats,
'live_status': live_status,
'release_timestamp': release_timestamp,
}
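
The removed `JoqrAgIE._extract_start_timestamp` builds the programme-list date by adding nine hours to the current time, then parses the scraped start time with an explicit `+09:00` offset. The two steps are sketched below with the standard library only, assuming `datetime_from_str('today')` yields the current UTC time; the function names are hypothetical.

```python
import datetime as dt

JST = dt.timezone(dt.timedelta(hours=9))


def jst_date(now_utc):
    """Shift a UTC time to Japan's wall clock and format it as YYYYMMDD."""
    return (now_utc + dt.timedelta(hours=9)).strftime('%Y%m%d')


def program_start_timestamp(date_yyyymmdd, hhmm):
    """Combine a schedule date and an 'H:MM' start time (JST) into a Unix timestamp."""
    hour, minute = map(int, hhmm.split(':'))
    day = dt.datetime.strptime(date_yyyymmdd, '%Y%m%d')
    return int(day.replace(hour=hour, minute=minute, tzinfo=JST).timestamp())


now = dt.datetime(2024, 1, 1, 20, 0, tzinfo=dt.timezone.utc)
date = jst_date(now)  # already 2 January in Japan
print(date, program_start_timestamp(date, '5:30'))
```

Without the nine-hour shift, a request made between 15:00 and 24:00 UTC would fetch the previous day's schedule.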

@@ -1,12 +1,12 @@
import functools
import urllib.parse
from .common import InfoExtractor
from ..networking import HEADRequest
from ..utils import (
UserNotLive,
determine_ext,
float_or_none,
int_or_none,
merge_dicts,
parse_iso8601,
str_or_none,
traverse_obj,
@@ -16,21 +16,17 @@ from ..utils import (
class KickBaseIE(InfoExtractor):
def _real_initialize(self):
self._request_webpage(
HEADRequest('https://kick.com/'), None, 'Setting up session', fatal=False, impersonate=True)
xsrf_token = self._get_cookies('https://kick.com/').get('XSRF-TOKEN')
if not xsrf_token:
self.write_debug('kick.com did not set XSRF-TOKEN cookie')
KickBaseIE._API_HEADERS = {
'Authorization': f'Bearer {xsrf_token.value}',
'X-XSRF-TOKEN': xsrf_token.value,
} if xsrf_token else {}
@functools.cached_property
def _api_headers(self):
token = traverse_obj(
self._get_cookies('https://kick.com/'),
('session_token', 'value', {urllib.parse.unquote}))
return {'Authorization': f'Bearer {token}'} if token else {}
def _call_api(self, path, display_id, note='Downloading API JSON', headers={}, **kwargs):
return self._download_json(
f'https://kick.com/api/{path}', display_id, note=note,
headers=merge_dicts(headers, self._API_HEADERS), impersonate=True, **kwargs)
headers={**self._api_headers, **headers}, impersonate=True, **kwargs)
class KickIE(KickBaseIE):
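
The `_api_headers` property in `KickBaseIE` builds a bearer `Authorization` header from the URL-decoded `session_token` cookie and is cached per extractor instance with `functools.cached_property`. In `_call_api`, the per-call `headers` are spread after it, so they take precedence. A minimal sketch using a plain dict of cookies; `KickSession` is hypothetical and only mirrors that logic.

```python
import functools
import urllib.parse


class KickSession:
    """Hypothetical stand-in for KickBaseIE; `cookies` maps cookie names to
    their raw (still URL-encoded) values."""

    def __init__(self, cookies):
        self._cookies = cookies

    @functools.cached_property
    def api_headers(self):
        # Computed on first access, then reused for this instance
        token = self._cookies.get('session_token')
        if token:
            token = urllib.parse.unquote(token)
        return {'Authorization': f'Bearer {token}'} if token else {}

    def request_headers(self, headers=None):
        # Later keys win: per-call headers override the session default
        return {**self.api_headers, **(headers or {})}


session = KickSession({'session_token': '123%7Cabc'})
print(session.request_headers({'Accept': 'application/json'}))
# {'Authorization': 'Bearer 123|abc', 'Accept': 'application/json'}
```

Because the property is cached, a cookie that changes after the first API call is not picked up by the same instance in this sketch.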

@@ -1,358 +0,0 @@
import re
from .common import InfoExtractor
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
determine_ext,
float_or_none,
int_or_none,
smuggle_url,
try_get,
unsmuggle_url,
)
class LimelightBaseIE(InfoExtractor):
_PLAYLIST_SERVICE_URL = 'http://production-ps.lvp.llnw.net/r/PlaylistService/%s/%s/%s'
@classmethod
def _extract_embed_urls(cls, url, webpage):
lm = {
'Media': 'media',
'Channel': 'channel',
'ChannelList': 'channel_list',
}
def smuggle(url):
return smuggle_url(url, {'source_url': url})
entries = []
for kind, video_id in re.findall(
r'LimelightPlayer\.doLoad(Media|Channel|ChannelList)\(["\'](?P<id>[a-z0-9]{32})',
webpage):
entries.append(cls.url_result(
smuggle(f'limelight:{lm[kind]}:{video_id}'),
f'Limelight{kind}', video_id))
for mobj in re.finditer(
# As per [1] class attribute should be exactly equal to
# LimelightEmbeddedPlayerFlash but numerous examples seen
# that don't exactly match it (e.g. [2]).
# 1. http://support.3playmedia.com/hc/en-us/articles/227732408-Limelight-Embedding-the-Captions-Plugin-with-the-Limelight-Player-on-Your-Webpage
# 2. http://www.sedona.com/FacilitatorTraining2017
r'''(?sx)
<object[^>]+class=(["\'])(?:(?!\1).)*\bLimelightEmbeddedPlayerFlash\b(?:(?!\1).)*\1[^>]*>.*?
<param[^>]+
name=(["\'])flashVars\2[^>]+
value=(["\'])(?:(?!\3).)*(?P<kind>media|channel(?:List)?)Id=(?P<id>[a-z0-9]{32})
''', webpage):
kind, video_id = mobj.group('kind'), mobj.group('id')
entries.append(cls.url_result(
smuggle(f'limelight:{kind}:{video_id}'),
f'Limelight{kind.capitalize()}', video_id))
# http://support.3playmedia.com/hc/en-us/articles/115009517327-Limelight-Embedding-the-Audio-Description-Plugin-with-the-Limelight-Player-on-Your-Web-Page)
for video_id in re.findall(
r'(?s)LimelightPlayerUtil\.embed\s*\(\s*{.*?\bmediaId["\']\s*:\s*["\'](?P<id>[a-z0-9]{32})',
webpage):
entries.append(cls.url_result(
smuggle(f'limelight:media:{video_id}'),
LimelightMediaIE.ie_key(), video_id))
return entries
def _call_playlist_service(self, item_id, method, fatal=True, referer=None):
headers = {}
if referer:
headers['Referer'] = referer
try:
return self._download_json(
self._PLAYLIST_SERVICE_URL % (self._PLAYLIST_SERVICE_PATH, item_id, method),
item_id, f'Downloading PlaylistService {method} JSON',
fatal=fatal, headers=headers)
except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 403:
error = self._parse_json(e.cause.response.read().decode(), item_id)['detail']['contentAccessPermission']
if error == 'CountryDisabled':
self.raise_geo_restricted()
raise ExtractorError(error, expected=True)
raise
def _extract(self, item_id, pc_method, mobile_method, referer=None):
pc = self._call_playlist_service(item_id, pc_method, referer=referer)
mobile = self._call_playlist_service(
item_id, mobile_method, fatal=False, referer=referer)
return pc, mobile

    def _extract_info(self, pc, mobile, i, referer):
        get_item = lambda x, y: try_get(x, lambda x: x[y][i], dict) or {}
        pc_item = get_item(pc, 'playlistItems')
        mobile_item = get_item(mobile, 'mediaList')
        video_id = pc_item.get('mediaId') or mobile_item['mediaId']
        title = pc_item.get('title') or mobile_item['title']

        formats = []
        urls = []
        for stream in pc_item.get('streams', []):
            stream_url = stream.get('url')
            if not stream_url or stream_url in urls:
                continue
            if not self.get_param('allow_unplayable_formats') and stream.get('drmProtected'):
                continue
            urls.append(stream_url)
            ext = determine_ext(stream_url)
            if ext == 'f4m':
                formats.extend(self._extract_f4m_formats(
                    stream_url, video_id, f4m_id='hds', fatal=False))
            else:
                fmt = {
                    'url': stream_url,
                    'abr': float_or_none(stream.get('audioBitRate')),
                    'fps': float_or_none(stream.get('videoFrameRate')),
                    'ext': ext,
                }
                width = int_or_none(stream.get('videoWidthInPixels'))
                height = int_or_none(stream.get('videoHeightInPixels'))
                vbr = float_or_none(stream.get('videoBitRate'))
                if width or height or vbr:
                    fmt.update({
                        'width': width,
                        'height': height,
                        'vbr': vbr,
                    })
                else:
                    fmt['vcodec'] = 'none'
                rtmp = re.search(r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+))/(?P<playpath>mp[34]:.+)$', stream_url)
                if rtmp:
                    format_id = 'rtmp'
                    if stream.get('videoBitRate'):
                        format_id += '-%d' % int_or_none(stream['videoBitRate'])
                    http_format_id = format_id.replace('rtmp', 'http')

                    CDN_HOSTS = (
                        ('delvenetworks.com', 'cpl.delvenetworks.com'),
                        ('video.llnw.net', 's2.content.video.llnw.net'),
                    )
                    for cdn_host, http_host in CDN_HOSTS:
                        if cdn_host not in rtmp.group('host').lower():
                            continue
                        http_url = 'http://{}/{}'.format(http_host, rtmp.group('playpath')[4:])
                        urls.append(http_url)
                        if self._is_valid_url(http_url, video_id, http_format_id):
                            http_fmt = fmt.copy()
                            http_fmt.update({
                                'url': http_url,
                                'format_id': http_format_id,
                            })
                            formats.append(http_fmt)
                            break

                    fmt.update({
                        'url': rtmp.group('url'),
                        'play_path': rtmp.group('playpath'),
                        'app': rtmp.group('app'),
                        'ext': 'flv',
                        'format_id': format_id,
                    })
                formats.append(fmt)

        for mobile_url in mobile_item.get('mobileUrls', []):
            media_url = mobile_url.get('mobileUrl')
            format_id = mobile_url.get('targetMediaPlatform')
            if not media_url or media_url in urls:
                continue
            if (format_id in ('Widevine', 'SmoothStreaming')
                    and not self.get_param('allow_unplayable_formats', False)):
                continue
            urls.append(media_url)
            ext = determine_ext(media_url)
            if ext == 'm3u8':
                formats.extend(self._extract_m3u8_formats(
                    media_url, video_id, 'mp4', 'm3u8_native',
                    m3u8_id=format_id, fatal=False))
            elif ext == 'f4m':
                formats.extend(self._extract_f4m_formats(
                    media_url, video_id, f4m_id=format_id, fatal=False))
            else:
                formats.append({
                    'url': media_url,
                    'format_id': format_id,
                    'quality': -10,
                    'ext': ext,
                })

        subtitles = {}
        # mobile_item is {} when the non-fatal mobile playlist request failed
        for flag in mobile_item.get('flags') or []:
            if flag == 'ClosedCaptions':
                closed_captions = self._call_playlist_service(
                    video_id, 'getClosedCaptionsDetailsByMediaId',
                    False, referer) or []
                for cc in closed_captions:
                    cc_url = cc.get('webvttFileUrl')
                    if not cc_url:
                        continue
                    lang = cc.get('languageCode') or self._search_regex(r'/([a-z]{2})\.vtt', cc_url, 'lang', default='en')
                    subtitles.setdefault(lang, []).append({
                        'url': cc_url,
                    })
                break

        get_meta = lambda x: pc_item.get(x) or mobile_item.get(x)

        return {
            'id': video_id,
            'title': title,
            'description': get_meta('description'),
            'formats': formats,
            'duration': float_or_none(get_meta('durationInMilliseconds'), 1000),
            'thumbnail': get_meta('previewImageUrl') or get_meta('thumbnailImageUrl'),
            'subtitles': subtitles,
        }
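The RTMP-to-HTTP fallback in `_extract_info` can be illustrated in isolation. This is a minimal sketch that reuses the RTMP regex and `CDN_HOSTS` table shown above; `rtmp_to_http` is a hypothetical helper name, and, unlike the extractor, it does not probe the candidate URL with `_is_valid_url` before using it.

```python
import re

# Same pattern and host table as LimelightBaseIE._extract_info
RTMP_RE = re.compile(
    r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+))/(?P<playpath>mp[34]:.+)$')
CDN_HOSTS = (
    ('delvenetworks.com', 'cpl.delvenetworks.com'),
    ('video.llnw.net', 's2.content.video.llnw.net'),
)


def rtmp_to_http(stream_url):
    """Hypothetical helper: return the candidate progressive HTTP URL, or None."""
    rtmp = RTMP_RE.search(stream_url)
    if not rtmp:
        return None
    host = rtmp.group('host').lower()
    for cdn_host, http_host in CDN_HOSTS:
        if cdn_host in host:
            # Drop the 4-character 'mp4:' / 'mp3:' prefix of the play path
            return f'http://{http_host}/{rtmp.group("playpath")[4:]}'
    # RTMP URL on a host with no known HTTP mirror
    return None
```

With a made-up URL such as `rtmpe://llnwd.video.llnw.net/a1234/s/path/mp4:media/clip.mp4`, the greedy `app` group stops at the last `/mp4:`, so the play path becomes `media/clip.mp4` and the mirror is `s2.content.video.llnw.net`.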


class LimelightMediaIE(LimelightBaseIE):
    IE_NAME = 'limelight'
    _VALID_URL = r'''(?x)
                        (?:
                            limelight:media:|
                            https?://
                                (?:
                                    link\.videoplatform\.limelight\.com/media/|
                                    assets\.delvenetworks\.com/player/loader\.swf
                                )
                                \?.*?\bmediaId=
                        )
                        (?P<id>[a-z0-9]{32})
                    '''
    _TESTS = [{
        'url': 'http://link.videoplatform.limelight.com/media/?mediaId=3ffd040b522b4485b6d84effc750cd86',
        'info_dict': {
            'id': '3ffd040b522b4485b6d84effc750cd86',
            'ext': 'mp4',
            'title': 'HaP and the HB Prince Trailer',
            'description': 'md5:8005b944181778e313d95c1237ddb640',
            'thumbnail': r're:^https?://.*\.jpeg$',
            'duration': 144.23,
        },
        'params': {
            # m3u8 download
            'skip_download': True,
        },
    }, {
        # video with subtitles
        'url': 'limelight:media:a3e00274d4564ec4a9b29b9466432335',
        'md5': '2fa3bad9ac321e23860ca23bc2c69e3d',
        'info_dict': {
            'id': 'a3e00274d4564ec4a9b29b9466432335',
            'ext': 'mp4',
            'title': '3Play Media Overview Video',
            'thumbnail': r're:^https?://.*\.jpeg$',
            'duration': 78.101,
            # TODO: extract all languages that were accessible via API
            # 'subtitles': 'mincount:9',
            'subtitles': 'mincount:1',
        },
    }, {
        'url': 'https://assets.delvenetworks.com/player/loader.swf?mediaId=8018a574f08d416e95ceaccae4ba0452',
        'only_matching': True,
    }]
    _PLAYLIST_SERVICE_PATH = 'media'

    def _real_extract(self, url):
        url, smuggled_data = unsmuggle_url(url, {})
        video_id = self._match_id(url)
        source_url = smuggled_data.get('source_url')
        self._initialize_geo_bypass({
            'countries': smuggled_data.get('geo_countries'),
        })

        pc, mobile = self._extract(
            video_id, 'getPlaylistByMediaId',
            'getMobilePlaylistByMediaId', source_url)

        return self._extract_info(pc, mobile, 0, source_url)


class LimelightChannelIE(LimelightBaseIE):
    IE_NAME = 'limelight:channel'
    _VALID_URL = r'''(?x)
                        (?:
                            limelight:channel:|
                            https?://
                                (?:
                                    link\.videoplatform\.limelight\.com/media/|
                                    assets\.delvenetworks\.com/player/loader\.swf
                                )
                                \?.*?\bchannelId=
                        )
                        (?P<id>[a-z0-9]{32})
                    '''
    _TESTS = [{
        'url': 'http://link.videoplatform.limelight.com/media/?channelId=ab6a524c379342f9b23642917020c082',
        'info_dict': {
            'id': 'ab6a524c379342f9b23642917020c082',
            'title': 'Javascript Sample Code',
            'description': 'Javascript Sample Code - http://www.delvenetworks.com/sample-code/playerCode-demo.html',
        },
        'playlist_mincount': 3,
    }, {
        'url': 'http://assets.delvenetworks.com/player/loader.swf?channelId=ab6a524c379342f9b23642917020c082',
        'only_matching': True,
    }]
    _PLAYLIST_SERVICE_PATH = 'channel'

    def _real_extract(self, url):
        url, smuggled_data = unsmuggle_url(url, {})
        channel_id = self._match_id(url)
        source_url = smuggled_data.get('source_url')

        pc, mobile = self._extract(
            channel_id, 'getPlaylistByChannelId',
            'getMobilePlaylistWithNItemsByChannelId?begin=0&count=-1',
            source_url)

        entries = [
            self._extract_info(pc, mobile, i, source_url)
            for i in range(len(pc['playlistItems']))]

        # mobile is False when the non-fatal mobile playlist request failed
        return self.playlist_result(
            entries, channel_id, pc.get('title'), (mobile or {}).get('description'))


class LimelightChannelListIE(LimelightBaseIE):
    IE_NAME = 'limelight:channel_list'
    _VALID_URL = r'''(?x)
                        (?:
                            limelight:channel_list:|
                            https?://
                                (?:
                                    link\.videoplatform\.limelight\.com/media/|
                                    assets\.delvenetworks\.com/player/loader\.swf
                                )
                                \?.*?\bchannelListId=
                        )
                        (?P<id>[a-z0-9]{32})
                    '''
    _TESTS = [{
        'url': 'http://link.videoplatform.limelight.com/media/?channelListId=301b117890c4465c8179ede21fd92e2b',
        'info_dict': {
            'id': '301b117890c4465c8179ede21fd92e2b',
            'title': 'Website - Hero Player',
        },
        'playlist_mincount': 2,
    }, {
        'url': 'https://assets.delvenetworks.com/player/loader.swf?channelListId=301b117890c4465c8179ede21fd92e2b',
        'only_matching': True,
    }]
    _PLAYLIST_SERVICE_PATH = 'channel_list'

    def _real_extract(self, url):
        channel_list_id = self._match_id(url)

        channel_list = self._call_playlist_service(
            channel_list_id, 'getMobileChannelListById')

        entries = [
            self.url_result('limelight:channel:{}'.format(channel['id']), 'LimelightChannel')
            for channel in channel_list['channelList']]

        return self.playlist_result(
            entries, channel_list_id, channel_list['title'])
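The `LimelightPlayerUtil.embed` detection in `_extract_embed_urls` is a single non-greedy regex scan, which turns each match into a `limelight:media:<id>` pseudo-URL that `LimelightMediaIE` accepts. The sketch below copies that regex verbatim; `find_media_embeds` and the sample page are hypothetical, and the real method additionally smuggles the source URL into each result.

```python
import re

# Regex copied from LimelightBaseIE._extract_embed_urls; (?s) lets .*? cross newlines
EMBED_RE = r'(?s)LimelightPlayerUtil\.embed\s*\(\s*{.*?\bmediaId["\']\s*:\s*["\'](?P<id>[a-z0-9]{32})'


def find_media_embeds(webpage):
    """Hypothetical helper: map every embed call to a limelight:media: pseudo-URL."""
    # With a single capture group, re.findall returns the captured IDs directly
    return [f'limelight:media:{media_id}' for media_id in re.findall(EMBED_RE, webpage)]


# Made-up page snippet for illustration
page = '''<script>
LimelightPlayerUtil.embed({
    "playerId": "p1",
    "mediaId": "3ffd040b522b4485b6d84effc750cd86",
});
</script>'''
```

Because `.*?` is non-greedy, each match stops at the first `mediaId` key after the opening `{`, so unrelated keys such as `playerId` are skipped.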

@ -134,7 +134,7 @@ class LRTRadioIE(LRTBaseIE):
def _real_extract(self, url):
video_id, path = self._match_valid_url(url).group('id', 'path')
media = self._download_json(
'https://www.lrt.lt/radioteka/api/media', video_id,
'https://www.lrt.lt/rest-api/media', video_id,
query={'url': f'/mediateka/irasas/{video_id}/{path}'})
return {

@ -167,11 +167,11 @@ class LSMLTVEmbedIE(InfoExtractor):
'duration': 1442,
'upload_date': '20231121',
'title': 'D23-6000-105_cetstud',
'thumbnail': 'https://store.cloudycdn.services/tmsp00060/assets/media/660858/placeholder1700589200.jpg',
'thumbnail': 'https://store.bstrm.net/tmsp00060/assets/media/660858/placeholder1700589200.jpg',
},
}, {
'url': 'https://ltv.lsm.lv/embed?enablesdkjs=1&c=eyJpdiI6IncwVzZmUFk2MU12enVWK1I3SUcwQ1E9PSIsInZhbHVlIjoid3FhV29vamc3T2sxL1RaRmJ5Rm1GTXozU0o2dVczdUtLK0cwZEZJMDQ2a3ZIRG5DK2pneGlnbktBQy9uazVleHN6VXhxdWIweWNvcHRDSnlISlNYOHlVZ1lpcTUrcWZSTUZPQW14TVdkMW9aOUtRWVNDcFF4eWpHNGcrT0VZbUNFQStKQk91cGpndW9FVjJIa0lpbkh3PT0iLCJtYWMiOiIyZGI1NDJlMWRlM2QyMGNhOGEwYTM2MmNlN2JlOGRhY2QyYjdkMmEzN2RlOTEzYTVkNzI1ODlhZDlhZjU4MjQ2IiwidGFnIjoiIn0=',
'md5': 'a1711e190fe680fdb68fd8413b378e87',
'md5': 'f236cef2fd5953612754e4e66be51e7a',
'info_dict': {
'id': 'wUnFArIPDSY',
'ext': 'mp4',
@ -198,6 +198,8 @@ class LSMLTVEmbedIE(InfoExtractor):
'uploader_url': 'https://www.youtube.com/@LTV16plus',
'like_count': int,
'description': 'md5:7ff0c42ba971e3c13e4b8a2ff03b70b5',
'media_type': 'livestream',
'timestamp': 1652550741,
},
}]
@ -208,7 +210,7 @@ class LSMLTVEmbedIE(InfoExtractor):
r'window\.ltvEmbedPayload\s*=', webpage, 'embed json', video_id)
embed_type = traverse_obj(data, ('source', 'name', {str}))
if embed_type == 'telia':
if embed_type in ('backscreen', 'telia'): # 'telia' only for backwards compat
ie_key = 'CloudyCDN'
embed_url = traverse_obj(data, ('source', 'embed_url', {url_or_none}))
elif embed_type == 'youtube':
@ -226,9 +228,9 @@ class LSMLTVEmbedIE(InfoExtractor):
class LSMReplayIE(InfoExtractor):
_VALID_URL = r'https?://replay\.lsm\.lv/[^/?#]+/(?:ieraksts|statja)/[^/?#]+/(?P<id>\d+)'
_VALID_URL = r'https?://replay\.lsm\.lv/[^/?#]+/(?:skaties/|klausies/)?(?:ieraksts|statja)/[^/?#]+/(?P<id>\d+)'
_TESTS = [{
'url': 'https://replay.lsm.lv/lv/ieraksts/ltv/311130/4-studija-zolitudes-tragedija-un-incupes-stacija',
'url': 'https://replay.lsm.lv/lv/skaties/ieraksts/ltv/311130/4-studija-zolitudes-tragedija-un-incupes-stacija',
'md5': '64f72a360ca530d5ed89c77646c9eee5',
'info_dict': {
'id': '46k_d23-6000-105',
@ -241,20 +243,23 @@ class LSMReplayIE(InfoExtractor):
'thumbnail': 'https://ltv.lsm.lv/storage/media/8/7/large/5/1f9604e1.jpg',
},
}, {
'url': 'https://replay.lsm.lv/lv/ieraksts/lr/183522/138-nepilniga-kompensejamo-zalu-sistema-pat-menesiem-dzena-pacientus-pa-aptiekam',
'md5': '719b33875cd1429846eeeaeec6df2830',
'url': 'https://replay.lsm.lv/lv/klausies/ieraksts/lr/183522/138-nepilniga-kompensejamo-zalu-sistema-pat-menesiem-dzena-pacientus-pa-aptiekam',
'md5': '84feb80fd7e6ec07744726a9f01cda4d',
'info_dict': {
'id': 'a342781',
'ext': 'mp3',
'id': '183522',
'ext': 'm4a',
'duration': 1823,
'title': '#138 Nepilnīgā kompensējamo zāļu sistēma pat mēnešiem dzenā pacientus pa aptiekām',
'thumbnail': 'https://pic.latvijasradio.lv/public/assets/media/9/d/large_fd4675ac.jpg',
'upload_date': '20231102',
'timestamp': 1698921060,
'timestamp': 1698913860,
'description': 'md5:7bac3b2dd41e44325032943251c357b1',
},
}, {
'url': 'https://replay.lsm.lv/ru/statja/ltv/311130/4-studija-zolitudes-tragedija-un-incupes-stacija',
'url': 'https://replay.lsm.lv/ru/skaties/statja/ltv/355067/v-kengaragse-nacalas-ukladka-relsov',
'only_matching': True,
}, {
'url': 'https://replay.lsm.lv/lv/ieraksts/ltv/311130/4-studija-zolitudes-tragedija-un-incupes-stacija',
'only_matching': True,
}]
@ -267,12 +272,24 @@ class LSMReplayIE(InfoExtractor):
data = self._search_nuxt_data(
self._fix_nuxt_data(webpage), video_id, context_name='__REPLAY__')
playback_type = data['playback']['type']
if playback_type == 'playable_audio_lr':
playback_data = {
'formats': self._extract_m3u8_formats(data['playback']['service']['hls_url'], video_id),
}
elif playback_type == 'embed':
playback_data = {
'_type': 'url_transparent',
'url': data['playback']['service']['url'],
}
else:
raise ExtractorError(f'Unsupported playback type "{playback_type}"')
return {
'_type': 'url_transparent',
'id': video_id,
**playback_data,
**traverse_obj(data, {
'url': ('playback', 'service', 'url', {url_or_none}),
'title': ('mediaItem', 'title'),
'description': ('mediaItem', ('lead', 'body')),
'duration': ('mediaItem', 'duration', {int_or_none}),
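The `LSMReplayIE` URL change adds an optional `skaties/` or `klausies/` path segment while still accepting the older URL form. A quick way to check this is to run the pattern that contains the optional segment, copied from the hunk above, against the test URLs; `lsm_replay_id` is a hypothetical helper name.

```python
import re

# The LSMReplayIE._VALID_URL variant with the optional skaties/klausies segment
LSM_REPLAY_RE = re.compile(
    r'https?://replay\.lsm\.lv/[^/?#]+/(?:skaties/|klausies/)?(?:ieraksts|statja)/[^/?#]+/(?P<id>\d+)')


def lsm_replay_id(url):
    """Hypothetical helper: return the numeric ID, or None if the URL does not match."""
    m = LSM_REPLAY_RE.match(url)
    return m.group('id') if m else None
```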

yt_dlp/extractor/mave.py

@ -0,0 +1,107 @@
import re

from .common import InfoExtractor
from ..utils import (
    clean_html,
    int_or_none,
    parse_iso8601,
    urljoin,
)
from ..utils.traversal import require, traverse_obj


class MaveIE(InfoExtractor):
    _VALID_URL = r'https?://(?P<channel>[\w-]+)\.mave\.digital/(?P<id>ep-\d+)'
    _TESTS = [{
        'url': 'https://ochenlichnoe.mave.digital/ep-25',
        'md5': 'aa3e513ef588b4366df1520657cbc10c',
        'info_dict': {
            'id': '4035f587-914b-44b6-aa5a-d76685ad9bc2',
            'ext': 'mp3',
            'display_id': 'ochenlichnoe-ep-25',
            'title': 'Между мной и миром: психология самооценки',
            'description': 'md5:4b7463baaccb6982f326bce5c700382a',
            'uploader': 'Самарский университет',
            'channel': 'Очень личное',
            'channel_id': 'ochenlichnoe',
            'channel_url': 'https://ochenlichnoe.mave.digital/',
            'view_count': int,
            'like_count': int,
            'dislike_count': int,
            'duration': 3744,
            'thumbnail': r're:https://.+/storage/podcasts/.+\.jpg',
            'series': 'Очень личное',
            'series_id': '2e0c3749-6df2-4946-82f4-50691419c065',
            'season': 'Season 3',
            'season_number': 3,
            'episode': 'Episode 3',
            'episode_number': 3,
            'timestamp': 1747817300,
            'upload_date': '20250521',
        },
    }, {
        'url': 'https://budem.mave.digital/ep-12',
        'md5': 'e1ce2780fcdb6f17821aa3ca3e8c919f',
        'info_dict': {
            'id': '41898bb5-ff57-4797-9236-37a8e537aa21',
            'ext': 'mp3',
            'display_id': 'budem-ep-12',
            'title': 'Екатерина Михайлова: "Горе от ума" не про женщин написана',
            'description': 'md5:fa3bdd59ee829dfaf16e3efcb13f1d19',
            'uploader': 'Полина Цветкова+Евгения Акопова',
            'channel': 'Все там будем',
            'channel_id': 'budem',
            'channel_url': 'https://budem.mave.digital/',
            'view_count': int,
            'like_count': int,
            'dislike_count': int,
            'age_limit': 18,
            'duration': 3664,
            'thumbnail': r're:https://.+/storage/podcasts/.+\.jpg',
            'series': 'Все там будем',
            'series_id': 'fe9347bf-c009-4ebd-87e8-b06f2f324746',
            'season': 'Season 2',
            'season_number': 2,
            'episode': 'Episode 5',
            'episode_number': 5,
            'timestamp': 1735538400,
            'upload_date': '20241230',
        },
    }]
    _API_BASE_URL = 'https://api.mave.digital/'

    def _real_extract(self, url):
        channel_id, slug = self._match_valid_url(url).group('channel', 'id')
        display_id = f'{channel_id}-{slug}'
        webpage = self._download_webpage(url, display_id)
        data = traverse_obj(
            self._search_nuxt_json(webpage, display_id),
            ('data', lambda _, v: v['activeEpisodeData'], any, {require('podcast data')}))

        return {
            'display_id': display_id,
            'channel_id': channel_id,
            'channel_url': f'https://{channel_id}.mave.digital/',
            'vcodec': 'none',
            'thumbnail': re.sub(r'_\d+(?=\.(?:jpg|png))', '', self._og_search_thumbnail(webpage, default='')) or None,
            **traverse_obj(data, ('activeEpisodeData', {
                'url': ('audio', {urljoin(self._API_BASE_URL)}),
                'id': ('id', {str}),
                'title': ('title', {str}),
                'description': ('description', {clean_html}),
                'duration': ('duration', {int_or_none}),
                'season_number': ('season', {int_or_none}),
                'episode_number': ('number', {int_or_none}),
                'view_count': ('listenings', {int_or_none}),
                'like_count': ('reactions', lambda _, v: v['type'] == 'like', 'count', {int_or_none}, any),
                'dislike_count': ('reactions', lambda _, v: v['type'] == 'dislike', 'count', {int_or_none}, any),
                'age_limit': ('is_explicit', {bool}, {lambda x: 18 if x else None}),
                'timestamp': ('publish_date', {parse_iso8601}),
            })),
            **traverse_obj(data, ('podcast', 'podcast', {
                'series_id': ('id', {str}),
                'series': ('title', {str}),
                'channel': ('title', {str}),
                'uploader': ('author', {str}),
            })),
        }
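`MaveIE` derives the full-size thumbnail by deleting a numeric size suffix from the `og:image` URL. The sketch below applies the same substitution on its own; `full_size_thumbnail` and the example URLs are hypothetical. The lookahead `(?=\.(?:jpg|png))` keeps the extension in place. Note that any `_<digits>` directly before `.jpg`/`.png` is removed, even if it belongs to the real file name.

```python
import re


def full_size_thumbnail(url):
    """Hypothetical helper mirroring MaveIE's thumbnail clean-up."""
    # Remove "_<digits>" only when directly followed by .jpg or .png;
    # an empty or missing URL becomes None, as in the extractor
    return re.sub(r'_\d+(?=\.(?:jpg|png))', '', url or '') or None
```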

@ -0,0 +1,37 @@
from .common import InfoExtractor
from ..utils import parse_qs, url_or_none
from ..utils.traversal import require, traverse_obj


class Mir24TvIE(InfoExtractor):
    IE_NAME = 'mir24.tv'
    _VALID_URL = r'https?://(?:www\.)?mir24\.tv/news/(?P<id>[0-9]+)/[^/?#]+'
    _TESTS = [{
        'url': 'https://mir24.tv/news/16635210/dni-kultury-rossii-otkrylis-v-uzbekistane.-na-prazdnichnom-koncerte-vystupili-zvezdy-rossijskoj-estrada',
        'info_dict': {
            'id': '16635210',
            'title': 'Дни культуры России открылись в Узбекистане. На праздничном концерте выступили звезды российской эстрады',
            'ext': 'mp4',
            'thumbnail': r're:https://images\.mir24\.tv/.+\.jpg',
        },
    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id, impersonate=True)

        iframe_url = self._search_regex(
            r'<iframe\b[^>]+\bsrc=["\'](https?://mir24\.tv/players/[^"\']+)',
            webpage, 'iframe URL')

        m3u8_url = traverse_obj(iframe_url, (
            {parse_qs}, 'source', -1, {self._proto_relative_url}, {url_or_none}, {require('m3u8 URL')}))
        formats, subtitles = self._extract_m3u8_formats_and_subtitles(m3u8_url, video_id, 'mp4', m3u8_id='hls')

        return {
            'id': video_id,
            'title': self._og_search_title(webpage, default=None) or self._html_extract_title(webpage),
            'thumbnail': self._og_search_thumbnail(webpage, default=None),
            'formats': formats,
            'subtitles': subtitles,
        }
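`Mir24TvIE` reads the HLS manifest URL from the `source` query parameter of the player iframe. It takes the last value and resolves it if it is protocol-relative. The sketch below uses only the standard library. `m3u8_from_player_url`, the default `https` scheme, and the sample player URL are assumptions rather than yt-dlp behaviour, and the final check is deliberately stricter than `url_or_none`, accepting only http(s).

```python
from urllib.parse import parse_qs, urlparse


def m3u8_from_player_url(iframe_url, scheme='https'):
    """Hypothetical helper: last 'source' query value, made absolute."""
    sources = parse_qs(urlparse(iframe_url).query).get('source')
    if not sources:
        return None
    source = sources[-1]  # same "-1" choice as the traverse_obj path
    if source.startswith('//'):
        # Protocol-relative URL: prepend an (assumed) scheme
        source = f'{scheme}:{source}'
    return source if source.startswith(('http://', 'https://')) else None
```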

yt_dlp/extractor/mixlr.py

@ -0,0 +1,134 @@
from .common import InfoExtractor
from ..networking import HEADRequest
from ..utils import int_or_none, parse_iso8601, url_or_none, urlhandle_detect_ext
from ..utils.traversal import traverse_obj


class MixlrIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?(?P<username>[\w-]+)\.mixlr\.com/events/(?P<id>\d+)'
    _TESTS = [{
        'url': 'https://suncity-104-9fm.mixlr.com/events/4387115',
        'info_dict': {
            'id': '4387115',
            'ext': 'mp3',
            'title': r're:SUNCITY 104.9FM\'s live audio \d{4}-\d{2}-\d{2} \d{2}:\d{2}',
            'uploader': 'suncity-104-9fm',
            'like_count': int,
            'thumbnail': r're:https://imagecdn\.mixlr\.com/cdn-cgi/image/[^/?#]+/cd5b34d05fa2cee72d80477724a2f02e.png',
            'timestamp': 1751943773,
            'upload_date': '20250708',
            'release_timestamp': 1751943764,
            'release_date': '20250708',
            'live_status': 'is_live',
        },
    }, {
        'url': 'https://brcountdown.mixlr.com/events/4395480',
        'info_dict': {
            'id': '4395480',
            'ext': 'aac',
            'title': r're:Beats Revolution Countdown Episodio 461 \d{4}-\d{2}-\d{2} \d{2}:\d{2}',
            'description': 'md5:5cacd089723f7add3f266bd588315bb3',
            'uploader': 'brcountdown',
            'like_count': int,
            'thumbnail': r're:https://imagecdn\.mixlr\.com/cdn-cgi/image/[^/?#]+/c48727a59f690b87a55d47d123ba0d6d.jpg',
            'timestamp': 1752354007,
            'upload_date': '20250712',
            'release_timestamp': 1752354000,
            'release_date': '20250712',
            'live_status': 'is_live',
        },
    }, {
        'url': 'https://www.brcountdown.mixlr.com/events/4395480',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        username, event_id = self._match_valid_url(url).group('username', 'id')

        broadcast_info = self._download_json(
            f'https://api.mixlr.com/v3/channels/{username}/events/{event_id}', event_id)

        formats = []
        format_url = traverse_obj(
            broadcast_info, ('included', 0, 'attributes', 'progressive_stream_url', {url_or_none}))
        if format_url:
            urlh = self._request_webpage(
                HEADRequest(format_url), event_id, fatal=False, note='Checking stream')
            if urlh and urlh.status == 200:
                ext = urlhandle_detect_ext(urlh)
                if ext == 'octet-stream':
                    self.report_warning(
                        'The server did not return a valid file extension for the stream URL. '
                        'Assuming an mp3 stream; postprocessing may fail if this is incorrect')
                    ext = 'mp3'
                formats.append({
                    'url': format_url,
                    'ext': ext,
                    'vcodec': 'none',
                })

        release_timestamp = traverse_obj(
            broadcast_info, ('data', 'attributes', 'starts_at', {str}))
        if not formats and release_timestamp:
            self.raise_no_formats(f'This event will start at {release_timestamp}', expected=True)

        return {
            'id': event_id,
            'uploader': username,
            'formats': formats,
            'release_timestamp': parse_iso8601(release_timestamp),
            **traverse_obj(broadcast_info, ('included', 0, 'attributes', {
                'title': ('title', {str}),
                'timestamp': ('started_at', {parse_iso8601}),
                'concurrent_view_count': ('concurrent_view_count', {int_or_none}),
                'like_count': ('heart_count', {int_or_none}),
                'is_live': ('live', {bool}),
            })),
            **traverse_obj(broadcast_info, ('data', 'attributes', {
                'title': ('title', {str}),
                'description': ('description', {str}),
                'timestamp': ('started_at', {parse_iso8601}),
                'concurrent_view_count': ('concurrent_view_count', {int_or_none}),
                'like_count': ('heart_count', {int_or_none}),
                'thumbnail': ('artwork_url', {url_or_none}),
                'uploader_id': ('broadcaster_id', {str}),
            })),
        }


class MixlrRecoringIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?(?P<username>[\w-]+)\.mixlr\.com/recordings/(?P<id>\d+)'
    _TESTS = [{
        'url': 'https://biblewayng.mixlr.com/recordings/2375193',
        'info_dict': {
            'id': '2375193',
            'ext': 'mp3',
            'title': "God's Jewels and Their Resting Place Bro. Adeniji",
            'description': 'Preached February 21, 2024 in the evening',
            'uploader_id': '8659190',
            'duration': 10968,
            'thumbnail': r're:https://imagecdn\.mixlr\.com/cdn-cgi/image/[^/?#]+/ceca120ef707f642abeea6e29cd74238.jpg',
            'timestamp': 1708544542,
            'upload_date': '20240221',
        },
    }]

    def _real_extract(self, url):
        username, recording_id = self._match_valid_url(url).group('username', 'id')

        recording_info = self._download_json(
            f'https://api.mixlr.com/v3/channels/{username}/recordings/{recording_id}', recording_id)

        return {
            'id': recording_id,
            **traverse_obj(recording_info, ('data', 'attributes', {
                'ext': ('file_format', {str}),
                'url': ('url', {url_or_none}),
                'title': ('title', {str}),
                'description': ('description', {str}),
                'timestamp': ('created_at', {parse_iso8601}),
                'duration': ('duration', {int_or_none}),
                'thumbnail': ('artwork_url', {url_or_none}),
                'uploader_id': ('user_id', {str}),
            })),
        }
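In `MixlrIE`, the stream's file extension comes from a HEAD request. When the server only reports a generic binary type (`octet-stream`), the extractor warns and falls back to `mp3`. The sketch below reproduces only that decision, working on a `Content-Type` string. `MIME_TO_EXT` is a deliberately tiny, hypothetical table and is not yt-dlp's `mimetype2ext`. Unlike the extractor, the sketch also applies the fallback when the header is empty.

```python
import warnings

# Hypothetical, minimal MIME table; yt-dlp's mimetype2ext covers far more types
MIME_TO_EXT = {
    'audio/mpeg': 'mp3',
    'audio/aac': 'aac',
    'application/octet-stream': 'octet-stream',
}


def stream_ext(content_type, default='mp3'):
    """Pick an extension from a Content-Type value, falling back to `default`."""
    mime = (content_type or '').split(';')[0].strip().lower()
    # Unknown types fall back to the MIME subtype, e.g. audio/ogg -> ogg
    ext = MIME_TO_EXT.get(mime, mime.rpartition('/')[2] or None)
    if ext in (None, 'octet-stream'):
        warnings.warn(f'No usable extension in {content_type!r}; assuming {default}')
        ext = default
    return ext
```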

@ -1,53 +1,72 @@
import re
from .common import InfoExtractor
from ..utils import ExtractorError
from ..utils import (
clean_html,
parse_iso8601,
parse_qs,
url_or_none,
)
from ..utils.traversal import require, traverse_obj
class NewsPicksIE(InfoExtractor):
_VALID_URL = r'https?://newspicks\.com/movie-series/(?P<channel_id>\d+)\?movieId=(?P<id>\d+)'
_VALID_URL = r'https?://newspicks\.com/movie-series/(?P<id>[^?/#]+)'
_TESTS = [{
'url': 'https://newspicks.com/movie-series/11?movieId=1813',
'url': 'https://newspicks.com/movie-series/11/?movieId=1813',
'info_dict': {
'id': '1813',
'title': '日本の課題を破壊せよ【ゲスト:成田悠輔】',
'description': 'md5:09397aad46d6ded6487ff13f138acadf',
'channel': 'HORIE ONE',
'channel_id': '11',
'release_date': '20220117',
'thumbnail': r're:https://.+jpg',
'ext': 'mp4',
'title': '日本の課題を破壊せよ【ゲスト:成田悠輔】',
'cast': 'count:4',
'description': 'md5:09397aad46d6ded6487ff13f138acadf',
'duration': 2940,
'release_date': '20220117',
'release_timestamp': 1642424400,
'series': 'HORIE ONE',
'series_id': '11',
'thumbnail': r're:https?://resources\.newspicks\.com/.+\.(?:jpe?g|png)',
'timestamp': 1642424420,
'upload_date': '20220117',
},
}, {
'url': 'https://newspicks.com/movie-series/158/?movieId=3932',
'info_dict': {
'id': '3932',
'ext': 'mp4',
'title': '【検証】専門家は、KADOKAWAをどう見るか',
'cast': 'count:3',
'description': 'md5:2c2d4bf77484a4333ec995d676f9a91d',
'duration': 1320,
'release_date': '20240622',
'release_timestamp': 1719088080,
'series': 'NPレポート',
'series_id': '158',
'thumbnail': r're:https?://resources\.newspicks\.com/.+\.(?:jpe?g|png)',
'timestamp': 1719086400,
'upload_date': '20240622',
},
}]
def _real_extract(self, url):
video_id, channel_id = self._match_valid_url(url).group('id', 'channel_id')
series_id = self._match_id(url)
video_id = traverse_obj(parse_qs(url), ('movieId', -1, {str}, {require('movie ID')}))
webpage = self._download_webpage(url, video_id)
entries = self._parse_html5_media_entries(
url, webpage.replace('movie-for-pc', 'movie'), video_id, 'hls')
if not entries:
raise ExtractorError('No HTML5 media elements found')
info = entries[0]
title = self._html_search_meta('og:title', webpage, fatal=False)
description = self._html_search_meta(
('og:description', 'twitter:title'), webpage, fatal=False)
channel = self._html_search_regex(
r'value="11".+?<div\s+class="title">(.+?)</div', webpage, 'channel name', fatal=False)
if not title or not channel:
title, channel = re.split(r'\s*|\s*', self._html_extract_title(webpage))
fragment = self._search_nextjs_data(webpage, video_id)['props']['pageProps']['fragment']
m3u8_url = traverse_obj(fragment, ('movie', 'movieUrl', {url_or_none}, {require('m3u8 URL')}))
formats, subtitles = self._extract_m3u8_formats_and_subtitles(m3u8_url, video_id, 'mp4')
release_date = self._search_regex(
r'<span\s+class="on-air-date">\s*(\d+)年(\d+)月(\d+)日\s*</span>',
webpage, 'release date', fatal=False, group=(1, 2, 3))
info.update({
return {
'id': video_id,
'title': title,
'description': description,
'channel': channel,
'channel_id': channel_id,
'release_date': ('%04d%02d%02d' % tuple(map(int, release_date))) if release_date else None,
})
return info
'formats': formats,
'series': traverse_obj(fragment, ('series', 'title', {str})),
'series_id': series_id,
'subtitles': subtitles,
**traverse_obj(fragment, ('movie', {
'title': ('title', {str}),
'cast': ('relatedUsers', ..., 'displayName', {str}, filter, all, filter),
'description': ('explanation', {clean_html}),
'release_timestamp': ('onAirStartDate', {parse_iso8601}),
'thumbnail': (('image', 'coverImageUrl'), {url_or_none}, any),
'timestamp': ('published', {parse_iso8601}),
})),
}
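The reworked `NewsPicksIE` takes the series ID from the URL path and the movie ID from the `movieId` query parameter, failing if the parameter is absent. The following is a stdlib-only sketch of that split. `newspicks_ids` is a hypothetical name, and the `ValueError` merely stands in for the extractor error that `require('movie ID')` raises.

```python
import re
from urllib.parse import parse_qs, urlparse

# The updated NewsPicksIE._VALID_URL: the path segment is the series ID
VALID_URL = r'https?://newspicks\.com/movie-series/(?P<id>[^?/#]+)'


def newspicks_ids(url):
    """Hypothetical helper: return (series_id, movie_id), or None for other URLs."""
    m = re.match(VALID_URL, url)
    if not m:
        return None
    movie_ids = parse_qs(urlparse(url).query).get('movieId')
    if not movie_ids:
        raise ValueError('Unable to extract movie ID')
    # Like traverse_obj(..., ('movieId', -1)), use the last occurrence
    return m.group('id'), movie_ids[-1]
```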

@ -8,6 +8,8 @@ from ..utils import (
get_element_by_class,
int_or_none,
join_nonempty,
make_archive_id,
orderedSet,
parse_duration,
remove_end,
traverse_obj,
@ -16,6 +18,7 @@ from ..utils import (
unified_timestamp,
url_or_none,
urljoin,
variadic,
)
@ -495,7 +498,7 @@ class NhkForSchoolBangumiIE(InfoExtractor):
chapters = None
if chapter_durations and chapter_titles and len(chapter_durations) == len(chapter_titles):
start_time = chapter_durations
end_time = chapter_durations[1:] + [duration]
end_time = [*chapter_durations[1:], duration]
chapters = [{
'start_time': s,
'end_time': e,
@ -591,102 +594,179 @@ class NhkRadiruIE(InfoExtractor):
IE_DESC = 'NHK らじる (Radiru/Rajiru)'
_VALID_URL = r'https?://www\.nhk\.or\.jp/radio/(?:player/ondemand|ondemand/detail)\.html\?p=(?P<site>[\da-zA-Z]+)_(?P<corner>[\da-zA-Z]+)(?:_(?P<headline>[\da-zA-Z]+))?'
_TESTS = [{
'url': 'https://www.nhk.or.jp/radio/player/ondemand.html?p=0449_01_4003239',
'skip': 'Episode expired on 2024-06-09',
'url': 'https://www.nhk.or.jp/radio/player/ondemand.html?p=LG96ZW5KZ4_01_4251382',
'skip': 'Episode expires on 2025-07-14',
'info_dict': {
'title': 'ジャズ・トゥナイト ジャズ「Night and Day」特集',
'id': '0449_01_4003239',
'title': 'クラシックの庭\u3000特集「ドボルザークを聴く」(1)交響曲を中心に',
'id': 'LG96ZW5KZ4_01_4251382',
'ext': 'm4a',
'uploader': 'NHK FM 東京',
'description': 'md5:ad05f3c3f3f6e99b2e69f9b5e49551dc',
'series': 'ジャズ・トゥナイト',
'channel': 'NHK FM 東京',
'thumbnail': 'https://www.nhk.or.jp/prog/img/449/g449.jpg',
'upload_date': '20240601',
'series_id': '0449_01',
'release_date': '20240601',
'timestamp': 1717257600,
'release_timestamp': 1717250400,
'description': 'md5:652d3c38a25b77959c716421eba1617a',
'uploader': 'NHK FM・東京',
'channel': 'NHK FM・東京',
'duration': 6597.0,
'thumbnail': 'https://www.nhk.jp/static/assets/images/radioseries/rs/LG96ZW5KZ4/LG96ZW5KZ4-eyecatch_a67c6e949325016c0724f2ed3eec8a2f.jpg',
'categories': ['音楽', 'クラシック・オペラ'],
'cast': ['田添菜穂子'],
'series': 'クラシックの庭',
'series_id': 'LG96ZW5KZ4',
'episode': '特集「ドボルザークを聴く」(1)交響曲を中心に',
'episode_id': 'QP1Q2ZXZY3',
'timestamp': 1751871000,
'upload_date': '20250707',
'release_timestamp': 1751864403,
'release_date': '20250707',
},
}, {
# playlist, airs every weekday so it should _hopefully_ be okay forever
'url': 'https://www.nhk.or.jp/radio/ondemand/detail.html?p=0458_01',
'url': 'https://www.nhk.or.jp/radio/ondemand/detail.html?p=Z9L1V2M24L_01',
'info_dict': {
'id': '0458_01',
'id': 'Z9L1V2M24L_01',
'title': 'ベストオブクラシック',
'description': '世界中の上質な演奏会をじっくり堪能する本格派クラシック番組。',
'thumbnail': 'https://www.nhk.or.jp/prog/img/458/g458.jpg',
'series_id': '0458_01',
'thumbnail': 'https://www.nhk.jp/static/assets/images/radioseries/rs/Z9L1V2M24L/Z9L1V2M24L-eyecatch_83ed28b4782907998875965fee60a351.jpg',
'series_id': 'Z9L1V2M24L_01',
'uploader': 'NHK FM',
'channel': 'NHK FM',
'series': 'ベストオブクラシック',
},
'playlist_mincount': 3,
}, {
# one with letters in the id
'url': 'https://www.nhk.or.jp/radio/player/ondemand.html?p=F683_01_3910688',
'note': 'Expires on 2025-03-31',
'info_dict': {
'id': 'F683_01_3910688',
'ext': 'm4a',
'title': '夏目漱石「文鳥」第1回',
'series': '【らじる文庫】夏目漱石「文鳥」全4回',
'series_id': 'F683_01',
'description': '朗読:浅井理アナウンサー',
'thumbnail': 'https://www.nhk.or.jp/radioondemand/json/F683/img/roudoku_05_rod_640.jpg',
'upload_date': '20240106',
'release_date': '20240106',
'uploader': 'NHK R1',
'release_timestamp': 1704511800,
'channel': 'NHK R1',
'timestamp': 1704512700,
},
'expected_warnings': ['Unable to download JSON metadata',
'Failed to get extended metadata. API returned Error 1: Invalid parameters'],
}, {
# news
'url': 'https://www.nhk.or.jp/radio/player/ondemand.html?p=F261_01_4012173',
'url': 'https://www.nhk.or.jp/radio/player/ondemand.html?p=18439M2W42_02_4251212',
'skip': 'Expires on 2025-07-15',
'info_dict': {
'id': 'F261_01_4012173',
'id': '18439M2W42_02_4251212',
'ext': 'm4a',
'channel': 'NHKラジオ第1',
'title': 'マイあさ! 午前5時のNHKニュース 2025年7月8日',
'uploader': 'NHKラジオ第1',
'channel': 'NHKラジオ第1',
'thumbnail': 'https://www.nhk.or.jp/radioondemand/json/18439M2W42/img/series_945_thumbnail.jpg',
'series': 'NHKラジオニュース',
'title': '午前時のNHKニュース',
'thumbnail': 'https://www.nhk.or.jp/radioondemand/json/F261/img/RADIONEWS_640.jpg',
'release_timestamp': 1718290800,
'release_date': '20240613',
'timestamp': 1718291400,
'upload_date': '20240613',
'timestamp': 1751919420,
'upload_date': '20250707',
'release_timestamp': 1751918400,
'release_date': '20250707',
},
}, {
# fallback when extended metadata fails
'url': 'https://www.nhk.or.jp/radio/player/ondemand.html?p=2834_01_4009298',
'skip': 'Expires on 2024-06-07',
'url': 'https://www.nhk.or.jp/radio/player/ondemand.html?p=J8792PY43V_20_4253945',
'skip': 'Expires on 2025-09-01',
'info_dict': {
'id': '2834_01_4009298',
'title': 'まち☆キラ!開成町特集',
'id': 'J8792PY43V_20_4253945',
'ext': 'm4a',
'release_date': '20240531',
'upload_date': '20240531',
'series': 'はま☆キラ!',
'thumbnail': 'https://www.nhk.or.jp/prog/img/2834/g2834.jpg',
'channel': 'NHK R1,FM',
'description': '',
'timestamp': 1717123800,
'uploader': 'NHK R1,FM',
'release_timestamp': 1717120800,
'series_id': '2834_01',
'title': '「後絶たない筋肉増強剤の使用」ワールドリポート',
'description': '大濱 敦(ソウル支局)',
'uploader': 'NHK R1',
'channel': 'NHK R1',
'thumbnail': 'https://www.nhk.or.jp/radioondemand/json/J8792PY43V/img/corner/box_31_thumbnail.jpg',
'series': 'マイあさ! ワールドリポート',
'series_id': 'J8792PY43V_20',
'timestamp': 1751837100,
'upload_date': '20250706',
'release_timestamp': 1751835600,
'release_date': '20250706',
},
'expected_warnings': ['Failed to get extended metadata. API returned empty list.'],
'expected_warnings': ['Failed to download extended metadata: HTTP Error 404: Not Found'],
}]
_API_URL_TMPL = None
# The `_format_*` and `_make_*` functions are ported from: https://www.nhk.or.jp/radio/assets/js/timetable_detail_new.js
def _format_act_list(self, act_list):
role_groups = {}
for act in traverse_obj(act_list, (..., {dict})):
role = act.get('role')
if role not in role_groups:
role_groups[role] = []
role_groups[role].append(act)
formatted_roles = []
for role, acts in role_groups.items():
for i, act in enumerate(acts):
res = f'{role}' if i == 0 and role is not None else ''
if title := act.get('title'):
res += f'{title}'
formatted_roles.append(join_nonempty(res, act.get('name'), delim=''))
return join_nonempty(*formatted_roles, delim='')
def _make_artists(self, track, key):
artists = []
for artist in traverse_obj(track, (key, ..., {dict})):
if res := join_nonempty(*traverse_obj(artist, ((
('role', filter, {'{}'.format}),
('part', filter, {'{}'.format}),
('name', filter),
), {str})), delim=''):
artists.append(res)
return ''.join(artists) or None
def _make_duration(self, track, key):
d = traverse_obj(track, (key, {parse_duration}))
if d is None:
return None
hours, remainder = divmod(d, 3600)
minutes, seconds = divmod(remainder, 60)
res = ''
if hours > 0:
res += f'{int(hours)}時間'
if minutes > 0:
res += f'{int(minutes)}'
res += f'{int(seconds):02}秒)'
return res
def _format_music_list(self, music_list):
tracks = []
for track in traverse_obj(music_list, (..., {dict})):
track_details = traverse_obj(track, ((
('name', filter, {'{}'.format}),
('lyricist', filter, {'{}:作詞'.format}),
('composer', filter, {'{}:作曲'.format}),
('arranger', filter, {'{}:編曲'.format}),
), {str}))
track_details.append(self._make_artists(track, 'byArtist'))
track_details.append(self._make_duration(track, 'duration'))
if label := join_nonempty('label', 'code', delim=' ', from_dict=track):
track_details.append(f'{label}')
if location := traverse_obj(track, ('location', {str})):
track_details.append(f'{location}')
tracks.append(join_nonempty(*track_details, delim='\n'))
return '\n\n'.join(tracks)
def _format_description(self, response):
detailed_description = traverse_obj(response, ('detailedDescription', {dict})) or {}
return join_nonempty(
join_nonempty('epg80', 'epg200', delim='\n\n', from_dict=detailed_description),
traverse_obj(response, ('misc', 'actList', {self._format_act_list})),
traverse_obj(response, ('misc', 'musicList', {self._format_music_list})),
delim='\n\n')
def _get_thumbnails(self, data, keys, name=None, preference=-1):
thumbnails = []
for size, thumb in traverse_obj(data, (
*variadic(keys, (str, bytes, dict, set)), {dict.items},
lambda _, v: v[0] != 'copyright' and url_or_none(v[1]['url']),
)):
thumbnails.append({
'url': thumb['url'],
'width': int_or_none(thumb.get('width')),
'height': int_or_none(thumb.get('height')),
'preference': preference,
'id': join_nonempty(name, size),
})
preference -= 1
return thumbnails
def _extract_extended_metadata(self, episode_id, aa_vinfo):
service, _, area = traverse_obj(aa_vinfo, (2, {str}, {lambda x: (x or '').partition(',')}))
date_id = aa_vinfo[3]
detail_url = try_call(
lambda: self._API_URL_TMPL.format(area=area, service=service, dateid=aa_vinfo[3]))
lambda: self._API_URL_TMPL.format(broadcastEventId=join_nonempty(service, area, date_id)))
if not detail_url:
return {}
@ -699,36 +779,37 @@ class NhkRadiruIE(InfoExtractor):
if error := traverse_obj(response, ('error', {dict})):
self.report_warning(
'Failed to get extended metadata. API returned '
f'Error {join_nonempty("code", "message", from_dict=error, delim=": ")}')
f'Error {join_nonempty("statuscode", "message", from_dict=error, delim=": ")}')
return {}
full_meta = traverse_obj(response, ('list', service, 0, {dict}))
if not full_meta:
self.report_warning('Failed to get extended metadata. API returned empty list.')
return {}
station = traverse_obj(response, ('publishedOn', 'broadcastDisplayName', {str}))
station = ' '.join(traverse_obj(full_meta, (('service', 'area'), 'name', {str}))) or None
thumbnails = [{
'id': str(id_),
'preference': 1 if id_.startswith('thumbnail') else -2 if id_.startswith('logo') else -1,
**traverse_obj(thumb, {
'url': 'url',
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),
}),
} for id_, thumb in traverse_obj(full_meta, ('images', {dict.items}, lambda _, v: v[1]['url']))]
thumbnails = []
thumbnails.extend(self._get_thumbnails(response, ('about', 'eyecatch')))
for num, dct in enumerate(traverse_obj(response, ('about', 'eyecatchList', ...))):
thumbnails.extend(self._get_thumbnails(dct, None, join_nonempty('list', num), -2))
thumbnails.extend(
self._get_thumbnails(response, ('about', 'partOfSeries', 'eyecatch'), 'series', -3))
return filter_dict({
'description': self._format_description(response),
'cast': traverse_obj(response, ('misc', 'actList', ..., 'name', {str})),
'thumbnails': thumbnails,
**traverse_obj(response, {
'title': ('name', {str}),
'timestamp': ('endDate', {unified_timestamp}),
'release_timestamp': ('startDate', {unified_timestamp}),
'duration': ('duration', {parse_duration}),
}),
**traverse_obj(response, ('identifierGroup', {
'series': ('radioSeriesName', {str}),
'series_id': ('radioSeriesId', {str}),
'episode': ('radioEpisodeName', {str}),
'episode_id': ('radioEpisodeId', {str}),
'categories': ('genre', ..., ['name1', 'name2'], {str}, all, {orderedSet}),
})),
'channel': station,
'uploader': station,
'description': join_nonempty(
'subtitle', 'content', 'act', 'music', delim='\n\n', from_dict=full_meta),
'thumbnails': thumbnails,
**traverse_obj(full_meta, {
'title': ('title', {str}),
'timestamp': ('end_time', {unified_timestamp}),
'release_timestamp': ('start_time', {unified_timestamp}),
}),
})
def _extract_episode_info(self, episode, programme_id, series_meta):
@ -782,7 +863,9 @@ class NhkRadiruIE(InfoExtractor):
site_id, corner_id, headline_id = self._match_valid_url(url).group('site', 'corner', 'headline')
programme_id = f'{site_id}_{corner_id}'
if site_id == 'F261': # XXX: News programmes use old API (for now?)
# XXX: News programmes use the old API
# Can't move this to NhkRadioNewsPageIE because news items still use the normal URL format
if site_id == '18439M2W42':
meta = self._download_json(
'https://www.nhk.or.jp/s-media/news/news-site/list/v1/all.json', programme_id)['main']
series_meta = traverse_obj(meta, {
@ -843,8 +926,8 @@ class NhkRadioNewsPageIE(InfoExtractor):
'url': 'https://www.nhk.or.jp/radionews/',
'playlist_mincount': 5,
'info_dict': {
'id': 'F261_01',
'thumbnail': 'https://www.nhk.or.jp/radioondemand/json/F261/img/RADIONEWS_640.jpg',
'id': '18439M2W42_01',
'thumbnail': 'https://www.nhk.or.jp/radioondemand/json/18439M2W42/img/series_945_thumbnail.jpg',
'description': 'md5:bf2c5b397e44bc7eb26de98d8f15d79d',
'channel': 'NHKラジオ第1',
'uploader': 'NHKラジオ第1',
@ -853,7 +936,7 @@ class NhkRadioNewsPageIE(InfoExtractor):
}]
def _real_extract(self, url):
return self.url_result('https://www.nhk.or.jp/radio/ondemand/detail.html?p=F261_01', NhkRadiruIE)
return self.url_result('https://www.nhk.or.jp/radio/ondemand/detail.html?p=18439M2W42_01', NhkRadiruIE)
class NhkRadiruLiveIE(InfoExtractor):
@ -863,11 +946,12 @@ class NhkRadiruLiveIE(InfoExtractor):
# radio 1, no area specified
'url': 'https://www.nhk.or.jp/radio/player/?ch=r1',
'info_dict': {
'id': 'r1-tokyo',
'title': 're:^ネットラジオ第1 東京.+$',
'id': 'bs-r1-130',
'title': 're:^NHKラジオ第1・東京.+$',
'ext': 'm4a',
'thumbnail': 'https://www.nhk.or.jp/common/img/media/r1-200x200.png',
'thumbnail': 'https://www.nhk.jp/assets/images/broadcastservice/bs/r1/r1-logo.svg',
'live_status': 'is_live',
'_old_archive_ids': ['nhkradirulive r1-tokyo'],
},
}, {
# radio 2, area specified
@ -875,26 +959,28 @@ class NhkRadiruLiveIE(InfoExtractor):
'url': 'https://www.nhk.or.jp/radio/player/?ch=r2',
'params': {'extractor_args': {'nhkradirulive': {'area': ['fukuoka']}}},
'info_dict': {
'id': 'r2-fukuoka',
'title': 're:^ネットラジオ第2 福岡.+$',
'id': 'bs-r2-400',
'title': 're:^NHKラジオ第2.+$',
'ext': 'm4a',
'thumbnail': 'https://www.nhk.or.jp/common/img/media/r2-200x200.png',
'thumbnail': 'https://www.nhk.jp/assets/images/broadcastservice/bs/r2/r2-logo.svg',
'live_status': 'is_live',
'_old_archive_ids': ['nhkradirulive r2-fukuoka'],
},
}, {
# fm, area specified
'url': 'https://www.nhk.or.jp/radio/player/?ch=fm',
'params': {'extractor_args': {'nhkradirulive': {'area': ['sapporo']}}},
'info_dict': {
'id': 'fm-sapporo',
'title': 're:^NHKネットラジオFM 札幌.+$',
'id': 'bs-r3-010',
'title': 're:^NHK FM・札幌.+$',
'ext': 'm4a',
'thumbnail': 'https://www.nhk.or.jp/common/img/media/fm-200x200.png',
'thumbnail': 'https://www.nhk.jp/assets/images/broadcastservice/bs/r3/r3-logo.svg',
'live_status': 'is_live',
'_old_archive_ids': ['nhkradirulive fm-sapporo'],
},
}]
_NOA_STATION_IDS = {'r1': 'n1', 'r2': 'n2', 'fm': 'n3'}
_NOA_STATION_IDS = {'r1': 'r1', 'r2': 'r2', 'fm': 'r3'}
def _real_extract(self, url):
station = self._match_id(url)
@ -911,12 +997,15 @@ class NhkRadiruLiveIE(InfoExtractor):
noa_info = self._download_json(
f'https:{config.find(".//url_program_noa").text}'.format(area=data.find('areakey').text),
station, note=f'Downloading {area} station metadata', fatal=False)
present_info = traverse_obj(noa_info, ('nowonair_list', self._NOA_STATION_IDS.get(station), 'present'))
broadcast_service = traverse_obj(noa_info, (self._NOA_STATION_IDS.get(station), 'publishedOn'))
return {
'title': ' '.join(traverse_obj(present_info, (('service', 'area'), 'name', {str}))),
'id': join_nonempty(station, area),
'thumbnails': traverse_obj(present_info, ('service', 'images', ..., {
**traverse_obj(broadcast_service, {
'title': ('broadcastDisplayName', {str}),
'id': ('id', {str}),
}),
'_old_archive_ids': [make_archive_id(self, join_nonempty(station, area))],
'thumbnails': traverse_obj(broadcast_service, ('logo', ..., {
'url': 'url',
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),

@ -4,16 +4,15 @@ import itertools
import json
import re
import time
import urllib.parse
from .common import InfoExtractor, SearchInfoExtractor
from ..networking import Request
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
OnDemandPagedList,
clean_html,
determine_ext,
extract_attributes,
float_or_none,
int_or_none,
parse_bitrate,
@ -22,9 +21,8 @@ from ..utils import (
parse_qs,
parse_resolution,
qualities,
remove_start,
str_or_none,
unescapeHTML,
truncate_string,
unified_timestamp,
update_url_query,
url_basename,
@ -32,7 +30,11 @@ from ..utils import (
urlencode_postdata,
urljoin,
)
from ..utils.traversal import find_element, require, traverse_obj
from ..utils.traversal import (
find_element,
require,
traverse_obj,
)
class NiconicoBaseIE(InfoExtractor):
@ -806,41 +808,39 @@ class NiconicoLiveIE(NiconicoBaseIE):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage, urlh = self._download_webpage_handle(f'https://live.nicovideo.jp/watch/{video_id}', video_id)
webpage = self._download_webpage(url, video_id, expected_status=404)
if err_msg := traverse_obj(webpage, ({find_element(cls='message')}, {clean_html})):
raise ExtractorError(err_msg, expected=True)
embedded_data = self._parse_json(unescapeHTML(self._search_regex(
r'<script\s+id="embedded-data"\s*data-props="(.+?)"', webpage, 'embedded data')), video_id)
ws_url = traverse_obj(embedded_data, ('site', 'relive', 'webSocketUrl'))
if not ws_url:
raise ExtractorError('The live hasn\'t started yet or already ended.', expected=True)
ws_url = update_url_query(ws_url, {
'frontend_id': traverse_obj(embedded_data, ('site', 'frontendId')) or '9',
})
hostname = remove_start(urllib.parse.urlparse(urlh.url).hostname, 'sp.')
embedded_data = traverse_obj(webpage, (
{find_element(tag='script', id='embedded-data', html=True)},
{extract_attributes}, 'data-props', {json.loads}))
frontend_id = traverse_obj(embedded_data, ('site', 'frontendId', {str_or_none}), default='9')
ws_url = traverse_obj(embedded_data, (
'site', 'relive', 'webSocketUrl', {url_or_none}, {require('websocket URL')}))
ws_url = update_url_query(ws_url, {'frontend_id': frontend_id})
ws = self._request_webpage(
Request(ws_url, headers={'Origin': f'https://{hostname}'}),
video_id=video_id, note='Connecting to WebSocket server')
ws_url, video_id, 'Connecting to WebSocket server',
headers={'Origin': 'https://live.nicovideo.jp'})
self.write_debug('Sending HLS server request')
ws.send(json.dumps({
'type': 'startWatching',
'data': {
'reconnect': False,
'room': {
'commentable': True,
'protocol': 'webSocket',
},
'stream': {
'quality': 'abr',
'protocol': 'hls',
'latency': 'high',
'accessRightMethod': 'single_cookie',
'chasePlay': False,
'latency': 'high',
'protocol': 'hls',
'quality': 'abr',
},
'room': {
'protocol': 'webSocket',
'commentable': True,
},
'reconnect': False,
},
'type': 'startWatching',
}))
while True:
@ -860,17 +860,15 @@ class NiconicoLiveIE(NiconicoBaseIE):
raise ExtractorError('Disconnected at middle of extraction')
elif data.get('type') == 'error':
self.write_debug(recv)
message = traverse_obj(data, ('body', 'code')) or recv
message = traverse_obj(data, ('body', 'code', {str_or_none}), default=recv)
raise ExtractorError(message)
elif self.get_param('verbose', False):
if len(recv) > 100:
recv = recv[:100] + '...'
self.write_debug(f'Server said: {recv}')
self.write_debug(f'Server response: {truncate_string(recv, 100)}')
title = traverse_obj(embedded_data, ('program', 'title')) or self._html_search_meta(
('og:title', 'twitter:title'), webpage, 'live title', fatal=False)
raw_thumbs = traverse_obj(embedded_data, ('program', 'thumbnail')) or {}
raw_thumbs = traverse_obj(embedded_data, ('program', 'thumbnail', {dict})) or {}
thumbnails = []
for name, value in raw_thumbs.items():
if not isinstance(value, dict):
@ -897,31 +895,30 @@ class NiconicoLiveIE(NiconicoBaseIE):
cookie['domain'], cookie['name'], cookie['value'],
expire_time=unified_timestamp(cookie.get('expires')), path=cookie['path'], secure=cookie['secure'])
fmt_common = {
'live_latency': 'high',
'origin': hostname,
'protocol': 'niconico_live',
'video_id': video_id,
'ws': ws,
}
q_iter = (q for q in qualities[1:] if not q.startswith('audio_')) # ignore initial 'abr'
a_map = {96: 'audio_low', 192: 'audio_high'}
formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4', live=True)
for fmt in formats:
fmt['protocol'] = 'niconico_live'
if fmt.get('acodec') == 'none':
fmt['format_id'] = next(q_iter, fmt['format_id'])
elif fmt.get('vcodec') == 'none':
abr = parse_bitrate(fmt['url'].lower())
fmt.update({
'abr': abr,
'acodec': 'mp4a.40.2',
'format_id': a_map.get(abr, fmt['format_id']),
})
fmt.update(fmt_common)
return {
'id': video_id,
'title': title,
'downloader_options': {
'max_quality': traverse_obj(embedded_data, ('program', 'stream', 'maxQuality', {str})) or 'normal',
'ws': ws,
'ws_url': ws_url,
},
**traverse_obj(embedded_data, {
'view_count': ('program', 'statistics', 'watchCount'),
'comment_count': ('program', 'statistics', 'commentCount'),

@ -1,6 +1,5 @@
from .common import InfoExtractor
from ..utils import (
ExtractorError,
determine_ext,
int_or_none,
traverse_obj,
@ -61,10 +60,10 @@ class NineGagIE(InfoExtractor):
post = self._download_json(
'https://9gag.com/v1/post', post_id, query={
'id': post_id,
})['data']['post']
}, impersonate=True)['data']['post']
if post.get('type') != 'Animated':
raise ExtractorError(
self.raise_no_formats(
'The given url does not contain a video',
expected=True)

@ -1,6 +1,3 @@
import json
import re
from .brightcove import BrightcoveNewIE
from .common import InfoExtractor
from ..utils import (
@ -11,7 +8,12 @@ from ..utils import (
str_or_none,
url_or_none,
)
from ..utils.traversal import require, traverse_obj, value
from ..utils.traversal import (
get_first,
require,
traverse_obj,
value,
)
class NineNowIE(InfoExtractor):
@ -101,20 +103,11 @@ class NineNowIE(InfoExtractor):
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/4460760524001/default_default/index.html?videoId={}'
# XXX: For parsing next.js v15+ data; see also yt_dlp.extractor.francetv and yt_dlp.extractor.goplay
def _find_json(self, s):
return self._search_json(
r'\w+\s*:\s*', s, 'next js data', None, contains_pattern=r'\[(?s:.+)\]', default=None)
def _real_extract(self, url):
display_id, video_type = self._match_valid_url(url).group('id', 'type')
webpage = self._download_webpage(url, display_id)
common_data = traverse_obj(
re.findall(r'<script[^>]*>\s*self\.__next_f\.push\(\s*(\[.+?\])\s*\);?\s*</script>', webpage),
(..., {json.loads}, ..., {self._find_json},
lambda _, v: v['payload'][video_type]['slug'] == display_id,
'payload', any, {require('video data')}))
common_data = get_first(self._search_nextjs_v13_data(webpage, display_id), ('payload', {dict}))
if traverse_obj(common_data, (video_type, 'video', 'drm', {bool})):
self.report_drm(display_id)

@ -1,100 +0,0 @@
from .brightcove import BrightcoveNewIE
from .common import InfoExtractor
from ..utils import (
int_or_none,
js_to_json,
smuggle_url,
try_get,
)
class NoovoIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^/]+\.)?noovo\.ca/videos/(?P<id>[^/]+/[^/?#&]+)'
_TESTS = [{
# clip
'url': 'http://noovo.ca/videos/rpm-plus/chrysler-imperial',
'info_dict': {
'id': '5386045029001',
'ext': 'mp4',
'title': 'Chrysler Imperial',
'description': 'md5:de3c898d1eb810f3e6243e08c8b4a056',
'timestamp': 1491399228,
'upload_date': '20170405',
'uploader_id': '618566855001',
'series': 'RPM+',
},
'params': {
'skip_download': True,
},
}, {
# episode
'url': 'http://noovo.ca/videos/l-amour-est-dans-le-pre/episode-13-8',
'info_dict': {
'id': '5395865725001',
'title': 'Épisode 13 : Les retrouvailles',
'description': 'md5:888c3330f0c1b4476c5bc99a1c040473',
'ext': 'mp4',
'timestamp': 1492019320,
'upload_date': '20170412',
'uploader_id': '618566855001',
'series': "L'amour est dans le pré",
'season_number': 5,
'episode': 'Épisode 13',
'episode_number': 13,
},
'params': {
'skip_download': True,
},
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/618566855001/default_default/index.html?videoId=%s'
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
brightcove_id = self._search_regex(
r'data-video-id=["\'](\d+)', webpage, 'brightcove id')
data = self._parse_json(
self._search_regex(
r'(?s)dataLayer\.push\(\s*({.+?})\s*\);', webpage, 'data',
default='{}'),
video_id, transform_source=js_to_json, fatal=False)
title = try_get(
data, lambda x: x['video']['nom'],
str) or self._html_search_meta(
'dcterms.Title', webpage, 'title', fatal=True)
description = self._html_search_meta(
('dcterms.Description', 'description'), webpage, 'description')
series = try_get(
data, lambda x: x['emission']['nom']) or self._search_regex(
r'<div[^>]+class="banner-card__subtitle h4"[^>]*>([^<]+)',
webpage, 'series', default=None)
season_el = try_get(data, lambda x: x['emission']['saison'], dict) or {}
season = try_get(season_el, lambda x: x['nom'], str)
season_number = int_or_none(try_get(season_el, lambda x: x['numero']))
episode_el = try_get(season_el, lambda x: x['episode'], dict) or {}
episode = try_get(episode_el, lambda x: x['nom'], str)
episode_number = int_or_none(try_get(episode_el, lambda x: x['numero']))
return {
'_type': 'url_transparent',
'ie_key': BrightcoveNewIE.ie_key(),
'url': smuggle_url(
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
{'geo_countries': ['CA']}),
'id': brightcove_id,
'title': title,
'description': description,
'series': series,
'season': season,
'season_number': season_number,
'episode': episode,
'episode_number': episode_number,
}

@ -0,0 +1,70 @@
from .common import InfoExtractor
from ..utils import clean_html, clean_podcast_url, int_or_none, str_or_none, url_or_none
from ..utils.traversal import traverse_obj
class PlayerFmIE(InfoExtractor):
_VALID_URL = r'(?P<url>https?://(?:www\.)?player\.fm/(?:series/)?[\w-]+/(?P<id>[\w-]+))'
_TESTS = [{
'url': 'https://player.fm/series/chapo-trap-house/movie-mindset-33-casino-feat-felix',
'info_dict': {
'ext': 'mp3',
'id': '478606546',
'display_id': 'movie-mindset-33-casino-feat-felix',
'thumbnail': r're:^https://.*\.(jpg|png)',
'title': 'Movie Mindset 33 - Casino feat. Felix',
'creators': ['Chapo Trap House'],
'description': r're:The first episode of this season of Movie Mindset is free .+ we feel about it\.',
'duration': 6830,
'timestamp': 1745406000,
'upload_date': '20250423',
},
}, {
'url': 'https://player.fm/series/nbc-nightly-news-with-tom-llamas/thursday-april-17-2025',
'info_dict': {
'ext': 'mp3',
'id': '477635490',
'display_id': 'thursday-april-17-2025',
'title': 'Thursday, April 17, 2025',
'thumbnail': r're:^https://.*\.(jpg|png)',
'duration': 1143,
'description': 'md5:4890b8cf9a55a787561cd5d59dfcda82',
'creators': ['NBC News'],
'timestamp': 1744941374,
'upload_date': '20250418',
},
}, {
'url': 'https://player.fm/series/soccer-101/ep-109-its-kicking-off-how-have-the-rules-for-kickoff-changed-what-are-the-best-approaches-to-getting-the-game-underway-and-how-could-we-improve-on-the-present-system-ack3NzL3yibvs4pf',
'info_dict': {
'ext': 'mp3',
'id': '481418710',
'thumbnail': r're:^https://.*\.(jpg|png)',
'title': r're:#109 It\'s kicking off! How have the rules for kickoff changed, .+ the present system\?',
'creators': ['TSS'],
'duration': 1510,
'display_id': 'md5:b52ecacaefab891b59db69721bfd9b13',
'description': 'md5:52a39e36d08d8919527454f152ad3c25',
'timestamp': 1659102055,
'upload_date': '20220729',
},
}]
def _real_extract(self, url):
display_id, url = self._match_valid_url(url).group('id', 'url')
data = self._download_json(f'{url}.json', display_id)
return {
'display_id': display_id,
'vcodec': 'none',
**traverse_obj(data, {
'id': ('id', {int}, {str_or_none}),
'url': ('url', {clean_podcast_url}),
'title': ('title', {str}),
'description': ('description', {clean_html}),
'duration': ('duration', {int_or_none}),
'thumbnail': (('image', ('series', 'image')), 'url', {url_or_none}, any),
'filesize': ('size', {int_or_none}),
'timestamp': ('publishedAt', {int_or_none}),
'creators': ('series', 'author', {str}, filter, all, filter),
}),
}

@ -81,7 +81,7 @@ class RaiBaseIE(InfoExtractor):
# geo flag is a bit unreliable and not properly set all the time
geoprotection = xpath_text(relinker, './geoprotection', default='N') == 'Y'
ext = determine_ext(media_url)
ext = determine_ext(media_url).lower()
formats = []
if ext == 'mp3':
@ -108,7 +108,7 @@ class RaiBaseIE(InfoExtractor):
'format_id': join_nonempty('https', bitrate, delim='-'),
})
else:
raise ExtractorError('Unrecognized media file found')
raise ExtractorError(f'Unrecognized media extension "{ext}"')
if (not formats and geoprotection is True) or '/video_no_available.mp4' in media_url:
self.raise_geo_restricted(countries=self._GEO_COUNTRIES, metadata_available=True)
@ -503,6 +503,28 @@ class RaiPlaySoundIE(RaiBaseIE):
'upload_date': '20211201',
},
'params': {'skip_download': True},
}, {
# case-sensitivity test for uppercase extension
'url': 'https://www.raiplaysound.it/audio/2020/05/Storia--Lunita-dItalia-e-lunificazione-della-Germania-b4c16390-7f3f-4282-b353-d94897dacb7c.html',
'md5': 'c69ebd69282f0effd7ef67b7e2f6c7d8',
'info_dict': {
'id': 'b4c16390-7f3f-4282-b353-d94897dacb7c',
'ext': 'mp3',
'title': "Storia | 01 L'unità d'Italia e l'unificazione della Germania",
'alt_title': 'md5:ed4ed82585c52057b71b43994a59b705',
'description': 'md5:92818b6f31b2c150567d56b75db2ea7f',
'uploader': 'rai radio 3',
'duration': 2439.0,
'thumbnail': 'https://www.raiplaysound.it/dl/img/2023/09/07/1694084898279_Maturadio-LOGO-2048x1152.jpg',
'creators': ['rai radio 3'],
'series': 'Maturadio',
'season': 'Season 9',
'season_number': 9,
'episode': "01. L'unità d'Italia e l'unificazione della Germania",
'episode_number': 1,
'timestamp': 1590400740,
'upload_date': '20200525',
},
}]
def _real_extract(self, url):
@ -765,7 +787,7 @@ class RaiCulturaIE(RaiNewsIE): # XXX: Do not subclass from concrete IE
class RaiSudtirolIE(RaiBaseIE):
_VALID_URL = r'https?://raisudtirol\.rai\.it/.+media=(?P<id>\w+)'
_VALID_URL = r'https?://rai(?:bz|sudtirol)\.rai\.it/.+media=(?P<id>\w+)'
_TESTS = [{
# mp4 file
'url': 'https://raisudtirol.rai.it/la/index.php?media=Ptv1619729460',
@ -791,6 +813,9 @@ class RaiSudtirolIE(RaiBaseIE):
'formats': 'count:6',
},
'params': {'skip_download': True},
}, {
'url': 'https://raibz.rai.it/de/index.php?media=Ptv1751660400',
'only_matching': True,
}]
def _real_extract(self, url):

@ -0,0 +1,41 @@
from .floatplane import FloatplaneBaseIE
class SaucePlusIE(FloatplaneBaseIE):
IE_DESC = 'Sauce+'
_VALID_URL = r'https?://(?:(?:www|beta)\.)?sauceplus\.com/post/(?P<id>\w+)'
_BASE_URL = 'https://www.sauceplus.com'
_HEADERS = {
'Origin': _BASE_URL,
'Referer': f'{_BASE_URL}/',
}
_IMPERSONATE_TARGET = True
_TESTS = [{
'url': 'https://www.sauceplus.com/post/YbBwIa2A5g',
'info_dict': {
'id': 'eit4Ugu5TL',
'ext': 'mp4',
'display_id': 'YbBwIa2A5g',
'title': 'Scare the Coyote - Episode 3',
'description': '',
'thumbnail': r're:^https?://.*\.jpe?g$',
'duration': 2975,
'comment_count': int,
'like_count': int,
'dislike_count': int,
'release_date': '20250627',
'release_timestamp': 1750993500,
'uploader': 'Scare The Coyote',
'uploader_id': '683e0a3269688656a5a49a44',
'uploader_url': 'https://www.sauceplus.com/channel/ScareTheCoyote/home',
'channel': 'Scare The Coyote',
'channel_id': '683e0a326968866ceba49a45',
'channel_url': 'https://www.sauceplus.com/channel/ScareTheCoyote/home/main',
'availability': 'subscriber_only',
},
'params': {'skip_download': 'm3u8'},
}]
def _real_initialize(self):
if not self._get_cookies(self._BASE_URL).get('__Host-sp-sess'):
self.raise_login_required()

@ -213,7 +213,7 @@ class CieloTVItIE(SkyItIE): # XXX: Do not subclass from concrete IE
class TV8ItIE(SkyItVideoIE): # XXX: Do not subclass from concrete IE
IE_NAME = 'tv8.it'
_VALID_URL = r'https?://(?:www\.)?tv8\.it/(?:show)?video/[0-9a-z-]+-(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?tv8\.it/(?:show)?video/(?:[0-9a-z-]+-)?(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.tv8.it/video/ogni-mattina-ucciso-asino-di-andrea-lo-cicero-630529',
'md5': '9ab906a3f75ea342ed928442f9dabd21',
@ -227,6 +227,19 @@ class TV8ItIE(SkyItVideoIE): # XXX: Do not subclass from concrete IE
'thumbnail': 'https://videoplatform.sky.it/still/2020/11/18/1605717753954_ogni-mattina-ucciso-asino-di-andrea-lo-cicero_videostill_1.jpg',
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.tv8.it/video/964361',
'md5': '1e58e807154658a16edc29e45be38107',
'info_dict': {
'id': '964361',
'ext': 'mp4',
'title': 'GialappaShow - S.4 Ep.2',
'description': 'md5:60bb4ff5af18bbeeaedabc1de5f9e1e2',
'duration': 8030,
'thumbnail': 'https://videoplatform.sky.it/captures/494/2024/11/06/964361/964361_1730888412914_thumb_494.jpg',
'timestamp': 1730821499,
'upload_date': '20241105',
},
}]
_DOMAIN = 'mtv8'

@ -242,7 +242,7 @@ class SoundcloudBaseIE(InfoExtractor):
format_urls.add(format_url)
formats.append({
'format_id': 'download',
'ext': urlhandle_detect_ext(urlh, default='mp3'),
'ext': urlhandle_detect_ext(urlh),
'filesize': int_or_none(urlh.headers.get('Content-Length')),
'url': format_url,
'quality': 10,

@ -25,6 +25,7 @@ class SportDeutschlandIE(InfoExtractor):
'upload_date': '20230114',
'timestamp': 1673733618,
},
'skip': 'not found',
}, {
'url': 'https://sportdeutschland.tv/deutscherbadmintonverband/bwf-tour-1-runde-feld-1-yonex-gainward-german-open-2022-0',
'info_dict': {
@ -41,6 +42,7 @@ class SportDeutschlandIE(InfoExtractor):
'upload_date': '20220309',
'timestamp': 1646860727.0,
},
'skip': 'not found',
}, {
'url': 'https://sportdeutschland.tv/ggcbremen/formationswochenende-latein-2023',
'info_dict': {
@ -68,6 +70,7 @@ class SportDeutschlandIE(InfoExtractor):
'live_status': 'was_live',
},
}],
'skip': 'not found',
}, {
'url': 'https://sportdeutschland.tv/dtb/gymnastik-international-tag-1',
'info_dict': {
@ -82,13 +85,30 @@ class SportDeutschlandIE(InfoExtractor):
'live_status': 'is_live',
},
'skip': 'live',
}, {
'url': 'https://sportdeutschland.tv/rostock-griffins/gfl2-rostock-griffins-vs-elmshorn-fighting-pirates',
'md5': '35c11a19395c938cdd076b93bda54cde',
'info_dict': {
'id': '9f27a97d-1544-4d0b-aa03-48d92d17a03a',
'ext': 'mp4',
'title': 'GFL2: Rostock Griffins vs. Elmshorn Fighting Pirates',
'display_id': 'rostock-griffins/gfl2-rostock-griffins-vs-elmshorn-fighting-pirates',
'channel': 'Rostock Griffins',
'channel_url': 'https://sportdeutschland.tv/rostock-griffins',
'live_status': 'was_live',
'description': 'md5:60cb00067e55dafa27b0933a43d72862',
'channel_id': '9635f21c-3f67-4584-9ce4-796e9a47276b',
'timestamp': 1749913117,
'upload_date': '20250614',
},
}]
def _process_video(self, asset_id, video):
is_live = video['type'] == 'mux_live'
token = self._download_json(
f'https://api.sportdeutschland.tv/api/frontend/asset-token/{asset_id}',
video['id'], query={'type': video['type'], 'playback_id': video['src']})['token']
f'https://api.sportdeutschland.tv/api/web/personal/asset-token/{asset_id}',
video['id'], query={'type': video['type'], 'playback_id': video['src']},
headers={'Referer': 'https://sportdeutschland.tv/'})['token']
formats, subtitles = self._extract_m3u8_formats_and_subtitles(
f'https://stream.mux.com/{video["src"]}.m3u8?token={token}', video['id'], live=is_live)

@ -41,6 +41,7 @@ class SproutVideoIE(InfoExtractor):
'duration': 703,
'thumbnail': r're:https?://images\.sproutvideo\.com/.+\.jpg',
},
'skip': 'Account Disabled',
}, {
# http formats 'sd' and 'hd' are available
'url': 'https://videos.sproutvideo.com/embed/119cd6bc1a18e6cd98/30751a1761ae5b90',
@ -98,10 +99,17 @@ class SproutVideoIE(InfoExtractor):
url, smuggled_data = unsmuggle_url(url, {})
video_id = self._match_id(url)
webpage = self._download_webpage(
url, video_id, headers=traverse_obj(smuggled_data, {'Referer': 'referer'}))
url, video_id, headers=traverse_obj(smuggled_data, {'Referer': 'referer'}), impersonate=True)
data = self._search_json(
r'var\s+dat\s*=\s*["\']', webpage, 'data', video_id, contains_pattern=r'[A-Za-z0-9+/=]+',
end_pattern=r'["\'];', transform_source=lambda x: base64.b64decode(x).decode())
r'(?:var|const|let)\s+(?:dat|playerInfo)\s*=\s*["\']', webpage, 'player info', video_id,
contains_pattern=r'[A-Za-z0-9+/=]+', end_pattern=r'["\'];',
transform_source=lambda x: base64.b64decode(x).decode())
# SproutVideo may send player info for 'SMPTE Color Monitor Test' [a791d7b71b12ecc52e]
# e.g. if the user-agent we used with the webpage request is too old
video_uid = data['videoUid']
if video_id != video_uid:
raise ExtractorError(f'{self.IE_NAME} sent the wrong video data ({video_uid})')
formats, subtitles = [], {}
headers = {

@ -6,6 +6,7 @@ from ..utils import ExtractorError, clean_html, int_or_none
class TFOIE(InfoExtractor):
_WORKING = False
_GEO_COUNTRIES = ['CA']
_VALID_URL = r'https?://(?:www\.)?tfo\.org/(?:en|fr)/(?:[^/]+/){2}(?P<id>\d+)'
_TEST = {

@ -0,0 +1,43 @@
from .common import InfoExtractor
from ..utils import (
clean_html,
extract_attributes,
url_or_none,
)
from ..utils.traversal import (
find_element,
require,
traverse_obj,
)
class TheHighWireIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?thehighwire\.com/ark-videos/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://thehighwire.com/ark-videos/the-deposition-of-stanley-plotkin/',
'info_dict': {
'id': 'the-deposition-of-stanley-plotkin',
'ext': 'mp4',
'title': 'THE DEPOSITION OF STANLEY PLOTKIN',
'description': 'md5:6d0be4f1181daaa10430fd8b945a5e54',
'thumbnail': r're:https?://static\.arkengine\.com/video/.+\.jpg',
},
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
embed_url = traverse_obj(webpage, (
{find_element(cls='ark-video-embed', html=True)},
{extract_attributes}, 'src', {url_or_none}, {require('embed URL')}))
embed_page = self._download_webpage(embed_url, display_id)
return {
'id': display_id,
**traverse_obj(webpage, {
'title': ({find_element(cls='section-header')}, {clean_html}),
'description': ({find_element(cls='episode-description__copy')}, {clean_html}),
}),
**self._parse_html5_media_entries(embed_url, embed_page, display_id, m3u8_id='hls')[0],
}

@ -51,6 +51,7 @@ class TV5UnisBaseIE(InfoExtractor):
class TV5UnisVideoIE(TV5UnisBaseIE):
_WORKING = False
IE_NAME = 'tv5unis:video'
_VALID_URL = r'https?://(?:www\.)?tv5unis\.ca/videos/[^/]+/(?P<id>\d+)'
_TEST = {
@ -71,6 +72,7 @@ class TV5UnisVideoIE(TV5UnisBaseIE):
class TV5UnisIE(TV5UnisBaseIE):
_WORKING = False
IE_NAME = 'tv5unis'
_VALID_URL = r'https?://(?:www\.)?tv5unis\.ca/videos/(?P<id>[^/]+)(?:/saisons/(?P<season_number>\d+)/episodes/(?P<episode_number>\d+))?/?(?:[?#&]|$)'
_TESTS = [{

@ -6,6 +6,7 @@ import re
import urllib.parse
from .common import InfoExtractor
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
UserNotLive,
@ -188,19 +189,39 @@ class TwitchBaseIE(InfoExtractor):
}] if thumbnail else None
def _extract_twitch_m3u8_formats(self, path, video_id, token, signature, live_from_start=False):
formats = self._extract_m3u8_formats(
f'{self._USHER_BASE}/{path}/{video_id}.m3u8', video_id, 'mp4', query={
'allow_source': 'true',
'allow_audio_only': 'true',
'allow_spectre': 'true',
'p': random.randint(1000000, 10000000),
'platform': 'web',
'player': 'twitchweb',
'supported_codecs': 'av1,h265,h264',
'playlist_include_framerate': 'true',
'sig': signature,
'token': token,
})
try:
formats = self._extract_m3u8_formats(
f'{self._USHER_BASE}/{path}/{video_id}.m3u8', video_id, 'mp4', query={
'allow_source': 'true',
'allow_audio_only': 'true',
'allow_spectre': 'true',
'p': random.randint(1000000, 10000000),
'platform': 'web',
'player': 'twitchweb',
'supported_codecs': 'av1,h265,h264',
'playlist_include_framerate': 'true',
'sig': signature,
'token': token,
})
except ExtractorError as e:
if (
not isinstance(e.cause, HTTPError)
or e.cause.status != 403
or e.cause.response.get_header('content-type') != 'application/json'
):
raise
error_info = traverse_obj(e.cause.response.read(), ({json.loads}, 0, {dict})) or {}
if error_info.get('error_code') in ('vod_manifest_restricted', 'unauthorized_entitlements'):
common_msg = 'access to this subscriber-only content'
if self._get_cookies('https://gql.twitch.tv').get('auth-token'):
raise ExtractorError(f'Your account does not have {common_msg}', expected=True)
self.raise_login_required(f'You must be logged into an account that has {common_msg}')
if error_msg := join_nonempty('error_code', 'error', from_dict=error_info, delim=': '):
raise ExtractorError(error_msg, expected=True)
raise
for fmt in formats:
if fmt.get('vcodec') and fmt['vcodec'].startswith('av01'):
# mpegts does not yet have proper support for av1

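The new Twitch `except` branch only handles a 403 with a JSON body. It reads the first element of the JSON array, maps the subscriber-only error codes to a login or entitlement message, and otherwise surfaces `error_code: error`. The pure function below sketches that decision table without yt-dlp. The name `classify_usher_error` and the idea of returning `None` to mean "re-raise the original error" are assumptions made for illustration.

```python
import json


def classify_usher_error(status, content_type, body, has_auth_cookie):
    """Map a failed usher (HLS manifest) response to a user-facing message.

    Returns None when the caller should re-raise the original error unchanged.
    """
    if status != 403 or content_type != 'application/json':
        return None
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    # The error body is expected to be a JSON list whose first item is a dict
    first = payload[0] if isinstance(payload, list) and payload else None
    error_info = first if isinstance(first, dict) else {}
    if error_info.get('error_code') in ('vod_manifest_restricted', 'unauthorized_entitlements'):
        common_msg = 'access to this subscriber-only content'
        if has_auth_cookie:
            return f'Your account does not have {common_msg}'
        return f'You must be logged into an account that has {common_msg}'
    # Roughly mirrors join_nonempty('error_code', 'error', from_dict=..., delim=': ')
    parts = [str(error_info[k]) for k in ('error_code', 'error') if error_info.get(k)]
    return ': '.join(parts) or None
```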
@@ -0,0 +1,32 @@
from .common import InfoExtractor
from .kaltura import KalturaIE
class UnitedNationsWebTvIE(InfoExtractor):
_VALID_URL = r'https?://webtv\.un\.org/(?:ar|zh|en|fr|ru|es)/asset/\w+/(?P<id>\w+)'
_TESTS = [{
'url': 'https://webtv.un.org/en/asset/k1o/k1o7stmi6p',
'md5': 'b2f8b3030063298ae841b4b7ddc01477',
'info_dict': {
'id': '1_o7stmi6p',
'ext': 'mp4',
'title': 'António Guterres (Secretary-General) on Israel and Iran - Security Council, 9939th meeting',
'thumbnail': 'http://cfvod.kaltura.com/p/2503451/sp/250345100/thumbnail/entry_id/1_o7stmi6p/version/100021',
'uploader_id': 'evgeniia.alisova@un.org',
'upload_date': '20250620',
'timestamp': 1750430976,
'duration': 234,
'view_count': int,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
partner_id = self._html_search_regex(
r'partnerId:\s*(\d+)', webpage, 'partner_id')
entry_id = self._html_search_regex(
r'const\s+kentryID\s*=\s*["\'](\w+)["\']', webpage, 'kentry_id')
return self.url_result(f'kaltura:{partner_id}:{entry_id}', KalturaIE)

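`UnitedNationsWebTvIE` scrapes two values from the page, `partnerId` and `kentryID`, and delegates to Kaltura through a `kaltura:<partner>:<entry>` URL. The sketch below exercises the same regexes with plain `re`. The sample page text in the test is invented. `_html_search_regex` also cleans the captured value, which this sketch skips, and `kaltura_url_from_webpage` is a hypothetical name.

```python
import re

VALID_URL = r'https?://webtv\.un\.org/(?:ar|zh|en|fr|ru|es)/asset/\w+/(?P<id>\w+)'
PARTNER_ID_RE = r'partnerId:\s*(\d+)'
ENTRY_ID_RE = r'const\s+kentryID\s*=\s*["\'](\w+)["\']'


def kaltura_url_from_webpage(webpage):
    """Build the internal kaltura: URL that the extractor hands to KalturaIE."""
    partner = re.search(PARTNER_ID_RE, webpage)
    entry = re.search(ENTRY_ID_RE, webpage)
    if not partner or not entry:
        raise ValueError('partner_id or kentry_id not found')
    return f'kaltura:{partner.group(1)}:{entry.group(1)}'
```

Note that the page ID (`k1o7stmi6p`) differs from the Kaltura entry ID (`1_o7stmi6p`), which is why the test's `info_dict` uses the latter as `id`.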
@@ -53,6 +53,10 @@ class KnownDRMIE(UnsupportedInfoExtractor):
r'(?:beta\.)?crunchyroll\.com',
r'viki\.com',
r'deezer\.com',
r'b-ch\.com',
r'ctv\.ca',
r'noovo\.ca',
r'tsn\.ca',
)
_TESTS = [{
@@ -168,6 +172,18 @@ class KnownDRMIE(UnsupportedInfoExtractor):
}, {
'url': 'http://www.deezer.com/playlist/176747451',
'only_matching': True,
}, {
'url': 'https://www.b-ch.com/titles/8203/001',
'only_matching': True,
}, {
'url': 'https://www.ctv.ca/shows/masterchef-53506/the-audition-battles-s15e1',
'only_matching': True,
}, {
'url': 'https://www.noovo.ca/emissions/lamour-est-dans-le-pre/prets-pour-lamour-s10e1',
'only_matching': True,
}, {
'url': 'https://www.tsn.ca/video/relaxed-oilers-look-to-put-emotional-game-2-loss-in-the-rearview%7E3148747',
'only_matching': True,
}]
def _real_extract(self, url):

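`KnownDRMIE` is driven by a tuple of hostname regexes, and the new `only_matching` tests just confirm that the added hosts are recognised. The following sketch combines the hostnames listed in this hunk into one anchored pattern and checks the new test URLs. The way the pattern is assembled (optional subdomain, then a required `/`) is an assumption. The real `_VALID_URL` construction in `UnsupportedInfoExtractor` is not shown here.

```python
import re

# Only the hostnames visible in this hunk; the extractor's full list is longer
DRM_HOSTS = (
    r'(?:beta\.)?crunchyroll\.com',
    r'viki\.com',
    r'deezer\.com',
    r'b-ch\.com',
    r'ctv\.ca',
    r'noovo\.ca',
    r'tsn\.ca',
)
# Hypothetical combined pattern: scheme, optional subdomain, host, then a path
DRM_URL_RE = re.compile(r'https?://(?:[\w-]+\.)?(?:{})/'.format('|'.join(DRM_HOSTS)))


def is_known_drm(url):
    return DRM_URL_RE.match(url) is not None
```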
@@ -21,6 +21,7 @@ from ..utils import (
js_to_json,
jwt_decode_hs256,
merge_dicts,
mimetype2ext,
parse_filesize,
parse_iso8601,
parse_qs,
@@ -28,9 +29,11 @@ from ..utils import (
smuggle_url,
str_or_none,
traverse_obj,
try_call,
try_get,
unified_timestamp,
unsmuggle_url,
url_basename,
url_or_none,
urlencode_postdata,
urlhandle_detect_ext,
@@ -45,14 +48,57 @@ class VimeoBaseInfoExtractor(InfoExtractor):
_REFERER_HINT = (
'Cannot download embed-only video without embedding URL. Please call yt-dlp '
'with the URL of the page that embeds this video.')
_IOS_CLIENT_AUTH = 'MTMxNzViY2Y0NDE0YTQ5YzhjZTc0YmU0NjVjNDQxYzNkYWVjOWRlOTpHKzRvMmgzVUh4UkxjdU5FRW80cDNDbDhDWGR5dVJLNUJZZ055dHBHTTB4V1VzaG41bEx1a2hiN0NWYWNUcldSSW53dzRUdFRYZlJEZmFoTTArOTBUZkJHS3R4V2llYU04Qnl1bERSWWxUdXRidjNqR2J4SHFpVmtFSUcyRktuQw=='
_IOS_CLIENT_HEADERS = {
_DEFAULT_CLIENT = 'android'
_DEFAULT_AUTHED_CLIENT = 'web'
_CLIENT_HEADERS = {
'Accept': 'application/vnd.vimeo.*+json; version=3.4.10',
'Accept-Language': 'en',
'User-Agent': 'Vimeo/11.10.0 (com.vimeo; build:250424.164813.0; iOS 18.4.1) Alamofire/5.9.0 VimeoNetworking/5.0.0',
}
_IOS_OAUTH_CACHE_KEY = 'oauth-token-ios'
_ios_oauth_token = None
_CLIENT_CONFIGS = {
'android': {
'CACHE_KEY': 'oauth-token-android',
'CACHE_ONLY': False,
'VIEWER_JWT': False,
'REQUIRES_AUTH': False,
'AUTH': 'NzRmYTg5YjgxMWExY2JiNzUwZDg1MjhkMTYzZjQ4YWYyOGEyZGJlMTp4OGx2NFd3QnNvY1lkamI2UVZsdjdDYlNwSDUrdm50YzdNNThvWDcwN1JrenJGZC9tR1lReUNlRjRSVklZeWhYZVpRS0tBcU9YYzRoTGY2Z1dlVkJFYkdJc0dMRHpoZWFZbU0reDRqZ1dkZ1diZmdIdGUrNUM5RVBySlM0VG1qcw==',
'USER_AGENT': 'com.vimeo.android.videoapp (OnePlus, ONEPLUS A6003, OnePlus, Android 14/34 Version 11.8.1) Kotlin VimeoNetworking/3.12.0',
'VIDEOS_FIELDS': (
'uri', 'name', 'description', 'type', 'link', 'player_embed_url', 'duration', 'width',
'language', 'height', 'embed', 'created_time', 'modified_time', 'release_time', 'content_rating',
'content_rating_class', 'rating_mod_locked', 'license', 'privacy', 'pictures', 'tags', 'stats',
'categories', 'uploader', 'metadata', 'user', 'files', 'download', 'app', 'play', 'status',
'resource_key', 'badge', 'upload', 'transcode', 'is_playable', 'has_audio',
),
},
'ios': {
'CACHE_KEY': 'oauth-token-ios',
'CACHE_ONLY': True,
'VIEWER_JWT': False,
'REQUIRES_AUTH': False,
'AUTH': 'MTMxNzViY2Y0NDE0YTQ5YzhjZTc0YmU0NjVjNDQxYzNkYWVjOWRlOTpHKzRvMmgzVUh4UkxjdU5FRW80cDNDbDhDWGR5dVJLNUJZZ055dHBHTTB4V1VzaG41bEx1a2hiN0NWYWNUcldSSW53dzRUdFRYZlJEZmFoTTArOTBUZkJHS3R4V2llYU04Qnl1bERSWWxUdXRidjNqR2J4SHFpVmtFSUcyRktuQw==',
'USER_AGENT': 'Vimeo/11.10.0 (com.vimeo; build:250424.164813.0; iOS 18.4.1) Alamofire/5.9.0 VimeoNetworking/5.0.0',
'VIDEOS_FIELDS': (
'uri', 'name', 'description', 'type', 'link', 'player_embed_url', 'duration',
'width', 'language', 'height', 'embed', 'created_time', 'modified_time', 'release_time',
'content_rating', 'content_rating_class', 'rating_mod_locked', 'license', 'config_url',
'embed_player_config_url', 'privacy', 'pictures', 'tags', 'stats', 'categories', 'uploader',
'metadata', 'user', 'files', 'download', 'app', 'play', 'status', 'resource_key', 'badge',
'upload', 'transcode', 'is_playable', 'has_audio',
),
},
'web': {
'VIEWER_JWT': True,
'REQUIRES_AUTH': True,
'USER_AGENT': None,
'VIDEOS_FIELDS': (
'config_url', 'created_time', 'description', 'license',
'metadata.connections.comments.total', 'metadata.connections.likes.total',
'release_time', 'stats.plays',
),
},
}
_oauth_tokens = {}
_viewer_info = None
@staticmethod
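The Vimeo change replaces the single iOS token with per-client entries in `_CLIENT_CONFIGS`. The `web` client authenticates with the viewer JWT. The others use a Bearer token that is looked up in memory, then in the persistent cache, and only then fetched, unless the client is `CACHE_ONLY`. The sketch below reproduces that lookup order with plain dicts standing in for `self.cache` and `_oauth_tokens`. `ClientConfig`, `TokenStore` and the callback parameters are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ClientConfig:
    cache_key: Optional[str] = None
    cache_only: bool = False  # may only reuse previously cached tokens
    viewer_jwt: bool = False  # authenticate with the logged-in viewer JWT instead


@dataclass
class TokenStore:
    configs: dict
    disk_cache: dict = field(default_factory=dict)  # stands in for the persistent cache
    memory: dict = field(default_factory=dict)      # per-process cache

    def authorization(self, client: str, fetch_token: Callable[[str], str],
                      fetch_jwt: Callable[[], str]) -> str:
        cfg = self.configs[client]
        if cfg.viewer_jwt:
            return f'jwt {fetch_jwt()}'
        key = cfg.cache_key
        if not self.memory.get(key):
            self.memory[key] = self.disk_cache.get(key)
        if not self.memory.get(key):
            if cfg.cache_only:
                raise RuntimeError(f'The {client} client can only use previously cached tokens')
            self.memory[key] = fetch_token(client)
            self.disk_cache[key] = self.memory[key]  # persist for later runs
        return f'Bearer {self.memory[key]}'
```

The returned string is used directly as the `Authorization` header value, which matches how the new `_fetch_oauth_token` returns either `jwt …` or `Bearer …`.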
@@ -80,7 +126,14 @@ class VimeoBaseInfoExtractor(InfoExtractor):
return self._viewer_info
@property
def _is_logged_in(self):
return 'vimeo' in self._get_cookies('https://vimeo.com')
def _perform_login(self, username, password):
if self._is_logged_in:
return
viewer = self._fetch_viewer_info()
data = {
'action': 'login',
@@ -105,8 +158,8 @@ class VimeoBaseInfoExtractor(InfoExtractor):
raise ExtractorError('Unable to log in')
def _real_initialize(self):
if self._LOGIN_REQUIRED and not self._get_cookies('https://vimeo.com').get('vuid'):
self._raise_login_required()
if self._LOGIN_REQUIRED and not self._is_logged_in:
self.raise_login_required()
def _get_video_password(self):
password = self.get_param('videopassword')
@@ -277,52 +330,95 @@ class VimeoBaseInfoExtractor(InfoExtractor):
'_format_sort_fields': ('quality', 'res', 'fps', 'hdr:12', 'source'),
}
def _fetch_oauth_token(self):
if not self._ios_oauth_token:
self._ios_oauth_token = self.cache.load(self._NETRC_MACHINE, self._IOS_OAUTH_CACHE_KEY)
def _fetch_oauth_token(self, client):
client_config = self._CLIENT_CONFIGS[client]
if not self._ios_oauth_token:
self._ios_oauth_token = self._download_json(
if client_config['VIEWER_JWT']:
return f'jwt {self._fetch_viewer_info()["jwt"]}'
cache_key = client_config['CACHE_KEY']
if not self._oauth_tokens.get(cache_key):
self._oauth_tokens[cache_key] = self.cache.load(self._NETRC_MACHINE, cache_key)
if not self._oauth_tokens.get(cache_key):
if client_config['CACHE_ONLY']:
raise ExtractorError(
f'The {client} client is unable to fetch new OAuth tokens '
f'and is only intended for use with previously cached tokens', expected=True)
self._oauth_tokens[cache_key] = self._download_json(
'https://api.vimeo.com/oauth/authorize/client', None,
'Fetching OAuth token', 'Failed to fetch OAuth token',
f'Fetching {client} OAuth token', f'Failed to fetch {client} OAuth token',
headers={
'Authorization': f'Basic {self._IOS_CLIENT_AUTH}',
**self._IOS_CLIENT_HEADERS,
'Authorization': f'Basic {client_config["AUTH"]}',
'User-Agent': client_config['USER_AGENT'],
**self._CLIENT_HEADERS,
}, data=urlencode_postdata({
'grant_type': 'client_credentials',
'scope': 'private public create edit delete interact upload purchased stats',
'scope': 'private public create edit delete interact upload purchased stats video_files',
}, quote_via=urllib.parse.quote))['access_token']
self.cache.store(self._NETRC_MACHINE, self._IOS_OAUTH_CACHE_KEY, self._ios_oauth_token)
self.cache.store(self._NETRC_MACHINE, cache_key, self._oauth_tokens[cache_key])
return self._ios_oauth_token
return f'Bearer {self._oauth_tokens[cache_key]}'
def _get_requested_client(self):
default_client = self._DEFAULT_AUTHED_CLIENT if self._is_logged_in else self._DEFAULT_CLIENT
client = self._configuration_arg('client', [default_client], ie_key=VimeoIE)[0]
if client not in self._CLIENT_CONFIGS:
raise ExtractorError(
f'Unsupported API client "{client}" requested. '
f'Supported clients are: {", ".join(self._CLIENT_CONFIGS)}', expected=True)
return client
def _call_videos_api(self, video_id, unlisted_hash=None, path=None, *, force_client=None, query=None, **kwargs):
client = force_client or self._get_requested_client()
client_config = self._CLIENT_CONFIGS[client]
if client_config['REQUIRES_AUTH'] and not self._is_logged_in:
self.raise_login_required(f'The {client} client requires authentication')
def _call_videos_api(self, video_id, unlisted_hash=None, **kwargs):
return self._download_json(
join_nonempty(f'https://api.vimeo.com/videos/{video_id}', unlisted_hash, delim=':'),
video_id, 'Downloading API JSON', headers={
'Authorization': f'Bearer {self._fetch_oauth_token()}',
**self._IOS_CLIENT_HEADERS,
}, query={
'fields': ','.join((
'config_url', 'embed_player_config_url', 'player_embed_url', 'download', 'play',
'files', 'description', 'license', 'release_time', 'created_time', 'stats.plays',
'metadata.connections.comments.total', 'metadata.connections.likes.total')),
join_nonempty(
'https://api.vimeo.com/videos',
join_nonempty(video_id, unlisted_hash, delim=':'),
path, delim='/'),
video_id, f'Downloading {client} API JSON', f'Unable to download {client} API JSON',
headers=filter_dict({
'Authorization': self._fetch_oauth_token(client),
'User-Agent': client_config['USER_AGENT'],
**self._CLIENT_HEADERS,
}), query={
'fields': ','.join(client_config['VIDEOS_FIELDS']),
**(query or {}),
}, **kwargs)
def _extract_original_format(self, url, video_id, unlisted_hash=None, api_data=None):
def _extract_original_format(self, url, video_id, unlisted_hash=None):
# Original/source formats are only available when logged in
if not self._get_cookies('https://vimeo.com/').get('vimeo'):
return
if not self._is_logged_in:
return None
query = {'action': 'load_download_config'}
if unlisted_hash:
query['unlisted_hash'] = unlisted_hash
download_data = self._download_json(
url, video_id, 'Loading download config JSON', fatal=False,
query=query, headers={'X-Requested-With': 'XMLHttpRequest'},
expected_status=(403, 404)) or {}
source_file = download_data.get('source_file')
download_url = try_get(source_file, lambda x: x['download_url'])
policy = self._configuration_arg('original_format_policy', ['auto'], ie_key=VimeoIE)[0]
if policy == 'never':
return None
try:
download_data = self._download_json(
url, video_id, 'Loading download config JSON', query=filter_dict({
'action': 'load_download_config',
'unlisted_hash': unlisted_hash,
}), headers={
'Accept': 'application/json',
'X-Requested-With': 'XMLHttpRequest',
})
except ExtractorError as error:
self.write_debug(f'Unable to load download config JSON: {error.cause}')
download_data = None
source_file = traverse_obj(download_data, ('source_file', {dict})) or {}
download_url = traverse_obj(source_file, ('download_url', {url_or_none}))
if download_url and not source_file.get('is_cold') and not source_file.get('is_defrosting'):
source_name = source_file.get('public_name', 'Original')
if self._is_valid_url(download_url, video_id, f'{source_name} video'):
@@ -340,8 +436,27 @@ class VimeoBaseInfoExtractor(InfoExtractor):
'quality': 1,
}
original_response = api_data or self._call_videos_api(
video_id, unlisted_hash, fatal=False, expected_status=(403, 404))
# Most web client API requests are subject to rate-limiting (429) when logged-in.
# Requesting only the 'privacy' field is NOT rate-limited,
# so first we should check if video even has 'download' formats available
try:
privacy_info = self._call_videos_api(
video_id, unlisted_hash, force_client='web', query={'fields': 'privacy'})
except ExtractorError as error:
self.write_debug(f'Unable to download privacy info: {error.cause}')
return None
if not traverse_obj(privacy_info, ('privacy', 'download', {bool})):
msg = f'{video_id}: Vimeo says this video is not downloadable'
if policy != 'always':
self.write_debug(
f'{msg}, so yt-dlp is not attempting to extract the original/source format. '
f'To try anyways, use --extractor-args "vimeo:original_format_policy=always"')
return None
self.write_debug(f'{msg}; attempting to extract original/source format anyways')
original_response = self._call_videos_api(
video_id, unlisted_hash, force_client='web', query={'fields': 'download'}, fatal=False)
for download_data in traverse_obj(original_response, ('download', ..., {dict})):
download_url = download_data.get('link')
if not download_url or download_data.get('quality') != 'source':
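Before falling back to the rate-limited `download` field, the new code makes a privacy-only API request. It then lets `original_format_policy` decide whether to proceed. The sketch below reduces that gate to a pure function for the API-fallback path. It assumes "logged in" and the policy value have already been resolved. The function name is hypothetical, and the earlier `load_download_config` attempt is deliberately left out.

```python
def should_request_source_download(policy, is_logged_in, privacy_response):
    """Decide whether to request the 'download' field for the source format.

    privacy_response is the parsed {'privacy': {...}} JSON, or None if that request failed.
    """
    if not is_logged_in or policy == 'never':
        return False
    if privacy_response is None:
        return False  # the privacy-only request failed, so give up quietly
    privacy = privacy_response.get('privacy')
    if isinstance(privacy, dict) and privacy.get('download') is True:
        return True
    # Vimeo says the video is not downloadable; only 'always' overrides that
    return policy == 'always'
```

Users opt into the override with `--extractor-args "vimeo:original_format_policy=always"`, as the debug message in the hunk suggests.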
@@ -919,25 +1034,125 @@ class VimeoIE(VimeoBaseInfoExtractor):
raise ExtractorError('Wrong video password', expected=True)
return checked
def _get_subtitles(self, video_id, unlisted_hash):
subs = {}
text_tracks = self._call_videos_api(
video_id, unlisted_hash, path='texttracks', query={
'include_transcript': 'true',
'fields': ','.join((
'active', 'display_language', 'id', 'language', 'link', 'name', 'type', 'uri',
)),
}, fatal=False)
for tt in traverse_obj(text_tracks, ('data', lambda _, v: url_or_none(v['link']))):
subs.setdefault(tt.get('language'), []).append({
'url': tt['link'],
'ext': 'vtt',
'name': tt.get('display_language'),
})
return subs
def _parse_api_response(self, video, video_id, unlisted_hash=None):
formats, subtitles = [], {}
seen_urls = set()
duration = traverse_obj(video, ('duration', {int_or_none}))
for file in traverse_obj(video, (
(('play', (None, 'progressive')), 'files', 'download'), lambda _, v: url_or_none(v['link']),
)):
format_url = file['link']
if format_url in seen_urls:
continue
seen_urls.add(format_url)
quality = file.get('quality')
ext = determine_ext(format_url)
if quality == 'hls' or ext == 'm3u8':
fmts, subs = self._extract_m3u8_formats_and_subtitles(
format_url, video_id, 'mp4', m3u8_id='hls', fatal=False)
elif quality == 'dash' or ext == 'mpd':
fmts, subs = self._extract_mpd_formats_and_subtitles(
format_url, video_id, mpd_id='dash', fatal=False)
for fmt in fmts:
fmt['format_id'] = join_nonempty(
*fmt['format_id'].split('-', 2)[:2], int_or_none(fmt.get('tbr')))
else:
fmt = traverse_obj(file, {
'ext': ('type', {mimetype2ext(default='mp4')}),
'vcodec': ('codec', {str.lower}),
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),
'filesize': ('size', {int_or_none}),
'fps': ('fps', {int_or_none}),
})
fmt.update({
'url': format_url,
'format_id': join_nonempty(
'http', traverse_obj(file, 'public_name', 'rendition'), quality),
'tbr': try_call(lambda: fmt['filesize'] * 8 / duration / 1024),
})
formats.append(fmt)
continue
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
if traverse_obj(video, ('metadata', 'connections', 'texttracks', 'total', {int})):
self._merge_subtitles(self.extract_subtitles(video_id, unlisted_hash), target=subtitles)
return {
**traverse_obj(video, {
'title': ('name', {str}),
'uploader': ('user', 'name', {str}),
'uploader_id': ('user', 'link', {url_basename}),
'uploader_url': ('user', 'link', {url_or_none}),
'release_timestamp': ('live', 'scheduled_start_time', {int_or_none}),
'thumbnails': ('pictures', 'sizes', lambda _, v: url_or_none(v['link']), {
'url': 'link',
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),
}),
}),
'id': video_id,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
'live_status': {
'streaming': 'is_live',
'done': 'was_live',
}.get(traverse_obj(video, ('live', 'status', {str}))),
}
def _extract_from_api(self, video_id, unlisted_hash=None):
for retry in (False, True):
try:
video = self._call_videos_api(video_id, unlisted_hash)
break
except ExtractorError as e:
if (not retry and isinstance(e.cause, HTTPError) and e.cause.status == 400
and 'password' in traverse_obj(
self._webpage_read_content(e.cause.response, e.cause.response.url, video_id, fatal=False),
({json.loads}, 'invalid_parameters', ..., 'field'),
)):
if not isinstance(e.cause, HTTPError):
raise
response = traverse_obj(
self._webpage_read_content(e.cause.response, e.cause.response.url, video_id, fatal=False),
({json.loads}, {dict})) or {}
if (
not retry and e.cause.status == 400
and 'password' in traverse_obj(response, ('invalid_parameters', ..., 'field'))
):
self._verify_video_password(video_id)
continue
raise
elif e.cause.status == 404 and response.get('error_code') == 5460:
self.raise_login_required(join_nonempty(
traverse_obj(response, ('error', {str.strip})),
'Authentication may be needed due to your location.',
'If your IP address is located in Europe you could try using a VPN/proxy,',
f'or else u{self._login_hint()[1:]}',
delim=' '), method=None)
else:
raise
if config_url := traverse_obj(video, ('config_url', {url_or_none})):
info = self._parse_config(self._download_json(config_url, video_id), video_id)
else:
info = self._parse_api_response(video, video_id, unlisted_hash)
info = self._parse_config(self._download_json(
video['config_url'], video_id), video_id)
source_format = self._extract_original_format(
f'https://vimeo.com/{video_id}', video_id, unlisted_hash, api_data=video)
f'https://vimeo.com/{video_id}', video_id, unlisted_hash)
if source_format:
info['formats'].append(source_format)

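In `_parse_api_response`, progressive files get a `format_id` of the form `http-<name>-<quality>` and an estimated `tbr` in kbit/s (`filesize * 8 / duration / 1024`). DASH format IDs are shortened to their first two `-`-separated parts plus the integer bitrate. Both rules are sketched below in plain Python. The `or`-based fallback from `public_name` to `rendition` only approximates `traverse_obj`'s first-non-None behaviour, and the helper names are hypothetical.

```python
def progressive_format(file, duration):
    """Build a format dict for one progressive file entry from the videos API."""
    fmt = {
        'url': file['link'],
        'width': file.get('width'),
        'height': file.get('height'),
        'filesize': file.get('size'),
    }
    label = file.get('public_name') or file.get('rendition')
    # Like join_nonempty: drop empty parts, join the rest with '-'
    fmt['format_id'] = '-'.join(str(p) for p in ('http', label, file.get('quality')) if p)
    # Average bitrate in kbit/s; unknown size or duration leaves it unset
    try:
        fmt['tbr'] = fmt['filesize'] * 8 / duration / 1024
    except (TypeError, ZeroDivisionError):
        fmt['tbr'] = None
    return fmt


def dash_format_id(format_id, tbr):
    """Keep the first two '-'-separated parts and append the integer bitrate."""
    parts = format_id.split('-', 2)[:2]
    if tbr is not None:
        parts.append(int(tbr))
    return '-'.join(str(p) for p in parts if p)
```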
@@ -1,5 +1,6 @@
import calendar
import copy
import dataclasses
import datetime as dt
import enum
import functools
@@ -38,6 +39,60 @@ class _PoTokenContext(enum.Enum):
SUBS = 'subs'
class StreamingProtocol(enum.Enum):
HTTPS = 'https'
DASH = 'dash'
HLS = 'hls'
@dataclasses.dataclass
class BasePoTokenPolicy:
required: bool = False
# Try to fetch a PO Token even if it is not required.
recommended: bool = False
not_required_for_premium: bool = False
@dataclasses.dataclass
class GvsPoTokenPolicy(BasePoTokenPolicy):
not_required_with_player_token: bool = False
@dataclasses.dataclass
class PlayerPoTokenPolicy(BasePoTokenPolicy):
pass
@dataclasses.dataclass
class SubsPoTokenPolicy(BasePoTokenPolicy):
pass
WEB_PO_TOKEN_POLICIES = {
'GVS_PO_TOKEN_POLICY': {
StreamingProtocol.HTTPS: GvsPoTokenPolicy(
required=True,
recommended=True,
not_required_for_premium=True,
not_required_with_player_token=False,
),
StreamingProtocol.DASH: GvsPoTokenPolicy(
required=True,
recommended=True,
not_required_for_premium=True,
not_required_with_player_token=False,
),
StreamingProtocol.HLS: GvsPoTokenPolicy(
required=False,
recommended=True,
),
},
'PLAYER_PO_TOKEN_POLICY': PlayerPoTokenPolicy(required=False),
# In rollout, currently detected via experiment
# Premium users DO require a PO Token for subtitles
'SUBS_PO_TOKEN_POLICY': SubsPoTokenPolicy(required=False),
}
# any clients starting with _ cannot be explicitly requested by the user
INNERTUBE_CLIENTS = {
'web': {
@@ -48,8 +103,9 @@ INNERTUBE_CLIENTS = {
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 1,
'PO_TOKEN_REQUIRED_CONTEXTS': [_PoTokenContext.GVS],
'SUPPORTS_COOKIES': True,
**WEB_PO_TOKEN_POLICIES,
'PLAYER_PARAMS': '8AEB',
},
# Safari UA returns pre-merged video+audio 144p/240p/360p/720p/1080p HLS formats
'web_safari': {
@@ -61,8 +117,9 @@ INNERTUBE_CLIENTS = {
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 1,
'PO_TOKEN_REQUIRED_CONTEXTS': [_PoTokenContext.GVS],
'SUPPORTS_COOKIES': True,
**WEB_PO_TOKEN_POLICIES,
'PLAYER_PARAMS': '8AEB',
},
'web_embedded': {
'INNERTUBE_CONTEXT': {
@@ -83,7 +140,24 @@ INNERTUBE_CLIENTS = {
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 67,
'PO_TOKEN_REQUIRED_CONTEXTS': [_PoTokenContext.GVS],
'GVS_PO_TOKEN_POLICY': {
StreamingProtocol.HTTPS: GvsPoTokenPolicy(
required=True,
recommended=True,
not_required_for_premium=True,
not_required_with_player_token=False,
),
StreamingProtocol.DASH: GvsPoTokenPolicy(
required=True,
recommended=True,
not_required_for_premium=True,
not_required_with_player_token=False,
),
StreamingProtocol.HLS: GvsPoTokenPolicy(
required=False,
recommended=True,
),
},
'SUPPORTS_COOKIES': True,
},
# This client now requires sign-in for every video
@@ -95,7 +169,24 @@ INNERTUBE_CLIENTS = {
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 62,
'PO_TOKEN_REQUIRED_CONTEXTS': [_PoTokenContext.GVS],
'GVS_PO_TOKEN_POLICY': {
StreamingProtocol.HTTPS: GvsPoTokenPolicy(
required=True,
recommended=True,
not_required_for_premium=True,
not_required_with_player_token=False,
),
StreamingProtocol.DASH: GvsPoTokenPolicy(
required=True,
recommended=True,
not_required_for_premium=True,
not_required_with_player_token=False,
),
StreamingProtocol.HLS: GvsPoTokenPolicy(
required=False,
recommended=True,
),
},
'REQUIRE_AUTH': True,
'SUPPORTS_COOKIES': True,
},
@@ -112,7 +203,24 @@ INNERTUBE_CLIENTS = {
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 3,
'REQUIRE_JS_PLAYER': False,
'PO_TOKEN_REQUIRED_CONTEXTS': [_PoTokenContext.GVS],
'GVS_PO_TOKEN_POLICY': {
StreamingProtocol.HTTPS: GvsPoTokenPolicy(
required=True,
recommended=True,
not_required_with_player_token=True,
),
StreamingProtocol.DASH: GvsPoTokenPolicy(
required=True,
recommended=True,
not_required_with_player_token=True,
),
StreamingProtocol.HLS: GvsPoTokenPolicy(
required=False,
recommended=True,
not_required_with_player_token=True,
),
},
'PLAYER_PO_TOKEN_POLICY': PlayerPoTokenPolicy(required=False, recommended=True),
},
# YouTube Kids videos aren't returned on this client for some reason
'android_vr': {
@@ -146,7 +254,21 @@ INNERTUBE_CLIENTS = {
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 5,
'PO_TOKEN_REQUIRED_CONTEXTS': [_PoTokenContext.GVS],
'GVS_PO_TOKEN_POLICY': {
StreamingProtocol.HTTPS: GvsPoTokenPolicy(
required=True,
recommended=True,
not_required_with_player_token=True,
),
# HLS Livestreams require POT 30 seconds in
# TODO: Rolling out
StreamingProtocol.HLS: GvsPoTokenPolicy(
required=False,
recommended=True,
not_required_with_player_token=True,
),
},
'PLAYER_PO_TOKEN_POLICY': PlayerPoTokenPolicy(required=False, recommended=True),
'REQUIRE_JS_PLAYER': False,
},
# mweb has 'ultralow' formats
@@ -161,7 +283,24 @@ INNERTUBE_CLIENTS = {
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 2,
'PO_TOKEN_REQUIRED_CONTEXTS': [_PoTokenContext.GVS],
'GVS_PO_TOKEN_POLICY': {
StreamingProtocol.HTTPS: GvsPoTokenPolicy(
required=True,
recommended=True,
not_required_for_premium=True,
not_required_with_player_token=False,
),
StreamingProtocol.DASH: GvsPoTokenPolicy(
required=True,
recommended=True,
not_required_for_premium=True,
not_required_with_player_token=False,
),
StreamingProtocol.HLS: GvsPoTokenPolicy(
required=False,
recommended=True,
),
},
'SUPPORTS_COOKIES': True,
},
'tv': {
@@ -174,6 +313,7 @@ INNERTUBE_CLIENTS = {
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 7,
'SUPPORTS_COOKIES': True,
'PLAYER_PARAMS': '8AEB',
},
'tv_simply': {
'INNERTUBE_CONTEXT': {
@@ -224,7 +364,11 @@ def build_innertube_clients():
for client, ytcfg in tuple(INNERTUBE_CLIENTS.items()):
ytcfg.setdefault('INNERTUBE_HOST', 'www.youtube.com')
ytcfg.setdefault('REQUIRE_JS_PLAYER', True)
ytcfg.setdefault('PO_TOKEN_REQUIRED_CONTEXTS', [])
ytcfg.setdefault('GVS_PO_TOKEN_POLICY', {})
for protocol in StreamingProtocol:
ytcfg['GVS_PO_TOKEN_POLICY'].setdefault(protocol, GvsPoTokenPolicy())
ytcfg.setdefault('PLAYER_PO_TOKEN_POLICY', PlayerPoTokenPolicy())
ytcfg.setdefault('SUBS_PO_TOKEN_POLICY', SubsPoTokenPolicy())
ytcfg.setdefault('REQUIRE_AUTH', False)
ytcfg.setdefault('SUPPORTS_COOKIES', False)
ytcfg.setdefault('PLAYER_PARAMS', None)

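The `_base` changes replace `PO_TOKEN_REQUIRED_CONTEXTS` with per-protocol policy dataclasses. `build_innertube_clients` then fills every missing protocol with a default (all-`False`) policy. The first function below mirrors that defaulting with `setdefault`. The second, `gvs_pot_required`, is a hypothetical reading of how the flag names could combine into one decision. The code that actually consumes these policies is not part of this hunk, so treat that function as an assumption.

```python
import dataclasses
import enum


class StreamingProtocol(enum.Enum):
    HTTPS = 'https'
    DASH = 'dash'
    HLS = 'hls'


@dataclasses.dataclass
class GvsPoTokenPolicy:
    required: bool = False
    recommended: bool = False
    not_required_for_premium: bool = False
    not_required_with_player_token: bool = False


def fill_gvs_defaults(ytcfg):
    """Ensure every protocol has a policy, as build_innertube_clients does."""
    policies = ytcfg.setdefault('GVS_PO_TOKEN_POLICY', {})
    for protocol in StreamingProtocol:
        policies.setdefault(protocol, GvsPoTokenPolicy())
    return ytcfg


def gvs_pot_required(policy, is_premium, player_token_provided):
    # Hypothetical interpretation of the flags, not yt-dlp's actual logic
    if not policy.required:
        return False
    if is_premium and policy.not_required_for_premium:
        return False
    if player_token_provided and policy.not_required_with_player_token:
        return False
    return True
```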
@@ -317,17 +317,31 @@ class YoutubeTabBaseInfoExtractor(YoutubeBaseInfoExtractor):
content_id = view_model.get('contentId')
if not content_id:
return
content_type = view_model.get('contentType')
if content_type not in ('LOCKUP_CONTENT_TYPE_PLAYLIST', 'LOCKUP_CONTENT_TYPE_PODCAST'):
if content_type == 'LOCKUP_CONTENT_TYPE_VIDEO':
ie = YoutubeIE
url = f'https://www.youtube.com/watch?v={content_id}'
thumb_keys = (None,)
elif content_type in ('LOCKUP_CONTENT_TYPE_PLAYLIST', 'LOCKUP_CONTENT_TYPE_PODCAST'):
ie = YoutubeTabIE
url = f'https://www.youtube.com/playlist?list={content_id}'
thumb_keys = ('collectionThumbnailViewModel', 'primaryThumbnail')
else:
self.report_warning(
f'Unsupported lockup view model content type "{content_type}"{bug_reports_message()}', only_once=True)
f'Unsupported lockup view model content type "{content_type}"{bug_reports_message()}',
only_once=True)
return
return self.url_result(
f'https://www.youtube.com/playlist?list={content_id}', ie=YoutubeTabIE, video_id=content_id,
url, ie, content_id,
title=traverse_obj(view_model, (
'metadata', 'lockupMetadataViewModel', 'title', 'content', {str})),
thumbnails=self._extract_thumbnails(view_model, (
'contentImage', 'collectionThumbnailViewModel', 'primaryThumbnail', 'thumbnailViewModel', 'image'), final_key='sources'))
'contentImage', *thumb_keys, 'thumbnailViewModel', 'image'), final_key='sources'),
duration=traverse_obj(view_model, (
'contentImage', 'thumbnailViewModel', 'overlays', ..., 'thumbnailOverlayBadgeViewModel',
'thumbnailBadges', ..., 'thumbnailBadgeViewModel', 'text', {parse_duration}, any)))
def _rich_entries(self, rich_grid_renderer):
if lockup_view_model := traverse_obj(rich_grid_renderer, ('content', 'lockupViewModel', {dict})):

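The lockup view model handler now branches on `contentType`. Videos map to a watch URL; playlists and podcasts map to a playlist URL with an extra `collectionThumbnailViewModel`/`primaryThumbnail` step in the thumbnail path. Any other content type triggers a warning. Our understanding is that in `traverse_obj` a `None` key returns the current object, so `(None,)` acts as an empty step. The sketch uses an empty tuple instead. `clock_to_seconds` is a narrow stand-in for `parse_duration` that covers only `M:SS`/`H:MM:SS` badge text. All names here are hypothetical.

```python
def lockup_target(content_id, content_type):
    """Return (url, extra thumbnail keys) for a lockup, or None if unsupported."""
    if content_type == 'LOCKUP_CONTENT_TYPE_VIDEO':
        return f'https://www.youtube.com/watch?v={content_id}', ()
    if content_type in ('LOCKUP_CONTENT_TYPE_PLAYLIST', 'LOCKUP_CONTENT_TYPE_PODCAST'):
        return (f'https://www.youtube.com/playlist?list={content_id}',
                ('collectionThumbnailViewModel', 'primaryThumbnail'))
    return None


def thumbnail_path(thumb_keys):
    """Full key path to the thumbnail image, with the content-type-specific keys spliced in."""
    return ('contentImage', *thumb_keys, 'thumbnailViewModel', 'image')


def clock_to_seconds(text):
    """Parse 'M:SS' or 'H:MM:SS' badge text into seconds; None for anything else."""
    try:
        parts = [int(p) for p in text.split(':')]
    except (AttributeError, ValueError):
        return None
    seconds = 0
    for part in parts:
        seconds = seconds * 60 + part
    return seconds
```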
@@ -18,6 +18,9 @@ import urllib.parse
from ._base import (
INNERTUBE_CLIENTS,
BadgeType,
GvsPoTokenPolicy,
PlayerPoTokenPolicy,
StreamingProtocol,
YoutubeBaseInfoExtractor,
_PoTokenContext,
_split_innertube_client,
@@ -26,7 +29,7 @@ from ._base import (
from .pot._director import initialize_pot_director
from .pot.provider import PoTokenContext, PoTokenRequest
from ..openload import PhantomJSwrapper
from ...jsinterp import JSInterpreter
from ...jsinterp import JSInterpreter, LocalNameSpace
from ...networking.exceptions import HTTPError
from ...utils import (
NO_DEFAULT,
@@ -71,9 +74,11 @@ from ...utils import (
from ...utils.networking import clean_headers, clean_proxies, select_proxy
STREAMING_DATA_CLIENT_NAME = '__yt_dlp_client'
STREAMING_DATA_INITIAL_PO_TOKEN = '__yt_dlp_po_token'
STREAMING_DATA_FETCH_SUBS_PO_TOKEN = '__yt_dlp_fetch_subs_po_token'
STREAMING_DATA_FETCH_GVS_PO_TOKEN = '__yt_dlp_fetch_gvs_po_token'
STREAMING_DATA_PLAYER_TOKEN_PROVIDED = '__yt_dlp_player_token_provided'
STREAMING_DATA_INNERTUBE_CONTEXT = '__yt_dlp_innertube_context'
STREAMING_DATA_IS_PREMIUM_SUBSCRIBER = '__yt_dlp_is_premium_subscriber'
PO_TOKEN_GUIDE_URL = 'https://github.com/yt-dlp/yt-dlp/wiki/PO-Token-Guide'
@@ -253,6 +258,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
_SUBTITLE_FORMATS = ('json3', 'srv1', 'srv2', 'srv3', 'ttml', 'srt', 'vtt')
_DEFAULT_CLIENTS = ('tv', 'ios', 'web')
_DEFAULT_AUTHED_CLIENTS = ('tv', 'web')
# Premium does not require POT (except for subtitles)
_DEFAULT_PREMIUM_CLIENTS = ('tv', 'web')
_GEO_BYPASS = False
@@ -1801,6 +1808,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'tablet': 'player-plasma-ias-tablet-en_US.vflset/base.js',
}
_INVERSE_PLAYER_JS_VARIANT_MAP = {v: k for k, v in _PLAYER_JS_VARIANT_MAP.items()}
_NSIG_FUNC_CACHE_ID = 'nsig func'
_DUMMY_STRING = 'dlp_wins'
@classmethod
def suitable(cls, url):
@@ -1831,7 +1840,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if time.time() <= start_time + delay:
return
_, _, prs, player_url = self._download_player_responses(url, smuggled_data, video_id, webpage_url)
_, _, _, _, prs, player_url = self._initial_extract(
url, smuggled_data, webpage_url, 'web', video_id)
video_details = traverse_obj(prs, (..., 'videoDetails'), expected_type=dict)
microformats = traverse_obj(
prs, (..., 'microformat', 'playerMicroformatRenderer'),
@@ -2204,7 +2214,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
self.to_screen(f'Extracted nsig function from {player_id}:\n{func_code[1]}\n')
try:
extract_nsig = self._cached(self._extract_n_function_from_code, 'nsig func', player_url)
extract_nsig = self._cached(self._extract_n_function_from_code, self._NSIG_FUNC_CACHE_ID, player_url)
ret = extract_nsig(jsi, func_code)(s)
except JSInterpreter.Exception as e:
try:
@@ -2312,16 +2322,18 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
jsi = JSInterpreter(varcode)
interpret_global_var = self._cached(jsi.interpret_expression, 'js global list', player_url)
return varname, interpret_global_var(varvalue, {}, allow_recursion=10)
return varname, interpret_global_var(varvalue, LocalNameSpace(), allow_recursion=10)
def _fixup_n_function_code(self, argnames, nsig_code, jscode, player_url):
# Fixup global array
varname, global_list = self._interpret_player_js_global_var(jscode, player_url)
if varname and global_list:
nsig_code = f'var {varname}={json.dumps(global_list)}; {nsig_code}'
else:
varname = 'dlp_wins'
varname = self._DUMMY_STRING
global_list = []
# Fixup typeof check
undefined_idx = global_list.index('undefined') if 'undefined' in global_list else r'\d+'
fixed_code = re.sub(
fr'''(?x)
@@ -2334,6 +2346,32 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
self.write_debug(join_nonempty(
'No typeof statement found in nsig function code',
player_url and f' player = {player_url}', delim='\n'), only_once=True)
# Fixup global funcs
jsi = JSInterpreter(fixed_code)
cache_id = (self._NSIG_FUNC_CACHE_ID, player_url)
try:
self._cached(
self._extract_n_function_from_code, *cache_id)(jsi, (argnames, fixed_code))(self._DUMMY_STRING)
except JSInterpreter.Exception:
self._player_cache.pop(cache_id, None)
global_funcnames = jsi._undefined_varnames
debug_names = []
jsi = JSInterpreter(jscode)
for func_name in global_funcnames:
try:
func_args, func_code = jsi.extract_function_code(func_name)
fixed_code = f'var {func_name} = function({", ".join(func_args)}) {{ {func_code} }}; {fixed_code}'
debug_names.append(func_name)
except Exception:
self.report_warning(join_nonempty(
f'Unable to extract global nsig function {func_name} from player JS',
player_url and f' player = {player_url}', delim='\n'), only_once=True)
if debug_names:
self.write_debug(f'Extracted global nsig functions: {", ".join(debug_names)}')
return argnames, fixed_code
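The "global funcs" fixup makes a trial run of the nsig code to find names it references but does not define (collected by the interpreter in `_undefined_varnames`). It extracts each one from the full player JS and prepends it as a `var name = function(args) { body };` declaration. The string-building step is sketched below. The `extracted` mapping stands in for `jsi.extract_function_code`, and `prepend_global_functions` is a hypothetical name. Missing names are returned rather than raising a warning.

```python
def prepend_global_functions(code, undefined_names, extracted):
    """Prepend a JS function declaration for each global name the nsig code relies on.

    `extracted` maps name -> (arg_names, body); names absent from it are reported as missing.
    """
    added, missing = [], []
    for name in undefined_names:
        if name not in extracted:
            missing.append(name)
            continue
        args, body = extracted[name]
        code = f'var {name} = function({", ".join(args)}) {{ {body} }}; {code}'
        added.append(name)
    return code, added, missing
```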
def _extract_n_function_code(self, video_id, player_url):
@@ -2347,7 +2385,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
func_name = self._extract_n_function_name(jscode, player_url=player_url)
# XXX: Workaround for the global array variable and lack of `typeof` implementation
# XXX: Work around (a) global array variable, (b) `typeof` short-circuit, (c) global functions
func_code = self._fixup_n_function_code(*jsi.extract_function_code(func_name), jscode, player_url)
return jsi, player_id, func_code
@@ -2861,7 +2899,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
only_once=True)
continue
def fetch_po_token(self, client='web', context=_PoTokenContext.GVS, ytcfg=None, visitor_data=None,
def fetch_po_token(self, client='web', context: _PoTokenContext = _PoTokenContext.GVS, ytcfg=None, visitor_data=None,
data_sync_id=None, session_index=None, player_url=None, video_id=None, webpage=None,
required=False, **kwargs):
"""
@@ -2946,7 +2984,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
fetch_pot_policy == 'never'
or (
fetch_pot_policy == 'auto'
and _PoTokenContext(context) not in self._get_default_ytcfg(client)['PO_TOKEN_REQUIRED_CONTEXTS']
and not kwargs.get('required', False)
)
):
@@ -3005,19 +3042,19 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
def _is_unplayable(player_response):
return traverse_obj(player_response, ('playabilityStatus', 'status')) == 'UNPLAYABLE'
def _extract_player_response(self, client, video_id, master_ytcfg, player_ytcfg, player_url, initial_pr, visitor_data, data_sync_id, po_token):
def _extract_player_response(self, client, video_id, webpage_ytcfg, player_ytcfg, player_url, initial_pr, visitor_data, data_sync_id, po_token):
headers = self.generate_api_headers(
ytcfg=player_ytcfg,
default_client=client,
visitor_data=visitor_data,
session_index=self._extract_session_index(master_ytcfg, player_ytcfg),
session_index=self._extract_session_index(webpage_ytcfg, player_ytcfg),
delegated_session_id=(
self._parse_data_sync_id(data_sync_id)[0]
or self._extract_delegated_session_id(master_ytcfg, initial_pr, player_ytcfg)
or self._extract_delegated_session_id(webpage_ytcfg, initial_pr, player_ytcfg)
),
user_session_id=(
self._parse_data_sync_id(data_sync_id)[1]
or self._extract_user_session_id(master_ytcfg, initial_pr, player_ytcfg)
or self._extract_user_session_id(webpage_ytcfg, initial_pr, player_ytcfg)
),
)
@@ -3033,7 +3070,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if po_token:
yt_query['serviceIntegrityDimensions'] = {'poToken': po_token}
sts = self._extract_signature_timestamp(video_id, player_url, master_ytcfg, fatal=False) if player_url else None
sts = self._extract_signature_timestamp(video_id, player_url, webpage_ytcfg, fatal=False) if player_url else None
yt_query.update(self._generate_player_context(sts))
return self._extract_response(
item_id=video_id, ep='player', query=yt_query,
@@ -3042,10 +3079,14 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
note='Downloading {} player API JSON'.format(client.replace('_', ' ').strip()),
) or None
def _get_requested_clients(self, url, smuggled_data):
def _get_requested_clients(self, url, smuggled_data, is_premium_subscriber):
requested_clients = []
excluded_clients = []
default_clients = self._DEFAULT_AUTHED_CLIENTS if self.is_authenticated else self._DEFAULT_CLIENTS
default_clients = (
self._DEFAULT_PREMIUM_CLIENTS if is_premium_subscriber
else self._DEFAULT_AUTHED_CLIENTS if self.is_authenticated
else self._DEFAULT_CLIENTS
)
allowed_clients = sorted(
(client for client in INNERTUBE_CLIENTS if client[:1] != '_'),
key=lambda client: INNERTUBE_CLIENTS[client]['priority'], reverse=True)
@@ -3087,11 +3128,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if (pr_id := traverse_obj(pr, ('videoDetails', 'videoId'))) != video_id:
return pr_id
def _extract_player_responses(self, clients, video_id, webpage, master_ytcfg, smuggled_data):
def _extract_player_responses(self, clients, video_id, webpage, webpage_client, webpage_ytcfg, is_premium_subscriber):
initial_pr = None
if webpage:
initial_pr = self._search_json(
self._YT_INITIAL_PLAYER_RESPONSE_RE, webpage, 'initial player response', video_id, fatal=False)
self._YT_INITIAL_PLAYER_RESPONSE_RE, webpage,
f'{webpage_client} client initial player response', video_id, fatal=False)
prs = []
deprioritized_prs = []
@@ -3122,11 +3164,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
while clients:
deprioritize_pr = False
client, base_client, variant = _split_innertube_client(clients.pop())
player_ytcfg = master_ytcfg if client == 'web' else {}
if 'configs' not in self._configuration_arg('player_skip') and client != 'web':
player_ytcfg = webpage_ytcfg if client == webpage_client else {}
if 'configs' not in self._configuration_arg('player_skip') and client != webpage_client:
player_ytcfg = self._download_ytcfg(client, video_id) or player_ytcfg
player_url = player_url or self._extract_player_url(master_ytcfg, player_ytcfg, webpage=webpage)
player_url = player_url or self._extract_player_url(webpage_ytcfg, player_ytcfg, webpage=webpage)
require_js_player = self._get_default_ytcfg(client).get('REQUIRE_JS_PLAYER')
if 'js' in self._configuration_arg('player_skip'):
require_js_player = False
@@ -3136,10 +3178,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
player_url = self._download_player_url(video_id)
tried_iframe_fallback = True
pr = initial_pr if client == 'web' else None
pr = None
if client == webpage_client and 'player_response' not in self._configuration_arg('webpage_skip'):
pr = initial_pr
visitor_data = visitor_data or self._extract_visitor_data(master_ytcfg, initial_pr, player_ytcfg)
data_sync_id = data_sync_id or self._extract_data_sync_id(master_ytcfg, initial_pr, player_ytcfg)
visitor_data = visitor_data or self._extract_visitor_data(webpage_ytcfg, initial_pr, player_ytcfg)
data_sync_id = data_sync_id or self._extract_data_sync_id(webpage_ytcfg, initial_pr, player_ytcfg)
fetch_po_token_args = {
'client': client,
@@ -3148,53 +3192,26 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'data_sync_id': data_sync_id if self.is_authenticated else None,
'player_url': player_url if require_js_player else None,
'webpage': webpage,
'session_index': self._extract_session_index(master_ytcfg, player_ytcfg),
'session_index': self._extract_session_index(webpage_ytcfg, player_ytcfg),
'ytcfg': player_ytcfg or self._get_default_ytcfg(client),
}
# Don't need a player PO token for WEB if using player response from webpage
player_pot_policy: PlayerPoTokenPolicy = self._get_default_ytcfg(client)['PLAYER_PO_TOKEN_POLICY']
player_po_token = None if pr else self.fetch_po_token(
context=_PoTokenContext.PLAYER, **fetch_po_token_args)
context=_PoTokenContext.PLAYER, **fetch_po_token_args,
required=player_pot_policy.required or player_pot_policy.recommended)
gvs_po_token = self.fetch_po_token(
context=_PoTokenContext.GVS, **fetch_po_token_args)
fetch_gvs_po_token_func = functools.partial(
self.fetch_po_token, context=_PoTokenContext.GVS, **fetch_po_token_args)
fetch_subs_po_token_func = functools.partial(
self.fetch_po_token,
context=_PoTokenContext.SUBS,
**fetch_po_token_args,
)
required_pot_contexts = self._get_default_ytcfg(client)['PO_TOKEN_REQUIRED_CONTEXTS']
if (
not player_po_token
and _PoTokenContext.PLAYER in required_pot_contexts
):
# TODO: may need to skip player response request. Unsure yet..
self.report_warning(
f'No Player PO Token provided for {client} client, '
f'which may be required for working {client} formats. This client will be deprioritized'
f'You can manually pass a Player PO Token for this client with --extractor-args "youtube:po_token={client}.player+XXX". '
f'For more information, refer to {PO_TOKEN_GUIDE_URL} .', only_once=True)
deprioritize_pr = True
if (
not gvs_po_token
and _PoTokenContext.GVS in required_pot_contexts
and 'missing_pot' in self._configuration_arg('formats')
):
# note: warning with help message is provided later during format processing
self.report_warning(
f'No GVS PO Token provided for {client} client, '
f'which may be required for working {client} formats. This client will be deprioritized',
only_once=True)
deprioritize_pr = True
self.fetch_po_token, context=_PoTokenContext.SUBS, **fetch_po_token_args)
try:
pr = pr or self._extract_player_response(
client, video_id,
master_ytcfg=player_ytcfg or master_ytcfg,
webpage_ytcfg=player_ytcfg or webpage_ytcfg,
player_ytcfg=player_ytcfg,
player_url=player_url,
initial_pr=initial_pr,
@@ -3212,12 +3229,16 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
innertube_context = traverse_obj(player_ytcfg or self._get_default_ytcfg(client), 'INNERTUBE_CONTEXT')
sd = pr.setdefault('streamingData', {})
sd[STREAMING_DATA_CLIENT_NAME] = client
sd[STREAMING_DATA_INITIAL_PO_TOKEN] = gvs_po_token
sd[STREAMING_DATA_FETCH_GVS_PO_TOKEN] = fetch_gvs_po_token_func
sd[STREAMING_DATA_PLAYER_TOKEN_PROVIDED] = bool(player_po_token)
sd[STREAMING_DATA_INNERTUBE_CONTEXT] = innertube_context
sd[STREAMING_DATA_FETCH_SUBS_PO_TOKEN] = fetch_subs_po_token_func
sd[STREAMING_DATA_IS_PREMIUM_SUBSCRIBER] = is_premium_subscriber
for f in traverse_obj(sd, (('formats', 'adaptiveFormats'), ..., {dict})):
f[STREAMING_DATA_CLIENT_NAME] = client
f[STREAMING_DATA_INITIAL_PO_TOKEN] = gvs_po_token
f[STREAMING_DATA_FETCH_GVS_PO_TOKEN] = fetch_gvs_po_token_func
f[STREAMING_DATA_IS_PREMIUM_SUBSCRIBER] = is_premium_subscriber
f[STREAMING_DATA_PLAYER_TOKEN_PROVIDED] = bool(player_po_token)
if deprioritize_pr:
deprioritized_prs.append(pr)
else:
@@ -3243,6 +3264,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# web_creator may work around age-verification for all videos but requires PO token
append_client('tv_embedded', 'web_creator')
status = traverse_obj(pr, ('playabilityStatus', 'status', {str}))
if status not in ('OK', 'LIVE_STREAM_OFFLINE', 'AGE_CHECK_REQUIRED', 'AGE_VERIFICATION_REQUIRED'):
self.write_debug(f'{video_id}: {client} player response playability status: {status}')
prs.extend(deprioritized_prs)
if skipped_clients:
@@ -3323,6 +3348,15 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
}),
} for range_start in range(0, f['filesize'], CHUNK_SIZE))
def gvs_pot_required(policy, is_premium_subscriber, has_player_token):
return (
policy.required
and not (policy.not_required_with_player_token and has_player_token)
and not (policy.not_required_for_premium and is_premium_subscriber))
# save pots per client to avoid fetching again
gvs_pots = {}
for fmt in streaming_formats:
client_name = fmt[STREAMING_DATA_CLIENT_NAME]
if fmt.get('targetDurationSec'):
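The new `gvs_pot_required` helper reduces the per-client policy to a single boolean. A token is required only if the policy says so and no exemption applies: one exemption is for a player token that was already provided, the other is for Premium subscribers. The sketch below reproduces that logic against a stand-in dataclass. `PotPolicy` is a hypothetical substitute for yt-dlp's `GvsPoTokenPolicy` and carries only the four attributes the diff reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PotPolicy:
    # Hypothetical stand-in for GvsPoTokenPolicy: only the attributes read in the diff
    required: bool = False
    recommended: bool = False
    not_required_for_premium: bool = False
    not_required_with_player_token: bool = False


def gvs_pot_required(policy: PotPolicy, is_premium_subscriber: bool, has_player_token: bool) -> bool:
    # Same boolean logic as the helper added in the diff
    return (
        policy.required
        and not (policy.not_required_with_player_token and has_player_token)
        and not (policy.not_required_for_premium and is_premium_subscriber))


policy = PotPolicy(required=True, not_required_for_premium=True)
print(gvs_pot_required(policy, is_premium_subscriber=False, has_player_token=False))  # True
print(gvs_pot_required(policy, is_premium_subscriber=True, has_player_token=False))   # False
```

The fetch call then passes `required=require_po_token or pot_policy.recommended`, which widens the flag so that a token is still requested when the policy merely recommends one.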
@@ -3382,7 +3416,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
encrypted_sig = try_get(sc, lambda x: x['s'][0])
if not all((sc, fmt_url, player_url, encrypted_sig)):
msg = f'Some {client_name} client https formats have been skipped as they are missing a url. '
if client_name == 'web':
if client_name in ('web', 'web_safari'):
msg += 'YouTube is forcing SABR streaming for this client. '
else:
msg += (
@@ -3442,18 +3476,25 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
self.report_warning(
'Some formats are possibly damaged. They will be deprioritized', video_id, only_once=True)
po_token = fmt.get(STREAMING_DATA_INITIAL_PO_TOKEN)
fetch_po_token_func = fmt[STREAMING_DATA_FETCH_GVS_PO_TOKEN]
pot_policy: GvsPoTokenPolicy = self._get_default_ytcfg(client_name)['GVS_PO_TOKEN_POLICY'][StreamingProtocol.HTTPS]
require_po_token = (
itag not in ['18']
and gvs_pot_required(
pot_policy, fmt[STREAMING_DATA_IS_PREMIUM_SUBSCRIBER],
fmt[STREAMING_DATA_PLAYER_TOKEN_PROVIDED]))
po_token = (
gvs_pots.get(client_name)
or fetch_po_token_func(required=require_po_token or pot_policy.recommended))
if po_token:
fmt_url = update_url_query(fmt_url, {'pot': po_token})
if client_name not in gvs_pots:
gvs_pots[client_name] = po_token
# Clients that require PO Token return videoplayback URLs that may return 403
require_po_token = (
not po_token
and _PoTokenContext.GVS in self._get_default_ytcfg(client_name)['PO_TOKEN_REQUIRED_CONTEXTS']
and itag not in ['18']) # these formats do not require PO Token
if require_po_token and 'missing_pot' not in self._configuration_arg('formats'):
if not po_token and require_po_token and 'missing_pot' not in self._configuration_arg('formats'):
self._report_pot_format_skipped(video_id, client_name, 'https')
continue
@@ -3468,7 +3509,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
name, fmt.get('isDrc') and 'DRC',
try_get(fmt, lambda x: x['projectionType'].replace('RECTANGULAR', '').lower()),
try_get(fmt, lambda x: x['spatialAudioType'].replace('SPATIAL_AUDIO_TYPE_', '').lower()),
is_damaged and 'DAMAGED', require_po_token and 'MISSING POT',
is_damaged and 'DAMAGED', require_po_token and not po_token and 'MISSING POT',
(self.get_param('verbose') or all_formats) and short_client_name(client_name),
delim=', '),
# Format 22 is likely to be damaged. See https://github.com/yt-dlp/yt-dlp/issues/3372
@@ -3531,7 +3572,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
elif skip_bad_formats and live_status == 'is_live' and needs_live_processing != 'is_live':
skip_manifests.add('dash')
def process_manifest_format(f, proto, client_name, itag, po_token):
def process_manifest_format(f, proto, client_name, itag, missing_pot):
key = (proto, f.get('language'))
if not all_formats and key in itags[itag]:
return False
@@ -3539,19 +3580,15 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if f.get('source_preference') is None:
f['source_preference'] = -1
# Clients that require PO Token return videoplayback URLs that may return 403
# hls does not currently require PO Token
if (
not po_token
and _PoTokenContext.GVS in self._get_default_ytcfg(client_name)['PO_TOKEN_REQUIRED_CONTEXTS']
and proto != 'hls'
):
if 'missing_pot' not in self._configuration_arg('formats'):
self._report_pot_format_skipped(video_id, client_name, proto)
return False
if missing_pot:
f['format_note'] = join_nonempty(f.get('format_note'), 'MISSING POT', delim=' ')
f['source_preference'] -= 20
# XXX: Check if IOS HLS formats are affected by PO token enforcement; temporary
# See https://github.com/yt-dlp/yt-dlp/issues/13511
if proto == 'hls' and client_name == 'ios':
f['__needs_testing'] = True
itags[itag].add(key)
if itag and all_formats:
@@ -3586,39 +3623,62 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
subtitles = {}
for sd in streaming_data:
client_name = sd[STREAMING_DATA_CLIENT_NAME]
po_token = sd.get(STREAMING_DATA_INITIAL_PO_TOKEN)
fetch_pot_func = sd[STREAMING_DATA_FETCH_GVS_PO_TOKEN]
is_premium_subscriber = sd[STREAMING_DATA_IS_PREMIUM_SUBSCRIBER]
has_player_token = sd[STREAMING_DATA_PLAYER_TOKEN_PROVIDED]
hls_manifest_url = 'hls' not in skip_manifests and sd.get('hlsManifestUrl')
if hls_manifest_url:
pot_policy: GvsPoTokenPolicy = self._get_default_ytcfg(
client_name)['GVS_PO_TOKEN_POLICY'][StreamingProtocol.HLS]
require_po_token = gvs_pot_required(pot_policy, is_premium_subscriber, has_player_token)
po_token = gvs_pots.get(client_name, fetch_pot_func(required=require_po_token or pot_policy.recommended))
if po_token:
hls_manifest_url = hls_manifest_url.rstrip('/') + f'/pot/{po_token}'
fmts, subs = self._extract_m3u8_formats_and_subtitles(
hls_manifest_url, video_id, 'mp4', fatal=False, live=live_status == 'is_live')
for sub in traverse_obj(subs, (..., ..., {dict})):
# HLS subs (m3u8) do not need a PO token; save client name for debugging
sub[STREAMING_DATA_CLIENT_NAME] = client_name
subtitles = self._merge_subtitles(subs, subtitles)
for f in fmts:
if process_manifest_format(f, 'hls', client_name, self._search_regex(
r'/itag/(\d+)', f['url'], 'itag', default=None), po_token):
yield f
if client_name not in gvs_pots:
gvs_pots[client_name] = po_token
if require_po_token and not po_token and 'missing_pot' not in self._configuration_arg('formats'):
self._report_pot_format_skipped(video_id, client_name, 'hls')
else:
fmts, subs = self._extract_m3u8_formats_and_subtitles(
hls_manifest_url, video_id, 'mp4', fatal=False, live=live_status == 'is_live')
for sub in traverse_obj(subs, (..., ..., {dict})):
# TODO: If HLS video requires a PO Token, do the subs also require pot?
# Save client name for debugging
sub[STREAMING_DATA_CLIENT_NAME] = client_name
subtitles = self._merge_subtitles(subs, subtitles)
for f in fmts:
if process_manifest_format(f, 'hls', client_name, self._search_regex(
r'/itag/(\d+)', f['url'], 'itag', default=None), require_po_token and not po_token):
yield f
dash_manifest_url = 'dash' not in skip_manifests and sd.get('dashManifestUrl')
if dash_manifest_url:
pot_policy: GvsPoTokenPolicy = self._get_default_ytcfg(
client_name)['GVS_PO_TOKEN_POLICY'][StreamingProtocol.DASH]
require_po_token = gvs_pot_required(pot_policy, is_premium_subscriber, has_player_token)
po_token = gvs_pots.get(client_name, fetch_pot_func(required=require_po_token or pot_policy.recommended))
if po_token:
dash_manifest_url = dash_manifest_url.rstrip('/') + f'/pot/{po_token}'
formats, subs = self._extract_mpd_formats_and_subtitles(dash_manifest_url, video_id, fatal=False)
for sub in traverse_obj(subs, (..., ..., {dict})):
# TODO: Investigate if DASH subs ever need a PO token; save client name for debugging
sub[STREAMING_DATA_CLIENT_NAME] = client_name
subtitles = self._merge_subtitles(subs, subtitles) # Prioritize HLS subs over DASH
for f in formats:
if process_manifest_format(f, 'dash', client_name, f['format_id'], po_token):
f['filesize'] = int_or_none(self._search_regex(
r'/clen/(\d+)', f.get('fragment_base_url') or f['url'], 'file size', default=None))
if needs_live_processing:
f['is_from_start'] = True
if client_name not in gvs_pots:
gvs_pots[client_name] = po_token
if require_po_token and not po_token and 'missing_pot' not in self._configuration_arg('formats'):
self._report_pot_format_skipped(video_id, client_name, 'dash')
else:
formats, subs = self._extract_mpd_formats_and_subtitles(dash_manifest_url, video_id, fatal=False)
for sub in traverse_obj(subs, (..., ..., {dict})):
# TODO: If DASH video requires a PO Token, do the subs also require pot?
# Save client name for debugging
sub[STREAMING_DATA_CLIENT_NAME] = client_name
subtitles = self._merge_subtitles(subs, subtitles) # Prioritize HLS subs over DASH
for f in formats:
if process_manifest_format(f, 'dash', client_name, f['format_id'], require_po_token and not po_token):
f['filesize'] = int_or_none(self._search_regex(
r'/clen/(\d+)', f.get('fragment_base_url') or f['url'], 'file size', default=None))
if needs_live_processing:
f['is_from_start'] = True
yield f
yield f
yield subtitles
def _extract_storyboard(self, player_responses, duration):
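The two caching paths above use different lookups:

- The HTTPS path uses `gvs_pots.get(client_name) or fetch_po_token_func(...)`.
- The HLS and DASH paths use `gvs_pots.get(client_name, fetch_pot_func(...))`.

In Python, the default argument of `dict.get` is evaluated before the lookup happens. The second form therefore calls the fetcher even when a token is already cached. Whether that extra call costs anything depends on whether the fetch function caches internally, and this diff does not show that. The demonstration below uses a hypothetical counting fetcher:

```python
calls = 0


def fetch_token() -> str:
    # Hypothetical stand-in for a PO token fetcher; counts its invocations
    global calls
    calls += 1
    return 'token'


cache = {'web': 'cached-token'}

# Eager: fetch_token() runs before dict.get even consults the cache
eager = cache.get('web', fetch_token())
# Lazy: `or` calls fetch_token() only on a miss (or a falsy cached value such as None)
lazy = cache.get('web') or fetch_token()

print(eager, lazy, calls)  # cached-token cached-token 1
```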
@@ -3659,22 +3719,22 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
} for j in range(math.ceil(fragment_count))],
}
def _download_player_responses(self, url, smuggled_data, video_id, webpage_url):
def _download_initial_webpage(self, webpage_url, webpage_client, video_id):
webpage = None
if 'webpage' not in self._configuration_arg('player_skip'):
if webpage_url and 'webpage' not in self._configuration_arg('player_skip'):
query = {'bpctr': '9999999999', 'has_verified': '1'}
pp = self._configuration_arg('player_params', [None], casesense=True)[0]
pp = (
self._configuration_arg('player_params', [None], casesense=True)[0]
or traverse_obj(INNERTUBE_CLIENTS, (webpage_client, 'PLAYER_PARAMS', {str}))
)
if pp:
query['pp'] = pp
webpage = self._download_webpage_with_retries(webpage_url, video_id, query=query)
master_ytcfg = self.extract_ytcfg(video_id, webpage) or self._get_default_ytcfg()
player_responses, player_url = self._extract_player_responses(
self._get_requested_clients(url, smuggled_data),
video_id, webpage, master_ytcfg, smuggled_data)
return webpage, master_ytcfg, player_responses, player_url
webpage = self._download_webpage_with_retries(
webpage_url, video_id, query=query,
headers=traverse_obj(self._get_default_ytcfg(webpage_client), {
'User-Agent': ('INNERTUBE_CONTEXT', 'client', 'userAgent', {str}),
}))
return webpage
def _list_formats(self, video_id, microformats, video_details, player_responses, player_url, duration=None):
live_broadcast_details = traverse_obj(microformats, (..., 'liveBroadcastDetails'))
@@ -3699,14 +3759,60 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
return live_broadcast_details, live_status, streaming_data, formats, subtitles
def _download_initial_data(self, video_id, webpage, webpage_client, webpage_ytcfg):
initial_data = None
if webpage and 'initial_data' not in self._configuration_arg('webpage_skip'):
initial_data = self.extract_yt_initial_data(video_id, webpage, fatal=False)
if not traverse_obj(initial_data, 'contents'):
self.report_warning('Incomplete data received in embedded initial data; re-fetching using API.')
initial_data = None
if not initial_data and 'initial_data' not in self._configuration_arg('player_skip'):
query = {'videoId': video_id}
query.update(self._get_checkok_params())
initial_data = self._extract_response(
item_id=video_id, ep='next', fatal=False,
ytcfg=webpage_ytcfg, query=query, check_get_keys='contents',
note='Downloading initial data API JSON', default_client=webpage_client)
return initial_data
def _is_premium_subscriber(self, initial_data):
if not self.is_authenticated or not initial_data:
return False
tlr = traverse_obj(
initial_data, ('topbar', 'desktopTopbarRenderer', 'logo', 'topbarLogoRenderer'))
return (
traverse_obj(tlr, ('iconImage', 'iconType')) == 'YOUTUBE_PREMIUM_LOGO'
or 'premium' in (self._get_text(tlr, 'tooltipText') or '').lower()
)
def _initial_extract(self, url, smuggled_data, webpage_url, webpage_client, video_id):
# This function is also used by live-from-start refresh
webpage = self._download_initial_webpage(webpage_url, webpage_client, video_id)
webpage_ytcfg = self.extract_ytcfg(video_id, webpage) or self._get_default_ytcfg(webpage_client)
initial_data = self._download_initial_data(video_id, webpage, webpage_client, webpage_ytcfg)
is_premium_subscriber = self._is_premium_subscriber(initial_data)
if is_premium_subscriber:
self.write_debug('Detected YouTube Premium subscription')
player_responses, player_url = self._extract_player_responses(
self._get_requested_clients(url, smuggled_data, is_premium_subscriber),
video_id, webpage, webpage_client, webpage_ytcfg, is_premium_subscriber)
return webpage, webpage_ytcfg, initial_data, is_premium_subscriber, player_responses, player_url
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
video_id = self._match_id(url)
base_url = self.http_scheme() + '//www.youtube.com/'
webpage_url = base_url + 'watch?v=' + video_id
webpage_client = 'web'
webpage, master_ytcfg, player_responses, player_url = self._download_player_responses(url, smuggled_data, video_id, webpage_url)
webpage, webpage_ytcfg, initial_data, is_premium_subscriber, player_responses, player_url = self._initial_extract(
url, smuggled_data, webpage_url, webpage_client, video_id)
playability_statuses = traverse_obj(
player_responses, (..., 'playabilityStatus'), expected_type=dict)
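`_is_premium_subscriber` decides Premium status from the topbar logo in the initial data. It checks for either the `YOUTUBE_PREMIUM_LOGO` icon type or the word "premium" in the logo tooltip. The sketch below does the same with plain dict traversal instead of `traverse_obj`. The payload shape is taken from the keys in the diff. `get_text` is a hypothetical, simplified stand-in for yt-dlp's `_get_text`.

```python
from __future__ import annotations


def get_text(obj) -> str | None:
    # Hypothetical, simplified stand-in for _get_text: handles only the
    # {'simpleText': ...} and {'runs': [{'text': ...}, ...]} shapes
    if not isinstance(obj, dict):
        return None
    if isinstance(obj.get('simpleText'), str):
        return obj['simpleText']
    runs = obj.get('runs')
    if isinstance(runs, list):
        return ''.join(r.get('text', '') for r in runs if isinstance(r, dict))
    return None


def is_premium_subscriber(initial_data: dict | None, is_authenticated: bool) -> bool:
    if not is_authenticated or not initial_data:
        return False
    # Walk topbar -> desktopTopbarRenderer -> logo -> topbarLogoRenderer, tolerating gaps
    tlr = initial_data
    for key in ('topbar', 'desktopTopbarRenderer', 'logo', 'topbarLogoRenderer'):
        tlr = tlr.get(key) if isinstance(tlr, dict) else None
    if not isinstance(tlr, dict):
        return False
    icon = tlr.get('iconImage')
    icon_type = icon.get('iconType') if isinstance(icon, dict) else None
    tooltip = get_text(tlr.get('tooltipText')) or ''
    return icon_type == 'YOUTUBE_PREMIUM_LOGO' or 'premium' in tooltip.lower()
```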
@@ -3943,7 +4049,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
def process_language(container, base_url, lang_code, sub_name, client_name, query):
lang_subs = container.setdefault(lang_code, [])
for fmt in self._SUBTITLE_FORMATS:
query = {**query, 'fmt': fmt}
# xosf=1 results in undesirable text position data for vtt, json3 & srv* subtitles
# See: https://github.com/yt-dlp/yt-dlp/issues/13654
query = {**query, 'fmt': fmt, 'xosf': []}
lang_subs.append({
'ext': fmt,
'url': urljoin('https://www.youtube.com', update_url_query(base_url, query)),
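The subtitle change sets `'xosf': []` in the query dict instead of deleting the key. Assume `update_url_query` re-encodes the merged query with `urllib.parse.urlencode(..., doseq=True)`. In that case an empty list produces no `key=value` pair, so the parameter disappears from the URL. The sketch below shows that behaviour using only the standard library. `update_query` is a hypothetical name, and the example URL is illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def update_query(url: str, query: dict) -> str:
    # Hypothetical stand-in: merge `query` into the URL's existing parameters
    parsed = urlparse(url)
    qs = parse_qs(parsed.query)
    qs.update(query)
    # doseq=True expands list values into repeated keys; an empty list emits nothing
    return parsed._replace(query=urlencode(qs, doseq=True)).geturl()


base = 'https://www.youtube.com/api/timedtext?v=abc&xosf=1&lang=en'
print(update_query(base, {'fmt': 'vtt', 'xosf': []}))
# https://www.youtube.com/api/timedtext?v=abc&lang=en&fmt=vtt
```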
@@ -3979,7 +4087,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
pctr = pr['captions']['playerCaptionsTracklistRenderer']
client_name = pr['streamingData'][STREAMING_DATA_CLIENT_NAME]
innertube_client_name = pr['streamingData'][STREAMING_DATA_INNERTUBE_CONTEXT]['client']['clientName']
required_contexts = self._get_default_ytcfg(client_name)['PO_TOKEN_REQUIRED_CONTEXTS']
pot_policy: GvsPoTokenPolicy = self._get_default_ytcfg(client_name)['SUBS_PO_TOKEN_POLICY']
fetch_subs_po_token_func = pr['streamingData'][STREAMING_DATA_FETCH_SUBS_PO_TOKEN]
pot_params = {}
@@ -3992,11 +4100,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
requires_pot = (
# We can detect the experiment for now
any(e in traverse_obj(qs, ('exp', ...)) for e in ('xpe', 'xpv'))
or _PoTokenContext.SUBS in required_contexts)
or (pot_policy.required and not (pot_policy.not_required_for_premium and is_premium_subscriber)))
if not already_fetched_pot:
already_fetched_pot = True
if subs_po_token := fetch_subs_po_token_func(required=requires_pot):
if subs_po_token := fetch_subs_po_token_func(required=requires_pot or pot_policy.recommended):
pot_params.update({
'pot': subs_po_token,
'potc': '1',
@@ -4099,21 +4207,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'release_year': int_or_none(release_year),
})
initial_data = None
if webpage:
initial_data = self.extract_yt_initial_data(video_id, webpage, fatal=False)
if not traverse_obj(initial_data, 'contents'):
self.report_warning('Incomplete data received in embedded initial data; re-fetching using API.')
initial_data = None
if not initial_data and 'initial_data' not in self._configuration_arg('player_skip'):
query = {'videoId': video_id}
query.update(self._get_checkok_params())
initial_data = self._extract_response(
item_id=video_id, ep='next', fatal=False,
ytcfg=master_ytcfg, query=query, check_get_keys='contents',
headers=self.generate_api_headers(ytcfg=master_ytcfg),
note='Downloading initial data API JSON')
COMMENTS_SECTION_IDS = ('comment-item-section', 'engagement-panel-comments-section')
info['comment_count'] = traverse_obj(initial_data, (
'contents', 'twoColumnWatchNextResults', 'results', 'results', 'contents', ..., 'itemSectionRenderer',
@@ -4280,6 +4373,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if upload_date and live_status not in ('is_live', 'post_live', 'is_upcoming'):
# Newly uploaded videos' HLS formats are potentially problematic and need to be checked
# XXX: This is redundant for as long as we are already checking all IOS HLS formats
upload_datetime = datetime_from_str(upload_date).replace(tzinfo=dt.timezone.utc)
if upload_datetime >= datetime_from_str('today-2days'):
for fmt in info['formats']:
@@ -4311,7 +4405,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
self._has_badge(badges, BadgeType.AVAILABILITY_UNLISTED)
or get_first(microformats, 'isUnlisted', expected_type=bool))))
info['__post_extractor'] = self.extract_comments(master_ytcfg, video_id, contents, webpage)
info['__post_extractor'] = self.extract_comments(webpage_ytcfg, video_id, contents, webpage)
self.mark_watched(video_id, player_responses)

@@ -222,6 +222,14 @@ class LocalNameSpace(collections.ChainMap):
def __delitem__(self, key):
raise NotImplementedError('Deleting is not supported')
def set_local(self, key, value):
self.maps[0][key] = value
def get_local(self, key):
if key in self.maps[0]:
return self.maps[0][key]
return JS_Undefined
class Debugger:
import sys
@@ -271,6 +279,7 @@ class JSInterpreter:
def __init__(self, code, objects=None):
self.code, self._functions = code, {}
self._objects = {} if objects is None else objects
self._undefined_varnames = set()
class Exception(ExtractorError): # noqa: A001
def __init__(self, msg, expr=None, *args, **kwargs):
@@ -381,7 +390,7 @@ class JSInterpreter:
return self._named_object(namespace, obj)
@Debugger.wrap_interpreter
def interpret_statement(self, stmt, local_vars, allow_recursion=100):
def interpret_statement(self, stmt, local_vars, allow_recursion=100, _is_var_declaration=False):
if allow_recursion < 0:
raise self.Exception('Recursion limit reached')
allow_recursion -= 1
@@ -401,6 +410,7 @@ class JSInterpreter:
if m.group('throw'):
raise JS_Throw(self.interpret_expression(expr, local_vars, allow_recursion))
should_return = not m.group('var')
_is_var_declaration = _is_var_declaration or bool(m.group('var'))
if not expr:
return None, should_return
@@ -585,7 +595,8 @@ class JSInterpreter:
sub_expressions = list(self._separate(expr))
if len(sub_expressions) > 1:
for sub_expr in sub_expressions:
ret, should_abort = self.interpret_statement(sub_expr, local_vars, allow_recursion)
ret, should_abort = self.interpret_statement(
sub_expr, local_vars, allow_recursion, _is_var_declaration=_is_var_declaration)
if should_abort:
return ret, True
return ret, False
@@ -599,8 +610,12 @@ class JSInterpreter:
left_val = local_vars.get(m.group('out'))
if not m.group('index'):
local_vars[m.group('out')] = self._operator(
eval_result = self._operator(
m.group('op'), left_val, m.group('expr'), expr, local_vars, allow_recursion)
if _is_var_declaration:
local_vars.set_local(m.group('out'), eval_result)
else:
local_vars[m.group('out')] = eval_result
return local_vars[m.group('out')], should_return
elif left_val in (None, JS_Undefined):
raise self.Exception(f'Cannot index undefined variable {m.group("out")}', expr)
@@ -654,7 +669,19 @@ class JSInterpreter:
return float('NaN'), should_return
elif m and m.group('return'):
return local_vars.get(m.group('name'), JS_Undefined), should_return
var = m.group('name')
# Declared variables
if _is_var_declaration:
ret = local_vars.get_local(var)
# Register varname in local namespace
# Set value as JS_Undefined or its pre-existing value
local_vars.set_local(var, ret)
else:
ret = local_vars.get(var, NO_DEFAULT)
if ret is NO_DEFAULT:
ret = JS_Undefined
self._undefined_varnames.add(var)
return ret, should_return
with contextlib.suppress(ValueError):
return json.loads(js_to_json(expr, strict=True)), should_return
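The interpreter changes alter how names are bound and tracked:

- A `var` declaration now binds its name in the innermost scope through `set_local`, instead of overwriting the first enclosing scope that already defines the name.
- Plain identifiers that resolve nowhere are recorded in `_undefined_varnames`.

The sketch below models that scoping rule with a hypothetical `Scope` class based on `LocalNameSpace`. It assumes, as in yt-dlp's `LocalNameSpace`, that ordinary assignment updates the nearest scope that already holds the name.

```python
import collections


class Scope(collections.ChainMap):
    """Hypothetical stand-in for LocalNameSpace: maps[0] is the innermost scope."""

    def __setitem__(self, key, value):
        # Plain assignment (`x = v`): update the nearest scope that defines `key`
        for scope in self.maps:
            if key in scope:
                scope[key] = value
                return
        self.maps[0][key] = value

    def set_local(self, key, value):
        # Declaration (`var x = v`): always bind in the innermost scope, shadowing outer ones
        self.maps[0][key] = value


undefined_names = set()


def lookup(scope: Scope, name: str):
    # Record names that resolve nowhere, like _undefined_varnames
    if name not in scope:
        undefined_names.add(name)
        return None  # the interpreter returns JS_Undefined here
    return scope[name]


outer = {'x': 1}
inner = Scope({}, outer)
inner['x'] = 2            # assignment reaches the outer scope
inner.set_local('x', 3)   # declaration shadows it locally
print(outer['x'], inner['x'], lookup(inner, 'g'), undefined_names)  # 2 3 None {'g'}
```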
@@ -857,7 +884,7 @@ class JSInterpreter:
obj = {}
obj_m = re.search(
r'''(?x)
(?<!\.)%s\s*=\s*{\s*
(?<![a-zA-Z$0-9.])%s\s*=\s*{\s*
(?P<fields>(%s\s*:\s*function\s*\(.*?\)\s*{.*?}(?:,\s*)?)*)
}\s*;
''' % (re.escape(objname), _FUNC_NAME_RE),
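The object-extraction regex now uses the negative lookbehind `(?<![a-zA-Z$0-9.])` instead of `(?<!\.)`. The old pattern only excluded property access such as `obj.ab`. It still matched a short name like `ab` inside a longer identifier such as `xab`; the new one does not. The comparison below uses a simplified pattern that keeps only the name-and-brace prefix, and the JS snippet is illustrative.

```python
import re

objname = 'ab'
old = re.compile(r'(?<!\.)%s\s*=\s*{' % re.escape(objname))
new = re.compile(r'(?<![a-zA-Z$0-9.])%s\s*=\s*{' % re.escape(objname))

code = 'var xab={f:function(a){}}; var ab = {g:function(b){}};'
print(old.search(code).group())  # ab={    (wrongly matched inside "xab")
print(new.search(code).group())  # ab = {  (the real declaration)
```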

@@ -140,6 +140,12 @@ class RequestsResponseAdapter(Response):
def read(self, amt: int | None = None):
try:
# Work around issue with `.read(amt)` then `.read()`
# See: https://github.com/urllib3/urllib3/issues/3636
if amt is None:
# Python 3.9 preallocates the whole read buffer, read in chunks
read_chunk = functools.partial(self.fp.read, 1 << 20, decode_content=True)
return b''.join(iter(read_chunk, b''))
# Interact with urllib3 response directly.
return self.fp.read(amt, decode_content=True)
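When `amt is None`, `read()` now drains the urllib3 response in 1 MiB pieces. It uses the two-argument form of `iter()`, which keeps calling a function until it returns a sentinel value (here `b''`). The example below applies the same idiom to an in-memory stream. `io.BytesIO.read` stands in for `self.fp.read`; the real call also passes `decode_content=True`.

```python
import functools
import io

fp = io.BytesIO(b'x' * (3 * (1 << 20) + 5))  # 3 MiB plus 5 bytes

# Call fp.read(1 MiB) repeatedly until it returns b'' (end of stream)
read_chunk = functools.partial(fp.read, 1 << 20)
chunks = list(iter(read_chunk, b''))
data = b''.join(chunks)

print(len(chunks), len(data))  # 4 3145733
```

Reading in bounded chunks sidesteps the whole-buffer preallocation that the diff comment attributes to Python 3.9.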

@@ -529,14 +529,14 @@ def create_parser():
'no-attach-info-json', 'embed-thumbnail-atomicparsley', 'no-external-downloader-progress',
'embed-metadata', 'seperate-video-versions', 'no-clean-infojson', 'no-keep-subs', 'no-certifi',
'no-youtube-channel-redirect', 'no-youtube-unavailable-videos', 'no-youtube-prefer-utc-upload-date',
'prefer-legacy-http-handler', 'manifest-filesize-approx', 'allow-unsafe-ext', 'prefer-vp9-sort',
'prefer-legacy-http-handler', 'manifest-filesize-approx', 'allow-unsafe-ext', 'prefer-vp9-sort', 'mtime-by-default',
}, 'aliases': {
'youtube-dl': ['all', '-multistreams', '-playlist-match-filter', '-manifest-filesize-approx', '-allow-unsafe-ext', '-prefer-vp9-sort'],
'youtube-dlc': ['all', '-no-youtube-channel-redirect', '-no-live-chat', '-playlist-match-filter', '-manifest-filesize-approx', '-allow-unsafe-ext', '-prefer-vp9-sort'],
'2021': ['2022', 'no-certifi', 'filename-sanitization'],
'2022': ['2023', 'no-external-downloader-progress', 'playlist-match-filter', 'prefer-legacy-http-handler', 'manifest-filesize-approx'],
'2023': ['2024', 'prefer-vp9-sort'],
'2024': [],
'2024': ['mtime-by-default'],
},
}, help=(
'Options that can help keep compatibility with youtube-dl or youtube-dlc '
@@ -1466,12 +1466,12 @@ def create_parser():
help='Do not use .part files - write directly into output file')
filesystem.add_option(
'--mtime',
action='store_true', dest='updatetime', default=True,
help='Use the Last-modified header to set the file modification time (default)')
action='store_true', dest='updatetime', default=None,
help='Use the Last-modified header to set the file modification time')
filesystem.add_option(
'--no-mtime',
action='store_false', dest='updatetime',
help='Do not use the Last-modified header to set the file modification time')
help='Do not use the Last-modified header to set the file modification time (default)')
filesystem.add_option(
'--write-description',
action='store_true', dest='writedescription', default=False,
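This diff makes three related changes:

- `--mtime` now defaults to `None` instead of `True`.
- The help text marks `--no-mtime` as the default.
- `mtime-by-default` is added to the `2024` compat alias.

Together these suggest that `--compat-options 2024` restores the old behaviour. The sketch below shows a tri-state `optparse` flag pair resolved after parsing. The resolution rule (`None` becomes `True` only when `mtime-by-default` is among the compat options) is an assumption for illustration, not code from this diff.

```python
import optparse

parser = optparse.OptionParser()
parser.add_option('--mtime', action='store_true', dest='updatetime', default=None)
parser.add_option('--no-mtime', action='store_false', dest='updatetime')


def resolve_updatetime(args: list[str], compat_opts: set[str]) -> bool:
    opts, _ = parser.parse_args(args)
    if opts.updatetime is None:
        # Neither flag given: fall back to the assumed compat-option rule
        return 'mtime-by-default' in compat_opts
    return opts.updatetime
```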

@@ -2961,6 +2961,7 @@ def mimetype2ext(mt, default=NO_DEFAULT):
'audio/x-matroska': 'mka',
'audio/x-mpegurl': 'm3u',
'aacp': 'aac',
'flac': 'flac',
'midi': 'mid',
'ogg': 'ogg',
'wav': 'wav',
@@ -3105,21 +3106,15 @@ def get_compatible_ext(*, vcodecs, acodecs, vexts, aexts, preferences=None):
def urlhandle_detect_ext(url_handle, default=NO_DEFAULT):
getheader = url_handle.headers.get
cd = getheader('Content-Disposition')
if cd:
m = re.match(r'attachment;\s*filename="(?P<filename>[^"]+)"', cd)
if m:
e = determine_ext(m.group('filename'), default_ext=None)
if e:
return e
if cd := getheader('Content-Disposition'):
if m := re.match(r'attachment;\s*filename="(?P<filename>[^"]+)"', cd):
if ext := determine_ext(m.group('filename'), default_ext=None):
return ext
meta_ext = getheader('x-amz-meta-name')
if meta_ext:
e = meta_ext.rpartition('.')[2]
if e:
return e
return mimetype2ext(getheader('Content-Type'), default=default)
return (
determine_ext(getheader('x-amz-meta-name'), default_ext=None)
or getheader('x-amz-meta-file-type')
or mimetype2ext(getheader('Content-Type'), default=default))
def encode_data_uri(data, mime_type):
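After the rewrite, `urlhandle_detect_ext` tries four sources in order:

1. the filename in `Content-Disposition`
2. the extension of `x-amz-meta-name`
3. the raw `x-amz-meta-file-type` header
4. `mimetype2ext(Content-Type)`

The standalone sketch below applies that precedence to a plain header dict. `simple_ext` and the small MIME table are simplified, hypothetical stand-ins for `determine_ext` and `mimetype2ext`.

```python
import re

# Hypothetical subset of a MIME table; 'audio/flac' -> 'flac' mirrors the mimetype2ext change
MIME_EXT = {'audio/flac': 'flac', 'video/mp4': 'mp4', 'audio/mpeg': 'mp3'}


def simple_ext(filename):
    # Simplified determine_ext: short alphanumeric suffix after the last dot, else None
    if not filename or '.' not in filename:
        return None
    ext = filename.rpartition('.')[2]
    return ext if re.fullmatch(r'[A-Za-z0-9]{1,5}', ext) else None


def detect_ext(headers: dict, default=None):
    if cd := headers.get('Content-Disposition'):
        if m := re.match(r'attachment;\s*filename="(?P<filename>[^"]+)"', cd):
            if ext := simple_ext(m.group('filename')):
                return ext
    mime = (headers.get('Content-Type') or '').split(';')[0].strip().lower()
    return (
        simple_ext(headers.get('x-amz-meta-name'))
        or headers.get('x-amz-meta-file-type')
        or MIME_EXT.get(mime, default))
```

The rewrite also changes one behaviour. For a name without a dot, the old `meta_ext.rpartition('.')[2]` returned the whole name, whereas `determine_ext(..., default_ext=None)` returns `None`, so detection falls through to the next source.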

@@ -0,0 +1 @@
+# Utility functions for handling web input based on commonly used JavaScript libraries

@@ -0,0 +1,167 @@
+from __future__ import annotations
+
+import array
+import base64
+import datetime as dt
+import math
+import re
+
+from .._utils import parse_iso8601
+
+TYPE_CHECKING = False
+if TYPE_CHECKING:
+    import collections.abc
+    import typing
+
+    T = typing.TypeVar('T')
+
+
+_ARRAY_TYPE_LOOKUP = {
+    'Int8Array': 'b',
+    'Uint8Array': 'B',
+    'Uint8ClampedArray': 'B',
+    'Int16Array': 'h',
+    'Uint16Array': 'H',
+    'Int32Array': 'i',
+    'Uint32Array': 'I',
+    'Float32Array': 'f',
+    'Float64Array': 'd',
+    'BigInt64Array': 'l',
+    'BigUint64Array': 'L',
+    'ArrayBuffer': 'B',
+}
+
+
+def parse_iter(parsed: typing.Any, /, *, revivers: dict[str, collections.abc.Callable[[list], typing.Any]] | None = None):
+    # based on https://github.com/Rich-Harris/devalue/blob/f3fd2aa93d79f21746555671f955a897335edb1b/src/parse.js
+    resolved = {
+        -1: None,
+        -2: None,
+        -3: math.nan,
+        -4: math.inf,
+        -5: -math.inf,
+        -6: -0.0,
+    }
+
+    if isinstance(parsed, int) and not isinstance(parsed, bool):
+        if parsed not in resolved or parsed == -2:
+            raise ValueError('invalid integer input')
+        return resolved[parsed]
+    elif not isinstance(parsed, list):
+        raise ValueError('expected int or list as input')
+    elif not parsed:
+        raise ValueError('expected a non-empty list as input')
+
+    if revivers is None:
+        revivers = {}
+    return_value = [None]
+    stack: list[tuple] = [(return_value, 0, 0)]
+
+    while stack:
+        target, index, source = stack.pop()
+        if isinstance(source, tuple):
+            name, source, reviver = source
+            try:
+                resolved[source] = target[index] = reviver(target[index])
+            except Exception as error:
+                yield TypeError(f'failed to parse {source} as {name!r}: {error}')
+                resolved[source] = target[index] = None
+            continue
+
+        if source in resolved:
+            target[index] = resolved[source]
+            continue
+
+        # guard against Python negative indexing
+        if source < 0:
+            yield IndexError(f'invalid index: {source!r}')
+            continue
+
+        try:
+            value = parsed[source]
+        except IndexError as error:
+            yield error
+            continue
+
+        if isinstance(value, list):
+            if value and isinstance(value[0], str):
+                # TODO: implement zips `strict=True`
+                if reviver := revivers.get(value[0]):
+                    if value[1] == source:
+                        # XXX: avoid infinite loop
+                        yield IndexError(f'{value[0]!r} cannot point to itself (index: {source})')
+                        continue
+                    # inverse order: resolve index, revive value
+                    stack.append((target, index, (value[0], value[1], reviver)))
+                    stack.append((target, index, value[1]))
+                    continue
+
+                elif value[0] == 'Date':
+                    try:
+                        result = dt.datetime.fromtimestamp(parse_iso8601(value[1]), tz=dt.timezone.utc)
+                    except Exception:
+                        yield ValueError(f'invalid date: {value[1]!r}')
+                        result = None
+
+                elif value[0] == 'Set':
+                    result = [None] * (len(value) - 1)
+                    for offset, new_source in enumerate(value[1:]):
+                        stack.append((result, offset, new_source))
+
+                elif value[0] == 'Map':
+                    result = []
+                    for key, new_source in zip(*(iter(value[1:]),) * 2):
+                        pair = [None, None]
+                        stack.append((pair, 0, key))
+                        stack.append((pair, 1, new_source))
+                        result.append(pair)
+
+                elif value[0] == 'RegExp':
+                    # XXX: use jsinterp to translate regex flags
+                    # currently ignores `value[2]`
+                    result = re.compile(value[1])
+
+                elif value[0] == 'Object':
+                    result = value[1]
+
+                elif value[0] == 'BigInt':
+                    result = int(value[1])
+
+                elif value[0] == 'null':
+                    result = {}
+                    for key, new_source in zip(*(iter(value[1:]),) * 2):
+                        stack.append((result, key, new_source))
+
+                elif value[0] in _ARRAY_TYPE_LOOKUP:
+                    typecode = _ARRAY_TYPE_LOOKUP[value[0]]
+                    data = base64.b64decode(value[1])
+                    result = array.array(typecode, data).tolist()
+
+                else:
+                    yield TypeError(f'invalid type at {source}: {value[0]!r}')
+                    result = None
+            else:
+                result = len(value) * [None]
+                for offset, new_source in enumerate(value):
+                    stack.append((result, offset, new_source))
+
+        elif isinstance(value, dict):
+            result = {}
+            for key, new_source in value.items():
+                stack.append((result, key, new_source))
+
+        else:
+            result = value
+
+        target[index] = resolved[source] = result
+
+    return return_value[0]
+
+
+def parse(parsed: typing.Any, /, *, revivers: dict[str, collections.abc.Callable[[typing.Any], typing.Any]] | None = None):
+    generator = parse_iter(parsed, revivers=revivers)
+    while True:
+        try:
+            raise generator.send(None)
+        except StopIteration as error:
+            return error.value
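`parse_iter` decodes devalue's flat array layout: index 0 holds the root, containers hold indices into the same array rather than values, and the small negative integers in `resolved` stand for `undefined`, array holes, `NaN`, the two infinities and `-0`. It also yields non-fatal problems as exception objects and returns the decoded value, which `parse` turns into strict behavior by raising whatever is yielded. The sketch below shows both ideas for plain lists, objects and primitives only (no `Date`, `Map`, typed arrays or revivers). `decode_iter`, `decode` and `decode_lenient` are hypothetical names, and the sketch recurses for brevity where `parse_iter` uses an explicit stack.

```python
import math
from collections.abc import Generator
from typing import Any

# Sentinels mirroring the `resolved` table in parse_iter:
# -1 undefined, -2 hole, -3 NaN, -4 Infinity, -5 -Infinity, -6 negative zero.
SPECIAL = {-1: None, -2: None, -3: math.nan, -4: math.inf, -5: -math.inf, -6: -0.0}


def decode_iter(flat: list) -> Generator[Exception, None, Any]:
    """Yield non-fatal errors as exception objects; return the decoded root."""
    cache: dict[int, Any] = {}

    def resolve(i: int) -> Generator[Exception, None, Any]:
        if i in SPECIAL:
            return SPECIAL[i]
        if i in cache:
            return cache[i]
        if not 0 <= i < len(flat):
            yield IndexError(f'invalid index: {i!r}')
            return None
        value = flat[i]
        if isinstance(value, list):
            result = cache[i] = []  # cache first so self-references resolve
            for child in value:
                result.append((yield from resolve(child)))
        elif isinstance(value, dict):
            result = cache[i] = {}
            for key, child in value.items():
                result[key] = yield from resolve(child)
        else:
            result = cache[i] = value  # primitives are stored directly
        return result

    return (yield from resolve(0))


def decode(flat: list) -> Any:
    """Strict driver with the same shape as `parse`: raise the first yielded error."""
    gen = decode_iter(flat)
    while True:
        try:
            raise gen.send(None)
        except StopIteration as stop:
            return stop.value


def decode_lenient(flat: list) -> tuple[Any, list[Exception]]:
    """Lenient driver: collect every yielded error and keep decoding."""
    gen = decode_iter(flat)
    errors: list[Exception] = []
    while True:
        try:
            errors.append(gen.send(None))
        except StopIteration as stop:
            return stop.value, errors


print(decode([{'a': 1, 'b': 2}, 1, [1, 3], 2]))  # -> {'a': 1, 'b': [1, 2]}
print(decode_lenient([[1, 5], 'x']))  # -> (['x', None], [IndexError('invalid index: 5')])
```

Yielding errors instead of raising keeps a single traversal and lets each caller choose: `parse` stops at the first problem, while a lenient caller can log the errors and keep `None` in the affected slots.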

@@ -1,8 +1,8 @@
 # Autogenerated by devscripts/update-version.py
 
-__version__ = '2025.06.09'
+__version__ = '2025.06.30'
 
-RELEASE_GIT_HEAD = '339614a173c74b42d63e858c446a9cae262a13af'
+RELEASE_GIT_HEAD = 'b0187844988e557c7e1e6bb1aabd4c1176768d86'
 
 VARIANT = None
 
@@ -12,4 +12,4 @@ CHANNEL = 'stable'
 
 ORIGIN = 'yt-dlp/yt-dlp'
 
-_pkg_version = '2025.06.09'
+_pkg_version = '2025.06.30'