mirror of https://github.com/yt-dlp/yt-dlp.git synced 2025-09-03 00:25:08 +00:00

[webvtt] Merge daisy-chained duplicate cues (#638)

Fixes: https://github.com/yt-dlp/yt-dlp/issues/631#issuecomment-893338552

The previous deduplication algorithm only removed duplicate cues with
identical text, styles and timestamps.  This change also merges cues
that come in ‘daisy chains’: sequences of cues with identical text
and styles in which the ending timestamp of one cue equals the
starting timestamp of the next.
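
As a rough illustration of the merging described above, here is a minimal, self-contained sketch. The `Cue` dataclass and the `merge_daisy_chains` helper are hypothetical names invented for this example and are not part of yt-dlp; only the `hinges` condition mirrors the method added in this commit. The sketch compares each cue only with the most recently kept one and does not model how the actual change tracks cues across fragments.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cue:
    """Hypothetical stand-in for a WebVTT cue; timestamps are in milliseconds."""
    start: int
    end: int
    text: str
    settings: str = ""

    def hinges(self, other):
        # Same text and settings, and this cue ends exactly where the other starts.
        if self.text != other.text or self.settings != other.settings:
            return False
        return self.start <= self.end == other.start <= other.end


def merge_daisy_chains(cues):
    """Drop exact duplicates and fuse daisy-chained cues (simplified sketch)."""
    merged = []
    for cue in cues:
        if merged and merged[-1] == cue:
            continue  # exact duplicate: identical text, settings and timestamps
        if merged and merged[-1].hinges(cue):
            # Extend the previous cue to cover the chained one.
            merged[-1] = replace(merged[-1], end=cue.end)
            continue
        merged.append(cue)
    return merged


cues = [
    Cue(0, 1000, "Hello"),
    Cue(1000, 2000, "Hello"),
    Cue(2000, 3000, "Hello"),
    Cue(3000, 4000, "World"),
    Cue(3000, 4000, "World"),
]
result = merge_daisy_chains(cues)
```

With this input, the three chained "Hello" cues collapse into a single cue spanning 0 to 3000 ms, and the repeated "World" cue is kept once.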

This deduplication algorithm has the somewhat unfortunate side effect
that NOTE blocks between cues, if found, will be emitted in a different
order relative to their original cues.  This may be unwanted if perfect
fidelity is desired, but then so is daisy-chain deduplication itself.
NOTE blocks ought to be ignored by WebVTT players in any case.

Authored by: fstirlitz
Felix S
2021-08-09 20:22:30 +00:00
committed by GitHub
parent ad3dc496bb
commit 25a3f4f5d6
3 changed files with 61 additions and 19 deletions


@@ -331,6 +331,26 @@ class CueBlock(Block):
            'settings': self.settings,
        }

    def __eq__(self, other):
        return self.as_json == other.as_json

    @classmethod
    def from_json(cls, json):
        return cls(
            id=json['id'],
            start=json['start'],
            end=json['end'],
            text=json['text'],
            settings=json['settings']
        )

    def hinges(self, other):
        if self.text != other.text:
            return False
        if self.settings != other.settings:
            return False
        return self.start <= self.end == other.start <= other.end


def parse_fragment(frag_content):
    """