Set the start number in the representation to the segment index sent by the muxer.
With this enhancement, you can now specify the initial sequence number
to be used for the generated segments when calling the packager.
With the old implementation, the sequence always started at "1".
---------
Co-authored-by: Cosmin Stejerean <cstejerean@meta.com>
This PR adds parsing of teletext styling, and rendering of the styling
in output TTML and WebVTT subtitle tracks.
Beyond unit tests, I've used the sample
https://drive.google.com/file/d/19ZYsoeUfH85gEilQkaAdLbPhC4CxhDEh/view?usp=sharing
which has rather advanced subtitling with two separate rows at the same
time, where one is left aligned and another is right aligned. This
necessitates two parallel cues to be rendered. It also has some colored
text.
Solves #1335.
## parse teletext styling and formatting
Extend the teletext parser to parse the teletext styling and formatting.
This includes translating rows into regions, calculating alignment
from the start and stop positions of the text, and extracting text and
background colors.
The colors are limited to full lines.
Both lines and regions are propagated in the TextSample structures,
because the number of lines may differ between sources.
For teletext, there are 24 rows, but they are essentially always
used with double height, so the number of output lines is 12,
ranging from 0 to 11.
There are also corresponding regions denoted "ttx_R",
where R is an integer row number. A renderer can use either
the line number or the region ID to render the text.
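As a rough sketch of this mapping (illustrative names only, not the actual
parser code), a double-height teletext row in the range 0-23 collapses to
an output line in 0-11 with a matching "ttx_R" region ID:

```cpp
#include <string>

// Illustrative sketch: map a double-height teletext row (0-23) to the
// output line number (0-11) and the corresponding "ttx_R" region ID.
struct TeletextPlacement {
  int line;            // 0-11
  std::string region;  // "ttx_0" .. "ttx_11"
};

TeletextPlacement PlacementForRow(int teletext_row) {
  TeletextPlacement placement;
  placement.line = teletext_row / 2;  // double height: 24 rows -> 12 lines
  placement.region = "ttx_" + std::to_string(placement.line);
  return placement;
}
```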
## ttml generation for teletext to EBU-TT-D
Add support to render teletext input in EBU-TT-D (IMSC-1) format.
This includes appropriate regions ttx_0 to ttx_11 signalled
in the TextSamples, alignment, and text and background colors.
The general TTML output has been changed to always include
metadata, layout, and styling nodes, even if they are empty.
EBU-TT-D is detected by the presence of "ttx_?" regions in the
samples. If detected, extra TTML elements will be added and
the EBU-TT-D linePadding used as well.
Appropriate styles for background and text colors are generated
depending on the color and backgroundColor attributes in the
text fragments.
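A hedged sketch of the detection step, assuming a hypothetical helper over
the sample region IDs (not the real packager API):

```cpp
#include <string>
#include <vector>

// Illustrative sketch: enable the EBU-TT-D specific TTML output when any
// region ID carried by the samples uses the teletext "ttx_" prefix.
bool UseEbuTtD(const std::vector<std::string>& region_ids) {
  for (const std::string& id : region_ids) {
    if (id.rfind("ttx_", 0) == 0)  // ID starts with "ttx_"
      return true;
  }
  return false;
}
```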
## adapt WebVTT output to teletext TextSamples
Teletext input generates both a region with the prefix ttx_
and a floating-point line number (e.g. 9.5) in the
range 0 to 11.5 (due to the 0-23 input rows being treated as double lines).
The WebVTT output is adapted to drop such regions
and convert the line number to an integer,
since the standard only uses floats for percentage
values, not for plain line numbers.
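A minimal sketch of that conversion, assuming the line value is simply
rounded to the nearest integer (illustrative code, not the actual muxer
implementation):

```cpp
#include <cmath>
#include <string>

// Illustrative sketch: build the WebVTT line setting from a teletext
// TextSample. The "ttx_R" region is dropped entirely, and the
// floating-point line (0 to 11.5) becomes a plain integer, since WebVTT
// only allows non-integer line values when they are percentages.
std::string WebVttLineSetting(double teletext_line) {
  const int line = static_cast<int>(std::lround(teletext_line));
  return "line:" + std::to_string(line);
}
```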
An updated version of PR #1027.
That previous PR was written against 2021 code, and there have been many
changes in the codebase since then, so a rebase was needed, along with
some minor tweaks here and there. But it's the same code, just
reimplemented on a newer codebase.
If you want to take a look at this in action, after building shaka
packager with this PR's code included, try these commands in 3
simultaneous bash sessions:
1. Video UDP input: `ffmpeg -f lavfi -re -i
"testsrc=s=320x240:r=30,format=yuv420p" -c:v h264 -sc_threshold 0 -g 30
-keyint_min 30 -r 30 -a53cc 1 -b:v 150k -preset ultrafast -r 30 -f
mpegts "udp://127.0.0.1:10000?pkt_size=1316"`
2. WebVTT UDP input: `for sec in $(seq 0 9999) ; do printf
"%02d:%02d.000 --> %02d:%02d.000\ntest second ${sec}\n\n" "$(( ${sec} /
60 ))" "$(( ${sec} % 60 ))" "$(( (${sec} + 1) / 60 ))" "$(( (${sec} + 1)
% 60 ))" ; sleep 1 ; done > /dev/udp/127.0.0.1/12345`
3. shaka packager command line: `timeout 60
path/to/build/packager/packager
'in=udp://127.0.0.1:10000?timeout=8000000,stream_selector=0,init_segment=240_init.m4s,segment_template=240_$Number%09d$.m4s,bandwidth=150000'
'in=udp://127.0.0.1:12345?timeout=8000000,stream_selector=0,input_format=webvtt,format=webvtt+mp4,init_segment=text_init.m4s,segment_template=text_$Number%09d$.m4s,language=eng,dash_roles=subtitle'
--mpd_output ./manifest.mpd --segment_duration 3.2
--suggested_presentation_delay 3.2 --min_buffer_time 3.2
--minimum_update_period 3.2 --time_shift_buffer_depth 60
--preserved_segments_outside_live_window 1 --default_language=eng
--dump_stream_info 2>&1`
Note the `input_format=webvtt` added to the shaka packager command's
second selector. That's new in this PR. If you don't use it, shaka's
format autodetection will not detect the webvtt format from the input,
as explained in
https://github.com/shaka-project/shaka-packager/issues/685#issuecomment-1029407191.
Try the command without it if you want to.
Fixes #685. Fixes #1017.
---------
Co-authored-by: Daniel Cantarín <canta@canta.com.ar>
Replaces #1181
* Add support for EBU Teletext input following Level 1.5 of the core
specification ETSI EN 300 706 V1.2.1 (2003-04).
* Add support for WebVTT-in-MP4 segment output.
Closes #272.
---------
Co-authored-by: Marcus Spangenberg <marcus.spangenberg@eyevinn.se>
A positive value, in milliseconds. It is the threshold used to determine
if we should assume that the text stream actually starts at time zero.
If the first sample comes before default_text_zero_bias_ms, then the
start will be padded, as the stream is assumed to start at zero. If the
first sample comes after default_text_zero_bias_ms, then the start of the
stream will not be padded, as we cannot assume the start time of the
stream.
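A minimal sketch of the decision, using illustrative names rather than the
actual packager code:

```cpp
#include <cstdint>

// Illustrative sketch of the zero-bias rule: if the first text sample
// starts within the bias window, the stream is assumed to begin at time
// zero and the leading gap is padded; otherwise the start is left alone.
constexpr int64_t kDefaultTextZeroBiasMs = 10 * 60 * 1000;  // 10 minutes

bool ShouldPadFromZero(int64_t first_sample_start_ms,
                       int64_t zero_bias_ms = kDefaultTextZeroBiasMs) {
  return first_sample_start_ms < zero_bias_ms;
}
```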
This work was done over ~80 individual commits in the `cmake` branch,
which are now being merged back into `main`. As a roll-up commit, it is
too big to be reviewable, but each change was reviewed individually in
context of the `cmake` branch. After this, the `cmake` branch will be
renamed `cmake-porting-history` and preserved.
---------
Co-authored-by: Geoff Jukes <geoffjukes@users.noreply.github.com>
Co-authored-by: Bartek Zdanowski <bartek.zdanowski@gmail.com>
Co-authored-by: Carlos Bentzen <cadubentzen@gmail.com>
Co-authored-by: Dennis E. Mungai <2356871+Brainiarc7@users.noreply.github.com>
Co-authored-by: Cosmin Stejerean <cstejerean@gmail.com>
Co-authored-by: Carlos Bentzen <carlos.bentzen@bitmovin.com>
Co-authored-by: Cosmin Stejerean <cstejerean@meta.com>
Co-authored-by: Cosmin Stejerean <cosmin@offbytwo.com>
This converts all time parameters to signed, finishing a cleanup that
was started in 2018 in b4256bf0. This changes the type of:
- timestamps
- PTS specifically
- timestamp offsets
- timescales
- durations
This excludes:
- MP4 box definitions
- DTS specifically
This is meant to address signed/unsigned conversion issues on arm64
that caused some test cases to fail.
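For illustration of the kind of bug this targets (a generic example, not a
specific packager code path), mixing unsigned timestamps with offsets
silently wraps around instead of going negative:

```cpp
#include <cstdint>
#include <iostream>

int main() {
  uint64_t media_time = 1000;   // unsigned timestamp, as before the cleanup
  uint64_t edit_offset = 3000;  // e.g. an offset derived from an edit list

  // Unsigned arithmetic wraps the "negative" result to a huge value.
  uint64_t unsigned_delta = media_time - edit_offset;

  // Signed timestamps behave as expected.
  int64_t signed_delta =
      static_cast<int64_t>(media_time) - static_cast<int64_t>(edit_offset);

  std::cout << unsigned_delta << "\n";  // 18446744073709549616
  std::cout << signed_delta << "\n";    // -2000
}
```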
Change-Id: Ic752a20cbc6e31fea6bc0894d1771833171e7cbe
It is not working correctly in gcc 4.8 or earlier, which is still
popular (bundled by default in CentOS 7).
Issue #865, #929.
Change-Id: I136446a70831bd0237cd29646dd349fe7558176b
Legacy players, e.g. older versions of ExoPlayer, do not handle the default WebVTT text alignment correctly. We need to specify `align:center` explicitly for cues without text alignment for backwards compatibility.
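A minimal sketch of the workaround, assuming a hypothetical helper over the
cue settings string (not the actual muxer code):

```cpp
#include <string>

// Illustrative sketch: add an explicit "align:center" to a WebVTT cue's
// settings when no alignment was specified, so legacy players that do not
// apply the spec default still center the cue.
std::string WithExplicitAlignment(const std::string& cue_settings) {
  if (cue_settings.find("align:") != std::string::npos)
    return cue_settings;
  if (cue_settings.empty())
    return "align:center";
  return cue_settings + " align:center";
}
```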
Fixes #925.
It is not working correctly in gcc 4.8 or earlier, which is still
popular (e.g. bundled by default in CentOS 7).
Fixes #865, #929.
Change-Id: I55a42428dbd2a12fc2c3b1e6a49fdd662a295dca
This only supports TTML output, meaning the user can convert WebVTT
into TTML, but not the other way around. This will be useful for
DVB-sub subtitles that would be better supported within TTML.
This only adds text-based output; a follow-up will add MP4 support.
Change-Id: I0944b7df95d7765e55f203fc5e9a644f5c455dd8
This adds more generic settings for regions and CSS styles. These are
global settings, so they go on the StreamInfo object.
Change-Id: Ibb76c060206152ccf8e9a067c09877226f67c927
Now text cues are composed of nested fragments that can be individually
styled. This allows portions of the cue to be bold, etc. The
WebVTT parser doesn't parse the styling tags in the input, but the original
tags are preserved in WebVTT output. The WebVTT output will add tags if the
style elements are present in the cue object.
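A rough sketch of the idea with simplified, hypothetical types rather than
the real fragment definitions:

```cpp
#include <string>
#include <vector>

// Simplified illustration of nested, individually styled fragments.
struct Fragment {
  std::string body;
  bool bold = false;
  bool italic = false;
  std::vector<Fragment> children;
};

// Emit WebVTT tags only when the corresponding style flag is set.
std::string ToWebVtt(const Fragment& fragment) {
  std::string inner = fragment.body;
  for (const Fragment& child : fragment.children)
    inner += ToWebVtt(child);
  if (fragment.italic)
    inner = "<i>" + inner + "</i>";
  if (fragment.bold)
    inner = "<b>" + inner + "</b>";
  return inner;
}
```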
Change-Id: I6abba4175e376e4f753193f7d8cac63e958d3c89
Now the Cue settings are a generic object that is parsed in WebVTT.
This will allow other parsers to populate the settings without having
to use WebVTT specifics.
Change-Id: I36689bec725bd2e515af962b7174fc5977f96fa2
This sets the groundwork for more generic text cues by having a more
generic object for the settings and the body. This also changes the
TextSample to be immutable, accepting the fields in the constructor
instead of using setters.
Change-Id: I76b09ce8e8471a49e6bf447e8c187f867728a4bf
Now text-based WebVTT also uses the generic media pipeline. This
converts the WebVttTextOutputHandler to a WebVttMuxer to be more
consistent with the other muxer types.
This also allows choosing between single-segment and multi-segment text.
Before, we would generate both and use single-segment for DASH and
multi-segment for HLS; now you can choose either, and both are
supported in DASH and HLS.
Change-Id: I6f7edda09e01b5f40e819290d3fe6e88677018d9
This changes it from an OriginHandler to a MediaParser and moves the
handling of it to the Demuxer. This will allow more generic handling
of text by giving it the same abstractions as video/audio handling.
Change-Id: Ibbde3c84d228ec8e83af1ed266ea97dbc9589c24
Instead of having the text readers reading from the file directly, they
now accept the data as a stream.
Change-Id: Id1b32c867a8058a68ae7aab5c568f77672a4401d
Opening a named pipe can block until both ends are open, and we cannot
control when the other end will be open. Ideally, we would always
open files in a thread so that Packager can be used with piped inputs
from naive applications without a potential deadlock.
This change will defer opening WebVTT files until the parser Run()
method is called from a thread. This way, WebVTT files being sent in
from a pipe will never be able to block the main thread.
Previously, files were opened on the main thread before calling the
parser constructor, passing the open file to the constructor as an
argument. I also tried doing it in the parser's InitializeInternal()
method, but that is also called from the main thread.
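A hedged sketch of the pattern with a hypothetical class (not the actual
WebVTT parser): the constructor only records the path, and the potentially
blocking open happens in Run(), which is invoked on a worker thread.

```cpp
#include <fstream>
#include <string>
#include <utility>

// Illustrative sketch: defer the potentially blocking open of a named pipe
// until Run(), called from a worker thread, so the main thread can never
// block waiting for the writer side of the pipe to be opened.
class DeferredTextReader {
 public:
  explicit DeferredTextReader(std::string path) : path_(std::move(path)) {}

  // Called on a worker thread; opening a pipe here may block safely.
  bool Run() {
    std::ifstream input(path_);
    if (!input)
      return false;
    std::string line;
    while (std::getline(input, line)) {
      // ... feed each line to the parser ...
    }
    return true;
  }

 private:
  std::string path_;
};
```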
Change-Id: I54cc68ed9d48a8dc697829119be84d4065b1ae1c
Note that STYLE and REGION are not supported in the mp4 container due to
a spec limitation: ISO 14496-30:2014 does not specify a way to signal
styles/regions inside mp4.
Closes #344.
Change-Id: I05c14df916f7b2c7ca4364ee9407e0eda4dc7a3f
Configurable with --transport_stream_offset_ms.
This is needed to compensate for possible negative timestamps in
inputs, which could happen on ISO-BMFF with EditLists.
Issue #112.
Change-Id: I0fce8766c9df2911b9bb859c1e54052a8ed2abfb
WebVTT cues without payload may not carry meaningful information, but they
are allowed by the WebVTT specification [1]. They can also be useful
sometimes, e.g. to signal time progression in the live case.
Fixes #433.
[1] https://www.w3.org/TR/webvtt1/#types-of-webvtt-cue-payload
Change-Id: I9e31f4a3789cbdafb7667b64f4019834190ecfc0
Before, we had an assert that would catch a sample with invalid
times; however, input may have bad times.
We did have a message if we saw a sample with a duration equal to
zero. This expands that check to verify that the time is valid in
general, and any sample that is not valid will be ignored.
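A minimal sketch of the expanded check (illustrative only, using signed
times for simplicity):

```cpp
#include <cstdint>

// Illustrative sketch of the expanded validity check: rather than only
// warning on zero-duration samples, ignore any sample whose timing is
// not usable at all.
bool IsValidSampleTiming(int64_t start_time, int64_t duration) {
  return start_time >= 0 && duration > 0;
}
```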
Fixes #425
Change-Id: I9774bfbdbd401f3016d2c345665b9973d1889db7
Previously, the text padder media handler would assume that text always
started at time zero. This would work for VOD but would result in a
large pad at the start of LIVE content.
To avoid this, the text padder will use a bias to test whether or not it
thinks the content starts at zero. Right now the bias is set to be 10
minutes, but will later be configurable with a command line flag.
10 minutes was used because LIVE content will have much larger start times
and VOD content should have much lower ones.
Issue: #416
Change-Id: I07af15a577392fb030e36f052085cd4e667700e8
Added the key frame field to the IsMediaSample matcher. All
tests that do not care about a sample being a keyframe (or not)
have been updated to use "_".
Change-Id: I44180687c58c260b6856e683d647f532227b14d5
Updated all the stream data matchers we use in our unit tests
to allow us to use matchers in them. We are now able to use "_"
to ignore specific parameters.
With this we were able to replace the different versions of the
matchers for each stream data type with a single instance for
each type.
Includes updates to printing strings to the listener. Strings
now go through a "pretty" function to help make it easier to
read them in the output.
Change-Id: I146351b54fccd63ab9ec936877e6c6b30f9aa9fc
Make the text stream info factory method in media_handler_test_base
require the caller to specify the time scale.
Issue: #399
Change-Id: Ibdfb183e0aa3f4ff50edf6b58c4e9b966006c6d2
Text Samples with no payload should be ignored, so this adds a
test to check that samples with no payload get treated the
same as a gap.
As long as this case is true, using gaps in our other tests should
functionally be the same as using samples with no payload.
Change-Id: Ic16b240c43eda2514b537a2d938d4135638adc4e
Before, we used the same payload for each text sample, as we were
focusing on the times rather than the contents.
As we look to add tests that rely on specific sample payloads, we
need to change the tests to explicitly set the payload for each
sample.
Change-Id: I24174686f46535cf6c2d59a18308101a3bb51c87
In our text to mp4 tests, we were only checking if the times on
the samples lined up with what we were expecting.
We want to check that text sample contents (ids, settings, and
payloads) were correctly merged into a single media sample. To
verify this, we now check if the sample ids appear in the
media sample.
Change-Id: Ica1a85a14e7b116275e3571332b2e90d7bc44c45
In our text to mp4 tests, we used the same sample id for each sample;
this changes it so that each sample (within a single test) has a
unique id.
This is done in preparation to look for the ids in the created
media samples.
Change-Id: I3215a6f09279af8f40e1ce8a959e0a522a811173
The previous text to mp4 webvtt pipeline was incomplete. It
did not insert ad cues and it could only insert a segment
after a sample ended.
Now the pipeline supports ad cue insertion and segment insertion
mid text sample. This required the pipeline to use the text
chunker (to split samples and insert segments) and required
a major overhaul of the text to mp4 converter.
Before, the converter came before the chunker. This meant that
the converter only expected to see stream info and text samples.
Moving the converter after the cue aligner and chunker means
that the converter had to be aware of segments and cues.
The general approach is the same; however, the converter will
convert the samples per segment, as the chunker will introduce
duplicate samples if a sample spans across segments.
Closes #362. Closes #382.
Change-Id: I0f54a40524c36a602ad3804a0da26e80851c92fd
Renamed all the files called "webvtt_output_handler*" to
"webvtt_text_output_handler*" to better reflect the class name
in them.
Change-Id: I977bab362076974a124f263bcefff716ed8b6a0f
We don't want to allow any handler to be copyable or assigned over,
so this change enforces that for the webvtt output handler.
Change-Id: Ie0d59d6dbfb7a5e00bb4dd1422cd696d1a2d6072
Before, the webvtt output handler was written so that it could
share code between a segmented and non-segmented handler. As
we are not worried about that right now, this change simplifies
the handler to just be about segmented output.
Change-Id: I29dbc4e3a4ffbeb7ea10e23db489ee74b398a6c4
The Cue class was from a previous WebVTT implementation and
is not used in the current implementation. It was missed when the
other classes were removed. This change removes it.
Change-Id: I661ab3fcd80b5e5ef98b5213746b341a4028d1a1
When originally implementing the webvtt parser, there was a
misunderstanding of what the BOM was supposed to be
(https://en.wikipedia.org/wiki/Byte_order_mark). This corrects the
misunderstanding.
Closes #397.
Change-Id: I250d392db228e5e9b86684614b57adc5d8a4e5fe
To ensure that we can parse content with style and region blocks,
this change updates the parser to skip those blocks so that we
can still parse the cues from a file.
Full style and region support will be added later this year.
Issue #380
Change-Id: I11b8fd862a108c27a5c67b15d4703532b44a1214
The WebVtt Output Handler did not recognize cue events. This change
allows the handler to accept the events and tell the muxer listener
about them.
Issue #362
Change-Id: I7c3318b72e539adc19af587c8e213fdb0af8290b
Created a media handler that comes after the parsers and handles filling
in gaps between text samples. The padder takes a min duration, and if
the samples do not cover the min duration when flushed, one last empty
sample will be injected so that the samples will go up to the min duration.
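A minimal sketch of the flush-time rule, using hypothetical names rather
than the actual handler code:

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct TextSampleSketch {
  int64_t start_time = 0;
  int64_t end_time = 0;
  std::string payload;  // empty payload == filler sample
};

// Illustrative sketch: at flush time, append one empty sample if the
// existing samples do not reach the minimum duration.
void PadToMinDuration(std::vector<TextSampleSketch>* samples,
                      int64_t min_duration) {
  const int64_t covered = samples->empty() ? 0 : samples->back().end_time;
  if (covered < min_duration)
    samples->push_back({covered, min_duration, ""});
}
```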
Change-Id: I88605059664d09279676edac418ff3d4990d7556