The cue aligner assumed that all text timestamps were in milliseconds.
This was the last place with that assumption. This change removes it
and uses the stream info's time scale instead.
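
For illustration only, a minimal sketch of the kind of conversion this
implies, assuming cue times are given in seconds and the time scale is
expressed in ticks per second. The helper name SecondsToTimescale is
hypothetical and is not taken from the code base:

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper (not the project's actual API): converts a time in
// seconds to ticks in a stream's time scale, where time_scale is the number
// of ticks per second. A hard-coded millisecond assumption is the special
// case time_scale == 1000.
int64_t SecondsToTimescale(double seconds, uint32_t time_scale) {
  // Round to the nearest tick instead of truncating.
  return static_cast<int64_t>(std::llround(seconds * time_scale));
}
```

With a time scale of 1000 this matches the old millisecond behavior,
while a stream with a different time scale (for example 90000) gets
values in its own units.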
Issue #399
Change-Id: Ie21bf27148e020bd85111dcace0bbdff3419c1ac