captions displaying > correctly

Posted: Fri Jul 30, 2021 8:28 pm
by Ross Meyer
Hey folks,

I need to create an open-captioned DCP from a .mov file and an .srt subtitle file. Opening the two in VLC works as it should, but the problem comes when trying to create the DCP. All of the dialogue in the film is written as ">> dialogue" and plays as such in VLC, but in the DOM preview window the >> signs are displaying as "&gt&gt". I'm going to spend the next few hours converting and see whether the problem in the preview window also exists in the actual DCP captions. If so, does anyone here know of a way to convince DOM to display the captions as intended and not attempt to convert them to HTML?

Re: captions displaying > correctly

Posted: Fri Jul 30, 2021 8:39 pm
by carl
Hi

If there's a problem in the preview I'm afraid it'll almost certainly also be there in the DCP.

Can you send the .srt file over to carl@dcpomatic.com so I can take a look?

Re: captions displaying > correctly

Posted: Sat Jul 31, 2021 9:32 am
by Carsten
Must be a charset/encoding issue. Can probably be solved by converting the SRT file with 'some' editor and choosing a different encoding. Which version of DCP-o-matic are you using?

Hmm... seems to be a bug in DCP-o-matic - no matter what encoding I use, > is shown as &gt

It also arrives in the captions XML file that way: <Text VAlign="top" VPosition="86.9545">&gt;&gt; dialogue</Text>

(using DCP-o-matic 2.15.152) So probably Carl needs to fix it.

One reason is probably that, as you can see in the resulting DCP captions XML file, > and < are also used to delimit the 'Text' tags. The code handling this needs to distinguish properly between these formal separators and a > in the content.

So the reason is the XML for the captions in DCPs. VLC will probably use the SRT directly and thus has no problems.
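
To illustrate the difference between markup delimiters and a literal > in content, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is not DCP-o-matic's code; the element and attribute values are copied from the Text line quoted above, and the variable names are made up for illustration. It suggests that a conforming XML writer stores > as &gt; and that a parser hands back the original characters.

```python
import xml.etree.ElementTree as ET

# Build a Text element like the one quoted above (values taken from the post).
text_elem = ET.Element("Text", {"VAlign": "top", "VPosition": "86.9545"})
text_elem.text = ">> dialogue"

# Serialising escapes '>' in content so it cannot be mistaken for markup.
serialized = ET.tostring(text_elem, encoding="unicode")
print(serialized)

# Parsing reverses the escaping and returns the literal characters.
parsed_text = ET.fromstring(serialized).text
print(parsed_text)
```

If that holds for the DCP file too, a player that parses the XML properly would be expected to show ">> dialogue".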

- Carsten

Re: captions displaying > correctly

Posted: Sat Jul 31, 2021 10:08 am
by Carsten
Okay, I seem to be wrong, again. Maybe it IS a bug in DCP-o-matic, but only in the display/preview rendering part. Seems that the DCP you created is formally correct. That's a 'lighter' bug then:

From the Cinecanvas spec (and I am sure it is the same in SMPTE subtitles):

---
1.4 Predefined Entities
Since XML uses the ‘<’, ‘>’ and ‘&’ characters for special purposes, their use as content must be escaped.

1.5 Elements
Similarly, any Unicode character can be specified by using its decimal code-point preceded by “&#” and terminated with “;”. For example, “&#65;” represents the character ‘A’.
Unicode characters can also be specified using hexadecimal notation by preceding its code-point value with “&#x”. For example, “&#x41;” represents the character ‘A’.
---
The escape character sequence &gt; is what you see in preview.
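
As a quick check of the quoted rules, the following Python sketch (standard library only, unrelated to DCP-o-matic's own code) escapes the three special characters and lets an XML parser resolve both predefined entities and numeric character references. The sample strings are invented for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The three characters named in section 1.4 become predefined entities.
escaped = escape("a < b & c > d")
print(escaped)

# A parser resolves predefined entities as well as decimal and
# hexadecimal character references back to plain characters.
decoded = ET.fromstring("<Text>&#65;&#x41; &gt;&gt; dialogue</Text>").text
print(decoded)
```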

So, I assume that DCP-o-matic and DCP-o-matic player just don't deal correctly with these escaped characters when displaying captions. The DCPs created, though, appear to be 100% correct. I can try one of these on our cinema projector later. The question remains whether it is safe to use these special chars in DCP captions. Nowadays, different software is used in cinema projection equipment to render captions, and it is possible that some equipment fails on them. I can test a few, but not all.

- Carsten

Re: captions displaying > correctly

Posted: Sat Jul 31, 2021 4:31 pm
by Ross Meyer
I've been offsite since creating the feature DCP. I'll be back onsite to ingest and test later tonight. I'll update with the results.

Re: captions displaying > correctly

Posted: Sun Aug 01, 2021 9:57 pm
by Ross Meyer
Unfortunately it looks like the >> signs are displaying incorrectly on the finished DCP as well.
Attachment: 20210801_163006.jpg (3.91 MiB)
In all reality, I think we're just going to screen this one without captions because after watching a portion of the program, it looks like the studio didn't hire anyone to actually transcribe the dialogue and just had some kind of voice recognition software handle it. They're pretty bad.

Re: captions displaying > correctly

Posted: Sun Aug 01, 2021 9:58 pm
by carl
Thanks for the update. I have a fix for this bug and it should be gone in the next test release.

Re: captions displaying > correctly

Posted: Sun Aug 01, 2021 11:34 pm
by Carsten
So, Carl, what is it? Shouldn't &gt; be the correct escape sequence for >?

- Carsten

Re: captions displaying > correctly

Posted: Wed Aug 18, 2021 9:38 pm
by carl
It was just a bug: there was some code to convert > to &gt;, but then some more code to convert & to &amp;, so > got changed to &gt; and then to &amp;gt;.

It shouldn't happen from 2.15.157 onwards.
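
The double escaping described above can be reproduced with a small Python sketch. This is only an illustration of the ordering problem, not the actual DCP-o-matic code; the function names escape_buggy and escape_fixed are hypothetical.

```python
from xml.sax.saxutils import unescape

def escape_buggy(text):
    # Hypothetical ordering: '>' is escaped first, then every '&',
    # including the one that was just produced as part of "&gt;".
    text = text.replace(">", "&gt;")
    text = text.replace("&", "&amp;")
    return text

def escape_fixed(text):
    # Escaping '&' first means entities produced afterwards stay intact.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    return text

caption = ">> dialogue"
buggy = escape_buggy(caption)
fixed = escape_fixed(caption)
print(buggy)
print(fixed)

# A renderer that decodes entities once recovers the original caption
# only from the correctly escaped string.
print(unescape(buggy))
print(unescape(fixed))
```

With the buggy ordering, a single decode still leaves "&gt;&gt; dialogue" on screen, which is consistent with what was reported in the preview.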

Re: captions displaying > correctly

Posted: Wed Aug 18, 2021 9:43 pm
by Ross Meyer
Thanks Carl. You're a champ.