captions displaying > correctly

Ross Meyer
Posts: 15
Joined: Fri Jun 04, 2021 12:48 am

captions displaying > correctly

Post by Ross Meyer »

Hey folks,

I need to create an open-captioned DCP from a .mov file along with an .srt subtitle file. Opening the two in VLC works as it should, but the problem comes when trying to create the DCP. All of the dialogue in the film is written as ">> dialogue" and plays as such in VLC, but in the DOM preview window the >> signs are displayed as "&gt&gt". I'm going to spend the next few hours converting and see whether the problem in the preview window also exists in the actual DCP captions. If so, does anyone here know of a way to convince DOM to display the captions as intended and not attempt to convert them to HTML?
carl
Site Admin
Posts: 2338
Joined: Thu Nov 14, 2013 2:53 pm

Re: captions displaying > correctly

Post by carl »

Hi

If there's a problem in the preview I'm afraid it'll almost certainly also be there in the DCP.

Can you send the .srt file over to carl@dcpomatic.com so I can take a look?
Carsten
Posts: 2648
Joined: Tue Apr 15, 2014 9:11 pm
Location: Germany

Re: captions displaying > correctly

Post by Carsten »

Must be a charset/encoding issue. Can probably be solved by converting the SRT file with 'some' editor and choosing a different encoding. Which version of DCP-o-matic are you using?

Hmm... seems to be a bug in DCP-o-matic - no matter what encoding I use, > is shown as &gt

It also arrives in the captions XML file that way: <Text VAlign="top" VPosition="86.9545">&gt;&gt; dialogue</Text>

(using DCP-o-matic 2.15.152) So probably Carl needs to fix it.

One reason is probably that, as you can see in the resulting DCP captions XML file, > and < are also used to delimit the 'Text' tags. The code has to handle both these structural delimiters and a > that appears in the content.

So the reason is the XML for the captions in DCPs. VLC will probably use the SRT directly and thus has no problems.

- Carsten
Last edited by Carsten on Sat Jul 31, 2021 10:19 am, edited 1 time in total.
Carsten
Posts: 2648
Joined: Tue Apr 15, 2014 9:11 pm
Location: Germany

Re: captions displaying > correctly

Post by Carsten »

Okay, I seem to be wrong, again. Maybe it IS a bug in DCP-o-matic, but only in the display/preview rendering part. Seems that the DCP you created is formally correct. That's a 'lighter' bug then:

From the Cinecanvas spec (and I am sure it is the same in SMPTE subtitles):

---
1.4 Predefined Entities
Since XML uses the ‘<’, ‘>’ and ‘&’ characters for special purposes, their use as content must be escaped.

1.5 Elements
Similarly, any Unicode character can be specified by using its decimal code-point preceded by “&#” and terminated with “;”. For example, “&#65;” represents the character ‘A’.
Unicode characters can also be specified using hexadecimal notation by preceding its code-point value with “&#x”. For example, “&#x41;” represents the character ‘A’.
---
The escape character sequence &gt; is what you see in preview.
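
To illustrate the sections quoted above, here is a minimal Python sketch of how a standards-compliant XML parser treats these escapes. It uses only the Python standard library and has nothing to do with DCP-o-matic's own code; the `Text` element is the one quoted earlier in this thread, and the variable names are just for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The element as quoted earlier in the thread.
xml_line = '<Text VAlign="top" VPosition="86.9545">&gt;&gt; dialogue</Text>'

# A conforming parser resolves the predefined entity &gt; back to '>'.
decoded = ET.fromstring(xml_line).text

# Numeric character references (section 1.5 of the quoted spec text).
decimal_a = ET.fromstring("<Text>&#65;</Text>").text
hex_a = ET.fromstring("<Text>&#x41;</Text>").text

# Going the other way: escape() makes raw caption text safe as XML content.
escaped = escape(">> dialogue")

print(decoded)
print(decimal_a, hex_a)
print(escaped)
```

So a renderer that parses the XML properly should show ">> dialogue" for that element; seeing the literal escape sequence on screen suggests something went wrong either when writing or when displaying the text.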

So, I assume that DCP-o-matic and DCP-o-matic player just don't deal correctly with these escaped characters when displaying captions. The DCPs created, though, appear to be 100% correct. I can try one of these on our cinema projector later. The question remains whether it is safe to use these special characters in DCP captions. Nowadays, different software is used in cinema projection equipment to render captions, and it is possible that some equipment fails on them. I can test a few, but not all.

- Carsten
Ross Meyer
Posts: 15
Joined: Fri Jun 04, 2021 12:48 am

Re: captions displaying > correctly

Post by Ross Meyer »

I've been offsite since creating the feature DCP. I'll be back onsite to ingest and test later tonight. I'll update with the results.
Ross Meyer
Posts: 15
Joined: Fri Jun 04, 2021 12:48 am

Re: captions displaying > correctly

Post by Ross Meyer »

Unfortunately it looks like the >> signs are displaying incorrectly on the finished DCP as well.
[Attachment: 20210801_163006.jpg]
In all reality, I think we're just going to screen this one without captions because after watching a portion of the program, it looks like the studio didn't hire anyone to actually transcribe the dialogue and just had some kind of voice recognition software handle it. They're pretty bad.
carl
Site Admin
Posts: 2338
Joined: Thu Nov 14, 2013 2:53 pm

Re: captions displaying > correctly

Post by carl »

Thanks for the update. I have a fix for this bug and it should be gone in the next test release.
Carsten
Posts: 2648
Joined: Tue Apr 15, 2014 9:11 pm
Location: Germany

Re: captions displaying > correctly

Post by Carsten »

So, Carl, what is it? Shouldn't &gt; be the correct escape sequence for > ?

- Carsten
carl
Site Admin
Posts: 2338
Joined: Thu Nov 14, 2013 2:53 pm

Re: captions displaying > correctly

Post by carl »

It was just a bug: there was some code to convert > to &gt; but then some more code to convert & to &amp;, so > got changed to &gt; and then to &amp;gt;

It shouldn't happen from 2.15.157 onwards.
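
For anyone curious, the double escaping described above is easy to reproduce. The following is only a Python sketch of the general pattern, not DCP-o-matic's actual code (which is C++); the function names `escape_gt`, `escape_amp`, `buggy_escape` and `fixed_escape` are made up for this example.

```python
from xml.sax.saxutils import unescape


def escape_gt(text: str) -> str:
    """Replace '>' with its XML entity."""
    return text.replace(">", "&gt;")


def escape_amp(text: str) -> str:
    """Replace '&' with its XML entity."""
    return text.replace("&", "&amp;")


def buggy_escape(text: str) -> str:
    """Escape '>' first, then '&': the '&' of '&gt;' gets escaped again."""
    return escape_amp(escape_gt(text))


def fixed_escape(text: str) -> str:
    """Escape '&' first, so entities added afterwards are left alone."""
    return escape_amp(text).replace("<", "&lt;").replace(">", "&gt;")


caption = ">> dialogue"

written_buggy = buggy_escape(caption)
written_fixed = fixed_escape(caption)

# What a renderer shows after undoing one level of escaping.
shown_buggy = unescape(written_buggy)
shown_fixed = unescape(written_fixed)

print(written_buggy, "->", shown_buggy)
print(written_fixed, "->", shown_fixed)
```

The usual rule is to escape & before anything else (or to do all replacements in a single pass), so that the ampersands introduced by the other entities are never escaped a second time.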
Ross Meyer
Posts: 15
Joined: Fri Jun 04, 2021 12:48 am

Re: captions displaying > correctly

Post by Ross Meyer »

Thanks Carl. You're a champ.