In audio sync, fancy bracket and asterisk treated as a separate phrase

I’d like to start a new thread for something that @CraigN reported here:

See also specifically this post, which shows fairly clearly the range of the problem:

You can see some examples of this error in this test app (which expires at the end of April):
https://www.dropbox.com/s/7o0fyrtuias0qtn/Al-Kitaab_al-Mukhaddas-2.1.2.apk?dl=0

In Rooma 3:13, you see the same situation described by Craig, I believe.
In Luukha 18:20, the verse ends with a fancy bracket, then a regular guillemet, then the asterisk. The app actually highlights that sequence of punctuation, but then realizes that a new verse has started and jumps to it. If there weren’t a new verse, I think it would have highlighted that sequence of punctuation as the next segment.

I will try to post this in the developers issues and include a sample project (but it’s a whole Bible, so pretty big…).

I should have pointed out that my test app is in Roman script (unlike Craig’s, which is in Arabic script). It uses the fancy brackets because of the cultural context, and there is also a parallel Arabic-script version of the app in the same language that uses them too.

@CraigN How did you get the fancy brackets? Are they inserted directly in the Paratext text? In our case, we type “<” and “>” as substitutes anywhere in the text that we want the fancy brackets (because they are easier to work with), and then SAB applies the following changes (from the .appDef file):

    <change name="Open fancy parens">
      <find>&lt;</find>
      <replace>\\op \uFD3E\u200D\\op\*</replace>
    </change>
    <change name="Close fancy parens">
      <find>&gt;</find>
      <replace>\\op \u200D\uFD3F\\op\*</replace>
    </change>

and we have a custom style defined to handle the specific formatting:

span.op {
    font-family: font2;
    font-size: 80%;
    position: relative;
    top: -0.2em;
}

If you ask me why we have a \u200D zero-width joiner in those replacements… I’m not sure I could tell you. We tried a number of different things, probably some LTR markers among them, and this is what ended up working. (Maybe someone wants to try removing the ZWJ and see what happens…)
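For anyone who wants to experiment with the ZWJ, here is a minimal Python sketch of what those two changes produce. It is not SAB code. It assumes the doubled backslashes and the `\*` in the .appDef are regex escapes that yield the USFM markers `\op ` and `\op*`; the function names `apply_changes` and `describe` are made up for illustration.

```python
import unicodedata

# Assumed output of the two .appDef changes once the regex escapes are resolved.
OPEN_FANCY = "\\op \uFD3E\u200D\\op*"   # \op + ornate left paren + ZWJ + \op*
CLOSE_FANCY = "\\op \u200D\uFD3F\\op*"  # \op + ZWJ + ornate right paren + \op*


def apply_changes(text: str) -> str:
    """Replace the '<' and '>' placeholders the way the two changes would."""
    return text.replace("<", OPEN_FANCY).replace(">", CLOSE_FANCY)


def describe(text: str) -> list[str]:
    """List each character as 'U+XXXX NAME' so invisible ones become visible."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in text]


if __name__ == "__main__":
    print(apply_changes("<Kalaam khuchuumhum>"))
    for line in describe("\u200D\uFD3F"):
        print(line)
```

Dropping `\u200D` from the two constants is an easy way to compare the text with and without the joiner before touching the real project.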

In my test app, Aʼmaal al-Rusul (ACT) 7:50 shows a case where the punctuation at the end (fancy bracket, guillemet, asterisk) doesn’t get selected at all during the reading of the audio, but if you use the double-arrows to move forward or back a segment, you can actually select that sequence of punctuation. So it seems like it is being captured as a segment of its own.

Matta (MAT) 23:39 shows a problem similar to LUK 18:20, but with a different combination of punctuation: fancy bracket, asterisk, guillemet. This is also treated as a separate segment, and you can see it get highlighted separately before the app turns the page to the next chapter.

So I think the problem is this: in a lot of cases (e.g. ACT 7:53 with exclamation point and plain guillemet, and all the cases with the fancy brackets except the one error in Rooma (ROM) 3:12-18), the punctuation is properly included in the phrase, but certain combinations of the fancy bracket and the asterisk fail to be included along with the text of the segment. If this happens at the end of a verse, the app seems to be able to re-sync to the next verse, at least mostly; it somewhat fails at LUK 18:20, as mentioned above.

Maybe we should try to make a list of combinations which seem to fail:

  • fancy bracket, asterisk, but only IF not at the end of a verse AND not followed by another punctuation mark (see Craig’s example, where it is followed by a question mark and works)
  • fancy bracket, guillemet, asterisk
  • fancy bracket, asterisk, guillemet

I assume that there is some phrase matching code in SAB, and we just need to make sure that these specific combinations get matched and included in the previous phrase.
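To illustrate the behavior I am hoping for, here is a rough Python sketch. This is not SAB’s actual phrase-matching code, which I have not seen; the function names and the list-of-strings representation of segments are assumptions for the sake of the example. The idea is simply that any segment containing no letters or digits gets folded into the segment before it.

```python
def is_punctuation_only(segment: str) -> bool:
    """True if the segment is non-empty but contains no letters or digits.

    Fancy brackets (U+FD3E/U+FD3F), the ZWJ (U+200D), guillemets and the
    footnote asterisk all count as "not alphanumeric" here.
    """
    stripped = segment.strip()
    return bool(stripped) and not any(ch.isalnum() for ch in stripped)


def merge_trailing_punctuation(segments: list[str]) -> list[str]:
    """Attach punctuation-only segments to the preceding segment."""
    merged: list[str] = []
    for seg in segments:
        if merged and is_punctuation_only(seg):
            merged[-1] += seg  # fold into the previous phrase
        else:
            merged.append(seg)
    return merged


if __name__ == "__main__":
    # Hypothetical segmentation of ROM 3:13 as the app currently seems to see it.
    phrases = [
        "wa lisneehum malaaniin be kalaam halu khachchaach.",
        "\u200D\uFD3F*",
        "\uFD3E\u200DWa chalaaliifhum",
    ]
    print(len(merge_trailing_punctuation(phrases)))
```

A rule like this would cover all three combinations in the list above without having to enumerate them, since none of them contains a letter.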

Another example of the first case is Ibraaniyiin (HEB) 1:5. The fancy bracket and asterisk are highlighted as part of the following phrase, rather than the phrase they should be associated with.

Yes, the fancy brackets came with the Paratext files.

If you download my test app above and check out the problem at Rooma (ROM) 3:13, this is what the text looks like:

If you play the audio, it reads fine until you get to the end of that first \q1 \q2 (the middle of the verse). The highlighting of that \q2 is as shown in the image above. It does not select the fancy bracket or the footnote marker. Then when it starts to read the next segment (Wa chalaaliifhum), it highlights the fancy bracket and the footnote marker (considering it the next segment, apparently). And when it starts reading “induhum samm…” it highlights the \q1 before it (Wa chalaaliifhum).

So it seems like the problem is that it is considering the fancy bracket + footnote marker as a separate segment, at least when the verse continues after that. The fancy bracket + footnote marker at the end of that verse gets highlighted with the text just fine when it is read:

But it is also at the end of the verse, a context which doesn’t seem to show this problem.

I thought someone might like to see the output when I export that chapter to HTML, so here it is, from the end of v.12 to the beginning of v.15:

<div class="q2"><div id="T12c" class="txs">wa la waahid.<span class="op">‍﴿</span><span class="footnote selectable" id="F-1"><sup>*</sup></span></div><span id="bookmarks12"></span></div><span id="bookmarks12"></span><a id="v13"></a>
<div class="q-vv"><div id="T13a" class="txs"><span class="v">13</span><span class="vsp">&nbsp;</span><span class="op">﴾‍</span>Kalaam khuchuumhum yiwaddi le l‑khabur</div></div>
<div class="q2"><div id="T13b" class="txs">wa lisneehum malaaniin be kalaam halu khachchaach.</div><div id="T13c" class="txs"><span class="op">‍﴿</span><span class="footnote selectable" id="F-2"><sup>*</sup></span></div></div>
<div class="q"><div id="T13d" class="txs"><span class="op">﴾‍</span>Wa chalaaliifhum</div></div>
<div class="q2">induhum samm amchideegaat.<span class="op">‍﴿</span><span class="footnote selectable" id="F-3"><sup>*</sup></span><span id="bookmarks13"></span></div><span id="bookmarks13"></span><a id="v14"></a>
<div class="q-vv"><div id="T14a" class="txs"><span class="v">14</span><span class="vsp">&nbsp;</span><span class="op">﴾‍</span>Wa kalaamhum illa laʼana</div></div>
<div class="q2"><div id="T14b" class="txs">wa kalaam murr marra waahid.<span class="op">‍﴿</span><span class="footnote selectable" id="F-4"><sup>*</sup></span></div><span id="bookmarks14"></span></div><span id="bookmarks14"></span><a id="v15"></a>
<div class="q-vv"><div id="T15a" class="txs"><span class="v">15</span><span class="vsp">&nbsp;</span><span class="op">﴾‍</span>Rijleehum yajru</div></div>

You can see that the \q2 in the middle is broken into two <div class="txs"> elements, but the \q2 at the end is just one div. It’s easier to see if I tidy up the XML in those two spots:

<div class="q2">
  <div id="T13b" class="txs">wa lisneehum malaaniin be kalaam halu
  khachchaach.</div>
  <div id="T13c" class="txs">
    <span class="op">‍﴿</span>
    <span class="footnote selectable" id="F-2">
      <sup>*</sup>
    </span>
  </div>
</div>

<div class="q2">induhum samm amchideegaat.
<span class="op">‍﴿</span>
<span class="footnote selectable" id="F-3">
  <sup>*</sup>
</span>
<span id="bookmarks13"></span></div>

Note that the second div doesn’t even have the class="txs", which seems to be the text section designation. At the end of verse 12 (which does highlight the fancy bracket and the footnote marker correctly), you can see that the div that covers the text of that \q2 and the fancy bracket and the footnote marker DOES include the class="txs".

So how can we get that fancy bracket and footnote marker included in that earlier <div class="txs">? (That would then push the <div id="T13c" class="txs"> down to the “Wa chalaaliifhum” text, and push the <div id="T13d" class="txs"> down to the “induhum samm” text, where it should be.)
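In case it helps with finding every affected spot, here is a small Python sketch that flags any <div class="txs"> whose text contains no letters or digits. It is only a diagnostic for the exported HTML, not a fix. It assumes the fragment being checked is well-formed XML, as the tidied excerpt above is; a full export contains entities such as &nbsp;, so it would need a real HTML parser instead. The function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The first tidied excerpt from above, with the invisible characters escaped.
FRAGMENT = (
    '<div class="q2">'
    '<div id="T13b" class="txs">wa lisneehum malaaniin be kalaam halu khachchaach.</div>'
    '<div id="T13c" class="txs"><span class="op">\u200D\uFD3F</span>'
    '<span class="footnote selectable" id="F-2"><sup>*</sup></span></div>'
    '</div>'
)


def punctuation_only_segments(xml_text: str) -> list[str]:
    """Return the ids of txs divs that hold only punctuation and markers."""
    root = ET.fromstring(xml_text)
    flagged = []
    for div in root.iter("div"):
        if "txs" not in div.get("class", "").split():
            continue  # not a text segment
        text = "".join(div.itertext())
        if text.strip() and not any(ch.isalnum() for ch in text):
            flagged.append(div.get("id"))
    return flagged


if __name__ == "__main__":
    print(punctuation_only_segments(FRAGMENT))
```

Run over the ROM 3:13 excerpt, this picks out T13c, which is exactly the segment that steals the highlighting from “Wa chalaaliifhum”.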