Timing file problems, possibly with guillemets

By God’s grace, I think I’ve found a fix to the guillemet problem, following a suggestion by Ian McQuay to use zero-width spaces which got me thinking outside the box.
To recap, the problem is with the French standard of having a space between ending punctuation and the guillemet e.g. Hello. »
and even worse… Hello ! »
In the Changes section, I search for the punctuation, space, right-guillemet combination and replace it with a different form of the punctuation (a different Unicode character), space and right-guillemet. Note: you have to use \s for the space in the Find field and \u0020 for the space in the Replace field; I’m not sure why!
E.g.
Find: \.\s»
Replace: \u2024\u0020»
This searches for a period (full stop), space, guillemet and replaces it with a U+2024 period, space, guillemet.
I have to do one of these for each punctuation type, using \u01c3 for an exclamation mark replacement and \uFF1F for a question mark.
Then I add the right guillemet as a phrase-ending marker for the Aeneas sync. This retains the period, exclamation mark and question mark as normal phrase-ending chars, but for phrases ending with guillemets, the guillemet becomes the phrase-ender (since the period, ! and ? have been “removed”).
Maybe there’s still a better way, but at least this works for my needs.
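For anyone who wants to test the substitution outside SAB first, here is a minimal Python sketch of the same find/replace idea. The function name and the lookup table are made up for illustration, and Python’s regex engine is not the one SAB uses, so treat it only as a way to preview the result. Note that Python’s \s also matches no-break spaces, so any such space before the guillemet comes out as a plain U+0020, as in the rule above.

```python
import re

# Lookalike characters named in the post: U+2024, U+01C3, U+FF1F.
LOOKALIKES = {".": "\u2024", "!": "\u01c3", "?": "\uff1f"}

def disguise_before_guillemet(text: str) -> str:
    """Swap . ! ? for a lookalike, but only when followed by a space and »."""
    return re.sub(
        r"([.!?])\s»",
        lambda m: LOOKALIKES[m.group(1)] + "\u0020»",
        text,
    )

# Punctuation that is not followed by " »" is left alone.
print(disguise_before_guillemet("Hello. » Hello ! » Really?"))
```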

Thanks for looking into this Andrew.
This is a reasonable workaround, but presumably might limit what fonts can be used to those with such code points.
So the “Period” needs to be changed to the “One dot leader”,
the “exclamation mark” to the “Latin Letter retroflex click”,
and the “question mark” to the “Fullwidth question mark”.
Particularly for the last one, I can’t see many SIL fonts that have a glyph for this codepoint. You’ve presumably got this working with something though.
Thanks.

Good point. I’m using Charis SIL, which has those characters. This was my first attempt at using alternative chars like this, and perhaps there’s a different strategy that will work better.

Okay–new idea that seems to work with a quick test. Use the change strategy I noted above, but change the . ! and ? to some other normal char that Aeneas is not using as a phrase-ending marker. i.e. change the full-stop to @, ! to $ and ? to #. But ONLY if there is a space and then a right-guillemet (as noted in the Find/Replace change strategy).
Run Aeneas on those pages with the problem, making sure you have the right-guillemet as a phrase-ender. If you check the timings, the highlighting should work (even though you have the #, $ and @ symbols there). Then uncheck those 3 changes in SAB and build your app. The . ! and ? will be back, but the highlighting will work.
Try it and see. Shouldn’t depend on the font if you choose temporary substitutions in the font.
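Here is a rough Python sketch of that temporary substitution, in case it helps to preview what the text looks like while Aeneas runs. The function names are hypothetical. In SAB you would simply uncheck the changes rather than run a reverse step; the reverse function is only there to show that the swap is lossless, provided the text has no genuine @, $ or # directly before a space and a right-guillemet.

```python
import re

# Temporary stand-ins suggested in the post: . -> @, ! -> $, ? -> #
TEMP = {".": "@", "!": "$", "?": "#"}
RESTORE = {stand_in: mark for mark, stand_in in TEMP.items()}

def to_temp(text: str) -> str:
    """Hide . ! ? behind a stand-in, only when a space and » follow."""
    return re.sub(r"([.!?])(\s»)", lambda m: TEMP[m.group(1)] + m.group(2), text)

def from_temp(text: str) -> str:
    """Undo to_temp (assumes no genuine @ $ # sits right before ' »')."""
    return re.sub(r"([@$#])(\s»)", lambda m: RESTORE[m.group(1)] + m.group(2), text)

sample = "« Viens ! » Il dit : « Non. » Quoi ?"
hidden = to_temp(sample)
print(hidden)
print(from_temp(hidden) == sample)
```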

I’ve had a couple of exchanges with @Andrew_Shafe to try to implement this guillemet work-around strategy, but no luck yet. As a reminder, the issue is that in many French-speaking contexts, the text will have a space before a closing quote (guillemet), but that is likely after some end-of-phrase punctuation. The phrase-splitting algorithm then lumps that closing quote in with the following phrase. So with the phrase:

…hor taan, miŋni ! » N̰uugu…

If you tell Aeneas to break phrases on “,!»”, then it will break it into “…hor taan,” and “miŋni !” and “» N̰uugu…”. (But that closing guillemet weirdly remains selected after the highlighting continues to the next phrase when you play the audio!) So that sort of works, but ideally you would want the closing quote to be included with the previous phrase.

We will eventually need a fix in the code for this problem. I’ve located the relevant code, but haven’t had time to figure out and propose a fix yet. So I’m trying to use this work-around for now. But I can’t get it to work.

So I enable this rule before running Aeneas:
[image: screenshot of the change rule]

That changes all breaking punctuation (at least the ones in my test) into a “@” before a closing guillemet. This works fine - the synchronization then breaks the phrase into “…hor taan,” and “miŋni @ »” and “N̰uugu…”, which I can verify with an “Export to HTML” that it correctly plays the audio. But you obviously don’t want the “@” characters in your app in place of your punctuation. So I disable that change, and then “Export to HTML” again. Now the text on the page is correct (no “@” in place of punctuation), but the “»” is now considered a phrase all by itself, and is highlighted during the “N̰uugu…” phrase, and offsets all of the following phrases. So that’s even worse than the original with no changes.

If I look at the underlying HTML code, with the change rule changing punctuation to “@” I get the “miŋni @ »” all lumped together in one div:
<div id="Td" class="txs">...hor taan,</div> <div id="Te" class="txs">miŋni&nbsp;@&nbsp;»</div> <div id="Tf" class="txs">N̰uugu...</div>
So far so good. But if I disable that rule and Export to HTML again, the underlying code changes to:
<div id="Td" class="txs">...hor taan,</div> <div id="Te" class="txs">miŋni&nbsp;!</div><div id="Tf" class="txs">&nbsp;»</div> <div id="Tg" class="txs">N̰uugu...</div>
You can see that the close quote guillemet » has now been placed in a separate div. But the underlying timing file hasn’t been rebuilt, so that div gets connected with what should be the next audio phrase.
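To make the mismatch concrete, here is a toy Python sketch. It is not the actual SAB or Aeneas splitting code, just a simplified splitter that cuts after every break character, and the "clip for" labels are stand-ins for timing-file entries. It reproduces the div counts above: three phrases when the “@” rule is on, four when it is off, so the third timing entry lands on the lone guillemet.

```python
def split_phrases(text: str, breakers: str) -> list[str]:
    """Toy splitter: end a phrase after every character found in `breakers`."""
    phrases, current = [], ""
    for ch in text:
        current += ch
        if ch in breakers:
            phrases.append(current.strip())
            current = ""
    if current.strip():
        phrases.append(current.strip())
    return phrases

BREAKERS = ",!»"

# Text as Aeneas saw it (rule enabled) and as the app shows it (rule disabled).
synced = split_phrases("...hor taan, miŋni @ » N̰uugu...", BREAKERS)
shown = split_phrases("...hor taan, miŋni ! » N̰uugu...", BREAKERS)

# One stand-in timing entry per phrase that existed at sync time.
audio = ["clip for: " + phrase for phrase in synced]

# Pair displayed phrases with the old timings, in order.
for phrase, clip in zip(shown, audio):
    print(repr(phrase), "<-", clip)
```

With the rule disabled there is one more displayed phrase than there are timing entries, which is why everything after the guillemet is shifted by one.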

I’m not sure that I’m understanding all that is going on here. (Does it recreate the phrases when you Export to HTML?) The only thing I’m doing differently from Andrew is that he said to build the app, and I was just running the Export to HTML test… so I went ahead and built the app and… same result. The guillemet is still broken into a completely separate phrase, which steals the audio from the following phrase and messes up all of the subsequent audio sync.

So any other thoughts on what I can try?

Two more tries:

  1. I tried replacing the space before the guillemet with a narrow no-break space (U+202F), and then the phrases created included extra phrases where the guillemets were, but Export to HTML produced something where I couldn’t click on a phrase to select it. And it really wasn’t any better than with no change rule, so that isn’t helpful.

  2. I tried removing the space before the guillemet (just removing the \u0020 in the rule above). That actually works pretty well, but you lose the space before the guillemet. Turning off the change rule after the synchronization (before Export to HTML) results in the same problem of misalignment above. But if you leave the rule enabled, the audio synchronization in the app seems to work well. The only downside is that the space before the guillemet is actually part of the punctuation rules, so the quotes are uneven as seen below.

The open quotes have a space after, but the closing quotes do not have a space before:

But the audio sync seems to work well with the text. Of course, we could always write rules to remove all of the spaces (after opening guillemets and before closing guillemets), which would make the text a little more consistent. But then you won’t have the desired spacing on your guillemets.
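For reference, here is a small Python sketch of what those two space-removing rules would do to the text. The function names are hypothetical, and the patterns are my guess at equivalent rules, not the exact ones from the SAB Changes tab.

```python
import re

def tighten_closing(text: str) -> str:
    """Drop the space between . ! ? and a closing guillemet."""
    return re.sub(r"([.!?])\s»", r"\1»", text)

def tighten_both(text: str) -> str:
    """Also drop the space after an opening guillemet, for consistency."""
    return re.sub(r"«\s", "«", tighten_closing(text))

line = "Il dit : « Viens ! »"
print(tighten_closing(line))  # only the closing quote loses its space
print(tighten_both(line))     # both quotes lose their space
```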