Timing files problems, possibly with guillemets

DavidW · April 16, 2021, 11:04am

Hi. I’m having problems with synchronising text and audio. I ran the aeneas sync and then right-clicked on the file and chose Export to HTML to preview the timings. The sync starts fine, but once it gets to closing quotation marks (French guillemets with NBSP) it starts having problems. It seems to treat just the guillemet as the next phrase, so the phrases are one step behind. The next guillemet it comes to causes it to fall a further step behind, and so on. There is no such problem with the fine-tune timings, which I can get to work perfectly, but it doesn’t work with the HTML option or when I build an app. I have tried various things, including replacing NBSP with a standard space in the original text document and changing guillemets to double quotes in the aeneas wizard box. I’ve also tried manually deleting them from the aeneas.txt file that is created, though I’m not sure how to get SAB to “reload” that file and be sure it is seeing the new changes. Thanks for any help.
David

jeff_heath · April 16, 2021, 11:55am

Hi David, If I’m understanding you correctly, the timing gets behind at guillemets in the automatic synchronization, but then you can use fine-tune timings and correct those problems so that the timings are correct? But then I don’t understand why you say it “doesn’t work” with HTML or when you build the app. Do you mean that your fine-tuned timings are lost? There are some issues around that - do a search in this forum to find out how to fix that. (Choice of browsers is one possibility…)

But I primarily want to address the problems in the automatic synchronization. In most of my aeneas configurations, I change guillemets to nothing (blank), and that seems to work for me. Can you give that a try and see if it helps?

DavidW · April 16, 2021, 12:00pm

Hi Jeff, good to hear from you. I just tried replacing the guillemets with nothing, and get the same problem. Browser-wise it is set to use Edge, so that seems to work ok.

jeff_heath · April 16, 2021, 12:03pm

So in the synchronization it attaches an entire phrase onto the guillemet?

DavidW · April 16, 2021, 12:04pm

yes, the guillemet and the space/NBSP before it.

DavidW · April 16, 2021, 12:20pm

It’s possible it’s linked to the \b and \m markers that I’m using (it comes from a Bible module that has a lot of commentary, as well as citations).
I’m not good at reading the HTML, but it looks a bit odd (I don’t know if it will paste correctly here):

<div class="p_m-Paragraph-Margin-NoFirstLineIndent"><div id="Tj" class="txs">Súumolisuum ekaan jaat emisiyoŋ yan guroge mee «&nbsp;Gagur gara mútiliŋ.</div><div id="Tk" class="txs">&nbsp;»</div></div>

Bala útiiŋaal n'gawook gan Atúuha awook mee Ibrahima,

<div id="Tm" class="txs"

jeff_heath · April 16, 2021, 2:14pm

I was expecting to see SFM markers. So the text is all in HTML/XML, like the sample you included? Maybe you could attach a larger portion, to give more context (or send it to me on email)? So here’s the line you sent, but just formatted a bit:

<div class="p_m-Paragraph-Margin-NoFirstLineIndent">
  <div id="Tj" class="txs">Súumolisuum ekaan jaat emisiyoŋ yan guroge mee
  «&nbsp;Gagur gara mútiliŋ.</div>
  <div id="Tk" class="txs">&nbsp;»</div>
</div>

Do you see how the non-breaking space and end guillemet are in a completely separate marker? My guess is that this is the problem. SAB finds a division, and apparently wants to try to attach some audio to it, even though it is just a space and a quote mark, since it’s pulled out of context.

Can you do a test where you incorporate those characters into the previous division, like this?

<div class="p_m-Paragraph-Margin-NoFirstLineIndent">
  <div id="Tj" class="txs">Súumolisuum ekaan jaat emisiyoŋ yan guroge mee
  «&nbsp;Gagur gara mútiliŋ.&nbsp;»</div>
</div>

My guess is that SAB will no longer try to attach audio to it. You can try it in just one place, and if that works, you can see about fixing all of them. Some carefully chosen regular expressions could probably do the trick. If you need help with that, we would definitely need to see more context, to see what to match and remove. (E.g. Are the ids unique for each line? In what format?)
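As a rough illustration of that kind of regular-expression fix, here is a minimal Python sketch. It assumes the stray division always has the shape seen in the sample above: a `txs` div containing nothing but an optional space or `&nbsp;` and a closing guillemet, directly after the previous div closes. The function name `merge_orphan_guillemets` is made up for this example, and the sketch does not renumber the ids that follow the removed one.

```python
import re

# A "txs" div holding only optional spaces/&nbsp; plus a closing guillemet,
# sitting right after the previous division's closing tag.
# Note: in Python, \s also matches a literal U+00A0 non-breaking space.
ORPHAN = re.compile(
    r'</div>\s*<div id="[^"]*" class="txs">((?:&nbsp;|\s)*»)</div>'
)

def merge_orphan_guillemets(html: str) -> str:
    """Move an orphaned closing guillemet back into the preceding div."""
    return ORPHAN.sub(r'\1</div>', html)

sample = (
    '<div class="p_m-Paragraph-Margin-NoFirstLineIndent">'
    '<div id="Tj" class="txs">Súumolisuum ekaan jaat emisiyoŋ yan guroge mee '
    '«&nbsp;Gagur gara mútiliŋ.</div>'
    '<div id="Tk" class="txs">&nbsp;»</div></div>'
)

merged = merge_orphan_guillemets(sample)
print(merged)
```

Because the ids are left untouched, this is only useful for testing whether merging the divisions changes the behaviour; it is not a complete repair of the exported file.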

DavidW · April 16, 2021, 2:17pm

Ok, I think I’m getting close to finding the problem, but not yet a solution. If I manually edit the HTML file, I can remove the erroneous </div><div id="Tk" class="txs"> that appears before the closing guillemet. I can then manually edit the following marker, which is TL, and make it Tk. If I save that and try it, then it works. But then all subsequent timing markers are out by one, it doesn’t work for the next missing marker, and I don’t want to have to edit them all manually! If I go back into SAB and export as HTML, it undoes all those changes…

jeff_heath · April 16, 2021, 2:29pm

So the id="Tk", etc. are timing markers? When were those inserted in the text? Not until after the synchronization, I imagine… Where exactly are you getting this HTML file from? Did you not have some other initial file of the text that you were starting from?

DavidW · April 16, 2021, 3:06pm

I sent an email to @jeff_heath, but we haven’t found a solution. He has confirmed the same problems with getting the highlighting to work with HTML export and app building. Fine-tune timings works fine, but that doesn’t help with getting an app as the output.

Here’s some of the first few lines of the source file that I’ve had most success with from a .Doc version (SFM from Bible modules seem to be even more complicated).

\id B001

\toc2 18: Waa ucile Atúuha

\c 1
\b
\p_s-Heading-SectionLevel1 18: Waa ucile Atúuha nawook Ibrahima
\p_m-Paragraph-Margin-NoFirstLineIndent Gupaalom gammee ngajantenom, ilobe m’buguul n’gajow gara Atúuha acil mee gásuumaay n’epak. Ho hacamen mee mbil anooan : námir, ban nasoŋ, nalagen gagur gara mútiliŋ gan aliigolaal mee, nabaj gásuumaay bu nanoonan.
\b
\p_m-Paragraph-Margin-NoFirstLineIndent Súumolisuum ekaan jaat emisiyoŋ yan guroge mee « Gagur gara mútiliŋ. »
\b
\p_m-Paragraph-Margin-NoFirstLineIndent Bala útiiŋaal n’gawook gan Atúuha awook mee Ibrahima, fuŋesolaal bu gabaañen fo be émir Gagur gara mútiliŋ. Kábiriŋ nan Atúuha átuut mee émit n’etaam, káŋ bu m’buroŋ Ibrahima, baje Adama bugo n’Hawa

Thanks,
David

jeff_heath · April 16, 2021, 7:51pm

Just a thought… Have you tried building the app with RAB? Since there aren’t any verse numbers in your text, I wonder if it might be more successful.

DavidW · April 16, 2021, 7:58pm

Thanks, yes I tried RAB and it has the same problem, I guess it’s the same underlying code for processing that.

DavidW · April 21, 2021, 8:47am

Anyone else with any suggestions? Do I need to send this as a bug? The highlighting of phrases in our text is consistently not working, to the extent of rendering that feature useless.

DavidW · April 23, 2021, 8:27am

An update. I’ve made some more progress with this. The issue seems to be with the closing guillemets at the end of a paragraph. By removing the guillemet, or replacing it with a double quote the problem is solved. However I need there to be a guillemet there! I tried editing it to a double quote in the source text, and using the “changes” feature within SAB to alter it to a closing guillemet, but then the same problem re-occurs. I created a simple dummy Word document with a few lines and then a paragraph ending with guillemet, recorded it, ran it through aeneas and I’m getting the same problem.

CraigN · April 23, 2021, 11:22am

I’m not sure this is of any use, but our Kurdish text also uses guillemets for quotes, often finishing a paragraph and this has not caused a problem with audio syncing. Is this an example of RTL text NOT causing a problem?
The only difference I can think of is that with the guillemet at the end of a quote, RTL would have the opposite one.

DavidW · April 23, 2021, 2:04pm

Thanks, that got me thinking, and I realised that the issue is that the sentence-final punctuation (full stop/period) has to go inside the guillemets. Aeneas is set up to consider a period as a “punctuation character marking the end of a phrase”. Since a period marks the end of a phrase, logically a new phrase must start after it, even if it’s only a space followed by a closing guillemet. And that puts everything out of sequence. SAB somehow needs to be able to consider the whole sequence “period-space-guillemet” as marking the end of a phrase, not just the “period”.
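To make that concrete, here is a small Python sketch of the two splitting rules. It is only an illustration of the logic described above, not the actual code used by aeneas or SAB, and the regular expressions are my own assumptions about what “end of phrase” would need to mean.

```python
import re

# The paragraph from the source text, with NBSP (U+00A0) inside the guillemets.
text = "Súumolisuum ekaan jaat emisiyoŋ yan guroge mee «\u00a0Gagur gara mútiliŋ.\u00a0»"

# Naive rule: a phrase ends immediately after a period.
# Whatever follows the period (here NBSP + ») becomes a phrase of its own.
naive = [p.strip() for p in re.findall(r'[^.]+\.?', text)]

# Adjusted rule: a period optionally followed by spaces and a closing
# guillemet is treated as one phrase ending.
adjusted = [p.strip() for p in re.findall(r'[^.]+(?:\.(?:\s*»)?)?', text)]

print(len(naive), naive[-1])   # the orphaned guillemet is a separate "phrase"
print(len(adjusted))           # the guillemet stays with its sentence
```

With the naive rule there is one more phrase than there is audio to match, which is the off-by-one drift seen in the highlighting.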

CraigN · April 23, 2021, 2:22pm

Out of curiosity, why do you have a space between the full stop and the guillemet (and at the beginning)? Why not:
«Gagur gara mútiliŋ.» ?
Is it just the way they do it? In Kurdish there is no space; I think that explains why I have not experienced a similar problem.
Gen 1:3-5 has a few examples:
٣ خودا فەرمووی «با ڕووناکی ببێت.» ئینجا ڕووناکی بوو .٤ خوداش بینی کە ڕووناکییەکە باش بوو ئیتر خودا ڕووناکییەکەی لە تاریکی جیا
کردەوە. ٥ خودا ڕووناکییەکەی ناو نا «ڕۆژ» و تاریکییەکەی ناو نا «شەو.» ئێوارە بوو، بووە بەیانی ئەمە ڕۆژی یەکەم بوو.

CraigN · April 23, 2021, 2:29pm

Could you just find and replace (with SAB’s Changes feature) ‘« ’ with ‘«’ and ‘ »’ with ‘»’ before syncing with aeneas, and then just untick those changes afterwards?
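Outside SAB, the same substitution could be tried on a copy of the text with a few lines of Python. This is only a sketch of the replacement itself, not SAB’s Changes syntax. The helper name `tighten_guillemets` is made up, and it assumes the space may be an ordinary space, an NBSP (U+00A0) or a narrow NBSP (U+202F).

```python
import re

# Ordinary space, no-break space, narrow no-break space.
SPACES = " \u00a0\u202f"

def tighten_guillemets(text: str) -> str:
    """Remove the space just inside opening and closing guillemets."""
    text = re.sub(f"«[{SPACES}]+", "«", text)
    text = re.sub(f"[{SPACES}]+»", "»", text)
    return text

line = "yan guroge mee «\u00a0Gagur gara mútiliŋ.\u00a0»"
print(tighten_guillemets(line))
```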

DavidW · April 23, 2021, 2:52pm

That’s always the way I’ve been taught that it should be done according to French orthography rules.

Good suggestion for the SAB changes, I hadn’t thought of selecting them then unselecting them, but I can’t get it to work unfortunately.

CraigN · April 23, 2021, 3:56pm

Don’t know if it will help but I changed some square brackets to ornate brackets in our app and it took a lot of trial and error, you can see a whole thread about it here: