Aeneas: Problem synchronizing text in Arabic script

RickBrown · October 28, 2022, 1:53am

Aeneas was used to produce a verse-by-verse timing file for text in Arabic script, using the converter for Persian. There are no special characters. Others have reported that this worked fine. But in this case, the highlight progresses more slowly than the voice, so in a short time the verses being spoken don’t even show on the screen. Worse yet, it stops on a verse for a while, then jumps forward a few verses, but never catches up with the voice. The voice itself is speaking slowly and carefully. Has anyone else encountered this? Does anyone know how to fix it or have a suggestion?

mcquayi · October 28, 2022, 5:15am

@RickBrown did the Fine Tuning work as expected?

It sounds a bit like a phone issue.

RickBrown · October 29, 2022, 12:11am

It would take an inordinate amount of time to adjust the timings for the whole Bible, especially when the adjustments needed are so great. I’m working with Dan Neville, who has lots of experience with SAB and Aeneas, and he doesn’t know why Aeneas is failing in this way. We could try using transliteration. The problem is that there is no suitable transliteration software for this. There are websites that claim to transliterate, but the only one that even comes close to working requires you to paste a small amount of text in a box, then copy the results. One would have to work paragraph by paragraph through the Bible, which is a major waste of time. I’ll see if one of the SIL Converters can do this. There is still the question, however, of how Aeneas will interpret the transliteration. Can it read non-English Latin digraphs like dh, zh, and kh?

mcquayi · October 29, 2022, 12:38pm

I did a transliteration once (I have forgotten the language) using the Changes section of the aeneas wizard. Actually I modified the .appdef file behind that aeneas changes.
My understanding is that you need to worry about consonants but not vowels. The wave form is compared and vowels don’t affect the wave form like consonants.

The other thing to worry about is the start time settings. If those are too big or small it can mess up things.

Dan_Neville · October 31, 2022, 3:11pm

Hi Ian,
What do you mean you modified the .appdef file “behind that aeneas changes”?
I am familiar with how to edit the appdef file but not what might need to be done if changes are made inside of SAB first. Thanks for expanding that thought a bit more.

mcquayi · October 31, 2022, 11:00pm

In a .appdef file you can find two sorts of changes. One type is “main” and the other “sync”.

For transliteration changes use the “sync” type.
This is an example of some minimal changes for syncing with aeneas.

<changes type="sync">
    <change>
      <find>\*</find>
      <replace></replace>
    </change>
    <change>
      <find>U+2013</find>
      <replace>-</replace>
    </change>
    <change>
      <find>U+2018</find>
      <replace></replace>
    </change>
    <change>
      <find>U+2019</find>
      <replace></replace>
    </change>
    <change>
      <find>U+201C</find>
      <replace></replace>
    </change>
    <change>
      <find>U+201D</find>
      <replace></replace>
    </change>
    <change>
      <find>n̄</find>
      <replace>n</replace>
    </change>
    <change>
      <find>N̄</find>
      <replace>N</replace>
    </change>
    <change>
      <find>m̀</find>
      <replace>m</replace>
    </change>
    <change>
      <find>ọ̀</find>
      <replace>o</replace>
    </change>
</changes>

I’d work out the transliteration in a Google Sheet. Just a simple two-column Find and Replace. Then export as Tab Separated Values.
I’d either run an XSLT transformation to expand it into the form above or use RegEx to do the job.
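
As an alternative to XSLT or RegEx, the expansion could also be scripted. The following is only a rough Python sketch, not something taken from SAB itself: it assumes the exported file has exactly two tab-separated columns (find, replace), and the function name `tsv_to_changes` is made up for illustration. The output would still need to be pasted into the .appdef file by hand.

```python
from xml.sax.saxutils import escape


def tsv_to_changes(tsv_text: str) -> str:
    """Expand two-column (find<TAB>replace) text into a sync changes block.

    Assumption: one find/replace pair per line, separated by a single tab.
    Lines with an empty find column are skipped.
    """
    lines = ['<changes type="sync">']
    for raw in tsv_text.splitlines():
        if not raw.strip():
            continue  # ignore blank lines
        find, _, replace = raw.partition("\t")
        if not find:
            continue  # nothing to search for
        lines.append("    <change>")
        # escape() protects &, < and > so the XML stays well-formed
        lines.append(f"      <find>{escape(find)}</find>")
        lines.append(f"      <replace>{escape(replace)}</replace>")
        lines.append("    </change>")
    lines.append("</changes>")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "خ\tkh\nش\tsh\n"
    print(tsv_to_changes(sample))
```

A missing second column simply produces an empty `<replace></replace>`, which matches the "delete this character" entries in the example above.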

RickBrown · November 22, 2022, 1:55pm

Thanks Ian. In the end this was very simple to solve. In the menu for the Paratext project for this language, I opened “Characters Inventory.” This presented a table with a column for each Arabic character or combination of characters, the corresponding Unicode value(s), its validation status, and its count. I copied that and pasted it into a Word file. It had 532 rows, but I deleted rows with multiple characters in them, leaving 61 rows. I deleted the fourth column and the contents of the third column, then in the third column I put the Latin character closest in sound to the Arabic character. Then I deleted the column with the Arabic characters. That left just two columns: the Unicode and the Latin transcription. I then converted the table to text and specified the dollar sign as the symbol to separate the Unicode from its Latin counterpart. I then did a search and replace on the paragraph mark (^p), replacing it with
</replace>^p </change>^p <change>^p <find>U+
Then I replaced the dollar sign with
</find>^p <replace>
The result for each character looked like this one:
<change>
<find>U+0626</find>
<replace>i</replace>
</change>
I saved the file as a text file and sent it to the technician, Dan, to insert into the .appdef file. The resulting timing files were perfectly synchronized, with no need for fine tuning. Thanks, Ian, for your helpful guidance.
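
For anyone who would rather script the Word search-and-replace steps described above, here is a rough Python sketch of the same idea. It assumes each input line holds a Unicode code value and its Latin counterpart separated by a dollar sign (for example `0626$i`), as in the converted table. The function name `rows_to_changes` is hypothetical, and the result would still need to be inserted into the .appdef file manually.

```python
def rows_to_changes(text: str, sep: str = "$") -> str:
    """Turn lines like '0626$i' into <change> blocks.

    Assumption: one code value per line; an optional 'U+' prefix is accepted.
    Lines without the separator are skipped.
    """
    blocks = []
    for line in text.splitlines():
        line = line.strip()
        if not line or sep not in line:
            continue
        code, latin = line.split(sep, 1)
        code = code.strip().upper()
        if code.startswith("U+"):
            code = code[2:]  # normalise so the prefix is added only once
        blocks.append(
            "<change>\n"
            f"<find>U+{code}</find>\n"
            f"<replace>{latin.strip()}</replace>\n"
            "</change>"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(rows_to_changes("0626$i\n062E$kh"))
```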