Audio Sync with RTL Scripts (Arabic)

NeilZubot · September 26, 2018, 10:14am

Has anyone had success getting Arabic script, or any other RTL script, to sync well with audio? I’ve tried various eSpeak voices and character replacements, but have not yet had any success. The SAB documentation states that one person had success with Hebrew by transliterating the Hebrew characters into Roman A-Z characters. I tried this with Arabic, but it didn’t work well for me. I even entered replacements for all of the different forms of the Arabic characters (Initial, Medial, Final, and Alone).

I’m wondering whether the SAB character replacement interface can’t handle a large number of character replacements. After entering about 100 character replacements and generating the timing files, I went back into the character replacement interface and found that most of the replacements I had entered were no longer there. *SAB feature request: it would be great if SAB offered the option of saving character replacements as a preset.

Anyway, if anyone has any insight here, I would love to hear from you. Thanks!

mcquayi · September 26, 2018, 4:27pm

Re Changes as a Preset.
There is already a feature request for this here: Suggestions for Changes Gallery

Cast your vote there. I won’t raise this to a Feature Request.

When you transliterated the Arabic into Latin script, what did you use? What language did you use with the transliteration? As best I understand it, the consonants are important while the vowels are much less so; that is, consonants affect the waveform more than vowels do.

Dan_Em · September 26, 2018, 6:54pm

Yes, I’ve done this and it works well. I wouldn’t use SAB character replacement to achieve this, though. Instead, make a new Paratext project that is a romanization of the RTL script (project type: automatic transliteration using SIL Converters). Then set up an SAB app on this romanized project. Once you’ve generated your timing files, you can copy them over to the RTL project. Whether you prefer to do the fine-tuning in the romanized or the RTL project is up to you.

I actually found that I could get volunteers who can only read English to fine-tune the timing files for me. Here’s how: http://tiny.cc/audiotimings

BTW, my SIL Converters process was a compound converter that daisy-chained a few steps together to make the text readable (e.g. reversing quote marks and question marks) and more phonemic (e.g. dropping unpronounced syllables). That made it suitable both for English readers and for the Esperanto-based timing file generation.
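A compound converter applies several simple conversions in sequence, each feeding the next. The Python sketch below is only an analogy for that idea: it is not SIL Converters’ own API or configuration format, and every table in it is hypothetical and deliberately incomplete (a real romanization would cover the whole alphabet and the language’s actual phonology).

```python
import unicodedata
from collections.abc import Callable, Iterable

# Step 1 (hypothetical table): Arabic-script punctuation -> Latin punctuation,
# so English readers and the eSpeak voice see familiar marks.
PUNCTUATION = str.maketrans({
    "\u061F": "?",  # ARABIC QUESTION MARK
    "\u060C": ",",  # ARABIC COMMA
    "\u061B": ";",  # ARABIC SEMICOLON
    "\u00AB": '"',  # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
    "\u00BB": '"',  # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
})

# Step 2 (hypothetical table): delete marks assumed to be silent.
SILENT = str.maketrans("", "", "\u0652\u0640")  # SUKUN, TATWEEL

# Step 3 (hypothetical, deliberately incomplete): letter-by-letter romanization.
LETTERS = str.maketrans({
    "\u0627": "a",  # ALEF
    "\u0628": "b",  # BEH
    "\u062A": "t",  # TEH
    "\u0633": "s",  # SEEN
    "\u0644": "l",  # LAM
    "\u0645": "m",  # MEEM
    "\u0646": "n",  # NOON
    "\u064E": "a",  # FATHA
    "\u064F": "u",  # DAMMA
    "\u0650": "i",  # KASRA
})

def compound_convert(text: str, steps: Iterable[Callable[[str], str]]) -> str:
    """Apply each conversion step in order, passing each output to the next."""
    for step in steps:
        text = step(text)
    return text

STEPS = [
    lambda s: unicodedata.normalize("NFC", s),  # canonical composition first
    lambda s: s.translate(PUNCTUATION),
    lambda s: s.translate(SILENT),
    lambda s: s.translate(LETTERS),
]

if __name__ == "__main__":
    # seen, fatha, lam, fatha, alef, meem, Arabic question mark
    print(compound_convert("\u0633\u064E\u0644\u064E\u0627\u0645\u061F", STEPS))  # salaam?
```

Keeping each step as its own small table makes it easier to test and adjust one step at a time without disturbing the others.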

NeilZubot · September 27, 2018, 9:21am

Thank you, @mcquayi, for responding. In transliterating, I went systematically through the whole Arabic alphabet (all consonants and vowels) in the four forms they can appear in: Initial, Medial, Final, and Alone.
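In most Unicode Arabic text, each letter is stored once as its base code point, and the Initial, Medial, Final, and isolated shapes are produced by the font at display time; the separate presentation-form code points (the Arabic Presentation Forms-A and -B blocks) exist mainly for compatibility. If that holds for a given text, replacements keyed on the shaped forms would never match, and one replacement per base letter may be enough. A quick check using only Python’s standard `unicodedata` module (note that NFKC also folds other compatibility characters, such as ligatures, so try it on a copy):

```python
import unicodedata

def presentation_forms(text: str) -> list[str]:
    """Describe any characters from the Arabic Presentation Forms-A
    (U+FB50-U+FDFF) or Presentation Forms-B (U+FE70-U+FEFF) blocks."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
        if 0xFB50 <= ord(ch) <= 0xFDFF or 0xFE70 <= ord(ch) <= 0xFEFF
    ]

def fold_presentation_forms(text: str) -> str:
    """Replace contextual shapes with their base letters via NFKC normalization."""
    return unicodedata.normalize("NFKC", text)

if __name__ == "__main__":
    shaped = "\uFE8F\uFE90\uFE91\uFE92"  # BEH: isolated, final, initial, medial
    print(presentation_forms(shaped))
    print(fold_presentation_forms(shaped) == "\u0628" * 4)  # True: all become U+0628 BEH
```

If `presentation_forms()` returns an empty list for the project text, the text is probably stored as base letters only.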

I skimmed through Suggestions for Changes Gallery and didn’t find a feature request for saving character replacements within the Aeneas sync wizard.

NeilZubot · September 27, 2018, 9:21am

Very clever @Dan_Em! Thanks for sharing. I’ll try it out and let you know how it goes.

Thanks also for posting the link to the Google Doc you made for volunteers to help fine-tune the syncing of your project. It’s really nicely laid out, and a great way to get the church involved and connected to the work.

mcquayi · September 27, 2018, 3:48pm

I was thinking of the SAB changes section. Now I understand.

NeilZubot · September 28, 2018, 12:38pm

It worked! @Dan_Em, thanks so much for this tip! The sync has worked very well with my Cameroonian language (I used Esperanto as the eSpeak voice). Only the odd place here and there seems to need some fine-tuning.

After creating the timing files with the romanized version of the text, I imported those files into the Arabic-script app project. Initially it didn’t work, but after some thought and investigation, I realized that I needed to customize the phrase-ending characters (.?!:;,) to work with Arabic script. I entered the following Unicode characters (Arabic punctuation plus the ASCII colon and exclamation mark) under Books/Main Collection/Audio Synchronization/Use Book Collection Setting:

\u061E, \u061F, \u003A, \u0021, \u061B, \u060C

This worked. Thanks so much!
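Why the phrase-ending characters matter can be sketched like this: if the Arabic-script project splits a passage into a different number of phrases than the romanized project did, timings copied across presumably no longer line up phrase for phrase. The Python illustration below assumes a simple "split after any phrase-ending character" rule; SAB’s real segmentation may apply more rules than this.

```python
import re

# Default Latin phrase-ending characters mentioned above.
LATIN_ENDINGS = ".?!:;,"
# Characters from the post: ARABIC TRIPLE DOT PUNCTUATION MARK, ARABIC QUESTION
# MARK, COLON, EXCLAMATION MARK, ARABIC SEMICOLON, ARABIC COMMA.
ARABIC_ENDINGS = "\u061E\u061F\u003A\u0021\u061B\u060C"

def split_phrases(text: str, endings: str) -> list[str]:
    """Split text just after each phrase-ending character.

    A simplified stand-in for a sync tool's phrase segmentation,
    not SAB's actual algorithm.
    """
    after_ending = "(?<=[" + re.escape(endings) + "])"
    return [part.strip() for part in re.split(after_ending, text) if part.strip()]

if __name__ == "__main__":
    romanized = "marhaba, kayf haalak?"
    arabic = "\u0645\u0631\u062D\u0628\u0627\u060C \u0643\u064A\u0641 \u062D\u0627\u0644\u0643\u061F"
    print(len(split_phrases(romanized, LATIN_ENDINGS)))  # 2
    print(len(split_phrases(arabic, LATIN_ENDINGS)))     # 1: Arabic comma and question mark not recognized
    print(len(split_phrases(arabic, ARABIC_ENDINGS)))    # 2: matches the romanized project
```

With only the Latin set, the Arabic sentence stays one phrase while its romanization becomes two, which is the kind of mismatch the customized setting avoids.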

ChrisHubbard · September 28, 2018, 1:26pm

@mcquayi, @richard: We should probably get this into the documentation as a strategy for RTL projects.

jeff_heath · October 2, 2018, 12:02pm

It’s been a while since I’ve done audio synchronization of an Arabic-script text, but I seem to recall that it worked fairly well just by selecting Persian (fa) as the Aeneas text-to-speech voice. Now, this was for a language that is actually a variety of Arabic, so it doesn’t have any special characters that wouldn’t be in Persian. But I would guess that you could add Character Replacements to convert special characters to standard equivalents, if you have any. E.g., if you have “ny” written as \u0767 ARABIC LETTER NOON WITH TWO DOTS BELOW, you could change that to a NOON and YEH, or maybe to just one of those characters; do a couple of tests to see which matches better.
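The "do a couple of tests" step can be prepared mechanically: produce one copy of a test passage per candidate replacement, then sync each copy and compare. In SAB the replacement itself is entered in the Character Replacements interface; this Python sketch only previews the text each option would produce. The candidate set is an assumption, including U+06CC ARABIC LETTER FARSI YEH, which Persian text normally uses in place of U+064A ARABIC LETTER YEH (whether the fa voice treats the two differently is untested here).

```python
# Code points involved (names from the Unicode character database).
NOON = "\u0646"                 # ARABIC LETTER NOON
YEH = "\u064A"                  # ARABIC LETTER YEH
FARSI_YEH = "\u06CC"            # ARABIC LETTER FARSI YEH (usual Persian yeh)
NOON_TWO_DOTS_BELOW = "\u0767"  # ARABIC LETTER NOON WITH TWO DOTS BELOW ("ny")

# Hypothetical candidate replacements to compare in test syncs.
CANDIDATES = {
    "noon+yeh": {NOON_TWO_DOTS_BELOW: NOON + YEH},
    "noon+farsi yeh": {NOON_TWO_DOTS_BELOW: NOON + FARSI_YEH},
    "noon": {NOON_TWO_DOTS_BELOW: NOON},
    "yeh": {NOON_TWO_DOTS_BELOW: YEH},
}

def variants(text: str) -> dict[str, str]:
    """Return one rewritten copy of text per candidate replacement set."""
    return {
        name: text.translate(str.maketrans(table))
        for name, table in CANDIDATES.items()
    }

if __name__ == "__main__":
    sample = "\u0627\u0767\u0627"  # alef, noon with two dots below, alef
    for name, result in variants(sample).items():
        print(f"{name:15} {' '.join(f'U+{ord(c):04X}' for c in result)}")
```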

NeilZubot · October 2, 2018, 1:05pm

Thanks, Jeff, for the suggestion. So far I’ve found that the romanization approach described above works very well. If I come across any chapters where the sync doesn’t work as well as I’d like, I’ll certainly try using Persian.

It’s good that you posted this here, for the sake of anyone in the future who needs help with this sync issue and comes across this thread.

Corey_Garrett · April 17, 2019, 4:09pm

+1, Persian helped me with a difficult file. (Normally just using the romanized file for the same passage works, but sometimes it does not.) Thanks!

aabraham · July 1, 2019, 2:51pm

You can download a synced Arabic SAB app here: https://apk.fcbh.org/Arabic_VDV

Using Farsi as a base works.