Interlinear with 2 text directions

I’m trying to format an interlinear project. The base text is Greek with Urdu glosses. The overall direction for the layout will be left-to-right, with the Urdu glosses intended to run fully right-to-left below the Greek words. Some of the Urdu glosses include spaces.

I have set up a basic layout successfully. Urdu (Arabic script) text flows right-to-left as expected. However, I cannot find a means to force full right-to-left flow for all text in the glosses. When a space occurs, the layout appears to revert to left-to-right for the duration of the space. Urdu text following the space flows right-to-left again. The result is 2 or 3 Urdu right-to-left strings in an overall left-to-right layout.

I wondered if I could force the directionality of the gloss|rb style somehow, but I do not find an option for this in the style’s font configuration.

Then I thought I might be able to add right-to-left marks (U+200F) before all spaces (or perhaps any other character needing this treatment) in changes.txt, like:

in "\|.*?\\rb\*": " " > "\u200f "

This does not have any effect.

I know that the expression above applies a change in the right location – because if I output an ‘_’ (underscore) in place of the space, then the Urdu text layout is fine (except for the underscore!). I’ve tried using alternative spacing characters, but they also revert left-to-right (unsurprisingly).
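For anyone who wants to check what a scoped rule like this touches before running a full typeset, here is a minimal Python sketch. It assumes that a changes.txt rule of the form `in "scope": "find" > "replace"` applies the replacement only inside text matched by the scope regex. The sample line and the `gloss="..."` attribute are hypothetical, made up for illustration, and the function name is my own.

```python
import re

# Scope regex copied from the changes.txt rule: from "|" to the closing \rb*
SCOPE = re.compile(r"\|.*?\\rb\*")

def scoped_replace(text, find, repl):
    """Apply find -> repl only inside spans matched by SCOPE.

    Assumption: this mirrors how an `in "scope": ...` rule is applied.
    """
    return SCOPE.sub(lambda m: re.sub(find, repl, m.group(0)), text)

# Hypothetical sample line; the attribute name is an illustration only.
sample = 'ἐν ἀρχῇ \\rb λόγος|gloss="کلام خدا"\\rb* ἦν'

with_underscore = scoped_replace(sample, " ", "_")
with_rlm = scoped_replace(sample, " ", "\u200f ")

print(with_underscore)
print([hex(ord(c)) for c in with_rlm if c == "\u200f"])
```

Only the space inside the gloss changes; the spaces between the Greek words are left alone. That matches what I see with the underscore test, so the rule is firing in the right place even when the RTL mark has no visible effect.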

image

The strings above should flow like this (using the underscore as a space):

image

Would there be any direction from anyone else who has tried this?

Jeff

Hi Jeff,

I haven’t had to do this myself, but thought of a couple more things to try…

Did you try ALL of the alternative spacing characters, including non-breaking spaces? Don’t forget ‘NARROW NO-BREAK SPACE’ (U+202F).

Did you try replacing the space with a zero-width non-joiner? (U+200C) It won’t give you exactly what you want, but it may be closer to what you want, or part of a solution.

I’ve done a number of jobs with RTL marks, and even so, I’m often surprised when I try to apply them that I sometimes don’t get them in the right locations. (Like maybe I don’t REALLY understand the rules…) So I would recommend trying all of the other combinations - in your change above, put the \u200f after the space, put it both before and after the space, insert one right after the \rb, before the end of the field, etc. It may be that one of the combinations will do what you want. (And you don’t have to understand it if it gives the desired result!)
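To keep track of the combinations, a small Python sketch like the following could generate one candidate changes.txt rule per placement. The rule format is copied from the example earlier in the thread; the variant names and the helper function are just labels I made up.

```python
# Escaped RLM exactly as it would be typed in changes.txt
RLM = "\\u200f"
# Scope regex from the original rule, as it appears in changes.txt
SCOPE = r"\|.*?\\rb\*"

def rule(replacement):
    """Build one changes.txt line that replaces a space inside the gloss."""
    return f'in "{SCOPE}": " " > "{replacement}"'

# Hypothetical labels for the placements worth trying one at a time
variants = {
    "before": RLM + " ",
    "after": " " + RLM,
    "both": RLM + " " + RLM,
}

rules = {name: rule(rep) for name, rep in variants.items()}

for name, line in rules.items():
    print(f"# {name}")
    print(line)
```

Paste one rule at a time into changes.txt and compare the output, so you know which placement produced which result.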

Good luck…

This is probably the wrong way to approach this, but when we were dealing with something like what you describe, I proposed using a glyph that isn’t in the Arabic orthography but is still defined as RTL, instead of trying to solve the proper sequence of directional announcement glyphs.

That is, instead of a space \u0020 between the Arabic words within each gloss, use something with a similar width but defined to be RTL in Unicode, like a Hebrew paseq \u05c0. The issue then becomes how to hide the Hebrew punctuation, which is solvable by painting it clear.

However, in the instance I’m referring to, the cause had to do with digits within Arabic text, which we were marking as “Italian” language during conversion because of some font issue. This “Italian for figures” was a global setting that went with the font we used, not specifically for Arabic typesets. This change in language created issues with spacing and punctuation in proximity. The solution in that case eventually turned out to be reclassifying the digits to another language (but still not Arabic, if I remember right). I mention this to suggest that ensuring the languages are set properly in the text might have an impact on the arrangement of the glyphs without having to resort to more exotic fixes like the one I propose. That is, the composer seems to be interpreting the spaces as LTR, which means it thinks they are Greek, not Arabic. If you make sure the character style used for the gloss text defines all characters as Arabic language, that might be all you need to do.

My first thought was in line with one of Jeff’s suggestions: my guess is that the U+200F marker needs to come right before the word, meaning after the space, not before.

I’m assuming that the text direction in the \rb section is simply set by the overall text direction of the main language, and that to override that will take the development team reprogramming things.

One last suggestion is to try to use something like an underscore character but style it with a color that is the same as the background. So maybe something like:
in "\|.*?\\rb\*": " " > "\\sc _\\sc\*"
and then style Small Caps with that color. Basically it can be any character style that you’re not otherwise using.

Thanks for these ideas everyone.

Yes - I did try other spacing characters, which do result in different sizes of space, but still flip direction. I tried adding the RTL mark on either side of the space or both (and for the sake of trying everything – before any string at all).

Jeff - I did try the zero width non-joiner, and that works OK! (but leaves no space). I suppose I could make that character active and then define it as some amount of glue (space)? My TeX is very very rusty and I’m not sure how to go about this. I tried:

\catcode`\‌=\active
\def‌{\bgroup\hskip 0.25em\egroup}

(There is a U+200C in there after \catcode`\ and after \def.)

This works - but it gives the same result as a space (which is not surprising I guess, since it is now an amount of space).

I also tried:

\catcode`\‌=\active
\def‌{\bgroup\beginR\hskip 0.25em\egroup}

which is a different result, but still not correct – and I know I’m hunting around here at this point.

Malachi - I tried your suggestion, but the result is a TeX error, which I won’t paste here. I think the issue there is that it is inserting USFM markup into the attribute section of \rb ...|...\rb*

Since I can’t add markup in there, I’d like to try coloring a right-to-left character in place of the space (Michael’s suggestion), but I don’t really know what I’m doing. I tried the following with Paseq inserted in place of spaces:

\catcode`\׀=\active
\def׀{\bgroup\color{white}׀\egroup}

but I get a complaint about ! Argument of \color has an extra }

I don’t know how to use \color, whether it is even available like this, or whether this is a syntax problem. I’m aware that I’m searching in the dark at this point.

Without the color, the Paseq itself actually works fine! I wish there were a space character with the Right-To-Left property set.
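Out of curiosity, the Unicode side of this can be checked with a short Python sketch using the standard `unicodedata` module. It scans every space separator (general category Zs) for a strong right-to-left bidi class (R or AL), and looks up the class of the characters tried in this thread. It only inspects Unicode properties as shipped with Python; it says nothing about how XeTeX or PTXprint actually orders the words, which is an entirely separate question.

```python
import sys
import unicodedata

def rtl_space_separators():
    """Return every Zs (space separator) character whose bidi class is R or AL."""
    hits = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if (unicodedata.category(ch) == "Zs"
                and unicodedata.bidirectional(ch) in ("R", "AL")):
            hits.append(ch)
    return hits

# Characters tried in this thread (the dictionary keys are just labels)
candidates = {
    "SPACE": "\u0020",
    "NO-BREAK SPACE": "\u00a0",
    "NARROW NO-BREAK SPACE": "\u202f",
    "ZERO WIDTH NON-JOINER": "\u200c",
    "RIGHT-TO-LEFT MARK": "\u200f",
    "HEBREW PASEQ": "\u05c0",
    "UNDERSCORE": "\u005f",
}

bidi = {name: unicodedata.bidirectional(ch) for name, ch in candidates.items()}

print("Zs characters with strong RTL class:", rtl_space_separators())
for name, cls in bidi.items():
    print(f"{name}: {cls}")
```

The scan comes back empty, so no space separator carries a strong RTL class. Of the characters tried, only the RTL mark and the Paseq are class R; the spaces are WS or CS, the non-joiner is BN, and the underscore is ON.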

Thanks for the replies and ideas.

Jeff

Before, I implied we were seeking a “directional announcement” solution before we focused on language as the solution to getting commas to appear ‘AFTER’ the right word when digits were close (specifically in verse ranges).

LTR and RTL override glyphs are supplied to force composers to interpret following glyphs one way or another. But they affect the display anywhere you use them (or may or may not affect…), so it gets really fun working out where they are in the text stream. And in the end they still didn’t do what was on the label in InDesign CS6 with a couple of complex composition plugins in play, and a font with known OpenType ‘features’ that caused issues with directionality.

But if you have the ability to feed PTXprint regex Unicode \u escapes, or HTML entity &…; nomenclature, instead of the actual glyph, Unicode directional override glyphs may work. They may work in this case even if you deal with the glyph itself, but again, these glyphs mess with the display order of following glyphs pretty much anywhere they appear, so interpreting what you actually see is tricky, and quickly leads to hex editor analysis to recover your file and sanity.

Sounds like we need a bit of help here. I’ve added support for start and end hooks to the interlinear handling code. So, in theory (all untested) you should be able to add a line like:

\sethook{start}{gloss|rb}{\beginR}

and all should be good.

I’ve just committed the change, so it should be in the next release, 2.2.46 or later. Whether we need some GUI for this is open to a request from the community. It’s yet another checkbutton. Do we want ‘yet another checkbutton’?

Thanks for adding help, Martin!

I installed 2.2.46 this afternoon, and added \sethook{start}{gloss|rb}{\beginR} to ptxprint-mods.tex. Sadly - there seems to be no effect. The output is the same as before - including trying with U+200F or U+202D added before spaces.

I’ve just tried this again after updating to 2.2.50, and correct RTL rendering is working now in the Urdu glosses. Wonderful!