Suggestions for Changes Gallery

How helpful that there’s a gallery from which to select pre-made regex patterns!

It was suggested in the Asia LT meeting with Ian McQuay just now that users should suggest useful Change regex rules for inclusion in the gallery.

For those who have adopted the convention of repurposing the \fig marker’s obsolete loc field to tag the output locations (print/app/web) for the figure, the following change rule will filter out figures that are tagged for only print (‘p’) or web (‘w’) and not app (‘a’):

Name: Filter out figures tagged only for print (p) or web (w) locations, not app (a).  [USFM 2 version]
Find: (?i)\\fig ([^|]*\|){3}([pw]+)\|[^\\]*\\fig\*
Replace:  {leave empty}

To be ready for Paratext 8.1’s figure syntax, a second rule is also needed:

Name: Filter out figures tagged only for print (p) or web (w) locations, not app (a). [USFM 3 version]
Find: (?i)\\fig [^\\]*\bloc="[pw]+"[^\\]*\\fig\*
Replace:  {leave empty}

(Technically, these could be combined into one rule, but if users ever needed to adjust the output location codes for anything other than a/p/w, it would be harder for them to see where to make the adjustment in a combined regex.)
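For anyone who wants to try the two rules outside the app first, here is a minimal Python sketch. It assumes Python's re module treats these particular patterns the same way the app's regex engine does, which I have not checked for every case. The sample figure lines and the function name filter_figures are made up for illustration.

```python
import re

# The two Find patterns from the rules above, unchanged.
USFM2_RULE = re.compile(r'(?i)\\fig ([^|]*\|){3}([pw]+)\|[^\\]*\\fig\*')
USFM3_RULE = re.compile(r'(?i)\\fig [^\\]*\bloc="[pw]+"[^\\]*\\fig\*')


def filter_figures(text):
    """Delete figures whose loc field holds only p and/or w (Replace is empty)."""
    text = USFM2_RULE.sub('', text)
    return USFM3_RULE.sub('', text)


# Hypothetical sample figures, one per case.
usfm2_print_only = r'\fig Grapes|hk00111c.tif|col|p|Horace Knowles|Grapes|Mak 12:1\fig*'
usfm2_with_app = r'\fig Grapes|hk00111c.tif|col|apw|Horace Knowles|Grapes|Mak 12:1\fig*'
usfm3_web_only = r'\fig Grapes|src="hk00111c.tif" size="col" loc="W" ref="Mak 12:1"\fig*'
usfm3_with_app = r'\fig Grapes|src="hk00111c.tif" size="col" loc="ap" ref="Mak 12:1"\fig*'

for sample in (usfm2_print_only, usfm2_with_app, usfm3_web_only, usfm3_with_app):
    kept = filter_figures(sample) == sample
    print('kept' if kept else 'removed', sample)
```

Figures whose loc field includes an a are left alone because [pw]+ has to fill the whole field.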

For more details on this convention, see tiny.cc/tagoutput (Sorry it’s not clickable. It says “new users” can’t post links.)

What other suggestions do people have for the changes gallery?

Dan, I am wondering if there should be a user-defined changes list that can be imported and exported. Those lists could be shared on a website with fuller documentation of what each change does.

If we load up the presets with many complex options it becomes harder and harder to find the commonly used ones the majority want to use.

One thing that was clear from the Asia discussion is that some non-Roman scripts need a lot more work on the texts to get good results in the app. Mark had about 20 changes in his example. Nic was doing some complex changes to make audio sync work better.

Here’s a simple change rule I created just yesterday to make sure that spaces before certain punctuation are turned into narrow non-breaking spaces:

<change name="Make space before punctuation narrow non-breaking space">
  <find> ([:;?!])</find>
  <replace>\u202f\1</replace>
</change>

This is a critical rule in French-speaking contexts, because the French standard is a (non-breaking) space before these punctuation marks, but many French-speaking teams just type a normal space, e.g. here’s a sample from my text:

\c 7
\p \v 1 Wa Aliyaasaʼ radda leyah wa gaal : «Asmaʼo kalaam Allah !
Daahu Allah gaal : ‹Ambaakir fi nafs al-saaʼa...

If I pull that text as-is into a Scripture app, there are a number of places where punctuation could end up at the beginning of a line, depending on text zoom, e.g.:

Wa Aliyaasaʼ radda leyah wa gaal
: «Asmaʼo kalaam Allah ! Daahu...

Ugly! So this rule changes the plain space before the punctuation into a narrow non-breaking space to force it to be with the preceding word:

Wa Aliyaasaʼ radda leyah wa
gaal : «Asmaʼo kalaam Allah !
Daahu...
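To check the effect of this rule on a sample line before putting it in the app, a small Python sketch like the following can help. It assumes Python's re module behaves like the app's regex engine for this simple pattern; the function name is my own.

```python
import re

NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE


def narrow_space_before_punctuation(text):
    # Same find as the rule above: one plain space followed by : ; ? or !
    return re.sub(r" ([:;?!])", NNBSP + r"\1", text)


sample = "Wa Aliyaasaʼ radda leyah wa gaal : «Asmaʼo kalaam Allah !"
result = narrow_space_before_punctuation(sample)

# repr() makes the otherwise invisible U+202F show up as \u202f
print(ascii(result))
```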

In a specific project there might be rules needed for spacing around quotation marks as well. For this project I added two rules for adding narrow non-breaking spaces after opening quotes and before closing quotes:

<change name="Add narrow no-break space after open quotes">
   <find>([«‹])</find>
   <replace>\1\u202f</replace>
 </change>
 <change name="Add narrow no-break space before close quotes">
   <find>([»›])</find>
   <replace>\u202f\1</replace>
 </change>

This sets them off nicely in the app. These rules might be too specific to put in a general gallery (there are a number of ways to deal with quotes, both in the source text and in the output), but a number of teams might still find them useful.
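Here is the same pair of quote rules as a Python sketch, in case it helps to see what they do to a line. It assumes Python's re module matches the app's behaviour for these patterns; the function name and the second test string are invented for illustration.

```python
import re

NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE


def space_quotes(text):
    # Rule 1: add a narrow no-break space after every opening quote.
    text = re.sub(r"([«‹])", r"\1" + NNBSP, text)
    # Rule 2: add a narrow no-break space before every closing quote.
    text = re.sub(r"([»›])", NNBSP + r"\1", text)
    return text


print(ascii(space_quotes("«Asmaʼo kalaam»")))
# A plain space that is already next to a quote is not removed by these rules.
print(ascii(space_quotes("kalaam »")))
```

Because the rules only add a character, a source text that already has a plain space beside a quote ends up with both the plain space and the narrow one, so such a text might need an extra rule.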

Yes, an import/export mechanism would be helpful. My experience with copy-paste from a web page is that users inevitably copy and paste regexes wrong (extra spaces, etc.). Yet this is what we are currently relying on in tiny.cc/indictypesetting, where we provide these regexes for app builders.

It might also help if gallery items could be organized into groups (e.g. Punctuation, Images), and if each item could provide descriptive HTML to explain why you might need it and how to use it (if user customization is necessary).
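To make the import/export idea a bit more concrete, here is a rough Python sketch of a round trip through a shareable file. It reuses the change/find/replace element names from the XML rules quoted earlier in this thread; the changes wrapper element and both function names are purely hypothetical, not anything the app provides today.

```python
import xml.etree.ElementTree as ET


def export_changes(changes):
    """Serialize (name, find, replace) tuples to an XML string."""
    root = ET.Element("changes")  # hypothetical wrapper element
    for name, find, replace in changes:
        change = ET.SubElement(root, "change", name=name)
        ET.SubElement(change, "find").text = find
        ET.SubElement(change, "replace").text = replace
    return ET.tostring(root, encoding="unicode")


def load_changes(xml_text):
    """Parse the XML back into (name, find, replace) tuples."""
    root = ET.fromstring(xml_text)
    return [
        (c.get("name"), c.findtext("find", default=""), c.findtext("replace", default=""))
        for c in root.iter("change")
    ]


rules = [
    ("Make space before punctuation narrow non-breaking space", " ([:;?!])", r"\u202f\1"),
    ("Remove space inside \\xt at the end", r" \\xt\*", r"\\xt*"),
]
xml_text = export_changes(rules)
print(xml_text)
```

A file like this keeps leading spaces in the find field intact, which is exactly what tends to get lost with copy-paste.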

Thanks!

Remove illustrations from specific illustrators:

In many cases there are illustrations in a print version of the Bible which the illustrator does not currently permit to be present in apps. The following regex line will remove those particular illustrations, and make it easy to put them back in later if the illustrator allows.

  • \.(?i)(jpg|png|tif)(.*?)Horace Knowles

\fig Grapes on a grapevine|hk00111c.tif|col|Mak 12:1|Horace Knowles|(Mak 12:1)| \fig*

Some of the Regex elements:
(?i) - makes the search case insensitive
(.*?) - the question mark makes this non-greedy

This regex line works very well in all cases except where there are two illustrations within a single line. For instance, the line below contains two .tif file names. I couldn’t figure out a clean way to capture only the latter. I tried negations and negative lookaheads without success. If someone has a good idea, I’m all ears.

…mum nɛ gɛn ketum kemew e.\fig Vineyard with watchtower and wine press|LB00103C.TIF|span|Mak 12:1|Louise Bass|(Mak 12:1; Mat 21:33-46; Luk 20:9-19)| \fig \fig Grapes on a grapevine|hk00111c.tif|col|Mak 12:1|Horace Knowles|(Mak 12:1)| \fig*
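One possible way around the two-figures problem, sketched in Python: replace the (.*?) with ([^\\]*?) so the match cannot run across a backslash, and therefore cannot start in one \fig and end in the next. This is only a sketch under the assumption that no backslash occurs between the file name and the illustrator's name. Python also wants the case-insensitive flag passed separately rather than written mid-pattern, so the patterns below are adapted slightly from the one above.

```python
import re

line = (r'\fig Vineyard with watchtower and wine press|LB00103C.TIF|span|Mak 12:1|'
        r'Louise Bass|(Mak 12:1; Mat 21:33-46; Luk 20:9-19)| \fig '
        r'\fig Grapes on a grapevine|hk00111c.tif|col|Mak 12:1|Horace Knowles|(Mak 12:1)| \fig*')

# Original idea: lazy .*? still starts at the FIRST file name on the line.
original = re.compile(r'\.(jpg|png|tif)(.*?)Horace Knowles', re.IGNORECASE)

# Adjusted idea: [^\\]*? refuses to cross a backslash, so the match
# has to begin inside the same \fig that names the illustrator.
adjusted = re.compile(r'\.(jpg|png|tif)([^\\]*?)Horace Knowles', re.IGNORECASE)

print(original.search(line).group(0))
print(adjusted.search(line).group(0))
```

The first print shows the match starting at .TIF in the Louise Bass figure; the second shows it confined to the Horace Knowles figure.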

I believe a broader solution than this is necessary, as the converse is often required: Certain images (especially color images) are to be used only for electronic publications, and not for print. The solution we use in South Asia is to make use of the obsolete “Location” field, as documented for users here. The regexes you’ll need for that, both for the current USFM2 and the upcoming USFM3, can be found here (in the Illustrations section). This should also handle the problem of two illustrations in the same line of text.

I know this thread is very old, but I’m trying to use the “Changes” feature to change a breaking space before a semi-colon to a non-breaking one. I put \s\u003B in the “find” field and \u202f\u003B in the “replace” field. This doesn’t work. I tried \0020 instead of \s and still no success. In reading through the above suggestions, how do I put in a regular expression? In Jeff’s example change rule, how do I put that in SAB using the “find” and “replace” fields. What exactly would I type in each field?
Thanks!
Andrew

I am going back a bit far in the cobwebs here but I seem to remember something about the Changes not finding unicode characters, only replacing them…I may be totally misremembering this, but just try this to see if it works:

Find: \s;
Replace: \u202F\u003B

Did you get this figured out? You should be able to use Unicode codes in the find, but I would probably suggest:
Find: \s*;
Replace with: \u202f;
This finds zero or more spaces before a semicolon, and changes them to the narrow non-breaking space.

If you can’t get it to work, show us a bit of your source file, both with and without the “Show After Changes” box checked at the top.

Note that this will put a \u202f character before all semicolons, even ones in references, like this:
\r (Matiye 3.1-6,11-12 ; Lik 3.1-6,15-18)
That may be fine, but you just need to make sure that’s what you want.

Well, it’s true that I don’t want the extra space where there wasn’t one before. I still haven’t found a solution that works.

Here’s before changes:
\id TI1
\mt1 W̃ënu ŋa

\s Ge ñaɗu ayët ɗus bi mo ye W̃ënu ŋa, ƴëkëry viŋi vikerëh:
\p * \xt Wapëgwala 1 \xt* ; * \xt Wapëgwala 2 \xt* ; \xt Matiye 22.32\xt* ; \xt Marëk 12.29-30\xt* ; \xt Lik 1.37\xt* ; \xt Lik 11.13\xt* ; \xt San 3.16\xt* ; \xt San 4.24\xt* ; \xt San 14.6\xt* ; \xt Vantiyehn 10.34-36\xt* ; \xt Vantiyehn 14.14-17 \xt* ; \xt Vantiyehn 17.24-31 \xt* ; \xt Vëvë Rom 1.18-23 \xt* ; \xt Vëvë Rom 11.33-36 \xt* ; * \xt 1 Vëvë Korent 8.4-6 \xt* ; \xt 2 Vëvë Korent 1.3 \xt* ; \xt 1 Timote 1.17 \xt* ; \xt 1 Timote 6.15-16 \xt* ; \xt Ebëre 4.13 \xt* ; \xt Ebëre 10.30-31 \xt* ; \xt Sak 1.17 \xt* ; * \xt 1 Piyer 1.15-17 \xt* ; * \xt 1 San 1.5 \xt* ; \xt 1 San 4.7-12 \xt* ; \xt Yuɗ 24-25 \xt* ; \xt Wapuyala 4.8-11 \xt* ; \xt Wapuyala 15.3-4 \xt*
\s Ge ntiyahnëki W̃ënu ŋa ɗënkwëtaɗilihna, ƴëkëry viŋi vikerëh:
\p * \xt Wapëgwala 39.20-21 \xt* ; * \xt Lik 12.6-7 \xt* ; * \xt Lik 12.27-31 \xt* ; * \xt Lik 15.1-10 \xt* ; \xt San 3.16 \xt* ; \xt Vëvë Rom 5.8 \xt* ; \xt Vëvë Rom 8.31-39 \xt* ; \xt Vëvë Efes 1.3-12 \xt* ; \xt Vëvë Efes 2.4-10 \xt* ; \xt 1 Timote 2.3-6 \xt* ; \xt 2 Piyer 3.9 \xt* ; \xt 1 San 3.1 \xt* ; \xt 1 San 4.9-10 \xt*
\s Ge ñaɗu paci W̃ënu ŋa tëhakëhni vële ỹahnëka va, ƴëkëry viŋi vikerëh:
\p \xt Matiye 18.19-20 \xt* ; \xt Matiye 28.19-20 \xt* ; \xt Lik 12.6-7 \xt* ; * \xt San 14.16-23 \xt* ; * \xt Vëvë Rom 8.35-39 \xt* ; \xt 2 Vëvë Korent 6.16-18 \xt* ; \xt Vëvë Efes 3.17-19 \xt* ; \xt Vëvë Filip 4.13 \xt* ; \xt Vëvë Kolos 2.6-7 \xt* ; \xt 2 Vëvë Tesalonik 3.16 \xt* ; * \xt Ebëre 13.5-6 \xt* ; \xt 1 Piyer 5.7 \xt* ; \xt Wapuyala 6.9-11 \xt*

And after:
\id TI1
\mt1 W̃ënu ŋa

\s Ge ñaɗu ayët ɗus bi mo ye W̃ënu ŋa, ƴëkëry viŋi vikerëh:
\p * \xt Wapëgwala 1 \xt* ; * \xt Wapëgwala 2 \xt* ; \xt Matiye 22.32\xt* ; \xt Marëk 12.29‑30\xt* ; \xt Lik 1.37\xt* ; \xt Lik 11.13\xt* ; \xt San 3.16\xt* ; \xt San 4.24\xt* ; \xt San 14.6\xt* ; \xt Vantiyehn 10.34‑36\xt* ; \xt Vantiyehn 14.14‑17 \xt* ; \xt Vantiyehn 17.24‑31 \xt* ; \xt Vëvë Rom 1.18‑23 \xt* ; \xt Vëvë Rom 11.33‑36 \xt* ; * \xt 1 Vëvë Korent 8.4‑6 \xt* ; \xt 2 Vëvë Korent 1.3 \xt* ; \xt 1 Timote 1.17 \xt* ; \xt 1 Timote 6.15‑16 \xt* ; \xt Ebëre 4.13 \xt* ; \xt Ebëre 10.30‑31 \xt* ; \xt Sak 1.17 \xt* ; * \xt 1 Piyer 1.15‑17 \xt* ; * \xt 1 San 1.5 \xt* ; \xt 1 San 4.7‑12 \xt* ; \xt Yuɗ 24‑25 \xt* ; \xt Wapuyala 4.8‑11 \xt* ; \xt Wapuyala 15.3‑4 \xt* \s Ge ntiyahnëki W̃ënu ŋa ɗënkwëtaɗilihna, ƴëkëry viŋi vikerëh:
\p * \xt Wapëgwala 39.20‑21 \xt* ; * \xt Lik 12.6‑7 \xt* ; * \xt Lik 12.27‑31 \xt* ; * \xt Lik 15.1‑10 \xt* ; \xt San 3.16 \xt* ; \xt Vëvë Rom 5.8 \xt* ; \xt Vëvë Rom 8.31‑39 \xt* ; \xt Vëvë Efes 1.3‑12 \xt* ; \xt Vëvë Efes 2.4‑10 \xt* ; \xt 1 Timote 2.3‑6 \xt* ; \xt 2 Piyer 3.9 \xt* ; \xt 1 San 3.1 \xt* ; \xt 1 San 4.9‑10 \xt* \s Ge ñaɗu paci W̃ënu ŋa tëhakëhni vële ỹahnëka va, ƴëkëry viŋi vikerëh:
\p \xt Matiye 18.19‑20 \xt* ; \xt Matiye 28.19‑20 \xt* ; \xt Lik 12.6‑7 \xt* ; * \xt San 14.16‑23 \xt* ; * \xt Vëvë Rom 8.35‑39 \xt* ; \xt 2 Vëvë Korent 6.16‑18 \xt* ; \xt Vëvë Efes 3.17‑19 \xt* ; \xt Vëvë Filip 4.13 \xt* ; \xt Vëvë Kolos 2.6‑7 \xt* ; \xt 2 Vëvë Tesalonik 3.16 \xt* ; * \xt Ebëre 13.5‑6 \xt* ; \xt 1 Piyer 5.7 \xt* ; \xt Wapuyala 6.9‑11 \xt*

I ran
Find= +;
Replace= \u202f;

The find finds one or more spaces before a semicolon.
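The difference between the two finds suggested in this thread (zero or more whitespace versus one or more spaces) can be seen in a short Python sketch. The sample string and function names are invented, and Python's re module is assumed to behave like the app for these patterns.

```python
import re

NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE


def zero_or_more(text):
    # \s*; also matches a semicolon with nothing in front of it
    return re.sub(r"\s*;", NNBSP + ";", text)


def one_or_more(text):
    #  +; only matches when at least one plain space is already there
    return re.sub(r" +;", NNBSP + ";", text)


sample = "Matiye 22.32 ; Lik 3.1-6,11-12; San 3.16"
print(ascii(zero_or_more(sample)))
print(ascii(one_or_more(sample)))
```

Only the one-or-more version leaves the semicolon after 11-12 untouched, which is the behaviour asked for above.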

There is some inconsistency in the markup:
There is a space after the 17, then another space after the \xt*. There should be no space before the \xt*; it causes unexpected wrapping and lets a ; land at the beginning of a line.
Description: Remove space inside \xt at the end
find= \\xt\*
replace=\\xt*

That seems to work fine. I have ; ending lines but none starting lines.
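For reference, here are the two changes together as a Python sketch (remove the space before \xt*, then turn the spaces before ; into a narrow no-break space). The function name is made up, and the sketch assumes Python's re module matches the app's regex behaviour for these patterns.

```python
import re

NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE


def tidy_xt(text):
    # Remove the space just before the closing \xt* marker.
    text = re.sub(r" \\xt\*", r"\\xt*", text)
    # Replace one or more plain spaces before ; with one narrow no-break space.
    text = re.sub(r" +;", NNBSP + ";", text)
    return text


sample = r"\xt 1 Timote 1.17 \xt* ; \xt Ebëre 4.13 \xt*"
print(ascii(tidy_xt(sample)))
```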

I do see that a change to put a non-breaking space after any * might stop those from appearing at the end of lines. Then do you also add a non-breaking space between 1 and Piyer, and in the other books that have a number in front?
Numbered books: find=(\d) ([PTVS]) rep=$1\u202F$2
Asterisk before space: find=\* rep=*\u202F

It is also possible to make a change for the hyphenated verse numbers in the \xt: they could be changed to non-breaking hyphens.
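A Python sketch of the numbered-books change plus a possible hyphen change follows. The hyphen pattern (\d)-(\d) is my own guess at how such a rule could look, and as written it applies to every digit-hyphen-digit sequence, not only those inside \xt. Note that Python writes group references as \1 where the rule above uses $1.

```python
import re

NNBSP = "\u202f"      # U+202F NARROW NO-BREAK SPACE
NB_HYPHEN = "\u2011"  # U+2011 NON-BREAKING HYPHEN


def keep_references_together(text):
    # Numbered books: "1 Piyer" keeps the number attached to the name.
    text = re.sub(r"(\d) ([PTVS])", r"\1" + NNBSP + r"\2", text)
    # Verse ranges: "15-17" gets a non-breaking hyphen (assumed pattern).
    text = re.sub(r"(\d)-(\d)", r"\1" + NB_HYPHEN + r"\2", text)
    return text


sample = r"\xt 1 Piyer 1.15-17\xt*"
print(ascii(keep_references_together(sample)))
```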

I don’t understand why the ; needs to have a space before it. It looks odd to have it floating at the end of many lines.

French punctuation rules…
