SAB 4.5 Mac: recurring style problems when importing MS Word docs

Hello Chris

You wrote the following some time ago about DOCX files. It helped me understand a bit more of the “logic” (see my comments marked NOTE J:).

Just so you know, DOCX files are converted to SFM files (the native format used by SAB). You have the most control if your source files are SFM. Styles are based on SFM markers (see the Styles tab in SAB).
From looking at the code, it appears that the following is done with regard to styles:
1 Some default styles are converted to SFM styles (e.g. Heading1 => s1, Heading2 => s2, Heading3 => s3)

NOTE J: Yes, Heading 1 and Heading 2 appear as s1 and s2, but then it stops. I can define up to 9 headings in MS Word, and it would be ideal if all of them were recognized and became visible under the ImportedStyles tab. Then I could deal with them separately from s1 and s2, which are needed and work properly within the Bible books. This is an important issue: in our case all the Bible books are fine. We do not want to distort all that by pulling styles in every direction to get what we want in the DOCX books, only to find afterwards that we have made a mess of the styles in the Bible books.
2 If a paragraph style is named after an SFM marker, then it uses that marker

3 I see some processing of paragraph styles (<w:pStyle>) and run styles (<w:rStyle>, unless it is a hyperlink)

4 Handle bold, italic, strikethrough, underlined, highlighted text
NOTE J: Very good, but what we miss in our project is color. Would it be possible to let SAB handle color codes like #F00771?
By the way, the CUSTOM COLORS (there are 5) do not seem to work as I expect. For example, I defined custom color 1 as red, but it comes out in quite a different color.

5 Handle Alignment or Justification
Styles can be controlled in the Styles node in SAB if you can figure out the marker and the context.

NOTE J: I assume that by “marker” you mean things like s1, s2, etc., but what do you mean by “context”?
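To make points 1 and 2 above concrete, here is a minimal Python sketch of that kind of mapping. It is not SAB's actual code: it follows the reading of the code quoted above (Heading1 to Heading3 become s1 to s3, and a style named after an SFM marker keeps that marker), the marker set is a hypothetical, abbreviated list of USFM paragraph markers, and it assumes Word's English style IDs as they appear inside the DOCX XML.

```python
# Hypothetical, abbreviated subset of USFM paragraph markers (SAB's real list is larger).
KNOWN_SFM_MARKERS = {"p", "m", "q1", "q2", "s1", "s2", "s3", "mt1", "ip", "li1"}

# Built-in Word heading style IDs mapped to section-heading markers,
# following the reading of the code quoted above.
HEADING_MAP = {"Heading1": "s1", "Heading2": "s2", "Heading3": "s3"}


def style_to_marker(style_id):
    """Return the SFM marker a Word paragraph style would map to, or None if unmapped."""
    if style_id in HEADING_MAP:
        return HEADING_MAP[style_id]
    if style_id in KNOWN_SFM_MARKERS:
        # A style named after an SFM marker keeps that marker.
        return style_id
    return None
```

Under this reading, style_to_marker("Heading2") gives "s2", a Word style named "q1" comes through as q1, and "Heading4" gives None, which is consistent with Heading 4 and beyond not being recognized.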

We are not Paratext users but FileMaker Pro users (a database for Mac/Windows), and we were happy to find that, after exporting XML from FileMaker Pro, we could convert our XML Bible books into SFM with the help of a script. We are absolutely satisfied with the Bible books, and everything works well even though we started from a different ‘track’. Our small team handles several projects using this technical infrastructure, and our projects also include right-to-left (RTL) scripts. So this is the very positive part of the story.

Before I continue, please note that I am not a programmer, although I have more or less learned to understand SAB and to work with it. So what I am going to say comes from the point of view of an SAB user and of the writer of BACKMATTER and FRONT MATTER BOOKS in MS Word (I use MS Word for Mac, the latest Office 365 version).

About the DOCX files we use: they contain introductory commentaries and appendices with deeper studies of each Bible book. Each book has its own introduction and its own appendix. We combine the Intro and Appendix of each Bible book into one BackMatterBook. These form the final part of one branch of our project, which we would like to finish this summer by publishing the APP in 4 scripts in the store and on the website, in fact 4 APPS.

So what is our problem, or our wish, regarding these DOCX files and the way SAB handles them? I am not really familiar with Paratext, but it seems to me that the DOCX format is so widely used that SAB could deal with it a bit more generously, if that is the right expression. Let me try to explain.

It would be very helpful if paragraph and character styles from MS Word were recognized by SAB and put on the ImportedStyles tab under the STYLES tab in SAB.
It would be VERY helpful if SAB could recognize even just the names of, let us say, 10 character styles and 10 paragraph styles coming from an MS Word document, even if they did not carry any specific formatting information (like color, alignment, size, weight, etc.). It would suffice if these names just showed up in a STABLE way, e.g. with names like c1, c2, etc. (or another letter, let us say x1, x2, x3), and the same for the paragraph styles but with a different letter, e.g. w1, w2, etc.
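To check which style names a DOCX actually contains before importing it, a short script can list them. A .docx file is a ZIP archive whose main text is in word/document.xml; paragraph styles are referenced there by <w:pStyle w:val="..."/>, character styles by <w:rStyle>, and direct text colors by <w:color w:val="RRGGBB"/>. This is a minimal sketch, not part of SAB; the function name list_docx_styles is hypothetical. Note that w:val holds the style ID, which can differ from the display name shown in Word (for example "Heading1" rather than "Heading 1").

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# WordprocessingML namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS


def list_docx_styles(path):
    """Count paragraph style IDs (w:pStyle), character style IDs (w:rStyle)
    and direct color values (w:color) referenced anywhere in document.xml."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    para, char, colors = Counter(), Counter(), Counter()
    for el in root.iter(W + "pStyle"):
        para[el.get(W + "val")] += 1
    for el in root.iter(W + "rStyle"):
        char[el.get(W + "val")] += 1
    for el in root.iter(W + "color"):
        colors[el.get(W + "val")] += 1
    return para, char, colors
```

For example, para.most_common(10) gives the 10 most used paragraph styles, roughly the kind of name list requested above.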

I capitalized the word STABLE above because we really experienced some ‘mysteries’ which, probably because I am not a programmer, I would interpret as instability of the program at certain points (certainly not overall). Let me give an example: we imported a DOCX of more than 1000 pages (it will later be split into portions for each book). Sometimes the styles, or at least some of them, would show up in the ImportedStyles tab and sometimes not. Once I waited until bedtime and said to myself: this is too bad (because they did not appear), let me take a rest, go to bed and continue on this tomorrow. So I closed SAB, and when I started my Mac and opened SAB again the next morning, I found that some 40 styles had suddenly appeared in the ImportedStyles tab, just because… because of what??? Because I slept, or because SAB took a good rest and felt better the next day?
I am not kidding, these things happened, and of course I have no explanation for them. Nor do I have an explanation for the fact that, after 10 minutes of very happily editing some of these imported styles (which had suddenly appeared like ‘a miracle’) plus a few clicks on other tabs in SAB, I returned to the ImportedStyles sheet and found that all these imported styles had disappeared, just as suddenly as they had come after I started my Mac in the morning and opened SAB.

And again today I was able to make some imported styles appear in the ImportedStyles tab. They were /qt1 up to /qt6. I manipulated them inside SAB and generated a nice APP with all these styles working in it. Beautiful. I thought we were almost where we wanted to be… so I enhanced my Word doc a bit more, threw away the previous one and imported it AGAIN into SAB… and the styles under the ImportedStyles tab were GONE, never to return, no matter how many times I tried to import it afresh.

So my question is: what is going on here? Please help… it feels like I am walking in mud…

I’m just responding to the idea that SAB should support DOCX better. In my experience of publishing, everyone who has to take a Word file and do something else with it gets headaches. Even using Word as a typesetting tool gives people headaches. The surface of Word looks simple, but what is underneath can be very complicated, especially if a lot of editing has gone on. Filtering out what is wanted from what is not is very complicated, and most of it is unwanted.

The USFM tool chains produce printed Bibles, apps, web pages and eBooks from SFM. Word is designed to produce a formatted page. Start with the simplest markup that caters for your publishing needs across all of your outputs. Don’t start with the most complicated.

USFM can take time to learn.

You could try Save As and choose Web Page, Filtered, which produces cleaner HTML that might work better. My test on another document shows that it works well. However, you may need to make sure your HTML is in UTF-8 encoding rather than a non-Unicode encoding. I did not find an option in Word to output UTF-8. Colors are maintained.
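If Word will not save the filtered HTML as UTF-8, the file can be re-encoded afterwards. Below is a minimal Python sketch, not part of SAB; the function name reencode_to_utf8 is hypothetical, and the default source encoding cp1252 (Windows-1252) is an assumption. Check the charset declared in the file's <meta> tag and pass the matching Python codec name instead (for example mac_roman for Mac Roman).

```python
import re
from pathlib import Path


def reencode_to_utf8(src, dst, source_encoding="cp1252"):
    """Read an HTML file saved in a legacy encoding and write it back as UTF-8.

    source_encoding is an assumption: use the charset the file actually declares.
    """
    text = Path(src).read_text(encoding=source_encoding)
    # Update the first declared charset (normally in the <meta> tag) so that
    # readers do not keep interpreting the new bytes with the old encoding.
    text = re.sub(r"(charset=[\"']?)[\w-]+", r"\1utf-8", text, count=1, flags=re.IGNORECASE)
    Path(dst).write_text(text, encoding="utf-8")
```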

HTML is not converted to SFM in SAB; it is used directly.

Thank you, mcquayi, for your reply. The most promising part seemed to be trying WEB PAGE, Filtered, which I tried without success. Alas!
Regarding the complexity of MS Word, I leave that aside because it is not my choice, but the SAB team opened up the possibility of importing that format, so I thought: let us try (we were actually working in Nisus on Mac). Since, as I explained, we are in the final phase of our project, and for various reasons have chosen not to work in Paratext, switching to SFM at this point is not a workable option.

The problem, to be honest, is in SAB. I have explained my experience with these BackMatterBooks in SAB at length, and on this point I miss any explanation for the dual character of SAB: one time it lets me make an APP BackMatterBook with all I need, and the next moment, for a completely unknown reason, it cancels all that work and I cannot repeat or continue it any longer. On the one hand this shows that SAB is able to do a very nice import of MS Word docs; on the other hand, it sometimes suddenly throws that nice work in the trash. The first gives me hope, and therefore I think it should work with MS Word docs, because it already did, but SAB is not stable. Maybe here is a challenge for your team… the things we do in MS Word are all very simple, and as I said, I do not even need to see style characteristics inside SAB, only editable styles and style names in the ImportedStyles tab. Once I find them there, I can edit them nicely in SAB, and then the work is done and we have beautiful BackMatter books…

Later this Monday…
Some very good news… I discovered (I should have looked there before) that we can also look at the STYLES just above the BOOKS folder, and then above CONTENTS MENU, where there is also STYLES…
I was used to selecting BOOKS and then MAIN COLLECTION. When that is highlighted, we see various tabs on the right side, of which STYLES is the third. That is where I was always looking and experiencing problems, but I did not realize that I could also look at STYLES in the sidebar. Well, it turned out that all the styles are there, all of them, AND it seems they are STABLE there. So I edited some of them, and again I am very close to a very nice APP with Intros and Appendices, but nevertheless some questions remain for which I would appreciate an answer…

  • Why can I not see the imported styles where I used to look? Is this intended by the programmers? If yes, what is the reasoning? If not, it seems to be a bug.
  • Why are the edits I made in the STYLES tab of the MAIN COLLECTION not saved under the STYLES I see in the left sidebar above the CONTENTS MENU?
  • When I choose STYLES from the left sidebar and open a specific IMPORTED STYLE, I can edit many things. For example, I can align text left, right or centered, and in the APP it works as I define it, but color and size do not work as I want. Why? These are not Word problems; at least to me, they look like SAB problems.

I am happy that I can move forward despite some unexplained issues, but it would help to get some more to-the-point explanations that could help me with the last hurdles… Thank you anyway for your attention to my struggles, and I look forward to some help…

By the way, I use the iOS Simulator, which is a great help in my situation.