Convert from Doc(x) to USFM paratext

Is there a way to painlessly convert DOCX files that were used for a picture book app to Paratext format? SAB is already doing this internally, as it displays the text in Paratext format on the Source tab of every book, but I can’t find whether it’s actually being stored anywhere in that format. It would be nice to be able to “export source” or to have some way of batch converting hundreds of files.

I trust you’ve already looked in "Documents/App Builder/Scripture apps/App Projects/project/_project_data"?

I did try a test with a DOCX file and I don’t see that the app builder saved the USFM file.
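In case anyone wants to repeat that check without opening files one by one, here is a minimal Python sketch. It walks a folder and lists every file whose content begins with the USFM `\id` marker, whatever the file extension. The function name `find_usfm_like_files` is made up for illustration, and the "starts with `\id`" test is only a heuristic I am assuming is good enough here; it says nothing about how SAB handles the data internally.

```python
from pathlib import Path


def find_usfm_like_files(root):
    """Return files under `root` whose content begins with the USFM \\id marker."""
    matches = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            # Only the start of the file is needed; undecodable bytes are skipped.
            with path.open("r", encoding="utf-8", errors="ignore") as handle:
                head = handle.read(2048)
        except OSError:
            continue  # unreadable file: ignore it and carry on
        # Strip a possible byte-order mark and leading whitespace before testing.
        if head.lstrip("\ufeff \t\r\n").startswith("\\id "):
            matches.append(path)
    return matches
```

You would call it on the project data folder mentioned above, for example `find_usfm_like_files("path/to/_project_data")`. An empty list would be consistent with SAB not saving the converted USFM there.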

This is an interesting topic. Since SAB doesn’t seem to store the USFM data anywhere in the project_data folder, that must mean that it recreates that USFM data from the .docx file each time. But when does it do that? I guess you could test that by making a large .docx file, and see at exactly what point there is a bit of a delay in processing the project - when you open the project? When you open the source tab? When you build? But maybe the developers could just enlighten us. When does it actually do the .docx to USFM conversion?

And as for this “export source” idea - I think that would be helpful at times. But I’m thinking there’s something else that might be even more helpful. What has tripped me up at times is importing a book as a .docx, but the import doesn’t work quite right, so I go back to the .docx file, play around with styles, and then delete the old book and retry the import. The main reason I’m using .docx files in the first place is to easily import the images. Once the images are in, I would rather work directly with the USFM text, which can be a lot more precise. So how about this: make a feature where you tell SAB to convert the imported .docx file to an SFM file, and then to use THAT file as the new master. That file could be copied out of the _data folder (which is what the original poster needs), or it could be modified to tweak the SFMs (which is what I would like). The only potential problem with that is I’m not sure what SAB thinks about modifying the files directly in the _data folder. Even if I import an SFM file now, can I modify that file directly in the _data folder? It would be good to clarify that.

At the moment the Word to SFM conversion happens behind the scenes. But adding an ‘Export Source’ button is easy to do and will be in the next version of SAB. Thanks for the suggestion!

This is now in SAB 4.6.1.

An ‘Export…’ button has been added to the Source page of a book created from a Word document, to allow you to export the book as a text file with SFM markers.

Thanks for that, Richard. Question for you… If I perform these steps:

  1. import a .docx file
  2. export the source as SFM
  3. remove the .docx book from the project
  4. add in the SFM file (that was exported) as a book back into the project

will the result be identical? In other words, all of the image files that were created will still be in the project and still have the right names (as given in the exported SFM file)?

If so, then what I was asking above will be fulfilled: I can export to SFM, then “work directly with the USFM text, which can be a lot more precise.” My initial tests seem to indicate that this basically works, but I had to tweak the image names a bit. But it would be nice to get a confirmation.
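For anyone wanting to check the image names the same way, here is a rough Python sketch of what I mean. It pulls the file names out of the `\fig ...\fig*` markers in the exported text and reports any that are not present in a given image folder. It handles the two `\fig` layouts I know from the USFM documentation (the older pipe-separated fields, where the file name is the second field, and the newer `src="..."` attribute). Which of these SAB writes on export, and where SAB keeps the images, are things I am not assuming; the function names and the `image_dir` argument are hypothetical.

```python
import re
from pathlib import Path

# Everything between "\fig " and the closing "\fig*" marker.
FIG_RE = re.compile(r"\\fig\s(.*?)\\fig\*", re.DOTALL)
# Newer attribute form: \fig caption|src="file.jpg" size="col"\fig*
SRC_RE = re.compile(r'src="([^"]+)"')


def figure_files(sfm_text):
    """Return the image file names referenced by \\fig markers, in order."""
    names = []
    for body in FIG_RE.findall(sfm_text):
        attr = SRC_RE.search(body)
        if attr:
            names.append(attr.group(1).strip())
            continue
        # Older form: \fig DESC|FILE|SIZE|LOC|COPY|CAP|REF\fig*
        fields = body.split("|")
        if len(fields) > 1 and fields[1].strip():
            names.append(fields[1].strip())
    return names


def missing_figures(sfm_text, image_dir):
    """Return referenced file names that do not exist in image_dir.

    The comparison is exact, so a difference in upper/lower case counts as missing.
    """
    present = {p.name for p in Path(image_dir).iterdir() if p.is_file()}
    return [name for name in figure_files(sfm_text) if name not in present]
```

Read the exported file into a string, pass it to `missing_figures` together with the folder holding the images, and any names that come back are the ones that would need tweaking.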

Interesting … Ian advised me NOT to use DOCX but instead to try exporting the DOCX to HTML and then work a bit in the code (this was probably just before the Export button was created). That turns out to be a pretty good solution in our case for non-Bible books.
But what would you generally recommend?
a. use the Export button on the Source page
b. the procedure advised by Ian

and … is it worthwhile to also create such a button for HTML books?

Font colors and some paragraph formatting are not preserved in DOCX to USFM conversion.

There is no need for an HTML to USFM conversion, as the app can be built with HTML basically unmodified. But HTML does not currently support chapter breaks. DOCX and USFM support chapter/page breaks.

So there is a trade-off depending on which feature set you want.

Hello,

Did we lose the ‘Export Source’ button in the new Mac update? I used it to export the Word document into text and adjust the codes.

I see the HTML export but not the one on the Source window.

It was very useful.

You should still be seeing a button with the caption Export… at the top right of the Source tab for a book - when the book is a Word document (.docx).

Oh, that is what happened. So when it is a Word doc it shows up; when it is text there is no need. Thank you. I have one more question. I moved my book text file. Is there a way to have SAB find the new file so that it can update, or do I need to keep that text file in a permanent location? Thank you.

SAB is built to work with Scripture coming out of Paratext. Those files are always in the same location on the one computer.

If you moved the file, you have two options.

  1. The safe option is to delete the book and re-import it from the new location. (If you made tweaks to the book, all those tweaks will need to be made again.)
  2. Open the .appdef file and edit the location of the source there. (If you don’t know what you can and can’t do in an XML file, this can cause your project to not load. So be warned: back up first.)

Best if you keep your files in a consistent logical place.
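If you do go with option 2, a small Python sketch like the one below can make the backup and the edit in one step. It deliberately knows nothing about the layout of the .appdef file: it copies the file to a `.bak` alongside it and then replaces the old location string with the new one wherever it appears. The function name `repoint_source` is hypothetical, the file is assumed to be UTF-8, and `old_location` must be typed exactly as it appears in the file (open it in a text editor to check, including the direction of the slashes).

```python
import shutil
from pathlib import Path


def repoint_source(appdef_path, old_location, new_location):
    """Replace old_location with new_location in the file, after making a backup.

    Returns the number of occurrences replaced. If nothing matches, the file
    is left untouched and no backup is written.
    """
    appdef = Path(appdef_path)
    # Work on raw bytes so line endings and everything else stay exactly as they were.
    data = appdef.read_bytes()
    old = old_location.encode("utf-8")
    new = new_location.encode("utf-8")
    count = data.count(old)
    if count == 0:
        return 0
    # Keep a copy next to the original, e.g. "MyApp.appdef.bak".
    backup = appdef.with_name(appdef.name + ".bak")
    shutil.copy2(appdef, backup)
    appdef.write_bytes(data.replace(old, new))
    return count
```

A return value of 0 means the old location was not found as typed, so nothing was changed. If the project fails to load afterwards, copy the `.bak` file back over the .appdef file.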

I am a newbie to SAB. I have taken a docx file and managed to create an Android app of that single book.

I also need to access the SFM file, but I cannot work out how to do this.

Please can someone give me a step by step guide on how to do this?

Thank you

Maurice

PS I am using version 9.0 on a Windows 10 machine