Importing formatted text

Word is ubiquitous, unfortunately.
Word and its files are bloated, certainly.
Word file format we have to work with, fact.

Fortunately I’ve been using InDesign since its early days, and that can
gobble up Word files almost seamlessly, but Freeway (FW) is another case.

So, because I have a lot of very fiddly formatted Word files (basically
bibliographies with continual changes from roman to italic and back, with a
bit of bold thrown in), I’ve been doing some experimenting.

I thought the simplest way would be to get authors, or more likely
sub-editors, to mark up the fully formatted text with HTML mark-up: get them
to put angle-bracketed code around the italic and bold in the text. Then open
the Word file in TextEdit and convert it to plain text, which drops all
formatting but leaves the HTML mark-up, as it’s just text. Finally, read the
resulting HTML-formatted text file into FW. But that didn’t work fully: not
all the italics or bolds were successfully imported.

What can others suggest as the best workflow to deal with Word/formatted
text?

Best wishes Peter

================================
Peter Tucker, Oxford UK email@hidden


freewaytalk mailing list
email@hidden
Update your subscriptions at:
http://freewaytalk.net/person/options

My own method, fussy though it is, is to convert all incoming copy to
plain text, open it in BBEdit, copy from there, and paste into
Freeway. Then I use a print of the Word file as an aid to mark up the
plain text in Freeway with pre-defined styles. I learned this method
the very hard way with 9 years of professional DTP in QuarkXPress 2 to 4.
Everything else ends in tears.

I totally appreciate the pain this would be for you with your
bibliography. I have used this method to typeset the tiny screwed-up
text you find tucked into your pill bottles, which is of similar scale
and degree of formatting.

You might try exporting from Word as RTF and importing that; perhaps
it will work better. But you might also try fixing the double-returns
and double-spaces that people who haven’t read Robin Williams’ fine
book “The Mac is not a Typewriter” are prone to insert into their text.

One paragraph return between paragraphs, one space between sentences.
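
For anyone who would rather script that clean-up than do it by hand, here is a minimal sketch in Python. It assumes the copy has already been saved as plain text; the function name `normalize_copy` is made up for this illustration and is not part of Freeway, Word, or BBEdit.

```python
import re

def normalize_copy(text: str) -> str:
    """Collapse repeated spaces and repeated paragraph returns."""
    # Treat Windows (CRLF) and old-Mac (CR) line endings as plain newlines.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Two or more spaces in a row become a single space.
    text = re.sub(r" {2,}", " ", text)
    # Drop any spaces left dangling at the end of a line.
    text = re.sub(r" +\n", "\n", text)
    # Two or more returns in a row become a single return.
    text = re.sub(r"\n{2,}", "\n", text)
    return text

sample = "First sentence.  Second sentence.\r\n\r\nNext paragraph.  \n\n\nLast one."
print(normalize_copy(sample))
```

This is only appropriate when the extra returns are there purely for spacing. If single returns in the file mean something different from double returns, collapsing them like this would lose that distinction.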

And if you find yourself in the position of having to decipher
something that was “tabbed” with spaces, then just open the thing in
BBEdit where there are Entab/Detab commands in the Text menu and you
can usually sort things that way.
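
If BBEdit is not to hand, a rough stand-in can be sketched in Python. To be clear, this is not BBEdit's Entab algorithm, just a simple approximation that turns each run of spaces into one tab; the function name `spaces_to_tabs` and the `min_run` threshold are assumptions made for this example.

```python
import re

def spaces_to_tabs(text: str, min_run: int = 2) -> str:
    """Replace every run of at least `min_run` spaces with a single tab."""
    pattern = re.compile(r" {%d,}" % min_run)
    # Work line by line so that returns are left untouched.
    return "\n".join(pattern.sub("\t", line) for line in text.split("\n"))

row = "Name      Dose    Frequency"
print(repr(spaces_to_tabs(row)))

# The reverse direction (tabs back to spaces) is built into Python strings.
print(repr("a\tb".expandtabs(4)))
```

Note that with `min_run=2` a double space between sentences would also become a tab, so it is probably safer to fix those first or to raise the threshold.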

If the document you’re working with has used single returns for
linefeeds and double-returns for paragraphs, you can try using find-
and-replace in Word to prepare the file. Search for two returns in a
row, replace with @@@ or something like that. Then search for all
remaining returns, and replace with %%%. Finally, replace all
instances of @@@ with a single return, and leave the %%% in place.
Back in Freeway, you can find those %%%s and put a Shift-Return in
their place pretty easily.
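
The same three-step swap can be sketched outside Word, for example in Python on a plain-text export. This is only an illustration of the placeholder idea; the function name `mark_line_breaks` is invented here, and it assumes that @@@ and %%% do not already occur in the text.

```python
def mark_line_breaks(text: str, para_mark: str = "@@@", break_mark: str = "%%%") -> str:
    """Keep paragraph breaks as single returns and flag line breaks with a marker."""
    # Treat Windows (CRLF) and old-Mac (CR) line endings as plain newlines.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Step 1: two returns in a row are a paragraph break; park them as a placeholder.
    text = text.replace("\n\n", para_mark)
    # Step 2: any return still left is a line break; flag it with the second marker.
    text = text.replace("\n", break_mark)
    # Step 3: turn the paragraph placeholder back into a single return.
    return text.replace(para_mark, "\n")

sample = "Line one\nline two\n\nNew paragraph\nsecond line"
print(mark_line_breaks(sample))
```

The %%% markers are left in the output on purpose, so they can be found and replaced with a Shift-Return once the text is in Freeway.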

Walter

On May 14, 2009, at 7:27 AM, Peter Tucker wrote:

What can others suggest as the best workflow to deal with Word/formatted
text?



on 14/05/2009 14:06, Walter Lee Davis at email@hidden wrote:

My own method, fussy though it is, is to convert all incoming copy to
plain text, open it in BBEdit, copy from there, and paste into
Freeway. [...]

Thanks, Walter. I was hoping you’d jump in because I thought you might have
the best answer, although not the one I wanted to hear!

I too was “with you” in the early-days pain of QXP, and yes, I use all the
tricks: massaging the text before bringing it anywhere near FW or indeed any
other app.

One app that saved me a LOT of mundane work, particularly when Word was not
at the top of the heap, was Nife [yes, it was on the PC, now sadly defunct].
It was basically a command-line app in which you set up a series of batch
operations. You poked the text file in one end and out the other you got
something half reasonable. It was particularly useful with CORA typesetters
and long-document or directory-type publishing, where Ventura [from Xerox,
then eventually from Corel] was perfect; it was even briefly a Mac product.

Best wishes Peter

================================
Peter Tucker, Oxford UK email@hidden



Have a look around for TextSoap (I think). That’s a text re-formatter of very great mojo. A few bucks in shareware fees, if I recall correctly.

Walter



OK, I am having this issue and was directed to this thread. As I was fruitlessly trying more things, I tried again what was recommended on the previous thread, and it worked. Apparently I had missed something before. Create an HTML box, create a custom style with P having space before at 3px and space after at 3px, then paste in the text, and it works. The resume looks as it should, with no double spacing.



Dang... jumped the gun. It looks great in FW, but in the browser it is double-spaced.

I’m confused about the whole @@@ and %%% business: how are those implemented in Word? If I am in the find-and-replace area, Word says it can’t find those items. If I replace, say, ^p with @@@ or

as someone else had mentioned, it shows up in Freeway/browser.

I also downloaded the trial version of TextSoap; any tips on how to use it?

Can anyone explain more simply? I am trying to get a 3-page resume onto a web page. I could hand-type it, but I know that more instances like this are going to come up. Thanks for any insights!

