[Pro] Cleaning up text

Hi chaps, I have a new project and need to import loads of text in differing formats (PDFs, Word, etc.). I’ve tried pasting into TextEdit to remove formatting, but with some doc types the hard returns stay. I found an excellent little program called ‘Clean Text’ that does the trick, but it’s $25. Not a lot, but I only need to de-format some text … is there another way? Thanks one and all, Roger


freewaytalk mailing list
email@hidden
Update your subscriptions at:
http://freewaytalk.net/person/options

You could try TextWrangler from Bare Bones Software (now folded into BBEdit, and still free).

It’s pretty good, and free! You can view, search and replace hidden
formatting characters etc.

David Owen :: Freeway Friendly Web hosting and Domains ::

http://www.ineedwebhosting.co.uk :: I Need Web Hosting Mac friendly web hosting and domain registration
:: http://www.PrintlineAdvertising.co.uk

On 2 Apr 2009, at 16:23, Roger Burton wrote:

Hi chaps, I have a new project and need to import loads of text in
differing formats (PDFs, Word, etc.). I’ve tried pasting into TextEdit
to remove formatting, but with some doc types the hard returns stay.
I found an excellent little program called ‘Clean Text’ that does the
trick, but it’s $25. Not a lot, but I only need to de-format some
text … is there another way? Thanks one and all, Roger



Shouldn’t you be asking the questions “How much will I have to charge
the client to do it all ‘manually’?” and “Can he/she pay me another
hour on this job?” - the second of which would surely cover the cost
of an application you may well want to use again. And the answer to
the first might end up being more than an hour!

Colin




Thanks David, that was quick. I’ll check it out. And Colin, I agree: I am very rarely hesitant about buying good software if it adds to my ‘tool box’. This just seemed a little too much to pay to get rid of some ‘returns’. If I cannot find an alternative I will, of course, pay up. Thanks for your contribution. Regards, Roger (being a little careful with cash at the moment: apart from the ‘crunch’, I need to stump up for a new Adobe Suite before their ‘offer’ closes at the end of April!)



On 2 Apr 2009, at 16:47, Roger Burton wrote:

Thanks David, that was quick I’ll check it out [TextWrangler]

TextWrangler is also very good for tidying up code snippets - like
changing pointers to your own web server or e-mail. Definitely one to
have on board. As to the Adobe Suite, if I can keep the car going
another year, I might just be going for that upgrade myself.

Colin



I know what you mean, Colin. Check this out, though: Quidco 10% cashback … not to be sneezed at! Roger



I use TextEdit Plus for stuff like this, just because I am used to using it
and it works for me. It’s easy to find and replace returns, clean up text,
whatever. I’m sure TextWrangler is too. Both are free, though TE+ has an
option to pay as a thank-you.
all the best
Brian




All of that advice is, as ever, really helpful. Interestingly, I already had the Bare Bones software but had forgotten that I had it. However, when I remove line breaks I’m still left with something, some kind of return or paragraph break. I can see it if I turn on invisibles: it’s like a lower-case ‘n’ without the left-side ‘descender’. Any thoughts as to what this is and how I can remove it? I’ve tried search and replace but cannot work out what ‘hidden’ character it is.



That’s just how the line-feed character looks in BBEdit with the
invisibles on. I think of it as the “lazy L” character. You may be
seeing multiple line-breaks used instead of a paragraph.
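
If you would rather check from a script which kinds of line break a piece of text actually contains, something like this rough Python sketch might help. The function name count_line_breaks and the short labels are made up for illustration; they are not anything BBEdit or TextWrangler provides.

```python
import re
from collections import Counter

# Labels for the line-break characters most likely to turn up in pasted text.
BREAK_NAMES = {
    "\r\n": "CRLF",    # Windows-style pair
    "\r": "CR",        # carriage return
    "\n": "LF",        # line feed
    "\u2028": "LS",    # Unicode line separator (a "soft" line break)
    "\u2029": "PS",    # Unicode paragraph separator
}

def count_line_breaks(text: str) -> Counter:
    """Tally each kind of line break found in text (hypothetical helper)."""
    # CRLF is listed first so the pair is not counted as a separate CR and LF.
    found = re.findall(r"\r\n|\r|\n|\u2028|\u2029", text)
    return Counter(BREAK_NAMES[b] for b in found)

print(count_line_breaks("one\r\ntwo\nthree\rfour\n"))
```

When reading from a file, opening it with open(path, newline="") should keep Python from translating the line endings before you get to count them.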

One thing that might speed your text input is to remove all instances
of double-return and replace them with a single instance. Again with the
Grep option on in BBEdit / TextWrangler, the find would be \r+
(backslash-r-plus sign) and the replace would be \r (a single
backslash-r). Put those in and then press Replace All.
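
For anyone who prefers a script to the Find dialog, the same collapse can be sketched in Python. This is only an illustration under a couple of assumptions: the function name collapse_returns is invented, and the text is normalised to LF first because that is what Python strings usually carry, whereas the editor pattern above is written with \r.

```python
import re

def collapse_returns(text: str) -> str:
    """Replace every run of line breaks with a single one (hypothetical helper)."""
    # Assumption: treat CRLF and lone CR as ordinary LF line breaks.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Equivalent in spirit to finding \r+ and replacing with \r in the editor.
    return re.sub(r"\n+", "\n", text)

print(repr(collapse_returns("one\n\n\ntwo\r\n\r\nthree\n")))
```

Note that this flattens paragraph gaps as well, so it only suits text where every break should become a single return.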

If you have text that has forced line-breaks within paragraphs, you
will have to try something different, though. There is a particular
character that should be used for a line-break within a paragraph –
say if you are laying out an address block or similar – and that’s
the Option-Return. Unfortunately, Freeway converts those into full
paragraph returns on import, and I haven’t found a way around that.
You’ll have to work through them manually, although you can make it
easy to find them by replacing them in BBEdit with a known nonsense
string, like @@@ or something like that.

Do that in three steps. First, replace all real paragraphs (the ones
the typist made with double-returns) with some other nonsense: Find
\r{2,} (backslash-r-left curly brace-2-comma-right curly brace) and
replace with $$$ (assuming you don’t legitimately use that elsewhere).
Then find any single returns still standing: Find \r (backslash-r)
replace @@@. Last, find $$$ and replace with \r (backslash-r). Import
the text into Freeway, and then find @@@ and replace it (manually –
Freeway has no means of entering “control” characters in the replace
field) with an Option-Return.
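
The three-step swap can also be sketched as a small Python function, in case the text is being prepared outside the editor. The name mark_forced_breaks and the two constants are hypothetical, the markers are the same $$$ and @@@ nonsense strings, and the sketch assumes LF line breaks after normalising.

```python
import re

PARAGRAPH_MARK = "$$$"    # temporary stand-in for real paragraph breaks
LINE_BREAK_MARK = "@@@"   # left in the output to be fixed by hand in Freeway

def mark_forced_breaks(text: str) -> str:
    """Keep paragraph breaks, flag single returns with @@@ (hypothetical helper)."""
    # Assumption: treat CRLF and lone CR as ordinary LF line breaks.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # The swap only works if the nonsense strings are not already in the text.
    if PARAGRAPH_MARK in text or LINE_BREAK_MARK in text:
        raise ValueError("marker string already present in the text")
    # Step 1: runs of two or more returns are the real paragraphs.
    text = re.sub(r"\n{2,}", PARAGRAPH_MARK, text)
    # Step 2: any return still standing is a forced line break.
    text = text.replace("\n", LINE_BREAK_MARK)
    # Step 3: put a single return back where the paragraphs were.
    return text.replace(PARAGRAPH_MARK, "\n")

print(mark_forced_breaks("1 High St\nLondon\n\nNext paragraph"))
```

The check for existing markers is there because a genuine $$$ or @@@ in the source would be silently rewritten otherwise.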

I love love love BBEdit’s Grep search (and I think that TextWrangler
has it as well). A little time invested figuring out the correct
combination of magic characters and such will allow you to automate
these problems away.

Walter

On Apr 3, 2009, at 4:14 AM, Roger Burton wrote:

  • it’s like a lower case ‘n’ without the left side ‘descender’ any
    thoughts as to what this is and how I can remove it - I’ve tried
    search and replace but cannot work out what ‘hidden’ character it is ?
