I know there are some genius here present that can do miracles… And I do hope for a hint in the right direction for a little problem I have.
I migrate data from one database to another. Thanks to some tools that EE provide I can do that with XML files, if well translated from CSV files.
I do have access to the MySQL server, and I can export in several formats.
Now I extracted all the tables from the old server into a CSV file. Problem is only that the old website ( not made by me) used TinyMc so certain fields are cluttered with HTML tags. And it is a mess in there.
How can I extract the HTML without disturbing the structure of the CSV? I tried several things my way with TextWrangler, but after that did not work out well.
BTW translating a CSV to XML is not easy as well, I discovered. A lot of trial and error…
Do you have Sequel Pro? You can use that (on your Mac) to export the database as XML or CSV, assuming you can get it to connect to that database server. If all you have is the CSV, you could try importing the CSV into a new MySQL database on your Mac, and then export that as XML, again in the same application.
As far as converting the CSV in a text editor, you’re going to need to escape the HTML bits as CDATA blocks, not sure what the exact syntax is for that, but that’s where I would start hunting.
Walter
On Mar 20, 2013, at 6:54 AM, atelier wrote:
I know there are some genius here present that can do miracles… And I do hope for a hint in the right direction for a little problem I have.
Thank you. I’d never heard of Sequel Pro, but it sound to me as the right tool. I have MAMP installed, so I can test this locally in all safety first.
I do not understand the part with CDATA blocks, or how to handle that. http://www.w3schools.com/xml/xml_cdata.asp I am not at my Mac right now, shall post some of the HTML mark-up later, to show what I want to get rid of in some of the fields.
Thank you. I’d never heard of Sequel Pro, but it sound to me as the right tool. I have MAMP installed, so I can test this locally in all safety first.
I do not understand the part with CDATA blocks, or how to handle that. http://www.w3schools.com/xml/xml_cdata.asp I am not at my Mac right now, shall post some of the HTML mark-up later, to show what I want to get rid of in some of the fields.
Are you trying to strip all tags out, or are you trying to encode the data so you can upload it to the new server? If you strip the tags, you will be left with nothing but plain text (this may be your goal). Can you elaborate on your desired end-state?
Here’s how to strip all tags from a text file (note – this will strip XML and HTML tags, so do this on your CSV version).
In BBEdit or TextWrangler, open the text file and open the Search dialog. Check the “Grep” option on.
In the Search field, enter just this:
<[^>]+?>
In the Replace field, make sure there is nothing at all entered.
Click Replace All and marvel at the power of regular expressions.
Walter
On Mar 20, 2013, at 4:10 PM, atelier wrote:
Actually everything between < and > has to go. In the whole file.
Walter, sir, you are indeed a star. My goal is to strip all HTML entities. The formatting takes place at the frontend.
Thinking loud, one small problem remains, the double quotes which have to go as well. I guess I add them in the regulair expression search and replace?
I am afraid that I loose the structure of the CSV, not being able to transform it into XML ( my goal)
I must fiddle a bit with this, think of a stepped solution.
Be careful that you don’t remove those quotes too aggressively. Remember, they may be needed to preserve the CSV structure (if the content contains commas, it cannot be delimited by them unless fenced off by quotes).
I recommend that you fool around with Sequel Pro a bit. Make a dummy database table and put some varied content in it, then export as CSV. See what happens to it – some fields get quoted, others do not.
Walter
On Mar 20, 2013, at 5:37 PM, atelier wrote:
Walter, sir, you are indeed a star. My goal is to strip all HTML entities. The formatting takes place at the frontend.
Thinking loud, one small problem remains, the double quotes which have to go as well. I guess I add them in the regulair expression search and replace?
I am afraid that I loose the structure of the CSV, not being able to transform it into XML ( my goal)
I must fiddle a bit with this, think of a stepped solution.
Walter, it is amazing.
After having set up Sequel Pro and my Mamp, I imported the 2 sql dumps. No problems, two tables present.
Then I just (…) opened the sql dump right into TextWrangler, did the GREP thing, saved and imported again.
The formatting gone.
Flabbergasted. Though I did not do the big test, setting up a local EE build to see if the migration had succeeded, I do have hope I made a big first step!
Walter, sir, you are indeed a star. My goal is to strip all HTML
entities. The formatting takes place at the frontend.
Thinking loud, one small problem remains, the double quotes which
have to go as well. I guess I add them in the regulair expression
search and replace?
I am afraid that I loose the structure of the CSV, not being able to
transform it into XML ( my goal)
Remember that csv isn’t really a valid format. It’s not difficult to
make data that can’t be saved in it such that it can be reliably
extracted again. A data ‘cell’ that contains text containing double
quotes and commas in an unfurtunate arrangement can blow it.
David
–
David Ledger - Freelance Unix Sysadmin in the UK.
email@hidden www.ivdcs.co.uk
I think I managed deleting all of the HTML tags by working directly in the sql file. And I could import this back into the database. I’ll test it out, then see how the export as XML works.
I think you mentioned recently that you use Coda (version 2?), if so, then Coda’s built-in MySQL tool is a stripped-down version of Sequel Pro. I guess Panic struck a deal with Sequel Pro devs.
Yes, I mentioned Coda, as part of my plans to ‘change horses’. I am still in that process, so Coda has to be evaluated yet…Good you mentioned their MySQL tool, thanks!