Encoding nightmare

Hi,
I am having an encoding nightmare. I have a large CSV file which has been spewed out of an old FileMaker database. It comes to me in Western (Mac OS Roman). I am using PHP to read in the file and import it into a MySQL database.

The problem is that characters like the registered trademark symbol ® come out as an umlaut when the data is taken from the database and displayed on the page. The umlauts are present in the database itself, so the ® seems to be converted to ¨ somewhere during the import.

I’ve tried various forms of htmlentities() in PHP with no luck. There seem to be few options for htmlentities(), and none give the results I want.

There has to be a solution to this - any suggestions?


dynamo mailing list
email@hidden
Update your subscriptions at:
http://freewaytalk.net/person/options

You’re going to need to convert the character set from the input text to the format your database needs. Check the charset in your database, and then look at the iconv command for a really nice and neat way to translate.
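To illustrate what is probably happening, here is a small sketch. In Mac OS Roman the ® sign is the single byte 0xA8, and in ISO-8859-1 that same byte is the diaeresis ¨, which would explain the symptom if the file is being read as Latin-1 somewhere along the way. The charset name 'MACINTOSH' is an assumption about your iconv build (glibc and GNU libiconv accept it; other builds may use a different name).

```php
<?php
// Byte 0xA8 is the registered sign (®) in Mac OS Roman.
$raw = "\xA8";

// Decoded with the correct source charset, it becomes U+00AE (®).
// Assumption: this iconv build recognises the name 'MACINTOSH'.
$right = iconv('MACINTOSH', 'UTF-8', $raw);

// Decoded as ISO-8859-1 instead, the same byte becomes U+00A8 (¨).
$wrong = iconv('ISO-8859-1', 'UTF-8', $raw);

echo bin2hex($right), "\n"; // c2ae
echo bin2hex($wrong), "\n"; // c2a8
```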

What I would recommend is that you explicitly set the PHP script’s internal encoding to the format you want to insert into the database using mb_internal_encoding(), then use iconv() to do the actual conversion.
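A minimal sketch of that approach, under a few assumptions: the database expects UTF-8, your iconv build accepts the charset name 'MACINTOSH', and PHP is running in the default "C" locale (fgetcsv() is locale-sensitive with single-byte data). The helper name read_macroman_csv() is made up for this example.

```php
<?php
// Assumption: the MySQL table and connection use UTF-8.
mb_internal_encoding('UTF-8');

/**
 * Hypothetical helper: read a Mac OS Roman CSV file and
 * return its rows with every field converted to UTF-8.
 */
function read_macroman_csv(string $path): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $rows = [];
    while (($fields = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($fields === [null]) {
            continue; // fgetcsv() reports a blank line as [null]
        }
        // Convert each field from Mac OS Roman to UTF-8.
        $rows[] = array_map(
            fn ($field) => iconv('MACINTOSH', 'UTF-8', (string) $field),
            $fields
        );
    }

    fclose($handle);
    return $rows;
}
```

Each converted row can then be passed to your insert query. The database connection needs to be set to the same charset as well, otherwise the characters can be mangled again on the way in.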

There is another option, and that is to do it all in SQL, but I’ve never successfully done that so I can’t recommend it.

Walter



One other idea: since it’s all CSV, try opening it in BBEdit and converting it to another charset there. Save it as a new file in the encoding that your database expects, and then use phpMyAdmin to import the new CSV into the database.

Walter

