Ramblings

July 6, 2008

Getting rid of the dreaded “can’t encode” errors with Python and MySQL and SqlSoup

Filed under: automation, cool, dev, mysql, python, sqlalchemy, sqlsoup, tip, xls — michaelangela @ 10:47 pm

So I was getting problems inserting data that has the ‘™’ symbol in it. Brutal stuff. Couldn’t find a way around it. Did it by hand in the end. Nasty. The problem was that I was getting

UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2122' in position 12: ordinal not in range(256)

where the '\u2122' is the ‘™’ symbol. There were some others as well. Finally happened upon this post which, fortunately, cleared it up. To use this tip with SqlSoup, the connection string needs a couple of extra parameters.

Instead of

db = SqlSoup('mysql://user:password@host:port/database')

use

db = SqlSoup('mysql://user:password@host:port/database?charset=utf8&use_unicode=0')

The requirement of course is that the database tables are set up with the UTF-8 charset.

But now I can thrash data at will! Mwahahahaha! Sorry… but now I can pull in unicode data from an Excel file and throw it into a database with rediculous ease. OK not rediculous ease but wow this simplifies it.

[edit]I like to use iPython to do my data thrashing with SqlSoup and SqlAlchemy. If you use the second connection string with the charset and use_unicode options, don’t reset the db connection to the previous one without those. Once doing that I couldn’t set it back and had to exit and restart the session. Bummer.[/edit]

mysql unicode issues – sqlalchemy | Google Groups

a. have use_unicode=0, set charset=’utf8′ on the connection, AND

b. ensure you have
table options = {‘mysql_charset’: ‘utf8’} when creating tables, and
all should be well.

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a comment

Blog at WordPress.com.