Bug in FeedTools

I’ve been chasing a bug in Amethyst for over a week. Sometimes accented characters cause problems, sometimes they don’t. MySQL will often complain about a duplicate key in a multi-row INSERT. The INSERT is correct, but the duplicate key doesn’t appear in any of the INSERTs! With enough examples I figured out that each reported key is a prefix of a key from one of the INSERTs, cut off at the first accented character. Not all accented characters cause problems, though all accented characters on one feed do. I’ve been tracing the incoming data, starting with the data on the network and working my way through the system. While I was chatting with Greg Foster of the Consumers Union at the Lone Star Ruby Conference, the topic came up, and he mentioned that in spite of various feeds proclaiming that they are UTF-8, sometimes they contain Latin-1 characters. He said he had a conversion routine in Ruby he’d send me. Sure enough, I looked closer and there they were! Depending on what I used to look at them, they might render as expected, or as a ‘\361’ sequence.
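For illustration, here is a minimal sketch of the kind of conversion routine Greg described. It is not his code: the helper name `to_utf8` is hypothetical, and it assumes Ruby 1.9 or later (for `String#encode`) and that a string which fails UTF-8 validation is entirely Latin-1.

```ruby
# Hypothetical helper: return a UTF-8 string, treating any input that is
# not valid UTF-8 as Latin-1 (ISO-8859-1). Assumption: the whole string
# uses one encoding; mixed input is not handled.
def to_utf8(raw)
  text = raw.dup.force_encoding(Encoding::UTF_8)
  return text if text.valid_encoding?

  # Reinterpret the bytes as Latin-1 and transcode them to UTF-8.
  raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

latin1_bytes = "ma\xF1ana".b      # \xF1 is octal \361, Latin-1 for n-tilde
puts to_utf8(latin1_bytes)        # not valid UTF-8, so it gets transcoded
puts to_utf8("ma\u00F1ana")       # already valid UTF-8, returned as-is
```

The validity check is only a heuristic. Some Latin-1 byte sequences happen to be valid UTF-8 as well, so a routine like this can miss cases.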

Later in the conference I became impatient to fix the problem and Googled “Latin1 conversion UTF8 Ruby”. At the top of the list was “How-to fix ruby’s FeedTools latin-1 parsing”. There is a bug in FeedTools: it converts numeric HTML entities under 256 to Latin-1 characters instead of UTF-8 characters. The blog entry includes some code to monkey patch FeedTools to correct the problem. I dropped the code in, deleted the corrupt data, refreshed the feed, and voilà, problem fixed.
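To show what the fix amounts to, here is a small sketch of decoding numeric entities into UTF-8. It is neither the FeedTools source nor the patch from that blog entry; the helper name `decode_numeric_entities` is hypothetical, and it assumes every entity holds a valid code point.

```ruby
# Hypothetical helper, not FeedTools code: replace decimal (&#241;) and
# hexadecimal (&#xF1;) numeric entities with UTF-8 characters.
# Assumption: every entity holds a valid Unicode code point.
def decode_numeric_entities(text)
  text.gsub(/&#(x[0-9a-fA-F]+|\d+);/) do
    ref = Regexp.last_match(1)
    code = ref.start_with?("x") ? ref[1..-1].hex : ref.to_i
    [code].pack("U") # "U" packs the code point as a UTF-8 character
  end
end

decoded = decode_numeric_entities("ma&#241;ana")
puts decoded
puts decoded.bytes.length # 7: the n-tilde takes two bytes in UTF-8

# For contrast, 241.chr yields the single Latin-1 byte 0xF1, which is the
# kind of output the bug produces for entities under 256.
```

The key point is that code points from 128 through 255 take two bytes in UTF-8, so emitting them as single bytes leaves invalid UTF-8 in the stored text.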
