Archive for December, 2008

Amethyst and Fragment Caching

Monday, December 29th, 2008

Amethyst quickly grows to be a CPU hog, even for a single user. I’m now working my way through speed-ups. Database access has been tweaked a lot, yielding better than 10x speed-ups in certain areas. (I bypassed ActiveRecord and JOIN tables in favor of direct SQL, keeping the necessary data in the article to reconstruct the join on the fly. It works; I measured.) This was successful enough that rendering is now the bottleneck.

My first attempt at fragment caching was by feed (each with up to 25 unread articles visible). The speed-up was significant: a refresh was 3-4 times faster than the initial view. However, each channel is updated once an hour (1/12 of the channels every 5 minutes), so the fragment cache quickly grows stale and is totally out of date within an hour.

The next stab was caching article fragments. Articles can persist for months and only need to be rendered once until some action is taken (click-thru, vote up/down, or hide, i.e. mark as seen but not read). The article fragment cache grows stale much more slowly. However, there are 12 times as many articles as channels, and the speed-up is less impressive: refreshes are only 2 times faster.
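To make the idea concrete, here is a minimal sketch in plain Ruby of a per-article fragment cache that renders once and is dropped when the reader acts on the article. It is not Amethyst's code: the class ArticleFragmentCache, the helper render_article, and the "article/<id>" key format are all hypothetical names chosen for illustration.

```ruby
# Toy article-fragment cache: a plain Hash keyed by article id.
# ArticleFragmentCache, render_article, and the key format are hypothetical.
class ArticleFragmentCache
  attr_reader :renders

  def initialize
    @store = {}     # key => rendered HTML
    @renders = 0    # counts how often the expensive render actually ran
  end

  # Return the cached fragment, rendering (via the block) only on a miss.
  def fetch(article_id)
    key = "article/#{article_id}"
    @store.fetch(key) do
      @renders += 1
      @store[key] = yield
    end
  end

  # Call after a click-thru, vote, or hide so the next view re-renders.
  def expire(article_id)
    @store.delete("article/#{article_id}")
  end
end

def render_article(article)
  "<div class=\"article\">#{article[:title]} (#{article[:votes]} votes)</div>"
end

cache = ArticleFragmentCache.new
article = { id: 42, title: "Fragment caching", votes: 0 }

cache.fetch(article[:id]) { render_article(article) }  # miss: renders
cache.fetch(article[:id]) { render_article(article) }  # hit: served from the hash

article[:votes] += 1          # a vote changes what the fragment should show
cache.expire(article[:id])    # so drop the stale fragment
cache.fetch(article[:id]) { render_article(article) }  # miss again: re-renders
```

The point of the design is that the cost of rendering is paid per action on an article, not per page view.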

All in all, I’m sticking with the article fragment cache. (Note: all work has been done with the memory storage mechanism, essentially a hash.) I’ll be posting the details of how to cajole the 3 fragment caching mechanisms that don’t explicitly implement timestamp expiry into doing it anyway (mem_cache already does). I tried posting the whole mess earlier and WordPress was barfing on permission problems. Hopefully it can handle smaller bites.
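Until those details are posted, here is one way timestamp expiry could be bolted onto a hash-backed store, as a rough sketch in plain Ruby. This is my own illustration, not the approach promised above and not the Rails cache store API: ExpiringStore, its read and write methods, and the injectable clock are all assumptions.

```ruby
# Hypothetical wrapper that adds timestamp expiry to a Hash-backed store.
class ExpiringStore
  Entry = Struct.new(:value, :written_at)

  def initialize(ttl_seconds, clock = -> { Time.now })
    @ttl = ttl_seconds
    @clock = clock      # injectable so expiry can be exercised without sleeping
    @data = {}
  end

  # Store the value together with the time it was written.
  def write(key, value)
    @data[key] = Entry.new(value, @clock.call)
    value
  end

  # Returns nil for missing or expired entries; expired ones are deleted.
  def read(key)
    entry = @data[key]
    return nil unless entry
    if @clock.call - entry.written_at > @ttl
      @data.delete(key)
      return nil
    end
    entry.value
  end
end

now = Time.at(0)
store = ExpiringStore.new(3600, -> { now })   # one-hour lifetime, chosen arbitrarily
store.write("article/42", "<div>...</div>")
store.read("article/42")   # still fresh: returns the fragment
now += 3601
store.read("article/42")   # older than the lifetime: returns nil
```

Expiry here is lazy: a stale entry is only noticed and deleted when something tries to read it, so nothing needs to sweep the hash in the background.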

Testing Rails Apps and Off-line Indexing Search Engines

Tuesday, December 9th, 2008

For a variety of technical reasons, most of the full-text search engines available for Ruby on Rails do off-line indexing. (Changes to the indexed tables are added to a queue that is processed in a cron job, i.e., changes do not show up immediately in the indexes.) Examples of off-line indexing search engines are Xapian, Sphinx, and Hyper-Estraier. I think all three retrieve records for indexing directly from the database. This causes problems in testing.

To speed up testing, Rails does not commit any changes to the database made from individual tests. The lot is discarded in a rollback of a transaction started at the beginning of each test. This is fast, but programs outside the Rails stack do not see the changes, even after I jumped through hoops to run the index update program within a test.
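The visibility problem can be shown with a toy model in plain Ruby. This is not real database code and involves no Rails or search engine APIs: ToyDatabase and its methods are hypothetical, and the model assumes the indexer uses its own connection at an isolation level that hides uncommitted rows.

```ruby
# Toy model only, not real database code. ToyDatabase is hypothetical.
class ToyDatabase
  def initialize
    @committed = []   # rows every connection can see
    @pending = []     # rows written inside the open transaction
  end

  def insert(row)
    @pending << row
  end

  def commit
    @committed.concat(@pending)
    @pending.clear
  end

  def rollback
    @pending.clear
  end

  # What the connection that opened the transaction sees (the test itself).
  def rows_seen_by_test
    @committed + @pending
  end

  # What a separate process with its own connection sees (the indexer).
  def rows_seen_by_indexer
    @committed.dup
  end
end

db = ToyDatabase.new
db.insert("article created inside a test")
db.rows_seen_by_test      # includes the new row
db.rows_seen_by_indexer   # empty: nothing was committed
db.rollback               # end of test: the row is gone for everyone
```

Running the indexer from inside the test does not help, because it still reads through its own connection and only ever sees the committed rows.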

Loading the fixtures, indexing them, and then running the tests works if fixtures are all the tests search for. In MySQL, the statement “SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED” will work around the limitation, but it’s non-portable and I’d have to maintain some hacked third-party code. No thanks.