xhr != xml_http_request
August 14th, 2008
According to the docs, xhr is an alias for xml_http_request, a wrapper around the get, post, etc. test methods that flags a request as an AJAX-style request. However, in at least some versions of Rails (I’m using 2.1), the xhr alias is broken. Using it gives the following error: ArgumentError: wrong number of arguments (4 for 3). The solution is to use xml_http_request, details here.
When Optimization is No Longer Premature
August 13th, 2008
Premature optimization is the root of all programming evil.
– Donald E. Knuth
The latest change to Amethyst, dropping DSPAM and using a simple frequency-of-use scoring, is working well. Except it is slow. I’ve tweaked the database access from Ruby on Rails about as much as is feasible (mostly combining multiple updates into a single SQL statement, e.g. UPDATE tokens SET occurrences = occurrences + 1 WHERE id IN (1, 2, 3, 100, 1001, 1234)). By dropping down from ActiveRecord instances to SQL and pre-computing in the background where straightforward, I’ve sped up the Web/user interface to where the response is acceptable and the delays are where expected (i.e., searches but not click-throughs, up and down votes, etc.).
However, the fetches of the RSS feeds and updates of the database take a long time. When my laptop has been off or off-line for several hours, getting caught up is a problem. With DSPAM (written in C), it was possible to update in one big gulp. I’ve spread the update over four cron job invocations and it is still a problem. With all the code in Ruby, updates can take several minutes, frequently more than the five-minute interval between the cron jobs that do the updates. The result is multiple entries for the same blog story. I’ve added a lockfile to prevent multiple instances of the cron job, but there are still occasional conflicts and unexplained errors. And I would like not to wait 20 minutes for the updates to complete.
Since I know from experience with DSPAM that dropping into C will speed it up enough, and the current slowness is a problem, it seems to me that optimization is no longer premature. And the place to start seems to be the DSPAM database access code without the scoring code.
Updates as details emerge. The DSPAM code is a bit of a slog; it has more options than you can shake a stick at.
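As a rough sketch of the batching idea, the helper below builds one UPDATE for a whole list of token ids. The table and column names are taken from the example statement above; the method name batched_increment_sql is made up for illustration and is not Amethyst's actual code.

```ruby
# Build one UPDATE statement that bumps the counter for many tokens at once,
# instead of loading and saving one ActiveRecord object per token.
# batched_increment_sql is a hypothetical helper name.
def batched_increment_sql(ids)
  # Integer() raises ArgumentError on non-numeric strings, so stray text
  # never reaches the SQL.
  id_list = ids.map { |id| Integer(id) }
  return nil if id_list.empty? # nothing to update, so skip the query

  "UPDATE tokens SET occurrences = occurrences + 1 " \
    "WHERE id IN (#{id_list.join(', ')})"
end

puts batched_increment_sql([1, 2, 3, 100, 1001, 1234])
```

In a Rails app the resulting string could be handed to ActiveRecord::Base.connection.execute, which costs one round trip to the database instead of one per token.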
lockfile
August 7th, 2008
The change from DSPAM to a home-grown solution for ranking the RSS feed articles in Amethyst has generally been a change for the better. In spite of several bugs that skewed the statistics, it generally behaves as desired. Except it is much more CPU intensive. DSPAM is written in C and eats a fair amount of CPU time. Amethyst is entirely in Ruby and Ruby on Rails. When the laptop has been off for several hours, it really eats up the CPU cycles once it is up and reconnected to the Internet. So much so that I’ve had to spread the catch-up over 20 minutes instead of letting it do it all in one go. The reason is that the RSS feed refreshes run as a cronjob and don’t always complete in the 5 minutes between invocations. So there are multiple copies trying to update the database at once, leading to duplicates. Not good.
So I did a little searching around. There is a Ruby Gem, lockfile, for this and similar problems. It creates a file, the lockfile, and runs a program if the lockfile doesn’t already exist. It deletes the lockfile when the program completes. If the lockfile already exists, it can retry for a period of time or a number of retries. It is NFS filesystem safe and has rudimentary stale lockfile detection (based on the age of the lockfile).
So far it has blocked just one invocation of the RSS feed refresh cronjob and I’m not seeing the duplicate key errors I had been. So far, so good.
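The create-if-absent behaviour described above can be sketched with nothing but Ruby's standard library. This is not the lockfile gem's API: with_lockfile and the lock path are made-up names, and the sketch has none of the gem's retry, NFS-safety, or stale-lock handling.

```ruby
require 'tmpdir'

# Run the block only if the lockfile can be created.
# Returns true if the block ran, false if another run already holds the lock.
# with_lockfile is a hypothetical helper, not part of the lockfile gem.
def with_lockfile(path)
  begin
    # CREAT | EXCL means "create only if it does not exist yet", in one step,
    # so two cron runs on a local filesystem cannot both succeed.
    file = File.open(path, File::WRONLY | File::CREAT | File::EXCL)
  rescue Errno::EEXIST
    return false
  end

  begin
    file.puts(Process.pid) # record the owner, handy when debugging
    file.close
    yield
    true
  ensure
    file.close unless file.closed?
    File.delete(path) # release the lock even if the block raised
  end
end

# Hypothetical lock path for the feed refresh job.
lock = File.join(Dir.tmpdir, "amethyst_refresh_#{Process.pid}.lock")
ran = with_lockfile(lock) { puts 'refreshing feeds' }
puts "skipped, another refresh is running" unless ran
```

A cron job wrapped this way simply skips its turn when the previous run is still going, which is the behaviour that stops the duplicate inserts.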
Amethyst dumps DSPAM
July 30th, 2008
DSPAM does an excellent job of filtering spam out of my e-mail. I’ve been trying for two years to tweak it to do a good job of adaptive ranking of articles in RSS feeds. It hasn’t worked and now I’m trying a home-grown solution.
DSPAM does Bayesian classification (among several algorithms) and is tweaked for spam filtering. Part of the problem is that it does classification, a yes/no decision. I need ranking: this article is more interesting than that one. Basic mismatch. And it has been optimized for e-mail. It recognizes e-mail headers and bodies and treats them differently. Not needed and even detrimental. The result was wild jumps in the rankings of articles and occasional strange results, e.g., “Sponsored Link” articles had sunk to the bottom of the heap where I wanted them and stayed there for months. Suddenly they were scattered all through the rankings, and while I could downgrade individual items, new “Sponsored Link” articles continued to show up all over.
The new algorithm uses several ideas from DSPAM: bi-grams (word pairs as well as individual words) and the basic database structure (an article has many words/word pairs). Rather than decide ahead of time what makes a good scoring algorithm, the database stores all actions on an article and all words/word-pairs. The actions are:
- clicks through
- votes up – I am more interested in this article than its current ranking
- votes down – I am less interested in this article than its current ranking
- hide – stop showing this article (e.g. duplicates)
- expires – article fell off RSS feed without ever being read.
Each word/word-pair also records how many times it has occurred. The current scoring algorithm is:
score = sum((clicks + (ups - downs)/2) / occurrences) / (# of words/word-pairs)
This works fairly well. It doesn’t have the wild jumps on up/down voting an item and articles I truly have no interest in continue to cluster at or near the bottom of the rankings.
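A minimal sketch of the scoring formula above, assuming each word or word-pair carries its own click, up-vote, down-vote, and occurrence counts. The Token struct and its field names are stand-ins I made up for the database columns, and occurrences is assumed to be at least 1.

```ruby
# One entry per word or word-pair found in an article.
# The field names are hypothetical stand-ins for the real columns.
Token = Struct.new(:clicks, :ups, :downs, :occurrences)

# Average, over all tokens in the article, of
#   (clicks + (ups - downs) / 2) / occurrences
def article_score(tokens)
  return 0.0 if tokens.empty?

  total = tokens.sum do |t|
    # 2.0 forces floating-point division so half votes are not truncated.
    (t.clicks + (t.ups - t.downs) / 2.0) / t.occurrences
  end
  total / tokens.size
end

tokens = [
  Token.new(2, 1, 0, 4), # (2 + 0.5) / 4 = 0.625
  Token.new(0, 0, 2, 8)  # (0 - 1.0) / 8 = -0.125
]
puts article_score(tokens) # prints 0.25
```

Dividing by occurrences keeps very common words from dominating, and halving the vote difference means a single up or down vote nudges the score rather than swinging it.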
After there are several weeks of data, I will pick some stories throughout the rankings, give them scores, and then try various curve-fitting methods to find a better ranking algorithm.
Duty Cycle isn’t Panacea
July 22nd, 2008
I wrote Duty Cycle to throttle back CPU-intensive programs that boost my laptop’s temperatures beyond what I was comfortable with. It works fine for kernel builds and most other things I tried it on.
Lately I’ve been making some changes to Amethyst, a Ruby on Rails app, that require significant changes to the database: changing the primary key of some tables, merging all uppercase/lowercase versions of a word into a single record, etc. Guess what, some of the tables have a third of a million records and the changes take time and CPU power, i.e., the laptop heats up. So I killed the conversion program and restarted it with Duty Cycle’s default 50% duty cycle. The CPU usage dropped from 99% to 98%! Huh! Oh, the conversion program is being throttled by Duty Cycle, but it’s mostly making calls to the MySQL database server, which is doing most of the CPU-intensive work and is not being throttled. Cutting the duty cycle back to 5% still loads the system significantly, but the temperature stays under 70°C.
Save & Cancel Buttons in AJAX Forms
June 18th, 2008
More than one button on an HTML form is possible (e.g. Save and Cancel). The details are not obvious, but there are tutorials: Using multiple submit buttons on a single form and Multiple Submit Buttons in HTML. The key is using the same name attribute for each button but different value attribute values. The server can sort them out by value.
On AJAX forms, this doesn’t work. The browser sends all value attributes. Several Websites I use have more than one button on their AJAX forms, so it is possible. Digging around in them and watching the requests and responses with Firebug reveals some interesting things. Many of the Cancel buttons are purely client-side! No server request or response.
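On the server side, sorting the buttons out by value might look something like the sketch below. It assumes both buttons share the name “commit”, as the Save button in the markup later in this post does; form_action is a made-up helper, and in a Rails controller the hash would be params.

```ruby
# Decide what to do from the value of the shared "commit" parameter.
# form_action is a hypothetical helper name.
def form_action(params)
  case params['commit']
  when 'Save'   then :save
  when 'Cancel' then :cancel
  else               :unknown # no button value, or one we don't recognise
  end
end

puts form_action('commit' => 'Save')
puts form_action('commit' => 'Cancel')
```

Only the button that was actually clicked is submitted by a plain HTML form, which is why a single shared name is enough there.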
I’ve found out enough to do what I want, but this article is hardly complete. For the buttons after the first, use a different URL. A GET request would do what I want, so whether a POST would send the form contents is left as an exercise for the reader.
To build a browser side Cancel button for in-line (AJAX or remote) edit:
- Give the content to be edited some enclosing tag with a unique ID, e.g. “task_123”. The server should change the style to “display: none”.
- Insert the edit form after the element with ID “task_123”. Give it a unique ID, e.g., “form_task_123”. Firefox at least will let you open up multiple in-line edits.
- At the end of the form place tags like this:
<input class="save" value="Save" name="commit" type="submit" />
<input class="cancel" value="Cancel" onclick="Element.remove('form_task_123'); Element.show('task_123');" type="button" />
- Enjoy
Hope this is sufficient detail; if not, you can look at the details in my Ruby on Rails app TagFlow, hosted at RubyForge. Look at the inline_edit and inline_update actions in the tasks controller and the associated RJS templates.
Programming, Process, & Feedback
June 2nd, 2008
The original waterfall model of programming process had no feedback loops, i.e. “complete and correct” requirements were handed off to the architects who produced a “complete and correct” system architecture, etc. It quickly became obvious that the various documents were seldom completely “complete and correct”, so some rework was required. But the various process “gurus” still pronounce that “complete and correct” is doable and try to pretend that the various feedback loops wouldn’t be necessary if you would just do the process right. It’s pretty obvious that that emperor has no clothes.
Most of the agile processes elevate feedback to a first class part of the process. On-site customer or stakeholder representatives give feedback, unit tests give feedback, if pair programming is used, programmers give each other feedback, etc. Feedback is no longer reluctantly allowed in to accommodate imperfect programmers but is encouraged, built into the process, and pushed to as early as possible. Testing can begin as soon as there is executable code.
Installing Garnet VM apps via Web Server.
May 19th, 2008
The recommended way to install applications in the Garnet VM (Palm Pilot virtual machine) is with an SD card. I don’t have one, am cheap and don’t want to buy one, and don’t have an SD card reader. So I tried another way that works: downloading the application files (.prc and .pdb files) from a Web server. In my case I just downloaded them from a Web site such as Freeware Palm, uncompressed or unzipped as needed, and moved the files to the public directory of a Web server on my laptop. From there I just plugged the URL into the browser on the Nokia and downloaded them. The installation in Garnet VM is just like applications installed via an SD card, just from a different location in the Nokia.
Nokia N810 and Garnet VM
May 19th, 2008
I was a bit surprised how few applications there are for the N810. My plan was to replace an aging Handspring Visor. There just aren’t the apps to do that yet. I’m going to write some Real Soon Now, just as soon as I wrap up some Ruby on Rails app releases.
There is a Palm Pilot Virtual Machine, Garnet VM. It is in beta and supports only some Palm OS applications. I am digging through the apps I have on the Handspring, but most won’t install. So I’m trying to find free/cheap equivalents that will. So far Teapot seems to be an acceptable replacement for Toast Timer to time meditation sessions and watering sessions.