A crafty, nommy, occasionally geeky blog-thing.

Archivey Goodness

With the unveiling of Google’s Gmail, everyone’s a-buzz with how wonderful it will be to store all of their email in one place (1G? C’mon… I’ve got mailspools five times that size =P), and more realistically, how marvey it will be to apply Google’s magic algorithms to their spools, in all their glorious indexing goodness.

Yeah, I’d probably buy that.

Well, not with actual money, but it does remind me: I still need to find the best way to manage those mail spools and make them useful again (grepping large text files isn’t my idea of fun). I know I want something with an SQL back-end. Data that’s stored in a relational database is intrinsically more useful, and easier (and faster) to manipulate, than data which is not. Simple as that. With MySQL now able to index blobs, and capable of doing full text indexing, archiving message bodies suddenly sounds a whole lot more possible.

Web-based is also preferable. If I can find a good CLI tool which already does what I want and can run in my shell, then I’d bite. But somehow, someway, I need it to be accessible wherever I go. Data which is tied to my desktop is neither interesting nor useful.

I’ve spent the past week playing with Zoe, which promises to be everything that I want. It is Java-based, and the hit to the system became apparent after running it for a couple of days. But it was bearable.

Some things about Zoe I really liked. It aggregates and catalogs just about anything. Email messages, RSS feeds, usenet posts, any computer document that you want to throw at it, it will add to its catalogue.

Like blosxom, a number of its filters can be intuitively requested through the URL. Requesting items from a particular day involves tacking /$year/$month/$day to the end of the URL. From a particular user? /address/$tld/$domain/$user
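To make that URL scheme concrete, here is a minimal Python sketch of how a front-end might decode those two path shapes. This is not Zoe's code; `parse_filter` is a made-up helper, and the idea that /address/$tld/$domain/$user maps back to user@domain.tld is my own assumption about what the segments mean.

```python
from datetime import date


def parse_filter(path):
    """Decode a filter path into a small dict (hypothetical helper, not Zoe's API)."""
    parts = [p for p in path.split("/") if p]

    # /address/$tld/$domain/$user -> assumed to mean user@domain.tld
    if len(parts) == 4 and parts[0] == "address":
        _, tld, domain, user = parts
        return {"kind": "address", "address": f"{user}@{domain}.{tld}"}

    # /$year/$month/$day -> a single calendar day
    if len(parts) == 3 and all(p.isdigit() for p in parts):
        year, month, day = (int(p) for p in parts)
        return {"kind": "day", "date": date(year, month, day)}

    raise ValueError(f"unrecognised filter path: {path!r}")
```

Calling `parse_filter("/2004/04/12")` gives back a day filter, and anything that fits neither shape raises `ValueError` rather than guessing.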

There’s a sidebar, which mines, sorts, and displays various information from the current result set. The sidebar includes a quicklink to everybody who’s contributed to the result set (i.e., sent one of the email messages currently showing), any mailing lists, attachments, and links contained in the message bodies. That’s a feature which I wouldn’t have thought of, but which I found myself using quite a bit.

OTOH, Zoe has some frustrating limitations which make it virtually unusable for me. One of the most basic requirements I have for any email-specific search tool is the ability to do header-specific queries. I don’t need all headers available to me. But being able to pull up all of the messages sent to a particular address? Or with certain keywords in the subject line? All messages from a particular mailing list? Seems simple enough, but when I asked about this on the Zoe mailing list, I was informed that Zoe doesn’t work that way. More, that I shouldn’t need to have access to those headers. What search could I possibly want to do, that couldn’t be resolved simply by general keyword? Lots, I can think of. I couldn’t even figure out how to pull up all of the messages within a given date range. One day? Sure. An entire month? Forget it.
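For the record, this is the sort of query I have in mind, sketched in Python. Everything here is an assumption for illustration: the `messages` table and its columns are invented, and I'm using the standard library's SQLite as a stand-in for MySQL so the example runs anywhere.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical messages table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE messages (
               id INTEGER PRIMARY KEY,
               sender TEXT, recipient TEXT, subject TEXT, list_id TEXT,
               sent_at TEXT  -- ISO 8601 date, so plain string comparison orders correctly
           )"""
    )
    return db


def search(db, recipient=None, subject_word=None, list_id=None, start=None, end=None):
    """Combine whichever header filters were given into one parameterised query."""
    clauses, params = [], []
    if recipient:
        clauses.append("recipient = ?")
        params.append(recipient)
    if subject_word:
        clauses.append("subject LIKE ?")
        params.append(f"%{subject_word}%")
    if list_id:
        clauses.append("list_id = ?")
        params.append(list_id)
    if start:
        clauses.append("sent_at >= ?")  # inclusive lower bound
        params.append(start)
    if end:
        clauses.append("sent_at < ?")   # exclusive upper bound
        params.append(end)
    where = " AND ".join(clauses) or "1"
    sql = f"SELECT id, subject FROM messages WHERE {where} ORDER BY sent_at"
    return db.execute(sql, params).fetchall()
```

An entire month is then just `search(db, start="2004-04-01", end="2004-05-01")`, and it stacks with the other filters, e.g. everything one list sent during April.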

Which is a shame.

Back at square one, I’ve tried playing with pronto for a bit, which is a Perl-based POP client which stores messages in a MySQL database. It’s a bit chunky, and is tied to the desktop, but gives me ideas. With pronto already handling the first hurdle (punching message headers, attachments, and bodies into MySQL), why not write my own web-based front-end? I don’t require any of the sendmail functions; I’m not looking for a webmail client. I just want an easy way to browse and search all of the data once it’s punched into the database.
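If I ever had to do the punching-in step myself, a rough Python sketch might look like the following. To be clear, this is not how pronto does it (pronto is Perl, and I haven't looked at its schema); the table layout and the `store_message` helper are made up, attachments are ignored, and SQLite again stands in for MySQL.

```python
import sqlite3
from email import message_from_string, policy
from email.utils import parsedate_to_datetime

# Hypothetical schema: one row per message, a few headers broken out into columns.
SCHEMA = """CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY,
    sender TEXT, recipient TEXT, subject TEXT, sent_at TEXT, body TEXT
)"""


def _header(msg, name):
    """Return a header as a plain string, or None if it is missing."""
    value = msg[name]
    return str(value) if value is not None else None


def store_message(db, raw):
    """Parse one raw RFC 822 message and insert it; returns the new row id."""
    db.execute(SCHEMA)
    msg = message_from_string(raw, policy=policy.default)

    # Prefer the plain-text part; fall back to an empty body if there is none.
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content() if part is not None else ""

    # Normalise the Date header to ISO 8601 so it can be compared and sorted.
    date_header = _header(msg, "Date")
    sent_at = parsedate_to_datetime(date_header).isoformat() if date_header else None

    cur = db.execute(
        "INSERT INTO messages (sender, recipient, subject, sent_at, body) "
        "VALUES (?, ?, ?, ?, ?)",
        (_header(msg, "From"), _header(msg, "To"), _header(msg, "Subject"), sent_at, body),
    )
    return cur.lastrowid
```

Once the headers live in their own columns, the browsing and searching front-end is mostly a matter of building `WHERE` clauses.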

Something like rss2email could be a quick and dirty way to get my feeds in there as well.