Cold Hard Code

September 2008 Archives

Free Trials Done Wrong.

By J. Shirley on September 15, 2008 9:55 AM |
Comments welcome

A fairly frequent subject of discussion that comes up in blogs and news feeds is piracy: piracy of games and software, the justifications for it, and so on.

A common response from people justifying piracy is that it is often easier and better to grab a cracked version than some ridiculously limited and crippled demo version.  With this, I wholly agree.

I grabbed a piece of software that initially prompted me to register and had a convenient "Not Now" button.  I promptly clicked that button, as I'm not going to spend $40 to see if a piece of software works.

30 minutes later, it tells me, "Oh, your 30 minutes is up so I'm going to quit now."  Application state gone, only option is to buy the product.  This is a bad experience.

These are the types of behaviors that make it easy to justify pirating software.  It's cool to nag.  It's cool to have a trial period of the full version (thank you TextMate).  The idea of exiting after 30 minutes is ridiculous.

So instead, they lost any chance of ever getting me as a customer, for this product and likely for anything else they make.  If they pay so little attention to user experience, it certainly isn't an app I would grow to rely on.

Please, if you are working on a desktop application and want to let users try it, then do that.  Let them try it as if they own it.  Just remind them that they don't.  Intruding on their workflow is a straight trip to the garbage.



A review of Varnish.

By J. Shirley on September 13, 2008 2:35 PM |
Comments welcome

After several respected peers mentioned Varnish, I decided to give it a try.  The reasons are fairly complex, so here's the story from the top.

The way our MogileFS cluster works here (and I'm just about finished with the new front-end media management code, so stay tuned next week) is quite simple. 

To start with, when designing it, I threw out the idea that we should pregenerate any of the image sizes/formats.  Instead, we simply filter and scale to a max size (configurable, but currently set to 1280x1024) and then throw out all the metadata (EXIF profiles, etc.) that we can.  This drastically reduces the size on disk, and spitting out resized images doesn't take long (I haven't finished stress testing this yet, but I'll post when I do).
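As a rough sketch of the "scale to a max size" step, here is only the dimension arithmetic, with no imaging library involved. The function name `fit_within` and the choice never to upscale smaller images are assumptions made for illustration; the post does not say how small originals are handled.

```python
def fit_within(width, height, max_width=1280, max_height=1024):
    """Return (width, height) scaled down to fit the box, keeping the aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    # Assumption: images already inside the box are left alone (no upscaling),
    # so the scale factor is capped at 1.0.
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(2560, 2048))  # both sides halved to fit 1280x1024
print(fit_within(800, 600))    # already fits, returned unchanged
```

Taking the smaller of the two ratios is what keeps both sides inside the box without distorting the image.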

So, now I have files stored in the MogileFS cluster that are stripped-down versions of the originals.  For the sake of simplicity, just think of those as the originals (if I were running something like SmugMug, I would store each image at its original size, but then extract the metadata into a separate data node).

Now, when a request comes in for a media asset, we just enforce that it follows a specific convention.  In our case, the URL looks like:

http://static.cartionary.com/images/{uuid}/{mutations}.{ext}
For now, we just serve images, so that part is easy to deal with.  The UUID is the unique identifier that we use to store the image and {mutations} is any series of programmatic mutations.
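A minimal sketch of how a request path following that convention could be split apart. It assumes the UUID is written in the usual 36-character hyphenated form and that the mutations part contains no dots or slashes; neither detail is stated in the post, and `parse_asset_path` is a made-up name.

```python
import re
import uuid

# Assumed path layout, taken from the URL convention above:
#   /images/{uuid}/{mutations}.{ext}
ASSET_PATH = re.compile(
    r"^/images/(?P<uuid>[0-9a-fA-F-]{36})/(?P<mutations>[^/.]+)\.(?P<ext>[A-Za-z0-9]+)$"
)


def parse_asset_path(path):
    """Return (uuid, mutations, ext), or None if the path breaks the convention."""
    match = ASSET_PATH.match(path)
    if match is None:
        return None
    try:
        # uuid.UUID rejects strings that only look like a UUID.
        asset_id = str(uuid.UUID(match.group("uuid")))
    except ValueError:
        return None
    return asset_id, match.group("mutations"), match.group("ext").lower()


print(parse_asset_path("/images/123e4567-e89b-12d3-a456-426614174000/s=s.jpg"))
print(parse_asset_path("/images/not-a-uuid/original.jpg"))  # None
```

Anything that returns None here can be answered with a 404 before the storage layer is ever consulted.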

If the {uuid} isn't known to the system (the test is if MogileFS knows the key), it returns 404.

If it is, the system then parses {mutations}, which can also simply be 'original', in which case it returns the original copy (or rather, the raw data stored in MogileFS).  Mutations are a format that I've simply concocted to determine rotation, scaling and anything else we can come up with.  For example: "s=s" means "size=small" (other sizes are "m", "l", "xl").
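The post only shows a single mutation ("s=s") and the special value 'original', so the full grammar is unknown. The sketch below assumes, purely for illustration, that several key=value pairs are joined with commas; the separator, the `r` key used in the example and the name `parse_mutations` are all hypothetical.

```python
VALID_SIZES = {"s", "m", "l", "xl"}  # size codes named in the post


def parse_mutations(spec, separator=","):
    """Turn a mutation string into a dict; 'original' means no changes at all."""
    if spec == "original":
        return {}
    mutations = {}
    for part in spec.split(separator):
        key, sep, value = part.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"malformed mutation: {part!r}")
        # "s" is the size key from the post; other keys pass through unchecked.
        if key == "s" and value not in VALID_SIZES:
            raise ValueError(f"unknown size: {value!r}")
        mutations[key] = value
    return mutations


print(parse_mutations("s=s"))        # {'s': 's'}
print(parse_mutations("original"))   # {}
print(parse_mutations("s=xl,r=90"))  # 'r' is a made-up rotation key
```

Rejecting unknown sizes early matters for caching: every distinct mutation string is a distinct URL, so accepting arbitrary values would let clients fill the cache with junk variants.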

Combining that upstream with as permanent a cache as we can gives us better results, because we don't have to generate thumbnails that may never be used.  They're generated on demand, and then cached on our proxy for as long as we can reasonably keep them.
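The generate-on-demand, then cache flow can be modelled in a few lines. This is a toy: a plain dict stands in for MogileFS, another dict stands in for the caching proxy, and `render` is a hypothetical callable that would do the real image work. In the real setup the cache lookup happens in the proxy, before the request reaches the application at all.

```python
class OnDemandImages:
    """Toy read-through cache: render a variant on first request, reuse it afterwards."""

    def __init__(self, store, render):
        self.store = store    # stands in for MogileFS: uuid -> stored bytes
        self.render = render  # hypothetical callable(raw_bytes, mutations) -> bytes
        self.cache = {}       # stands in for the caching proxy
        self.renders = 0      # how many times a variant was actually generated

    def get(self, asset_id, mutations):
        key = (asset_id, mutations)
        if key in self.cache:
            return 200, self.cache[key]
        raw = self.store.get(asset_id)
        if raw is None:
            return 404, b""   # unknown key: nothing to render or cache
        if mutations == "original":
            body = raw
        else:
            body = self.render(raw, mutations)
            self.renders += 1
        self.cache[key] = body
        return 200, body


images = OnDemandImages({"abc": b"raw"}, lambda raw, m: raw + b"|" + m.encode())
images.get("abc", "s=s")
images.get("abc", "s=s")
print(images.renders)  # 1: the second request was served from the cache
```

Variants nobody asks for are never rendered, which is the whole point of skipping pregeneration.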

The reason why I went with Varnish, over lighttpd+mod_cache or Squid, is simply that Varnish (via the varnishadm command) allows you to purge cached items by regular expression.  That means that if we want to delete an image from the entire cluster, we just have to delete it from MogileFS and then issue:
varnishadm -T :6082 url.purge {uuid}
Done!  All requests for that UUID will end up as a 404.
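To see why purging by pattern is so convenient here, consider a toy model in which the cache is just a dict keyed by URL. This says nothing about how Varnish works internally; `purge_matching` is a made-up helper that only illustrates the effect of one regex removing every variant of an image.

```python
import re


def purge_matching(cache, pattern):
    """Remove every cached URL the regular expression matches; return what was removed."""
    regex = re.compile(pattern)
    doomed = [url for url in cache if regex.search(url)]
    for url in doomed:
        del cache[url]
    return doomed


cache = {
    "/images/aaa/s=s.jpg": b"...",
    "/images/aaa/s=l.jpg": b"...",
    "/images/bbb/s=s.jpg": b"...",
}
# One call drops all variants of "aaa" without listing them individually.
purge_matching(cache, re.escape("/images/aaa/"))
print(list(cache))  # ['/images/bbb/s=s.jpg']
```

The caller never needs a list of which sizes or rotations were ever generated for the image, only its identifier.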

To be more specific, that's why I picked something that doesn't rely mostly on HTTP PURGE: with PURGE, we'd have to know the URL of every image ever generated (that's a lot).  This is also why I decided to do away with the idea of generating permanent thumbnails and other modified images.

Every time you need to do some operation, or make some change, you're stuck with however many user-contributed images * number of generated images.  It's costly, silly, and most of those images will sit dormant except for the occasional request.  CPU time is really, really cheap.  Disk space, too.  So make your caching huge and your media cluster fast.

The decision really comes down to the fact that I'd rather have the CPU deal with on-demand requests, and have the user wait an extra second or two, than have a developer get "creative".  The reality of it is that a user, in most cases, won't have to wait.

When an image is uploaded, have a job (or preload it in the resulting HTML displayed to the user) that asks the caching cluster for the expected image sizes (small, medium, large).  They're then saved until the cluster runs out of disk space, at which point the least recently used items fall off and won't be missed.
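The "least recently used items fall off" behaviour can be sketched with an ordered dict and a byte budget. This is an illustration of the eviction policy only, not of the proxy's actual storage; `LRUCache`, `max_bytes` and the example URLs are invented for the sketch.

```python
from collections import OrderedDict


class LRUCache:
    """Byte-budgeted cache that evicts the least recently used entries first."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self.items = OrderedDict()  # oldest entry first, most recent last

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # a hit makes the entry "recent" again
        return self.items[key]

    def put(self, key, body):
        if key in self.items:
            self.used -= len(self.items.pop(key))
        self.items[key] = body
        self.used += len(body)
        # Out of space: drop from the cold end until we fit again.
        while self.used > self.max_bytes and self.items:
            _, evicted = self.items.popitem(last=False)
            self.used -= len(evicted)


cache = LRUCache(max_bytes=10)
# Warm the cache right after an upload with the sizes users will ask for first.
for size in ("s", "m", "l"):
    cache.put(f"/images/abc/s={size}.jpg", b"xxxx")
print(list(cache.items))  # the small variant was oldest and has been evicted
```

With a budget of 10 bytes and three 4-byte entries, the third insert pushes out the oldest one, which is exactly the "won't be missed" case when the evicted item is rarely requested.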

I'm happy to be building this system in a day and age where I can make that decision and expect it to work out well.  I've dealt with two other high traffic photo-centric organizations, and I experienced the pain of not doing it this way.

I'm eager to experience the pain of doing it my way now.


Using the right tool for the job.

By J. Shirley on September 4, 2008 3:01 PM |
Comments welcome

If you haven't read Zed Shaw's C2I2 Hypothesis, you should really do so now.  I mean it, go read it.  I'd like you to come back here but if you don't that's ok, I'll understand.

The main gist is properly identifying just what type of person you are working with, and to a more significant degree, what type of person you are (which is variable).  I have my own thoughts on the specifics that he lists that I'll likely write up later, but for now just reading it is a good start to continue my current thoughts.

I think that it is equally important to understand the tools you are using.  It seems a lot of people use a specific tool because they don't really think about other tools.  If a problem looks like a nail, they use a hammer.  While I hate metaphors when they're not necessary, I think the nail/hammer metaphor does work for this case.

What ends up happening is the "When all you have is a hammer, every problem looks like a nail" syndrome.  Conversely, if you have a vast array of tools at your disposal, you are going to over-complicate problems.  There is a very important balance that must be maintained.

If I had to pick between over-engineering a problem and simply driving a nail into a plank of wood and calling it a deck, I'd over-engineer it every time.  This comes from years of experience, and several contracts that seemed to extend in two-week intervals, where I was expecting to find another gig every two weeks.  One of those went on for 2 years, adding two weeks' worth of "enhancements" at a time.  It was all hammer and planks of wood.  Absolutely terrible.

The reason why I'm writing this is that a discussion popped up where I expressed my confusion at people using plain vanilla mod_perl and then being frustrated with testability and other nuances.  Someone pointed out that mod_perl is a Perl runtime inside of Apache that gives you programmatic hooks into Apache.

Fantastic.  However, that doesn't enter into what my point was.  People use mod_perl because that is the only tool they know.  I believe this is the case with most PHP-based engineering shops.  They use PHP because that is the tool they, or the managers, know.

The problem with this mentality is that by not knowing the tools, you are doomed to work within the confines of what you know.  While the skillset and knowledge may evolve, it won't have a renaissance or giant evolutionary leap.

This is the key reason for learning other toolsets and libraries that are available.  It isn't about jumping to trendy and shiny technology, it is about defining your own capabilities in relative scope to what other technology is around.

It's very easy to spot this type of individual, which I view as a blessing.  If a person ever argues against a tool (or rather, using a tool in place of something that they know) you can count on them being very limited in their ability to accurately scope and understand a problem.  They will mutate the problem to be solved with the tools at their disposal.

In the context of Zed's C2I2 hypothesis, I've been struggling to really distinguish between a collaborator and an implementer.  They were very much the same in my mind.  Thinking about the usage and understanding of tools has helped clear this up for me.

A collaborator doesn't know the details of the tools.  They follow cookbook recipes.  Contrast that to an implementer, who still follows cookbook recipes but understands the tools that are being used.

The inventor, of course, is usually the one building the tools.  If a tool doesn't fit an implementer's agenda, they'll either move on or smash at it until it works.  An inventor will go in and fix the tool.  A collaborator will be that guy on the mailing list you wish would go away, and will probably be the first in line to argue against other tools they've never used.
