Monday 15 August 2011

Reliable, High Performance JS Error Tracking

At first glance, errorception seems to be demanding a lot. Firstly, you have to add a script tag on your page, a tag given by me - some third-party indie developer. Secondly, the script tag has to be placed in the head of the page. That's just ridiculous!

It should bother you that there's such a requirement. If you don't know why, I'll quickly summarize:
  • A number of things could go wrong when fetching the script from our servers. What if our servers are down? What if the DNS can't be resolved? Are you going to hold your page-load hostage just because you want to track errors?
  • Errorception could be malicious. We could be in your pages, messing with your pages. (This is a huge security issue, though I won't be addressing it in this post.)
  • Anyone familiar with YSlow! or PageSpeed will tell you that the head tag is the worst place for putting a script tag, because of the performance penalties associated with blocking network requests for scripts.
How audacious of me to ask you to do this!

The design goal from the outset for errorception has been exceptional high performance. You should have zero impact from errorception on account of any possible network latency between your users and our server, and zero impact if our servers go down or are otherwise unreachable. Yes, you read that right - ZERO impact.

Here's a quick outline of how our script snippet works:
  1. The first thing it does is create a client-side local queue for processing errors. Any error that occurs on the page is pushed into this queue. This is done completely inline on your page, so there's no network request and associated performance latency at all.
  2. Next, it waits for page load to occur. Only after page load does it inject our external script asynchronously into the page which then processes this queue.
This gives us huge advantages:
  • We can trap errors very early in the page cycle. Even before our own script has loaded from over the network!
  • Since we do not introduce any network latency in the process of loading the page, your page load time has no impact whatsoever.
  • Your scripts will typically kick in far before the page loads. If you're using $.ready or other similar functions in your favourite library, chances are you are using DOMContentLoaded. This means that your scripts would get a head-start for execution, while the errorception script's network request wouldn't even have gone out yet! I have completely eliminated the possibility that the script could interfere with your page load time. The script could even fail completely, and it will have no impact at all.

Put another way, I simply cannot mess with your page's performance characteristics. It's impossible for me to do so. Even if errorception's servers crash and burn, it cannot affect your page performance. The need for trust has been eliminated. Errorception is fast because it proactively gets out of the way. Errorception is reliable because it cannot impact your site negatively. That's the way it's designed.

There are very few third-party tracking scripts that do this. Google Analytics comes the closest with the asynchronous tracking script. And even then, they only help you in your rendering speed. Your window.onload is still at Google's servers' mercy. I can't believe there are so few people doing this right - honestly, it's not even that hard!

There is a minor disadvantage to this approach though. It's possible that your users will go click-happy on your page, and they might be navigating between pages before page-load is fired. In that case the local queue wouldn't be flushed to errorception's server, essentially meaning that we wont be able to track such errors. I'm fully aware of this, and took the call that that was fine. The philosophy in three words is: "Performance over accuracy". I'd err on the side of not recording data, rather than doing anything to hamper your site's performance.

For those who are interested, the following is the extra-verbose version of the tracking script.



Don't worry that it looks too big. It actually compresses very well (298 bytes gzipped):


Thursday 11 August 2011

Introducing Errorception

I've been a client-side JavaScript developer for most of my professional life. Being a JS developer is very different from being a server-side web developer, and most people don't understand or appreciate the difference. This post is about one of those differences.

In sharp contrast to server-side setups, my JS doesn't run in controlled environments - I don't have any control on either the runtime or the OS, or for that matter almost any aspect of how the application will run. It's code running literally in the wild.

Several times, users call in to say that "the website doesn't work", and I end up having no idea why. I have no insight into the end-user's machine or browser. It's like debugging with a blindfold on. It's beyond painful. It worries me that for every user who called in, there might be many more who didn't. At best (if you can call it that) it's a horrid user-experience; at worst it's lost revenue. Dayam!

It doesn't have to be that way. Errorception aims to solve a simple problem: Get some insight into what is happening at the user's end when an error occurs. Gather as much data as possible to describe the state of the runtime when your application encountered the error. And do this with a strong philosophical stand: that any performance hit is simply not tolerable, whether run-time or load-time.

As a teaser, here's a screenshot of the errorception admin UI on my current dev box. It's still rather early, and things are likely to change. If you have any thoughts, I'd love to hear from you.


If you've signed up for early access at errorception.com, I'll be sending you invites in a couple of days, and you'll get to use the app for free for a limited period. If you haven't signed up, I urge you to, so that can get priority access.