Monday 15 August 2011

Reliable, High Performance JS Error Tracking

At first glance, errorception seems to be demanding a lot. Firstly, you have to add a script tag on your page, a tag given by me - some third-party indie developer. Secondly, the script tag has to be placed in the head of the page. That's just ridiculous!

It should bother you that there's such a requirement. If you don't know why, I'll quickly summarize:
  • A number of things could go wrong when fetching the script from our servers. What if our servers are down? What if the DNS can't be resolved? Are you going to hold your page-load hostage just because you want to track errors?
  • Errorception could be malicious. We could be in your pages, messing with your pages. (This is a huge security issue, though I won't be addressing it in this post.)
  • Anyone familiar with YSlow! or PageSpeed will tell you that the head tag is the worst place for putting a script tag, because of the performance penalties associated with blocking network requests for scripts.
How audacious of me to ask you to do this!

The design goal from the outset for errorception has been exceptionally high performance. Network latency between your users and our server should have zero impact on your page, and so should our servers going down or being otherwise unreachable. Yes, you read that right - ZERO impact.

Here's a quick outline of how our script snippet works:
  1. The first thing it does is create a client-side local queue for processing errors. Any error that occurs on the page is pushed into this queue. This is done completely inline on your page, so there's no network request and associated performance latency at all.
  2. Next, it waits for page load to occur. Only after page load does it inject our external script asynchronously into the page which then processes this queue.
This gives us huge advantages:
  • We can trap errors very early in the page cycle. Even before our own script has loaded from over the network!
  • Since we do not introduce any network latency in the process of loading the page, there is no impact whatsoever on your page load time.
  • Your scripts will typically kick in far before the page loads. If you're using $.ready or other similar functions in your favourite library, chances are you are using DOMContentLoaded. This means that your scripts would get a head-start for execution, while the errorception script's network request wouldn't even have gone out yet! I have completely eliminated the possibility that the script could interfere with your page load time. The script could even fail completely, and it will have no impact at all.
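The two steps outlined above can be sketched roughly as follows. This is only an illustration of the pattern, not the actual errorception snippet: the function name `installErrorQueue`, the shape of the queued objects, and the script URL are assumptions made for the example. Only the `_errs` variable name comes from the discussion in the comments below.

```javascript
// Illustrative sketch only; installErrorQueue and the URL below are hypothetical.
function installErrorQueue(win, doc, scriptUrl) {
  // Step 1: a plain array serves as the local queue. Nothing here touches the network.
  var queue = win._errs = win._errs || [];

  win.onerror = function (message, url, line) {
    // Look up win._errs at call time, so that an external script can later
    // replace the queue with its own object exposing push().
    win._errs.push({ message: message, url: url, line: line });
  };

  // Step 2: inject the external script asynchronously, only after page load.
  function injectScript() {
    var script = doc.createElement("script");
    script.src = scriptUrl;
    script.async = true;
    var first = doc.getElementsByTagName("script")[0];
    first.parentNode.insertBefore(script, first);
  }

  if (win.addEventListener) {
    win.addEventListener("load", injectScript, false);
  } else if (win.attachEvent) {
    win.attachEvent("onload", injectScript); // older versions of IE
  }

  return queue;
}

// In a browser, the inline snippet would call this once, as early as possible.
if (typeof window !== "undefined" && typeof document !== "undefined") {
  installErrorQueue(window, document, "//example.invalid/beacon.js"); // placeholder URL
}
```

If the external script never arrives, the handler simply keeps pushing into an array that nobody reads, which is why a failed request cannot slow the page down.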

Put another way, I simply cannot mess with your page's performance characteristics. It's impossible for me to do so. Even if errorception's servers crash and burn, it cannot affect your page performance. The need for trust has been eliminated. Errorception is fast because it proactively gets out of the way. Errorception is reliable because it cannot impact your site negatively. That's the way it's designed.

There are very few third-party tracking scripts that do this. Google Analytics comes the closest with the asynchronous tracking script. And even then, they only help you in your rendering speed. Your window.onload is still at Google's servers' mercy. I can't believe there are so few people doing this right - honestly, it's not even that hard!

There is a minor disadvantage to this approach though. It's possible that your users will go click-happy on your page and navigate to another page before page-load has fired. In that case the local queue wouldn't be flushed to errorception's server, which means we won't be able to track such errors. I'm fully aware of this, and took the call that that was fine. The philosophy in three words is: "Performance over accuracy". I'd err on the side of not recording data, rather than doing anything to hamper your site's performance.

For those who are interested, the following is the extra-verbose version of the tracking script.



Don't worry that it looks too big. It actually compresses very well (298 bytes gzipped):


7 comments:

  1. why not create a errorception._errs variable?
    "_errs" is kinda generic?

  2. I'm open to change the variable name to anything short to type, but do you know if it's being used by someone already, or in general how a possibility of clashes can occur?

    It's prefixed by an underscore, and those are meant to be used as privates by convention. It's similar to Google analytics' "_gaq" variable.

  3. Some questions / comments

    1. Any specific reason why https is not supported?

    2. Returning false for window.onerror may be a bad idea in case users are watching for it. Can't you simply ask users if they want to return false?

    3. You seem to only capture window.onerror - is that enough? See http://blogs.cozi.com/tech/2008/04/javascript-error-tracking-why-windowonerror-is-not-enough.html

    4. _errs is an array initially and you change it to an object with the push function. I think it is cool coz you wanted to use only one var. After this, every time a push is made, you are creating an iFrame which is an expensive DOM operation. Consider a case where errors are occurring in a loop. It will slow your page as iFrames are injected and removed constantly.

    5. If the iFrame is display:none in the external script, why bother to make form elements hidden? Also, why not simply submit the Stringified version of the error to the server instead of multiple form fields?

    6. Did you use post instead of get coz of GET size limitations?

    7. Why do you add your external script before the first script instead of simply appending it to the body? Won't inserting it before the first script cause your script to load before the page's own scripts? Network prefetching is done in most browsers, so this may not be as big a problem.

  4. 8. Also, what if some script in my page defines a window.onerror after your code snippet?

  5. Thanks for these questions, Parashuram. I should really put up a FAQ page, but for now I'll just answer your questions here.

    1. I don't want to inadvertently send any sensitive data from a secure page to the errorception server. Though that's not happening right now, it might be an issue when/if I start recording more data from the page. But then, maybe I'm over-thinking the problem given that it's not even an issue at this time anyway.

    2. I don't want to ask the user what they want - it's an additional question that needs answering. I'm assuming you bring this up because it's hard to debug on-page with the return value, and I'm aware of that. I'll see what I can do to fix this.

    3. I'm aware of this issue in event handlers in Firefox. Unfortunately, workarounds are rather ugly and will hurt performance. For now I've chosen to ignore this. I'll come back to this at a later time. Also, this is a browser bug, and should ideally be fixed by Firefox itself, though I know that's not a great argument. https://bugzilla.mozilla.org/show_bug.cgi?id=312448

    4a. The reason I'm redefining _errs is so that I won't have to change the onerror handler. It will be unexpected behavior to redefine the handler at a later arbitrary time in the page lifecycle, especially if other scripts in the page have assigned their own handler. The fact that _errs is still the only global variable is just a great by-product of this, but I'm really just trying to be a good citizen on the page.

    4b. Iframes in general are problematic, and I'm still on the lookout for a good alternative. However, the problem you've mentioned is easily fixed with some sort of client-side rate-limiting and aggregation, so that errors are posted less frequently and in batches. I'm very close to releasing that in beacon.js.

    5. No particular reason to make the input fields hidden. The reason I'm not serializing data manually is that I'd need a serializer of some sort. JSON.stringify might not be available everywhere (old IE, for example), and I don't want to add to the payload by sending a serializer on the wire. Form fields work well enough.

    6. Two reasons: first is the length limitation you pointed out, and the second is semantics - POST is a better verb to use. However, maybe I'm just being too uptight about these things. Using a GET does solve a lot of problems.

    7. Details here: http://www.stevesouders.com/blog/2010/05/11/appendchild-vs-insertbefore/ In a gist, it's done this way to circumvent IE bugs.

    8. If a script redefines window.onerror, errorception would be effectively disabled. I think that's expected behavior if someone wants to define their own error handler, so I'm ok with this. Let me know if you disagree though.

    All these are great questions and concerns. I've kept errorception in closed beta for now so that I can work closely with the people who are using it and iron out issues like these, and the feedback has been awesome. The final version of the code might deviate slightly from the snippet above, though my principles of high performance and reliability will never change.

  6. 4b. Regarding this, can't you simply have one form and one iFrame embedded inside the page, and set the target of the POST requests to that iFrame? That way, you will still have performance without much change.

    6. Given that you are looking at optimizations, users won't have to load the external script, as the GET can simply be embedded into your inline script.

  7. 4b. I could maintain just one iframe, but I'd have to ensure that I'm timing everything right, since I'd have to wait for previous posts to complete. Once a post completes, I'd have to recreate the iframe anyway, since I can't access it anymore because of the same-origin policy. Creating multiple iframes is easier, and leads to less code.

    6. The reason there is a separate JS file to download from errorception's servers is so that I can make future bug-fixes and feature releases without asking everyone to upgrade their inline snippets. The current snippet is a balance between future updates and high-performance loading.
