Friday 14 December 2012

Catching Cross-Domain JS Errors

As I've mentioned before, most modern browsers do not provide access to error information in window.onerror for scripts loaded from across domains. This is a very severe restriction, and has limited the usefulness of Errorception to some extent.

Fortunately, a couple of months ago, Firefox landed a patch to add this feature, and it has already shipped in the latest versions of Firefox. Chrome is expected to follow suit very soon, since the change has already landed in WebKit.

Unfortunately, this doesn't work out of the box, and will require some tweaking of your server and markup. Fortunately, the changes you need to make are minimal.

On the server

You will need to enable CORS for the external JS file you load. The simplest way to do this is to set the following HTTP header in the response for your JS file.

Access-Control-Allow-Origin: *
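If your JS file happens to be served by Node.js, a minimal sketch of adding that header might look like the following. The serveScript handler and the placeholder file body are hypothetical; with Apache, nginx or a CDN you would set the same header in that server's own configuration instead.

```javascript
// Hypothetical Node.js request handler for the cross-domain script.
// The only CORS-related line is the Access-Control-Allow-Origin header.
function serveScript(req, res) {
  res.setHeader("Content-Type", "application/javascript");
  // Allow any origin to read this response (the minimal setting above).
  res.setHeader("Access-Control-Allow-Origin", "*");
  res.end("/* contents of script.js */");
}

// To run it: require("http").createServer(serveScript).listen(8080);
```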

That's the only server-side change you need to make!

In the markup

Script tags now have a new, non-standard attribute called crossorigin. Its most secure value is anonymous, which tells the browser not to send credentials such as cookies with the request. So, you'll have to modify your script tags to look like the following.

<script src="http://sub.domain.com/script.js" crossorigin="anonymous"></script>
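If you inject the script dynamically instead of writing the tag by hand, the attribute needs to be on the element before it is added to the page, since insertion is what starts the download. A minimal sketch, assuming a browser document object; the loadCrossOriginScript name is hypothetical:

```javascript
// Sketch: dynamic equivalent of <script ... crossorigin="anonymous">.
// `doc` is the page's document; the function name is hypothetical.
function loadCrossOriginScript(doc, src) {
  var script = doc.createElement("script");
  // Set crossorigin before the element is inserted into the page.
  script.setAttribute("crossorigin", "anonymous");
  script.async = true;
  script.src = src;
  doc.getElementsByTagName("head")[0].appendChild(script);
  return script;
}

// In a browser:
// loadCrossOriginScript(document, "http://sub.domain.com/script.js");
```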

Browser support

As of this writing, only Firefox supports reporting errors for cross-domain scripts. All WebKit browsers, including Chrome, are expected to support this very soon. This isn't a problem with IE at all, since IE already reports errors to window.onerror irrespective of the domain (yay, security!). Standardisation of the new attribute has been proposed, though the proposal hasn't gotten anywhere yet.

Update: Thanks to Matthew Schulkind for pointing out in the comments below that Firefox insists that if you use the crossorigin attribute, the script file must be served with the access control HTTP header. If the header isn't present, the script simply doesn't get evaluated. This is a minor annoyance at development time, so I've filed a bug with Mozilla about it.

Friday 23 November 2012

Capture Custom Data With Your Errors

The more context you have around an error, the better it'll help you when debugging. And who understands your application's context better than you!

Starting today, you will be able to record custom information with your errors. It's super simple too! Just create an _errs.meta object, and add anything you want to it!

You can add any number of properties to the _errs.meta object, and the values can be strings, numbers or booleans. Values of any other type will be ignored.

You can even add and remove properties from the _errs.meta object at runtime. So, if your user changes his or her preferences about cats while using your application, you can set _errs.meta.lovesCats = false; when that happens. From that point on, the tracking script will record the new value of lovesCats whenever an error occurs.
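Here is a sketch of what that might look like on a page, assuming the tracking snippet has already defined the global _errs array; the property names are made-up examples. The primitiveMeta helper is hypothetical too, not Errorception's code: it only illustrates which values the type rule above would keep.

```javascript
// Assumes the tracking snippet defines the global `_errs` array.
var _errs = _errs || [];
_errs.meta = {
  userId: "u-1234", // string
  cartItems: 3,     // number
  lovesCats: true   // boolean
};

// Later, at runtime, when the user changes a preference:
_errs.meta.lovesCats = false;

// Hypothetical helper: keep only string, number and boolean values,
// mirroring the rule that values of other types are ignored.
function primitiveMeta(meta) {
  var out = {};
  Object.keys(meta).forEach(function (key) {
    var t = typeof meta[key];
    if (t === "string" || t === "number" || t === "boolean") {
      out[key] = meta[key];
    }
  });
  return out;
}
```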

This can be a huge help when debugging your code. Imagine if you could record which user got the error, which action the user was performing at the time, and on which area of your page!

Other improvements

There have also been several improvements to the tracking code. Two hard-to-find bugs have been squashed, overall performance has been improved, browser support has been expanded, resilience has been improved in case our servers are in the middle of a hurricane, and the code is much better tested now. All of this while reducing the code size! (OK, it only shrank by 4 bytes, but it's something, right?) Just to remind you, the code size doesn't really affect you, because the tracking script doesn't get in the way of your page load time at all, giving you maximum performance at all times.

As always, feedback welcome. I can't wait to see what you will do with this ability to record custom data.

Limits

The custom data recorded is put into the same store as the raw error data, and shares the same limits. That is, you can currently store up to 25 MB of data. Beyond that, older data is purged to make room for new data.

Thursday 22 November 2012

Raw Error Data

For about two weeks now, I've been recording information about each individual error occurrence. This data was previously being discarded. I thought, why throw away perfectly good data if it can help in debugging in any way?

A couple of minutes ago, I rolled out a UI to start looking at this data. Now, every error details page will show you all the information that we've captured for every occurrence of the error.

Give it a spin, and let me know what you think.


Limits

The reason I used to discard this data previously is because it grows really big really quickly. So, for now, I've decided to cap the amount of logs stored. Each project gets 25 MB of storage for these raw logs. If you end up generating more logs than the limit, older log entries are discarded to make room for the new ones.
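The purge rule can be pictured as a size-capped log that drops its oldest entries first. This is a hypothetical sketch to illustrate the behaviour, not how Errorception actually stores logs; entry size is approximated as the UTF-8 byte length of each entry's JSON form.

```javascript
// Hypothetical size-capped log: when the total size exceeds the cap,
// the oldest entries are dropped first.
class CappedLog {
  constructor(maxBytes) {
    this.maxBytes = maxBytes;
    this.entries = []; // oldest first
    this.totalBytes = 0;
  }

  add(entry) {
    const size = new TextEncoder().encode(JSON.stringify(entry)).length;
    this.entries.push({ entry, size });
    this.totalBytes += size;
    // Purge from the front (oldest) until we're back under the cap.
    // Note: an entry larger than the whole cap ends up purged as well.
    while (this.totalBytes > this.maxBytes && this.entries.length > 0) {
      this.totalBytes -= this.entries.shift().size;
    }
  }

  list() {
    return this.entries.map((e) => e.entry);
  }
}
```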

Though 25 MB might seem small, it's actually quite a bit. Since each log entry will realistically be much smaller than 0.5 KB, you can store more than 50,000 individual log entries before older records get purged.
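Taking 0.5 KB as an upper bound per entry, and assuming binary units (1 MB = 1024 KB), the arithmetic behind that estimate is:

```latex
\frac{25\ \text{MB}}{0.5\ \text{KB per entry}}
  = \frac{25 \times 1024\ \text{KB}}{0.5\ \text{KB per entry}}
  = 51{,}200\ \text{entries}
```

Since real entries are smaller than that bound, the actual number of entries that fit is higher still.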

That said, I haven't fully settled on the 25 MB limit, and am open to changing my mind. Feedback welcome.

Tuesday 20 November 2012

Problems with MasterCard Payments

At the outset, I want to apologize for the email notifications you've been getting about "billing attempt failures", or "rebill attempts exceeded". This affects all MasterCard payments. This is not the experience I want to provide, and this ordeal has been hugely embarrassing for me.

I've been using payment services from 2checkout, who seem to have run into an issue with MasterCard payments. They have not been very transparent about the issue, and kept dragging their feet about the problem despite repeated requests to fix it soon. Today they sent an email saying that the issue won't be resolved for another month!

What you can do

If you are affected, 2checkout will already have sent you an email with a link to update your payment information. Please use the link to pick a different payment method (they still support Visa, Amex and PayPal, among others). If you only have a MasterCard, you can switch your payment method to PayPal and make the MasterCard payment through them. You don't need a PayPal account for this.

If you have already updated your payment information, huge thanks to you! You rock!

If you are paying for a new account, you won't have the MasterCard option when you check out, but you will be able to use any other card, or use PayPal to route your MasterCard transaction.

If either of these is not possible, please get in touch with me at rakeshpai at errorception dot com, and we'll take it from there.

I know that this is annoying for you, and it's annoying for me to have to put you through this. This has been an embarrassing ordeal for me over the last few weeks, and I will continue to chase 2checkout until they resolve this issue. I wholeheartedly apologize again.

Monday 20 August 2012

Improved Error Management

Errorception turned 1 on the 15th of this month, and I thought I would do a major release for the birthday. However, you know how all software projects go. Fortunately, I'm only 4 days late. That's not too bad, right?

Automatic closing of errors

Starting today, all errors that haven't recurred over the last month are automatically marked as closed. This has reduced the average number of open errors by over 70%!

The huge, never-shrinking list of errors was turning out to be quite the de-motivator, so I wanted a hassle-free way of keeping the list trim. The one-month period is quite arbitrary, but it's most likely long enough to be sure that the error has indeed disappeared.

And you can close them manually too

Errorception will keep working for you in the background, closing errors that haven't occurred in a long time. However, nothing beats the pleasure of manually marking an error as closed. Errorception hasn't provided that pleasure so far, but now you can clear your list as you please.

Re-opening errors

If an error was closed and then, for some reason, occurs again, Errorception will re-open it for you. This way, you can keep your error list trim, while we work in the background to ensure that you don't miss recurrences.

Ignoring errors

This has been a highly requested feature. If an error isn't something that you can fix, usually the case with third-party embedded scripts, you can now ignore the error. Occurrences of such ignored errors will still be recorded, but will be tucked away from the main error listing to help keep your list clean. Just a quick reminder: we have always been ignoring errors automatically for you, based on several criteria. This ability has now been exposed to you as well.
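Taken together, the close, re-open and ignore rules behave roughly like the following. This is a hypothetical sketch to illustrate the rules, not Errorception's implementation; the record shape { status, lastSeen, count } is an assumption, and "one month" is approximated as 30 days.

```javascript
// Hypothetical error record: { status: "open" | "closed" | "ignored",
// lastSeen: <ms timestamp>, count: <number> }.
const ONE_MONTH_MS = 30 * 24 * 60 * 60 * 1000; // approximated as 30 days

// Periodic sweep: close open errors that haven't recurred in a month.
function autoClose(errors, now) {
  for (const err of errors) {
    if (err.status === "open" && now - err.lastSeen > ONE_MONTH_MS) {
      err.status = "closed";
    }
  }
}

// On every new occurrence: record it, and re-open the error if closed.
// Ignored errors stay ignored, but their occurrences are still counted.
function recordOccurrence(err, now) {
  err.lastSeen = now;
  err.count = (err.count || 0) + 1;
  if (err.status === "closed") err.status = "open";
}
```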

And much more

The stuff listed above is just the biggest of the features in today's release. There are several small enhancements too. After all, this release has been in the making for almost two months! Poke around and see if you can spot them.

Feedback welcome as always!

Monday 2 July 2012

Announcing Multi-User Support

This post is from the OMG, I can't believe you didn't already have this! department.

One of the most requested features on Errorception has been support for multiple users. Today I rolled out a build to add just this.

The reason it took so long, among others, was that I was finding it hard to distill the feature down to the minimum required, but no less. I think the current implementation does a good job of this.

Once logged in, simply navigate to Settings > User Accounts to manage who has access to your project. You simply have to provide the email address of the person you want to invite, and you are done. It's really that simple!

Invited people will get an email telling them about the invitation. If they were already users of Errorception, they don't need to do a thing. The mail is just a notification. It can't get simpler. If they didn't already have an account, they have to go through the simplest signup process possible — they have to pick a password. That's it! That's the signup process!

<aside>The real reason for not pushing out a user-facing build in the last month or so was that I've been hard at work dealing with the scale that Errorception is growing at. It's amazing. I wouldn't in my wildest dreams have guessed that Errorception would be this successful, and I hadn't engineered for that kind of scale. Over the last month or so, I've been rewriting the very guts of Errorception to cope with the surge of traffic. It's like changing the parts of a car while it's running. At 200 mph. But that's done now, and the code-base is manageable once again. Today's release is the first in a series of pending feature releases.</aside>

Sunday 15 April 2012

"Script Error" on line 0

Update: Cross-domain error reporting is improving. See this post.

This blog post is to address some confusion about "Script error on line 0" errors as recorded by window.onerror, a confusion I've certainly had, and one that I've seen many others have too.

This Stack Overflow answer does a great job of explaining the details:

The "Script error." happens in Firefox, Safari, and Chrome when an exception violates the browser's same-origin policy - i.e. when the error occurs in a script that's hosted on a domain other than the domain of the current page.

Who's affected?

If you are recording errors from window.onerror, and your scripts are loading from a different subdomain or domain, and if such an external file throws an error, the error will be recorded as Script error with no more details, in all browsers except IE. Errors in IE get recorded as expected, irrespective of the hosting domain of the script.
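If you run your own window.onerror handler, a minimal sketch of recognising these opaque reports and skipping them might look like this. The report callback is a hypothetical function you would supply to send error details to your server, and the check (a "Script error." message with no usable line number) is an assumption based on the behaviour described above.

```javascript
// Sketch: skip the opaque "Script error." reports produced for
// cross-domain scripts. `report` is a hypothetical callback you supply.
function isOpaqueCrossOriginError(message, line) {
  // Message is "Script error." (with or without the period), no line.
  return /^Script error\.?$/.test(message) && !line;
}

function makeOnError(report) {
  return function (message, url, line) {
    if (!isOpaqueCrossOriginError(message, line)) {
      report({ message: message, url: url, line: line });
    }
    return false; // don't suppress the browser's default error handling
  };
}

// In a browser:
// window.onerror = makeOnError(function (details) { /* send details */ });
```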

You are not affected if your scripts load from the same domain as your page, or if your errors occur in IE.

Loading scripts from a different domain is a very good idea, and something you should strongly consider. Unfortunately, doing so severely limits the usefulness of window.onerror in non-IE browsers.

Why is this?

It's because of a possible exploit in browsers that support window.onerror. Firefox fixed this issue by restricting the information available in window.onerror for cross-domain scripts. As far as I can tell, WebKit and Opera (which were late entrants in the window.onerror game) have had this behavior from the outset. In IE, even as recently as IE9, this security vulnerability still exists.

What about Errorception?

From the very beginning, Errorception has not recorded any "Script error" errors, because the information is insufficient to debug in any sensible way. However, I never knew the cause of this error, and admittedly have been confused about how it originated. Now that it's clear why this error occurs, it's also clear that there's a class of errors that Errorception cannot catch (hat-tip: @vitaly_babiy), and that an alternative mechanism is needed to catch such errors. All isn't lost yet: IE still gives enough information to debug the error in all cases, and all this only happens if you are loading scripts across domains. However, we clearly need to do better than window.onerror.

Lest you think otherwise, problems like this are in fact the very reason you should use services like Errorception. Catching and processing errors is evidently a complex task, and is something you don't want to bother with too much — you just care about the results. Errorception takes on the responsibility to catch errors for you, and to process them to make them easy to digest. Setbacks like this are common, and we've always found a way around such problems.

Why I'm telling you all this is because I think it's far better to educate people and be open about Errorception's limitations, than to hide such information. Hiding information is tantamount to lying, and I don't want to do that.

I'll be actively looking for alternative ways to record errors, since window.onerror has these (and other) restrictions. I already have a couple of ideas to improve error information, and will be testing them out soon. Meanwhile, I'm open to suggestions, if any.

Friday 13 April 2012

New Pricing Slab: $5/month!

$5/month, 500 errors/day

You wanted a cheaper plan, so we created one! For only $5/month, you can find out about up to 500 JS errors every day that your users encounter on your site! It doesn't get much cheaper than this.

Just because it's cheaper, don't think that it's worse. You get all the features of the more expensive plans — high performance fail-safe error tracking, powerful grouping and aggregation to keep the noise low, flexible filters to let you slice and dice your errors, and above all, peace of mind knowing that you aren't in the dark about errors your users encounter.

Friday 30 March 2012

The Tale of an Unfindable JS Error

At Errorception, I genuinely care about making people's front-end code bug-free. Often this means that when people email me asking for help with bugs they are seeing, I wholeheartedly help them track down and fix the bug.

This morning I woke up to an email from a user, asking about a bug that had occurred nearly 4,000 times in the last two days!

Hey Rakesh,

This just showed up yesterday, and I'm really stumped by it. It's only on Chrome 18/19, and it's put us way over the daily limit.

<link to the bug>

Just wondering if you seen it before, as I found one reference to it being caused by the Twitter widget (https://gist.github.com/1878283). I've grep'd every piece of JS looking for it, but haven't found one mention. Any ideas?

<snip>

I rubbed my eyes and got ready to start looking at the user's site, almost certain that he must've missed something. I mean, how could he have an error in his site and not have a source for the error? That just doesn't make sense.

Looking at the error report, this is what it said:


Uncaught TypeError: Cannot read property "originalCreateNotification" of undefined

It's an inline error on the page, not an external JS file. This just has to be something on the site, I thought to myself. I hit the page, viewed the source, and there was nothing there. That's weird, I thought, because Chrome usually does a good job of reporting line numbers correctly.

As I was brushing my teeth, I got the feeling that I had heard about this error before. originalCreateNotification. Where had I heard about this before? Toothbrush still in my mouth, I did a quick Google search. The only sensible hits were the gist the user mentioned above, and a StackOverflow question where the poster was asking for help with his code. His code used a variable called originalCreateNotification. That's an interesting coincidence, I thought to myself, but I was sure that that's not where I had heard about this bug. Then I decided to search my mailbox, wondering if someone had emailed me about it before.

There I found it. Another user had once emailed me about this error, which had occurred 17,000 times within a few minutes, all from the same client! I recalled that we had tried to track down the bug for quite some time, but we couldn't. Interestingly, the error never happened again. We just let the matter be, hoping never to see the bug again.

Two different users facing the same bug! Both the cases were in Chrome, and on line 4! That's just too much of a coincidence, I thought. Something's going on here.

Twitter?

I jumped to the gist that we had found. It was authored by Pamela Fox. I have tremendous respect for Pamela, and I know that she has her own internal error reporting system. The gist is the code she uses to catch and post errors to the server. She has a list of errors she ignores because they're unfixable, and originalCreateNotification is one of them. Clearly, I wasn't the first to find this bug. She has a comment there, simply saying // Twitter widget. So Twitter seems to be responsible for this problem. That also explains why two different users got the same error. Case closed, I thought.

Except, it didn't explain one little thing. How did Twitter generate so many errors in such a short time? It's not unusual for files to be somehow unreachable at the client, so it's entirely possible that a small fraction of Twitter's widget users will face an error. However, that would just be an error or two, not tens of thousands. I decided to find out which Twitter widget this was, and find the bug in their code. So, I pulled up my user's page again to find the Twitter widget he was using, so that I could start looking at Twitter's JS code. That's when the case became far more complex.

The user didn't have any Twitter widgets on the site!

Is there anybody out there?

Back to square one, I thought, as I prepared my green tea. By now, I was too deep into this. I wanted to find and solve the problem once and for all. Two of my users had already noticed this error, and it wasn't cool that I wasn't helping them fix it. How is it that no one else has mentioned the error on the web anywhere? Of course, one reason is that hardly anyone even tracks JS errors in the first place. It's a shame, I thought. Maybe a lot of people face this error and are completely unaware of it.

Back at my laptop, I decided to ssh into my server, and find any other occurrence of this error across all my error logs. Ran the query. 57 sites had seen this error occur at some time or the other! All of them with similar characteristics. They all occurred in short bursts, tens of thousands of times! And all of them were on Chrome, at line 4! I was onto something, but I wasn't sure what it was. I needed to get more information.

So, I took all the user-agent strings of the browsers where this error occurred, and started looking at them for patterns. The browser version numbers were too varied to extract any meaningful information. It seems to occur on all versions of Chrome, I thought. Not too useful. Dead end.

OS specific?

It can't be OS specific, can it? It didn't make sense for a bug in Chrome to be OS specific. That era of browser bugs is long gone, and I've never experienced such a thing with Chrome. In any case, I started mining the UA strings again, and found that it occurred on all versions of Mac and Windows. No dice. There wasn't a single instance of Linux, but that's probably just a coincidence, I thought. It may be because Linux has a lower market-share on desktops.

With no other lead, and almost ready to give up, I was staring at the Google search results page for originalCreateNotification. Just two sensible hits. Pamela's gist, and a StackOverflow question. I decided to read the StackOverflow question as I kept pondering what the cause of the error might be. The question read:

I'm attempting to override the default functionality for webkitNotifications.createNotification and via a Chrome extension I'm able to inject a script in the pages DOM that does so. […]

A Chrome Extension! Why didn't I think of that! I decided to find out more about the author of the question, and ended up on his site, where he was advertising a Chrome Extension. I clicked through, and landed on the Chrome Web Store.

This extension is meant to create more native Webkit notifications on Linux Operating Systems which use the notify-osd notification framework. Notify-OSD is installed on Ubuntu by default and used for all system notifications.

This has to be it! It explains why the bug only exists on Chrome, why it occurred at the same place at all times, why it was independent of the site or its contents, and more importantly why it never occurred on Linux! Time to do a couple of quick checks to test my theory.

Fortunately, the extension code is open-source. With my heart racing, I pulled up the repo, and started browsing it to find a mention of originalCreateNotification, and there it was, staring me in the face! It was clearly an extension designed for Linux, so it made sense that the author might not have tested it on other platforms. I decided to give it a spin. I installed it on my Mac, and BAM! My console filled up with Uncaught TypeError: Cannot read property "originalCreateNotification" of undefined, on line 4!

I don't think I've been happier seeing an error.

I've now filed a bug report, asking the author to do what he can to make the extension fail gracefully on Mac and Windows. Meanwhile, I've added originalCreateNotification to my blacklist, so that none of my users will ever see this problem again.
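An ignore list like that can be as simple as a set of message patterns checked before an error is recorded. A hypothetical sketch; the variable and function names are illustrative, not Errorception's actual blacklist:

```javascript
// Hypothetical ignore list: messages known to come from browser
// extensions or third-party widgets, not from the site's own code.
var IGNORED_MESSAGE_PATTERNS = [
  /originalCreateNotification/ // thrown by a Chrome extension, not the page
];

function shouldIgnore(message) {
  return IGNORED_MESSAGE_PATTERNS.some(function (pattern) {
    return pattern.test(message);
  });
}
```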

Another day, another bug squashed. Just another perk of working with Errorception. You are tracking JS errors, aren't you? If you aren't, start now!

Monday 13 February 2012

Slicing and Dicing Errors

The most up-voted feature request on our UserVoice Forum had to do with slicing the errors list to see only a sub-section of the errors. The use case was this: Let's say you deployed code yesterday that fixed a bunch of your bugs. Now, you are particularly interested in bugs that happened since yesterday, since it'll tell you if your fix did indeed work and if any new bugs have been introduced. However, Errorception didn't allow any such slicing of errors. Until now.

Now, in your error listing, you will see a little sidebar that will let you slice and dice your errors list based on time, browser, and any combination of the two.
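Conceptually, the two filters combine with AND, as in this hypothetical sketch. The occurrence shape { time, browser } is an assumption for illustration, not Errorception's data model.

```javascript
// Hypothetical filter: each occurrence is assumed to look like
// { time: <ms timestamp>, browser: "Chrome" }. Both filters are optional.
function filterOccurrences(occurrences, filters) {
  return occurrences.filter(function (o) {
    if (filters.since !== undefined && o.time < filters.since) return false;
    if (filters.browser !== undefined && o.browser !== filters.browser) return false;
    return true;
  });
}

// e.g. only Chrome errors since yesterday's deploy:
// filterOccurrences(all, { since: deployTime, browser: "Chrome" });
```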

JavaScript App

To make the filters possible, and to provide the best UX, Errorception's error listing page is now rather JS-heavy. What this means for you is that you get a highly responsive UI, and you can interact with the filters mentioned above without having to go through painful page reloads. But this JS-driven UI has another benefit, one that is just awesome in the long term.

Yo dawg, I herd you like Errorception

The most important change is that Errorception now uses (wait for it) Errorception! I was never comfortable that I wasn't a heavy user of Errorception myself, but now that's fixed. I'll be actively using Errorception to find bugs in Errorception. Of course, this helps reduce errors. More importantly, though, it helps me identify pain-points that everyone faces in the day-to-day use of Errorception. It ensures that Errorception's feature set grows to have the right mix of the best features and ease of use.

Since I pushed this build out, I've already caught two errors in the new JS, and I have already fixed them and deployed new code. That's exciting!

Wednesday 25 January 2012

Now, Using CDN-Power!

Using a CDN has been on my list for some time now, and I've finally gotten around to configuring one. I just changed the tracking snippet to use Amazon's CloudFront CDN, ensuring that the tracking snippet is edge-cached at over 25 locations world-wide, in North America, South America, Europe and Asia. This means that your users will get the least possible delay in loading Errorception's tracking code. Your users get blazing high performance by default! And boy, do we at Errorception love high performance. :)

This change is being rolled out to all of Errorception's users, whether on the free trial or on any of the paid plans, at no extra cost.

Action required

Unfortunately, to use the CDN you will have to change your tracking snippet in your site. Log in, and go over to Settings > Tracking Snippet to get your new tracking snippet, and replace the old tracking snippet in your site with the new one. I know it's a chore — I apologize for the inconvenience. However, the benefits far outweigh the little trouble.

While the old tracking snippet will continue to work for some more time, I urge you to upgrade soon so that you can make the most of the CDN. At some point in the not-too-distant future, the old tracking snippet will not be supported anymore.

Tuesday 17 January 2012

Writing Quality Third-Party JS - Part 3: Planning for an API

In the first post in this series, I wrote about the fact that you don't own the page, and how that affects even little things in your code. In the second post in the series, I dived into a good deal of detail about how to bootstrap your code. In this third and final part of the series, I'll talk about how to make an API available to your users, and how to communicate with your server.

This is a three-part series of articles aimed at people who want to write JavaScript widgets (or other kinds of scripts) to make their application's data/services/UI available on other websites. This is the third in the series, discussing ways to make an API available to your users and to communicate with your server.

So you want to provide an API

Why is this such a big deal? I mean, look at jQuery, or any library for that matter, right? You add a script tag to the page, and you have the API ready to use, right? How is this any different for a third-party script?

It's different depending on the way you choose to load your code. In Part 2: Loading Your Code, we went over why your code should be loaded asynchronously. We'll get back to that in a moment. Let's first consider the really bad case when your loading strategy requires that your code is loaded synchronously — Twitter's @Anywhere API, for example.

Again, to clarify, I'm sure Twitter has got their reasons to do what they do. I'm still going to call it bad because it is something you should not do.

So, from Twitter's examples:

This is really the simplest way to define an API. It's similar to how JS libraries, like jQuery, work. Add a script tag, and then use the library's code. Simple enough. Except, for reasons explained in Part 2, we should load our code in an asynchronous fashion.

Asynchronous loading

If your code is loaded in an asynchronous fashion, it means that the browser's parser will continue parsing and evaluating other script tags on the page before your own script is evaluated. This is great of course — it means that you have stepped out of the way when loading the page. Unfortunately, this also means that if someone attempts to access your API namespace, they might get errors, depending on whether your code has been downloaded and evaluated already or not. You now need a way of signaling to your API consumer that your code is ready to use. Facebook's documentation talks about how they do this, from the API consumer's perspective:

What Facebook requires you to do is to declare a global function called fbAsyncInit. When Facebook's JS SDK has loaded, it checks for the existence of this function in the global object (window), and then simply calls it if it exists. In fact, here are the relevant lines from the SDK that call this function:

The hasRun flag is irrelevant to us; it's an internal detail to Facebook. But note how they explicitly check if the function exists, and if it does, they call it. I've removed an outer wrapper here for clarity: the code above is called in a setTimeout(0) to ensure that it runs at the end of the execution stack. Chances are, you'd want to wait till the end of the execution stack too. You could either wait explicitly till the end, or fire it off in a setTimeout, like Facebook does.

To drive home the point, the flow works as follows:

  • Before starting to load the code, ask the user to define a global function that should be called when the third-party code has finished loading.
  • Start loading the third-party code asynchronously.
  • In the third-party code, when the code has finished loading, call the global function if it has been defined.
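The three steps above can be sketched as follows. MyWidget and myWidgetAsyncInit are hypothetical names standing in for FB and fbAsyncInit, and bootWidget takes the global object as a parameter only so the sketch can run outside a browser.

```javascript
// Third-party side: runs at the end of the asynchronously loaded script.
// `MyWidget` and `myWidgetAsyncInit` are hypothetical names.
function bootWidget(global) {
  // Define the API first, so the page's callback can use it.
  global.MyWidget = {
    init: function (options) {
      global.MyWidget.options = options;
    }
  };
  // At the end of the current execution stack, call the page owner's
  // callback if, and only if, one was defined.
  setTimeout(function () {
    if (typeof global.myWidgetAsyncInit === "function") {
      global.myWidgetAsyncInit();
    }
  }, 0);
}

// Page owner's side, defined before the async loader starts:
// window.myWidgetAsyncInit = function () {
//   MyWidget.init({ apiKey: "YOUR_KEY" });
// };
// In a browser, the widget script would end with bootWidget(window).
```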

There are minor variations one can make on the pattern above. One I'd like to make is that of removing the need for a global function. I'd agree that it's a bit nit-picky to remove just one global variable, but take it for what it's worth.

The improvement is that fbAsyncInit has been replaced by FB.onReady. This removes the need for the global function definition. Also, when the FB's SDK loads, it will continue to use the FB namespace, so no more globals are created.

For APIs as complex as Facebook's, I think this is the best that can be done without adding more complexity. Many more interesting things can be done if you are willing to embrace, say, AMD support, but this is the very least required.

Other uses of having a predefined global

There are other uses of having a predefined global. For example, it could house all your initialization parameters. In FB's case, it might need the API key before making any API calls. This could be configured as members in the FB object, even before the API is actually loaded.
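The onReady variant and the idea of keeping configuration in the predefined global might combine along these lines. This is a hypothetical sketch, not Facebook's SDK: MyWidget, apiKey and showFeed are illustrative names, and the page is assumed to create the namespace before the script loads.

```javascript
// Page owner, before starting the async load (hypothetical names):
// var MyWidget = {
//   apiKey: "YOUR_KEY",                           // configuration
//   onReady: function () { MyWidget.showFeed(); } // ready callback
// };

// Third-party side: extend the existing namespace instead of creating
// another global, then call onReady if the page defined one.
function bootWidgetNamespace(global) {
  var ns = global.MyWidget || (global.MyWidget = {});
  ns.showFeed = function () {
    // Configuration set before loading is already available here.
    if (!ns.apiKey) throw new Error("MyWidget: apiKey is not configured");
    ns.feedShown = true;
  };
  setTimeout(function () {
    if (typeof ns.onReady === "function") ns.onReady();
  }, 0);
}
```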

But Facebook is complex

Sometimes, APIs are not as rich as Facebook's. Facebook's API is rich, and allows for both reads and writes. Usually, widgets/snippets are much simpler than that.

Write-only APIs

This requires special mention, since the most frequently used third-party APIs are usually invisible, write-only APIs (ref: end of this post). Take Google Analytics for example. It only collects data from the page, and posts them to Google's servers. It doesn't read anything from Google's servers - it doesn't need to. The API is a classic write-only API. In such a case, initialization can be simplified drastically. In fact, this is what Errorception itself does too — admittedly blatantly copying the technique from GA.

If you follow this technique, you don't need a global onReady function to be defined, since it is immaterial to know when the code has loaded. All you need is a queue that has to be flushed to the server. This queue can be maintained as an array of ordered items. Since arrays already have a .push method on them, that is your API method! It's that simple!

So, both Google Analytics, and learning from it, Errorception, have a .push method on their only global variable (_gaq / _errs), because this global variable is essentially just a regular array! See how it's set up:

This doesn't stop you from doing what you'd expect a decent write API to do. For example, GA lets you do a bunch of configuration and record custom data, all using just the .push method.
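A minimal sketch of such a command queue, in the spirit of GA's array-of-commands style but with hypothetical names throughout (_myq, setAccount, track, runCommands):

```javascript
// Page side: a plain array is the entire API, so calls made before the
// third-party script loads simply queue up. `_myq` is hypothetical.
var _myq = _myq || [];
_myq.push(["setAccount", "ACCOUNT-ID"]);
_myq.push(["track", "signup"]);

// Third-party side, once loaded: each item is [commandName, ...args].
// Unknown commands are skipped rather than throwing on the host page.
function runCommands(queue, commands) {
  for (var i = 0; i < queue.length; i++) {
    var name = queue[i][0];
    if (typeof commands[name] === "function") {
      commands[name].apply(null, queue[i].slice(1));
    }
  }
}
```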

In Errorception's case, we are recording JS errors, and despite the need to load our code late and asynchronously, errors must be caught as early as possible. So, we start populating this queue as early as possible. Our embed code itself does this, using the window.onerror event.
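A sketch of an embed snippet that starts filling the queue from window.onerror. The record format is an assumption, not Errorception's real one, and the window object is passed in as win so the sketch can be exercised without a browser.

```javascript
// Sketch of an early error-queueing snippet.
function installEarlyErrorQueue(win) {
  win._errs = win._errs || [];
  var previous = win.onerror; // keep whatever handler the page already set

  win.onerror = function (message, url, line) {
    // Look up win._errs on every call instead of caching the array, so pushes
    // keep working if the library later swaps the array for another object.
    win._errs.push({ message: message, url: url, line: line, time: +new Date() });
    // Chain to the page's own handler. Returning false leaves the browser's
    // default error reporting alone.
    return typeof previous === "function" ? previous.apply(this, arguments) : false;
  };
}
```

Chaining to the existing handler matters here, because the page owns window.onerror just as much as it owns the DOM.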

This way, errors are caught very early in the page lifecycle, without compromising performance one bit. Once our code has loaded, we start processing this queue to flush it to the server. Once the queue has been completely flushed for the first time, we redefine the global _errs to be an object (instead of an array) with just one .push method on it. So, all existing push calls continue to work, and when push is called we can directly trigger internal code. This might break if someone holds a reference to the original _errs array, since I replace the global. An alternative would be to leave the array untouched and poll it for new members. Since polling seems inefficient, and at the moment I don't have a public API anyway, I opted for redefining .push.
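A sketch of the flush-then-redefine step, assuming a hypothetical send(item) function that does the actual network transport. The replacement object accepts several arguments, like Array.prototype.push, so existing call sites keep working.

```javascript
// Sketch: drain whatever was queued before the library loaded, then replace
// the global array with an object whose push() goes straight to internal code.
// "send" is a hypothetical transport function supplied by the library.
function takeOverQueue(win, send) {
  var pending = win._errs || [];
  for (var i = 0; i < pending.length; i++) {
    send(pending[i]);
  }

  win._errs = {
    // Accept several items per call, just like Array.prototype.push.
    push: function () {
      for (var j = 0; j < arguments.length; j++) {
        send(arguments[j]);
      }
    }
  };
  return win._errs;
}
```

Unlike Array.prototype.push, this push doesn't return a length. That is harmless as long as callers treat push as fire-and-forget.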

It's hard to read the minified code from Google Analytics, but it appears that Google does the exact same thing. They too seem to be redefining the global _gaq to add a .push that points to an internal method.

Communicating with your server

There are established ways to bypass the browser's same-origin policy and communicate with remote servers across domains. Though usually considered hacks, there's no way around them in third-party scripts. Let's quickly round up the most common techniques.

Make an image request with a query string

This is the technique Google Analytics uses. When the queue is flushed, the data is encoded into a query string, and an Image object is created to load an image with that query string attached. The server responds with a simple 1x1 GIF so that it plays well with the page, but what it really cares about is the query string, which it records as the client's data.
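A minimal sketch of an image beacon. The endpoint URL and parameter names are placeholders, not Google Analytics' real ones, and the Image constructor is passed in as ImageCtor only so the sketch can be tested without a browser.

```javascript
// Turn a flat object into a URL query string.
function encodeQuery(data) {
  var parts = [];
  for (var key in data) {
    if (Object.prototype.hasOwnProperty.call(data, key)) {
      parts.push(encodeURIComponent(key) + "=" + encodeURIComponent(String(data[key])));
    }
  }
  return parts.join("&");
}

// Fire a GET request by loading an "image". The response is ignored; the
// server records the query string. The Image is never added to the DOM.
function beacon(endpoint, data, ImageCtor) {
  var img = new ImageCtor(1, 1);
  img.src = endpoint + "?" + encodeQuery(data);
  return img;
}
```

In a browser you'd call beacon("https://collector.example.com/b.gif", { msg: "boom", line: 3 }, Image). Large payloads have to be split up or sent another way, because of URL length limits.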

Pros: Simple, non-obtrusive since the DOM is not affected. (The image need not be appended to the DOM.) Works everywhere.
Cons: Ideal only for client-to-server communication. Not the best way to have a two-way communication. Have to consider URL length limits. Can only use HTTP GETs.

JSON-P

You can alternatively make a JSON-P request. This is in essence similar to the image technique above, except that instead of creating an image object, we create a script tag. It comes with the downside that we'll be creating script tags each time we want to tell the server something (and hence we'll have to aggressively clean up), but also has the upside of two-way communication, since we can listen to what the server has to say.
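A sketch of a JSON-P request that includes the cleanup described above. The callback query parameter name and the generated function names are assumptions; each server decides which parameter it reads. Error and timeout handling are left out for brevity.

```javascript
// Counter used to give every request its own global callback name.
var jsonpCounter = 0;

// Sketch of a JSON-P request. The "callback" query parameter and the
// "__jsonp_cb_N" naming scheme are assumptions; servers choose their own.
function jsonp(doc, win, url, onData) {
  var name = "__jsonp_cb_" + (++jsonpCounter);
  var script = doc.createElement("script");

  // The server is expected to respond with JavaScript like: __jsonp_cb_1({...})
  win[name] = function (data) {
    // Clean up aggressively: drop the global callback and the script tag.
    try {
      delete win[name];
    } catch (e) {
      win[name] = undefined; // old IE may throw when deleting window properties
    }
    if (script.parentNode) {
      script.parentNode.removeChild(script);
    }
    onData(data);
  };

  script.src = url + (url.indexOf("?") === -1 ? "?" : "&") + "callback=" + name;
  // Insert next to the first script tag rather than assuming <head>/<body> exist.
  var first = doc.getElementsByTagName("script")[0];
  first.parentNode.insertBefore(script, first);
  return name;
}
```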

Pros: Still simple. Two way communication, since the server can respond meaningfully. Works everywhere. Excellent when you want to read data from the server.
Cons: Still have to consider URL length limits. Only HTTP GETs. Requires DOM cleanup.

Posting in hidden iframes

This is the technique Errorception uses. In this method, we create a hidden iframe, post a form to that iframe, wait for the iframe to finish loading, then destroy the iframe. The data is sent to the server as POST parameters, but the response is not readable due to the domains not matching any more after the iframe has been POSTed.
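A sketch of the hidden-iframe POST, assuming a placeholder endpoint and flat key/value fields. It hasn't been tested across browsers. Older IE, for example, has known quirks when an iframe's name is set after creation, and real code would need to handle that.

```javascript
// Counter used to give each hidden iframe a unique name.
var iframeCounter = 0;

// Sketch of POSTing data through a hidden iframe. The endpoint and field names
// are placeholders; fields is assumed to be a flat object of strings/numbers.
function postViaHiddenIframe(doc, action, fields, done) {
  var name = "__post_frame_" + (++iframeCounter);
  // Fall back to documentElement rather than assuming <body> exists.
  var container = doc.body || doc.documentElement;

  var iframe = doc.createElement("iframe");
  iframe.name = name;
  iframe.style.display = "none";

  var form = doc.createElement("form");
  form.method = "POST";
  form.action = action;
  form.target = name; // the response loads into the iframe, not the page
  form.style.display = "none";

  for (var key in fields) {
    if (Object.prototype.hasOwnProperty.call(fields, key)) {
      var input = doc.createElement("input");
      input.type = "hidden";
      input.name = key;
      input.value = String(fields[key]);
      form.appendChild(input);
    }
  }

  container.appendChild(iframe);
  // Attach the load handler only after insertion, since some browsers fire a
  // load event for the iframe's initial blank document when it is inserted.
  iframe.onload = function () {
    // The response is cross-domain and unreadable; we only need to know it
    // finished, so remove both elements and report completion.
    container.removeChild(form);
    container.removeChild(iframe);
    if (done) {
      done();
    }
  };
  container.appendChild(form);
  form.submit();
}
```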

Pros: Simple. HTTP semantics respected. Works everywhere. URL length limits don't apply.
Cons: Only client-to-server communication. Requires DOM cleanup.

CORS

Errorception will soon be moving to CORS while maintaining the iframe approach for backwards compatibility. The benefit over the iframe-based approach for us is that no DOM cleanup is required. Another obvious benefit is that you can read the response of the request as well, though this is not very critical for the fire-and-forget nature of write APIs like Errorception's.
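A sketch of a fire-and-forget CORS POST, assuming the server sends an Access-Control-Allow-Origin header that covers the host page's origin. The XMLHttpRequest constructor is passed in as XHR for testability. Sending the body as text/plain keeps it a "simple" cross-origin request, so the browser doesn't need a preflight OPTIONS request. IE8 and IE9 would need XDomainRequest instead, which this sketch doesn't cover; there it simply reports that the caller should fall back.

```javascript
// Sketch of a fire-and-forget CORS POST. XHR is the XMLHttpRequest
// constructor (passed in for testability); the URL is a placeholder.
function postViaCors(XHR, url, payload, onResponse) {
  var xhr = new XHR();
  // CORS-capable XMLHttpRequest objects expose withCredentials; older ones don't.
  if (!("withCredentials" in xhr)) {
    return false; // caller should fall back to another transport
  }
  xhr.open("POST", url, true);
  // text/plain keeps this a "simple" request, so no preflight is needed.
  xhr.setRequestHeader("Content-Type", "text/plain");
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && onResponse) {
      onResponse(xhr.status, xhr.responseText);
    }
  };
  xhr.send(JSON.stringify(payload));
  return true;
}
```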

Pros: Full control over HTTP semantics. No data size limits. No DOM alterations.
Cons: Only works in newer browsers, hence must be backed by another method for the time being.

More elaborate hacks

Hacks can get pretty elaborate, of course. A couple of years ago I discussed on my personal blog how nested iframes can be used to devise an ugly but workable method of establishing read-write communication. This was at a time when CORS wasn't around yet. CORS should supersede it now, but even today you'd need this mechanism to deal with older browsers. Facebook still uses this as a fallback for older browsers, as do other read-write APIs like Google Calendar.

Wrapping up

This has been one massive article series! We've quickly covered several topics to do with creating high-quality third-party JS. This gives a decent bird's-eye view of the factors to consider, and possible solutions to problems, if you want to be a well-behaved citizen on the page. There are many more details we could get into, but I have to draw the line somewhere. :)

Do let me know what you think in the comments. Have you come across other problems in writing your own third-party JS? What fixes did you use? I'd be glad to know more.

Wednesday 11 January 2012

Writing Quality Third-Party JS - Part 2: Loading Your Code

In the previous post Writing Quality Third-Party JS - Part 1: The First Rule, I wrote about the fact that you don't own the page, and how that affects even little things in your code. With that background, this post focuses on how to load your code in the host page.

This is a three-part series of articles aimed at people who want to write JavaScript widgets (or other kinds of scripts) to make their application's data/services/UI available on other websites. This is the second in the series, tackling the topic of loading your code in the host page.

Bootstrapping

Remember how in my last post I kept harping on the fact that you don't own the page? It applies here too. Two factors to consider are:

  • Can the network performance effects of downloading your code be eliminated completely?
  • If your servers ever go down, will it affect the host page at all?

What's wrong with a <script> tag?

I won't go into too much detail here since it has been written about in enough detail elsewhere, but just to quickly summarize: a script tag will block download of resources on your page AND will block the rendering of the page. It's what Steve Souders calls the Frontend Single Point of Failure. You don't want to be a point of failure.

It surprises me, then, that Twitter not only requires you to include a vanilla script tag, but also asks you to put it in the <head> of your page. This is despite it being common knowledge, at the time their @Anywhere APIs launched, that this is a bad practice. On the contrary, their docs say:

As a best practice always place the anywhere.js file as close to the top of the page as possible.

Now, I don't mean to make them look bad — they are smart people and know what they are doing. What I'm saying is that I can't think of any reason to do this. They claim that it's for a better experience with OAuth 2.0 authentication, but I'm not convinced. I'd recommend that you do not use Twitter's model as an example for how to load your script. There are much better mechanisms available.

To be fair, even Google Analytics was doing something very similar until recently, but at least they asked for the script tag to be before the closing </body> tag. They've since deprecated this technique anyway.

Eliminating the performance hit

Note that my first point above isn't about reducing the performance cost, but about eliminating it. Techniques for reducing the performance hit are already well known: caching and expiry, CDNs, etc. That said, however much you reduce it, there is still a performance hit. How can it be eliminated?

async FTW?

HTML(5) introduced the async attribute, which instructs the browser's parser-renderer to load the file asynchronously, while continuing to render the page as usual. This is a life-saver, but unfortunately not good enough for use just yet.

Why isn't it good enough? Because it isn't supported in older browsers, including IE9 and lower. Considering that IE is a large part of the browser market, and as of right now IE10 isn't in lay people's hands yet, you will need to do better than slapping an async attribute on your script tag. The async attribute is amazing, just not ready yet.

Dynamic script tag creation

Another way to load your code is to create a script tag dynamically. Nicholas Zakas has gone into detail about this technique in a blog post, so I won't repeat it all here. I'll just show his code instead:
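His exact code isn't reproduced here. The following is a minimal sketch of the same technique. The doc parameter, the function name and the URL are assumptions. The callback handling uses onload for most browsers and onreadystatechange with a readyState of "loaded" or "complete" for older IE, which is the common approach.

```javascript
// Sketch of dynamic script loading with an optional completion callback.
function loadScript(doc, url, callback) {
  var script = doc.createElement("script");
  script.type = "text/javascript";

  if (callback) {
    var done = false;
    script.onload = script.onreadystatechange = function () {
      var state = script.readyState; // only older IE sets this on scripts
      if (!done && (!state || state === "loaded" || state === "complete")) {
        done = true;
        // Drop the handlers so the callback can't fire twice.
        script.onload = script.onreadystatechange = null;
        callback();
      }
    };
  }

  script.src = url;
  doc.body.appendChild(script); // assumes document.body already exists
  return script;
}
```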

Seems simple enough. He uses DOM methods to create a script tag, and then appends it to the page. As Souders has explained, this technique causes the script to be downloaded immediately, but asynchronously. This is usually what you want. In fact, this is the approach Facebook takes to load their JavaScript SDK, though they append to the head instead of the body, which has the same effect.

There's a minor improvement that has been common knowledge for some time now. Remember in my previous post I said that you cannot make any assumptions about the page? The snippets above assume that document.body or document.getElementsByTagName("head")[0] reliably exists. Turns out, that assumption might be wrong. So, the minor improvement is to not depend on document.body or the head. Instead, only assume that at least one script tag exists on the page. This assumption is always right, since that's how your code is running in the first place. So, instead of appending to document.body, do what Google does:
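A sketch of the first-script-tag approach. It has the same shape as Google's async snippet, but the function name, the doc parameter and the URL are hypothetical placeholders.

```javascript
// Sketch of inserting a script next to the first <script> on the page.
function injectScript(doc, url) {
  var script = doc.createElement("script");
  script.type = "text/javascript";
  script.async = true;
  script.src = url;
  // At least one script element must exist, since this code is itself running
  // from one, and that element is assumed to have a parentNode.
  var first = doc.getElementsByTagName("script")[0];
  first.parentNode.insertBefore(script, first);
  return script;
}
```

In a page, this would typically run inside an immediately invoked function as injectScript(document, "https://cdn.example.com/widget.js").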

The code inside the immediately invoked function expression (hat-tip) creates the script tag, and then inserts it as a sibling of the first script tag on the page. It doesn't matter which script tag comes first; we rely only on the fact that a script tag surely exists on the page. We also rely on the fact that the script tag has a parentNode. Turns out, this is a safe assumption to make.

Both these scripts set script.async = true;. I'm unsure why they do this. Firefox's documentation implies that this is not necessary in Firefox, IE, or WebKit:

In Firefox 4.0, the async DOM property defaults to true for script-created scripts, so the default behavior matches the behavior of IE and WebKit.

I guess there's no harm done in setting the async flag to true, so everyone just does it. Either that, or I simply don't know. As pointed out by Steve Souders in a comment below, async=true is required for some edge-case browsers, including Firefox 3.0.

But when does the download happen?

All browsers that implement async start the download immediately and asynchronously. This means that your script will be parsed and executed at some time in the future. The question I had at this point was: At what point in the page load cycle does my script get evaluated? Does onload get blocked? This addresses my second point at the top of the post. There might be situations when my script is unreachable due to either server or network problems, and I didn't want Errorception to affect any other scripts on the page in such a situation.

So, I ran quick tests to find the answer, and unsurprisingly the answer is that you can't be sure. Just today, Steve Souders published a browser-scope test page that tests just this behavior. It looks like older implementations of async were indeed holding back page load events. This seems to be getting phased out in newer versions of browsers. If I were writing the tracking snippet today, I would probably have used the technique mentioned above.

Instead, I decided to do slightly better than these techniques while coding the Errorception tracking script. I decided to explicitly wait for the page's onload event to fire first, and only then include the Errorception tracking script. The tracking snippet in Errorception's case looks as follows:
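Errorception's actual snippet isn't reproduced here. A sketch of the idea follows, assuming addEventListener or attachEvent is available and using a placeholder URL. It deliberately doesn't assign window.onload, since that would overwrite a handler the page owns.

```javascript
// Sketch: wait for the page's load event before injecting the tracker.
function loadAfterOnload(win, doc, url) {
  function inject() {
    var script = doc.createElement("script");
    script.async = true;
    script.src = url;
    var first = doc.getElementsByTagName("script")[0];
    first.parentNode.insertBefore(script, first);
  }

  if (doc.readyState === "complete") {
    inject(); // the snippet ran after load had already fired
  } else if (win.addEventListener) {
    win.addEventListener("load", inject, false);
  } else if (win.attachEvent) {
    win.attachEvent("onload", inject); // IE8 and lower
  }
  // window.onload is never assigned, so a handler the page set is untouched.
}
```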

The snippet above completely gets out of the way during the page load process, hence ensuring that we do not depend on browser behavior of event firing sequences when using the async attribute. It has an additional minor benefit that the end user's bandwidth is completely available for loading up the site's intended resources first before loading the Errorception tracking code. Effectively, it prioritizes the site's code and content over Errorception's own code. This fits in perfectly with Errorception's philosophy of not affecting page load time at all. Depending on the type of connection the end-user has, this bandwidth-prioritization technique might have merits.

I'll be the first to admit that what Errorception is doing might not be for everyone. Take Facebook for example. If you are including their API, chances are, you want to do something with it. Delaying the loading will only deteriorate the end-user experience. However, in Errorception's case, there's no such interaction that will ever block, so this works out fine.

I initially planned to discuss APIs in this post as well, but it has already run too long. I'll save that bit for the next post.

Errorception is a ridiculously easy-to-use JavaScript error tracking system, designed for high performance and high reliability. There's a free trial so you can give it a spin.

In Part 3…

In the next post in the series, I'll discuss mechanisms to signal the availability of your APIs to your developers considering that your code is downloaded at some arbitrary time in the future, and methods of communicating with your server by bypassing the same-origin policy.

Monday 9 January 2012

Writing Quality Third-Party JS - Part 1: The First Rule

It's fascinating how JavaScript has quickly become the de facto mechanism to deliver third-party integrations that are easily pluggable into people's websites. Services like Facebook, Twitter and Disqus, programmable widgets like Google Maps, and even invisible scripts like Google Analytics, KissMetrics and our own Errorception give you ways to integrate their services with your website.

You'd think that with these mechanisms becoming so popular, there'd be a good deal of information available about how to build great third-party integrations in JavaScript. Turns out, the information available on the Interwebs is actually rather sparse. This series of posts aims to add to that repository of knowledge, based on my experience building Errorception.

This is a three-part series of articles aimed at people who want to write JavaScript widgets (or other kinds of scripts) to make their application's data/services/UI available on other websites. This is the first in the series, highlighting important considerations.

The First Rule

The First Rule of Third-Party JavaScript is... man, this will never sound as epic as Tyler Durden. Anyway, here's the first rule, and the most important consideration:

The Rule: You DO NOT Own The Page

Understanding and assimilating this rule gives you two principal considerations when designing your script.

  • The impact of adding your script should be minimal. Preferably none.
  • You cannot make any assumptions about how the page is coded.

Let's start with the first point. How can you make the impact of your script minimal? A couple of considerations come to mind.

No globals

You should ideally have no global variables in your code. Making sure this happens is rather simple. Firstly, ensure that you enclose your code in a self-executing anonymous function. Secondly, pass your code through a good lint tool to ensure that you've not used any undeclared variables, since undeclared variables will cause implicit global variables. This is also regarded as a general best-practice for JS development, and there's absolutely no reason you shouldn't adhere to it.

A typical self-executing anonymous function looks as follows, in its most minimal form:
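A minimal sketch of that form. The variable inside is a placeholder for your widget code.

```javascript
(function () {
  // Everything declared in here is local to this function, so nothing
  // leaks into the page's global namespace.
  var greeting = "hello from inside the function";
  // ... the rest of your widget code goes here ...
}());
```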

To do this slightly better, I recommend the following form instead:
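A sketch of that improved form. The body is a placeholder. The typeof guards in the argument list are an addition so the sketch also runs outside a browser; in a real snippet the arguments would just be window, document.

```javascript
(function (window, document, undefined) {
  // window and document are now ordinary local parameters, so a minifier can
  // rename them to single letters. undefined is a third parameter that is
  // never passed, so inside this function it always holds the real undefined
  // value, whatever the page may have done to the global name.
  var hasDom = !!(document && document.createElement);
  // ... widget code that uses window, document and hasDom goes here ...
}(
  // In a real snippet these two arguments are simply `window, document`.
  typeof window !== "undefined" ? window : {},
  typeof document !== "undefined" ? document : null
));
```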

In the pattern above, the window and document objects — both rather frequently used — become local-scope variables. Local variables are usually reduced to one or two letter variable names when passed through the most popular JS minifiers, so the size of your code reduces somewhat by using this pattern. I'll come back to the undefined variable in just a bit. Once minified, your code will look something like the following (using closure-compiler in this case). Notice how window and document have been reduced to single letter variable names.

Wait, what? No globals?

OK, there are some cases when a global variable is absolutely necessary. Since linkage in JS happens through global variables, it might be necessary to add a global variable just to provide an API namespace to your users. Here's what the most popular third-party snippets do:

  • Google Analytics exposes a _gaq variable (docs) with one method on it — push.
  • Facebook exposes a FB variable (docs), which is the namespace for their API. (Using FB also requires you to define a global fbAsyncInit function, which could've been avoided. I guess they're doing what they do for ease of use, even though it's against best practice.)
  • Twitter @Anywhere exposes a twttr variable (docs), which like FB is their API's container namespace.
  • For completeness, Errorception exposes a _errs variable. We currently do not have an API (coming soon!), but this has been left as a placeholder.

Take a moment to think about those variable names. They are pretty unique to the service-provider, and will usually not conflict. The possibility of conflict cannot be completely avoided, but can be reduced significantly by picking one that's unique to your service. So, exporting a global of $, $$ or _ is just a horrible idea, however easy-to-type it may seem.

No modifications to shared objects

Several objects are shared in the JS runtime — even more than you might imagine at first. For example, the DOM is shared, but then so are the String and Number objects. Do not modify these objects either by altering their instances directly or by modifying their prototypes. That's simply not cool.

The only cautious exception to this rule might be if you need to polyfill browser methods. In all honesty, I would avoid using polyfills as much as possible in my third-party code and I don't recommend it at all, but you could do this if you are feeling adventurous. The reason I wouldn't recommend it is two-fold:

  • The code on the page may make assumptions about browser capabilities based on browser detection rather than feature detection. (Remember, even jQuery removed browser detection only recently, and many popular libraries still do a good deal of browser detection.)
  • The code on the page might iterate through arrays using object enumeration (for...in) instead of array iteration (for(var i=0;i<len;i++)). Even when enumerating object properties (which is a legit case of for...in), the code on the page might not use hasOwnProperty.

Either of these will break the code on the host page. You don't want code on the host page to break just because they've added your script.
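A small demonstration of the second point, using a hypothetical method name. It simulates host-page code that loops over an array with for...in, then shows what happens once third-party code adds an enumerable method to Array.prototype.

```javascript
// Host-page code that (mis)uses for...in to iterate over an array.
function hostPageSum(arr) {
  var total = 0;
  for (var i in arr) {
    total += arr[i];
  }
  return total;
}

var before = hostPageSum([1, 2, 3]); // 6

// Third-party code adds a "polyfill" by plain assignment, which creates an
// enumerable property on every array. The method name is hypothetical.
Array.prototype.hypotheticalPolyfill = function () {
  return this.length;
};

// The loop now also visits the inherited "hypotheticalPolyfill" key, so the
// function gets concatenated onto the total and the result becomes a string.
var after = hostPageSum([1, 2, 3]);

// Undo the modification so this demo leaves no trace behind.
delete Array.prototype.hypotheticalPolyfill;
```

Defining a polyfill with Object.defineProperty (non-enumerable by default) avoids this particular breakage in ES5 engines, but it does nothing for pages that rely on browser detection.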

No DOM modifications

Just like you don't own the global namespace, you don't own the DOM either. Making any changes to the DOM is simply unacceptable. Do not add properties or attributes to the DOM, and do not add or remove elements in the DOM. Your widget is never important enough to add extra nodes or attributes to any element of the DOM. This is because code on the page might be too tightly dependent on the DOM being one way, and if you modify it their code is likely to break.

That said, there are two permissible cases when you can modify the DOM. The first is when you are expected to present a UI, and the second is when you need to communicate with your server and circumvent the same-origin policy of the browser. In the first case, make the contract explicit by asking for a DOM node within which you will make the modifications, so that the developer of the page knows what to expect. I'll address the second case in detail in the third post in this series.
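A sketch of that explicit contract, where the page passes in the container element and the widget touches nothing outside it. renderWidget, the data shape and the class name are hypothetical, and doc is passed in for testability.

```javascript
// Sketch of an explicit DOM contract: the page passes in a container, and the
// widget only ever modifies the inside of that container.
function renderWidget(container, doc, data) {
  if (!container || typeof container.appendChild !== "function") {
    throw new Error("renderWidget needs a container element from the page");
  }
  var root = doc.createElement("div");
  // A service-specific class name reduces clashes with the page's own CSS.
  root.className = "hypothetical-widget";
  root.appendChild(doc.createTextNode(String(data.title)));
  container.appendChild(root);
  return root;
}
```

On the page, the developer would add something like <div id="my-widget-slot"></div> and call renderWidget(document.getElementById("my-widget-slot"), document, { title: "Hello" }). This makes it obvious exactly where the widget's markup will appear.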

Make no assumptions about the page

This is the most complex to ensure. Even Google Analytics has had issues with their tracking script because of assumptions they inadvertently made. Steve Souders has enumerated some on his blog. So, for example, you can't even assume that the DOM will have a <head> node even after parsing the HTML! jQuery also had some bugs in their dynamic loader due to assumptions they inadvertently made. It seems that the only real thing you can rely on is that your script tag is on the page (and hence in the DOM). Nothing else should be taken for granted.

Unfortunately, I don't have a clean solution for this problem. The only solution seems to be to keep your dependencies on the DOM and on native objects to a bare minimum, and to test in every environment you can lay your hands on. In Errorception, I test on way more browsers than I'd care about, even including old browser versions and mobile phones. It's the only real way to be sure. I have to do this irrespective of whether I support a browser or not, because the developers of the page might support it perfectly well.

undefined redefined

A slightly less scary but equally dangerous problem is that undefined is not a keyword or literal in JavaScript. It really should have been. Since it's not a keyword, it's possible for someone to create a global variable called undefined, and that can mess with your script. If you really need to use undefined, you should ensure that you have a clean, unassigned undefined for your use. There are several ways to make sure you are working with a clean undefined. The anonymous function I've shown above implements one of these mechanisms, such that undefined inside the function is the undefined you expect.
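The following small demonstration shows why a clean undefined matters. It also includes two checks that avoid the name altogether, as an alternative to the unpassed-parameter trick. The shadowing is simulated with a local variable, which stands in for whatever the page might have done.

```javascript
// Simulates a polluted `undefined`: a local variable shadows the real value.
// (ES5 made the global undefined read-only, but older engines let pages
// overwrite it, and a local variable can still shadow it today.)
function naiveIsUndefined(x) {
  var undefined = "gotcha";
  return x === undefined; // compares against "gotcha", not the real undefined
}

// Checks that never rely on the name `undefined`:
function isUndefinedVoid(x) {
  return x === void 0; // void 0 always evaluates to the real undefined value
}

function isUndefinedTypeof(x) {
  return typeof x === "undefined"; // typeof never reads the name undefined
}
```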

Trust

Before closing this post, I want to touch upon the issue of trust. By adding your script to a page, the developer is knowingly or unknowingly placing a lot of trust in you. I completely dislike this, but unfortunately JavaScript has no built-in mechanism to reduce the problems of trust. Several projects exist to reduce the possibility of vulnerabilities, but they seem too heavy to use. Douglas Crockford has been trying to educate people about the issues, but it seems to be mostly falling on deaf ears.

One post in particular is relevant here: an excellent post by Philip Tellis titled "How much do you trust third-party widgets?". It's a must read to get a gist of the issues surrounding trust in third-party widgets, along with some high-level solutions.

In Part 2…

In the next installment of this article series, I'll talk about strategies to load and bootstrap your code in the page, without causing a performance hit. I'll also touch upon how, at Errorception, we mitigate the risk of the service going down, if ever: an approach that's definitely not unique to Errorception, but isn't as widely used as you might think.

Sunday 8 January 2012

A Host of New Features

Just rolled out a new build - the first build of the new year!

Improved duplicate detection

Some time ago I had rolled out the "Mark as Duplicate" feature, which inspected your errors and automatically tried to find if they are duplicates of each other. Today that algorithm underwent a significant improvement, and now does a much better job of finding duplicates. This has been applied to all the existing errors as well, bringing down error count by a massive 93%!

What this means for you is that the number of errors you need to worry about has come down significantly. Like I say, fewer bugs = happy developer. :)

"Mark as Duplicate" no more!

Even though the Mark as Duplicate feature was nice and very helpful in bringing down bug count, it always felt clunky and legacy-like. The idea of different bug reports for the same bug, with links flying between them, just felt wrong. It doesn't help productivity if you spend a lot of time just navigating the bug reporting system. It sorely needed improvement.

But why improve it if you can drop the concept completely? Now, all of the duplicates are merged into one bug, while still preserving all of the details that each individual bug had. This gives you details about which browser the error occurred in, which versions of the browser it occurred in, which pages the error was on, and how many times it has occurred in the past, all in one page. All this is done while keeping the report pretty. I feel that this is a huge productivity win. Do let me know what you think.

Graphs integrated right into error reports

The graphs feature was an instant hit from when it was launched. It was clear that graphically visualizing information is much more pleasant as compared to loads of text. So, I've taken the idea of graphs to the next level — displaying them within the error report itself. See the following screenshot for an example.

Inline scripts

Detection of inline scripts has been added. This tells you where the breaking script was located: whether it was in an external script file, or in an inline script tag on the page. This helps a great deal in debugging the error, as you know exactly where to look.

Inline scripts usually tend to exist on several pages — for example if they are generated by Rails helpers. We now smartly detect if there are such pages on which the same error has occurred in the same script, and designate them as duplicates of each other. Previously this wasn't a factor that was taken into account when marking bugs as duplicates. This has been the single largest contributor towards reducing total bug count.

When did the error occur?

One critical piece of information for a JavaScript developer is when in the page lifecycle the error occurred. Errors that happen during bootstrapping (i.e. before page load) are very different from errors that happen after page load. This information was never available before, and has now been added to your error logs. This should simplify debugging significantly.

And much more…

That's just a small set of the new features rolled out. There have been improvements all over the place, right from the error listing to the notification emails. There has also been a massive performance boost — previously the report generation slowed down as the number of errors grew. Support for Opera and iOS has been improved since they started supporting window.onerror. However, probably the most important improvements have been in areas that are invisible — this build gives us immense flexibility at the logging and data-model end, so that additional features can be built with amazing ease and at a high pace.

Do let me know what you think of the new features in the comments or over email. Feedback is always welcome. Wish you a great year ahead!