Friday 30 March 2012

The Tale of an Unfindable JS Error

At Errorception, I genuinely care about making people's front-end code bug-free. Often this means that when people email me asking for help with bugs they are seeing, I wholeheartedly help them track down and fix the bug.

Today morning I woke up to an email from a user, asking about a bug that had occurred nearly 4,000 times in the last two days!

Hey Rakesh,

This just showed up yesterday, and I'm really stumped by it. It's only on Chrome 18/19, and it's put us way over the daily limit.

<link to the bug>

Just wondering if you seen it before, as I found one reference to it being caused by the Twitter widget (https://gist.github.com/1878283). I've grep'd every piece of JS looking for it, but haven't found one mention. Any ideas?

<snip>

I rubbed my eyes and got ready to start looking at the user's site, almost certain that he must've missed something. I mean, how could he have an error in his site and not have a source for the error? That just doesn't make sense.

Looking at the error report, this is what it said:


Uncaught TypeError: Cannot read property "originalCreateNotification" of undefined

It's an inline error on the page, not an external JS file. This just has to be something on the site, I thought to myself. Hit the page, viewed the source, and there was nothing there. That's weird I thought, because Chrome usually does a good job of reporting line numbers correctly.

As I was brushing my teeth, I got the feeling that I had heard about this error before. originalCreateNotification. Where had I heard about this before? Toothbrush still in my mouth, I did a quick Google search. The only sensible hits were the gist the user mentioned above, and a StackOverflow question where the poster was asking for help with his code. His code used a variable called originalCreateNotification. That's an interesting coincidence I thought to myself, but I was sure that that's not where I had heard about this bug. Then, I decided to search my mailbox for this bug, wondering if someone had emailed me about it before.

There I found it. Another user had once mailed me about this error, and that it had occurred 17,000 times within a few minutes, all from the same client! I recollected that we had tried to track down the bug for quite some time, but we couldn't. The error interestingly never happened again. We just let the matter be, hoping to never see the bug again.

Two different users facing the same bug! Both the cases were in Chrome, and on line 4! That's just too much of a coincidence, I thought. Something's going on here.

Twitter?

I jumped to the gist that we had found. It was authored by Pamela Fox. I have tremendous respect for Pamela, and I know that she has her own internal error reporting system. The gist is the code she uses to catch and post errors to the server. She has a list of errors she ignores because its unfixable, and originalCreateNotification is one of them. Clearly, I wasn't the first to find this bug. She has a comment there, simply saying // Twitter widget. So Twitter seems to be responsible for this problem. That also explains why two different users got the same error. Case closed, I thought.

Except, it didn't explain one little thing. How did Twitter generate so many errors in such a short time? It not unusual for files to be somehow unreachable at the client, so it's entirely possible that a small fraction of Twitter's widget users will face an error. However, that would just be an error or two. It can't be tens of thousands of errors. I decided to find out which Twitter widget this is, and find the bug in their code. So, I pulled up my user's page again to find the Twitter widget that he was using, so that I could start looking at Twitter's JS code. That's when the case became far more complex.

The user didn't have any Twitter widgets on the site!

Is there anybody out there?

Back to square one, I thought, as I prepared my green tea. By now, I was too deep into this. I wanted to find and solve the problem once and for all. Two of my users had already noticed this error, and it wasn't cool that I wasn't helping them fix it. How is it that no one else has mentioned the error on the web anywhere? Of course, one reason is that hardly anyone even tracks JS errors in the first place. It's a shame, I thought. Maybe a lot of people face this error and are completely unaware of it.

Back at my laptop, I decided to ssh into my server, and find any other occurrence of this error across all my error logs. Ran the query. 57 sites had seen this error occur at some time or the other! All of them with similar characteristics. They all occurred in short bursts, tens of thousands of times! And all of them were on Chrome, at line 4! I was onto something, but I wasn't sure what it was. I needed to get more information.

So, I took all the user-agent strings of the browsers where this error occurred, and started looking at them for patterns. The browser version numbers were too varied to extract any meaningful information. It seems to occur on all versions of Chrome, I thought. Not too useful. Dead end.

OS specific?

It can't be OS specific, can it? It didn't make sense for a bug in Chrome to be OS specific. That era of browser bugs is long gone, and I've never experienced such a thing with Chrome. In any case, I started mining the UA strings again, and found that it occurred on all versions of Mac and Windows. No dice. There wasn't a single instance of Linux, but that's probably just a coincidence, I thought. It may be because Linux has a lower market-share on desktops.

With no other lead, and almost ready to give up, I was staring at the Google search results page for originalCreateNotification. Just two sensible hits. Pamela's gist, and a StackOverflow question. I decided to read the StackOverflow question as I kept pondering what the cause of the error might be. The question read:

I'm attempting to override the default functionality for webkitNotifications.createNotification and via a Chrome extension I'm able to inject a script in the pages DOM that does so. […]

A Chrome Extension! Why didn't I think of that! I decided to find out more about the author of the question, and landed up on his site, where he was advertising a Chrome Extension. Clicked through, and I landed up on the Chrome Web Store.

This extension is meant to create more native Webkit notifications on Linux Operating Systems which use the notify-osd notification framework. Notify-OSD is installed on Ubuntu by default and used for all system notifications.

This has to be it! It explains why the bug only exists on Chrome, why it occurred at the same place at all times, why it was independent of the site or its contents, and more importantly why it never occurred on Linux! Time to do a couple of quick checks to test my theory.

Fortunately, the extension code is open-source. With my heart racing, I pulled up the repo, and started browsing it to find a mention of originalCreateNotification, and there it was, staring me in the face! It was clearly an extension designed for Linux, so it made sense that the author might not have tested it on other platforms. I decided to give it a spin. I installed it on my Mac, and BAM! My console filled up with Uncaught TypeError: Cannot read property "originalCreateNotification" of undefined, on line 4!

I don't think I've been happier seeing an error.

I've now filed a bug report, asking the author to do what he can to make the extension fail gracefully on Mac and Windows. Meanwhile, I've added originalCreateNotification to my blacklist, so that none of my users will ever see this problem again.

Another day, another bug squashed. Just another perk of working with Errorception. You are tracking JS errors, aren't you? If you aren't, start now!