Friday 30 March 2012

The Tale of an Unfindable JS Error

At Errorception, I genuinely care about making people's front-end code bug-free. Often this means that when people email me asking for help with bugs they are seeing, I wholeheartedly help them track down and fix the bug.

Today morning I woke up to an email from a user, asking about a bug that had occurred nearly 4,000 times in the last two days!

Hey Rakesh,

This just showed up yesterday, and I'm really stumped by it. It's only on Chrome 18/19, and it's put us way over the daily limit.

<link to the bug>

Just wondering if you seen it before, as I found one reference to it being caused by the Twitter widget (https://gist.github.com/1878283). I've grep'd every piece of JS looking for it, but haven't found one mention. Any ideas?

<snip>

I rubbed my eyes and got ready to start looking at the user's site, almost certain that he must've missed something. I mean, how could he have an error in his site and not have a source for the error? That just doesn't make sense.

Looking at the error report, this is what it said:


Uncaught TypeError: Cannot read property "originalCreateNotification" of undefined

It's an inline error on the page, not an external JS file. This just has to be something on the site, I thought to myself. Hit the page, viewed the source, and there was nothing there. That's weird I thought, because Chrome usually does a good job of reporting line numbers correctly.

As I was brushing my teeth, I got the feeling that I had heard about this error before. originalCreateNotification. Where had I heard about this before? Toothbrush still in my mouth, I did a quick Google search. The only sensible hits were the gist the user mentioned above, and a StackOverflow question where the poster was asking for help with his code. His code used a variable called originalCreateNotification. That's an interesting coincidence I thought to myself, but I was sure that that's not where I had heard about this bug. Then, I decided to search my mailbox for this bug, wondering if someone had emailed me about it before.

There I found it. Another user had once mailed me about this error, and that it had occurred 17,000 times within a few minutes, all from the same client! I recollected that we had tried to track down the bug for quite some time, but we couldn't. The error interestingly never happened again. We just let the matter be, hoping to never see the bug again.

Two different users facing the same bug! Both the cases were in Chrome, and on line 4! That's just too much of a coincidence, I thought. Something's going on here.

Twitter?

I jumped to the gist that we had found. It was authored by Pamela Fox. I have tremendous respect for Pamela, and I know that she has her own internal error reporting system. The gist is the code she uses to catch and post errors to the server. She has a list of errors she ignores because its unfixable, and originalCreateNotification is one of them. Clearly, I wasn't the first to find this bug. She has a comment there, simply saying // Twitter widget. So Twitter seems to be responsible for this problem. That also explains why two different users got the same error. Case closed, I thought.

Except, it didn't explain one little thing. How did Twitter generate so many errors in such a short time? It not unusual for files to be somehow unreachable at the client, so it's entirely possible that a small fraction of Twitter's widget users will face an error. However, that would just be an error or two. It can't be tens of thousands of errors. I decided to find out which Twitter widget this is, and find the bug in their code. So, I pulled up my user's page again to find the Twitter widget that he was using, so that I could start looking at Twitter's JS code. That's when the case became far more complex.

The user didn't have any Twitter widgets on the site!

Is there anybody out there?

Back to square one, I thought, as I prepared my green tea. By now, I was too deep into this. I wanted to find and solve the problem once and for all. Two of my users had already noticed this error, and it wasn't cool that I wasn't helping them fix it. How is it that no one else has mentioned the error on the web anywhere? Of course, one reason is that hardly anyone even tracks JS errors in the first place. It's a shame, I thought. Maybe a lot of people face this error and are completely unaware of it.

Back at my laptop, I decided to ssh into my server, and find any other occurrence of this error across all my error logs. Ran the query. 57 sites had seen this error occur at some time or the other! All of them with similar characteristics. They all occurred in short bursts, tens of thousands of times! And all of them were on Chrome, at line 4! I was onto something, but I wasn't sure what it was. I needed to get more information.

So, I took all the user-agent strings of the browsers where this error occurred, and started looking at them for patterns. The browser version numbers were too varied to extract any meaningful information. It seems to occur on all versions of Chrome, I thought. Not too useful. Dead end.

OS specific?

It can't be OS specific, can it? It didn't make sense for a bug in Chrome to be OS specific. That era of browser bugs is long gone, and I've never experienced such a thing with Chrome. In any case, I started mining the UA strings again, and found that it occurred on all versions of Mac and Windows. No dice. There wasn't a single instance of Linux, but that's probably just a coincidence, I thought. It may be because Linux has a lower market-share on desktops.

With no other lead, and almost ready to give up, I was staring at the Google search results page for originalCreateNotification. Just two sensible hits. Pamela's gist, and a StackOverflow question. I decided to read the StackOverflow question as I kept pondering what the cause of the error might be. The question read:

I'm attempting to override the default functionality for webkitNotifications.createNotification and via a Chrome extension I'm able to inject a script in the pages DOM that does so. […]

A Chrome Extension! Why didn't I think of that! I decided to find out more about the author of the question, and landed up on his site, where he was advertising a Chrome Extension. Clicked through, and I landed up on the Chrome Web Store.

This extension is meant to create more native Webkit notifications on Linux Operating Systems which use the notify-osd notification framework. Notify-OSD is installed on Ubuntu by default and used for all system notifications.

This has to be it! It explains why the bug only exists on Chrome, why it occurred at the same place at all times, why it was independent of the site or its contents, and more importantly why it never occurred on Linux! Time to do a couple of quick checks to test my theory.

Fortunately, the extension code is open-source. With my heart racing, I pulled up the repo, and started browsing it to find a mention of originalCreateNotification, and there it was, staring me in the face! It was clearly an extension designed for Linux, so it made sense that the author might not have tested it on other platforms. I decided to give it a spin. I installed it on my Mac, and BAM! My console filled up with Uncaught TypeError: Cannot read property "originalCreateNotification" of undefined, on line 4!

I don't think I've been happier seeing an error.

I've now filed a bug report, asking the author to do what he can to make the extension fail gracefully on Mac and Windows. Meanwhile, I've added originalCreateNotification to my blacklist, so that none of my users will ever see this problem again.

Another day, another bug squashed. Just another perk of working with Errorception. You are tracking JS errors, aren't you? If you aren't, start now!

9 comments:

  1. Good going! That's one fine example of the extra effort you make to keep your users happy. Just hope that you don't get flooded with JS help requests :)

    You might just have to deal with undefined variable and missing ; } syntax errors coming your way in the coming days.

    ReplyDelete
  2. Thanks for the kind words, Dayson.

    Dealing with requests to understand errors hasn't been a problem so far. Most of the errors carry enough information for those who are familiar with their code, so it's usually obvious what needs to be done. It only stumps people when there's an error for which the source itself is unclear. Usually people only hit me up after they have tried everything else.

    Regardless, it's always a pleasure to be able to help people, isn't it?

    ReplyDelete
  3. Thats just typically you. Getting to the core of the problem. Anyways...I thought i was the last person to look at LOGs.

    Keep doing.. what you love to do.
    John

    ReplyDelete
  4. Hello Rakesh,

    I just wanted to thank you for your finding as now I can finally blacklist this error in my internal js-bug tracking system I use for apps I'm working on. Thanks!

    ReplyDelete
  5. Now when I google this error I get your blog and don't have to retrace your steps. Thanks for your sleuthing work!

    ReplyDelete
  6. A brilliant bit of work, well done. I just wanted to add something that's mentioned in the bug report but not here. This bug also appears in the GrowlChrome plugin, which I was using.

    ReplyDelete
  7. Do you publish your blacklist?

    ReplyDelete
    Replies
    1. Hi Jill,

      No, the blacklist isn't published, but not for any particular reason. I could just list them all here:

      1. If the error is from Firefox and the error message says "Error loading script", it's ignored.
      2. If the error message contains "originalCreateNotification" (as in this blog post), it's ignored.
      3. If the error message contains "atomicFindClose", it's ignored. It's due to a iPad browser called Atomic Browser.
      4. If the script URL contains jid1-ejhbjdxm9zr4tq, it's ignored. This is due to a plugin called Powerword Dictionary.
      5. If the script URL is "miscellaneous_bindings", it's ignored. Haven't been able to get to the bottom of this, but it has to do with Chrome extensions.

      Hope this helps.

      Delete
    2. This is an interesting topic. Here's my list of "odd" exceptions: https://github.com/tlk/window.onerror-blacklist/blob/master/blacklist.txt

      Delete