Tuesday 17 January 2012

Writing Quality Third-Party JS - Part 3: Planning for an API

In the first post in this series, I wrote about the fact that you don't own the page, and how that affects even little things in your code. In the second post in the series, I dived into a good deal of detail about how to bootstrap your code. In this third and final part of the series, I'll talk about how to make an API available to your users, and how to communicate with your server.

This is a three-part series of articles aimed at people who want to write JavaScript widgets (or other kinds of scripts) to make their application's data/services/UI available on other websites. This is the third in the series, discussing ways to make an API available to your users and to communicate with your server.

So you want to provide an API

Why is this such a big deal? I mean, look at jQuery, or any library for that matter, right? You add a script tag to the page, and you have the API ready to use, right? How is this any different for a third-party script?

It's different depending on the way you choose to load your code. In Part 2: Loading Your Code, we went over why your code should be loaded asynchronously. We'll get back to that in a moment. Let's first consider the really bad case when your loading strategy requires that your code is loaded synchronously — Twitter's @Anywhere API, for example.

Again, to clarify, I'm sure Twitter has its reasons for doing what it does. I'm still going to call it bad because it is something you should not do.

So, from Twitter's examples:

This is really the simplest way to define an API. It's similar to how JS libraries, like jQuery, work. Add a script tag, and then use the library's code. Simple enough. Except, for reasons explained in Part 2, we should load our code in an asynchronous fashion.

Asynchronous loading

If your code is loaded in an asynchronous fashion, it means that the browser's parser will continue parsing and evaluating other script tags on the page before your own script is evaluated. This is great of course — it means that you have stepped out of the way when loading the page. Unfortunately, this also means that if someone attempts to access your API namespace, they might get errors, depending on whether your code has been downloaded and evaluated already or not. You now need a way of signaling to your API consumer that your code is ready to use. Facebook's documentation talks about how they do this, from the API consumer's perspective:

What Facebook requires you to do is to declare a global function called fbAsyncInit. When Facebook's JS SDK has loaded, it checks for the existence of this function in the global object (window), and then simply calls it if it exists. In fact, here are the relevant lines from the SDK that call this function:

The hasRun flag is irrelevant to us; it's an internal detail of Facebook's SDK. But note how they explicitly check whether the function exists, and call it if it does. I've removed an outer wrapper here for clarity: the code above is called in a setTimeout(0) to ensure that it runs at the end of the execution stack. Chances are, you'd want to wait till the end of the execution stack too. You could either wait explicitly till the end, or fire it off in a setTimeout, like Facebook does.

To drive home the point, the flow works as follows:

  • Before starting to load the code, ask the user to define a global function that should be called when the third-party code has finished loading.
  • Start loading the third-party code asynchronously.
  • In the third-party code, when the code has finished loading, call the global function if it has been defined.
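The three steps above can be sketched roughly as follows. The names acmeAsyncInit and ACME are hypothetical stand-ins (they are not part of any real SDK), and root is simply the global object (window in a browser).

```javascript
// Hypothetical names: `acmeAsyncInit` and `ACME` are illustrative only.
var root = typeof window !== "undefined" ? window : globalThis;

// Step 1 (consumer's page): define the callback before loading starts.
root.acmeAsyncInit = function () {
  // By the time this runs, the ACME namespace exists.
  root.ACME.widgetsRendered = true;
};

// Step 2 would inject the asynchronous <script> tag here.

// Step 3 (last lines of the third-party script itself):
root.ACME = { widgetsRendered: false };

var hasRun = false;
function announceReady() {
  if (hasRun) { return; } // never call the consumer's callback twice
  hasRun = true;
  if (typeof root.acmeAsyncInit === "function") {
    root.acmeAsyncInit();
  }
}

// Defer the call so that the rest of this script finishes evaluating first.
setTimeout(announceReady, 0);
```

If the consumer never defines the callback, the typeof check makes the last step a harmless no-op.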

There are minor variations one can make on the pattern above. One I'd like to make is that of removing the need for a global function. I'd agree that it's a bit nit-picky to remove just one global variable, but take it for what it's worth.

The improvement is that fbAsyncInit has been replaced by FB.onReady. This removes the need for the global function definition. Also, when FB's SDK loads, it will continue to use the FB namespace, so no more globals are created.
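Here is a minimal sketch of this variation, using a hypothetical ACME namespace in place of FB so that it isn't mistaken for Facebook's real SDK. The greet method and the bootLibrary function are made up for illustration.

```javascript
var root = typeof window !== "undefined" ? window : globalThis;

// Consumer's embed code, which runs before the library has loaded.
// The callback lives on the namespace, so no extra global is needed.
var received = null;
root.ACME = root.ACME || {};
root.ACME.onReady = function (api) {
  received = api.greet("page");
};

// Inside the third-party library: reuse the namespace instead of replacing it.
function bootLibrary() {
  var ns = root.ACME = root.ACME || {};
  ns.greet = function (name) { // hypothetical API method
    return "hello, " + name;
  };
  if (typeof ns.onReady === "function") {
    ns.onReady(ns);
  }
}
```

The important detail is the `root.ACME || {}` on both sides: whichever piece of code runs first creates the object, and the other one extends it.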

For APIs that are as complex as Facebook's, I think this is the best that can be done without adding more complexity. There are many more interesting things that can be done if you are willing to embrace, say, AMD support, but this is the very least required.

Other uses of having a predefined global

There are other uses of having a predefined global. For example, it could house all your initialization parameters. In FB's case, it might need the API key before making any API calls. This could be configured as members in the FB object, even before the API is actually loaded.
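For instance, the library could read its settings off the predefined namespace when it boots. This is only a sketch: ACME, apiKey, locale and readConfig are hypothetical names, and the defaults are invented.

```javascript
var root = typeof window !== "undefined" ? window : globalThis;

// Consumer's embed code: settings are declared before the library loads.
root.ACME = root.ACME || {};
root.ACME.apiKey = "YOUR_API_KEY"; // placeholder value

// Inside the library: copy known settings, falling back to defaults.
function readConfig(ns) {
  var defaults = { apiKey: null, locale: "en" }; // assumed defaults
  var config = {};
  for (var key in defaults) {
    if (Object.prototype.hasOwnProperty.call(defaults, key)) {
      config[key] = ns[key] !== undefined ? ns[key] : defaults[key];
    }
  }
  return config;
}
```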

But Facebook is complex

Sometimes, APIs are not as rich as Facebook's. Facebook's API is rich, and allows for both reads and writes. Usually, widgets/snippets are much simpler than that.

Write-only APIs

This requires special mention, since the most frequently used third-party APIs are usually invisible, write-only APIs (ref: end of this post). Take Google Analytics for example. It only collects data from the page, and posts them to Google's servers. It doesn't read anything from Google's servers - it doesn't need to. The API is a classic write-only API. In such a case, initialization can be simplified drastically. In fact, this is what Errorception itself does too — admittedly blatantly copying the technique from GA.

If you follow this technique, you don't need a global onReady function to be defined, since it is immaterial to know when the code has loaded. All you need is a queue that gets flushed to the server. This queue can be maintained as an array of ordered items. Since arrays already have a .push method on them, that is your API method! It's that simple!

So, both Google Analytics, and learning from it, Errorception, have a .push method on their only global variable (_gaq / _errs), because this global variable is essentially just a regular array! See how it's set up:
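A sketch of the queue set-up, modelled on the classic asynchronous Google Analytics snippet. The account ID is a placeholder, the event values are examples, and the part that injects the script tag is left out.

```javascript
// The only global: a plain array that acts as a command queue.
var _gaq = _gaq || [];

// "API calls" are just array pushes, so they work before ga.js has loaded.
_gaq.push(["_setAccount", "UA-XXXXX-X"]); // placeholder account ID
_gaq.push(["_trackPageview"]);

// Custom data goes through the same method.
_gaq.push(["_trackEvent", "Videos", "Play", "Intro clip"]); // example values
```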

This doesn't stop you from doing what you'd expect a decent write API to do. For example, GA lets you do a bunch of configuration and record custom data, all using just the .push method.

In Errorception's case, we are recording JS errors, and despite the need to load our code late and asynchronously, errors must be caught as early as possible. So, we start populating this queue as early as possible. Our embed code itself does this, using the window.onerror event.
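A rough sketch of that idea (not Errorception's actual embed code): install a window.onerror handler that does nothing but queue its arguments. Chaining to a previously installed handler is an extra courtesy assumed here.

```javascript
var root = typeof window !== "undefined" ? window : globalThis;

// The queue. The real script would be loaded asynchronously later on.
root._errs = root._errs || [];

// Keep any handler the page already had, so we don't clobber it.
var previousOnError = root.onerror;

root.onerror = function (message, url, line) {
  // Do as little as possible here: just remember the error for later.
  root._errs.push([message, url, line]);
  if (typeof previousOnError === "function") {
    return previousOnError.apply(this, arguments);
  }
  return false; // let the browser report the error as usual
};
```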

This way, errors are caught very early in the page lifecycle, without compromising performance one bit. Once our code has loaded, we simply start processing this queue to flush it to the server. Once the queue has been completely flushed for the first time, we simply redefine the global _errs object to now be an object (instead of an array), with just one .push method on it. So, all existing push calls will continue to work, and when push is called we can directly trigger internal code. This might break if someone has got a reference to the _errs object, and I decide to change the global. An alternative would be to leave the array untouched, and to poll the array to check for new members. Since polling just seems inefficient, and at the moment I don't have a public API anyway, I opted for redefining .push.
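The hand-over from array to object could look something like this. It is a simplified sketch: processItem is a hypothetical stand-in for whatever actually sends an item to the server, and the queued item is sample data.

```javascript
var root = typeof window !== "undefined" ? window : globalThis;

// Items queued by the embed code before the library loaded (sample data).
root._errs = root._errs || [];
root._errs.push(["early error", "app.js", 10]);

var sent = []; // stand-in for "flushed to the server"
function processItem(item) {
  sent.push(item);
}

function takeOverQueue() {
  var queued = root._errs;
  for (var i = 0; i < queued.length; i++) {
    processItem(queued[i]);
  }
  // Replace the array with an object exposing the same .push method,
  // so later calls are handled immediately instead of being queued.
  root._errs = {
    push: function () {
      for (var j = 0; j < arguments.length; j++) {
        processItem(arguments[j]);
      }
    }
  };
}
```

Code that keeps calling _errs.push(...) through the global never notices the swap. Code that saved its own reference to the old array would keep pushing into an array nobody reads any more, which is the caveat mentioned above.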

It's hard to read the minified code from Google Analytics, but it appears that Google does the exact same thing. They too seem to be redefining the global _gaq to add a .push that points to an internal method.

Communicating with your server

There are established ways to bypass the browser same-origin policy and communicate with remote servers across domains. Though usually considered to be hacks, there's no way around them in third-party scripts. Let's quickly round up the most common techniques.

Make an image request with a query string

This is the technique Google Analytics uses. When the queue is flushed, the data is encoded into a query string, and an Image object is created to load an image from a URL that carries that query string. The server responds with a simple 1x1 GIF, since it has to play well with the page, but it records the query string data as what the client had to say.

Pros: Simple, non-obtrusive since the DOM is not affected. (The image need not be appended to the DOM.) Works everywhere.
Cons: Ideal only for client-to-server communication. Not the best way to have a two-way communication. Have to consider URL length limits. Can only use HTTP GETs.
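A minimal sketch of the image technique. The function names and the collector URL in the usage note are hypothetical.

```javascript
// Encode a flat object of values as a query string appended to `base`.
function buildBeaconUrl(base, data) {
  var pairs = [];
  for (var key in data) {
    if (Object.prototype.hasOwnProperty.call(data, key)) {
      pairs.push(encodeURIComponent(key) + "=" + encodeURIComponent(data[key]));
    }
  }
  return base + "?" + pairs.join("&");
}

// Fire-and-forget: setting .src starts the request; no DOM insertion needed.
function sendViaImage(base, data) {
  var url = buildBeaconUrl(base, data);
  if (typeof Image !== "undefined") { // only available in browsers
    new Image().src = url;
  }
  return url;
}
```

For example, sendViaImage("https://collector.example.com/pixel.gif", { event: "page view" }) would request pixel.gif?event=page%20view. Real code would also want to check the resulting URL against the length limits mentioned above.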

JSON-P

You can alternatively create a JSON-P request. This is in essence similar to the image technique above, except that instead of creating an image object, we create a script tag. It comes with the downside that we'll be creating script tags each time we want to tell the server something (and hence we'll have to aggressively clean up), but also has the upside that we have two-way communication since we can listen to what the server has to say.

Pros: Still simple. Two way communication, since the server can respond meaningfully. Works everywhere. Excellent when you want to read data from the server.
Cons: Still have to consider URL length limits. Only HTTP GETs. Requires DOM cleanup.
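A sketch of a JSON-P helper that cleans up after itself. It assumes the server wraps its response in the function named by a callback query parameter, which is a common convention but still an assumption. The doc and root parameters exist only so the helper can be exercised outside a browser.

```javascript
var jsonpCounter = 0;

function jsonp(url, onData, doc, root) {
  doc = doc || document;
  root = root || window;

  var name = "__jsonp_cb_" + (jsonpCounter++); // unique temporary global
  var script = doc.createElement("script");

  root[name] = function (data) {
    // Clean up: remove the temporary global and the script tag.
    try { delete root[name]; } catch (e) { root[name] = undefined; }
    if (script.parentNode) {
      script.parentNode.removeChild(script);
    }
    onData(data);
  };

  script.src = url + (url.indexOf("?") === -1 ? "?" : "&") + "callback=" + name;
  (doc.head || doc.getElementsByTagName("head")[0]).appendChild(script);
  return name;
}
```

A production version would also need a timeout or error handler, since the callback (and therefore the cleanup) never runs if the request fails.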

Posting in hidden iframes

This is the technique Errorception uses. In this method, we create a hidden iframe, post a form to that iframe, wait for the iframe to finish loading, then destroy the iframe. The data is sent to the server as POST parameters, but the response is not readable due to the domains not matching any more after the iframe has been POSTed.

Pros: Simple. HTTP semantics respected. Works everywhere. URL length limits don't apply.
Cons: Only client-to-server communication. Requires DOM cleanup.

CORS

Errorception will soon be moving to CORS while maintaining the iframes approach for backwards compatibility. The benefit over the iframe-based approach for us is that there is no DOM cleanup required. Another obvious benefit is that you can read the response of the request as well, though this is not very critical for the fire-and-forget nature of write APIs like Errorception's.

Pros: Full control on HTTP semantics. No data size limits. No DOM alterations.
Cons: Only works in newer browsers, hence must be supported by another method for the time being.
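A sketch of feature-detecting CORS and falling back to another transport when it isn't available. The postData helper and its fallback argument are hypothetical; the fallback could be something like the hidden-iframe approach described earlier.

```javascript
// Returns a cross-origin capable request object, or null if unsupported.
function createCorsRequest(method, url) {
  if (typeof XMLHttpRequest !== "undefined") {
    var xhr = new XMLHttpRequest();
    if ("withCredentials" in xhr) { // present only on CORS-capable XHR
      xhr.open(method, url, true);
      return xhr;
    }
  }
  if (typeof XDomainRequest !== "undefined") { // IE8 and IE9
    var xdr = new XDomainRequest();
    xdr.open(method, url);
    return xdr;
  }
  return null;
}

// Try CORS first; otherwise hand off to a caller-supplied fallback.
function postData(url, body, fallback) {
  var req = createCorsRequest("POST", url);
  if (!req) {
    fallback(url, body);
    return "fallback";
  }
  req.send(body);
  return "cors";
}
```

The server also has to opt in by sending an Access-Control-Allow-Origin header in its response. Without it the request is still sent, but the browser will not let the script read the response.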

More elaborate hacks

Hacks can get pretty elaborate, of course. I had discussed on my personal blog a couple of years ago how we can use nested iframes to devise an ugly but workable method of establishing read-write communication. This was at a time when CORS wasn't around yet. Even today, you'd need this mechanism to deal with older browsers. This should be superseded by CORS now, though. Facebook still uses this as a fallback for older browsers, as do other read-write APIs like Google Calendar.

Wrapping up

This has been one massive article series! We've very quickly covered several topics to do with creating high quality third-party JS. This gives a decent bird's-eye view of the factors to consider, and possible solutions to problems, if you want to act like a well-behaved citizen on the page. There are so many more details we could get into, but I have to draw the line somewhere. :)

Do let me know what you think in the comments. Have you come across other problems in writing your own third-party JS? What fixes did you use? I'll be only too glad to know more.

9 comments:

  1. I was wondering if using socket.io for bi-directional communication or loading data from server in a 3rd party API is a wise thing to do?

    BTW awesome post..:)

    Replies
    1. Sorry for the late response. For some reason, Blogger decided to put your comment into a moderation queue, and then decided not to notify me.

      The answer is dependent on what kind of API you are building. If your API can make-do with vanilla HTTP requests, I strongly suggest you use just HTTP. If that is insufficient, I suggest you look at the server-sent-events spec. It's not bi-directional, but it's very light-weight and easy to shim for older browsers. Only if all of that fails would I suggest socket.io. However, you should be aware that socket.io isn't as light-weight on the client as it could be, because it has to bundle in code for fallbacks for transports in case websockets are not available. Socket.io is awesome, but I would only use it if I absolutely need to.

  2. Awesome series. I loved every part of it.
    Looking for more. Keep the good stuff going.

  3. What if the .push method has been modified in the global scope?

    Replies
    1. Nick,

      By global scope, I'm assuming you mean that the push method is redefined on your own array (_errs in Errorception's case or _gaq in Google Analytics' case) in some way. There's nothing that can be done in these cases. The assumption is that users will not mess with objects they don't own.

  4. Excellent post. Thanks Rakesh, you are doing good stuff with the Errorception.

  5. Rakesh,

    Nice post. I wanted to check on the security aspects of the third-party API end points. As far as I could gather, Google Analytics allows posting to the Analytics end point using any client. Do Errorception APIs allow such posting to API end points?

    Replies
    1. Thanks. There are currently two things that cause data to get posted. The first is sending custom metadata with every error, documented here: http://blog.errorception.com/2012/11/capture-custom-data-with-your-errors.html The second is posting Error objects manually, documented here: http://blog.errorception.com/2013/01/stack-traces-and-error-objects.html
