Hypermedia API Client Success

Mike Amundsen visited the Lonely Planet offices recently and spent all day discussing the current state of API development, and his thoughts on what the future holds - for those brave enough to take the risk.

One thing that I walked away from the day-long event with was the thought, "Ok, so we've mostly been doing this for a decade and we've kinda painted ourselves into a corner, but there's a few people who are trying to do it the right way. Why can't we, today, start churning out hypermedia APIs and clients that can easily consume them?"

Well, the answer is: It's hard.

The solution is to make it easier. How do we do that? Let's start with how things work today.

Current Client Strategy

The way API consumers write clients these days relies on a large amount of out-of-band knowledge of the application that the API is exposing. For example, a client developer must read the documentation of an API to discover that you get a list of users at api.foobar.com/users with a GET. But after you get a list of users, what then?

Well, read more docs and discover that you can view, update or delete a specific user at api.foobar.com/users/1a883efc1d. Ok, that's good, but let's assume that there are orders, wishlists, and events that are related to users. How do I interact with those?

Off to the docs to discover that I get a user's orders at api.foobar.com/users/1a883efc1d/orders. If I want to then display the items in the order, I need to GET that order, discover the unique identifier and then GET from api.foobar.com/orders/8ee1babc07ed which provides me with an array of items in that order.

Rinse and repeat for every single aspect of the application state that the API exposes.

First, in this workflow, knowledge of the API has to be discovered by a client developer by reading all of the documentation for each resource and URI exposed. The developer needs to then have a mental map of how all those resources fit together for their client application.

Next, those resource URIs get translated into very specific client code that responds to user interactions and then calls the appropriate endpoint. Lots of functions are written like addUser(), updateOrder(), and listOrderItems() where the client developer instructs their software to hit the right URI and use the right HTTP method in order to perform the appropriate action (i.e. change the state of an application resource).
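To make that coupling concrete, here is a minimal sketch of what such client functions tend to look like. The https scheme, the choice of POST and PUT, and the request bodies are my assumptions for illustration; the point is only that every verb and path is baked into the client.

```python
from dataclasses import dataclass
from typing import Optional

# Copied by hand from the API docs. The https scheme is an assumption.
BASE_URL = "https://api.foobar.com"


@dataclass(frozen=True)
class Request:
    """Describes an HTTP call; nothing is actually sent over the network."""
    method: str
    url: str
    body: Optional[dict] = None


def add_user(name: str) -> Request:
    # The developer learned from the docs that creating a user is a POST to /users.
    return Request("POST", f"{BASE_URL}/users", {"name": name})


def update_order(order_id: str, changes: dict) -> Request:
    # Hard-coded verb and path: if the server changes either one, this breaks.
    return Request("PUT", f"{BASE_URL}/orders/{order_id}", changes)


def list_order_items(order_id: str) -> Request:
    # The path template lives in the client, not in anything the server sent.
    return Request("GET", f"{BASE_URL}/orders/{order_id}")


print(list_order_items("8ee1babc07ed"))
```

Each function encodes a URI template and an HTTP method that the server never actually told the client about at runtime.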

What do we have now?

We have clients and an API that are heavily coupled. Change a URI and all clients break. Change a POST to a PUT and all clients break. Add a key/property value to one of your representations, and clients that have written API schema validators will break. We need to write clients and servers that are more loosely coupled, so that the server can own its own namespace, change URLs, and the client remains unbroken. The client should care about state transitions that are provided by the server rather than being tightly coded against URLs.
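A rough sketch of the looser alternative, assuming the server embeds links in each representation. The representation shape and the relation name "orders" are hypothetical, loosely modeled on link-carrying JSON formats; the client looks up a transition by name and never constructs a URL.

```python
from typing import Optional

# A hypothetical user representation; the relation names are illustrative.
user = {
    "properties": {"name": "Ada"},
    "links": [
        {"rel": ["self"], "href": "https://api.foobar.com/users/1a883efc1d"},
        {"rel": ["orders"], "href": "https://api.foobar.com/users/1a883efc1d/orders"},
    ],
}


def find_link(representation: dict, rel: str) -> Optional[str]:
    """Return the href of the first link carrying the given relation, if any."""
    for link in representation.get("links", []):
        if rel in link.get("rel", []):
            return link["href"]
    return None


# The client asks for the "orders" transition; it never builds the URL itself.
print(find_link(user, "orders"))

# If the server later moves orders elsewhere, only the representation changes.
user["links"][1]["href"] = "https://api.foobar.com/v2/order-history/1a883efc1d"
print(find_link(user, "orders"))
```

The client code is identical before and after the server changes its URL, which is the kind of decoupling I'm after.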

How can we change the course and have the client and server speaking the same language, while remaining flexible?

Semantic Discovery

Having written a few API clients, I felt this gap, even if I couldn't verbalize it. That's what Mike was able to do. He called it the Semantic Gap.

Defining what a representation... um, represents

The media type

In discussing how to deal with creating a common definition for a particular resource, the current solution for truly common types on the Web is to create a new media type (see full list at the IANA site). These include standard image types like jpeg, png and gif. There are also standard text formats like text/xml, text/html, and text/css. These media types are a way for two systems to have a common language for understanding what is contained in a particular message.

Browser says, "The developer requested a resource at this URI and stated that it must be of type text/html. Please give me that."

Server says, "Ah yes, I have that resource and it is available as text/xml, application/json, and text/html. You wanted the HTML version, so here you go, browser."

Browser then says, "Thank you, old chap. Since I got the resource and the server verified that it was of type text/html, I know that I have to parse that resource and render that as an HTML document structure in the main viewport. Here we go..."

In that lovely, everyday conversation between a web browser and a server, they were able to use a third-party, agreed-upon standard so that they both had an understanding about what kind of content was being exchanged, and the client could take the appropriate action. This exchange works fantastically for common meta-types of elements used in building things on the Web. Two systems can agree that something is a CSS file, but there's no understanding of what's contained in that file. Similarly, two systems agree that an image file of a particular format was requested and sent, but what that image file represents is not part of the conversation.
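The conversation above can be sketched as a simplified version of content negotiation on the Accept header. This is not a full implementation of the HTTP rules (it ignores specificity ordering and media type parameters other than q), and the list of available types is an assumption.

```python
from typing import List, Optional, Tuple

# Media types the hypothetical server can produce for one resource.
AVAILABLE = ["text/html", "text/xml", "application/json"]


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media type, q-value) pairs, best first."""
    entries = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type, quality = fields[0], 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                quality = float(param[2:])
        if media_type:
            entries.append((media_type, quality))
    # sorted() is stable, so equal q-values keep the order the client sent.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def negotiate(accept_header: str, available: List[str]) -> Optional[str]:
    """Pick the first offered type the client accepts, or None if there is none."""
    for wanted, quality in parse_accept(accept_header):
        if quality == 0:
            continue  # q=0 means "explicitly not acceptable"
        for offered in available:
            if wanted in ("*/*", offered):
                return offered
            # A range such as text/* matches any offered type with that prefix.
            if wanted.endswith("/*") and offered.startswith(wanted[:-1]):
                return offered
    return None


print(negotiate("text/html", AVAILABLE))
```

Notice that nothing in this exchange says what the HTML document is about. Both sides only agree on the format.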

So what if you want to define something more specific? A true type.

I'm going to use Lonely Planet for an example here since we're going through the process of building a hypermedia API. We deal with resources like Places, Points of Interest [POI], and Travel Services.

For example, we could expose an API resource at the URI http://api.lonelyplanet.com/poi/9cdc3ba66af, and document that the representation of that point of interest would be delivered in the format of application/vnd.siren+json. But how do the systems know that it's a POI? Obviously, the developers of the client know it because they read it on our documentation page, but that is out of band information. It's not inherent in the communication taking place between the two systems.
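Here is a small sketch of what such a Siren representation might look like, built as a Python dictionary. The property names and values are invented for illustration and are not Lonely Planet's actual schema; only the top-level keys (class, properties, links) come from the Siren format.

```python
import json

# A hypothetical POI entity. Property names and values are made up.
poi = {
    "class": ["poi"],
    "properties": {"name": "Example Museum", "category": "museum"},
    "links": [
        {"rel": ["self"], "href": "http://api.lonelyplanet.com/poi/9cdc3ba66af"},
    ],
}

# What goes over the wire: a body plus a media type describing its format.
content_type = "application/vnd.siren+json"
body = json.dumps(poi, indent=2)

# A generic Siren client can parse the body and read the class list...
received = json.loads(body)
print(content_type)
print(received["class"])

# ...but "poi" is just a string to it. The media type says "this is Siren",
# and nothing in the message explains what a POI is or what can be done with it.
```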

In order to have the clients that would use our API understand exactly what type of data is contained in a representation, without the need for reading documentation, we could author an RFC draft that proposes a new media type called text/poi. Unfortunately, even this is limiting because we would still be defining format more than substance. There's not an established understanding of what a POI is, how to let systems and users interact with it, and how we can facilitate state changes for a POI.

There's still a large amount of time and effort needed by a client developer to build that understanding into the client after reading out of band documentation.

The semantic profile

Mike discussed a new type of semantic tool called the Application Level Profile Semantics [ALPS] that would allow n disparate systems to all automatically understand what is being represented in a message - not just how it's being represented.

I'm about to do a terrible job of explaining how it all works, so take the opportunity, if you're interested, to read the specification (link above).

Here's an example from the mapping guidelines.

<!-- ALPS Profile -->
<alps>
  <link rel="self" href="http://alps.io/profiles/search" />
  <descriptor id="text" type="semantic" />
  <descriptor id="search" type="safe" />
</alps>

<!-- HTML4 Representation -->
<html>
  <head profile="http://alps.io/profiles/search">
  </head>
  <body>
    <form action="..." method="get">
      <input type="text" name="q" class="text" value="" />
      <input type="submit" class="search" />
    </form>
  </body>
</html>

In this quick example, you can see how an ALPS profile is used to define a common vernacular for what is involved in search, and then an example implementation of that profile in HTML.

  1. There should be a semantic element that represents the text descriptor on the profile. In HTML, this could be a div element, but since the profile also defines a state transition, it makes more sense to implement an input field that can either be auto-populated, or accept user input. To comply with the profile, it's given a class of text.
  2. There should be an element that represents the search descriptor and starts a safe state transition (such as a GET). In HTML, a good way to do that is a button on a form. To comply with the profile, it's given a class of search.
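The two points above can be checked mechanically. The following is a rough sketch, not part of the ALPS specification, that reads the descriptor ids from the profile and reports any that have no matching class in the HTML. The function names are my own, and the descriptor elements are written as self-closing so the XML parses.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

PROFILE = """
<alps>
  <link rel="self" href="http://alps.io/profiles/search" />
  <descriptor id="text" type="semantic" />
  <descriptor id="search" type="safe" />
</alps>
"""

HTML = """
<html>
  <head profile="http://alps.io/profiles/search"></head>
  <body>
    <form action="..." method="get">
      <input type="text" name="q" class="text" value="" />
      <input type="submit" class="search" />
    </form>
  </body>
</html>
"""


class ClassCollector(HTMLParser):
    """Collects every class name used anywhere in an HTML document."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def descriptor_ids(profile_xml):
    """Map each descriptor id in the profile to its type."""
    root = ET.fromstring(profile_xml.strip())
    return {d.get("id"): d.get("type") for d in root.iter("descriptor")}


def missing_descriptors(profile_xml, html):
    """Return the descriptor ids that no element in the HTML carries as a class."""
    collector = ClassCollector()
    collector.feed(html)
    return sorted(set(descriptor_ids(profile_xml)) - collector.classes)


print(missing_descriptors(PROFILE, HTML))
```

An empty result means every descriptor in the profile shows up somewhere in the representation. It says nothing about whether the element is the right kind, which a real validator would also want to check.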

Let's make this as easy as possible

Many developers today are used to working with a semi-REST API. By this, I mean that many APIs available today provide a list of URLs that expose resources, and lots of documentation to explain what the resources are and how they relate to each other (this is also called out-of-band information).

It is then up to the developer to build a tightly coupled client with all of the state transitions hard coded into the client software. The client explicitly defines what is available to the user of the client, and the server simply becomes a puppet for the client to manipulate.

A full REST API, more accurately described as a Hypermedia API, implements HATEOAS in the representations sent to the client so that there's vastly less out-of-band information needed to be consumed by the developer and hard coded into the client. However, just providing links in the server response to provide guidance for state transitions, while very important, is only one part of the solution. We have to make it easy for client developers to understand them so that clients can be written in a more intelligent manner and not be so tightly coupled. The server should be able to own its own namespace and change how state transitions occur, with the client being blissfully unaware of it.

Therefore, we need more tooling for developers to work with Hypermedia APIs. Here are some ideas.

Idea #1: Hypermedia Explorer

How useful would it be to have a visual explorer of a hypermedia API that shows nodes and edges in a link-graph style UI? It would also provide a key showing all the resources provided by the API, and the types of state transitions. This would allow developers to filter the graph to find information pertinent to their current task, especially if the API is complex, with dozens, or hundreds, of resources.

Since there's no universal format for how resources are represented, an explorer client would need to be extensible, much like modern text editors such as Sublime Text or Atom. Developers could write their own plugins that would accept Siren, JSON+P, HAL, or Collection+JSON when working with JSON formatted representations.
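One way the plugin idea could work, as a sketch only: each plugin registers itself for a media type and turns a representation into graph edges. The registry, the decorator, and the function names are all hypothetical. The link layouts follow HAL (a `_links` object keyed by relation) and Siren (a `links` array with a `rel` list).

```python
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str]  # (link relation, target href)
LinkExtractor = Callable[[dict], List[Edge]]

# Registry of plugins, keyed by the media type each one understands.
PLUGINS: Dict[str, LinkExtractor] = {}


def plugin(media_type: str):
    """Decorator that registers a link extractor for one media type."""
    def register(func: LinkExtractor) -> LinkExtractor:
        PLUGINS[media_type] = func
        return func
    return register


@plugin("application/hal+json")
def hal_links(doc: dict) -> List[Edge]:
    edges = []
    for rel, target in doc.get("_links", {}).items():
        # HAL allows either a single link object or a list of them per relation.
        targets = target if isinstance(target, list) else [target]
        edges.extend((rel, item["href"]) for item in targets)
    return edges


@plugin("application/vnd.siren+json")
def siren_links(doc: dict) -> List[Edge]:
    # Siren links carry a list of relations; emit one edge per relation.
    return [(rel, link["href"]) for link in doc.get("links", []) for rel in link["rel"]]


def extract_edges(media_type: str, doc: dict) -> List[Edge]:
    """Dispatch to whichever plugin claims the media type."""
    extractor = PLUGINS.get(media_type)
    if extractor is None:
        raise ValueError(f"no plugin registered for {media_type}")
    return extractor(doc)


print(extract_edges("application/hal+json", {"_links": {"self": {"href": "/users/1"}}}))
```

The explorer itself would only ever deal with edges, so supporting a new format means writing one more small function.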

Idea #2: Extensible Code Generators

Client side

If there are agreed-upon semantics governing a type of resource - a person, a book, an invoice, etc. - then client code generation can become more automated by reading a profile, and then producing some boilerplate templates that implement the profile. In addition, a sample JSON or XML representation of a resource could be produced that the API developer could use.

  • Generate an HTML form
  • Generate an XHR request
  • Generate an <img> or <a> element
  • Generate a table for listing resource properties
  • Generate data stubs in JSON or XML for use in development
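As a sketch of the first and last items in that list, here is a tiny generator that turns a list of profile descriptors into an HTML form and a JSON data stub. The descriptor dictionaries mirror the id and type attributes of the search profile shown earlier; the mapping rules (semantic becomes a text input, anything else becomes a submit button) are my own simplification, not something the profile format prescribes.

```python
import json
from html import escape

# Descriptors as they might be read from a profile: an id plus a type.
SEARCH = [
    {"id": "text", "type": "semantic"},
    {"id": "search", "type": "safe"},
]


def generate_form(descriptors, action):
    """Emit boilerplate HTML: one input per semantic descriptor, one button per transition."""
    transitions = [d for d in descriptors if d["type"] != "semantic"]
    # Only safe transitions may use GET; anything else falls back to POST.
    method = "get" if all(d["type"] == "safe" for d in transitions) else "post"
    lines = [f'<form action="{escape(action)}" method="{method}">']
    for d in descriptors:
        ident = escape(d["id"])
        if d["type"] == "semantic":
            lines.append(f'  <input type="text" name="{ident}" class="{ident}" />')
        else:
            lines.append(f'  <input type="submit" class="{ident}" value="{ident}" />')
    lines.append("</form>")
    return "\n".join(lines)


def generate_stub(descriptors):
    """Emit an empty data stub with one key per semantic descriptor."""
    return {d["id"]: "" for d in descriptors if d["type"] == "semantic"}


print(generate_form(SEARCH, "/search"))
print(json.dumps(generate_stub(SEARCH)))
```

The "/search" action is a placeholder. In a hypermedia client the real target would come from the server's response rather than from the generator.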

Server side

I'm currently imagining that a semantic profile could be a starting point for an API developer, with some stub code generated from it. An extensible code generator could have plugins that allow the developer to (1) read a profile for a Book, (2) generate a Ruby module/method that (3) stubs out a Siren representation of a Book.

  • Plugin #1: Reads a semantic profile
  • Plugin #2: Implements Siren stubs of a profile
  • Plugin #3: Generates a Ruby method
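A very rough sketch of that three-stage pipeline follows. To keep all the examples in one language, the last stage emits Python source rather than Ruby; a Ruby emitter would be a different third plugin. The Book profile, its descriptor ids, and every function name here are hypothetical.

```python
import xml.etree.ElementTree as ET

# A made-up profile for a Book, in the same shape as the search profile.
BOOK_PROFILE = """
<alps>
  <descriptor id="title" type="semantic" />
  <descriptor id="author" type="semantic" />
</alps>
"""


def read_profile(profile_xml):
    """Stage 1: return the ids of the semantic descriptors in a profile."""
    root = ET.fromstring(profile_xml.strip())
    return [d.get("id") for d in root.iter("descriptor") if d.get("type") == "semantic"]


def siren_stub(class_name, fields):
    """Stage 2: build an empty Siren-style entity with one property per field."""
    return {"class": [class_name], "properties": {field: "" for field in fields}}


def emit_method(name, stub):
    """Stage 3: emit source code for a function that returns the stub."""
    return f"def {name}_representation():\n    return {stub!r}\n"


def generate(profile_xml, class_name):
    """Run the stages in order; each one could be swapped for another plugin."""
    fields = read_profile(profile_xml)
    stub = siren_stub(class_name, fields)
    return emit_method(class_name, stub)


source = generate(BOOK_PROFILE, "book")
print(source)
```

Because each stage only depends on the output of the one before it, someone could replace stage 2 with a HAL stub builder or stage 3 with a Ruby emitter without touching the rest.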

I'm still thinking about all this, but right now I believe the best way to go is to build an extensible framework and then allow all the smart people in the world to build the needed plugins.

Again, it really shouldn't be hard to work with an API. If we can automate the tedious aspects of development, then we can focus more on solving problems. In five years, we should not still be having friendly arguments about how to version an API, or what media type is the best.

I want my kids to be able to build a hypermedia API that they can use to build their own applications about My Little Pony or Power Rangers. Because then, later in life if they choose to go into software development, they can focus on building products and not having the same arguments we're having today.