Gofer – HTTP clients for Node.js
We recently transitioned the main Groupon website to Node.js, and we’ve documented how we dismantled the monoliths, as well as some of our integration testing tools such as testium, in earlier blog posts. One strong requirement for the new architecture was that all features would be implemented on top of HTTP services. Our best practices for working with those services led to the creation of gofer: a wrapper around request that adds instrumentation and makes it easier to manage configuration for different backend services.

We’d like to announce that gofer is now open source!

The README file contains a detailed walkthrough of how to use the library, in case you just want to see it in action. Read on for some thoughts about gofer, calling HTTP services from Node.js, and resilience in general.
One goal was to create a format that fits nicely into one self-contained section of our configuration. At the same time we wanted a single unified config for all HTTP calls an app makes. Having a common pattern that all HTTP clients follow means that certain best practices can be enforced globally. For example, we have monitoring across apps that can tell us quite a bit about platform health beyond pure error rates.
The result looks something like this:
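A sketch of such a config, with all service names, URLs, and values invented for illustration (the exact option names gofer supports are documented in its README):

```javascript
// Illustrative gofer-style configuration: a `globalDefaults` section plus
// one section per backend service. Names, URLs, and values are made up.
const config = {
  globalDefaults: {
    connectTimeout: 100, // ms to obtain a socket and connect (see "Failing fast")
    timeout: 2000,       // ms to wait for response headers
  },
  myApi: {
    baseUrl: 'https://my-api.example.com/v2',
    qs: { client_id: 'my-app' },
  },
  github: {
    baseUrl: 'https://api.github.com',
    timeout: 5000, // per-service override of the global default
  },
};

module.exports = config;
```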
The semantics of this config are similar to passing an object into request.defaults. “Every config setting is just a default value” means that it is relatively easy to reason about the effect of configuration settings. The options passed into request for a call against myApi are simply the result of merging the global defaults, the myApi section of the config, and the explicit options on top of each other. For example:
Would be roughly equivalent to the following (given the above configuration):
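A minimal sketch of that merge in plain JavaScript, assuming a myApi section like the one above (simplified to a shallow merge; gofer merges option objects more carefully):

```javascript
// What a call against myApi with explicit options boils down to:
// merge global defaults, the myApi config section, and the explicit options.
const globalDefaults = { connectTimeout: 100, timeout: 2000 };
const myApiSection = { baseUrl: 'https://my-api.example.com', timeout: 1000 };
const explicitOptions = { uri: '/items', timeout: 500 };

// Later sources win: explicit options beat the myApi section,
// which in turn beats the global defaults.
const merged = Object.assign({}, globalDefaults, myApiSection, explicitOptions);
// merged.timeout === 500, merged.connectTimeout === 100
```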
If you just checked the request docs and couldn’t find all the options, that’s because gofer supports a handful of additional options.
connectTimeout is described in more detail below (“Failing fast”) and is always available. baseUrl is implemented using an “option mapper”. Option mappers are functions we can register for specific services; they take a merged options object and return a transformed one. They are an escape hatch for when configuring the request options directly isn’t reasonable. If a service requires custom headers or has more complicated base URL logic, option mappers have it covered.
The following option mapper takes an accessToken option and turns it into the OAuth2 header GitHub’s API expects:
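As a sketch, such a mapper is a plain function over the options object (the mapper itself is ordinary JavaScript; how it gets registered with a gofer client is described in gofer’s README):

```javascript
// Option mapper sketch: lift an `accessToken` option into the
// `Authorization: token <token>` header GitHub's API expects.
function accessTokenMapper(options) {
  const { accessToken, ...rest } = options;
  if (!accessToken) return options; // nothing to map
  return {
    ...rest, // accessToken itself is dropped from the outgoing options
    headers: { ...rest.headers, Authorization: `token ${accessToken}` },
  };
}
```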
For every incoming request we create new instances of the API clients, passing in the requestId (among other things) as part of the global defaults. This ensures that all API requests carry the proper instrumentation.
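The pattern can be sketched like this; the header name and the middleware shape are illustrative assumptions, not gofer’s actual API:

```javascript
// Sketch: derive a per-request config so every outgoing call carries
// the incoming request's id in its global defaults.
function perRequestConfig(baseConfig, requestId) {
  const globalDefaults = baseConfig.globalDefaults || {};
  return {
    ...baseConfig,
    globalDefaults: {
      ...globalDefaults,
      headers: { ...globalDefaults.headers, 'X-Request-ID': requestId },
    },
  };
}

// In an express app this would run once per incoming request, e.g.:
//   req.clients = { myApi: new MyApi(perRequestConfig(config, req.id)) };
```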
Every open connection costs resources, not only on the current level but also further down in the stack. While Node.js is quite good at managing high numbers of concurrent connections, it’s not always wise to take that for granted. Sometimes it’s better to fail fast instead of letting resources pile up. Potential candidates for failing fast include:
Connect timeout (connectTimeout)
This should, in most cases, be very short. It’s the time it takes to acquire a socket and connect to the remote service. If the network connectivity to the service (or the health of the service) is bad, this prevents every single connection from hanging around for the maximum time. A wrong firewall setup, for example, can cause connect timeouts.
Response timeout (timeout)

The time it takes to receive a response. The caveat is that this only captures the arrival of the headers: if the service writes the HTTP headers but then, for whatever reason, takes its time actually delivering the body, this timeout will not catch it. gofer currently does not support a timeout that also covers delivery of the body.
Socket queueing (maxSockets)

By default Node will queue sockets once httpAgent.maxSockets connections are already open. A common solution is to simply set maxSockets to a very high number, in some cases even Infinity. This certainly removes the risk of sockets queueing, but it passes all load down to the next level without any reasonable limits. Another option is to choose a value for maxSockets that is considered a healthy level of concurrency and to fail requests immediately once that level is reached. This (“load shedding”) is what Hystrix does, for example.

gofer at least reports when sockets queue, and our policy is to monitor this and treat it as a warning condition. We might add an option to actually fail fast on socket queueing in the future.
So one of the service calls failed. What now?
Failing is the easiest option: just pass the error up the stack (or into next when using express) and render an error page. The advantage is that no further resources are wasted on the current request; the obvious disadvantage is that it severely affects the user’s experience.
“Graceful degradation” is a very loaded term; in this case it’s meant as “omit parts of the current page instead of failing completely”. For example, some of our pages contain personal recommendations as secondary elements. We can provide the core features of these pages even when the service calls behind personal recommendations fail. This can greatly improve the resilience of pages, at the price of a degraded user experience.
Caching is not only valuable to reduce render times or to reduce load on underlying services, it can also help bridge error conditions. We built a caching library called cached on top of existing node.js modules that adds support for generating cached values in the background. Example of using cached to wrap a service call:
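cached’s actual interface is documented in its README; the following plain-JavaScript sketch only illustrates the underlying idea of background generation: keep serving the stored value while a fresh one is produced off the request path.

```javascript
// Minimal stale-while-revalidate sketch (illustrative, not cached's API).
function createCache(freshForMs) {
  const store = new Map();
  return async function getOrElse(key, produce) {
    const entry = store.get(key);
    if (entry) {
      if (Date.now() - entry.storedAt > freshForMs) {
        // Stale: refresh in the background but answer with the old value now,
        // keeping the service call latency out of the request dispatch.
        produce()
          .then((value) => store.set(key, { value, storedAt: Date.now() }))
          .catch(() => {}); // a failed refresh keeps serving the stale value
      }
      return entry.value;
    }
    const value = await produce();
    store.set(key, { value, storedAt: Date.now() });
    return value;
  };
}
```

With a high expiry time and a low freshness time, a downed service only affects the background refresh, not the pages currently being rendered.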
By configuring very high expiry times and low freshness times, we make sure that we have reasonably fresh data while pushing the actual service call latency out of the request dispatch and keeping the app responsive should the service ever go down.
The big gotcha is that this doesn’t do any cache invalidation, only expiry, so it’s not easily applicable to data where staleness is unacceptable.
For instrumentation we use a central “gofer hub” that is shared between clients. It has two general responsibilities:
1. Add headers for transaction tracing (X-Request-ID). This idea might have originated in the Rails world; Heroku has a nice summary. Additionally, every API request is assigned a unique fetchId which is passed down as the …
2. Emit lifecycle events for all requests.
The basic setup looks like this:
All available lifecycle events and the data they provide can be found in the API docs for gofer. At the very least they contain the requestId and the serviceName, endpointName, and methodName.

To more easily group requests, we use the hierarchy of serviceName, endpointName, and methodName. This allows us to build graphs and monitoring at different levels of precision and to drill down when necessary to quickly find the cause of problems. By default they are the config section the gofer uses (serviceName), the path of the resource you access (endpointName), and the HTTP verb used (methodName).
To get nicer endpointNames we define all API calls we want to use in advance. This can be a little verbose, but it has the added benefit of being a free spell checker.
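A sketch of what pre-declared endpoints look like; createMyService and the option names are illustrative stand-ins for gofer’s endpoint registration (see its README for the real API):

```javascript
// Every call is declared up front; a typo in an endpoint name fails loudly
// here instead of silently producing a strange URL at runtime.
function createMyService(fetch) {
  return {
    items: {
      // explicit methodName for nicer grouping in the metrics hierarchy
      list: (cb) => fetch({ uri: '/items', methodName: 'list' }, cb),
      // no methodName given: it will default to the HTTP verb (PUT)
      update: (id, data, cb) =>
        fetch({ uri: `/items/${id}`, method: 'PUT', json: data }, cb),
    },
  };
}
```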
We can then use an instance of MyService like this:
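A self-contained sketch of such usage (the client below is a stub so the example runs anywhere; a real instance would issue HTTP requests via gofer, and all names are illustrative):

```javascript
// Stub standing in for an instance of MyService.
const fetchStub = (options, cb) => cb(null, { options });
const myService = {
  items: {
    list: (cb) => fetchStub({ uri: '/items', methodName: 'list' }, cb),
    update: (id, data, cb) =>
      fetchStub({ uri: `/items/${id}`, method: 'PUT', json: data }, cb),
  },
};

// Reported under the explicit methodName, e.g. myApi.items.list:
myService.items.list((err, res) => {});

// No explicit methodName here, so it is reported under the HTTP verb:
myService.items.update(42, { name: 'zap' }, (err, res) => {});
```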
Since we didn’t provide an explicit methodName for the PUT call, it defaults to the HTTP method.
Want to give it a try?
You can find gofer on GitHub and on npm today. There’s still a lot we want to improve, and we’re excited to hear your ideas on how to make API-based node apps simpler and more resilient. Let’s continue the discussion down in the comments or in GitHub issues!