Over the last few years, GraphQL has emerged as a popular API specification that focuses on making data fetching easier for clients, whether those clients are a front end or a third party.
In a traditional REST-based API approach, the client makes a request, and the server dictates the response:
$ curl https://api.heroku.space/users/1
{
"id": 1,
"name": "Luke",
"email": "luke@heroku.space",
"addresses": [
{
"street": "1234 Rodeo Drive",
"city": "Los Angeles",
"country": "USA"
}
]
}
But, in GraphQL, the client determines precisely the data it wants from the server. For example, the client may want only the user's name and email, and none of the address information:
$ curl -X POST https://api.heroku.space/graphql -d '
query {
user(id: 1) {
name
email
}
}
'
{
"data": {
"user": {
"name": "Luke",
"email": "luke@heroku.space"
}
}
}
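Over HTTP, most GraphQL servers expect the query wrapped in a JSON body rather than sent as raw text. Here is a minimal sketch of building that body, assuming the conventional query and variables keys. The helper name buildGraphQLRequest is hypothetical, and the $id variable is one possible way to pass the ID rather than hard-coding it in the query:

```javascript
// Hypothetical helper: wraps a GraphQL query in the JSON envelope that
// GraphQL-over-HTTP servers conventionally accept.
function buildGraphQLRequest(query, variables = {}) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, variables }),
  };
}

const request = buildGraphQLRequest(
  `query ($id: Int!) {
    user(id: $id) {
      name
      email
    }
  }`,
  { id: 1 }
);

// The options object can then be handed to fetch(), for example:
// fetch("https://api.heroku.space/graphql", request)
console.log(request.body);
```

Keeping the ID in variables means the query text stays constant across users, which tends to make requests easier to log and cache.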
With this new paradigm, clients can make more efficient queries to a server by trimming down the response to meet their needs. For single-page apps (SPAs) or other front-end heavy client-side applications, this speeds up rendering time by reducing the payload size. However, as with any framework or language, GraphQL has its trade-offs. In this post, we'll take a look at some of the pros and cons of using GraphQL as a query language for APIs, as well as how to get started building an implementation.
Why would you choose GraphQL?
As with any technical decision, it's important to understand what advantages GraphQL offers to your project, rather than simply choosing it because it's a buzzword.
Consider a SaaS application that uses an API to connect to a remote database; you'd like to render a user's profile page. You might need to make one API GET call to fetch information about the user, like their name or email. You might then need to make another API call to fetch information about the address, which is stored in a different table. As the application evolves, because of the way it's architected, you might need to continue to make more API calls to different locations. While each of these API calls can be done asynchronously, you must also handle their responses, whether there's an error, a network timeout, or even pausing the page render until all the data is received. As noted above, the payloads from these responses might contain more than you need to render your current pages. Each API call also incurs network latency, and those latencies can add up to a substantial delay.
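To make that bookkeeping concrete, here is a rough sketch of a client juggling two REST calls. The functions fetchUser and fetchAddresses are hypothetical stand-ins that return canned data after a short delay. A real client would issue HTTP requests instead:

```javascript
// Hypothetical stand-ins for two REST calls. They resolve with canned data
// after a short delay instead of touching the network.
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

const fetchUser = (id) =>
  delay(20, { id, name: "Luke", email: "luke@heroku.space" });
const fetchAddresses = (id) =>
  delay(30, [{ street: "1234 Rodeo Drive", city: "Los Angeles", country: "USA" }]);

async function loadProfile(id) {
  // Both requests run concurrently, but each outcome still has to be
  // inspected separately before the page can render.
  const [user, addresses] = await Promise.allSettled([
    fetchUser(id),
    fetchAddresses(id),
  ]);

  if (user.status === "rejected") {
    throw new Error("Could not load user: " + user.reason);
  }

  return {
    ...user.value,
    // Degrade gracefully if only the address call failed.
    addresses: addresses.status === "fulfilled" ? addresses.value : [],
  };
}

const profilePromise = loadProfile(1);
profilePromise.then((profile) => console.log(profile));
```

Every additional endpoint adds another entry to that list and another failure mode to handle.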
With GraphQL, instead of making several API calls, like GET /user/:id and GET /user/:id/addresses, you make one API call and submit your query to a single endpoint:
query {
user(id: 1) {
name
email
addresses {
street
city
country
}
}
}
GraphQL, then, gives you just one endpoint to query for all the domain logic that you need. If your application grows, and you find yourself adding more data stores to your architecture (PostgreSQL might be a good place to store user information, while Redis might be good for other kinds of data), a single call to a GraphQL endpoint can resolve data from all of these disparate locations and respond to a client with the data they requested.
If you're unsure of the needs of your application and how data will be stored in the future, GraphQL can prove useful here, too. To modify a query, you'd only need to add the name of the field you want:
addresses {
street
+ apartmentNumber # new information
city
country
}
This vastly simplifies the process of evolving your application over time.
Defining a GraphQL schema
There are GraphQL server implementations in a variety of programming languages, but before you get started, you'll need to identify the objects in your business domain, as with any API. Just as a REST API might use something like JSON schema, GraphQL defines its schema using SDL, or Schema Definition Language, a language-agnostic way to describe all the objects and fields made available by your GraphQL API. The general format for an SDL entry looks like this:
type $OBJECT_TYPE {
$FIELD_NAME($ARGUMENTS): $FIELD_TYPE
}
Let's build on our earlier example by defining what entries for the user and address might look like:
type User {
name: String
email: String
addresses: [Address]
}
type Address {
street: String
city: String
country: String
}
User defines two String fields called name and email. It also includes a field called addresses, which is an array of Address objects. Address also defines a few fields of its own. (By the way, there's more to a GraphQL schema than just objects, fields, and scalar types. You can also incorporate interfaces, unions, and arguments to build more complex models, but we won't cover those in this post.)
There's one more type we need to define, which is the entry point to our GraphQL API. You'll remember that earlier, we said a GraphQL query looked like this:
query {
user(id: 1) {
name
email
}
}
The fields inside that query block belong to a special root type called Query, which specifies the main entry points for fetching objects. (There's also a Mutation type for modifying objects.) Here, we define a user field, which returns a User object, so our schema needs to define this too:
type Query {
user(id: Int!): User
}
type User { ... }
type Address { ... }
Arguments on a field are a comma-separated list, where each argument takes the form of $NAME: $TYPE. The ! is GraphQL's way of denoting that the argument is required; omitting it means the argument is optional.
Depending on your language of choice, the process of incorporating this schema into your server varies, but in general, consuming this information as a string is enough. Node.js has the graphql package to prepare a GraphQL schema, but we're going to use the graphql-tools package instead, because it provides a few more niceties. Let's import the package and read our type definitions in preparation for future development:
const fs = require('fs')
const { makeExecutableSchema } = require("graphql-tools");
let typeDefs = fs.readFileSync("schema.graphql", {
encoding: "utf8",
flag: "r",
});
Setting up resolvers
A schema sets up the ways in which queries can be constructed, but establishing a schema to define your data model is just one part of the GraphQL specification. The other part deals with actually fetching the data, which is done through resolvers. A resolver is a function that returns a field's underlying value.
Let's take a look at how you might implement resolvers in Node.js. The intent is to solidify concepts around how resolvers operate in conjunction with schemas, so we won't go into too much detail around how the data stores are set up. In the "real world", we might establish a database connection with something like knex. For now, let's just set up some dummy data:
const users = {
1: {
name: "Luke",
email: "luke@heroku.space",
addresses: [
{
street: "1234 Rodeo Drive",
city: "Los Angeles",
country: "USA",
},
],
},
2: {
name: "Jane",
email: "jane@heroku.space",
addresses: [
{
street: "1234 Lincoln Place",
city: "Brooklyn",
country: "USA",
},
],
},
};
GraphQL resolvers in Node.js amount to an object whose keys are the names of the fields to be retrieved and whose values are functions that return the data. Let's start with a barebones example of the initial user lookup by id:
const resolvers = {
Query: {
user: function (parent, { id }) {
// user lookup logic
},
},
}
This resolver uses the first two arguments that GraphQL passes to it: an object representing the parent (which in the initial root query is often unused), and a plain object containing the arguments passed to your field. Not every field will have arguments, but this one does, because we need to retrieve our user by their ID. The rest of the function is straightforward:
const resolvers = {
Query: {
user: function (_, { id }) {
return users[id];
},
}
}
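Because a resolver is an ordinary function, it can be exercised without a server. The following is a small sketch that calls the resolver directly, assuming a trimmed-down copy of the dummy data so the snippet runs on its own:

```javascript
// Trimmed-down copy of the dummy data, so the snippet runs on its own.
const users = {
  1: { name: "Luke", email: "luke@heroku.space" },
  2: { name: "Jane", email: "jane@heroku.space" },
};

const resolvers = {
  Query: {
    // First parameter: the parent object (unused for root fields).
    // Second parameter: the arguments supplied to the field in the query.
    user: function (_, { id }) {
      return users[id];
    },
  },
};

// Roughly what happens for user(id: 2), minus the GraphQL machinery.
const jane = resolvers.Query.user(undefined, { id: 2 });
console.log(jane.name); // prints "Jane"

// An unknown ID yields undefined, which GraphQL reports as null.
const missing = resolvers.Query.user(undefined, { id: 99 });
console.log(missing); // prints undefined
```

Calling resolvers this way is also a convenient starting point for unit tests.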
You'll notice that we didn't explicitly define resolvers for the fields of User or Address. When a field has no resolver of its own, a default resolver automatically maps it to the property of the same name on the parent object. We can override these if we choose, but with our type definitions and resolvers now defined, we can build our complete schema:
const schema = makeExecutableSchema({ typeDefs, resolvers });
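The automatic mapping mentioned above can be pictured as a fallback function that reads a property from the parent object. What follows is a simplified sketch of that idea, not the library's actual code. The function name resolveByPropertyName is our own:

```javascript
// Simplified illustration: look up a property on the parent object with the
// same name as the field. If the property is a function, call it.
function resolveByPropertyName(parent, args, context, fieldName) {
  const value = parent == null ? undefined : parent[fieldName];
  return typeof value === "function" ? value.call(parent, args, context) : value;
}

const luke = {
  name: "Luke",
  email: "luke@heroku.space",
  addresses: [
    { street: "1234 Rodeo Drive", city: "Los Angeles", country: "USA" },
  ],
};

console.log(resolveByPropertyName(luke, {}, {}, "name")); // prints "Luke"
console.log(resolveByPropertyName(luke.addresses[0], {}, {}, "city")); // prints "Los Angeles"
```

This is why returning a plain object from the user resolver is enough for name, email, and addresses to resolve without further code.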
Running the server
Finally, let's get this demo running! Since we're using Express, we can use the express-graphql package to expose our schema as an endpoint. The middleware takes an options object: schema is required, and graphiql is an optional setting which we'll talk about in a bit.
Set up your Express server on your favorite port with the GraphQL middleware like this:
const express = require("express");
// Newer releases of express-graphql export this as { graphqlHTTP } instead of a default export
const express_graphql = require("express-graphql");
const app = express();
app.use(
"/graphql",
express_graphql({
schema: schema,
graphiql: true,
})
);
app.listen(5000, () => console.log("Express is now live at localhost:5000"));
Navigate your browser to http://localhost:5000/graphql, and you should see a sort of IDE interface. In the left pane, you can enter any valid GraphQL query you like, and on the right you'll get the results. This is what graphiql: true provides: a convenient way of testing out your queries. You probably wouldn't want to expose this in a production environment, but it makes testing much easier.
Try entering the query we demonstrated above:
query {
user(id: 1) {
name
email
}
}
To explore GraphQL's typing capabilities, try passing in a string instead of an integer for the ID argument:
# this doesn't work
query {
user(id: "1") {
name
email
}
}
You can even try requesting fields that don't exist:
# this doesn't work
query {
user(id: 1) {
name
zodiac
}
}
With just a few clear lines of code expressed by the schema, a strongly-typed contract between the client and server is established. This protects your services from receiving bogus data and expresses errors clearly to the requester.
Performance considerations
For as much as GraphQL takes care of for you, it doesn't solve every problem inherent in building APIs. In particular, caching and authorization are just two areas that require some forethought to prevent performance issues. The GraphQL spec does not provide any guidance for implementing either of these, which means that the responsibility for building them falls onto you.
Caching
REST-based APIs don't need to be overly concerned when it comes to caching, because they can build on existing HTTP header strategies that the rest of the web uses. GraphQL doesn't come with these caching mechanisms, which can place undue processing burden on your servers for repeated requests. Consider the following two queries:
query {
user(id: 1) {
name
}
}
query {
user(id: 1) {
email
}
}
Without some sort of caching in place, this would result in two database queries to fetch the User with an ID of 1, just to retrieve two different columns. In fact, since GraphQL also allows for aliases, the following query is valid and also performs two lookups:
query {
one: user(id: 1) {
name
}
two: user(id: 2) {
name
}
}
This second example exposes the problem of how to batch queries. In order to be fast and efficient, we want GraphQL to access the same database rows with as few roundtrips as possible.
The dataloader package was designed to handle both of these issues. Given an array of IDs, it fetches all of them from the database at once, and subsequent calls for the same ID are served from its cache. To build this out using dataloader, we need two things. First, we need a function to load all of the requested objects. In our sample, that looks something like this:
const DataLoader = require('dataloader');
const batchGetUserById = async (ids) => {
// in real life, this would be a DB call
return ids.map(id => users[id]);
};
// batchGetUserById is our "batch loading function"; userLoader wraps it with batching and caching
const userLoader = new DataLoader(batchGetUserById);
This takes care of the issue with batching. To load the data, and work with the cache, we'll replace our previous data lookup with a call to the load method and pass in our user ID:
const resolvers = {
Query: {
user: function (_, { id }) {
return userLoader.load(id);
},
},
}
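To see why this helps, it can be useful to look at the two ideas in isolation. The class below is a deliberately tiny illustration of batching and caching, written for this post. It is not the dataloader package or its implementation, and the name TinyLoader is our own:

```javascript
// A deliberately tiny loader: NOT the dataloader package, just the two ideas.
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.cache = new Map(); // key -> Promise of the loaded value
    this.queue = []; // keys waiting for the next batch
  }

  load(key) {
    // Caching: a repeated key reuses the promise from the first request.
    if (this.cache.has(key)) return this.cache.get(key);

    const promise = new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // Batching: the first queued key schedules a single flush that runs
      // after the current synchronous code has finished.
      if (this.queue.length === 1) queueMicrotask(() => this.flush());
    });
    this.cache.set(key, promise);
    return promise;
  }

  async flush() {
    const pending = this.queue;
    this.queue = [];
    try {
      // The batch function must return values in the same order as the keys.
      const values = await this.batchFn(pending.map((p) => p.key));
      pending.forEach((p, i) => p.resolve(values[i]));
    } catch (err) {
      pending.forEach((p) => p.reject(err));
    }
  }
}

const users = { 1: { name: "Luke" }, 2: { name: "Jane" } };
const batches = []; // records every call to the batch function

const loader = new TinyLoader(async (ids) => {
  batches.push(ids);
  return ids.map((id) => users[id]);
});

const done = Promise.all([loader.load(1), loader.load(2), loader.load(1)]).then(
  (results) => {
    console.log(results.map((u) => u.name)); // prints [ 'Luke', 'Jane', 'Luke' ]
    console.log(batches); // prints [ [ 1, 2 ] ]
    return results;
  }
);
```

Three load calls result in a single batch containing two distinct IDs, which is the behavior we want from the aliased query shown earlier.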
Authorization
Authorization is an entirely different problem with GraphQL. In a nutshell, it's the process of identifying whether a given user has permission to see some data. We can imagine scenarios where an authenticated user can execute queries to get their own address information, but they should not be able to get the addresses of other users.
To handle this, we need to modify our resolver functions. In addition to a field's arguments, a resolver also has access to its parent, as well as a special context value passed in, which can provide information about the currently authenticated user. Since we know that addresses is a sensitive field, we need to change our code so that a query for a user doesn't just return a list of addresses, but instead calls out to some business logic to validate the request:
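// Business logic: only the owner of an address list may see it
const getAddresses = function (currUser, user) {
if (currUser && currUser.id === user.id) {
return user.addresses;
}
return [];
};
const resolvers = {
Query: {
user: function (_, { id }) {
// the dummy data is keyed by ID, so attach the ID for the ownership check
const user = users[id];
return user && { id, ...user };
},
},
User: {
addresses: function (parentObj, _args, context) {
return getAddresses(context.currUser, parentObj);
},
},
};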
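The effect of this check is easiest to see by calling the field resolver directly with two different parents. This is a self-contained sketch that assumes each user object carries an id property and that the context has the shape { currUser }:

```javascript
// Assumption for this sketch: each user object carries its own id.
const users = {
  1: { id: 1, name: "Luke", addresses: [{ city: "Los Angeles" }] },
  2: { id: 2, name: "Jane", addresses: [{ city: "Brooklyn" }] },
};

// Business logic lives outside the resolver.
const getAddresses = (currUser, user) =>
  currUser && currUser.id === user.id ? user.addresses : [];

const resolvers = {
  User: {
    addresses: (parentObj, _args, context) =>
      getAddresses(context.currUser, parentObj),
  },
};

// Luke is signed in (an assumed context shape: { currUser }).
const context = { currUser: users[1] };

const own = resolvers.User.addresses(users[1], {}, context);
const others = resolvers.User.addresses(users[2], {}, context);

console.log(own.length); // prints 1
console.log(others.length); // prints 0
```

Returning an empty list is one policy among several. Depending on your needs, throwing an error or returning null might communicate the denial more clearly.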
Again, we don't need to explicitly define a resolver for each User field, only the one which we want to modify.
By default, express-graphql passes the current HTTP request as a value for context, but this can be changed when setting up your server:
app.use(
"/graphql",
express_graphql({
schema: schema,
graphiql: true,
context: {
currUser: user // currently authenticated user
}
})
);
Schema best practices
One thing missing from the GraphQL spec is guidance on versioning schemas. As applications grow and change over time, so too will their APIs, and it's likely that GraphQL fields and objects will need to be removed or modified. But this downside can also be positive: by designing your GraphQL schema carefully, you can avoid pitfalls common in easier-to-implement (and easier-to-break) REST endpoints, such as inconsistencies in naming and confusing relationships. Marc-Andre has listed several strategies for building evolvable schemas, which we highly recommend reading through.
In addition, you should try to keep as much of your business logic separate from your resolver logic. Your business logic should be a single source of truth for your entire application. It can be tempting to perform validation checks within a resolver, but as your schema grows, it will become an untenable strategy.
When is GraphQL not a good fit?
GraphQL doesn't mold precisely to the needs of HTTP communication the same way that REST does. For example, GraphQL servers typically respond with a single status code, 200 OK, even when the query fails. A special errors key is returned in this response for clients to parse and identify what went wrong. Because of this, error handling can be a bit trickier.
As well, GraphQL is just a specification, and it won't automatically solve every problem your application faces. Performance issues won't disappear, database queries won't become faster, and in general, you'll need to rethink everything about your API: authorization, logging, monitoring, caching. Versioning your GraphQL API can also be a challenge, as the official spec currently has no support for handling breaking changes, an inevitable part of building any software. If you're interested in exploring GraphQL, you will need to dedicate some time to learning how to best integrate it with your needs.
Learning more
The community has rallied around this new paradigm and come up with a list of awesome GraphQL resources, for both frontend and backend engineers. You can also see what queries and types look like by making real requests on the official playground.
We also have a Code[ish] podcast episode dedicated entirely to the benefits and costs of GraphQL.