We’re going to walk through the technical process of hooking into Google Drive’s API to source content on a website. We’ll examine the step-by-step implementation, as well as how to utilize server-side caching to avoid the major pitfalls to avoid such as API usage limits and image hotlinking. A ready-to-use npm package, Git repo, and Docker image are provided throughout the article.
But… why?
Table of Contents
At some point in the development of a website, a crossroads is reached: how is content managed when the person managing it isn’t technically savvy? If the content is managed by developers indefinitely, pure HTML and CSS will suffice — but this prevents wider team collaboration; besides, no developer wants to be on the hook for content updates in perpetuity.
So what happens when a new non-technical partner needs to gain edit access? This could be a designer, a product manager, a marketing person, a company executive, or even an end customer.
That’s what a good content management system is for, right? Maybe something like WordPress. But this comes with its own set up of disadvantages: it’s a new platform for your team to juggle, a new interface to learn, and a new vector for potential attackers. It requires creating templates, a format with its own syntax and idiosyncrasies. Custom or third-party plugins may need to be to vetted, installed, and configured for unique use cases — and each of these is yet another source of complexity, friction, technical debt, and risk. The bloat of all this setup may end up cramping your tech in a way which is counterproductive to the actual purpose of the website.
What if we could pull content from where it already is? That’s what we’re getting at here. Many of the places where I have worked use Google Drive to organize and share files, and that includes things like blog and landing page content drafts. Could we utilize Google Drive’s API to import a Google Doc directly into a site as raw HTML, with a simple REST request?
Of course we can! Here’s how we did it where I work.
What you’ll need
Just a few things you may want to check out as we get started:
Authenticating with the Google Drive API
The first step is to establish a connection to Google Drive’s API, and for that, we will need to do some kind of authentication. That’s a requirement to use the Drive API even if the files in question are publicly shared (with “link sharing” turned on). Google supports several methods of doing this. The most common is OAuth, which prompts the user with a Google-branded screen saying, “[So-and-so app] wants to access your Google Drive” and waits for user consent — not exactly what we need here, since we’d like to access files in a single central drive, rather than the user’s drive. Plus, it’s a bit tricky to provide access to only particular files or folders. The https://www.googleapis.com/auth/drive.readonly
scope we might use is described as:
See and download all your Google Drive files.
That’s exactly what it says on the consent screen. This is potentially alarming for a user, and more to the point, it is a potential security weakness on any central developer/admin Google account that manages the website content; anything they can access is exposed through the site’s CMS back end, including their own documents and anything shared with them. Not good!
Enter the “Service account”
Instead, we can make use of a slightly less common authentication method: a Google service account. Think of a service account like a dummy Google account used exclusively by APIs and bots. However, it behaves like a first-class Google account; it has its own email address, its own tokens for authentication, and its own permissions. The big win here is that we make files available to this dummy service account just like any other user — by sharing the file with the service account’s email address, which looks something like this:
[email protected]
When we go to display a doc or sheet on the website, we simply hit the “Share” button and paste in that email address. Now the service account can see only the files or folders we’ve explicitly shared with it, and that access can be modified or revoked at any time. Perfect!
Creating a service account
A service account can be created (for free) from the Google Cloud Platform Console. That process is well documented in Google’s developer resources, and in addition it’s described in step-by-step detail in the companion repo of this article on GitHub. For the sake of brevity, let’s fast-forward to immediately after a successful authentication of a service account.
The Google Drive API
Now that we’re in, we’re ready to start tinkering with what the Drive API is capable of. We can start from a modified version of the Node.js quickstart sample, adjusted to use our new service account instead of client OAuth. That’s handled in the first several methods of the driveAPI.js we are constructing to handle all of our interactions with the API. The key difference from Google’s sample is in the authorize()
method, where we use an instance of jwtClient
rather than the oauthClient
used in Google’s sample:
authorize(credentials, callback) { const { client_email, private_key } = credentials; const jwtClient = new google.auth.JWT(client_email, null, private_key, SCOPES) // Check if we have previously stored a token. fs.readFile(TOKEN_PATH, (err, token) => { if (err) return this.getAccessToken(jwtClient, callback); jwtClient.setCredentials(JSON.parse(token.toString())); console.log('Token loaded from file'); callback(jwtClient); });
}
Node.js vs. client-side
One more note about the setup here — this code is intended to be called from server-side Node.js code. That’s because the client credentials for the service account must be kept secret, and not exposed to users of our website. They are kept in a credentials.json
file on the server, and loaded via fs.readFile
inside of Node.js. It’s also listed in the .gitignore
to keep the sensitive keys out of source control.
Fetching a doc
After the stage is set, loading raw HTML from a Google Doc becomes fairly simple. A method like this returns a Promise of an HTML string:
getDoc(id, skipCache = false) { return new Promise((resolve, reject) => { this.drive.files.export({ fileId: id, mimeType: "text/html", fields: "data", }, (err, res) => { if (err) return reject('The API returned an error: ' + err); resolve({ html: this.rewriteToCachedImages(res.data) }); // Cache images this.cacheImages(res.data); }); });
}
The Drive.Files.export endpoint does all the work for us here. The id
we’re passing in is just what shows up in your browsers address bar when you open the doc, which is shown immediately after https://docs.google.com/document/d/
.
Also notice the two lines about caching images — this is a special consideration we’ll skip over for now, and revisit in detail in the next section.
Here’s an example of a Google document displayed externally as HTML using this method.
Fetching a sheet
Fetching Google Sheets is almost as easy using Spreadsheets.values.get. We adjust the response object just a little bit to convert it to a simplified JSON array, labeled with column headers from the first row of the sheet.
getSheet(id, range) { return new Promise((resolve, reject) => { this.sheets.spreadsheets.values.get({ spreadsheetId: id, range: range, }, (err, res) => { if (err) reject('The API returned an error: ' + err); // console.log(res.data.values); const keys = res.data.values[0]; const transformed = []; res.data.values.forEach((row, i) => { if(i === 0) return; const item = {}; row.forEach((cell, index) => { item[keys[index]] = cell; }); transformed.push(item); }); resolve(transformed); }); });
}
The id
parameter is the same as for a doc, and the new range
parameter here refers to a range of cells to fetch values from, in Sheets A1 notation.
Example: this Sheet is read and parsed in order to render custom HTML on this page.
…and more!
These two endpoints already get you very far, and forms the backbone of a custom CMS for a website. But, in fact, it only taps the surface of Drive’s potential for content management. It’s also capable of:
- listing all files in a given folder and display them in a menu,
- importing complex media from a Google Slides presentation, and
- downloading and caching custom files.
The only limits here are your creativity, and the constraints of the full Drive API documented here.
Caching
As you’re playing with the various kinds of queries that the Drive API supports, you may end up receiving a “User Rate Limit Exceeded” error message . It’s fairly easy to hit this limit through repeated trial-and-error testing during the development phase, and at first glance, it seems as if it would represent a hard blocker for our Google Drive-CMS strategy.
This is where caching comes in — every time we fetch a new version of any file on Drive, we cache it locally (aka server-side, within the Node.js process). Once we do that, we only need to check the version of every file. If our cache is out of date, we fetch the newest version of the corresponding file, but that request only happens once per file version, rather than once per user request. Instead of scaling by the number of people who use the website, we can now scale by the number of updates/edits on Google Drive as our limiting factor. Under the current Drive usage limits on a free-tier account, we could support up to 300 API requests per minute. Caching should keep us well within this limit, and it could be optimized even further by batching multiple requests.
Handling images
The same caching method is applied to images embedded inside Google Docs. The getDoc
method parses the HTML response for any image URLs, and makes a secondary request to download them (or fetches them directly from cache if they’re already there). Then it rewrites the original URL in the HTML. The result is that static HTML; we never use hotlinks to Google image CDNs. By the time it gets to the browser, the images have already been pre-cached.
Respectful and responsive
Caching ensures two things: first, that we are being respectful of Google’s API usage limits, and truly utilize Google Drive as a front end for editing and file management (what the tool is intended for), rather than leaching off of it for free bandwidth and storage space. It keeps our website’s interaction with Google’s APIs to the bare minimum necessary to refresh the cache as needed.
The other benefit is one that the users of our website will enjoy: a responsive website with minimal load times. Since cached Google Docs are stored as static HTML on our own server, we can fetch them immediately without waiting for a third-party REST request to complete, keeping website load times to a minimum.
Wrapping in Express
Since all this tinkering has been in server-side Node.js, we need a way for our client pages to interact with the APIs. By wrapping the DriveAPI
into its own REST service, we can create a middleman/proxy service which abstracts away all the logic of caching/fetching new versions, while keeping the sensitive authentication credentials safe on the server side.
A series of express
routes, or the equivalent in your favorite web server, will do the trick, with a series of routes like this:
const driveAPI = new (require('./driveAPI'))();
const express = require('express');
const API_VERSION = 1;
const router = express.Router(); router.route('/getDoc')
.get((req, res) => { console.log('GET /getDoc', req.query.id); driveAPI.getDoc(req.query.id) .then(data => res.json(data)) .catch(error => { console.error(error); res.sendStatus(500); });
}); // Other routes included here (getSheet, getImage, listFiles, etc)... app.use(`/api/v${API_VERSION}`, router);
See the full express.js file in the companion repo.
Bonus: Docker Deployment
For deployment to production, we can can run the Express server alongside your existing static web server. Or, if it’s convenient, we could easily wrap it in a Docker image:
FROM node:8
# Create app directory
WORKDIR /usr/src/app
# Install app dependencies
# A wildcard is used to ensure both package.json AND package-lock.json are copied
# where available ([email protected]+)
COPY package*.json ./
RUN npm install
# If you are building your code for production
# RUN npm ci --only=production
# Bundle app source
COPY . .
CMD [ "node", "express.js" ]
…or use this pre-built image published on Docker Hub.
Bonus 2: NGINX Google OAuth
If your website is public-facing (accessible by anyone on the internet), then we’re done! But for our purposes where I work at Motorola, we are publishing an internal-only documentation site that needs additional security. That means Link Sharing is turned off on all our Google Docs (they also happened to be stored in an isolated and dedicated Google Team Drive separated from all other company content).
We handled this additional layer of security as early as possible at the server level, using NGINX to intercept and reverse-proxy all requests before they even make it to the Express server or any static content hosted by the website. For this, we use Cloudflare’s excellent Docker image to present a Google sign-on screen to all employees accessing any website resources or endpoints (both the Drive API Express server and the static content alongside it). It seamlessly integrates with the corporate Google account and single-sign-on they already have — no extra account needed!
Conclusion
Everything we just covered in this article is exactly what we’ve done where I work. It’s a lightweight, flexible, and decentralized content management architecture, in which the raw data lives where Google Drive, where our team already works, using a UI that’s already familiar to everyone. It all gets tied together into the website’s front end which retains the full flexibility of pure HTML and CSS in terms of control over presentation, and with minimal architectural constraints. A little extra legwork from you, the developer, creates a virtually seamless experience for both your non-dev collaborators and your end users alike.
Will this sort of thing work for everyone? Of course not. Different sites have different needs. But if I were to put together a list of use cases for when to use Google Drive as a CMS, it would look something like this:
- An internal site with between a few hundred to a few thousand daily users — If this had been the front page of the global company website, even a single request for file version metadata per user might approach that Drive API usage limit. Further techniques could help mitigate that — but it’s the best fit for small to medium-size websites.
- A single-page app — This setup has allowed us to query the version numbers of every data source in a single REST request, one time per session, rather than one time per page. A non-single-page app could use the same approach, perhaps even making use of cookies or local storage to accomplish the same “once per visit” version query, but again, it would take a little extra legwork.
- A team that’s already using Google Drive — Perhaps most important of all, our collaborators were pleasantly surprised that they could contribute to the website using an account and workflow they already had access to and were comfortable using, including all of the refinements of Google’s WYSIWYG experience, powerful access management, and the rest.