
JavaScript SEO strategy for client-side rendered web apps — Part I

Pawan Wagh


A case study of how we strategized and implemented a JavaScript SEO approach for one of our clients' products.

A Little Background

This web application is built on React/Redux with NodeJS on the backend. A Single Page Application (SPA) is a tricky case for SEO because there is only a single page, index.html, which is composed at load time by JavaScript, both vanilla and framework code. Being an SPA that relies on client-side rendering hinders Google's ability to crawl web pages that are constructed dynamically rather than linked to a specific web page or URL of interest.

The Case

Google was not crawling all the web pages on the site that were rendered with ReactJS. These pages are powered by React, and the application's data comes from backend REST APIs. Moreover, the web app did not have a unique URL for every page of interest: which content to render was stored in Redux state, so hundreds of web pages ended up linked to and rendered at the same URL.

Problem statements

  1. Devise an SEO strategy for the existing web app with little or no impact on the existing code base.

  2. Implement user-friendly URLs so that every web page has a unique URL. One constraint: if a user's search leads them to such a link, clicking it must take them to the relevant web page.

Research & available solutions

During our research, we found that most articles recommended methods like SSR (Server-Side Rendering) or pre-rendering, the most common ways people have solved this issue.

Another way to keep your existing web app untouched, without compromising SEO benefits across different search engines and crawlers, is dynamic rendering.

Dynamic rendering requires your web server to detect crawlers (for example, by checking the user agent). Requests from crawlers are routed to a renderer, requests from users are served normally. Where needed, the dynamic renderer serves a version of the content that’s suitable to the crawler, for example, it may serve a static HTML version. — Google developers docs

However, in our case, these options were time-consuming and not feasible: SSR would have meant re-architecting and rewriting the whole web application, while pre-rendering and dynamic rendering would have added extra infrastructure and technical complexity. So we decided to come up with our own solution.

The Solution

To understand the solution, we first need to grasp how Google currently crawls JavaScript applications. Google supports crawling and indexing JavaScript web applications through a process that uses headless Chromium. Googlebot processes JavaScript web apps in three main phases:

  1. Crawling — When Googlebot fetches a URL from the crawling queue by making an HTTP request, it first checks whether crawling is allowed by reading the robots.txt file.

  2. Rendering — Googlebot queues all pages for rendering unless a robots meta tag or header tells it not to index the page. Once Googlebot's resources allow, headless Chromium renders the page and executes the JavaScript.

  3. Indexing — Googlebot parses the rendered HTML for links again and queues the URLs it finds for crawling. Googlebot also uses the rendered HTML to index the page.

Our Strategy —

1. URL params over state-managed data — Link every web page of interest to a unique URL. Because all web pages were served by Solr, the backend APIs serving the dynamic content were fast enough to deliver content with the lowest possible latency, so we decided to go for URL params. We needed content identifiers to tell the web application which content to fetch and render by looking at the URL, so the content category and content title were chosen as identifiers and made part of the URL params. Embedding the content information in the URL made the ReactJS component responsible for the web page stateless, because the component no longer depended on application state, and integration became pretty much seamless. This changed the URL structure from https://www.example.com/pages to a more user-friendly format: https://www.example.com/pages/category/content-title. This is the part that helps us in the crawling phase. Below is an example code snippet for the router that accepts dynamic URL params.
import { WebPage } from "./";
export default {
  path: "pages",
  childRoutes: [
    { path: ":category/:title", component: WebPage },
  ],
};
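As an aside, the snippet above uses the older object-based (react-router v3 style) route config, while the WebPage component shown later reads its params via this.props.match.params, which is the react-router v4/v5 API. For reference, a rough equivalent of the same dynamic-segment route in the v4/v5 declarative style might look like the sketch below; the Router wrapper and surrounding structure are assumptions for illustration, not the product's actual code.
import React from "react";
import { BrowserRouter, Route, Switch } from "react-router-dom";
import { WebPage } from "./";

// :category and :title become available to WebPage
// as this.props.match.params.category / .title
export default function AppRoutes() {
  return (
    <BrowserRouter>
      <Switch>
        <Route path="/pages/:category/:title" component={WebPage} />
      </Switch>
    </BrowserRouter>
  );
}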
2. Speed — Being an SPA, the web application downloaded the whole JavaScript bundle at once. This led to wasted processing power and delayed rendering of the relevant content. If Googlebot finds that rendering the application is too time- and resource-consuming, it may choose not to index the web page and come back later. We implemented code splitting: the main JS bundle was split into multiple small chunks (we went for route-based code splitting), and the relevant code module and component are loaded on demand using dynamic imports. This made the content-serving web page lighter and quicker to load and render, since we only download the bare minimum of JS needed to render the page. This is the part that helps us in the rendering phase; a sketch of route-based splitting is shown below.
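A minimal sketch of route-based code splitting with dynamic import() and React.lazy follows. It assumes a bundler (such as webpack) that understands dynamic imports; the Home component and the file paths are placeholders rather than the product's actual modules.
import React, { Suspense, lazy } from "react";
import { BrowserRouter, Route, Switch } from "react-router-dom";

// Each lazy(() => import(...)) call makes the bundler emit a separate
// JS chunk that is downloaded only when its route is first rendered.
const Home = lazy(() => import("./Home"));
const WebPage = lazy(() => import("./WebPage"));

export default function App() {
  return (
    <BrowserRouter>
      {/* Fallback UI shown while a route's chunk is being fetched */}
      <Suspense fallback={<div>Loading…</div>}>
        <Switch>
          <Route exact path="/" component={Home} />
          <Route path="/pages/:category/:title" component={WebPage} />
        </Switch>
      </Suspense>
    </BrowserRouter>
  );
}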

3. Dynamic meta tags — Next, we need to tell Googlebot about the web page so that Google can show a meaningful preview when it displays the crawled link as a search result. We don't want users scratching their heads trying to guess what the web page or link contains; a short description of what the link is about tells them why they should visit it. We went for react-helmet, a JavaScript plugin built to take control of the <head> section of pages built with ReactJS from within the React components. Implementing react-helmet is as simple as the code snippet below, though you may need more meta tags to describe your content or web page in greater detail. This is the part that helps us in the indexing phase.
import React from "react";
import { Helmet } from "react-helmet";

export class WebPage extends React.Component {
  componentDidMount() {
    // Content identifiers come straight from the URL params.
    const { category, title } = this.props.match.params;
    this.getContent(category, title);
  }

  getContent = (category, title) => {
    this.props.actions.getContent(category, title);
  };

  render() {
    const { title, canonicalLink } = this.props.content;
    return (
      <div className="webpage-wrapper">
        <Helmet>
          <meta charSet="utf-8" />
          <title>{title}</title>
          <link rel="canonical" href={canonicalLink} />
        </Helmet>
        …
      </div>
    );
  }
}
4. Indexing API — Google provides an Indexing API to notify Google when web pages are added or removed. This allows Google to schedule pages for a fresh crawl, which can lead to higher-quality user traffic, instead of waiting for Google to come and visit your website. Below is the request we send to notify Google when a new URL is added or an existing URL is updated. This is the part that helps us in the indexing phase and keeps the added web pages up to date in Google's index.
Send the following HTTP POST request to the https://indexing.googleapis.com/v3/urlNotifications:publish endpoint:
{
  "url": "https://www.example.com/category/content-title",
  "type": "URL_UPDATED"
}
We call this API every time content or a web page is added, updated, or deleted in the product; a minimal sketch of such a call from the NodeJS backend is shown below.
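For illustration only, here is a minimal sketch of that notification from a NodeJS backend. It assumes a Node runtime with a global fetch (or a polyfill) and that the GOOGLE_ACCESS_TOKEN environment variable holds a valid OAuth 2.0 access token authorized for the Indexing API (obtaining the token, for example via a service account, is not shown); the helper name and example URL are placeholders.
const INDEXING_ENDPOINT =
  "https://indexing.googleapis.com/v3/urlNotifications:publish";

// type is "URL_UPDATED" for new or changed pages and "URL_DELETED" for removals.
async function notifyGoogle(url, type = "URL_UPDATED") {
  const response = await fetch(INDEXING_ENDPOINT, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.GOOGLE_ACCESS_TOKEN}`,
    },
    body: JSON.stringify({ url, type }),
  });
  if (!response.ok) {
    throw new Error(`Indexing API returned HTTP ${response.status}`);
  }
  return response.json();
}

// Example: notify Google after publishing new content.
notifyGoogle("https://www.example.com/pages/category/content-title")
  .then((result) => console.log("Notified Google:", result))
  .catch((err) => console.error("Indexing API call failed:", err));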

References —

  1. https://developers.google.com/search/docs/guides/javascript-seo-basics
  2. https://developers.google.com/search/docs/guides/dynamic-rendering
  3. https://developers.google.com/search/docs/guides/fix-search-javascript
  4. https://developers.google.com/search/apis/indexing-api/v3/using-api
  5. https://reactrouter.com/web/example/url-params
  6. https://medium.com/@ohsiwon/code-splitting-with-dynamic-import-test-and-learn-28bc2a06d1b8

JavaScript SEO strategy for client-side rendered web apps — part I was originally published in Prodio DesignWorks on Medium.

Story published on September 16, 2020
