March 18, 2019
With the recent developments in the JavaScript field, it’s tempting to start using the new ‘modern’ website building approach. However, not understanding the repercussions of that decision can be devastating for your SEO.
To understand the SEO implications of using complex JavaScript frameworks, let’s take a look at how they differ from the traditional CGI (Common Gateway Interface) website building standard used since 1993.
With the traditional CGI deployment, the HTML is formulated before it is presented to the client (web browser/crawler). The process can differ slightly depending on the back-end framework used; however, the result is always fully or partially rendered on the server before it is sent back to the browser. The advantage of this method is that the content received by the browser is mostly ready to use and will load extremely fast with proper optimisation, e.g. Amazon.co.uk. This approach uses JavaScript as a UI support rather than the main logic processor.
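To make this concrete, here is a minimal sketch of that traditional flow using Node’s built-in http module; the product list and port are made up purely for illustration, and the same idea applies to PHP, Python or any other back end.

// Minimal sketch of server-side rendering: the markup is assembled on the
// server and the browser (or crawler) receives ready-to-use HTML.
const http = require('http');

const products = ['Kettle', 'Toaster', 'Blender']; // illustrative data only

http.createServer((req, res) => {
  const listItems = products.map(p => '<li>' + p + '</li>').join('');
  const html =
    '<!DOCTYPE html><html><head><title>Products</title></head>' +
    '<body><h1>Products</h1><ul>' + listItems + '</ul></body></html>';
  res.writeHead(200, { 'Content-Type': 'text/html' });
  res.end(html); // everything the crawler needs is already in the response
}).listen(3000);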
The modern JavaScript framework methodology, e.g. ReactJS, is to handle most of the data rendering on the client’s side (the browser); this can include routing (deciding how a page link gets handled by the web application). It works by delivering a basic HTML structure to the browser and initialising the JavaScript framework to handle the rendering. This requires a chain of requests to come back from the server successfully, which greatly increases the initial loading time. The selling point of this approach is that once everything has loaded, you can navigate quickly through the web application without fully reloading the page.
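As a rough sketch of that approach (assuming a typical React build setup with Babel/webpack; the element id, file names and data are illustrative), the server sends an almost empty shell and the content only appears once the script has downloaded and run:

<!-- index.html: the shell returned by the server. -->
<!DOCTYPE html>
<html>
  <head><title>Products</title></head>
  <body>
    <div id="root"></div> <!-- empty until JavaScript runs -->
    <script src="/bundle.js"></script>
  </body>
</html>

// bundle.js (compiled from source like this): React fills the shell in the browser.
import React from 'react';
import ReactDOM from 'react-dom';

function App() {
  // A real app would typically fetch this data from an API after load,
  // adding yet another round trip before any content is visible.
  const products = ['Kettle', 'Toaster', 'Blender'];
  return <ul>{products.map(p => <li key={p}>{p}</li>)}</ul>;
}

ReactDOM.render(<App />, document.getElementById('root'));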
Here is an infographic that shows the support of different search engines for JavaScript frameworks:
Looking at the graphic, can we assume that ReactJS is supported by Googlebot? No, it’s a lot more complicated than that.
To understand why, we need to understand how crawlers discover our URLs and index them.
Crawlers are simple programs that visit different URLs and process the HTML they receive to make decisions about rankings. They are not browsers and, as far as we know, they do not process any JavaScript. The process of extracting meaningful data from a markup language like HTML is called parsing. As it stands, there is no easy way to parse JavaScript: it is not a markup language but a scripting language, so it needs to be interpreted by a browser or NodeJS.
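As a toy illustration (the URL and the crude regex are for demonstration only; real crawlers use proper HTML parsers), this is essentially all a crawler does: it downloads the raw markup and extracts what it can, without ever executing a script:

// Fetch the raw HTML of a page and pull out its links.
// No <script> tag is ever executed here; only the markup is read.
const https = require('https');

https.get('https://example.com/', (res) => {
  let html = '';
  res.on('data', (chunk) => { html += chunk; });
  res.on('end', () => {
    const links = (html.match(/href="[^"]+"/g) || []).map((a) => a.slice(6, -1));
    console.log('Discovered links:', links);
    // Content injected later by JavaScript never shows up in `html`.
  });
});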
Therefore, interpreting JavaScript means simulating a browser, which is a huge effort and requires a lot of additional server resources.
Have you ever wondered what a website’s code looks like, right-clicked on the page and chosen ‘View source’? You can safely assume that the crawler can see everything displayed in that view, unless special server rules are in place. The crawler will only read what is immediately returned from the server, so if you’re running a one-page JavaScript app and all that is sent is some wrapping HTML, then that is all that will get indexed.
For more complex debugging, try this command (Linux or Mac): curl --user-agent "Googlebot/2.1 (+http://www.google.com/bot.html)" http://example.com
You’re probably confused, as the green ticks on the infographic suggest that JavaScript frameworks are supported.
Yes, there is limited support, but it’s not the Crawler interpreting the website. Google has developed a “Web Rendering Service”, a separate piece of software that runs at different times from the main Crawler.
1st week – the Crawler crawls the homepage and schedules the “Web Rendering Service” to visit the page —> the “Web Rendering Service” renders the homepage and finds all the links and relevant info (it does not follow links on its own)
2nd week – the Crawler crawls again, using the data from the “Web Rendering Service” to reach links it had not previously seen —> the “Web Rendering Service” renders those pages and finds all the links and relevant info (it does not follow links on its own)
As you can see, those two pieces of software don’t work together very well: the Crawler runs on its own schedule, independently of what the “Web Rendering Service” is doing, and it is the ultimate decision maker in the process of indexing your website. You can also see that there is a minimum one-week lag in indexing pages returned by the “Web Rendering Service”, which can be very undesirable for quickly changing content, e.g. e-commerce shops or news websites. If you have a large website, it could take an unreasonably long time to index every page.
It’s also important to understand that the “Web Rendering Service” has a hard cut-off point after 5 seconds of loading, which means that if your website takes longer than 5 seconds to load, the service will quit and pass nothing back to the Crawler. This will make your website invisible to Google.
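One rough way to test your own site against that budget is to load it in a headless browser with a hard 5-second timeout. This sketch assumes Puppeteer is installed and only approximates the behaviour described above; it is not what Googlebot actually runs:

// Does the page settle within roughly 5 seconds?
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  try {
    // Abort if the network has not gone quiet within 5 seconds.
    await page.goto('https://example.com/', { waitUntil: 'networkidle0', timeout: 5000 });
    const html = await page.content();
    console.log('Rendered HTML length:', html.length);
  } catch (err) {
    console.log('Did not finish loading within 5 seconds:', err.message);
  } finally {
    await browser.close();
  }
})();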
The “Web Rendering Service” is an experimental technology and flaky at best. You cannot expect proper indexing, or for Google to see exactly what you see in your browser. It’s been reported that most of the JavaScript needs to be inline to even be considered for processing. As you can imagine, this service is very expensive to run, and it’s very unlikely that Google will increase the 5-second load limit in the future.
But not all hope is lost.
There are mechanisms that can make your website visible to the crawler:
The main idea behind these mechanisms is to get the HTML rendered before it reaches the browser, like the CGI method described at the beginning of this article: the server serves pre-rendered HTML to the search engine and the non-rendered version to the standard user. We can confirm that this method works even when your website takes more than 5 seconds to load and the “Web Rendering Service” sees nothing. However, we cannot confirm what SEO penalties may be applied if the Crawler and the “Web Rendering Service” do not agree on the content seen. User-agent detection is critical here, and any small error can cost rankings or incur long-term penalties.
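A sketch of that idea with Express is shown below; the bot pattern and the prerenderedPages lookup are purely illustrative, and in practice you would use a prerendering service or a server-side rendering framework rather than hand-rolled snapshots:

// Serve pre-rendered HTML to known crawlers, the normal JavaScript shell to everyone else.
const express = require('express');
const app = express();

const BOT_PATTERN = /googlebot|bingbot|yandex|baiduspider|duckduckbot/i;

// Imaginary store of HTML snapshots generated ahead of time.
const prerenderedPages = {
  '/': '<!DOCTYPE html><html><body><h1>Home</h1><p>Full content here.</p></body></html>',
};

app.get('*', (req, res) => {
  const isBot = BOT_PATTERN.test(req.headers['user-agent'] || '');
  if (isBot && prerenderedPages[req.path]) {
    res.send(prerenderedPages[req.path]); // crawler gets ready-made HTML
  } else {
    res.sendFile(__dirname + '/index.html'); // users get the normal app shell
  }
});

app.listen(3000);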
It is a cool, trendy, new way of making websites; however, the tradeoffs in the area of SEO are too big at present: