Hands on with ScreamingFrog Log File Analyser 2.0

ScreamingFrog are best known for their truly excellent SEO Spider Tool. It’s one of the tools I personally use and rely on the most when I’m doing a technical audit of a site. The hierarchical information that can be gleaned and exported from a crawl is a real treasure trove and, with the addition of JavaScript rendering, it’s capable of giving an accurate snapshot of sites regardless of how they’ve been built.

Their credentials in the SEO tools market are beyond reproach, which made the (relatively) recent launch of their Log File Analyser – now up to version 2.0 – an exciting prospect. Log file analysis is an increasingly well-understood element of a technical SEO audit, helping to paint a complete picture of how Googlebot and other user agents are discovering, crawling and caching content and pages on a site.

As sites become larger and more complex, the importance of this can’t be overstated. Log analysis is the best way to understand how effectively your crawl budget is being utilised and to track down niggling problems with information architecture and site structure.

Most servers capture access logs, but extracting useful data from them can be a challenge; whether you’re using Excel or an existing log analysis service such as Splunk, it’s not an intuitive process.
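
For context, here’s a minimal sketch of what you’re up against when working with raw logs by hand: parsing a single line of the common Apache/nginx “combined” log format with a regular expression in Python. The sample line and field names are purely illustrative, not taken from any particular server.

    import re

    # A single line in the common Apache/nginx "combined" log format:
    # IP - - [timestamp] "METHOD path HTTP/x.x" status bytes "referer" "user-agent"
    LINE_PATTERN = re.compile(
        r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
        r'"(?P<method>\S+) (?P<path>\S+) \S+" '
        r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
        r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
    )

    sample = (
        '66.249.66.1 - - [12/Mar/2018:06:25:24 +0000] '
        '"GET /category/widgets/ HTTP/1.1" 200 5123 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
    )

    hit = LINE_PATTERN.match(sample)
    if hit:
        print(hit.group("path"), hit.group("status"), hit.group("user_agent"))

Multiply that by a few hundred thousand lines and it’s easy to see why a purpose-built tool is attractive.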

There’s definitely a gap in the market for a more intuitive, easily parsable log analysis tool. So how close does ScreamingFrog Log File Analyser come to filling that role?


Importing Data

LFA allows you to import the raw .log file exactly as downloaded from the server. There’s no need to worry about converting or formatting the data first – simply drag and drop, or use the browse function.

Additionally, the software lets you import URL lists in CSV or XLS/XLSX format (Excel files pre and post 2007) – these can then be used to perform match comparisons with the data from the log file, which is particularly useful if you have a subset of important URLs that you’re looking to analyse.

Helpfully, since the tool is intended primarily as an SEO tool (most other tools of this type pitch themselves as multi-use), LFA will also automatically identify and segregate the requests made by Googlebot, ensuring that you always have a dataset dedicated to what is most likely the most important spider crawling your site.
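
LFA handles this for you, but for the curious, the sketch below shows the general approach to spotting and verifying Googlebot requests: match the user-agent string, then confirm the IP with the reverse-then-forward DNS check that Google itself documents. This is the standard verification technique, not necessarily how LFA does it internally, and the IP in the example is just illustrative.

    import socket

    def claims_to_be_googlebot(user_agent: str) -> bool:
        # Cheap first pass: the user-agent string says it's Googlebot.
        return "Googlebot" in user_agent

    def verified_googlebot(ip: str) -> bool:
        # Reverse-DNS the IP, check the hostname belongs to Google,
        # then forward-resolve that hostname and confirm it maps back to the IP.
        try:
            host = socket.gethostbyaddr(ip)[0]
            if not host.endswith((".googlebot.com", ".google.com")):
                return False
            return ip in socket.gethostbyname_ex(host)[2]
        except OSError:
            return False

    # Example usage (needs network access; the IP here is just illustrative):
    # print(verified_googlebot("66.249.66.1"))

User-agent strings are trivially spoofed, which is why the DNS confirmation step matters if you ever do this by hand.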


Analysing Data

The “Overview” tab is an excellent dashboard featuring all the major metrics you’re likely to track while performing log file analysis. It also includes handy line graphs that break down the response codes, events and URLs accessed over the period of the log. These can be filtered by time period and by the particular bot whose behaviour you’re interested in.

Away from the overview, LFA breaks the data down into a series of tabs – each containing a number of sortable, filterable, exportable columns and rows, echoing the familiar setup of the SEO Spider Tool. As with the Spider Tool, you can filter this data on the fly by searching for any alphanumeric string or, if you prefer, use the Tree view to see a more hierarchical breakdown.

The “URL” view contains the following:

  • URL
  • Last Response Code
  • Time of Last Response
  • Content Type
  • Average Bytes
  • Average Response Time (in milliseconds)
  • Overall number of events
  • Number of Bot Requests (this is further broken down into additional columns for commonly analysed bots: Googlebot (and mobile/smartphone variants), Bingbot, Yandex, Baidu).

This immediately gives you a large amount of data that is ripe for analysis. Here are a couple of examples of how sorting this data by one or more of the columns can quickly help you find problem pages (a rough code sketch of the same checks follows this list):

  1. Average Bytes & Average Response Time: find pages that respond particularly slowly to requests by search engine spiders. These pages are likely to perform worse organically, as slower pages can be penalised as a result of higher bounce rates and other negative user behaviour signals. You may even find that extremely slow pages are ignored or cached less regularly than better optimised pages – a prime opportunity to run them through GTmetrix or another page speed testing tool.
  2. Last Response Code, Time of Last Response & Num Events: check for pages that are returning a 404 Not Found, 500 Server Error or 3XX redirect header response in the ‘Last Response Code’ column. Then, by referencing the ‘Time of Last Response’ and ‘Num Events’ columns, find out how many times, and how regularly, these valueless pages are being crawled. This is excellent for finding content that has recently been deleted or removed but is still receiving regular attention from search engine bots – allowing you to redirect or repurpose the URL as necessary.
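
To make that concrete, here’s a minimal Python sketch of the same two checks performed by hand over a handful of parsed log entries. The field names and sample data are my own illustrative choices, continuing from the parsing sketch earlier, not anything LFA prescribes.

    from collections import defaultdict
    from statistics import mean

    # Each entry is a dict shaped like the output of the parsing sketch earlier,
    # plus a response time if your log format records one.
    entries = [
        {"path": "/widgets/", "status": "200", "response_ms": 180},
        {"path": "/widgets/", "status": "200", "response_ms": 2400},
        {"path": "/old-page/", "status": "404", "response_ms": 95},
        {"path": "/old-page/", "status": "404", "response_ms": 88},
    ]

    by_url = defaultdict(list)
    for entry in entries:
        by_url[entry["path"]].append(entry)

    # 1. Slow pages: rank URLs by average response time, worst first.
    slowest = sorted(
        ((url, mean(e["response_ms"] for e in hits)) for url, hits in by_url.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    print("Slowest URLs:", slowest)

    # 2. Error pages still being crawled: non-200 last response plus event count.
    for url, hits in by_url.items():
        if hits[-1]["status"] != "200":
            print(f"{url}: last response {hits[-1]['status']}, crawled {len(hits)} times")

The point of LFA is that these sorts and filters are a couple of clicks rather than a script.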

For more specialised analysis the tool’s other, more focused, tabs offer expanded data:

  • Response Codes: analyse response codes for each requested URL over the period of the log file – for example, has a valid page that was requested 96 times returned 200 OK all 96 times, or has it returned an inconsistent mixture of 200 OK and 404 Not Found? The simple true/false ‘Inconsistent’ flag will help you track these pages down for further analysis.
  • User Agents: break down your data primarily by user agent rather than by URL – find out which agents are getting the quickest responses and which header responses are being returned to them, and in what quantity.
  • Referers: discover the page that linked to the requested URL, how quickly your site responded and the number of errors. Useful for finding out where your referral traffic is coming from.
  • Directories: breaks down URLs to follow the directory (or folder) structure of your site, displaying the number and type of events that each directory has received. This is another powerful way of assessing which specific areas of your site (rather than individual pages) Google is spending most of its time crawling – great for analysing and optimising crawl budget.
  • IPs: rather than relying on the user agent to identify the request-maker, this breaks it down further to the individual IP address. Unusual data here could (but doesn’t always!) represent negative SEO attacks or other malicious intent.
  • Events: the full and rather intimidating list of Events over the time period of the log, identified by timestamp, method, response code and user agent.
  • Imported URL data: as mentioned previously, this allows you to import a more curated URL list, which can then be used as comparison data in any of the other tabs via the ‘View’ dropdown – either to find URLs that are missing from one dataset or the other, or to combine the two sources (a rough sketch of this kind of comparison follows the list).
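
As a rough illustration of the kind of comparison the Imported URL data view enables, the snippet below cross-references an imported URL list against the URLs seen in the log. It’s a sketch of the concept using made-up URLs, not a reflection of how LFA works internally.

    # URLs from an imported list (e.g. a crawl export) vs URLs requested in the log.
    imported_urls = {"/widgets/", "/about/", "/new-landing-page/"}
    logged_urls = {"/widgets/", "/old-page/", "/about/"}

    matched = imported_urls & logged_urls            # in your list and being crawled
    never_requested = imported_urls - logged_urls    # pages the bots haven't touched
    unexpected = logged_urls - imported_urls         # crawled URLs missing from your list

    print("Matched:", sorted(matched))
    print("In the list but never requested:", sorted(never_requested))
    print("In the log but not in the list:", sorted(unexpected))

Pages in the “never requested” bucket are often the most interesting find: they’re in your site architecture but not getting crawled.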

Exporting Data

Each site you analyse is stored in a separate Project file, but chances are you’re going to want to output segments of your data and, thankfully, this is simple and flexible in the ScreamingFrog LFA tool.

Each tab has its own export function, which in the case of the URL and Response Code tabs can be filtered to pull out specifically what you’re looking for.

  • URL – export by content type (text/html, JS, CSS, etc)
  • Response Code – export by the specific response code type, or a list of any inconsistent responses

If you’ve imported URL data, you can also choose whether that data should be included, or pull out a list of matching/missing rows.

Your data is then kicked out in a nicely formatted CSV or Excel file, ready for further analysis or presentation.
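
If you ever need to reproduce that kind of filtered export outside the tool, it only takes a few lines with pandas. This is a generic sketch using made-up column names rather than LFA’s actual export schema.

    import pandas as pd

    # Hypothetical rows in the rough shape of a URL-tab export.
    rows = [
        {"url": "/widgets/", "content_type": "text/html", "last_response": 200},
        {"url": "/assets/app.js", "content_type": "application/javascript", "last_response": 200},
        {"url": "/old-page/", "content_type": "text/html", "last_response": 404},
    ]

    df = pd.DataFrame(rows)

    # Keep only the HTML pages, mirroring the "export by content type" filter,
    # and write them out as a CSV ready for further analysis.
    html_only = df[df["content_type"] == "text/html"]
    html_only.to_csv("html_urls.csv", index=False)

In practice, though, the built-in export filters cover most of what you’ll need.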

Conclusion

Overall, I found ScreamingFrog’s Log File Analyser an excellent tool for SEO-focused log file analysis. The fact that it’s built specifically with a technical SEO user in mind was, to me, a huge boon and I’d recommend it to anybody looking to undertake this kind of analysis.

A limited free demo (restricted to a single project of 1,000 rows) and the fully licensed version are available at https://www.screamingfrog.co.uk/log-file-analyser/. A single licence for the full version is £99 per year.

about the author: "Blueclaw's Senior Technical SEO likes canonical tags, URL parameters and long walks on the beach (alright, site migrations). Can typically be found tinkering with the innards of the nearest eCommerce site."
filed under: SEO Tools / Technical SEO