URL Parameter Handling in Google Webmaster Tools

I find myself spending more and more time digging around in Google Webmaster Tools lately. When I started in SEO I thought that Analytics would be where the action was - that’s where all the exciting traffic and revenue stuff is, after all.
But since moving into a more technical role here at Blueclaw it’s hard to deny that Webmaster Tools has a wealth of useful data and, for my money, it’s one of the best ways to keep track of a site’s interactions with the Big G.
Admittedly, it helps that it’s free and that we ensure that it’s properly configured on every site we work with.
One of the more obscure aspects of GWMT (though I’m starting to see it referenced more regularly these days - which is perhaps a sign of its increased efficacy of late) is the ‘URL Parameters’ section hidden away under ‘Crawl’. It comes with an exciting warning that messing around with it might lead to many of your pages disappearing from search, which is an ominous idea.
Ominous, that is, unless you have a shedload of potentially duplicate pages floating around in Google’s index and a burning desire to trim them down.
You might also find that your site’s ratio of indexed pages is really low, but some of what you do have in there looks low quality: URLs that include question marks followed by odd parameters like utm_source or replytocom, or stuff that looks like session IDs. If you’re an Ecommerce vendor you might discover hundreds of variations on what you thought was a pretty simple product page.
If Google is spending ages indexing all these low quality pages, it could well be missing out on some of your quality stuff - or downgrading its opinion of your site overall.
What are URL Parameters?
A URL parameter is essentially a field-value pair in the URL’s query string, carrying data that does not fit conveniently into the hierarchical path structure of the site.
The question mark separates the hierarchical part of the URL from the query string, while the field-value pair separated by an equals sign (e.g. in the parameter utm_source=twitter, ‘utm_source’ is the field and ‘twitter’ is the value) is the parameter itself. These can be strung together by use of the ‘&’ (ampersand) symbol or a semi-colon.
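To make that anatomy concrete, here is a minimal sketch in Python that pulls a URL apart into its hierarchical part and its field-value pairs. The function name split_parameters is made up for illustration; urlsplit and parse_qsl come from Python’s standard library.

```python
from urllib.parse import urlsplit, parse_qsl


def split_parameters(url):
    """Return (base_url, [(field, value), ...]) for a URL.

    split_parameters is a hypothetical helper name used for this example.
    """
    parts = urlsplit(url)
    # Everything before the question mark: scheme, host and path.
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # parse_qsl splits the query string on '&' by default; a ';'
    # separator has to be requested explicitly in current Python versions.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    return base, pairs


base, pairs = split_parameters(
    "http://www.example.com/boys/shoes?order=asc&utm_source=twitter"
)
print(base)   # the hierarchical part of the URL
print(pairs)  # one (field, value) tuple per parameter
```

Each tuple in the returned list is one parameter, with the field first and the value second.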
These parameters can be performing various duties on your URLs; they may simply be for tracking (such as with session IDs, Analytics segmentation etc.), performing specific functions such as enabling a print view or, often in an Ecommerce environment, changing the way a page displays by sorting, limiting or otherwise changing the page’s content.
For example, if you’re running an Ecommerce site with several categories and subcategories you may have included ways to order the products available in each one, as well as parameter-based filters to help users find the exact kind of product they’re searching for. This could lead to a situation where you have the following URLs being crawled:
http://www.example.com/boys/shoes
http://www.example.com/boys/shoes?limit=16
http://www.example.com/boys/shoes?order=asc
http://www.example.com/boys/shoes?order=desc
http://www.example.com/boys/shoes?colour=red
That’s quite a few URLs for what is effectively the same content. Now imagine this over every category and subcategory on your site; it can really add up.
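If you have a list of crawled URLs to hand (from a crawl export or your server logs, say), a rough way to see how quickly this adds up is to group them by their parameter-free base. This is only a sketch: group_by_base is a hypothetical helper, and the list below simply reuses the example URLs from above.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_base(urls):
    """Group URLs by scheme, host and path, ignoring the query string.

    group_by_base is a hypothetical helper name used for this example.
    """
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        base = f"{parts.scheme}://{parts.netloc}{parts.path}"
        groups[base].append(url)
    return dict(groups)


crawled = [
    "http://www.example.com/boys/shoes",
    "http://www.example.com/boys/shoes?limit=16",
    "http://www.example.com/boys/shoes?order=asc",
    "http://www.example.com/boys/shoes?order=desc",
    "http://www.example.com/boys/shoes?colour=red",
]

# Print each base URL alongside the number of crawled variants.
for base, variants in group_by_base(crawled).items():
    print(base, len(variants))
```

Any base URL with a large number of variants is a good candidate for closer inspection.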
You might want to try resolving this problem by setting up robots.txt rules to block crawling of some of the parameters (limit/order for example), or you could add a ‘noindex, follow’ robots meta tag to certain URLs that include a query string.
Lastly, there’s the solution offered by the rel="canonical" tag: you could canonicalise these altered URLs back to the base.
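For anyone who does have access to the code base, the canonical approach might look something like the sketch below. It assumes that limit and order are the parameters you consider safe to ignore; that set, and the helper names canonical_url and canonical_tag, are illustrative rather than anything prescribed by Google.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters never change what the page is about.
IGNORED_PARAMETERS = {"limit", "order"}


def canonical_url(url, ignored=IGNORED_PARAMETERS):
    """Return the URL with the ignored parameters stripped out."""
    parts = urlsplit(url)
    kept = [
        (field, value)
        for field, value in parse_qsl(parts.query, keep_blank_values=True)
        if field not in ignored
    ]
    # Rebuild the URL from the surviving parameters, dropping any fragment.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


def canonical_tag(url):
    """Build the <link> element to place in the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


print(canonical_tag("http://www.example.com/boys/shoes?order=asc"))
```

Whether a filter such as colour belongs in the ignored set is a judgment call for your own site, so it is left out here.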
But let’s go with the scenario that you don’t have the knowledge, developer-support or access to the code base to make these changes yourself.
That’s where the URL Parameters section in GWMT is going to come in very handy. Equally, I’d suggest that even if you do have these previously mentioned solutions in place, URL Parameter handling should be set up as a double-safe measure.
Categorising & Dealing With URL Parameters
The URL parameters section gives you a powerful suite of tools to deal with duplicate content and indexation issues that this kind of URL can cause. Essentially it gives you a fine level of control over the visibility to Google’s crawler on a parameter-by-parameter basis.
To start configuring the behaviour for a parameter, navigate to the ‘URL Parameters’ section of GWMT and click the ‘Add Parameter’ button.
In the ‘Parameter’ field you want to add the ‘field’ part of the particular field-value pair you want to configure. So in the case of our example URL http://www.example.com/boys/shoes?order=asc we’d enter ‘order’. The next box asks you whether this parameter changes how the content on the page is seen by the user - if you’re unsure, compare it to the version of the page without any parameters (e.g. http://www.example.com/boys/shoes). For now let’s imagine that this parameter doesn’t change the content on the page:

As you can see in the image we’ve chosen from the dropdown box the option ‘No: Doesn’t affect page content (ex: tracks usage)’ and I’ve also expanded the box for ‘show example URLs’ - this’ll show you any URLs that Google has recently crawled which feature the parameter that you’re configuring - this is a good way of checking that you’re configuring the right thing.
If there are no examples of recently crawled URLs, it’s likely that the parameter isn’t in use on your site, or that Google isn’t crawling those particular URLs for another reason (such as your site being locked down while still in development).
If you press ‘Save’ at this point the parameter will be configured as a ‘Representative URL’. This tells Google that if it locates many URLs that differ only in this parameter (that is, the hierarchical part of the URL and every other parameter are identical), Googlebot will only crawl one representative URL, rather than all of them.
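The ‘Representative URL’ idea can be modelled roughly as follows. This is only an illustration of the principle: the function name representatives is made up, and picking the first URL seen is an assumption, since how Google actually chooses its representative isn’t something this sketch can know.

```python
from urllib.parse import parse_qsl, urlsplit


def representatives(urls, parameter):
    """Keep one URL from each group that differs only in `parameter`.

    Assumption: the first URL encountered in a group is kept.
    """
    chosen = {}
    for url in urls:
        parts = urlsplit(url)
        # Every other parameter still counts towards a URL's identity.
        rest = tuple(sorted(
            (field, value)
            for field, value in parse_qsl(parts.query, keep_blank_values=True)
            if field != parameter
        ))
        key = (parts.scheme, parts.netloc, parts.path, rest)
        chosen.setdefault(key, url)
    return list(chosen.values())


urls = [
    "http://www.example.com/boys/shoes?utm_source=twitter",
    "http://www.example.com/boys/shoes?utm_source=facebook",
    "http://www.example.com/boys/shoes",
    "http://www.example.com/boys/hats?utm_source=twitter",
]
print(representatives(urls, "utm_source"))
```

Here the three shoes URLs collapse into one because they differ only in utm_source, while the hats URL survives because its path is different.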

Now let’s look at an example where instead of ‘No’ we’ve chosen ‘Yes: Changes, reorders, or narrows page content’. This expands the box and allows for further configuration. The first box we’ll have to deal with asks the question ‘How does this parameter affect page content?’ and offers the following choices:
- Sorts - Sorts content as specified by the parameter. For example, displays product listings sorted by name, by brand, or by price. (e.g. ?sort=price_ascending)
- Narrows - Displays a subset of content specified by the parameter. For example, filters for only dresses in size M. (e.g. ?size=medium)
- Specifies - Specifies what the page is about (for example, the subject, audience, item number, etc). (e.g. ?subject=shoes)
- Translates - Displays content in the language (for example, English or Klingon) specified by the parameter (e.g. ?language=english)
- Paginates - Displays a specific page of a long article or a paginated series of pages, such as on an Ecommerce site. (e.g. ?page=2)
Check how the content changes once the parameter is applied - does it flick you to the next page? Does it translate the content into another language? Does it simply reorder or filter the items on the page? By comparing the URL that features the parameter to the base URL you’ll likely be able to work this out.
You can now move on to the second dropdown box, ‘Which URLs with this parameter should Google Crawl?’. Your options are as follows:
- Let Googlebot decide - This lets Googlebot decide for itself whether to crawl URLs containing the parameter, which is essentially its default behaviour before any parameter is configured.
- Every URL - Overriding Googlebot’s default behaviour, this tells it to crawl every URL that features this particular parameter.
- Only URLs with value - The dropdown box in this option lists the values (from the field-value combination) that Google has encountered. If you select one, only URLs with that specific value will be crawled.
- No URLs - Googlebot will not crawl any URLs that include this parameter. This is useful for anything that you’re concerned will cause duplicate content issues, but it should be used with care.
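To get a feel for what the three explicit options would do to your own URLs before you commit to one, you could model them along these lines. This is a toy model rather than Google’s logic: should_crawl and the setting names are invented for the example, and ‘Let Googlebot decide’ is deliberately left out because its outcome can’t be predicted from the URL alone.

```python
from urllib.parse import parse_qs, urlsplit


def should_crawl(url, parameter, setting, value=None):
    """Toy model of the explicit crawl options for one parameter.

    The setting names 'every_url', 'only_value' and 'no_urls' are
    hypothetical labels chosen for this sketch.
    """
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    values = query.get(parameter)
    if values is None:
        # The parameter isn't present, so the rule doesn't apply.
        return True
    if setting == "every_url":
        return True
    if setting == "only_value":
        return value in values
    if setting == "no_urls":
        return False
    raise ValueError(f"unknown setting: {setting}")


asc = "http://www.example.com/boys/shoes?order=asc"
desc = "http://www.example.com/boys/shoes?order=desc"
print(should_crawl(asc, "order", "only_value", "asc"))
print(should_crawl(desc, "order", "only_value", "asc"))
print(should_crawl(desc, "order", "no_urls"))
```

Running your list of crawled URLs through something like this is a cheap way to check that a ‘No URLs’ rule won’t hide more pages than you intended.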
When you’re happy with what you’ve configured, hit the ‘Save’ button and it’ll be added to the list of configured URL Parameters.