Fixing Webmaster Tools crawl errors for malformed URLs

18 Jun 2014

[Image: Which Way Now?]

It's always useful to keep an eye on Google Webmaster Tools. It highlights crawl errors, which can in turn affect your search rankings on Google. For a website that depends on revenue from organic search traffic, that's pretty important!

Well, I recently noticed some crawl errors in Google Webmaster Tools:

[Screenshot: Crawl error notification]
"Googlebot couldn't crawl your URL because your server either requires authentication to access the page, or it is blocking Googlebot from accessing your site."

The interesting thing here is that the URL listed is one that doesn't exist. Note the %20 at the end of the URL, which is a URL-encoded space character. The URL is linked to from external sites, including from one article on a fairly prominent news site.

And that's just one example. There were many different URLs on my site that were linked externally, but were malformed in some way. Some were missing the .html. Some were missing whole sections of the URL. Most, however, were linked from what look like spammy content generation sites. But in this case, with a link from a popular site, I wanted to get their link working.

The fact that the link is broken means I'm leaving a bit of Google SEO juice "on the table". I can't get the source to change their URL, but the alternative is to set up a "Moved Permanently" (301) response for that URL and redirect to the correct URL. In this case I want the URL https://www.blisshq.com/tour/index.html%20 to redirect to https://www.blisshq.com/tour/index.html.

With Amazon S3 static website hosting, this is quite easy. I simply upload a file called "index.html " (note the trailing space) and set a redirect from it to the intended "index.html" file.

I created an empty file called "index.html ":

touch "/tmp/index.html "
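A trailing space in a filename is easy to lose without noticing, so it can be worth confirming the file really has one before uploading. A minimal sketch of one way to check, printing each matching name inside brackets so that trailing whitespace is visible:

```shell
# Same command as above; the quotes preserve the trailing space in the name.
touch "/tmp/index.html "

# Bracket each matching name so a trailing space shows up before the "]".
printf '[%s]\n' /tmp/index.html*
```

The placeholder should appear with a space before the closing bracket. Any other files in /tmp that match the glob are listed as well.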

Then I uploaded it:

s3cmd put "/tmp/index.html " s3://www.blisshq.com/tour/

Then I set the redirect in the S3 console:

[Screenshot: Set redirect]
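With many malformed URLs to fix, clicking through the console for each one gets tedious. The following is a rough sketch of how the same redirect might be scripted. It assumes the AWS CLI is installed and configured, which is not what I used above (I used s3cmd and the console), and the function name redirect_key is made up for illustration. It creates an empty object at the malformed key and attaches the redirect in a single request.

```shell
# Hypothetical helper (not from the original workflow): create a zero-byte
# object at a malformed key and redirect it to the real page.
# Assumes the AWS CLI is installed and has credentials for the bucket.
redirect_key() {
  bucket=$1   # bucket name, e.g. www.blisshq.com
  key=$2      # malformed key, e.g. "tour/index.html " (note the space)
  target=$3   # redirect target; must start with / or http(s)://

  # With no --body, put-object creates an empty object, which is all the
  # placeholder needs to be.
  aws s3api put-object \
    --bucket "$bucket" \
    --key "$key" \
    --website-redirect-location "$target"
}

# Example call (commented out so that nothing is uploaded by accident):
# redirect_key www.blisshq.com "tour/index.html " /tour/index.html
```

Note that S3 only honours this redirect when the object is requested through the bucket's website endpoint, not through the REST endpoint.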

And that worked!
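One way to check a redirect like this from the command line is to request only the headers for the malformed URL and look at the status line and the Location header. This is a small sketch assuming curl is available; the function name show_redirect is made up for illustration.

```shell
# Hypothetical helper: print the HTTP status line and Location header
# returned for a URL. Assumes curl and grep are installed.
show_redirect() {
  # -s silences progress output, -I requests headers only.
  curl -sI "$1" | grep -iE '^(HTTP/|location:)'
}

# Example call (commented out because it needs network access):
# show_redirect 'https://www.blisshq.com/tour/index.html%20'
```

If the redirect is in place, the output should show a 301 status and a Location header pointing at the real page.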

Thanks to The Nick Page for the image above.