Fixing Webmaster Tools crawl errors for malformed URLs
18 Jun 2014
It's always useful to keep an eye on Google Webmaster Tools. It highlights crawl errors, which can in turn affect your search rankings on Google. For a website dependent on revenue from organic search traffic, that's pretty important!
Well, I recently noticed some crawl errors in Google Webmaster Tools:

"Googlebot couldn't crawl your URL because your server either requires authentication to access the page, or it is blocking Googlebot from accessing your site."
The interesting thing here is that the URL listed is one that didn't exist - note the %20 at the end of the URL, which is a URL-encoded space character. The URL is linked to from external sites, including from one article on a fairly prominent news site.
And that's just one example. There were many different URLs on my site that were linked externally, but were malformed in some way. Some were missing the .html extension. Some were missing whole sections of the URL. Most, however, were linked from what look like spammy content generation sites. But in this case, with a link from a popular site, I wanted to get their link working.
The fact that the link is broken means I'm leaving a bit of Google SEO juice "on the table". I can't get the source to change their URL, but the alternative is to set up a "Moved Permanently" (HTTP 301) response for that URL, forwarding to the actual representative URL. In this case I want the URL https://www.blisshq.com/tour/index.html%20 to redirect to https://www.blisshq.com/tour/index.html.
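In other words, a request for the malformed URL should come back with a response along these lines (a sketch of the standard 301 exchange; the exact headers a server sends will vary):
GET /tour/index.html%20 HTTP/1.1
Host: www.blisshq.com

HTTP/1.1 301 Moved Permanently
Location: https://www.blisshq.com/tour/index.html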
Using Amazon S3 static web hosting this has become quite easy. I simply upload a file called "index.html " (note the trailing space) and set a redirect from that to the intended "index.html" file.
I created an empty file called "index.html ":
touch "/tmp/index.html "
Then I uploaded it:
s3cmd put "/tmp/index.html " s3://www.blisshq.com/tour/
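To confirm the upload worked, listing the target path should show both the real index.html and the new object with the trailing space:
s3cmd ls s3://www.blisshq.com/tour/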
Then I set the redirect in the S3 console:

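As an aside, if you'd rather avoid the console, the same redirect can be set at upload time. S3 stores website redirects as the x-amz-website-redirect-location header on the object, so something like this should work with s3cmd's --add-header option (a sketch; check that your s3cmd version supports the option):
s3cmd put --add-header="x-amz-website-redirect-location: /tour/index.html" "/tmp/index.html " s3://www.blisshq.com/tour/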
And that worked!
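You can verify the redirect from the command line too: requesting the malformed URL with curl -I (a HEAD request, printing just the response headers) should show a 301 with a Location header pointing at the canonical URL:
curl -I "https://www.blisshq.com/tour/index.html%20"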
Thanks to The Nick Page for the image above.