Saturday, May 19, 2012

Login Page is found in Google with the query "inurl:admin/login.asp"

How is it possible that my page /admin/login.asp is found in Google with the query "inurl:admin/login.asp" while it isn't with the "site:www.domain.xx" query?
I have these lines in my robots.txt:
User-agent: *
Disallow: /admin/
And this in the HTML code of the page:
<meta name="robots" content="noindex, nofollow" />

You can check in Google Webmaster Tools whether Google interprets your robots.txt correctly. You can also request the removal of a URL from the index there.

When you find the URL in the Google search engine results page (SERP), does it have the same title as the one in your title tag? And does it also have a description/snippet?
What I think is happening is that Google knows about the URL from a link on your site, so it'll attempt to crawl and index it. However, since it's blocked by robots.txt, it's not allowed to crawl the page, hence it can't see the noindex meta tag that's on your login page.
Since it doesn't know that it shouldn't index the page, Google will add the URL to its index. However, pages like this tend to have only a title and URL in the SERP, and they almost never have a description/snippet. Sometimes the title in the SERP looks as if Google has crawled the page, but what it is actually doing is generating a title from the anchor text of the links that point at the page.
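The blocking effect described above can be checked locally. The following is a minimal sketch using Python's standard urllib.robotparser module; the helper name is_crawlable and the /index.asp URL are assumptions made for illustration, and the standard-library parser only approximates how Googlebot interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules quoted in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper for illustration; it is not part of any Google tool.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The login page falls under /admin/, so a compliant crawler may not fetch it
# and therefore never gets to read the noindex meta tag inside the page.
login_blocked = not is_crawlable(ROBOTS_TXT, "http://www.domain.xx/admin/login.asp")

# A page outside /admin/ (assumed URL) is still crawlable.
home_allowed = is_crawlable(ROBOTS_TXT, "http://www.domain.xx/index.asp")

print("login page blocked:", login_blocked)
print("other page allowed:", home_allowed)
```

If the login page is reported as blocked, the meta tag on it cannot take effect, which matches the behaviour described in the answer.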
The surefire way to keep the page out of the SERP is to remove the Disallow: /admin/ rule, so that Googlebot is allowed to crawl the page and see the noindex, nofollow meta tag.
The noindex directive will remove the page from the SERPs, and the nofollow directive asks Googlebot not to follow the links it finds on your login page (this helps maintain your crawl efficiency, but does not guarantee that Google won't crawl those links if it discovers them some other way).
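To illustrate what a crawler can read once it is allowed to fetch the page, here is a rough sketch that extracts the robots meta directives from an HTML document using Python's standard html.parser module. The class name RobotsMetaParser, the function robots_directives, and the sample page are assumptions for illustration; this is not how Googlebot itself is implemented.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html: str) -> set:
    """Return the set of robots directives found in the page's meta tags."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Sample page (assumed) carrying the meta tag quoted in the question.
PAGE = (
    "<html><head>"
    '<meta name="robots" content="noindex, nofollow" />'
    "</head><body>Login</body></html>"
)

directives = robots_directives(PAGE)
print("noindex" in directives, "nofollow" in directives)
```

The directives are only visible after the page has been fetched, which is why the robots.txt block has to be lifted for the meta tag to have any effect.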
