Hiding your Print version from the robots


SwizzleSkids
01-25-2005, 11:11 PM
I posted a thread here quite a while ago on how to block search engines from indexing the Print versions of your articles on an ASP site.

At first I just put this in my robots.txt:

User-agent: *
Disallow: PrintArticle.aspx

I hoped this would keep the robots from indexing the Print pages, but now that my site is live, that appears not to be the case. Shortly after launch, my site was crawled by Yahoo's Slurp bot, and searching Yahoo for my URL lists the pages Yahoo has indexed, Print pages included.


Next, I searched Yahoo for a sentence that appears in one of my articles:

"Approximately 32 million people in the United States wear contact lenses. This is about 12 percent of the total population."

In the results, the Print version of the article shows up #1, with the real version nowhere in sight. So not only are the Print versions crawled and indexed, they also rank higher than the originals, presumably because they have no outgoing links (my guess, anyway).

If the Print versions of our articles start coming up for our targeted keywords, we'll lose a lot of traffic and click-throughs.
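One thing worth checking in the first attempt's robots.txt rule (Disallow: PrintArticle.aspx): under the robots exclusion standard, a Disallow value is matched as a prefix of the URL path, and paths always begin with "/". Written without the leading slash, the rule may never match a Print page. The sketch below uses Python's standard urllib.robotparser to compare the rule as posted with a slash-prefixed version. Assumptions: the Print page sits at the site root, and the example.com URL and query string are hypothetical. Yahoo's Slurp uses its own parser, so this only illustrates the prefix rule, not Slurp's exact behavior.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, url, agent="Slurp"):
    """Return True if robots_txt forbids `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# Hypothetical URL for a Print page at the site root
url = "http://www.example.com/PrintArticle.aspx?id=42"

# The rule as originally posted, and the same rule with a leading slash
as_posted = "User-agent: *\nDisallow: PrintArticle.aspx\n"
with_slash = "User-agent: *\nDisallow: /PrintArticle.aspx\n"

# The path "/PrintArticle.aspx..." does not start with "PrintArticle.aspx",
# so the rule as posted does not apply and the page is allowed
print(is_blocked(as_posted, url))   # False

# With the slash, the path starts with "/PrintArticle.aspx" and is blocked
print(is_blocked(with_slash, url))  # True
```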

My next try is to put these meta tags into the PrintArticle.html template:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOARCHIVE">

In theory, this will stop the robots from indexing the pages or keeping cached copies of them.
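One caveat: a robot can only obey these tags if it is allowed to download the Print page. If a robots.txt Disallow rule does match the page, the crawler never fetches it and never sees the meta tags. To confirm what the PrintArticle.html template actually emits, here is a minimal sketch using Python's standard html.parser module that collects robots meta directives roughly the way a crawler might. It is an illustration, not Yahoo's parser, and the sample HTML strings are made up. It treats separate tags and a single comma-separated tag the same way.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":  # html.parser lowercases tag names
            return
        # Attribute names arrive lowercased; values may be None
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # CONTENT may hold several comma-separated directives
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical template output: two separate tags vs. one combined tag
two_tags = (
    '<head><META NAME="ROBOTS" CONTENT="NOARCHIVE">'
    '<META NAME="ROBOTS" CONTENT="NOINDEX"></head>'
)
one_tag = '<head><meta name="robots" content="noindex, noarchive"></head>'

print(sorted(robots_directives(two_tags)))  # ['noarchive', 'noindex']
print(sorted(robots_directives(one_tag)))   # ['noarchive', 'noindex']
```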

Here's hoping...