How to get Google to index only part of a Webpage

As of today, there is no way to explicitly have Google index only certain parts of a single webpage.  I am writing this post to show my need for partial-page indexing support and to discuss a few possible solutions available today.  My need stems from the personalization of webpages, a feature that makes the web better for people.  I hope that this post can create discussion, and maybe someone from Google will be kind enough to chime in and provide guidelines on how to design websites where we want Google to ignore parts of a webpage in its index.

The premise of my need is that I have webpages with content for everyone, plus content customized for the specific user viewing the page.  I lump this user-specific content under the term “Personalization”.

Personalization is a powerful mechanism for making the web more useful for people.  Most of us are familiar with the personalization on Amazon.com, where items are targeted to you based on your viewing and buying habits.

On www.VastRank.com (a college review website that I created), when a user is viewing a college profile page, they are also shown partial college reviews for other colleges that they may be interested in.  I go deep into the personalization implementation in my Google I/O presentation, Using AJAX APIs to Navigate User-Generated Content.  Here is a slide from that deck that illustrates the “personalized” suggestions shown to users on the right-hand side of college profile pages.

[Slide: “personalized” college suggestions on the right-hand side of a college profile page]

The main issue here is that Googlebot now sees partial user reviews for other colleges on each college profile page. This causes two major problems:

  • Duplicate content across college profile pages
  • Text from one college’s reviews shows up in Google search results for other colleges’ profile pages

Should Google support robots-nocontent?

One implementation that would solve my problem would be for Google to support Yahoo’s robots-nocontent tag.  I recently went back and forth on this topic with Google’s Matt Cutts on Twitter:

Jon Kragh

“Google (@mattcutts) please support Robots-Nocontent – I have partial content tailored for each specific user, that should not be indexed”

Matt Cutts

@jonkragh we looked at how many sites use robots nocontent on the web and it was miniscule, so we decided not to do it.

Jon Kragh

@mattcutts if Google blessed Robots-Nocontent more people would use it-should personalized content (i.e. suggestions) be loaded via AJAX?

So at this point, it looks like robots-nocontent is not one of Google’s top priorities.

robots-nocontent is not right either

Initially I looked at robots-nocontent because it was the closest thing I could find to a partial-content indexing solution.  However, robots-nocontent is so general that it might not be the best long-term solution for the web.  Looking back at what I have conceptually, I have content for everyone, and content for a particular user.
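
For reference, Yahoo implements robots-nocontent as a class attribute value, so the markup looks roughly like this (the page content here is illustrative):

<h1>Example College Profile</h1>
<p>Reviews and details that everyone sees: indexed normally.</p>

<!-- Yahoo's Slurp skips this block when indexing; Googlebot ignores the hint -->
<div class="robots-nocontent">
  <h2>Suggested for you</h2>
  <p>Partial reviews of other colleges, tailored to this visitor.</p>
</div>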

A better solution than robots-nocontent would be to create a new tag for this scenario that has more meaning.

robots-user-specific-content

I suggest a new tag that denotes content that will be different for each user who visits the website: robots-user-specific-content.  For the time being, Google could simply ignore that content, much like an implementation of robots-nocontent would.  However, having this extra meaning would allow for possible extensions in the future, where a user could search with their identity and get back results from websites with content personalized for that user.
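
To make that concrete, here is a sketch of how the markup might look if it followed Yahoo’s class-based convention (this tag does not exist; the name and form are purely my proposal):

<!-- hypothetical: tells the crawler this block varies per visitor -->
<div class="robots-user-specific-content">
  <h2>Colleges you may be interested in</h2>
  <p>Partial reviews personalized for the signed-in visitor.</p>
</div>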

What can I do now?

For now I am stuck!  I am considering loading the personalized content through AJAX to avoid having it indexed; a rough sketch follows.  However, this would be a guess on my part about Google’s indexing algorithm.  Will Google index AJAX content?  Will it penalize me because I’m loading different content into a section of a page via AJAX each time Googlebot visits my site?  This is where I would like your feedback, and hopefully some guidance from Google!
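
A minimal sketch of the AJAX approach, assuming the personalized HTML is served from a made-up endpoint such as /suggestions (which could also be blocked in robots.txt):

<div id="suggestions"></div>

<script type="text/javascript">
  // Fetch the personalized block after page load, so it is not part of the
  // initial HTML that Googlebot receives (assuming Googlebot does not execute
  // this script, which is exactly the open question above).
  var xhr = new XMLHttpRequest();
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
      document.getElementById("suggestions").innerHTML = xhr.responseText;
    }
  };
  xhr.open("GET", "/suggestions", true); // hypothetical endpoint
  xhr.send();
</script>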

Cheers,

Jon

6 Responses to “How to get Google to index only part of a Webpage”

  1. Scott Clark

    I’ve had success by using borderless iframes which pull content from URLs I’ve blocked from indexing via robots.txt. CMSs or static sites can do this.

    It’s a major hassle, but sometimes needed.
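
    Roughly: the personalized block lives at a URL that robots.txt disallows (the path here is made up, e.g. Disallow: /personalized/), and the page pulls it in with a borderless iframe:

    <!-- iframe pulling the robots.txt-blocked URL into the profile page -->
    <iframe src="/personalized/suggestions" frameborder="0" scrolling="no"></iframe>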

  2. Jon Kragh

    Scott, I switched from AJAX to iframes for this as well. Each technique seems to work. I had to switch to iframes because some intermingled Facebook Connect content (FBML) was not working when loaded dynamically via AJAX.

    I added the noindex robots meta tag to the iframe content, as sketched below.

    So far my rankings/SERP are looking good.
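
    For reference, the iframe’d page just carries the standard robots meta tag in its head (the page itself is whatever serves the personalized suggestions):

    <meta name="robots" content="noindex" />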

  3. sonic

    how about looking into these tags:


  4. sonic

    er last one didn’t show the tags:

    <!--googleoff: index-->
    <!--googleon: index-->

  5. Hamilton

    What is on this page works perfectly for me and many others:
    http://code.google.com/apis/searchappliance/documentation/46/admin_crawl/Preparing.html

  6. Keshav Arora

    See https://developers.google.com/search-appliance/documentation/46/admin_crawl/Preparing#pagepart. It discusses the usage of <!--googleoff: index--> and <!--googleon: index-->. The content between the off and on comments will not be indexed. You can use the robots-nocontent tag for Yahoo’s bot. But I could not find a solution to stop other bots such as BingBot, Blekko, Ask, Alexa, Amazon, eBay, and others.
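
    For example (Search Appliance markup; the page content is illustrative):

    <p>Reviews of Example College: indexed normally.</p>
    <!--googleoff: index-->
    <p>Personalized suggestions for this visitor: excluded.</p>
    <!--googleon: index-->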
