As of today, there is no way to explicitly tell Google to index only certain parts of a single web page. I am writing this post to show my need for partial-page indexing support and to discuss a few possible solutions available today. My need stems from the personalization of web pages, a feature that makes the web better for users. I hope this post sparks some discussion, and that someone from Google will be kind enough to chime in with guidelines on how to design websites when we want Google to ignore parts of a page in its index.
The premise of my need is that I have pages containing content for everyone, plus content customized for the specific user viewing the page. I lump this user-specific content under the term “Personalization”.
Personalization is a powerful mechanism for making the web more useful. Most of us are familiar with the personalization on Amazon.com, where items are targeted to you based on your viewing and buying habits.
On www.VastRank.com (a college review website that I created), when a user is viewing a college profile page, they are also shown partial reviews for other colleges they may be interested in. I go deep into the personalization implementation in my Google I/O presentation, Using AJAX APIs to Navigate User-Generated Content. Here is a slide from that deck illustrating the “personalized” suggestions shown to users on the right-hand side of college profile pages.
The main issue is that Googlebot now sees partial user reviews for other colleges on each college profile page. This causes two major problems:
- Duplicate content
- Text from one college's reviews shows up in Google search results for other colleges' profile pages
Should Google support robots-nocontent?
One implementation that would solve my problem would be for Google to support Yahoo's robots-nocontent tag. I recently tweeted back and forth on this topic with Google's Matt Cutts:
“Google (@mattcutts) please support Robots-Nocontent – I have partial content tailored for each specific user, that should not be indexed”
“@mattcutts if Google blessed Robots-Nocontent more people would use it-should personalized content (i.e. suggestions) be loaded via AJAX?”
So at this point, it looks like robots-nocontent is not one of Google’s top priorities.
robots-nocontent is not right either
Initially I looked at robots-nocontent because it was the closest thing I could find to a partial-content indexing solution. However, robots-nocontent is so general that it might not be the best long-term solution for the web. Conceptually, I have content for everyone and content for a particular user.
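For readers who haven't used it, Yahoo's mechanism is a class attribute rather than a tag: Yahoo's Slurp crawler skips anything inside an element marked with the `robots-nocontent` class. The page structure and ids below are illustrative, not from VastRank:

```html
<!-- Content for everyone: indexed normally -->
<div id="college-profile">
  <h1>Example University</h1>
  <p>Full reviews for this college...</p>
</div>

<!-- Yahoo's Slurp excludes anything inside class="robots-nocontent"
     from its index; Googlebot currently ignores this class entirely -->
<div class="robots-nocontent">
  <h2>Colleges you might also like</h2>
  <p>Partial reviews from other colleges, personalized per user...</p>
</div>
```

Because only Yahoo honors the class, Googlebot still indexes the personalized block, which is exactly the problem described above.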
A better solution than robots-nocontent would be a new tag for this scenario that carries more specific meaning.
I suggest a new tag that denotes content that will differ for each user who visits the website: robots-user-specific-content. For the time being, Google could simply ignore that content, much as an implementation of robots-nocontent would. Having this extra meaning, however, would allow Google to extend the feature in the future, so that a user could search with their identity and get results back from websites with content personalized for them.
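Following Yahoo's class-based convention, the proposal might look like this in markup. To be clear, robots-user-specific-content is my own suggestion and is not supported by any search engine today:

```html
<!-- Hypothetical: a class (or attribute) telling crawlers this block
     varies per visitor, so it should be excluded from the generic index
     but could later be indexed per-identity -->
<div class="robots-user-specific-content">
  <h2>Suggested for you</h2>
  <p>Partial reviews tailored to the current visitor...</p>
</div>
```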
What can I do now?
For now I am stuck! I am considering loading the personalized content through AJAX to avoid having it indexed. However, this is a guess on my part about Google's indexing behavior. Will Google index AJAX-loaded content? Will it penalize me for loading different content into a section of the page each time Googlebot visits? This is where I would like your feedback, and hopefully some guidance from Google!
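The AJAX workaround would look roughly like this: serve the static, for-everyone content in the initial HTML, then fetch the personalized suggestions with a separate request after the page loads. This is a minimal sketch; the `/suggestions` endpoint, the `suggestions` element id, and the function names are all hypothetical:

```javascript
// Pure helper: build the URL for a college's personalized suggestions.
// (Endpoint path is a placeholder, not a real VastRank URL.)
function buildSuggestionsUrl(collegeId) {
  return '/suggestions?college=' + encodeURIComponent(collegeId);
}

// Browser-only: fetch the suggestions and inject them into the page.
// Because this HTML never appears in the initial document, a crawler
// that does not execute JavaScript would never see it.
function loadSuggestions(collegeId) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', buildSuggestionsUrl(collegeId), true);
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
      document.getElementById('suggestions').innerHTML = xhr.responseText;
    }
  };
  xhr.send();
}

// Kick off the request only in a browser, after the static
// (indexable) content has already been delivered.
if (typeof window !== 'undefined') {
  window.onload = function () { loadSuggestions('example-college'); };
}
```

The trade-off is exactly the uncertainty raised above: this hides the content only for as long as Googlebot does not execute the script, which is an assumption about crawler behavior rather than a documented guarantee.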