Recently, Google decided to toss one of my bigger websites into the so-called Google “Sandbox.” This has happened a few times with my other sites. By now I consider myself a little more educated on SEO and Google’s practices in general, and I’ve come up with a couple of theories on why Google has decided it doesn’t like my sites.
1. Google’s biggest problem these days is duplicated content, and the CMS that’s the biggest natural offender is WordPress. I’m not knocking WordPress — I use it on the vast majority of my sites. But while doing some research, I noticed that for this particular site, the count of cached pages vs. supplemental pages is 253 cached to 241 supplemental. I’m fairly sure that having nearly as many supplemental pages as cached pages is a cause of the site being dropped from the searches.
2. This is more of a “how” they put my site in the Sandbox than a “why” it’s there. As my site went into the Sandbox, its PageRank jumped from PR4 down to PR0. It stayed there for a good 5 days before slowly settling back to its normal PR4. This is probably how Google flags a site to be unranked for any keywords in the search index.
The best way to combat the duplicate content in WordPress is to make a robots.txt file that disallows things like pages, categories, feeds, and any “wp-” labeled files. These are usually the direct causes of having a ton of duplicate content in Google. The next offender would be any static text that repeats on every page.
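As a sketch, a robots.txt along these lines would cover the offenders listed above. The exact paths are assumptions on my part — they depend on your permalink structure, so check what URLs Google has actually cached for your site before copying this:

```
User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /comments/feed/
Disallow: /category/
Disallow: /page/
```

Note that robots.txt Disallow rules match by prefix, so a single `Disallow: /wp-` line covers wp-admin, wp-includes, and the other “wp-” files in one shot.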
In any case, I’m open to hearing other people’s experiences with the Sandbox and getting out of it.
So after about a day of cleaning up the duplicate content and adding a few things to the site in question, I’m back out of the Google Sandbox. I’ve figured that for this site, my problem was the duplicate content — and not only that, almost all the pages Google had indexed were part of the duplicate content.
I’ve concluded that my percentage of duplicate content was at about 97%, compared to the roughly 79% it’s at now. From what I can tell, anything above the 80% mark means your site will get dropped from the search index. You can figure this out for your own site by dividing the number of pages you have in the supplemental index by the number of pages cached.
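The calculation above is simple enough to sketch in a few lines of Python. The function name is mine, and I’m plugging in the cached/supplemental counts quoted earlier in the post as an example (the exact counts likely drifted between the times I checked, which is why this comes out a bit different from the 97% figure):

```python
def supplemental_ratio(supplemental_pages, cached_pages):
    """Percentage of cached pages sitting in Google's supplemental index."""
    return supplemental_pages / cached_pages * 100

# Counts from earlier in the post: 241 supplemental, 253 cached.
ratio = supplemental_ratio(241, 253)
print(f"{ratio:.1f}%")  # well above the ~80% danger mark
```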