Is it still duplicate content with "noindex"?
A good few years ago I started another company to run some local city websites. At one point I had created a few thousand city sites and was making some decent coin from selling ads to local businesses. Unfortunately, the idea didn't scale - although I could easily sell enough ads to make a good living, none of the people who signed up for other cities could sell anything. In truth, I doubt they tried.
Then, one day, Google killed the concept anyway. When they launched their local search they "coincidentally" changed their algorithms to severely affect the "city network" sites like mine. Literally overnight the search terms I checked went from top 3 results to page 3 - if I was lucky!
So, I turned the site over to the community to use as they wished, put some Google ads on there to try to pay for the hosting, and forgot about it. (In fact, Dozing Dogs CMS came from this codebase, so it wasn't a waste.) The sites are still going, and new articles are still being published by people today even though I haven't touched the site in years.
So, how did Google kill OurLittleNet? Amongst other things, they introduced a test for duplicate content. If they found multiple copies of content on different sites they assumed that you were trying to spam them, and penalised you. Unfortunately, although I wasn't a spammer, I had done exactly what they didn't like.
For example, if I published an article that was relevant to everyone in Atlanta I displayed it on every city site around Atlanta (there are 50-150 cities in every decent-sized metro area). Effectively there were 50-150 copies of that article. Remember, this was years ago, before I'd even noticed SEO spammers who stole content; in today's SEO world I probably wouldn't do the same thing.
But one thing still bugs me a little - and any SEO experts out there, please explain this - once I realized my mistake (probably 2+ years ago now) I modified the code so that only the original article was indexed. All "copies" on other city sites get this header:
<meta name="robots" content="noindex,follow">
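The rule behind that tag is simple enough to sketch. This is only an illustration in Python, not the site's actual ASP.NET code; the function name `robots_meta` and the idea of comparing hostnames are assumptions made for the example.

```python
from urllib.parse import urlsplit

# Hypothetical helper: the real site code is not shown in this post.
NOINDEX_TAG = '<meta name="robots" content="noindex,follow">'


def robots_meta(page_url: str, original_url: str) -> str:
    """Return the robots meta tag for a copy, or "" for the original article.

    Assumes a copy lives at the same path as the original but on a
    different city subdomain, so comparing hostnames is enough.
    """
    page_host = (urlsplit(page_url).hostname or "").lower()
    original_host = (urlsplit(original_url).hostname or "").lower()
    if page_host == original_host:
        return ""  # the original stays indexable
    return NOINDEX_TAG  # every copy asks not to be indexed, but links are followed
```

The page template would then write the returned string into the `<head>` of each city's copy of the article.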
Look at these examples:
Original: http://sandyspringsga.ourlittle.net/Entertainment/ArtCulture/Musicals/AnnieAuditions
Copy: http://roswellga.ourlittle.net/Entertainment/ArtCulture/Musicals/AnnieAuditions
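If you want to confirm which of the two pages actually carries the tag, one rough way is to parse the page source and look for a robots meta tag containing `noindex`. The sketch below uses only Python's standard library; the class and function names are made up for illustration, and it checks an HTML string you have already downloaded rather than fetching anything itself.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # "noindex,follow" -> {"noindex", "follow"}
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def is_noindexed(html: str) -> bool:
    """True if the page source asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


copy_html = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
original_html = "<html><head><title>Annie Auditions</title></head></html>"
print(is_noindexed(copy_html), is_noindexed(original_html))
```

This only tells you what the page asks for, not how Google treats it.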
Shouldn't this make Google happy, since only one article is indexed? The 50+ copies are all flagged as not to be indexed. PageRank is usually about 4/10 for my city sites now, so they're certainly not banned, but I still wonder if the duplicate content is an issue for OLN inside Google's "black box"...