I ran into an interesting situation tonight. A fellow professional in the SEO community launched a story on Digg teaching you how to get a site indexed by Google very quickly–a subject dear to my heart, after having HitTail fully indexed in under a month. The author was advocating the long-standing method of authoring an article and allowing it to be syndicated through all the various methods that can occur these days. And there are many. I disagreed, which was not very well received.
As a blog author these days, you can hardly keep your content from being syndicated. It’s just part of the game, but I’m close to giving the advice to turn off the RSS feeds, or to give out only a minimal excerpt–not yet, but close. It’s very easy and tempting to let all your data out, and ping the world every time you post. But there is the additional temptation to deliberately take the “syndicated author” approach, writing your stories expressly for other sites to carry. You then submit them to the many article submission sites that make such articles available to websites as a free service.
My comment was along the lines of just using original content, which was misconstrued by this author as meaning that his articles were not original. I totally conceded the point that HE was using original content. Rather, my message was to the webmasters of the world that THEY in fact would be better off with original content than with pulling down the same articles as everyone else. The duplicate content penalty often referred to by SEOs may not be as serious as it’s made out to be today. But it’s going to catch up as the syndication problem worsens. Let me explain.
Today, it’s very difficult for search engines to recognize and differentiate users of syndicated content from the original source. Add the re-mixing of content that occurs by just using the first paragraph of many articles on a single page, and you have a highly effective page for search, but of decreasing originality and value to the visitor.
Decreasing value, you ask? Yes, this syndicate-and-remix approach was fine when people first came up with the technique, because it made up an insignificant percentage of overall content on the Internet, and hardly detracted from or competed with the original source. But today, the ease of syndication, coupled with the AdSense incentive to do it, is flip-flopping those ratios. At the current rate, it won’t be long before MOST content is syndicated, and that’s bad news for those sites, because like so many systems that get out of balance, the center cannot hold. The time is coming when syndicated content will, out of necessity, have to be filtered.
The Google updates named Jagger and BigDaddy largely dealt with all the DMOZ open directory clones. That was easy in comparison, because the content was more-or-less the same everywhere. Syndication spam won’t be nearly as easy, because the engines have to deal with more subtle questions.
It used to be that Google bragged about the number of pages of the Internet that it had indexed. A much-discussed spammer recently released a number of pages onto the Internet that alone rivaled the entire size of the Internet from a few years ago. In addition to quantity, the quality of the pages also varies as the ability to remix content within pages increases. As the ability to syndicate and remix approaches the infinite, the bragging right for search engines will not be how MUCH of the Internet they’ve indexed, but how LITTLE of the Spam-net they actually serve. And the task is a subtle and difficult challenge for the engines, akin to separating the wheat from the chaff. Let me explain.
Say a single syndicated article is picked up and republished in 8000 locations all over the Internet, and say that article contains a link back to the author’s site. Search engines have at least three questions they must answer…
- Should they let the links count in determining relevancy?
- Should they let the article count at all as part of the Internet landscape?
- If they DO let the article count, which instance of the article should be served in the results?
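That third question, picking one canonical copy out of thousands, presumes the engine can first recognize that the copies are copies at all. One well-known technique for that is shingling: compare documents by the overlap of their k-word phrases. Here is a toy sketch in Python–the function names, the 4-word shingle size, and the sample texts are my own illustration, not any engine’s actual algorithm:

```python
def shingles(text, k=4):
    """Return the set of k-word shingles (overlapping phrases) in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity: shared shingles divided by total distinct shingles."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

original = "The quick brown fox jumps over the lazy dog near the river bank"
syndicated = "The quick brown fox jumps over the lazy dog near the river bank today"
unrelated = "Search engines must filter duplicate content to keep results useful"

# The syndicated copy shares nearly all its shingles with the original;
# the unrelated page shares essentially none.
print(jaccard(shingles(original), shingles(syndicated)))  # high
print(jaccard(shingles(original), shingles(unrelated)))   # near zero
```

At real scale, engines hash the shingles (MinHash-style sketches) rather than comparing raw sets, but the idea is the same: the 8000 copies of a syndicated article share nearly all their shingles, so they cluster together, and the engine can then try to pick one representative to serve.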
Like so many of these issues, the fix is made difficult by the legitimate exceptions. When filtering mirror sites designed to manipulate search engines, Google must be careful not to eliminate Tucows, a heavily mirrored site, altogether, but rather to eliminate duplicates down to the one original and most authoritative site. Tucows is not an abuser. It is just an exception to the mirror-site rule.
Similarly with mirrored content. Many authors merely want to be read by as broad an audience as possible in order to build their reputations. They release their writing onto the Internet in much the same way that Open Source authors release software. It was in this spirit that the Creative Commons licenses were created, and perhaps they (along with some metadata tagging) are part of the solution. But again, we have an exception preventing the filters from being overly Draconian in their rule. Syndicated content must not be an automatic sign of abuse. That is doubly true because blogging makes it very easy to let your content be syndicated accidentally through RSS feeds and the ping system.
So, where does this lead us?
If the search engines continue to perform the chore of crawling, filtering, indexing and serving search results, then we may be in the position of making our content LOOK one-of-a-kind to survive the filtering process. This may mean turning off syndication, and expecting your content to be read only on your site. Sure, this cuts down distribution and POTENTIALLY readership. But if the filters are good, it also means that everyone specifically interested in what you have to say is even more likely to find you. The signal-to-noise ratio improves.
Do I recommend turning off your RSS feeds? Do I recommend turning off the pings? Not yet. If anything, consider cutting back your data feed to only an excerpt. But this is bad for people reading your blog over Bloglines, or people such as me reading over a phone. And I think a general trend toward cutting back RSS feeds is a bad thing. Once again, the real solution must come from the brain trusts that now work at the Search capitals of the world. Can Google, Yahoo and MSN intelligently recognize the original source or “epicenter” of syndicated content? Or maybe it’s an opening for a new player in the search space?
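For the record, “cutting back your data feed to only an excerpt” just means each feed item carries a teaser instead of the whole post. A hypothetical RSS 2.0 item, for illustration only:

```xml
<item>
  <title>On Syndication and Duplicate Content</title>
  <link>http://example.com/syndication-post</link>
  <!-- Excerpt-only feed: just the opening sentence or two -->
  <description>I ran into an interesting situation tonight...</description>
  <!-- A full-content feed would additionally ship the entire post body,
       typically in a content:encoded element, which is exactly what the
       scraper-and-remix sites harvest. -->
</item>
```

Readers still get notified of every post and can click through; the scrapers get nothing worth republishing.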
Can anyone de-dupe to exactly the correct authoritative source? What will the outcry be when all those pages drop out of search?
It’s not an easy problem to fix, but like email spam, it’s one so bad that it threatens to crush the very system.
For now, my recommendation would be just to keep on HitTailing. The recommendations that HitTailing suggests are going to be off the beaten track, therefore keeping you off the radar of many of the auto-search-and-syndicate bots. YOUR site will start to have a footprint markedly different from sites that only regurgitate other people’s content. When these theoretical anti-syndication filters come into play and the next BigDaddy occurs, you’ll more likely be one of the people standing by, wondering what the big deal is about.