On average, blogs in the top 10 are 8% more likely to get indexed by both Google and Technorati than they are to be indexed by Google only. Considering that Google already admits to some level of bias in their system (part of the foundation for PageRank is that sites with higher PageRanks get indexed more often), it is a bit worrisome, especially if the trend holds across the whole of Technorati’s universe. If Google favors indexing more popular sites more often, a clear opportunity for world-live-web search engines like Technorati would be in the long tail of less-often-indexed sites, but Technorati seems to ignore that opportunity and concentrate on the top sites. What that will translate into is a direct reproduction of the power laws when it comes to indexing of blogs.
Tristan Louis has the first installment of an analysis of site rankings in Technorati. Interesting stuff. I think there is a simple explanation for the “bias”: in taking a top-down approach to the Web (that is, by trying to index everything, a very difficult undertaking), the indexer, whether Technorati or Google, becomes dependent on the largest sites to rank links. This is a problem we’ve been wrestling with at Persuadio, where we take a bottom-up approach: we first find a broad sample of links relating to a particular topic, then dig into the network to get a more granular view of who is actually driving conversations about the topic.
If you focus on blogs at the top of a power curve, you get more power curves, due to selection bias. Because it is not constrained by Technorati’s focus on blog dialogue, Google provides wider coverage, though that coverage is still biased by an algorithmic emphasis on sites pointed to by other sites that are, in turn, broadly linked to by other sites. I agree with Tristan that there’s a huge opportunity for Technorati simply in rejiggering its notion of “top” blogs.
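To make the selection-bias point concrete, here is a rough sketch, not a model of how Technorati or Google actually schedule their crawls. It assumes a hypothetical population of 1,000 blogs whose inbound-link counts follow a simple Zipf-style power law, and compares how much indexing attention the top 10 receive when visits are allocated in proportion to popularity versus spread evenly. The function names, the population size, and the exponent are all illustrative assumptions.

```python
def zipf_weights(n_sites, exponent=1.0):
    """Hypothetical inbound-link weights: the blog at rank r gets 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, n_sites + 1)]


def crawl_share(weights, top_k):
    """Fraction of indexing visits landing on the top_k blogs when
    visits are allocated in proportion to the given weights."""
    return sum(weights[:top_k]) / sum(weights)


n_sites = 1000   # assumed population size, for illustration only
top_k = 10

# Popularity-weighted indexing: attention follows the power curve.
popularity_share = crawl_share(zipf_weights(n_sites), top_k)

# Even indexing: every blog gets the same attention.
uniform_share = crawl_share([1.0] * n_sites, top_k)

print(f"Top {top_k} share, popularity-weighted: {popularity_share:.1%}")
print(f"Top {top_k} share, evenly spread:       {uniform_share:.1%}")
```

Under these assumptions the top 10 blogs, 1% of the population, take well over a third of the popularity-weighted visits, so an index built that way simply mirrors the curve it started from.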