Another direction people believe (and worry) we’re heading is aggregating keywords for vertical industries and selling them. We’re not. This site runs on trust. How can you be assured? It’s a Nash Equilibrium. If we violate your trust, we both lose. It’s better to play fair. I won’t bore you with the details, but Google the Ultimatum Game to get an idea of why this works. The only people who can sell your data are those with whom you’ve entered an agreement (comScore, Alexa, etc.), or those who don’t need your cooperation in any way and can still get the data (your ISP, HitWise, the search engines themselves, etc.).
Anyway, we really don’t have as much data as people think. And there are SO MANY other ways this data is being aggregated and sold, that we won’t touch it–just like we’re not trying to enter the already crowded analytics space.
HitTail is all about collecting as LITTLE of THE MOST IMPORTANT data as possible. In other words, we’re opting to have massive amounts of users, and minimum amounts of data–storing only what is most important to provide actionable recommendations.
This is quite different from other data gathering systems. There is absolutely no cross-site “fertilization”. Your keyword data is your own, and your competitive intelligence comes only from the fine details of visits to your own site. When a visit satisfies a HitTail criterion, we store it. Otherwise, we don’t. So who does aggregate across sites? And who is selling that data to your competitors?
The people who aggregate keyword data across industries and sell it can take one of three approaches (as far as I can tell). First, they can run snoop-ware on your personal computer that reports back the details of your searches and site visits. Alexa, comScore and even the Google Toolbar (with PageRank turned on) fall into this category. You can check for this by running a network traffic sniffer and watching the traffic on port 80 on your own PC’s IP address. If you see communication going out to anything other than the site you’re visiting, you know that you have this sort of software running. If you want to see this for yourself, find the program called WinDump and execute a command like the following one. It will show you the Google Toolbar chatter.
windump -i2 -s1000 -A host toolbarqueries.google.com > test.txt
Do some surfing, then look at the text file! Data collected in this fashion is heavily skewed toward those who “voluntarily” installed the software. For example, Webmasters regularly install Alexa to see how well their sites are ranking in the top 100,000 sites. But Webmasters also disproportionately visit such sites as Webmaster World, artificially inflating its rank as a proportion of all the world’s sites.
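To put rough numbers on that skew, here is a minimal sketch in Python. Every figure in it is made up purely for illustration (the group sizes, visit rates and install rates are assumptions, not measurements of Alexa or anyone else). It compares the share of all users who visit a Webmaster-oriented site with the share a toolbar panel would report when Webmasters install the toolbar far more often than everyone else.

```python
# Hypothetical population: all numbers below are illustrative assumptions.
population = [
    # Webmasters: few in number, likely to visit the site AND to install the toolbar.
    {"name": "webmasters", "users": 10_000, "visit_rate": 0.50, "install_rate": 0.20},
    # Everyone else: rarely visits the site, rarely installs the toolbar.
    {"name": "everyone else", "users": 990_000, "visit_rate": 0.001, "install_rate": 0.01},
]


def visit_share(groups):
    """Fraction of the users in `groups` who visit the site."""
    total_users = sum(g["users"] for g in groups)
    visitors = sum(g["users"] * g["visit_rate"] for g in groups)
    return visitors / total_users


def toolbar_panel(groups):
    """Keep only the users in each group who installed the toolbar."""
    return [{**g, "users": g["users"] * g["install_rate"]} for g in groups]


true_share = visit_share(population)
panel_share = visit_share(toolbar_panel(population))

print(f"Share of all users visiting the site: {true_share:.2%}")
print(f"Share the toolbar panel would report: {panel_share:.2%}")
print(f"Overstatement factor: {panel_share / true_share:.1f}x")
```

With these invented inputs the panel overstates the site’s reach by more than ten times, simply because the people who install the toolbar are not a fair sample of everyone online.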
The second and probably the most effective approach is data mining by Internet Service Providers (ISPs). ISPs have privileged access to large swaths of wonderful statistical data, because THEIR network sniffers can monitor the traffic of everyone who uses them as an ISP, or any traffic that happens to be traveling through their routers. It is a much less skewed cross-section than the snoop-ware approach. But because ISPs are mostly regional, the data is still skewed by geography. The way to solve this is to use the data from many ISPs. But ISPs don’t cooperate in this fashion, so it takes outside businesses that specialize in brokering such data to cut deals with many ISPs. HitWise is probably the most popular example of this sort of company. And if I were to recommend buying aggregate data from some company to get an overview of keywords for writing for the long tail of search, this may be what I would recommend. But why pay, when you can get it for free? And that brings us to method #3.
In addition to snoop-ware, and snooping ISPs, the search engines themselves know your keyword searches (of course!). The popular products here are all over the board, from WordTracker to the keyword suggestion tools built into Google AdWords and Yahoo Search Marketing. But the most interesting (and free) development recently has been Google Trends, which allows you to enter a keyword and see how much relative traffic exists on that keyword versus other words. I’ll write on this much more later, because it’s a great tool to identify long tail keywords that are “on the edge” of being worth it.
But back to WordTracker, which is hugely popular in this space. WordTracker collects its data from the third-tier meta search engines owned by InfoSpace: DogPile, MetaCrawler and WebCrawler. So THEIR data is skewed based on the profile of people who choose NOT to use Google, Yahoo or MSN in this day and age. It’s a nitpicky, but still critical and under-addressed point. Yes, WordTracker data is skewed. Perhaps the best data from the search engines comes from the keyword suggestion tools built into Google AdWords and Yahoo Search Marketing, because it is from the horse’s mouth.
So to recap, keyword aggregation can be done with snoop-ware, snooping ISPs or the horse’s mouth (the search engines, themselves).
With all these players in the keyword aggregation and resale space, it’s hardly worthwhile for us to be yet another. Likewise, we don’t want to be yet another analytics package in an already crowded space. Add to that the fact that helping you write for your own long tail of search relies on trust. And what you end up with is a unique partner in your mission to get and keep customers–a tech-savvy partner who helped launch Amazon, Priceline and Vonage.
It costs us very little to aid low-traffic sites and help build them into high traffic sites. Our approach is so difficult to understand that it takes an 8-minute demo to explain (in this beta soft-launch phase). And even then, only the elite early adopters will really get it–until Chris Anderson’s gospel goes mainstream. So, jump onboard and join HitTail – the long tail keyword tool that really works!