So, eventually we’re going to have to start deleting old Search Hit referrals. It’s just a fact of life, and will probably be one of the things separating the eventual Premium service from the free service. But to make HitTailing a viable endeavor, we had to pull off something that few other companies have managed: letting you surf your search hit referrals in real-time. That’s what you see happening in the “Ajax data grid” when you step forward and back in your Search Hits. And log file data is massive. I mean it’s gargantuan, even for one site.
For services like HitTail, recording log file data for every pageload of every site they service is a Herculean task worthy of… well, worthy of Google. And as you will recall, when Google rolled out Urchin as Google Analytics, it took their service down for quite a while. They still managed to record everyone’s data (I think), but the analytics part was VERY slow to update. So, who is little ol’ Connors Communications to attempt to go even one step further?
Well, for one, we’re not Google, and every move we make is not major news. I was at a Google PowWow last night, hosted by NextNY, to learn about Google’s 500-person operation in New York City, and they explained why so many Google services use “invites”: it helps throttle the massive popularity surges of their new services. Connors’ popularity surge problem isn’t as severe as Google’s, but still, we have our own methods of throttling the data, plus some unique approaches to serving it up, which let us keep pace with the suddenly rising popularity (for us) of HitTailing. How much data are you surfing with the Ajax datagrids?
We already store millions of records. In short order, it will be billions. And this is not stuff we take the time to “index” and serve out as static copies. If you visit your Search Hits tab and click Next and Back, you will get the up-to-the-second new data. And even though you’re just seeing your own site’s data, you’re stepping forward and back through a table containing millions and millions of records, served in real-time and updated right under your nose! There are some technical breakthroughs here, innovated by Connors, and the system is admirably sustaining the load put on it by the still-growing list of beta tester sites.
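For the curious, here is a rough idea of how Next and Back can stay quick on a big, constantly growing table. To be clear, this is not HitTail’s actual code, just a minimal sketch of one common technique (often called keyset or “seek” paging), and every name in it (the `search_hits` table, its columns, the helper functions) is made up for illustration. Instead of counting past all the rows you have already seen, each page asks an index for “the next few rows for my site older (or newer) than the last one I saw”.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory table of search hits shared by many sites (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE search_hits ("
        " id INTEGER PRIMARY KEY,"   # grows with every recorded hit
        " site_id INTEGER NOT NULL,"
        " keyword TEXT NOT NULL)"
    )
    # The composite index lets each site seek straight to its own rows.
    conn.execute("CREATE INDEX idx_site_hit ON search_hits (site_id, id)")
    return conn


def record_hit(conn, site_id, keyword):
    """Append one search hit and return its id."""
    cur = conn.execute(
        "INSERT INTO search_hits (site_id, keyword) VALUES (?, ?)",
        (site_id, keyword),
    )
    return cur.lastrowid


def newest_page(conn, site_id, page_size=3):
    """First page: the most recent hits for one site, newest first."""
    return conn.execute(
        "SELECT id, keyword FROM search_hits"
        " WHERE site_id = ? ORDER BY id DESC LIMIT ?",
        (site_id, page_size),
    ).fetchall()


def older_page(conn, site_id, before_id, page_size=3):
    """'Next': hits older than the last row currently on screen."""
    return conn.execute(
        "SELECT id, keyword FROM search_hits"
        " WHERE site_id = ? AND id < ? ORDER BY id DESC LIMIT ?",
        (site_id, before_id, page_size),
    ).fetchall()


def newer_page(conn, site_id, after_id, page_size=3):
    """'Back': hits newer than the first row currently on screen, newest first."""
    rows = conn.execute(
        "SELECT id, keyword FROM search_hits"
        " WHERE site_id = ? AND id > ? ORDER BY id ASC LIMIT ?",
        (site_id, after_id, page_size),
    ).fetchall()
    return rows[::-1]  # flip back to newest-first for display
```

Because every page is a fresh query against the live table, a hit recorded a moment ago shows up the next time you land on the newest page, with no static copy to regenerate.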
In general, analytics software needs to process and distill log file data down to a form that can be kept long-term. It’s unrealistic to archive your log files forever for WebTrends to use, so WebTrends keeps its own optimized database so it can continue to generate reports into the past. It’s the equivalent of “generating an index” and throwing away the original data. But this causes problems when you try to drill down to the granular detail that you actually need for search engine optimization and the HitTailing process.
HitTailing records every single search hit, and so far, never throws away a single record. We won’t be able to keep that up forever, but it won’t matter, because once your keywords are extracted and moved down the tabs, the original search hit is less important. Eventually losing that data is the price of “free”. For eventual premium subscribers, our plan is to let you surf back through that data to the moment you began HitTailing. But for the moment, performance under the load of millions of updating real-time records is as snappy as it was when there were only thousands of records. Amazing!