SSE: The Future of RSS?

In a previous post, I discussed the virtues of RSS and touched on some of its limitations; I ended with a promise to discuss some emerging standards aimed at extending RSS functionality to help bridge the gap between syndication protocol and generic data distribution protocol.

When I wrote this back in December, I had one specific RSS extension in mind, and I half-expected that by the time I got around to writing this follow-up post, everyone and their brother would be familiar with this extension. Surprisingly, it’s nearly three months later, and from what I can tell SSE still isn’t on most people’s radar.

SSE — or “Simple Sharing Extensions” — is a protocol extension for RSS being developed by Microsoft. Microsoft has released the specification under the Creative Commons license, the same license as the original RSS specification.

On the surface, it seems that the goal of SSE is to alleviate some of the biggest RSS pain points, specifically its uni-directional nature and its high-latency approach to syndication. SSE was announced back in November, and while it hasn’t yet seen wide adoption, Microsoft is adding support for the standard to many of its core products.

But, SSE is potentially more than just an enhanced syndication standard for Web 2.0 type applications. Whereas RSS is all about syndication — data is made available universally and consumers are expected to “poll” or “pull” the data — SSE is about synchronization. SSE provides mechanisms to allow any data in XML form to be shared by multiple data sources. So, while an SSE-based application could be as simple as a blog reader that automatically keeps its feeds up-to-date and in-sync with its syndication sources, it could also be a lot more than that.
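
To make the synchronization idea concrete, here is a minimal sketch of the merge step an SSE-style endpoint might run when it receives a copy of a shared feed from another endpoint. The field names (item_id, version, and so on) are illustrative stand-ins for the per-item history metadata the SSE draft attaches to feed entries, not the spec’s actual element names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SyncItem:
    """One feed entry plus the per-item sync metadata an SSE-style
    extension would attach to it (illustrative names, not the spec's)."""
    item_id: str           # stable identifier shared by every replica
    version: int           # incremented on each local modification
    modified_at: datetime  # when the last change was made
    modified_by: str       # which endpoint made it
    payload: dict          # the actual item data (title, description, ...)


def merge(local: dict, remote: dict) -> dict:
    """Two-way merge keyed by item id: adopt items we have never seen,
    and for items both sides hold, keep whichever copy has the higher
    version (timestamp breaks ties)."""
    merged = dict(local)
    for item_id, theirs in remote.items():
        ours = merged.get(item_id)
        if ours is None:
            merged[item_id] = theirs  # brand-new item from the other endpoint
        elif (theirs.version, theirs.modified_at) > (ours.version, ours.modified_at):
            merged[item_id] = theirs  # their copy is newer, take it
        # otherwise our copy wins; a real implementation would also flag
        # conflicts so that neither endpoint's change is silently lost
    return merged
```

Because every endpoint runs the same deterministic merge over the same metadata, the replicas converge without any single endpoint having to act as the master copy.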

The one example that’s been thrown around a lot is the potential use of SSE to keep online calendars synchronized and updated. But again, there’s no reason why SSE needs to be relegated to applications with web-based interfaces. In theory, SSE could be used to maintain synchronization between databases. Or it could be used to facilitate document sharing and document management; in fact, part of the SSE specification is support for OPML (Outline Processor Markup Language), an XML-based format for sharing outline-structured data between different applications.

The techies out there are probably thinking, “There are lots of synchronization mechanisms available, and many of them were even developed by Microsoft…why something new?” The answer is simplicity. Plenty of syndication standards have been available for many years, but RSS gained acceptance and popularity because it is simple…simple to implement and simple to use. SSE is about taking a simple XML-based protocol meant for syndication and extending it with as simple a set of new tags as possible to incorporate synchronization. And for this reason, I have a feeling that SSE will prove to be the next-generation successor to RSS — and perhaps not just for mash-ups and Web 2.0 type applications…


Bet on the Bells

Interesting article in the NY Times today about traditional voice carriers trying to (once again) break into the TV business. From my perspective, this topic is made much more interesting by the recent acquisition of Skype by eBay, as there has been quite a bit of water-cooler discussion about the future of the baby bells, both in the U.S. and abroad, now that VoIP providers are starting to cut into the revenue of the telcos.

So, what does this mean for the future of the telcos? In my opinion, the future is bright; I would bet (and if you look at my stock portfolio, I have) that the bigger voice carriers are in a better position to capitalize on the “triple play” space than either of their competitors (satellite or cable)…

First, to define terms: “triple play” refers to the convergence of voice, data, and video services in a single bundled offering. The telcos, cable companies and satellite providers have for years been attempting to integrate these three services for their end-users. The telcos have traditionally ruled the voice space and the satellite and cable companies have traditionally ruled the video space. Data service is split among them, with cable and telco competing head-to-head, and satellite serving a niche market in rural areas.

So, why do I think the telcos are well-positioned to win this epic battle?

I’ve worked with the big U.S. players in all three of these markets, and what’s abundantly clear from the outset is that the telcos have a different mentality than either the cable or satellite companies. They recognize that the business war will largely be shaped by technology, they are willing to invest cash for the long-term, and they are prepared to take chances on emerging technologies that may or may not succeed. Don’t get me wrong, the telcos aren’t “visionaries” in the traditional sense, but compared to the cable and satellite guys, they are prepared to get their hands dirty in the battle over services.

But, this isn’t the main reason I think they’ll win. As I mentioned, the heart of triple play is voice, data, and video. But, when you get right down to it, voice and video are just incarnations of data services.

As evidenced by Skype (and others), voice service in-and-of itself is no different than water or air…it’s just there when you need it. :) It’s the value-added services bundled on top that have become the real product in the voice market.

Due to the lack of intellectual-property protection standards, video is somewhat behind the curve, but it is moving in the same direction. Once the content providers (Disney, ESPN, etc.) get comfortable with the fact that video is just another commodity being offered for sale — meaning once the technology meets the pace of typical consumers — they will likely decide that direct offering of content (over the Internet, presumably) is more profitable — and more “democratized,” to quote a previous blog entry — than using middlemen like Comcast or DirecTV.

With voice moving towards free, and video being decoupled from the cable or satellite headend, this leaves only the war over data. And the telcos are well-positioned to win that war. Satellite can’t support enough two-way volume to serve the broad market, and the cable infrastructure isn’t positioned to scale to the needs of a whole nation receiving all of its services over one cable. Sure, the telcos have their challenges ahead, including figuring out how to serve the rural communities that make up a large percentage of the U.S. population. But that problem can be solved by money and technology, two things the telcos aren’t scared of.


Google Base - An Overview

Seeing as how I work for eBay, there’s been a lot of water-cooler talk the past couple weeks about Google Base. Additionally, I’ve heard a lot of very interesting theories, thoughts, and predictions about what the product means for Google, how they might position and monetize it, and how the product might evolve. For my own benefit, I decided to take a few minutes to organize some of the things I’ve thought about and heard, and figured I might as well do so in a format that I could share with others.

I don’t expect anything in this blog post to be revolutionary; if you’ve spent much time thinking about Google Base yourself, you’ve probably thought about a lot of this. Most of the info is just some of my basic thoughts aggregated with thoughts I’ve heard from others, and I’ve done my best to attribute ideas to the originators when they’re not mine.

So, let’s start with…what is Google Base?

On the surface, Google Base is just an aggregation of lots of atomic pieces of data. Data can be submitted in multiple ways, from a web-form entry of a single piece of data, to the submission of tens of thousands of pieces of data via an XML (RSS, for example) feed. Each piece of data is “tagged,” meaning there is a set of meta-data associated with the piece of content. These tags are defined by the submitter of the content, and are completely free-form.

Allowing free-form tagging of data obviously has its benefits and drawbacks. The main benefit is that there is no constraint on the number or type of attributes assigned to each piece of data, which is good when the database design can’t account for all possible attributes of the data within. For example, I submitted a profile of my puppy to Google Base, and thought that it was important to indicate that her favorite kind of turkey is the “Primotaglia Pan Roasted” variety – the database designers surely hadn’t considered the “favorite kind of turkey” attribute in a puppy profile, despite its obviousness to me. The downside of allowing free-form tagging of data is that you end up with a lot of garbage; in some cases, people assign inappropriate tags (think: keyword spamming), and in other cases, people just assign tags that nobody cares about (think: “favorite kind of turkey”). So, while you end up with a lot of useful data, it’s sometimes hard to distinguish it from the garbage data.
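
To illustrate what free-form tagging looks like in a bulk submission, here is a quick sketch that builds a feed item whose attributes are entirely defined by the submitter. The element names and overall structure are hypothetical and simplified; they are meant to show the idea of submitter-defined metadata, not the exact feed format Google Base expects.

```python
import xml.etree.ElementTree as ET


def make_item(title: str, description: str, attributes: dict) -> ET.Element:
    """Build one feed <item> whose extra attributes are whatever the
    submitter decides they should be -- the essence of free-form tagging."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "description").text = description
    for name, value in attributes.items():
        # each attribute becomes its own child element; no predefined
        # schema constrains which attributes may appear
        ET.SubElement(item, name).text = value
    return item


puppy = make_item(
    "Puppy profile",
    "A profile of my puppy",
    {
        "age": "eight months",                                 # hypothetical attribute
        "favorite_kind_of_turkey": "Primotaglia Pan Roasted",  # the one no schema designer saw coming
    },
)
print(ET.tostring(puppy, encoding="unicode"))
```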

But, Google Base is not the first attempt at opening up a database to arbitrary and completely unstructured data in the form of attributes and meta-tags. The process is called folksonomy and is popular among websites and databases that specialize in aggregating and organizing the free-flow of data into useful paradigms. It’s used in such sites as del.icio.us, which uses folksonomy to organize and share web bookmarks among its user base, and in musicplasma.com, which uses meta-data to categorize bands and make recommendations based on your musical tastes.

In fact, while it might seem like having lots of data makes it really difficult to sort and organize it all, in actuality the more data you have, the greater the potential for better sorting and categorization. This is because the more data points you have, the easier it is to separate the useful signal from the noise. With enough data points, you can essentially overlay the data and associated attributes onto a bell curve, and see which attributes “rise to the top” and which ones fade away as garbage. And it doesn’t really matter if some seemingly important but infrequently used attributes (like “favorite kind of turkey”) get thrown out – if few people are using that attribute to tag their data, then it’s likely that few people are going to want to search for data based on that attribute. Ro Choy from the eBay Developer Program BU said this pretty well in his blog post from yesterday.
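
Here is a toy version of that “rise to the top” filtering: count how often each attribute name appears across all submissions, and treat anything below a frequency cutoff as noise. The cutoff and the sample data are made up; a real system would be far more sophisticated, but the principle is the same.

```python
from collections import Counter


def significant_attributes(submissions: list, min_share: float = 0.01) -> set:
    """Return attribute names used in at least `min_share` of all
    submissions; everything rarer is treated as noise and dropped."""
    counts = Counter(name for item in submissions for name in item)
    cutoff = len(submissions) * min_share
    return {name for name, count in counts.items() if count >= cutoff}


submissions = [
    {"price": "10", "condition": "new"},
    {"price": "25", "condition": "used", "year": "1989"},
    {"price": "5", "favorite_kind_of_turkey": "Primotaglia Pan Roasted"},
]

# with enough data, "price" and "condition" rise to the top, while the
# one-off "favorite_kind_of_turkey" fades away as garbage
print(significant_attributes(submissions, min_share=0.5))
```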

So, the takeaway here is that while Google Base might look like a complete mess of data right now, it’s certainly possible that in six months when there are hundreds of millions of pieces of data to contribute to the folksonomy, it might actually present itself as well organized. I’m not saying that converting all this random data into a well-structured format is an easy task, but it seems reasonable that if anyone can pull it off, it’s Google.

There’s been a lot of discussion, even from a lot of folks that should know better, about how bad the interface to Google Base is…about how it’s hardly usable. But, the interface you currently see to the database is just that, an interface to the database. There’s nothing stopping Google from putting any “skin” they want over that set of data. It could be an ecommerce type interface that competes with eBay, it could be a Yellow Pages directory that competes with YellowPages.com, it could be a resume directory that competes with Monster.com, it could be a directory of high school graduating classes that competes with Reunion.com. Or it could be something completely new that doesn’t have an existing competitor. There are potentially thousands of businesses that Google could launch to monetize the data. Or they could just as easily license the data to any other company to jumpstart a data-based business.

As Bill Burnham points out in his blog, what Google is attempting to build is the world’s largest XML database. Over and above having possibly the most data ever stored in a single location, the data is structured in such a way as to be easily parsed, manipulated, or deployed around the web. As Michael Parekh points out in his blog post, Google could in short order end up being not just the largest searchable database, but also the largest directory on the web, competing with every currently existing portal. At the very least, you can be sure that the XML-based standard RSS will be mentioned a lot more in the coming months and years, as this is likely one of the main mechanisms that Google will use to both pull in data and then publish the data out to the world.

So, all of these pie-in-the-sky ideas about how Google will be able to take over all the big Internet players are interesting, but are they realistic? Probably not anytime soon, but maybe someday. In the meantime, there are lots of potential short-term benefits that Google likely considered when they conceived of Google Base:

- Very rarely do you see a webpage with a single piece of data, a single theme, or a single point of interest. By enticing people to break down all the information they have into atomic pieces of data and submit them to the database, Google now has the ability to greatly increase the relevancy of search results and provide additional sorting attributes. For example, if you’re looking to buy a “Giants Baseball Card,” you could type that term into Google and get a whole list of webpages that may or may not sell Giants baseball cards, and if they do, they may or may not have what you’re looking for (based on year, condition, price, etc). You could spend an hour checking out each link from Google and searching each site you jump to. Using Google Base, the same search results in not only a list of potential products, but a set of pre-defined attributes that you can use to further refine your search *before* you ever leave Google. Google no longer needs to ask if you’re “feeling lucky” because you can almost be sure that you’ve found what you’re looking for before ever leaving Google;

- By bringing the data “in-house,” Google can better parse, sort, and categorize the data than it can by just crawling sites and indexing web pages. A good example here is Google Video. While Google could just crawl sites that had video and index the tags and meta-data surrounding the video, they realized that by actually pulling in the video and hosting it themselves, they had the added benefits of being able to parse the video, break it apart, and more accurately identify the meta-data associated with the video. Same holds true with any other type of data. (Oh, and speaking of Google video, this was pretty funny);

- Henri Moissinac, eBay’s head of product strategy, brought up two fantastic reasons why Google would want to bring data into Google Base versus just crawling and indexing (and linking to) sites. If you’re not familiar with Google’s revenue model for advertising, here’s a quick overview… Advertisers pay Google to dynamically display their advertisements both on Google and on other websites within the Google “network” (actually, the advertisers only pay when the ads are clicked). The ads are displayed contextually, meaning the ads will relate to the content being shown on the page. When Google displays an ad on their website (in a search results page, for example), and the ad is clicked, Google collects 100% of the fee from the advertiser for the click. Members of the Google network (anyone with a website that signs up with Google) can display these exact same ads on their websites. If someone visits one of these websites and clicks on one of these ads, Google will collect the fee from the advertiser for the click, give a percentage of the fee to the website owner (for hosting the ad), and keep the rest themselves. As should be clear, Google makes more money if the ad is clicked from a Google page than if an ad is clicked from a page owned by someone else. So, it’s in Google’s best financial interest for them to host as much content directly on Google.com as possible, and send users to other sites only when necessary. As Henri points out, storage space is going way down in price and cost-per-click is going up tremendously, so the trade-off is an obvious one (a rough sketch of this math follows the list below);

- As Henri also points out, in addition to offering more advertising impressions directly on Google, by having a database of individual data elements, Google can also ensure that the advertising they show is more targeted and more relevant. A web page may have many different topics or themes, and therefore providing a good contextual ad may be difficult; but an atomic piece of data is very easy to categorize and apply a targeted advertisement for;
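
To put Henri’s first point into numbers, here is the back-of-the-envelope math. The 60% network revenue share used below is a made-up placeholder (Google did not publish the actual split at the time); only the structure of the calculation matters.

```python
def google_ad_revenue(cost_per_click: float, clicks: int, on_google_page: bool,
                      network_share: float = 0.60) -> float:
    """Revenue Google keeps from a batch of ad clicks. On a Google-owned
    page it keeps the whole fee; on a network member's page the member
    takes `network_share` of it (a hypothetical figure)."""
    gross = cost_per_click * clicks
    return gross if on_google_page else gross * (1.0 - network_share)


# 1,000 clicks at $0.50 each
print(google_ad_revenue(0.50, 1000, on_google_page=True))   # Google keeps the full $500
print(google_ad_revenue(0.50, 1000, on_google_page=False))  # roughly $200 after the site owner's cut
```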

As you can see, the short-term monetization will likely be around improved search relevancy and increased advertising revenue. The longer term monetization may be in the form of “skinning” the data to compete with some large web-based competitors or licensing the data to allow others to do that.

Personally, (in addition to the above) I think Google may also be planning to use Google Base to eventually enable and drive momentum around the payments system they are going to roll out. One scenario you could imagine (though certainly not the only one and certainly not a given) is that Google Base could be positioned as a competitor to eBay. Of course, Google Base transactions would be severely lacking in fraud protections for both buyer and seller. But imagine telling users that transactions through Google Base are completely free (no listing or transaction fees), mandating only that all sellers register with (and accept) Google payments. They then tell buyers that they are free to pay for transactions by any means they wish, but if they choose to pay with Google payments, they are afforded various fraud protections – escrow, buyer protection, a user reputation system, etc.

Sellers will be happy to sell through Google Base (even if they have to register with Google payments) because it’s free, and buyers will be “coerced” into paying with Google payments to alleviate fraud concerns. Google payments would quickly become the only reasonable way to pay for transactions on the platform. Google would potentially get a piece of every financial transaction, and at the same time drive potentially millions of users to this new online payment system. You could also imagine that any company licensing the Google Base data would have the ability to quickly and easily integrate with Google payments and get access to whatever reputation system the payments system has evolved.


Brightcove

I’m very interested in seeing where this company goes. After all, the business model they are employing is torn right out of the eBay playbook. Like eBay did for general ecommerce, Brightcove is attempting to level the playing field in the video entertainment market — positioning themselves as the transaction host, not the supplier or reseller. Just as eBay positioned themselves as the champion of the small-business retailer ten years ago, Brightcove is upping the high-tech ante, and positioning themselves as the champion of all the personal video producers out in cyberspace.

The founder of the company — Jeremy Allaire — even refers to the goal of “democratization of media” in his blog…very akin to some older eBay mission statements directed at more traditional commerce markets. While their business model is currently not defensible (again, eBay ten years ago), they have the potential to leverage first-mover advantage and network effects to create economies of scale that would strongly position them to compete with other big technology and ecommerce players (eBay today).

Should be interesting to see what business model they choose to employ. Will they take another page from the eBay playbook and look to charge transaction fees? Will they try to build a business purely around video advertising (a la network television)? It seems reasonable to invest in both directions; as Overture has already proven, even if the transaction-host business (or in Overture’s case, the search business) doesn’t pan out, they have the potential to position themselves as an advertising wholesaler in the world of digital media.


RSS: What is it good for?

I was having a conversation with my boss a couple weeks ago about the future of RSS. He asked me where I thought the technology was going, and where it would be adopted long-term. At the time, I gave the popular answer of, “Are you kidding!?!? It’s the greatest web technology to come along in the past 100 years. It will be used to build Web 2.0, and pretty soon all the data in the world will be shuttled around the Internet using RSS. In fact, RSS is doing my laundry and cooking my dinner right now!”

Okay, perhaps a slight exaggeration, but I was caught up in the hype along with everyone else. In fact, reading the blogs of a lot of people who should know better, you’d think RSS is soon to be the foundation for distributing all sorts of generic data around the Web. After thinking about it a bit, my take is somewhat different…

As we’ve seen with the proliferation of RSS for blog posts, news, and other content syndication, RSS is great for certain tasks. Specifically, for sharing non-urgent content in a simple, easily understandable format, RSS is a wonderful protocol. But, despite the beliefs of a lot of people in the industry, I don’t think RSS will ever provide an infrastructure for managing or distributing generic data around the Web (a la Web Services).

Here are the major aspects of RSS that I believe will prohibit it from ever becoming more than just a way to share blog posts and news feeds:

Uni-Directional – RSS is a one-way protocol. In a robust data distribution model, the consumer of the data should be able to request a specific set of data, and then have access to the requested data. With RSS, the consumer cannot programmatically (as part of RSS, at least) request specific data. This means that a separate protocol (or user interface) must be employed to request data, and RSS is only used to fetch the information. For complex data distribution requirements, this solution is likely unacceptable.

High Latency – RSS is purely a “pull” protocol. Unlike other types of data distribution models where a consumer can be notified when new data becomes available, with RSS the user must poll the data source to determine if the data has changed since the previous poll (see the polling-loop sketch just after this list). This makes RSS unacceptable for real-time applications (by real-time, I mean deterministic), or applications that may have low-latency requirements.

Lack of Scalability – RSS requires that every combination of data that might need to be delivered as a feed be pre-created and available at all times. For example, if I ran a news site, and I wanted to provide a news feed to Bob from Denver that contained national headlines plus local (Denver) headlines, and I also wanted to provide a news feed to John from Detroit that contained national headlines plus local (Detroit) headlines, I would have to create and maintain two distinct feeds, even though more than 50% of the feed data would likely be identical. In fact, even if 99% of the feed data were identical, the existence of that other 1% would require a separate feed. As such, it is prohibitively expensive (in terms of hardware and bandwidth) to provide personalized data through RSS to a large group of people. To personalize the data feed, each person needs a separate feed (and perhaps multiple feeds), requiring the RSS infrastructure to generate and maintain a large number of perpetual feeds. Compared to other types of data distribution, RSS is highly unscalable.

Simple Data Structure – RSS is bound by a very lightweight and simple XML structure. While RSS is extensible through XML namespaces, it’s difficult to incorporate complex schemas into RSS data. This means that using RSS for distribution of complex (or high-volume) data can be difficult and inefficient.
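
As a concrete illustration of the latency point above, here is roughly what a polite RSS consumer is reduced to: a conditional-GET polling loop. The feed URL is a placeholder, and the point of the sketch is simply that no matter how efficiently the polling is done, a change published right after a poll goes unnoticed until the next one, so latency is bounded below by the poll interval.

```python
import time
import urllib.request
from urllib.error import HTTPError

FEED_URL = "https://example.com/feed.xml"  # placeholder feed address
POLL_INTERVAL = 15 * 60                    # 15 minutes: also the worst-case staleness


def process(xml_bytes: bytes) -> None:
    """Hand the freshly fetched feed XML to the reader (stubbed out here)."""
    print(f"fetched {len(xml_bytes)} bytes of feed XML")


def poll_forever(url: str = FEED_URL, interval: int = POLL_INTERVAL) -> None:
    """Poll the feed with conditional GETs (ETag / Last-Modified) so that
    unchanged feeds cost almost nothing -- but new data still waits, on
    average, half the poll interval before the consumer sees it."""
    etag = last_modified = None
    while True:
        request = urllib.request.Request(url)
        if etag:
            request.add_header("If-None-Match", etag)
        if last_modified:
            request.add_header("If-Modified-Since", last_modified)
        try:
            with urllib.request.urlopen(request) as response:
                etag = response.headers.get("ETag")
                last_modified = response.headers.get("Last-Modified")
                process(response.read())
        except HTTPError as err:
            if err.code != 304:  # 304 Not Modified: nothing new this time
                raise
        time.sleep(interval)
```

A push-style protocol would notify the consumer the moment something changed; plain RSS has no way to express that, which is exactly the gap described above.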

Again, RSS is great for certain tasks; we just need to keep perspective on what those tasks are, and shouldn’t get too caught up in the hype…

Now, that said, there are a couple protocols on the horizon that are similar to or extend RSS functionality that may help bridge the gap between syndication protocol and generic data distribution protocol. I’ll discuss those in a future post…