On data markets and YQL gotchas
I just submitted the second article in my current PayPal X DevZone series on data markets (read the first installment here) and want to share a couple of things I learned as a sneak peek of sorts. I also want to call out a potential YQL gotcha I discovered while developing this second article.
As a part of the article, I put together a table summarizing the data market features that matter to me. What I learned from that exercise: There is a lot of variability in supported programming languages from market to market.
If you’re considering different markets for your data needs, you should investigate their available libraries, including third-party packages, up front. That way you won’t be surprised after you’ve already committed to a particular dataset (even worse if you had to pay for that data). For example, I want to use markets that support Python-based development, and not all of the markets I investigated do. Whatever your language(s) of choice, I would encourage you to read the article once it’s published for more details.
Another thing that jumped out at me: It is critically important that a market provide a good search interface to help locate pertinent datasets. Some of the markets I investigate in my article do; others do not. I’ll let you draw your own conclusions after you read the piece, but suffice it to say that I’m partial to the ones that make surfacing datasets simple and quick.
Now on to the YQL gotcha: As part of my article, I developed a simple example that pulls Twitter user influence metrics out of an Infochimps dataset. It does this using the Infochimps-provided YQL influence data table. My original, naive YQL statement was:
This returns the expected influence metrics when executed in the YQL console:
Click here for the solution to the YQL problem available in the complete post on the PayPal X Developer Network.