Selling Digital Goods in Data Markets, Part 4: Data Subsets and RESTful APIs

April 13, 2011

This is the fourth and final installment in my data markets series. I’ve previously introduced data markets and their features, shown you how to select a market and begin extracting data from it, and explored how premium dataset purchases are handled in two commercial markets. The third article also showed you a simple Python-based example of how to integrate pay-per-dataset PayPal X Embedded Payments into a data market.

This article focuses on the steps required to turn the previous article’s example into a commercial data market. We will discuss micropayments for subsets of a given dataset and what some of the key resources and calls in a general-purpose data market REST API might be.

Paying for data subsets via micropayments

The previous article mentioned the DevZone analysis dataset created for a separate series. The first two dataset entries (rows) are shown below to give you a feel for each of the columns in the comma-separated data. Click here to access the DevZone dataset data dump as CSV.

Date and time published | Article or blog post? | Title | URL
7/21/2010 10:39:00 | article | A Brief History of Micropayments |
7/21/2010 22:03:00 | article | Power to the People: What You Can Do with the PayPal APIs |

In the previous article of this series the DevZone dataset was used as an example of a per-dataset purchase. Per-dataset purchases are well and good if the level of granularity you want to provide is at the entire-dataset level.

But what if in the above example you knew that you only wanted data for articles related to micropayments? The first article would be pertinent, the second one not. In that case you’d like a data market that allows you to purchase data subsets that pertain to your needs rather than only allowing you to buy entire datasets, all or nothing.
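To make the idea of a data subset concrete, here is a minimal Python sketch that reads rows in the DevZone column layout and keeps only those whose title mentions micropayments. It assumes a simple case-insensitive keyword match on the Title column as a stand-in for "related to micropayments"; a real market would likely use proper search. The sample text and the `select_subset` helper are hypothetical, and the URL cells are left empty because the original links are not reproduced here.

```python
import csv
import io

# Hypothetical sample in the DevZone CSV column layout (URL cells left empty).
SAMPLE_CSV = """Date and time published,Article or blog post?,Title,URL
7/21/2010 10:39:00,article,A Brief History of Micropayments,
7/21/2010 22:03:00,article,Power to the People: What You Can Do with the PayPal APIs,
"""

def select_subset(csv_text, keyword):
    """Return only the rows whose Title contains keyword (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    needle = keyword.lower()
    return [row for row in reader if needle in row["Title"].lower()]

subset = select_subset(SAMPLE_CSV, "micropayment")
for row in subset:
    print(row["Date and time published"], row["Title"])
```

Only the first row survives the filter; in a subset-based market, that one row is all the buyer would pay for.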

One way to provide such support would be via REST+JSON, with subsets of data purchased on a per-API-call or per-row/item basis. I’ve written quite a bit elsewhere about REST+JSON as part of what I call the “one true web path”. Please click here to read some of my previous posts defining the path and explaining why it is a good thing.

Once we’ve bought into a REST+JSON-based data solution, we next need to figure out how to handle per-API-call purchases. As we discussed in the previous article, Embedded Payments allow data vendors to lower their transaction costs, making micropayments financially viable. Implementing this would allow for the sale of individual data subsets, even single data rows or items.
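A quick worked calculation shows why transaction cost decides whether per-row sales make sense. This sketch assumes a fee made of a percentage plus a fixed amount per transaction; the two fee schedules and the five-cent row price are made up for illustration and are not PayPal’s actual rates.

```python
from decimal import Decimal

def net_to_vendor(price, percent_fee, fixed_fee):
    """What the vendor keeps from one sale after a percentage-plus-fixed fee."""
    return price - price * percent_fee - fixed_fee

ROW_PRICE = Decimal("0.05")  # hypothetical price of a single data row

# Hypothetical fee schedules, NOT actual PayPal rates:
high_fixed = net_to_vendor(ROW_PRICE, Decimal("0.03"), Decimal("0.30"))  # 3% + $0.30
low_fixed = net_to_vendor(ROW_PRICE, Decimal("0.05"), Decimal("0.01"))   # 5% + $0.01

print("net per row, high fixed fee:", high_fixed)
print("net per row, low fixed fee: ", low_fixed)
```

Under the high fixed fee every five-cent row loses money (net -0.2515), while under the low fixed fee the vendor keeps 3.75 cents per row, even though its percentage is higher. `Decimal` is used so that fractions of a cent are computed exactly.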

So what would we need to add to the simple example in the previous article to enable per-row and -item data purchases? At a minimum:

  • A search interface for locating pertinent datasets (see more on the API below)
  • Once the user finds the data they want to buy, they need to be able to query it to locate any particular data subset of interest (again, see below)
  • For each query that returns data, our micropayments-enabled data market needs to either prompt the user to pay (if exploring directly) or else deduct microfunds from the user’s account
  • Once payment is made (or deferred), data subset access or download should be permitted
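The deduct-then-release steps in the list above can be sketched with a prepaid account. This is a minimal illustration under stated assumptions: the `Account` class, `charge_for_query`, and a flat per-row price are all hypothetical, buyers are not charged twice for rows they already own, and an insufficient balance raises an exception where a real market would prompt the user to pay.

```python
from dataclasses import dataclass, field
from decimal import Decimal

@dataclass
class Account:
    """Hypothetical prepaid buyer account holding micro-funds."""
    balance: Decimal
    purchased: set = field(default_factory=set)  # row IDs already paid for

class InsufficientFunds(Exception):
    """Raised where a real market would prompt the user to pay or top up."""

def charge_for_query(account, row_ids, price_per_row):
    """Deduct payment for newly returned rows, then return the rows now accessible."""
    unique_ids = list(dict.fromkeys(row_ids))  # drop duplicates, keep order
    new_ids = [rid for rid in unique_ids if rid not in account.purchased]
    cost = price_per_row * len(new_ids)
    if cost > account.balance:
        raise InsufficientFunds(f"need {cost}, have {account.balance}")
    account.balance -= cost
    account.purchased.update(new_ids)
    return unique_ids  # payment made: access or download may proceed

acct = Account(balance=Decimal("0.20"))
charge_for_query(acct, ["row-1", "row-2"], Decimal("0.05"))  # charges 0.10
charge_for_query(acct, ["row-2", "row-3"], Decimal("0.05"))  # row-2 already owned; charges 0.05
print(acct.balance)
```

Charging only for rows not already purchased is one design choice among several; a market could just as reasonably bill every API call regardless of overlap.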

A key component of realizing our data subset micropayments vision is a generalized data market REST API. Let’s move to that next.

Specifying a RESTful data market API

A well-designed RESTful web API should use OAuth for authentication and specify resources that may be manipulated via the HTTP GET, PUT, POST, and DELETE methods. You can explore how OAuth works through related resources (click here). In this article we’ll focus on the REST API itself.
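As a rough sketch of how those four methods might map onto one resource, here is a hypothetical dispatcher for a dataset collection and its members, backed by an in-memory dictionary. The `handle` function, the status codes chosen, and the storage are assumptions for illustration; the OAuth check and the real HTTP server are deliberately left out.

```python
import itertools
import json

# Hypothetical in-memory stand-in for the market's dataset storage.
_datasets = {}
_ids = itertools.count(1)

def handle(method, dataset_id=None, body=None):
    """Dispatch an HTTP method on the dataset collection (dataset_id None) or one dataset.

    Returns (status_code, response_body) as an HTTP layer would send them.
    """
    if dataset_id is None:                      # the collection resource
        if method == "GET":                     # list dataset IDs
            return 200, json.dumps(sorted(_datasets))
        if method == "POST":                    # create; server assigns the ID
            new_id = str(next(_ids))
            _datasets[new_id] = json.loads(body)
            return 201, json.dumps({"id": new_id})
        return 405, json.dumps({"error": "method not allowed"})
    if method == "GET":                         # read one dataset
        if dataset_id not in _datasets:
            return 404, json.dumps({"error": "not found"})
        return 200, json.dumps(_datasets[dataset_id])
    if method == "PUT":                         # create or replace at a known ID
        created = dataset_id not in _datasets
        _datasets[dataset_id] = json.loads(body)
        return (201 if created else 200), json.dumps(_datasets[dataset_id])
    if method == "DELETE":                      # remove one dataset
        if dataset_id not in _datasets:
            return 404, json.dumps({"error": "not found"})
        del _datasets[dataset_id]
        return 204, ""
    return 405, json.dumps({"error": "method not allowed"})
```

For example, `handle("POST", body='{"name": "DevZone"}')` creates a dataset and returns its new ID, after which GET, PUT, and DELETE with that ID read, replace, and remove it.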

For an example, see the Twitter REST API’s list of resources on the right side of the Twitter API Documentation page. You’ll note a large number of resources, such as timelines, tweets, user information, trends, and much more, that can be accessed and manipulated using the Twitter API.

What, then, are the resources we should provide in a data market RESTful API? For starters, we need to provide access to dataset and entry information. Imagine an SSL-secured base REST API service URL of:

where version is a version number of the API we’re trying to use (for example, this might be “1” or “v1” for the first version of our data market API). Given that base URL we might specify dataset and entry related API calls such as:

  • GET – returns a JSON representation of one or more datasets’ key information (name, size, date added, number of entries, list of entries by their unique IDs, etc.) based upon id (the datasets’ unique IDs in the data market) and/or containing terms passed in via request parameters
  • GET – searches for datasets similar to the one specified by request parameter id, returning results as JSON list of datasets by their unique IDs; other parameters may be useful to allow for searching by date added to the market, size of dataset, minimum number of entries, etc.
  • GET – returns a JSON representation of one or more specific data entries’ key information (name, size, date added, array of key:value pairs containing entry data, etc.) based upon request parameter id (the entries’ unique IDs in the data market)
  • GET – searches for data entries similar to the one specified by id and/or containing terms specified in request parameters, returning results as JSON list by unique IDs
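The two entry-related calls above might be backed by handlers like these. This is a sketch under assumptions: the entry store, the `get_entries` and `similar_entries` names, and the similarity measure are hypothetical. Similarity here is plain Jaccard overlap of title words, a deliberately crude swapped-in technique (common words such as "a" or "the" count too); e1 and e2 come from the DevZone rows shown earlier and e3 is a made-up entry.

```python
import json
import re

# Hypothetical entry store keyed by unique ID; e3 is made up for illustration.
ENTRIES = {
    "e1": {"title": "A Brief History of Micropayments", "kind": "article"},
    "e2": {"title": "Power to the People: What You Can Do with the PayPal APIs", "kind": "article"},
    "e3": {"title": "Micropayments and Data Markets", "kind": "blog post"},
}

def words(text):
    """Lower-cased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def get_entries(ids):
    """Entry lookup: JSON list of key information for each known ID."""
    return json.dumps([{"id": i, **ENTRIES[i]} for i in ids if i in ENTRIES])

def similar_entries(entry_id=None, terms=""):
    """Entry search: JSON list of IDs ranked by Jaccard similarity of title words."""
    query = words(terms)
    if entry_id is not None:
        query |= words(ENTRIES[entry_id]["title"])
    scored = []
    for eid, entry in ENTRIES.items():
        if eid == entry_id:
            continue  # don't report the reference entry as similar to itself
        title_words = words(entry["title"])
        union = query | title_words
        score = len(query & title_words) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, eid))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best match first
    return json.dumps([eid for _, eid in scored])
```

Searching for entries similar to e1 returns only e3, since "micropayments" is the only title word they share; a terms-only search for "PayPal" returns e2.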

For each additional resource you wish to expose, you need to provide a RESTful interface to access it. For example, besides dataset- and entry-related calls, we may also wish to provide market-level calls; e.g., GET might return key overall data market information (number of datasets, size of data stored, etc.).
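A market-level call like this could be sketched end to end as below: a client builds a URL under the versioned base URL, and the server answers with summary JSON. The host `datamarket.example.com`, the `datamarket` and `datasets` resource names, the `api_url` and `market_summary` helpers, and the catalog figures are all hypothetical placeholders, since the actual base URL is not given in this article.

```python
import json
from urllib.parse import urlencode

# Placeholder host; the market's real base URL is not reproduced here.
BASE_URL = "https://datamarket.example.com/{version}/"

def api_url(version, resource, **params):
    """Client side: build a request URL for a resource under the versioned base URL."""
    url = BASE_URL.format(version=version) + resource
    if params:
        url += "?" + urlencode(params)  # request parameters, percent-encoded
    return url

def market_summary(datasets):
    """Server side: JSON body for a hypothetical market-level GET."""
    return json.dumps({
        "dataset_count": len(datasets),
        "entry_count": sum(d["entries"] for d in datasets),
        "bytes_stored": sum(d["size_bytes"] for d in datasets),
    })

# Made-up two-dataset catalog for illustration
catalog = [
    {"name": "DevZone", "entries": 2, "size_bytes": 300},
    {"name": "Other", "entries": 5, "size_bytes": 1200},
]
print(api_url("v1", "datamarket"))
print(market_summary(catalog))
```

Keeping the version in the path means a later "v2" can change resource shapes without breaking clients still calling "v1".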

For more information on RESTful API design, especially if you have SQL experience but haven’t implemented a RESTful API before, see Apigee’s “REST API design for SQL programmers” slideset (click here to access).

Click here to read the complete article on the PayPal X Developer Network including a discussion of key takeaways from this series and what we might expect to see from data markets in the future.

Figure: Infochimps Twitter influence metrics dataset API console
