
February developer highlights

Another month, another flurry of developer tips, examples, and payments news.  Let’s review, shall we?

I published the second part of my two-part “Divining DevZone Insight from Filtered Feeds and Deep Pages” article series in February.  The first article in the series delved into the sources of content and took the initial approach of using YQL within Yahoo Pipes to get at blog post, article, and book excerpt data.  It also uncovered problems with some initial assumptions about the RSS streams providing the data.  This second and final article in the series uses the RSS property numItems and some Python code to solve the remaining issues and analyze the stream data.

The article uses this data to produce topic-filtered DevZone content bundles, a sitemap of sorts.  It also discusses the topics via a set of graphs showing topic mentions across DevZone content.

Top 20 mentioned DevZone topics

If you’re interested in using YQL from a programming language, especially Python, I’d recommend checking this article out.

February blog posts included:

Read the complete post on the PayPal X Developer Network to access links to related development and payments news.

Notes from the week of 2011-02-27

Cover of 'PayPal APIs:  Up and Running'

PayPal X Platform

Big data

Wireless and mobility

APIs and development

Personal things

Running

PayPal, Apple, and Google fight for your subscriptions

With Apple set to announce iPad 2 on March 2nd, I think now’s a good time to talk a bit about several recent announcements around online subscription payments.  Specifically, Apple’s new App Store subscription service, Google’s immediate counter with One Pass, and PayPal’s public take on the whole thing.

Apple announced long-requested support for App Store subscriptions on February 15th.  Here’s how Apple describes the capability in their press release:

Subscriptions purchased from within the App Store will be sold using the same App Store billing system that has been used to buy billions of apps and In-App Purchases. Publishers set the price and length of subscription (weekly, monthly, bi-monthly, quarterly, bi-yearly or yearly). Then with one-click, customers pick the length of subscription and are automatically charged based on their chosen length of commitment (weekly, monthly, etc.). Customers can review and manage all of their subscriptions from their personal account page, including canceling the automatic renewal of a subscription. Apple processes all payments, keeping the same 30 percent share that it does today for other In-App Purchases.

That last bit about the 30 percent share has been a sticking point for many, and it isn’t the only complaint.  Most objections boil down to a general feeling that Apple is charging too much while giving publishers too little in return.  A major point of contention:  publishers’ access to subscriber information.

Read more about developers’ dislike of the new App Store subscriptions details via O’Reilly Radar and Ars Technica.

Amidst the angst over Apple’s announcement, Google went public with its One Pass service.  Google’s one-line description of One Pass lays out the key point of their proposition right up front:

Google One Pass is a payment system that enables publishers to set the terms for access to their digital content.

Here’s the high level introductory video from the main One Pass site:

Google is clearly targeting publishers unhappy with Apple’s model.  Whereas Apple is seen as dictating terms from on high, Google emphasizes that their system “enables publishers to set the terms”.

Click here to read the rest of the post on the PayPal X Developer Network, including a look at PayPal’s strategy and why it just might beat Apple and Google at the e-subscriptions game.

Web API power tools: PayPal transactions via YQL

In response to my ongoing MoSoLo series’ posts on web API power tools, including YQL, PayPal Developer Evangelist Praveen Alavilli pointed me to some recent work he’s done to support PayPal transaction search and details via YQL:

Bill – you should look at the two new yql tables that I’ve added to our github account: https://github.com/paypalx/yql-tables – they provide simple interfaces to PayPal’s Transaction Search and Details APIs. You can combine them with other APIs/Web Services like google/yahoo maps to map the zip codes or generate reports of the items being sold, etc.

These YQL interfaces currently query only the PayPal sandbox test environment.  Praveen has said he would consider adding a flag letting developers indicate whether they want to make a sandbox or production environment call.  If that’s added, I’ll post a note here.

To see these new transaction search and detail capabilities in the YQL console, load the console with “Show Community Tables” enabled (click here to load it now).  Once you’ve done that, scroll down in the “Data Tables” listing at lower right and find the “PayPal” options:

PayPal transaction bindings in YQL

From there you can select either paypal.transactions or paypal.transactions.details for a template YQL call to request information back from the PayPal X Platform on PayPal transactions or transaction details, respectively.

Click here to read the complete post on the PayPal X Developer Network including example YQL statements to query PayPal’s server.

Notes from the week of 2011-02-20

PayPal X Platform

Apigee API console for Facebook

Big data

APIs and development

Personal things

  • Norton SONAR is automatically removing my git executable and there appears to be nothing I can do save turn off SONAR. Epic FAIL! #
  • Congrats again to my friends @dreasoning on their IQT strategic investment. I'm honored to assist you in your mission! #
  • I applied for the O'Reilly Blogger Review Program http://oreil.ly/eaVm3f via @oreillymedia (available products http://bit.ly/dJoNkl) #
  • Slideshare has launched free public web meetings http://bit.ly/hgDmij and they're starting with great guest speakers http://bit.ly/g27vez #
  • I received my first @OReillyMedia blogger program http://oreil.ly/eaVm3f book today and am fired up to read & review asap #
  • I'm loving the convenience of Roku including streaming Amazon VOD and Netflix. Want to try at 10% off? http://roku.tellapal.com/a/clk/bbsfV #
  • Trivia tonight at a local Rotary function really took me back to high school academic team days. And yes, we won a trophy. #

Running

  • Signed up for the Post Oak "troad" Half Marathon http://bit.ly/eMzJhJ to keep myself honest the next couple of weeks #
  • Ran 5.11 miles in 52 mins and felt good. Warm weather after the snow. First run in shorts in a long time! http://bit.ly/g92Lyb #
  • Ran 3.69 miles in 42 mins and felt great. Split paces 7:40, 7:19, 7:59, 8:03, 7:57, 9:31, 8:29, 8:58, 8:46, and 8:12. http://bit.ly/gLu7wn #
  • Ran 3.01 miles in 39 mins and felt great. Easy stop-n-go run with my children. Garmin off for part of the run so dis… http://bit.ly/ffkX4N #
  • Ran 2.74 miles in 24 mins and felt good. Short pace run on a warm afternoon. http://bit.ly/fBoVnZ #

Facebook should go all the way

Facebook has started adding hCalendar and hCard microformat markup to the millions of “events” listed in their site.

Facebook loves microformats, but are they doing them right?

In theory, this could free up the date, time, location, and related event information for linking and use by other sites and applications.  I’ve previously written about Facebook’s moves toward open standards in a series on this blog (click here to read about their path towards opening up the Facebook Platform and APIs).  This takes Facebook one step further down that path.

As was noted on microformats.org:

Facebook’s deployment of hCalendar is just the latest in their series of slow but steadily increasing support for open standards and microformats in particular. Over two years ago Facebook added hCard support to their user profiles. Last year they announced support for OAuth 2.0, as well as adding XFN rel-me support to user profiles, thus interconnecting with the world wide distributed social web. They proudly documented their use of HTML5. And now, millions of hCalendar events with hCard venues.

But is this step only a half step?

Including microformat markup only gets us half-way to where Facebook could take event information, and especially the location information from hCard markup, on the web.
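For readers who haven’t seen the markup, here is a minimal sketch of an hCalendar event with an embedded hCard venue; the class names are the actual microformat hooks, while the event details themselves are invented for illustration:

```html
<div class="vevent">
  <span class="summary">Developer Meetup</span>
  <abbr class="dtstart" title="2011-03-15T19:00">March 15, 7:00pm</abbr>
  <div class="location vcard">
    <span class="fn org">Example Venue</span>
    <span class="adr">
      <span class="locality">Tulsa</span>,
      <span class="region">OK</span>
    </span>
  </div>
</div>
```

A parser can lift the summary, start time, and venue name from those classes, but without geographic coordinates the venue remains just a label, which hints at where Facebook could go further.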

Click here to read the full post on the PayPal X Developer Network and learn how and why Facebook should finish what they’ve started with Events.

Is this the content you want from X.com?

I recently wrote a two-part series of technical articles for X.com on “Divining DevZone Insight from Filtered Feeds and Deep Pages” (click the links to read part 1 and part 2).  These articles showed you how to use Yahoo Pipes and YQL, along with some Python, to pull data down from various RSS feeds, filter out the feed item details, and then use those details to rank key content topics.  If you missed either article, I encourage you to go back and read them via the links above.

What I want to do here is ask for your comments on whether the content we’re writing is in line with what you want from X.com.  Please take this as an opportunity to tell us how we’re doing.

Let’s look at the results from the harvest+analysis articles.  I generated several charts showing the number of topic mentions across the content we wrote for the DevZone, blog posts as well as articles and book excerpts, from the launch in July 2010 through 7 February 2011.  Here’s a high-level view of the number of mentions of all seventy topics that were analyzed (labels are turned off so the general trend isn’t obscured by clutter; they are turned back on when we zoom in on subsequent charts):

https://www.x.com/servlet/JiveServlet/downloadImage/102-3243-5-4599/600-371/20110207_article.devZone.analysis.chart.all_70_devzone_topics.png

We’ve been making a concerted effort to provide a lot of DevZone coverage of ‘mobile’ topics (the topic with the most mentions above), and it shows.  Zooming in on the top twenty topics you see that in fact quite a few are mobile related:

https://www.x.com/servlet/JiveServlet/downloadImage/102-3243-5-4600/600-371/20110207_article.devZone.analysis.chart.top_20_devzone_topics.png

Based upon separate hit analyses conducted by Travis Robertson and me, mobile content is exactly what you are looking for.  But if we’re wrong about that, please tell us what you would prefer by leaving a comment below.

Now let’s look at the particular mobile related topics we’ve been covering so far:

https://www.x.com/servlet/JiveServlet/downloadImage/102-3243-5-4601/600-371/20110207_article.devZone.analysis.chart.mobile_devzone_topics.png

The bars above again show per-item mentions of each topic across all DevZone blog posts, articles, and book excerpts from the DevZone’s inception through the cut-off date last week, almost seven months total.  Taking the ‘mobile’ bar as an overall indicator, we see that a bit more than half of the mobile related content discussed Android.  A similar number of iPhone + iOS + iPad mentions were made, though they often occurred in the same items, so the overall concentration was on Android versus “i”-content.

Is that the right balance?  Or would you like to see more “i”-coverage in our material?

I’m also curious to hear your take on our mobile wallet, QR code, and NFC coverage.  Too little, too much, or just right?

Please click here to read the full post on the PayPal X Developer Network including additional PayPal and programming language analyses. You can also leave your comments on that post.

Divining DevZone Insight from Filtered Feeds and Deep Pages: Part 2, The Analysis

In the previous article in this series we looked at the DevZone developer RSS feeds from which we wanted to harvest data. We discussed our initial approach and then dove into implementing it using Yahoo Pipes and YQL. Please read that post now (click here to access it) if you haven’t already.

We encountered some problems along the way, and promised a solution in this follow-on article. Ready? Here we go!

The RSS property numItems is a good thing

By the end of the previous article, we had created a Yahoo Pipe which used YQL to fetch items from all six of the pertinent DevZone RSS feeds. We discovered that the feed items returned were being severely limited in number by the RSS server. We needed a way to get around that limitation to get as many of the items as possible (preferably, all of them).

After a little searching, I turned up the answer: In order to get more results, I needed to ask the server nicely using the numItems query parameter.

The numItems parameter indicates to the RSS server that I’d like to receive the number of feed items indicated, if possible. If the server is configured to allow more items when asked, I should be able to get more than the default.
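As a sketch of how one might tack numItems onto a feed URL programmatically (modern Python 3 here, though the original work used Python 2, and the helper name is my own invention):

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

def with_num_items(feed_url, num_items):
    """Return feed_url with a numItems query parameter added (or replaced)."""
    parts = urlparse(feed_url)
    query = dict(parse_qsl(parts.query))  # preserve any existing parameters
    query["numItems"] = str(num_items)
    return urlunparse(parts._replace(query=urlencode(query)))

print(with_num_items("https://www.x.com/community/feeds/blogs?community=2133", 1000))
# https://www.x.com/community/feeds/blogs?community=2133&numItems=1000
```

Building the URLs this way avoids hand-editing each feed address when you change the requested item count.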

Keeping in mind that the DevZone blog feed contained upwards of 170 posts and would grow quite a bit over time, that the documents feed had more than sixty items and was increasing but at a slower rate, and that the pre-cutover individual feeds each contained fewer than twenty items and weren’t going to grow any more, I constructed the following YQL request to attempt to get every item in every feed:

select * from rss where url in ('https://www.x.com/people/ptwobrussell/blog/feeds/posts?numItems=20', 'https://www.x.com/people/billday/blog/feeds/posts?numItems=20', 'https://www.x.com/people/travis/blog/feeds/posts?numItems=20', 'https://www.x.com/blogs/Ethan/feeds/posts?numItems=20', 'https://www.x.com/community/feeds/blogs?community=2133&numItems=1000', 'https://www.x.com/community/feeds/documents?community=2133&numItems=200')

Running the pipe gave me all of the blog posts from all five blog feeds (yeah!) but none of the article or book excerpt items from the DevZone “Documents” feed (boo!). Now what?

I’ll spare you the details, but suffice it to say that after experimenting with the numItems value for the documents request, I found that I could set it as high as '38' and receive that specified number of items back.  If I set it to '39' or higher, I got nothing back from that feed.  Not nice.  Given the need to move on to analyzing the data, however, I decided to roll with all the data I could get and then add the twenty-some missing document items back in later.
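That manual experimentation can be automated.  Here’s a hedged sketch that binary-searches for the server’s cap, using a stand-in feed_returns_items predicate in place of a live request (a real version would issue the YQL or HTTP call and check whether items came back):

```python
def find_max_num_items(feed_returns_items, upper_bound=1000):
    """Binary-search for the largest numItems value the server will honor.

    feed_returns_items(n) should return True if requesting n items
    yields a non-empty feed.
    """
    lo, hi = 1, upper_bound
    best = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if feed_returns_items(mid):
            best = mid       # mid works; try for more
            lo = mid + 1
        else:
            hi = mid - 1     # mid fails; back off
    return best

# Simulate the documents feed, which (as of this writing) breaks above 38:
print(find_max_num_items(lambda n: n <= 38))  # 38
```

About ten probes find the cap, versus dozens of hand-edited requests.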

So to summarize, at this point I had a Yahoo Pipe that would return most of the desired feed items:

Here is the information for that pipe for you to use or clone as you see fit:

Pipe: All six feeds, numItems set, date sorted
Feed location: RSS
Lightweight data: JSON

Developing the ultimate solution

About the time I had the pipe above ready to use, I began noticing some inconsistencies in the data returned using it. Every once in a while, a “Refresh” in the Pipes debugger would fail to load any items, or would load many fewer than was expected. As I was contemplating what to do about that issue, I encountered the final piece of the ultimate solution puzzle: Python YQL, a library for making YQL queries in Python programs.

I had already been bumping into the edges of the Pipes model a bit. Manipulating RSS streams was straightforward, but what if I wanted to save some of the data out for analysis in other tools or archival? And although Pipes does provide a Loop module, some of the nested operations I envisioned during my analysis work would definitely be difficult, if not impossible, if I stayed strictly within the Pipes box.

On the other hand, Python YQL gave me the option of plopping the YQL select statement I’d already developed in Pipes directly into “real code”. Once I had the data flowing into my Python program, I could do just about anything I wanted with it. File I/O, filtering, etc. would be a cake walk in Python. I was sold!

Here then is the plan I implemented to collect, store, organize, analyze, and share the DevZone feed data:

  1. I would create two Python programs, one to conduct the harvest and one to perform analysis; this separation would let me harvest independently of analysis
  2. I would use the YQL developed previously in Pipes to collect the feed data into my Python harvest program, which would save out the portions I needed to a CSV file; this would be the input for the analysis program, and it would also allow me to perform additional analysis with a number of tools (Google Docs or any other tool supporting CSV)
  3. My analysis program would read in the harvested data CSV file along with a separate topic list CSV file; it would then filter the data against each of the DevZone topics, producing a topic-filtered CSV output file for each topic along with a topics statistics CSV file containing the key stats from the topics analysis
  4. I would add in the few missing documents’ data where needed myself (everything above was automated, but this part not so much)
  5. Once I had all of the feed items accounted for, I would create a bit.ly bundle for each topic (using the topic-filtered files) and include those bundle URLs in the topic statistics file
  6. Final step: Explore the topic filtered data and share what I learned in this article and beyond

My Python programs, devzone.harvest.py and devzone.analyze.py, are both available via github. Click here to access the repository and grab a copy of the source.

Python YQL could not be easier to use. We simply import the yql module, get access to a public (non-authenticated) YQL connection, then execute a YQL query against that connection. Here’s a snippet of code from my harvest program showing how easy it is to fetch the document feed data using a YQL select:

import yql
y = yql.Public()
articlequery = "select * from rss where url in ('https://www.x.com/community/feeds/documents?community=2133&numItems=38')"
articles = y.execute(articlequery)

Results are returned as a yql.YQLObj containing rows. Each row is a dictionary of key:value pairs for one RSS feed item.

Once I’ve fetched the data from this and the other feeds, I save it out into a devzone.harvest.csv file for use by the analysis program and other tools. Note that as I write each row out via a Python DictWriter named 'csvwriter', I add in a field I’ll use later to indicate if a given item is an article/book excerpt or a blog post. I also do some trimming on the date field to remove the unneeded day of the week and timezone information that was included in the RSS feed’s pubDate fields. Again, the Python code couldn’t be much simpler:

for row in articles.rows:
    row["articleOrBlog"] = "article"
    date = row["pubDate"]
    date = date[5:-4]
    row["pubDate"] = date
    csvwriter.writerow(row)

and here’s a look at the first line of the output devzone.harvest.csv file (note the reserved but currently empty second to last field and that I’ve removed the article HTML content from the final field for brevity):


31 Jan 2011 18:31:26,article,PayPal and the Road to Adaptive Payments,https://www.x.com/docs/DOC-3191,,{content of article would be here}
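That trimmed date comes from the date[5:-4] slice shown earlier.  Because RSS pubDate values follow the fixed RFC 822 layout, the slice reliably drops the leading day-of-week and the trailing timezone:

```python
pub_date = "Mon, 31 Jan 2011 18:31:26 GMT"  # typical RSS pubDate value
trimmed = pub_date[5:-4]  # drop the "Mon, " prefix and " GMT" suffix
print(trimmed)  # 31 Jan 2011 18:31:26
```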

That’s about it for the interesting bits of the harvest program. You can see the complete source code listing for devzone.harvest.py by clicking here. I’ve tried to comment everything liberally to make it easy to follow along.

Now we’re ready to perform some analysis. Specifically, we want to perform the topic filtering described in the plan steps above. After devzone.analyze.py reads in the devzone.harvest.csv data from the harvest program and saves a copy of it minus the actual item content back out for use in other tools, it’s ready for its own critical bit, the topic filtering:

# Note: csvinput and itemreader (a csv.DictReader over the harvested
# devzone.harvest.csv data) are opened earlier in devzone.analyze.py.
csvtopics = open("devzone.topics.csv", "rb")          # one topic per row
topicreader = csv.reader(csvtopics, dialect='excel')
csvnumitems = open(devzonedir+"devzone.topics.items.csv", "wb")  # per-topic item counts
numitemswriter = csv.writer(csvnumitems, dialect='excel')

for topic in topicreader:
    currenttopic = topic[0]
    # Build a filesystem-safe filename fragment from the topic name
    topicfile = (currenttopic.replace(' ', '')).replace('.', 'dot')
    csvcurrenttopic = open(devzonedir+"devzone.analysis.topic."+topicfile+".csv", "wb")
    topicwriter = csv.DictWriter(csvcurrenttopic, fieldnames=['pubDate', 'articleOrBlog', 'title', 'link', 'hitCount'], restval='', extrasaction='ignore', dialect='excel')
    csvinput.seek(0)  # rewind the harvest data for each topic pass
    items = 0
    for row in itemreader:
        if re.search(currenttopic,row['title']) or re.search(currenttopic,row['description']):
            topicwriter.writerow(row)
            items += 1
    numitemswriter.writerow([currenttopic, items])
    print topicfile, "topic contains", items, "items"
    csvcurrenttopic.close()
Let’s walk through what that code does.

First, it opens up the devzone.topics.csv input file created previously. The topic list I used for this article is available in the github repository (click here). This file lists each of the major multipart blog and article series along with the significant technology and payments groupings that appear in the DevZone content. In effect, it specifies the content categories into which we’re going to slot the various feed items to build our content sitemap. Here are the first few rows of the topics file used for this article:

.Net
Adaptive Accounts
Adaptive Payments
Alternative Ways to Fund Your Project
analytics
Android
{...}

Note that I generated the current topic list by hand, refining it over several analysis passes as the topic categories became clear to me. I would like to explore automatically generating it from the content itself in the future (see below for more on that).
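As a first step toward that automation, here’s a hedged sketch (the function name and the tiny stopword list are my own invention): count word frequencies across item titles and treat the most common non-stopword terms as candidate topics.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "and", "to", "for", "in", "with", "on", "of"}

def candidate_topics(titles, top_n=5):
    """Return the most frequent non-stopword words across item titles."""
    words = Counter()
    for title in titles:
        for word in re.findall(r"[A-Za-z]+", title.lower()):
            if word not in STOPWORDS:
                words[word] += 1
    return [word for word, _ in words.most_common(top_n)]

titles = [
    "Mobile payments with Android",
    "Android in-app payments",
    "YQL power tools for mobile developers",
]
print(candidate_topics(titles, top_n=3))
```

A real version would also fold in item descriptions and multi-word phrases, but even this crude counting surfaces the dominant themes.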

For each topic, the analysis program works through each content item, checking to see if that item’s title or content contains the given topic under consideration. A Python regular expression search (re.search) is used for this check. If the topic is mentioned in the item’s title or content, then that item is added to the topic’s topic-specific output file and the topic item count is incremented. At the end of each item row-level pass, the analysis program writes the total number of items for the current topic under consideration. This total is written to devzone.topics.items.csv, which is a key file for our later analysis (more below).
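One caveat if you reuse this approach: the topic strings are passed to re.search as regular-expression patterns, so a topic like '.Net' will match any character followed by 'Net'.  Wrapping the topic in re.escape avoids such false positives:

```python
import re

topic = ".Net"
title = "A Net gain for developers"  # mentions "Net" but not ".Net"

print(bool(re.search(topic, title)))             # True: '.' matches the space before "Net"
print(bool(re.search(re.escape(topic), title)))  # False: now requires a literal '.'
```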

With relatively little code, I’ve been able to do some pretty neat things. I’ve pulled down hundreds of blog posts, articles, and book excerpts from the PayPal server, sliced and diced them (including adding some fields for my own use), performed topic filtering, and output several CSV data files for later analysis. The only thing I did beyond that was to manually add in the twenty-six (as of this writing) missing article items where needed. With that the data set is complete and it’s time to look at the results.

Click here to read the full article on the PayPal X Developer Network including the resulting table of DevZone topics and the analysis of the content.

Top 20 DevZone topics

Web API power tools: One tool to bind them all

In my last post we examined a three step process for learning about a new web API, prototyping its use in a console, and then copying the calls you developed into your own script or program.

That approach works well, but there is one major problem:  It requires you to learn a new API for each RESTful service you want to try out.

Wouldn’t it be nice if there was one mechanism you could learn that allowed you to make calls into just about any web API, or even plain old web pages?  In fact there is, and that’s the subject of this post:  the Yahoo! Query Language (YQL).

I first wrote about YQL for the PayPal X Developer Network in a two-part DevZone article series on harvesting and analyzing RSS feed data (click here to read the first part on harvesting the data; the second part will be published soon).  You can read about YQL in detail in those articles, so in this post I just want to cover its high-level capabilities and show you its usefulness in your web API-based development.

So what is YQL?  YQL looks like SQL but enables you to access live data from the Internet.  That definition doesn’t really do it justice, however.

The bigger point about YQL is that it includes many built-in bindings to various web APIs so that you can use almost every major web API in a consistent manner, with similar calls across the many possible APIs available to you.  Yahooligans summarize this capability as:

select * from Internet

Here’s why.  In the first article in my harvest+analysis series I showed how you could use YQL to load DevZone “Blog” RSS feed data:

select * from rss where url in ('https://www.x.com/community/feeds/blogs?community=2133')

In the second part of the series you see how you can use that same YQL in an application.  The article shows a Python example using the Python YQL client library, but you can use it in other languages too with the appropriate library.  Barring direct support in a particular language of interest, however, you can always make the equivalent REST call provided for any given YQL statement you put together in the YQL console.

For instance, if you enter the above YQL into the console and test it with diagnostic information turned off and the server response set to ‘XML’ (JSON is the other option), the console gives you this output:

Copying the REST call for a new YQL query out of the console

In this case, the REST query copied out of the highlighted portion of the console is:

http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20rss%20where%20url%20in%20('https%3A%2F%2Fwww.x.com%2Fcommunity%2Ffeeds%2Fblogs%3Fcommunity%3D2133')

Click the REST query above to see the server response live in your browser.  You can use this REST call in your application code, no matter the programming language or environment.  Nifty, huh?
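If you’d rather build that REST URL yourself than copy it from the console, here’s a sketch in modern Python 3 (the endpoint comes from the query above; the safe-character set is my choice, picked to mirror the console’s encoding):

```python
from urllib.parse import quote

YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql?q="

def yql_to_rest(statement):
    """Percent-encode a YQL statement into a public REST endpoint URL."""
    # Keep *, parentheses, and single quotes literal, as the console does
    return YQL_ENDPOINT + quote(statement, safe="()*'")

print(yql_to_rest("select * from rss where url in ('https://www.x.com/community/feeds/blogs?community=2133')"))
```

The result matches the console’s REST query character for character, so either route works in your application code.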

But what if you want to use that same DevZone data, or data from a different RSS feed or server, in an application where you also need access to other web APIs?  No problem, YQL supports many other web API queries too.  It does this via Open Data Tables created with just a little bit of XML.  Click here to learn more about the data tables.

For example, suppose you want to search Twitter for tweets containing “PayPal”.

Read the complete post on the PayPal X Developer Network to learn how to use the Twitter search API in YQL.

Tulsa Running Club digital newsletter

I’ve enjoyed the monthly TRC newsletter since I joined the club, but I’m particularly happy to see it go all-digital. Saving money + color images = why not?

My only request at this point would be to make all of the embedded URLs clickable links so readers can jump straight to resources that interest them. Other than that, fantastic!
