Another month, another flurry of developer tips, examples, and payments news. Let’s review, shall we?
I published the second part of my two part “Divining DevZone Insight from Filtered Feeds and Deep Pages” article series in February. The first article in the series delved into the sources of content and took the initial approach of using YQL within Yahoo Pipes to get at blog post, article, and book excerpt data. It also uncovered problems with some initial assumptions about the RSS streams providing the data. This second and final article in the series uses the RSS property numItems and some Python code to solve the remaining issues and analyze the stream data.
This data is used to produce topic-filtered DevZone content bundles, a sitemap of sorts. It also discusses the topics via a set of graphs showing topic mentions across DevZone content.

If you’re interested in using YQL in a programming language, especially in Python, I’d recommend checking this article out.
February blog posts included:
- Web API power tools: Steps to RESTful programming bliss – discussed my general approach to creating RESTful web API calls using API documentation, a console such as those from Apigee, and cURL
- Web API power tools: One tool to bind them all – focuses on using YQL to make a variety of web API calls; example YQL select statements along with the REST API calls generated by them
- Web API power tools: PayPal transactions via YQL – discusses the relatively new PayPal YQL bindings and how to use them
- Is this the content you want from X.com? – my request to you to look at the topic content we created in the article discussed above and make your own requests and recommendations for content you’d like to see added to the PayPal X Developer Network DevZone.
- Facebook should go all the way – why and how Facebook could truly blow the doors off of event location information, and location information overall, by allowing more than just the “Big 3” search engines to index its events which now contain places geo-information
- PayPal, Apple, and Google fight for your subscriptions – All three want you to use their subscription payment platform, but which will you choose?
Read the complete post on the PayPal X Developer Network to access links to related development and payments news.
PayPal X Platform
- Advanced @PayPalX payment flows via MPL on iOS http://bit.ly/h6rTxJ #
- How to make @PayPalX transaction queries using YQL http://bit.ly/fke0Qw #
- Free @PayPalX Mobile Express Checkout (MEC) chapter http://bit.ly/fXzh2w from the new "PayPal APIs: Up and Running" http://amzn.to/h6PyTJ # (disclaimer: I provided tech review and a quote for the back cover of “PayPal APIs: Up and Running”)
- Building an online market using Python, @PayPalX Adaptive Payments, and Google App Engine http://bit.ly/e0ZIOM #
- Inter-penetrating orgs http://bit.ly/e1SWwY discussion with @TimOReilly is not nearly as dirty as it sounds (via @travisro on @PayPalX blog) #
Big data
- Quora: Who are the major data market players? Answer: http://qr.ae/gdBJ #
Wireless and mobility
- To Apple: "We are the App Store" http://bit.ly/fHLdpM is spot-on, pay attention #
- Particularly love the opening image and discussion: "Why I don't care very much about tablets anymore" http://bit.ly/h64Kci via @arstechnica #
- Android 3.0 platform and SDK tools release http://bit.ly/eaqkji #
APIs and development
- One of the reasons I lean towards Python http://bit.ly/gNXuaR (easier to read, hence great for maintainability, examples, and tech articles) #
- And yet, despite the simplicity of Python code, you do have problems such as this http://bit.ly/e3mnoW (careful, code slingers!) #
- Search StackOverflow and HackerNews book topics via HackerBooks http://bit.ly/fIE3Wn (e.g. "Python NLTK" returns: http://bit.ly/gQa6kh) #
- jsFiddle http://bit.ly/gNsnnk is another interesting web development editor (docs are here http://bit.ly/fpSa0d) #
- Complete @OReillyMedia "2010 State of the Computer Book Market" report http://bit.ly/gCR427 (PDF) is worth reading in its entirety #
- Must try: Pattern, a Python web mining module http://bit.ly/fzhxp5 bundling data retrieval, text analysis, and viz tools, via @ptwobrussell #
- #Ruby concurrency explained http://bit.ly/i4Xo8K (good points for understanding the issues with other languages, too) #
- On the causality of scaling http://bit.ly/h5kJLi and how to design for it http://bit.ly/ggrI3I from @merbist #
- Excellent PHP cross reference http://bit.ly/gpmnmK for #WordPress hackers, via @Nectarineimp #
Personal things
- Up all night with a sick baby. He appears to be a bit better, cautiously optimistic. Hope to try a little "real" but bland food later. #
- FYI I've started sharing conferences I'm tracking, attending, and speaking at here: http://bit.ly/hMrqdz @lanyrd #
- Amazon Prime Instant Video coverage from @arstechnica http://bit.ly/h66y14 and @engadget http://engt.co/dQtpMX #
- The @wsj on the surprising benefits of distractibility http://on.wsj.com/hJ3Uxn Fascinating! #
- Track tweets related to today's #STS133 Space Shuttle Discovery launch at http://bit.ly/hZ8cfv #
- STS-133 official NASA launch blog http://bit.ly/ecHPm9 #
Running
- Yesterday: Wind 30mph, gusting to 40, yeah! Final long slow run before Post Oak Half Marathon. http://bit.ly/dQNHsb #
- Ran 4.01 miles in 38 mins and felt great. http://bit.ly/gV3CBy #
- Ran 3.13 miles in 29 mins and felt good. Short tempo run in prepping for Post Oak this weekend. Warm and win… http://dailymile.com/e/NHz1 #
- Going to tear it up at the Post Oak Half tomorrow. OK, I'll accept "finish with dignity" if that's what's in… http://dailymile.com/e/NV4h #
In response to my ongoing MoSoLo series’ posts on web API power tools including YQL, PayPal Developer Evangelist Praveen Alavilli pointed me to some recent work he’s done to support PayPal transaction search including details via YQL:
Bill – you should look at the two new yql tables that I’ve added to our github account: https://github.com/paypalx/yql-tables – they provide simple interfaces to PayPal’s Transaction Search and Details APIs. You can combine them with other APIs/Web Services like google/yahoo maps to map the zip codes or generate reports of the items being sold, etc.
These YQL interfaces currently support querying into the PayPal sandbox test environment. Praveen has stated he would consider adding a flag for developers to use to indicate whether they want to make a sandbox or production environment call. If that is added, I’ll post a note here.
To see these new transaction search and detail capabilities in the YQL console, load the console with “Show Community Tables” enabled (click here to load it now). Once you’ve done that, scroll down in the “Data Tables” listing at lower right and find the “PayPal” options:
From there you can select either paypal.transactions or paypal.transactions.details for a template YQL call to request information back from the PayPal X Platform on PayPal transactions or transaction details, respectively.
Click here to read the complete post on the PayPal X Developer Network including example YQL statements to query PayPal’s server.
PayPal X Platform
- What are your favorite web API tools? Mine via the @PayPalX DevZone blog: http://bit.ly/gztUHu and http://bit.ly/ewQB3J #
- My latest @PayPalX post: Mr. Zuckerberg, tear down this wall! http://bit.ly/iiqQ7L on how and why Facebook should go all the way on events #
- Payments developers: eBay and @PayPalX conferences will co-locate at Innovate 2011, October 12-13 in Moscone West http://bit.ly/ejslDT #

Big data
- Nice @nytimes graphic helps you wrap your head around the Obama proposed budget http://nyti.ms/i3Uc6t reminds me of http://oreil.ly/fNVECR #
APIs and development
- Do you have a Python IDE preference? I'm looking at Komodo and PyDev http://bit.ly/exd4zs #
- Django Packages http://bit.ly/fMM6Yy is a great source of apps and source #
- Python has closed the gap on PHP according to the Tiobe Index http://bit.ly/e9EqPz via @ptwobrussell #
- O'Reilly collects all their #Python information together on this meta-page http://oreil.ly/erlVDt #
- Great "Moving from Python 2 to Python 3" cheat sheet http://bit.ly/htCRvD recommended by Paul Barry http://bit.ly/fm1SEe #
- Commandlinefu tips and tricks for vim lovers http://bit.ly/h5tdPF (yes, vi, that's how I roll!) #
- Schedule for @reddirtrubyconf is available now at http://bit.ly/eTaRI4 #
- Hacking @GoogleDocs via its List API http://bit.ly/fTvora plus some 21 GDocs secrets http://bit.ly/fq6XLE from PC World #
- For anyone waiting on a @Cloud9IDE beta login: It took 4 weeks to receive mine http://bit.ly/hZoND5 #
- Handy tips for cleaning up #git directories and files http://bit.ly/eALzu9 from @matthewmccull (ignore, clean, and reset) #
- Great #git workshop materials http://bit.ly/hDxVQ0 from @matthewmccull #
- Free online "Pro Git" book http://bit.ly/iausim from Scott Chacon @progitbook #
- #Git cheat sheet http://bit.ly/hZ3AeQ including this info from its creator http://bit.ly/geNSGQ #
- Common #git workflow http://bit.ly/hJfmc2 from @wycats includes good tips for someone coming to git from svn #
Personal things
- Norton SONAR is automatically removing my git executable and there appears to be nothing I can do save turn off SONAR. Epic FAIL! #
- Congrats again to my friends @dreasoning on their IQT strategic investment. I'm honored to assist you in your mission! #
- I applied for the O'Reilly Blogger Review Program http://oreil.ly/eaVm3f via @oreillymedia (available products http://bit.ly/dJoNkl) #
- Slideshare has launched free public web meetings http://bit.ly/hgDmij and they're starting with great guest speakers http://bit.ly/g27vez #
- I received my first @OReillyMedia blogger program http://oreil.ly/eaVm3f book today and am fired up to read & review asap #
- I'm loving the convenience of Roku including streaming Amazon VOD and Netflix. Want to try at 10% off? http://roku.tellapal.com/a/clk/bbsfV #
- Trivia tonight at a local Rotary function really took me back to high school academic team days. And yes, we won a trophy. #
Running
- Signed up for the Post Oak "troad" Half Marathon http://bit.ly/eMzJhJ to keep myself honest the next couple of weeks #
- Ran 5.11 miles in 52 mins and felt good. Warm weather after the snow. First run in shorts in a long time! http://bit.ly/g92Lyb #
- Ran 3.69 miles in 42 mins and felt great. Split paces 7:40, 7:19, 7:59, 8:03, 7:57, 9:31, 8:29, 8:58, 8:46, and 8:12. http://bit.ly/gLu7wn #
- Ran 3.01 miles in 39 mins and felt great. Easy stop-n-go run with my children. Garmin off for part of the run so dis… http://bit.ly/ffkX4N #
- Ran 2.74 miles in 24 mins and felt good. Short pace run on a warm afternoon. http://bit.ly/fBoVnZ #
Facebook has started adding hCalendar and hCard microformat markup to the millions of “events” listed in their site.
In theory, this could free up the date, time, location, and related event information for linking and use by other sites and applications. I’ve previously written about Facebook’s moves toward open standards in a series on this blog (click here to read about their path towards opening up the Facebook Platform and APIs). This takes Facebook one step further down that path.
As was noted on microformats.org:
Facebook’s deployment of hCalendar is just the latest in their series of slow but steadily increasing support for open standards and microformats in particular. Over two years ago Facebook added hCard support to their user profiles. Last year they announced support for OAuth 2.0, as well as adding XFN rel-me support to user profiles, thus interconnecting with the world wide distributed social web. They proudly documented their use of HTML5. And now, millions of hCalendar events with hCard venues.
But is this step only a half step?
Including microformat markup only gets us half-way to where Facebook could take event information, and especially the location information from hCard markup, on the web.
Click here to read the full post on the PayPal X Developer Network and learn how and why Facebook should finish what they’ve started with Events.
I recently wrote a two part series of technical articles for X.com on “Divining DevZone Insight from Filtered Feeds and Deep Pages” (read part 1 and/or part 2 by clicking the link for one or both). These articles showed you how to use Yahoo Pipes and YQL along with some Python to pull data down from various RSS feeds, manipulate it to filter out the feed item details, and then use those details to rank key content topics. If you missed either article I would encourage you to go back and read them from the links above.
What I want to do here is get your comments, please, on whether the content we’re writing is in line with what you want from X.com. Please take this as an opportunity to tell us how we’re doing.
Let’s look at the results from the harvest+analysis articles. I generated several charts showing the number of topic mentions across the content we wrote for the DevZone, blog posts as well as articles and book excerpts, from the launch in July 2010 through 7 February 2011. Here’s a high level view of the number of mentions of all seventy topics that were analyzed (labels turned off so the general trend isn’t obstructed by clutter; they are turned on when we zoom in in subsequent charts):

We’ve been making a concerted effort to provide a lot of DevZone coverage of ‘mobile’ topics (the topic with the most mentions above), and it shows. Zooming in on the top twenty topics you see that in fact quite a few are mobile related:

Based upon separate hit analyses conducted by Travis Robertson and myself, mobile content is exactly what you are looking for. But if we’re wrong on that, please tell us what you would prefer by leaving a comment below.
Now let’s look at the particular mobile related topics we’ve been covering so far:

The bars above again show per-item mentions of each topic across all DevZone blog posts, articles, and book excerpts from the DevZone’s inception through the cut-off date last week, almost seven months total. Taking the ‘mobile’ bar as an overall indicator we see that a bit more than half of the mobile related content discussed Android. A similar number of iPhone + iOS + iPad mentions were made, though they often occurred in the same items, so the overall concentration was on Android versus “i”-content.
Is that the right balance? Or would you like to see more “i”-coverage in our material?
I’m also curious to hear your take on our mobile wallet, QR code, and NFC coverage. Too little, too much, or just right?
Please click here to read the full post on the PayPal X Developer Network including additional PayPal and programming language analyses. You can also leave your comments on that post.
In the previous article in this series we looked at DevZone developer RSS feeds from which we wanted to harvest data. We discussed our initial approach and then dove into implementing it using Yahoo Pipes and YQL. Please read that post now (click here to access it) if you haven’t already.
We encountered some problems along the way, and promised a solution in this follow-on article. Ready? Here we go!
The RSS property numItems is a good thing
By the end of the previous article, we had created a Yahoo Pipe which used YQL to fetch items from all six of the pertinent DevZone RSS feeds. We discovered that the feed items returned were being severely limited in number by the RSS server. We needed a way to get around that limitation to get as many of the items as possible (preferably, all of them).
After a little searching, I turned up the answer: In order to get more results, I needed to ask the server nicely using the numItems query parameter.
The numItems parameter indicates to the RSS server that I’d like to receive the number of feed items indicated, if possible. If the server is configured to allow more items when asked, I should be able to get more than the default.
Keeping in mind that the DevZone blog feed contained upwards of 170 posts and would increase quite a bit over time, that the documents feed had more than sixty items and was growing at a slower rate, and that the pre-cutover individual feeds each contained fewer than twenty items and weren’t going to grow any more, I constructed the following YQL request to attempt to get every item in every feed:
select * from rss where url in ('https://www.x.com/people/ptwobrussell/blog/feeds/posts?numItems=20', 'https://www.x.com/people/billday/blog/feeds/posts?numItems=20', 'https://www.x.com/people/travis/blog/feeds/posts?numItems=20', 'https://www.x.com/blogs/Ethan/feeds/posts?numItems=20', 'https://www.x.com/community/feeds/blogs?community=2133&numItems=1000', 'https://www.x.com/community/feeds/documents?community=2133&numItems=200')
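A request like the one above can also be assembled programmatically rather than typed by hand. Here is a minimal sketch, assuming the feed URLs and numItems values shown in the query; the helper names `with_num_items` and `build_query` are my own for illustration and are not part of YQL or Pipes:

```python
# Feed URLs paired with the numItems value requested for each one.
FEEDS = [
    ("https://www.x.com/people/ptwobrussell/blog/feeds/posts", 20),
    ("https://www.x.com/people/billday/blog/feeds/posts", 20),
    ("https://www.x.com/people/travis/blog/feeds/posts", 20),
    ("https://www.x.com/blogs/Ethan/feeds/posts", 20),
    ("https://www.x.com/community/feeds/blogs?community=2133", 1000),
    ("https://www.x.com/community/feeds/documents?community=2133", 200),
]


def with_num_items(url, num_items):
    """Append numItems, using '&' when the URL already has a query string."""
    separator = "&" if "?" in url else "?"
    return "{}{}numItems={}".format(url, separator, num_items)


def build_query(feeds):
    """Build a single YQL select covering every feed in the list."""
    quoted = ", ".join("'{}'".format(with_num_items(u, n)) for u, n in feeds)
    return "select * from rss where url in ({})".format(quoted)


query = build_query(FEEDS)
print(query)
```

Keeping the values in one list makes it easy to change a single feed's numItems without retyping the whole statement.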
Running the pipe gave me all of the blog posts from all five blog feeds (yeah!) but none of the article or book excerpt items from the DevZone “Documents” feed (boo!). Now what?
I’ll spare you the details, but suffice it to say that after experimenting with the numItems value for the documents request I found that I could set it as high as ‘38’ and receive that specified number of items back. If I set it to ‘39’ or higher, I got nothing back from that feed. Not nice. Given the need to move on to analyzing the data, however, I decided to roll with all the data I could get and then add back in the twenty-some missing document items later.
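That kind of trial and error could be automated. The following is only a sketch, assuming the feed returns items for every numItems value up to some threshold and nothing above it; `returns_items` is a hypothetical callback standing in for a real feed request, and the stand-in used here simply encodes the threshold of 38 reported above:

```python
def largest_working_value(returns_items, low, high):
    """Binary search for the largest n in [low, high] where returns_items(n) is True.

    Assumes returns_items(low) is True and that once a value fails,
    every larger value fails too.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if returns_items(mid):
            low = mid
        else:
            high = mid - 1
    return low


def fake_documents_feed(num_items):
    """Stand-in for a live request: succeeds only up to the observed limit."""
    return num_items <= 38


limit = largest_working_value(fake_documents_feed, 1, 200)
print(limit)
```

In a real run the callback would issue the request and report whether any items came back, so each probe costs one call to the server.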
So to summarize, at this point I had a Yahoo Pipe that would return most of the desired feed items:
Here is the information for that pipe for you to use or clone as you see fit:
| Pipe | Feed location | Lightweight data |
|---|---|---|
| All six feeds, numItems set, date sorted | RSS | JSON |
Developing the ultimate solution
About the time I had the pipe above ready to use, I began noticing some inconsistencies in the data returned using it. Every once in a while, a “Refresh” in the Pipes debugger would fail to load any items, or would load many fewer than was expected. As I was contemplating what to do about that issue, I encountered the final piece of the ultimate solution puzzle: Python YQL, a library for making YQL queries in Python programs.
I had already been bumping into the edges of the Pipes model a bit. Manipulating RSS streams was straightforward, but what if I wanted to save some of the data out for analysis in other tools or archival? And although Pipes does provide a Loop module, some of the nested operations I envisioned during my analysis work would definitely be difficult, if not impossible, if I stayed strictly within the Pipes box.
On the other hand, Python YQL gave me the option of plopping the YQL select statement I’d already developed in Pipes directly into “real code”. Once I had the data flowing into my Python program, I could do just about anything I wanted with it. File I/O, filtering, etc. would be a cake walk in Python. I was sold!
Here then is the plan I implemented to collect, store, organize, analyze, and share the DevZone feed data:
- I would create two Python programs, one to conduct the harvest and one to perform analysis; this separation would let me harvest independently of analysis
- I would use the YQL developed previously in Pipes to collect the feed data into my Python harvest program, which would save out the portions I needed to a CSV file; this would be the input for the analysis program, and it would also allow me to perform additional analysis with a number of tools (Google Docs or any other tool supporting CSV)
- My analysis program would read in the harvested data CSV file along with a separate topic list CSV file; it would then filter the data against each of the DevZone topics, producing a topic-filtered CSV output file for each topic along with a topics statistics CSV file containing the key stats from the topics analysis
- I would add in the few missing documents’ data where needed myself (everything above was automated, but this part not so much)
- Once I had all of the feed items accounted for, I would create a bit.ly bundle for each topic (using the topic-filtered files) and include those bundle URLs in the topic statistics file
- Final step: Explore the topic filtered data and share what I learned in this article and beyond
My Python programs, devzone.harvest.py and devzone.analyze.py, are both available via github. Click here to access the repository and grab a copy of the source.
Python YQL could not be easier to use. We simply import the yql module, get access to a public (non-authenticated) YQL connection, then execute a YQL query against that connection. Here’s a snippet of code from my harvest program showing how easy it is to fetch the document feed data using a YQL select:

    import yql
    y = yql.Public()
    articlequery = "select * from rss where url in ('https://www.x.com/community/feeds/documents?community=2133&numItems=38')"
    articles = y.execute(articlequery)

Results are returned as a yql.YQLObj containing rows. Each row is a dictionary of key:value pairs for one RSS feed item.
Once I’ve fetched the data from this and the other feeds, I save it out into a devzone.harvest.csv file for use by the analysis program and other tools. Note that as I write each row out via a Python DictWriter named ‘csvwriter’, I add in a field I’ll use later to indicate if a given item is an article/book excerpt or a blog post. I also do some trimming on the date field to remove the unneeded day of the week and timezone information that was included in the RSS feed’s pubDate fields. Again, the Python code couldn’t be much simpler:

    for row in articles.rows:
        row["articleOrBlog"] = "article"
        date = row["pubDate"]
        date = date[5:-4]
        row["pubDate"] = date
        csvwriter.writerow(row)

and here’s a look at the first line of the output devzone.harvest.csv file (note the reserved but currently empty second to last field and that I’ve removed the article HTML content from the final field for brevity):
31 Jan 2011 18:31:26,article,PayPal and the Road to Adaptive Payments,https://www.x.com/docs/DOC-3191,,{content of article would be here}
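To see the trimming and the CSV layout in isolation, here is a small self-contained sketch. It assumes a pubDate of the form "Mon, 31 Jan 2011 18:31:26 GMT" (the raw feed value is not shown above, so that format is an assumption), and it infers the column order from the sample line, using hitCount as the name of the reserved empty field. The helper names are illustrative, not taken from devzone.harvest.py:

```python
import csv
import io

# Column order inferred from the sample output line; hitCount is left empty.
FIELDS = ["pubDate", "articleOrBlog", "title", "link", "hitCount", "description"]


def trim_pub_date(pub_date):
    """Drop the leading 'Mon, ' (5 characters) and the trailing ' GMT' (4 characters)."""
    return pub_date[5:-4]


def write_rows(rows, kind, out):
    """Write feed rows as CSV, tagging each one as an article or a blog post."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval="", extrasaction="ignore")
    for row in rows:
        record = dict(row)  # copy so the caller's row is left untouched
        record["articleOrBlog"] = kind
        record["pubDate"] = trim_pub_date(record["pubDate"])
        writer.writerow(record)


sample = [{
    "pubDate": "Mon, 31 Jan 2011 18:31:26 GMT",
    "title": "PayPal and the Road to Adaptive Payments",
    "link": "https://www.x.com/docs/DOC-3191",
    "description": "(article content)",
    "guid": "an extra field that extrasaction='ignore' drops",
}]

buffer = io.StringIO()
write_rows(sample, "article", buffer)
print(buffer.getvalue())
```

Note that the fixed slice only works when the timezone is a three-letter abbreviation; a numeric offset such as +0000 would need a different trim, for example by parsing the date with email.utils.parsedate_to_datetime.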
That’s about it for the interesting bits of the harvest program. You can see the complete source code listing for devzone.harvest.py by clicking here. I’ve tried to comment everything liberally to make it easy to follow along.
Now we’re ready to perform some analysis. Specifically, we want to perform the topic filtering described in the plan steps above. After devzone.analyze.py reads in the devzone.harvest.csv data from the harvest program and saves a copy of it minus the actual item content back out for use in other tools, it’s ready for its own critical bit, the topic filtering:

    csvtopics = open("devzone.topics.csv", "rb")
    topicreader = csv.reader(csvtopics, dialect='excel')
    csvnumitems = open(devzonedir+"devzone.topics.items.csv", "wb")
    numitemswriter = csv.writer(csvnumitems, dialect='excel')
    for topic in topicreader:
        currenttopic = topic[0]
        topicfile = (currenttopic.replace(' ', '')).replace('.', 'dot')
        csvcurrenttopic = open(devzonedir+"devzone.analysis.topic."+topicfile+".csv", "wb")
        topicwriter = csv.DictWriter(csvcurrenttopic, fieldnames=['pubDate', 'articleOrBlog', 'title', 'link', 'hitCount'], restval='', extrasaction='ignore', dialect='excel')
        csvinput.seek(0)
        items = 0
        for row in itemreader:
            if re.search(currenttopic,row['title']) or re.search(currenttopic,row['description']):
                topicwriter.writerow(row)
                items += 1
        numitemswriter.writerow([currenttopic, items])
        print topicfile, "topic contains", items, "items"
        csvcurrenttopic.close()

Let’s walk through what that code does.
First, it opens up the devzone.topics.csv input file created previously. The topic list I used for this article is available in the github repository (click here). This file lists each of the major multipart blog and article series along with the significant technology and payments groupings that appear in the DevZone content. In effect, it specifies the content categories into which we’re going to slot the various feed items to build our content sitemap. Here are the first few rows of the topics file used for this article:

    .Net
    Adaptive Accounts
    Adaptive Payments
    Alternative Ways to Fund Your Project
    analytics
    Android
    {...}

Note that I generated the current topics by hand, refining it over several analysis passes as the topic categories became clear to me. I would like to explore automatically generating this from the content itself in the future (see below for more on that).
For each topic, the analysis program works through each content item, checking to see if that item’s title or content contains the given topic under consideration. A Python regular expression search (re.search) is used for this check. If the topic is mentioned in the item’s title or content, then that item is added to the topic’s topic-specific output file and the topic item count is incremented. At the end of each item row-level pass, the analysis program writes the total number of items for the current topic under consideration. This total is written to devzone.topics.items.csv, which is a key file for our later analysis (more below).
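One detail worth keeping in mind with this approach: re.search treats the topic text as a regular expression, so a topic such as ‘.Net’ begins with a wildcard and may also match text like “Developer Network”. The sketch below is a variation on the filtering step, not the code from devzone.analyze.py; it escapes each topic with re.escape so it is matched literally, and the sample items are made up for illustration:

```python
import re


def count_topic_items(topics, items, literal=True):
    """Count the items whose title or description mentions each topic.

    With literal=True the topic is escaped so regex metacharacters such as
    '.' match only themselves; with literal=False the topic is used as a
    raw pattern.
    """
    counts = {}
    for topic in topics:
        pattern = re.compile(re.escape(topic) if literal else topic)
        counts[topic] = sum(
            1
            for item in items
            if pattern.search(item["title"]) or pattern.search(item["description"])
        )
    return counts


# Made-up sample items, for illustration only.
items = [
    {"title": "Getting started with Android", "description": "A mobile walkthrough"},
    {"title": "Adaptive Payments basics", "description": "Includes .Net samples"},
    {"title": "News from the PayPal X Developer Network", "description": "General update"},
]
topics = [".Net", "Android"]

literal_counts = count_topic_items(topics, items)
raw_counts = count_topic_items(topics, items, literal=False)
print(literal_counts)
print(raw_counts)
```

Whether literal matching is the right choice depends on the topic list; a list that deliberately contains patterns would want the raw behavior.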
With relatively little code, I’ve been able to do some pretty neat things. I’ve pulled down hundreds of blog posts, articles, and book excerpts from the PayPal server, sliced and diced them including adding some fields for my own use, performed topic filtering, and output several CSV data files for later analysis. The only thing I did beyond that was to manually add in the twenty-six (as of this writing) missing article items where needed. With that the data set is complete and it’s time to look at the results.
Click here to read the full article on the PayPal X Developer Network including the resulting table of DevZone topics and the analysis of the content.
In my last post we examined a three step process for learning about a new web API, prototyping its use in a console, and then copying the calls you developed into your own script or program.
That approach works well, but there is one major problem: It requires you to learn a new API for each RESTful service you want to try out.
Wouldn’t it be nice if there was one mechanism you could learn that allowed you to make calls into just about any web API, or even plain old web pages? In fact there is, and that’s the subject of this post: the Yahoo! Query Language (YQL).
I first wrote about YQL for the PayPal X Developer Network in a two part DevZone article series on harvesting and analyzing RSS feed data (click here to read the first part on harvesting the data; the second part will be publishing soon). You can read more about YQL in detail in those articles, so for this post I just want to cover the high level capability to show you its usefulness in your web API-based development.
So what is YQL? YQL looks like SQL but enables you to access live data from the Internet. That definition doesn’t really do it justice, however.
The bigger point about YQL is that it includes many built-in bindings to various web APIs so that you can use almost every major web API in a consistent manner, with similar calls across the many possible APIs available to you. Yahooligans summarize this capability as:
select * from Internet
Here’s why. In the first article in my harvest+analysis series I showed how you could use YQL to load DevZone “Blog” RSS feed data:
select * from rss where url in ('https://www.x.com/community/feeds/blogs?community=2133')
In the second part of the series you see how you can use that same YQL in an application. The article shows a Python example using the Python YQL client library, but you can use it in other languages too with the appropriate library. Barring direct support in a particular language of interest, however, you can always make the equivalent REST call provided for any given YQL statement you put together in the YQL console.
For instance, if you enter the above YQL into the console and test it with diagnostic information turned off and the server response set to ‘XML’ (JSON is the other option), the console gives you this output:
In this case, the REST query copied out of the highlighted portion of the console is:
Click the REST query above to see the server response live in your browser. You can use this REST call in your application code, no matter the programming language or environment. Nifty, huh?
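If you would rather build such a REST URL in code than copy it from the console, the idea can be sketched as follows. This assumes the public YQL endpoint of the time and its `q` and `format` query parameters; the endpoint constant and the helper name `yql_rest_url` are my assumptions for illustration, not text copied from the console:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: the public (non-authenticated) YQL endpoint.
YQL_PUBLIC_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def yql_rest_url(query, response_format="xml"):
    """URL-encode a YQL statement into a REST call; format is 'xml' or 'json'."""
    params = {"q": query, "format": response_format}
    return YQL_PUBLIC_ENDPOINT + "?" + urlencode(params)


blog_query = (
    "select * from rss where url in "
    "('https://www.x.com/community/feeds/blogs?community=2133')"
)
rest_url = yql_rest_url(blog_query)
print(rest_url)

# Decoding the URL again recovers the original statement unchanged.
decoded = parse_qs(urlsplit(rest_url).query)
print(decoded["q"][0])
```

Because the statement travels as an ordinary query parameter, any language with an HTTP client and a URL-encoding function can make the same call.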
But what if you want to use that same DevZone data, or data from a different RSS feed or server, in an application where you also need access to other web APIs? No problem, YQL supports many other web API queries too. It does this via Open Data Tables created with just a little bit of XML. Click here to learn more about the data tables.
For example, suppose you want to search Twitter for tweets containing “PayPal”.
Read the complete post on the PayPal X Developer Network to learn how to use the Twitter search API in YQL.
I’ve enjoyed the monthly TRC newsletter since I joined the club, but I’m particularly happy to see it go all-digital. Saving money + color images = why not?
My only request for changes at this point would be to make all of the embedded URLs links so the reader could jump straight to resources that interest them. Other than that, fantastic!






