What would it take to build an open source version of POUND?

So BuzzFeed has this tool called POUND – the Process for Optimizing and Understanding Network Diffusion. Superstar publisher Dao Nguyen revealed the technology in a famous (to me) April 2015 blog post discussing the massive traffic success of The Dress.

When talking to reporters at PMN, I would often use these illustrations from the post to visualize how content bounces and spreads between networks.

POUND is mature tech at this point; shortly after the blog post, Fast Company reported:

Nguyen says that BuzzFeed now stores more Pound data in a single day than all the other data the company has been collecting for content optimization since it was founded. Pound is capable of processing over 10,000 web requests per second.

I’ve been sniffing around to see what it’d take to build an open source version of POUND, and surveying the prior art.

There’s a discussion on the Snowplow repo with one comment in particular, by Alexander Dean, that I want to excerpt:

It’s great to have this ticket back in the frame! A few observations on how Buzzfeed does it, as the articles on Pound are somewhat vague:

  1. Buzzfeed uses a short hash to identify each sharing node in the tree. This is deliberately a very short string (much shorter than a UUID) to reduce the chances of it being clipped/truncated during dark social sharing
  2. When you click on this URI: http://www.buzzfeed.com/catesish/help-am-i-going-insane-its-definitely-blue, on landing on the page you have the URI rewritten to include the hash, e.g. http://www.buzzfeed.com/catesish/help-am-i-going-insane-its-definitely-blue#.vaO5pjMGwM
  3. When you then share your URI (including hash) on a social site and somebody clicks on it, on landing on the page they in turn will have the URI rewritten to include a new hash, to indicate a new node in the sharing tree, e.g. http://www.buzzfeed.com/catesish/help-am-i-going-insane-its-definitely-blue#.topV28ADPg

Coming up with a hash that is densely packed enough (or can be associated with other metadata, e.g. an IP address) to minimize collisions, but brief enough to avoid truncation, is an interesting challenge…
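To make that concrete, here’s a minimal sketch of the landing behavior Dean describes – read any incoming hash off the fragment, mint a fresh short hash for this visitor, rewrite the URL. This is my own illustration, not BuzzFeed’s code, and every name in it is invented:

```ts
// Minimal sketch of POUND-style landing behavior. All names here are
// invented for illustration, not BuzzFeed's code.

const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

// Ten base-62 characters give roughly 59 bits of entropy: long enough
// that collisions stay rare even at high volume, short enough to
// survive copy-and-paste into chat apps and emails.
function makeShareHash(length = 10): string {
  let hash = "";
  for (let i = 0; i < length; i++) {
    hash += ALPHABET[Math.floor(Math.random() * ALPHABET.length)];
  }
  return hash;
}

// The hash the visitor arrived with identifies the node that shared
// the link; no hash means a root node (direct visit, homepage, etc.).
const parentHash: string | null = window.location.hash.startsWith("#.")
  ? window.location.hash.slice(2)
  : null;

// Mint a fresh hash identifying *this* visitor as a new node, and
// rewrite the fragment so any URL they copy carries it.
const myHash = makeShareHash();
history.replaceState(
  null,
  "",
  window.location.pathname + window.location.search + "#." + myHash
);

// Logging the (parentHash, myHash) pair server-side gives you the
// edges of the sharing tree.
```

One design note: because the hash lives in the fragment rather than a query parameter, it never reaches the server on the initial request and doesn’t splinter CDN caches, which is presumably part of why POUND does it that way.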

There’s also a repo by an engineer in Japan (I think) who’s started a re-implementation called footprint.js. That project is helpful in how it organizes hash access and local storage, but it currently chains hashes together rather than minting a fresh hash per node as described in the excerpt above.
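To illustrate the distinction – this is my reading of “chains,” with invented names, code from neither project:

```ts
// My own contrast of the two approaches — not code from either project.

// Chaining: the fragment accumulates a hash per hop, so deeply shared
// URLs grow long and become prone to exactly the truncation the short
// hash was meant to avoid.
function chainedFragment(incoming: string | null, newHash: string): string {
  return incoming ? incoming + "." + newHash : "#." + newHash;
}

// POUND-style, per Dean's description: the fragment always carries
// exactly one short hash; lineage is reconstructed server-side from
// the logged (parent, child) edges instead.
function freshFragment(_incoming: string | null, newHash: string): string {
  return "#." + newHash;
}
```

Three hops of chaining yields something like #.abc.def.ghi, while the POUND approach keeps every shared URL the same length no matter how deep the tree gets.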

That said, the hash math and hooking a footprint.js-like pattern up to Google Analytics or another system are definitely within reach; I’m less certain how we’d make the graph visualizations accessible to newsrooms once the hashes are sitting in Google Analytics. Potentially with Fusion Tables?
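For the Google Analytics half, a hedged sketch: analytics.js lets you attach the two hashes to a pageview as custom dimensions. The dimension1/dimension2 slots below are assumptions you’d have to map to hit-scoped custom dimensions in your own GA property:

```ts
// Hedged sketch: ship the (parent, child) edge to Google Analytics as
// custom dimensions on the pageview. dimension1/dimension2 are
// assumptions — map them to hit-scoped custom dimensions in your GA
// property's settings.
declare const ga: (...args: unknown[]) => void; // global from analytics.js
declare const parentHash: string | null; // from the landing sketch above
declare const myHash: string;

ga("set", "dimension1", parentHash || "(root)"); // node that shared the link
ga("set", "dimension2", myHash); // node this visitor becomes
ga("send", "pageview");
```

Exporting those two dimensions would give you the edge list a tree or force-directed layout needs – it’s the visualization step, not the collection, that I can’t picture yet.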

I’m pretty sure there are other similar implementations already in the wild – I know world-reigning champion of news design Ivar Vong built something similar for The Marshall Project (you’ll see it if you visit that link), and Mic.com has one too. The Washington Post has something as well, but nowhere near the elegant brevity of POUND-style implementations.

Do you have ideas or suggestions? Let me know in the comments.

Posted Jun. 6 2017, 10:47 am by Davis