Google digitizing millions of never-published New York Times photographs

The New York Times has been going full Google over the past two years, moving all consumer-facing products to Google Cloud (“a huge win” according to CTO Nick Rockwell).

But Google and the New York Times have also been working on a secret project, one whose payoffs are harder to estimate: together, they are digitizing the entire photo archive of the Times, likely somewhere in the range of 5m to 20m photographs (a precise figure was not mentioned).

The digitization effort also captures the handwritten “metadata” on the back of each print.

Each Times photo is marked up like a well-traveled passport: the prints carry stamps, dates, notes, publication history, clipped captions, and other vestiges of their travels through the newspaper’s history.

New York Times CTO Nick Rockwell

The “vast majority” of these images have never been published, according to Rockwell, who spoke at Google Cloud Next yesterday to announce the project.

Ultimately, storytelling is the motivation for undertaking such a monumental effort. Rockwell offered the Times’ Unpublished Black History series as an example of what could be done with this new resource.

An example photo from the New York Times archive, including the “passport” style stamps on the back.

Google Cloud Storage holds the photos; Cloud Pub/Sub handles workflow orchestration; Cloud Spanner stores the metadata; and the Cloud Vision API helps create additional indexed metadata from the images.
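To make that division of labor concrete, here is a minimal in-memory sketch of how such an ingest pipeline could fit together. It is not the Times’ or Google’s code: every class, function, and storage path below is hypothetical. A dict stands in for a Cloud Storage bucket, a deque for a Pub/Sub topic, a second dict for the Spanner metadata table, and `fake_labeler` for a Cloud Vision call. It also assumes the text on the back of each print has already been transcribed.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass
class PhotoRecord:
    """One archive print: where its scans live, plus searchable metadata."""
    photo_id: str
    front_uri: str
    back_uri: str
    back_text: str = ""  # assumed: already-transcribed stamps/notes from the back
    labels: List[str] = field(default_factory=list)  # machine-generated tags


def fake_labeler(image: bytes) -> List[str]:
    """Hypothetical stand-in for an image-labeling call such as Cloud Vision."""
    return ["protest"] if b"march" in image else ["portrait"]


class ArchivePipeline:
    """In-memory sketch; each attribute stands in for one managed service."""

    def __init__(self, labeler: Callable[[bytes], List[str]]):
        self.blobs: Dict[str, bytes] = {}           # ~ Cloud Storage bucket
        self.queue: Deque[str] = deque()            # ~ Pub/Sub topic
        self.metadata: Dict[str, PhotoRecord] = {}  # ~ Spanner table
        self.labeler = labeler                      # ~ Cloud Vision

    def ingest(self, photo_id: str, front: bytes, back: bytes, back_text: str) -> None:
        """Store both scans, write the metadata row, and publish an event."""
        front_uri = f"archive/{photo_id}/front.jpg"  # hypothetical path layout
        back_uri = f"archive/{photo_id}/back.jpg"
        self.blobs[front_uri] = front
        self.blobs[back_uri] = back
        self.metadata[photo_id] = PhotoRecord(photo_id, front_uri, back_uri, back_text)
        self.queue.append(photo_id)  # downstream workers react to this message

    def process_pending(self) -> int:
        """Drain the queue, enriching each record with machine labels."""
        processed = 0
        while self.queue:
            record = self.metadata[self.queue.popleft()]
            record.labels = self.labeler(self.blobs[record.front_uri])
            processed += 1
        return processed

    def search(self, term: str) -> List[str]:
        """Match a term against transcribed back-of-print text or exact labels."""
        term = term.lower()
        return sorted(
            r.photo_id
            for r in self.metadata.values()
            if term in r.back_text.lower()
            or any(term == label.lower() for label in r.labels)
        )


if __name__ == "__main__":
    archive = ArchivePipeline(fake_labeler)
    archive.ingest("p1", b"crowd march photo", b"back scan", "Stamped 1963")
    archive.ingest("p2", b"studio portrait", b"back scan", "Harlem, 1958")
    archive.process_pending()
    print(archive.search("harlem"))   # ['p2']
    print(archive.search("protest"))  # ['p1']
```

The queue decouples scanning from analysis: ingest only has to store the images and publish an event, while workers drain the queue at their own pace. That is presumably why a messaging service sits at the center of the real setup, though the details of the Times’ workflow were not described.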

While this will not be a public dataset, the new image archive will be a strategic asset for the Times and its singular newsroom, yet another example of the so-called Gray Lady leading us boldly into this new digital world.
