Google digitizing millions of never-published New York Times photographs

The New York Times has been going full Google over the past two years, moving all consumer-facing products to Google Cloud (“a huge win” according to CTO Nick Rockwell).

But Google and the New York Times have also been working on a secret project whose payoffs are harder to estimate: together, they are digitizing the Times' entire photo archive, likely somewhere between 5 million and 20 million photographs (a precise figure was not mentioned).

The digitization effort includes the handwritten “metadata” on the back of each image.

Each Times photo is marked up like a well-traveled passport–the prints have stamps, dates, notes, publication history, clipped captions, and other vestiges of their travel throughout the newspaper’s history.

New York Times CTO Nick Rockwell

The “vast majority” of these images have never been published, according to Rockwell, who spoke at Google Cloud Next yesterday to announce the project.

Ultimately, storytelling is the motivation for undertaking such a monumental effort. Rockwell offered the Times’ Unpublished Black History as an example of what could be done with this new resource.

An example photo from the New York Times archive, including the “passport” style stamps on the back.

The project uses Google Cloud Storage to store the photos, Cloud Pub/Sub for workflow orchestration, Cloud Spanner for the metadata, and Cloud Vision to help create additional indexed metadata from the images.
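
As a rough illustration of how the four services named above could fit together, here is a minimal in-memory sketch. It does not use the Google Cloud client libraries, and the Times' actual pipeline was not described at this level of detail. Every name in it (`Pipeline`, `ingest`, `process_next`, `label_image`) and the sample photo record are hypothetical stand-ins.

```python
from collections import deque
from dataclasses import dataclass, field


def label_image(scan: bytes) -> list:
    """Placeholder for an image-analysis call such as Cloud Vision.

    Assumption: a real service would return labels or recognized text.
    Here we only report the scan size so the sketch runs offline.
    """
    return [f"bytes:{len(scan)}"]


@dataclass
class Pipeline:
    """In-memory stand-ins for the storage, queue, and metadata services."""
    blobs: dict = field(default_factory=dict)     # stands in for Cloud Storage
    queue: deque = field(default_factory=deque)   # stands in for Pub/Sub
    metadata: dict = field(default_factory=dict)  # stands in for Spanner

    def ingest(self, photo_id: str, scan: bytes, back_notes: str) -> None:
        """Store the scan, then enqueue a message so a worker can pick it up."""
        self.blobs[photo_id] = scan
        self.queue.append({"photo_id": photo_id, "back_notes": back_notes})

    def process_next(self) -> bool:
        """Handle one queued message; return False when the queue is empty."""
        if not self.queue:
            return False
        msg = self.queue.popleft()
        scan = self.blobs[msg["photo_id"]]
        self.metadata[msg["photo_id"]] = {
            "back_notes": msg["back_notes"],
            "labels": label_image(scan),
        }
        return True


pipeline = Pipeline()
# Made-up sample record, for illustration only.
pipeline.ingest("photo-001", b"\x00" * 16, "Stamped 1952; caption clipped")
while pipeline.process_next():
    pass
print(pipeline.metadata["photo-001"])
```

The queue decouples scanning from analysis: scans can be stored as fast as they arrive, while workers drain the queue and write metadata at their own pace.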

While this will not be a public dataset, the new image archive will be a strategic asset for the Times and its singular newsroom, and yet another example of the so-called Gray Lady leading us boldly into this new digital world.
