Google digitizing millions of never-published New York Times photographs
The New York Times has been going full Google over the past two years, moving all consumer-facing products to Google Cloud (“a huge win” according to CTO Nick Rockwell).
But Google and the New York Times have also been working on a secret project, one whose payoffs are harder to estimate – together, they are digitizing the entire photo archive of the Times, likely somewhere in the range of 5m to 20m photographs (a precise figure was not mentioned).
The digitization effort includes the hand-written “metadata” on the back of each image.
“Each Times photo is marked up like a well-traveled passport – the prints have stamps, dates, notes, publication history, clipped captions, and other vestiges of their travel throughout the newspaper’s history.”
– New York Times CTO Nick Rockwell
The “vast majority” of these images have never been published, according to Rockwell, who spoke at Google Cloud Next yesterday to announce the project.
Ultimately, storytelling is the motivation for undertaking such a monumental effort. Rockwell offered the Times’ Unpublished Black History as an example of what could be done with this new resource.
The project uses Google Cloud Storage to store the photos, Cloud Pub/Sub for workflow orchestration, Cloud Spanner for the metadata, and Cloud Vision to help create additional indexed metadata from the images.
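To make the division of labor concrete, here is a minimal sketch of how those four roles could fit together. It is not the Times' actual pipeline and calls no Google APIs: a dict stands in for the photo store, a queue for the message topic, another dict for the metadata database, and `annotate` is a hypothetical placeholder for image analysis. All names below are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from queue import Queue


@dataclass
class Archive:
    """In-memory stand-ins for the four services (illustrative only)."""
    blobs: dict = field(default_factory=dict)      # plays the role of the photo store
    events: Queue = field(default_factory=Queue)   # plays the role of the message topic
    metadata: dict = field(default_factory=dict)   # plays the role of the metadata database


def ingest(archive: Archive, photo_id: str, image_bytes: bytes) -> None:
    """Store a scanned photo, then announce it so downstream workers can react."""
    archive.blobs[photo_id] = image_bytes
    archive.events.put(photo_id)


def annotate(image_bytes: bytes) -> dict:
    """Hypothetical placeholder for image analysis; a real system would call a vision service."""
    return {"size_bytes": len(image_bytes)}


def process_pending(archive: Archive) -> int:
    """Drain the queue, writing one metadata record per announced photo."""
    processed = 0
    while not archive.events.empty():
        photo_id = archive.events.get()
        archive.metadata[photo_id] = annotate(archive.blobs[photo_id])
        processed += 1
    return processed


archive = Archive()
ingest(archive, "photo-001", b"scanned image bytes")
process_pending(archive)
```

The point of the queue in the middle is decoupling: scanning can run at its own pace, and the analysis step can be retried or scaled without touching the stored originals.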
While this will not be a public dataset, the new image archive will be a strategic asset for the Times and its singular newsroom – yet another example of the so-called Gray Lady leading us boldly into this new digital world.