
New York’s data deluge begins

Under long-awaited law, all published city stats must now be downloadable for app makers and analysts

Numbers nerds rejoiced Thursday as the city reached an important milestone: All the city’s online data, from rat reports to hurricane evacuation maps, must be downloadable from the city’s open data portal.

March 7 was the deadline set under the city’s open data law, also known as Local Law 11. While it will take another five years to roll out in full, this week marks the first time that web-published information must be available to the public in machine-readable form, meaning that application developers, journalists and others can repurpose it into their own work.

In the past year, city agencies have posted at least 350 new datasets, including a database of building complaints, historical crime data and the location information for every payphone in the city.

Andrew Nicklin of the city’s technology agency (at podium), flanked by Councilmembers Gale Brewer (right) and Mark Weprin (left), joins opengov advocates to mark an important milestone this week: the release of (almost) all city data for download.

About 75 percent of the data that currently exists on the city’s websites has been made accessible through the portal, according to Andrew Nicklin, director of research and development at the city’s Department of Information Technology and Telecommunications (DoITT).

Agencies that could not comply with Thursday’s deadline were required to explain to DoITT why particular datasets could not be made available; Nicklin says that DoITT may soon release a list.

Compiling information in a central location is the first step toward understanding the size and scope of the data that is in the city’s possession. “It’s the equivalent of going to a library, and know you have a card catalog you can go to,” said Noel Hidalgo, executive director of the Open NY Forum, a civic data meetup group that was part of the coalition of non-profits that advocated for the law.

In the meantime, some omissions from the new lode are glaring. For example, precinct-level crime data, released on a weekly basis by the NYPD, does not appear to be housed in the portal. The NYPD currently hosts that information on its own website, and makes it available only in PDF format, which makes it nearly impossible to extract and use the data to track crime trends and patterns.

Local Law 11 requires agencies to convert data posted on the web portal to a format that can be used for other purposes, like building applications and running analyses. “Not having it in machine-readable format is almost a disservice, or it’s creating an obtuse government,” said Hidalgo. “Willingly creating data that is not easily consumed by computers, you’re not in 21st century.”

On the brighter side, the data portal now includes statistics previously not available in machine-readable format. For instance, the public can now download ten years of figures from the Mayor’s Management Report, which include about 1,200 different indicators measuring the performance of city agencies. Previously, anyone wanting to analyze this information would have to extract the data from individual PDF files downloaded from the mayor’s website.

According to Nicklin, DoITT is also working to build backend systems that will be able to extract information from large city databases that are continuously updated, like those maintained by the Department of Buildings. “The [goal] is to do that automatically,” he said. “So there’s no human contact.”

Advocates say they hope that the continued release of public data will facilitate greater government transparency and accountability. “Putting the information out there is only as important as it is useful,” said Alex Camarda, director of public policy and advocacy at the good-government group Citizens Union. “We’ll see how different elected officials use this to make agencies accountable and make operations more efficient.”

The open data law requires agencies to eventually release all of their public data, which includes information not currently housed on agency websites but subject to Freedom of Information Law requests — information that could provide greater insight into city operations.

For example, Patrick Markee, senior policy analyst for the Coalition for the Homeless, would like to see the Department of Homeless Services release its historical data on the city shelter population, which is not available to most members of the public. “We’ve managed to obtain that data over time,” said Markee. “But that’s literally because we have staff and interns here sitting with paper reports from the ’80s, hand-entering the data. That’s a workaround solution.”

New Yorkers will have to wait until September to find out what datasets the city will be releasing from behind closed doors. That is when city agencies must file an inventory of their public datasets alongside a plan for the release of the information. Agencies have until 2018 to release all of their data. However, even this deadline — six years after the bill’s passage — can be missed as long as an agency explains why a particular dataset cannot be released. In the meantime, members of the public may nominate datasets for release.

It is unclear what consequences agencies will face if they are not compliant with Local Law 11. “Ideally, someone would step up and be the policeman,” said Dominic Mauro, staff attorney for the good-government group Reinvent Albany.

Councilmember Gale Brewer, who spent years shepherding the bill through the City Council and will leave office next year because of term limits, says she’s confident that agencies shirking their responsibilities will not go unnoticed. “You can’t fine agencies,” she said. “But you can have public pressure.”

Advocates say that continued implementation of the law could depend on who succeeds Mayor Michael Bloomberg at the end of this year. “It’s critically important because [the mayor] oversees the agencies,” said Camarda.

Mauro says that a new administration will have to get on board, because the information tracked by city agencies ultimately belongs to the public. “Taxpayers fund the government and the government collects the data,” he said. “It’s our data. We don’t pay taxes so we can not know what goes on.”