Go Code Colorado organizers are pleased to offer additional data resources for Go Code Colorado participants. There are many data resources available that can be used with Colorado state public data to create useful business insights and tools. We encourage you to attend a Go Code Colorado event and meet the Go Code Colorado data team to learn more about these resources and how they could help you.
U.S. Patent and Trademark Office
Intellectual property data is often an early indicator of meaningful research and development (R&D). The U.S. Patent and Trademark Office (USPTO) holds a mountain of scientific knowledge in the millions of patent applications it receives. This is no accident; it’s part of the fundamental bargain in the Progress Clause of our Constitution. In exchange for disclosing the invention to the public, including what it is and how it works, the inventor gets exclusive rights for a limited period of time. One of the most innovative products to date from the USPTO is its Open Data Portal, which includes an API catalog, USPTO data visualizations and the USPTO Developer Page. All are designed to offer new and sustainable ways to expose data and to provide a platform for getting data “faster and easier.” The goal is to combine patent data with other data, such as economic data, and to report on filing rates, inventorship, assignees and filing locations in order to illuminate compelling new trend lines and insights. This platform lets people dive into what may well be the world’s largest repository of data on innovation and R&D technology trends. By harnessing the power of patent data, the USPTO hopes to better arm those looking to innovate and create by giving them better access to what’s come before, including what has worked commercially and what hasn’t, to empower a more innovative society.
National Renewable Energy Laboratory Public APIs
The National Renewable Energy Laboratory (NREL) provides energy data via an API at https://developer.nrel.gov/. Anyone may access and use these web services by signing up for a free API key. Information provided includes data related to energy efficiency and the use of renewable energy technologies in residential and commercial buildings, services associated with the costs, generation, transmission, delivery and monitoring of electricity, and access to NREL wind datasets and models. Users may also access the complete alternative transportation technology datasets, which include station locations and transportation laws and incentives, and a number of solar resource datasets and models, including real-time installation data. NREL’s State and Local Energy Data (SLED) tool gives comprehensive energy use and activity data based on a city name or ZIP code. SLED output is derived from a number of sources listed on the Data Sources tab of the tool. These sources provide data that includes utility rates and averages, fuel sources, consumption trends, state and local policies, and demographic and housing data.
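As a minimal sketch of what a request to these web services might look like, the Python snippet below builds a URL for NREL's alternative fuel stations service, limited to Colorado. The endpoint path and the `state`, `fuel_type` and `limit` parameters reflect our reading of the NREL developer documentation and should be confirmed there; `YOUR_API_KEY` is a placeholder for the key you receive at signup, and the function names are our own.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for the alternative fuel stations service; confirm it
# against the documentation at https://developer.nrel.gov/ before use.
BASE_URL = "https://developer.nrel.gov/api/alt-fuel-stations/v1.json"


def build_station_url(api_key, state="CO", fuel_type="ELEC", limit=10):
    """Return a request URL for stations in one state (parameters assumed)."""
    params = {
        "api_key": api_key,
        "state": state,
        "fuel_type": fuel_type,
        "limit": limit,
    }
    return BASE_URL + "?" + urlencode(params)


def fetch_stations(api_key, **filters):
    """Call the service and return the decoded JSON response."""
    with urlopen(build_station_url(api_key, **filters)) as response:
        return json.load(response)


if __name__ == "__main__":
    # Prints the URL only; swap in a real key and call fetch_stations()
    # to make an actual request.
    print(build_station_url("YOUR_API_KEY"))
```

Keeping URL construction separate from the network call makes it easy to inspect a request before spending any of your key's rate limit.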
Colorado Data Engine and the Shift Research Lab
Data available on the Colorado Data Engine is focused on providing neighborhood-scale public data in a standardized, geolocated format. The available datasets are focused on promoting research and civil efforts toward community development and health—the health of the citizens as well as the vibrancy of their environment. Datasets include school measures, health indicators, food deserts and more. Public data for the public good. The Shift Research Lab offers neighborhood-level data and analysis via online platforms, performs objective research to support community change initiatives, and provides technical assistance that helps organizations build their capacity to use data.
DRCOG Regional Data Catalog
The Denver Regional Council of Governments (DRCOG) has made the Regional Data Catalog available to house a variety of datasets curated for the needs of planners and economic developers in the Denver Metro Area. It focuses on topics in demographics, land use, employment, infrastructure and transportation. Although Go Code Colorado aims to highlight statewide data wherever possible, so many valuable datasets are being built for this catalog that it is worth highlighting. You can also look for data resources from other collaborating agencies, such as the Northwest Colorado Council of Governments (NWCCOG) or any of the other 14 organizations in the Colorado Association of Organizations.
Competing teams may use datasets in addition to those published on the Colorado Information Marketplace. Many are available at OpenColorado, which provides access to crime, financial, health, population, transportation and GIS data from Arvada, Boulder, Denver, Fort Collins, Denver Regional Council of Governments (DRCOG) and other jurisdictions. OpenColorado provides a data sharing platform that makes public data available and accessible to all Colorado constituents by allowing any municipality, county, government agency, nonprofit or individual to upload and share open data with the public.
RTD Developer Resources
Competing teams interested in using transit data in their apps can utilize the available data resources from the Regional Transportation District. The RTD schedule data is available in General Transit Feed Specification (GTFS). RTD provides real-time data feeds for arrival predictions and vehicle locations in GTFS realtime format. RTD GIS data is also available for download. Developers must accept an agreement before downloading GIS data.
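To illustrate how schedule data in GTFS can be consumed, here is a small Python sketch that reads stop locations from a GTFS zip file. It relies only on the `stops.txt` file and the `stop_id`, `stop_name`, `stop_lat` and `stop_lon` columns defined by the GTFS specification; the function name and any file name you pass in (for example a downloaded `google_transit.zip`) are our own assumptions, not RTD specifics.

```python
import csv
import io
import zipfile


def read_stops(gtfs_zip):
    """Return a list of stop dicts from the stops.txt file in a GTFS zip.

    `gtfs_zip` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(gtfs_zip) as feed:
        with feed.open("stops.txt") as raw:
            # utf-8-sig drops the byte-order mark that some feeds include.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return [
                {
                    "stop_id": row["stop_id"],
                    "stop_name": row["stop_name"],
                    "lat": float(row["stop_lat"]),
                    "lon": float(row["stop_lon"]),
                }
                for row in csv.DictReader(text)
            ]
```

The same pattern applies to the other text files in a feed, such as `routes.txt` or `trips.txt`. The real-time feeds use a different, binary format and are not covered by this sketch.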
The Go Code Colorado Data team would like to take a few minutes to tell you about data.
Aside from being free to use, public data is different from private data in two major ways. The first is that public data does not include any personally identifiable information, while private data sometimes does. The second is that public data is always secondary data for its consumers: the government collects it as primary data for the purposes of government operations, and it becomes secondary data when data consumers repurpose it. This means that, quite often, data consumers are limited in their capacity to request changes to the data to meet their needs. In some scenarios, there is an open feedback loop (much like the one created by Go Code Colorado!), where data consumers can request enhancements to the formatting or content of the data. The primary difference between public data and open data is that public data is derived from the government, and open data can have a variety of sources.
Use of the word variety to refer to how the government manages its data is perhaps not ambiguous enough. State to state and city to city there are differences, and within states there is a wide variety of organizational structures. That being said, there are some indicators that we can look to as citizens that give us a clue about how data is collected about us. Think about where you go when you need to renew your driver’s license, versus where you pay your property taxes. The graphic below shows examples to help you see the pattern. In some cases, the state will aggregate some of the local data to produce a data product designed to measure patterns across the municipal jurisdictions that collect the data. Quite often, for these aggregate datasets, the data at the local level retains a greater level of detail. OpenStreetMap represents publicly available data that is created by public contribution. It is essentially crowdsourced data, and like all crowdsourced data, it has issues with versioning and currency.
There are three key factors for valuable data: Is the data accurate? Is it up-to-date? Can it be combined easily with other data for analysis? The following content explains how the tag word “gocode” on the Colorado Information Marketplace (CIM) means quality.
Data curation is the process of transforming data from its original build and native management into a series of datasets that are formatted to the needs of the data consumer.
As a reminder, all datasets on the Colorado Information Marketplace can be accessed through the portal’s API.
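As a rough sketch of API access, the Python snippet below builds a query URL for a single dataset. It assumes the CIM is served from data.colorado.gov and exposes Socrata-style (SODA) `resource` endpoints with `$limit` and `$where` parameters; confirm this on the API page of the dataset you want. The dataset ID `abcd-1234` is a made-up placeholder, and the function names are our own.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the portal lives at this host and follows the SODA
# "/resource/<dataset id>.json" convention. Verify per dataset.
PORTAL = "https://data.colorado.gov"


def build_query_url(dataset_id, where=None, limit=100):
    """Build a JSON query URL for one dataset."""
    params = {"$limit": limit}
    if where:
        params["$where"] = where
    return f"{PORTAL}/resource/{dataset_id}.json?{urlencode(params)}"


def fetch_rows(dataset_id, **query):
    """Run the query and return the decoded list of row dicts."""
    with urlopen(build_query_url(dataset_id, **query)) as response:
        return json.load(response)


if __name__ == "__main__":
    # "abcd-1234" is a hypothetical dataset ID used only for illustration.
    print(build_query_url("abcd-1234", limit=5))
```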
Signs Your Data has been Curated
Best practices for open data programs include:
- Thorough documentation on the technical data publishing process
- Published open data strategic plan and progress toward publishing goals
- Published documentation on the organizational steps in the data provider process
- Good documentation on what metadata is and why it’s valuable
- Meaningful metadata
- Metrics and dashboards showing various data update schedules
- Use of standards and naming conventions
- Quality keywords and enhanced searching
- Successful apps built off the data (as in NYC, SF and Chicago)
- Control over catalog populating, and assuring authoritative sources
- Smart regulation of public publishing and crowdsourced data
In addition to these administrative and applied signs that the open data portal is populated with data of value, you can also look at the formatting of the data itself to get a feel for how much cleaning and curating has gone into the publishing.
Distinct effort has gone into ensuring that the data is presented at the appropriate size and scale. This means that tables are built to present a meaningful collection of fields, so as to minimize the number of different API calls required to convey a story or produce an app with the data. Different tables are selected and combined to produce a final product that is based on a breakdown of the database by its internal themes and optimized for interpretation by the user. This size is determined in the publishing process by a data architect who knows how to balance the breakdown of tables between groups that encourage theme discoverability and groups that are so segmented that they become inefficient.
The complexity of a database determines the number of tables that are produced and published for public consumption. In some cases, information is duplicated on different tables in order to provide the necessary context for that table. A good example of this is two datasets, each having a field related to the County FIPS Code. Both would have this column in addition to a descriptive field, like County Name. In the best scenarios, similar datasets have columns with common attributes or a shared unique ID for each record, which means these tables can be joined.
The Labor Market Information system (LMI) of the Colorado Department of Labor and Employment (CDLE) is a large database of over 70 tables. The public portion of the dataset has been through the extract, transform and load (ETL) process from the LMI database and published on the CIM as five separate tables:
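To make the idea of joining concrete, here is a minimal Python sketch that combines rows from two tables on a shared county FIPS column. The column name `county_fips` and the sample values are invented for illustration and do not come from any published dataset; the sketch also assumes the second table has at most one row per county.

```python
def join_on_fips(left_rows, right_rows, key="county_fips"):
    """Inner-join two lists of dicts on a shared county FIPS column."""
    # Index the right table by FIPS code (assumes one row per county).
    right_by_fips = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = right_by_fips.get(row[key])
        if match is not None:
            # Merge both records; on duplicate field names the value
            # from the left table is kept.
            joined.append({**match, **row})
    return joined


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    employment = [{"county_fips": "08031", "county_name": "Denver", "jobs": 100}]
    income = [{"county_fips": "08031", "county_name": "Denver", "income": 50000}]
    print(join_on_fips(employment, income))
```

Storing FIPS codes as strings rather than integers preserves the leading zero that Colorado county codes carry.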
- Employment by Industry
- Income Data by County
- Employment and Unemployment Estimates
- Occupational Employment Statistics
- Current Employment Statistics
Metadata and Description are Clear and Succinct
All metadata has been created by interviewing the data providers and data stewards in order to characterize the data as thoroughly as possible in relation to the needs of the data user. It is presented on this website in a way that is designed to enhance the creative process of designing an app for Go Code Colorado.
Standards for File Naming Format
Go Code Colorado brand datasets employ a standard naming convention in addition to a standard format for the “brief description” that immediately follows the dataset title in the Colorado Information Marketplace catalog.
Title formatting for a good first impression:
[3 word desc] + [data type] + [geography]
- Airport Locations for Colorado
- Business Entities for Colorado
- Bike Lane Routes for Denver
Description formatting for improved catalog browsing:
- Maximum 30 words
- First 15 words show in the CIM catalog
- What’s in it
- Who is the data provider
- What is the time frame
- How is it maintained
- How was it created
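The length rules above can be checked mechanically. The following Python sketch counts the words in a draft description and shows the portion that would appear in the catalog, assuming that "first 15" refers to the first 15 words; the function name and return keys are our own and are not part of any CIM tooling.

```python
def check_description(text, max_words=30, preview_words=15):
    """Report the word count and the catalog preview for a description."""
    words = text.split()
    return {
        "word_count": len(words),
        "within_limit": len(words) <= max_words,
        # The part of the description assumed to show in the catalog.
        "catalog_preview": " ".join(words[:preview_words]),
    }


if __name__ == "__main__":
    # Hypothetical draft description, for illustration only.
    draft = "Point locations of airports in Colorado, updated annually."
    print(check_description(draft))
```

Because only the preview shows in the catalog listing, it helps to put what the dataset contains in the opening words and leave maintenance details for the end.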