Open contracting data: structure and spreadsheets
For the third post in our technical series, we’re looking at approaches to to generate and export OCDS data from flat structures in spreadsheets and CSV files.
The Open Contracting Data Standard (OCDS) was designed to support rich representations in which data can be nested together to capture one-to-many relationships: for example, contacting processes with many associated items, documents, awards and contracts. For this reason, the primary format for the OCDS is JSON, a developer-friendly representation in which data can be nested together to capture one-to-many relationships.
However, there are many times when it is useful to create and analyse contract information in simpler tabular formats — working in everyday desktop tools like Excel, OpenOffice or Google Sheets. For that reason, we’ve been working hard on secondary formats for OCDS data, developing a bi-directional approach to convert between CSV, Excel and OCDS JSON.
Back in May, OCDS developer Ben Webb gave a talk at csv,conf in Berlin about the work we’ve been doing. The detailed technical slides from that talk are here, and we’ve recently updated the documentation for the underlying flatten-tool which powers our approach (as well as flat serialization for a number of other standards). In this blog post we look particularly at how spreadsheet formats for OCDS can be useful for:
- Data output: providing a common pattern to provide CSV or Excel export from OCDS compatible systems.
A common approach for flat data exports can support researchers and analysts to work with data from different sources.
- Data input: providing tried and tested tooling to convert from tabular formats into structured OCDS data. This can support the design of simple data-collection templates.
The flat serialization for OCDS has a column for each field in the JSON, with column headings constructed based on JSON pointer. This serialization can be implemented by developers working on spreadsheet export for their OCDS compatible system.
Additionally, flatten-tool can convert any existing OCDS JSON into the flat serialization. For JSON arrays (one-to-many relationships) an extra sheet is created for each different array, with identifiers of each parent object. This mimics the structure of a relational database.
Flatten-tool can also be used as a command line tool, as a python library and is also built into the OCDS Validator.
When a JSON file is loaded into the validator the user is given an option to convert it to a spreadsheet. For an example, we can submit some OCDS sample data to the validator.
The Validator can also be used to check that implementations of spreadsheet serialization are correct, in the same way that it checks spreadsheets for data input, see below.
Flatten-tool can generate a blank OCDS spreadsheet template based on the OCDS schema. These are also available for download from the OCDS sample-data repository. The template can be filled in and converted into JSON using flatten-tool.
Flatten-tool uses the type information from the OCDS JSON schema to output the correct types for fields in the converted JSON. Flatten-tool tries to produce a readable JSON file, by adding indentation and ignoring cells that are empty in the spreadsheet.
The OCDS Validator uses Flatten-tool to convert spreadsheet files that are uploaded. The Validator retains a mapping of where in a spreadsheet data came from — this means it can validate the JSON against the OCDS schema, and also report where in the original spreadsheet the error occurred.
It’s also possible design custom spreadsheet templates, by adding columns headings in a way that flatten-tool understands. We’ve recently added a Spreadsheet Designer’s Guide to the flatten-tool docs to explain how to do this. There’s a lot of flexibility when designing a template:
- Conversion happens based on the names of column headings, so columns can be removed around and deleted as required.
- Different shapes of templates can produce the same valid OCDS data, because the most convenient spreadsheet representation depends on the particular use case, particularly when modelling one-to-many relationships. For example you might start with the generated template and copy columns into the main sheet where you know it will be a one-to-one relationship.
- Extra data that isn’t in the OCDS standard can be added, and will be included in the converted JSON. It’s also possible to use flatten-tool to generate an extended flat template for an OCDS extension.
The flat serialization of OCDS and associated tooling have had fairly limited real world use up to now, which is starting change, so we will be supporting people with this and dealing with any bugs it raises.
One particular use-case of the flat serialization is for the upcoming Public Private Partnerships profile of OCDS, so we will be looking what changes to our tooling may be needed to support that.
Currently, the heading row of the flat serialization is based on JSON pointer, so always uses the field names that appear in the OCDS JSON. We’re working on an option to use the titles from the schema instead, which would include being able to use local language titles from a translated schema.