Building on groundwork investigating open data supply, we’ve been experimenting with different ways to categorise and structure data.
The super details
Each dataset included the following columns: Original Name, English Name, Formatted Name, Standard Name, Description of Field, and English Translation of Description. These columns create clear links from the original dataset names to the names finalized for ODC. We decided to categorize the dataset information into three linked concepts: Phase – Entity – Group. All field names we found in the datasets are integrated into these three linked concepts, e.g. tender (phase) – solicitation (entity) – tender_status (group). We also added information about the field names in terms of ‘Allowable Values’ and ‘Data Types’ (shown in detail below).
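As a rough sketch, one analysed field under this scheme could be represented as a simple record combining the columns and the three linked concepts; all values here are illustrative placeholders, not taken from a real dataset:

```python
# Hypothetical example of one analysed dataset field, combining the
# spreadsheet columns with the Phase - Entity - Group linked concepts.
field_entry = {
    "original_name": "TenderStatus",          # placeholder original name
    "english_name": "Tender Status",
    "formatted_name": "tender_status",
    "standard_name": "tender_status",
    "description": "Current status of the tender",  # placeholder description
    # the three linked concepts, as in the tender example above
    "phase": "tender",
    "entity": "solicitation",
    "group": "tender_status",
}

print(field_entry["phase"], "-", field_entry["entity"], "-", field_entry["group"])
# tender - solicitation - tender_status
```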
Allowable Values are constraints that restrict free-text input to an exclusive set of parameters. For example, the field name “tender_openness” can only be Open, Selective or Limited; these terms are the allowable values. Data Types represent the format of the data in the following categories:
- Single Select
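A minimal sketch of how an allowable-values check could work, using the tender_openness example above (the lookup structure and function name are illustrative, not part of the standard):

```python
# Map of field names to their allowable values; fields without an entry
# are treated as unconstrained free text. Illustrative structure only.
ALLOWABLE_VALUES = {
    "tender_openness": {"Open", "Selective", "Limited"},
}

def is_allowed(field_name, value):
    """Return True if the value is permitted for this field, or if the
    field has no allowable-values constraint at all."""
    allowed = ALLOWABLE_VALUES.get(field_name)
    return allowed is None or value in allowed

print(is_allowed("tender_openness", "Open"))    # True
print(is_allowed("tender_openness", "Closed"))  # False
```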
In addition to analysing the structure of these datasets, we brainstormed visualizations for dataset comparisons across countries. An example of a potential target would be comparative visualization capabilities across jurisdictions, as rendered in Georgia’s procurement portal.
Contracting Data Comparison: Dataset Descriptions
After the PyCon Sprint, we continued building on our in-depth analysis of datasets. We now have 20 datasets from priority countries, regions and cities: Canada, Chile, EU, Georgia, Korea, Mexico, Moldova, Nepal, UK, UNOPS, NYC (US), San Mateo County (US), Uruguay.
Note: 6 more datasets will be analyzed from the World Bank, Philippines, Colombia and US. Following is a list of the concepts we created to categorize the dataset fields:
- CONTRACT FEATURES
- CONTRACT TRACKING
- AWARD FEATURES
- AWARD TRACKING
- GOODS / SERVICES
- TENDER FEATURES
- TENDER TRACKING
- ADD ON
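During analysis, each dataset field is tagged with exactly one of these concepts. A small sketch of that tagging step, with the concept list as a fixed vocabulary (the helper function is illustrative):

```python
# The concept vocabulary from the list above.
CONCEPTS = [
    "CONTRACT FEATURES", "CONTRACT TRACKING",
    "AWARD FEATURES", "AWARD TRACKING",
    "GOODS / SERVICES",
    "TENDER FEATURES", "TENDER TRACKING",
    "ADD ON",
]

def tag_field(field_name, concept):
    """Attach a concept tag to a field, rejecting unknown concepts."""
    if concept not in CONCEPTS:
        raise ValueError(f"unknown concept: {concept}")
    return {"field": field_name, "concept": concept}

print(tag_field("tender_status", "TENDER TRACKING"))
# {'field': 'tender_status', 'concept': 'TENDER TRACKING'}
```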
Along with a concept for each dataset field, we added a standard name, data type and a description (if available). Throughout the analysis we added notes in the Dataset Description, where we discussed similarities, differences and unique features of each dataset that we included. This analysis helped us better understand the contracting picture by letting us create concepts based on the terminology already used by countries. Reviewing these datasets also helped us determine the focus of the data, and what countries are publishing through their specific portals – both Open Data Portals and E-Procurement Portals.
Contracting Data Comparison: Datamaps
We built a datamap to accompany the in-depth dataset analysis and help communicate the contract modelling.
In the example above, we used Georgia’s Tender Data dataset download. Note that some concepts have no circles, indicating that no information was given on those concepts, i.e. no fields for them were available in the dataset. In this case, AWARD TRACKING, CONTRACT FEATURES, CONTRACT TRACKING and DOCUMENTS are missing. To clarify the concepts, we added a notes section and a list of all the dataset fields below the visualization of the datamap. The notes section includes information particular to that dataset download, changes we have made and the challenges encountered in the process of analysis. We also included a Fields list with the datamap to display all the concepts and to categorize the information based on how many times each concept was used across all 20 datasets.
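The concept-usage count behind the Fields list can be sketched as a simple tally; the per-dataset concept tags below are hypothetical stand-ins for the real analysis:

```python
from collections import Counter

# Hypothetical concept tags for a few datasets, standing in for the
# real per-field analysis of all 20 datasets.
dataset_concepts = {
    "Georgia": {"TENDER FEATURES", "TENDER TRACKING", "AWARD FEATURES"},
    "Canada":  {"CONTRACT FEATURES", "GOODS / SERVICES"},
    "UK":      {"TENDER FEATURES", "CONTRACT FEATURES"},
}

# Count, for each concept, how many datasets use it at least once.
usage = Counter()
for concepts in dataset_concepts.values():
    usage.update(concepts)

for concept, n in sorted(usage.items(), key=lambda kv: -kv[1]):
    print(concept, n)
```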
High level overview
The datasets and datamaps helped us construct a picture of the type of contracting data available, and have added to the conceptualization of the contracting data model. Here, we started building the model from a high-level perspective. The conceptual model builds on the idea of a contracting journey, where each contracting journey has a unique bidding process across the different phases of contracting; e.g. a framework contract might have multiple contract documents although it is one unique contracting journey. The Open Contracting Data Standard will specify:
- Releases of data – when publishers put out notices – from all phases (tender, amendments, etc.)
- Summary contract record – an overview of the current state of the contract, allowing users to see the current view without stepping through the different phases. At the heart of the contract record is the idea that all the released records can be compiled into one by the standard.
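A minimal sketch of that compilation step, assuming a simple "latest release wins" merge (this is an illustration of the idea, not the final standard's merge rules):

```python
# Hypothetical releases from different phases of one contracting journey.
releases = [
    {"phase": "tender",   "tender_status": "Open"},
    {"phase": "award",    "supplier": "ACME Ltd", "tender_status": "Complete"},
    {"phase": "contract", "contract_value": 100000},
]

# Compile releases into one summary record: later releases overwrite
# earlier values for the same field.
record = {}
for release in releases:
    record.update(release)

print(record["tender_status"])  # Complete
print(record["contract_value"])  # 100000
```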
The core components include:
- Contracting features
- Contracting process – e.g., tender information – award process – how many bidders are available (Georgia gives a good example of this)
- Add on information – publishers or users may want to augment the standard from their own systems, e.g. by including their own unique identifiers
Note: the vocabulary section can be changed
- Aspiration for people to share contracting data
- Aspiration for people to share information in a more homogenized way (e.g., document classification would be an ideal set of categories; find ways to deal with other pertinent issues: e.g., IATI – commitment, legal vs non-legally binding in countries)
Our challenge now: this data is only partially linkable. The separation between releases and metadata is key: 80% of people will look at records and 20% will want to investigate contracts. Our use cases should validate these two models.