What to do with fields that don’t map*? (*At first)
This post in our technical series explains how to deal with additional information in your data when mapping to the Open Contracting Data Standard.
When mapping from existing datasets to the OCDS, you might find that there is additional information in the source dataset that does not have a corresponding field in the core OCDS schema. In this post, I outline the steps we use to think about modelling this additional information.
- First, start from the user perspective.
- Then, map to existing fields, titles and descriptions.
- Next, explore existing objects and then extensions.
- And finally, develop additional extensions when needed.
Start from the user perspective
I always start with a thought experiment, in which I consider how any additional information might be accessed and understood by three fictional users who only know about the core OCDS standard:
- (1) A user who is using the flattened spreadsheet serialization of OCDS to explore the dataset, and is looking to carry out basic filtering or pivot reports on a dataset;
- (2) A user who is exploring a contracting process using OCDS Show (our demonstrator visualization of OCDS releases);
- (3) A user analyzing data from many different publishers in statistical software, running a set of prewritten algorithms to look for red flags.
In each case I have to think about whether the additional information affects how these users should interpret and understand each contracting process, or whether it is ‘optional, extra’ information. From the user perspective, ‘optional, extra’ information is data that adds depth and detail to an understanding of a process, and that might be important for understanding a specific contracting process or type of procedure, but that can be ignored without leading to an entirely incorrect understanding of the process.
This is particularly important to consider as one key use case for OCDS is interoperability. It should be possible to gain a reasonable understanding of even extended datasets using tools that only understand the core schema, and in line with the conformance statement, extended data should not change the meaning of data represented using the core schema.
After considering the user who is unaware of this additional data, I then think about the users who really care about the additional fields. Who are they? What analysis would they want to do with the additional data? Do they need particularly structured data, or just descriptive fields? Do they need to combine this additional data from multiple publishers, or is this information specific to a particular publisher?
Answering these questions helps to focus attention on whether the later steps will need full OCDS extensions, or might be handled by mapping data to an existing object.
The last user to consider is the publisher themselves. They may be concerned about how the additional data is represented. It may be a legal requirement that this data is published, or it may be that the data is relevant to an internal business process that they want to make sure is transparent.
Data publishers are often specialist data users too, and they may be concerned with how to use additional fields in their own business analysis of their published data. The challenge here is to work out whether the additional fields are only useful to the publisher (in which case, they might be collected together in a custom object, described with a local extension), or whether they have informational or analytical value for other specialist users.
A publisher may have two additional fields in their source data: ‘InternalSupplierSystem’ and ‘SupplierSize’.
Here, ‘InternalSupplierSystem’ is a legacy field which describes whether the supplier’s registration was processed in the ‘old’ or ‘new’ supplier registration system. It is useful to the publisher, but has very little value to other users, as the publisher is outputting public organization identifiers for all suppliers. The publisher could reasonably omit ‘InternalSupplierSystem’ from their published data, or, if it is useful for them to have this information in published OCDS data to support their own analysis of data quality (for example), they might include it as ‘x_InternalSupplierSystem’.
By contrast, for a ‘SupplierSize’ field we can envisage use cases that would allow specialist data users to analyze the distribution of contracts between large and small suppliers. This is a good candidate for adding as a community extension of [partyDetails].
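The two cases above can be sketched in code. This is a minimal illustration, not a definitive mapping: the source column names `SupplierId` and `SupplierName`, the `supplierSize` property name, and its placement under a `details` object are assumptions made for this example rather than part of any published extension.

```python
def map_supplier(source_row):
    """Map one source supplier record to a party-like dict (illustrative only)."""
    party = {
        "id": source_row["SupplierId"],      # hypothetical source column
        "name": source_row["SupplierName"],  # hypothetical source column
        "roles": ["supplier"],
    }

    # Publisher-only field: kept under an x_ prefix so other users can
    # recognise it as local information and safely ignore it.
    if "InternalSupplierSystem" in source_row:
        party["x_InternalSupplierSystem"] = source_row["InternalSupplierSystem"]

    # Field with wider analytical value: a candidate for a community
    # extension. The name 'supplierSize' and the 'details' location are
    # assumptions for this sketch.
    if "SupplierSize" in source_row:
        party.setdefault("details", {})["supplierSize"] = source_row["SupplierSize"]

    return party


example = map_supplier({
    "SupplierId": "S-1",
    "SupplierName": "Example Ltd",
    "InternalSupplierSystem": "old",
    "SupplierSize": "small",
})
```

A tool that only knows the core fields can still read `id`, `name` and `roles` from the result and skip the rest.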
Map to existing fields, titles and descriptions
A good strategy to make sure generalist users can still understand a contracting process with additional fields is to consider when the additional information can be included within titles, descriptions and other existing fields.
A publisher may have a field that classifies a procedure as a ‘competitive dialogue’ and may have additional fields used to describe a ‘competitive dialogue’ contracting process.
If it is important to flag to users that they should exercise caution when analyzing competitive dialogues alongside other types of contracting processes, the publisher could apply a template to add ‘Competitive dialogue:’ as a prefix onto the tender.title field.
The additional dates involved in a competitive dialogue could be modelled using the tender.milestones array with clear text descriptions of each milestone, generated programmatically, allowing tools that only understand core OCDS fields to display reasonable information about the workflow of this procedure.
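As a rough sketch of the templating described above, the function below prefixes the tender title and generates milestones with plain-text descriptions. The function name, the shape of the `extra_dates` input, the stage labels and the wording of the generated descriptions are all assumptions for illustration. Only `title`, `milestones`, and the milestone properties `id`, `title`, `description` and `dueDate` are intended to correspond to core OCDS fields.

```python
PREFIX = "Competitive dialogue: "


def apply_competitive_dialogue_template(tender, extra_dates):
    """Return a copy of `tender` with a prefixed title and generated milestones.

    `extra_dates` is a hypothetical dict of {stage label: ISO 8601 date string}.
    """
    tender = dict(tender)  # shallow copy, so the caller's dict is not modified

    # Flag the procedure type in the title, without prefixing twice.
    title = tender.get("title", "")
    if not title.startswith(PREFIX):
        tender["title"] = PREFIX + title

    # Add one milestone per additional date, in date order, each with a
    # human-readable description that core-only tools can display.
    milestones = list(tender.get("milestones", []))
    ordered = sorted(extra_dates.items(), key=lambda pair: pair[1])
    for number, (label, due) in enumerate(ordered, start=len(milestones) + 1):
        milestones.append({
            "id": str(number),
            "title": label,
            "description": f"Competitive dialogue stage: {label}, due {due}.",
            "dueDate": due,
        })
    tender["milestones"] = milestones
    return tender


example = apply_competitive_dialogue_template(
    {"title": "Hospital construction"},
    {
        "Final tenders due": "2017-06-01T00:00:00Z",
        "Dialogue opens": "2017-03-01T00:00:00Z",
    },
)
```

Sorting on the date strings works here only because every date is assumed to use the same ISO 8601 format and time zone.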
However, this kind of simplified ‘human centered’ mapping to existing fields does not preclude also mapping to more structured fields and extensions.
Explore existing objects and then extensions
Sometimes it is possible to model additional data using existing OCDS objects in ways that make sense for both specialist and generalist users, without affecting data conformance (i.e. without changing the meaning of those existing fields).
If a source system has fields for ‘adjudication panel date’ and ‘adjudication panel members’, then the existing tender.milestones fields can be used to represent the adjudication panel date, and the parties array (introduced in 1.1) can be used to include a list of panel members. (Note: we are ‘between terminology’ with parties, having moved to a term that can describe both individuals and organizations, although the underlying schema properties are still called ‘Organization’ and ‘OrganizationReference’.)
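The adjudication panel example might look something like the sketch below. The identifiers, the milestone wording and the role code `adjudicationPanelMember` are hypothetical values chosen for this illustration, so check the relevant codelists before using anything similar in published data.

```python
def add_adjudication_panel(release, panel_date, member_names):
    """Add an adjudication panel milestone and panel members to a release dict."""
    # The panel date becomes a milestone on the tender.
    milestones = release.setdefault("tender", {}).setdefault("milestones", [])
    milestones.append({
        "id": "adjudication-panel",  # hypothetical identifier
        "title": "Adjudication panel",
        "description": "Date on which the adjudication panel meets.",
        "dueDate": panel_date,
    })

    # Each panel member becomes an entry in the parties array.
    parties = release.setdefault("parties", [])
    for number, name in enumerate(member_names, start=1):
        parties.append({
            "id": f"panel-member-{number}",        # hypothetical identifier
            "name": name,
            "roles": ["adjudicationPanelMember"],  # hypothetical role code
        })
    return release


example = add_adjudication_panel(
    {"ocid": "ocds-example-0001"},  # placeholder ocid, not a registered prefix
    "2017-05-10T00:00:00Z",
    ["A. Member", "B. Member"],
)
```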
When existing OCDS fields are not sufficient, the next step is to look for existing OCDS extensions in the extensions registry. Whilst a publisher cannot assume that basic OCDS tools will be aware of extended fields, the extensions provide common agreed ways to model additional fields.
If a source system has additional fields to describe the bidders on a contract, then the bid extension can be used to record structured information.
Extensions may add single fields to an existing object, or they may introduce entire new sections and objects. An extension provides a definition of each field and object it introduces, as well as relevant codelists. The same conformance rules as for the standard apply.
Develop additional extensions when needed
When the additional data to be published cannot be represented using existing core fields, or existing extensions, then it may be time to propose a new extension to the OCDS.
To work out whether to focus on a local extension (the simple documentation of additional fields, likely to be only used by one or two publishers), or a community extension (created through dialogue amongst multiple publishers and users to agree and document a common data model for additional fields), it is important to ask:
- Are there other publishers who could have information like this to publish?
- Will their additional information be identical, or only similar?
To some extent this is a matter of perspective, and how abstractly the question is asked. Practically, there is a balance to be struck here between abstract modelling that might be useful to other publishers and users in future, and context-specific modelling that is quicker to develop and deploy. Deciding which path to head down may depend on questions of timing, and how easy it might be to update (or complement) the mapping in future from a specific to a more general solution.
In Moldova, a field exists to link contract items with their budget-funding source using an IBAN. This appears to be a Moldova-specific practice, and could be modelled using a local extension and a single sourceIBAN property on item.
Alternatively, if there is a sense that a link between individual items and the source of funding for those items is something that will exist in a number of countries, then a more general model could be used, with a fundingSource object attached to item. This would follow the OCDS pattern of having a scheme and id, allowing the funding source to be identified using a range of identification schemes, including IBAN and others.
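To make the two options concrete, here is a small sketch of both shapes, plus a helper that moves from the specific form to the general one. The helper name, the scheme code `IBAN` and the placeholder account value are assumptions for this example; neither property is defined in the core schema.

```python
# Local extension: a single property on the item.
local_item = {
    "id": "1",
    "description": "Office chairs",
    "sourceIBAN": "MD00XX000000000000000000",  # placeholder, not a real IBAN
}


def local_to_general(item):
    """Convert an item using sourceIBAN into the scheme/id fundingSource shape."""
    item = dict(item)  # shallow copy, so the original item is left unchanged
    iban = item.pop("sourceIBAN", None)
    if iban is not None:
        # General model: scheme says how the funding source is identified,
        # id carries the identifier itself.
        item["fundingSource"] = {"scheme": "IBAN", "id": iban}
    return item


general_item = local_to_general(local_item)
```

Because the conversion is mechanical, starting with the local form does not rule out moving to the general model later.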
Once developed, extensions can be logged in the extensions registry for other publishers to discover and work with.
Next week, I’ll share in detail how to create and manage extensions. In the meantime, get in touch if you have any questions!