How do you spot suspicious government purchases? With roughly 1.2 million tenders awarded in Chile since 2014, we can’t look at each of them individually. But with the help of open contracting, we can automate the monitoring process somewhat and use analytics to tell us which purchases are likely to violate the rules. That’s why we, at the public finance NGO Observatorio Fiscal, have been working in collaboration with the government’s central purchasing body, ChileCompra, to develop a tool for detecting risky procurement.
This “red flags” tool will analyze official public procurement data for the last five years to identify potential indicators of irregular activity in public tenders, such as exceptionally short tendering periods, low participation, or changes to tender specifications. We drew on international best practice for red flag modeling and adapted it to the Chilean context. But many of the indicators we had originally compiled proved impossible to calculate because of gaps or quality issues in the source data. Our plan is to apply the model to tenders in real time and expand our work to assess other modes of public procurement. We also hope to add data about company owners to examine connections between buyers and sellers, which can be useful for detecting possible corruption and collusion.
As we set out this week to learn from experiences across Latin America at our seminar in Santiago, here’s how we approached corruption risk analysis in Chile.
The rise of red flag analytics
In Chile, annual spending on public procurement is around CLP 4.3 trillion (USD 11.5 billion), or roughly 4% of GDP. Procurement plays a significant role in guiding economic development and delivering high-quality public services, but the risks associated with these processes mean they must be transparent and well-managed.
To promote integrity, transparency, efficiency, and competition in procurement, we have been working with ChileCompra to develop a “red flags” model that identifies risky tendering behavior for official review. Red flag risk modeling is one of several data-driven practices for managing procurement risk that have emerged internationally as electronic procurement platforms have become more popular and contracting data more readily available.
“Red flags” are indicators of elevated procurement risk at any stage of the procurement cycle – though they alone do not establish impropriety. Widely used red flags include unusually short tendering periods, relatively few bid submissions, unreasonable bid bond requirements, and wide gaps between estimated and awarded contract amounts. (For an overview, see the Open Contracting Partnership guide Red Flags for Integrity.)
Our model references Chilean procurement data against a set of observable, collectible, and locally relevant risk variables. Crucially, to ensure the findings are independent and non-partisan, we at Observatorio are responsible for the model’s design and oversight, rather than the government.
Developing the risk model
We began with a thorough review of the legislation and regulations governing public tenders in Chile. Next, we studied the international literature on red flag modeling, compiling and then narrowing down a list of potential indicators from sources such as the EU, Open Contracting Partnership, the OECD, and Transparency International. We then assigned each variable to one of four stages in the procurement process: planning, bidding, evaluation, and award. In line with European Commission guidelines, our variables have the following characteristics:
- Objective: based on factual data, not stakeholders’ perceptions, judgments or self-reported experiences.
- De facto: derived from actual procurement behavior and events rather than legal prescriptions or expectations.
- Micro-level: defined at the transactional level between purchasers and suppliers, but can be aggregated at higher levels (e.g., by region, by purchaser).
- Comprehensive: adequately capture procurement behavior across a diverse set of purchasing and selling entities.
- Time-series: measured and comparable over time.
We chose to include certain variables specific to the Chilean context, rather than strictly following the European Commission guidelines suggesting that red flag indicators be internationally comparable. For instance, our model examines whether procuring entities estimate their tender values at, or just below, Chilean legal thresholds above which new or stricter administrative requirements take effect.
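To make the threshold indicator concrete, here is a minimal sketch of how such a flag could be computed. It is not our production code: the threshold values, the 5% margin, the field names, and the function `near_threshold` are all assumptions chosen for illustration, and the real legal thresholds would have to be taken from the regulations.

```python
# Hypothetical thresholds, expressed in UTM. Placeholder values only;
# the actual legal thresholds must come from the procurement regulations.
THRESHOLDS_UTM = [100, 1000, 5000]


def near_threshold(estimated_value_utm, thresholds=THRESHOLDS_UTM, margin=0.05):
    """Return True if the estimate sits at, or within `margin` below, a threshold."""
    for threshold in thresholds:
        lower_bound = threshold * (1 - margin)
        if lower_bound <= estimated_value_utm <= threshold:
            return True
    return False


# Illustrative tenders (made-up identifiers and amounts).
tenders = [
    {"id": "T-001", "estimated_value_utm": 998},
    {"id": "T-002", "estimated_value_utm": 640},
    {"id": "T-003", "estimated_value_utm": 1000},
    {"id": "T-004", "estimated_value_utm": 1001},
]

flags = {t["id"]: near_threshold(t["estimated_value_utm"]) for t in tenders}
print(flags)
```

A single estimate just under a threshold proves nothing on its own. The flag becomes informative when a purchaser's estimates cluster there far more often than those of comparable purchasers.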
With our variables defined, we then extracted the necessary data from ChileCompra’s relational database and calculated our indicators for each public tender dating back to 2014 – a total of roughly 1.2 million. Most of our variables are binary (i.e., whether or not a tender attribute, such as a price, a date, or a duration, crosses a reasonable threshold), but some were calculated as counts, percentages, or ratios.
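As a rough sketch of what binary, count, and ratio indicators can look like for a single tender, consider the following. The field names, the cut-offs `MIN_TENDER_DAYS` and `MIN_BIDDERS`, and the function `compute_indicators` are hypothetical and do not reflect ChileCompra's actual schema or the thresholds used in our model.

```python
from datetime import date

# Hypothetical cut-offs, chosen only for illustration.
MIN_TENDER_DAYS = 10
MIN_BIDDERS = 2


def compute_indicators(tender):
    """Derive binary, count, and ratio indicators for one tender record."""
    days_open = (tender["closing_date"] - tender["publication_date"]).days
    n_bids = len(tender["bids"])
    estimated = tender.get("estimated_amount")
    # Many fields are unpopulated in practice, so guard against missing
    # or zero estimates instead of dividing blindly.
    if estimated:
        ratio = round(tender["awarded_amount"] / estimated, 2)
    else:
        ratio = None
    return {
        "short_period": int(days_open < MIN_TENDER_DAYS),  # binary
        "low_participation": int(n_bids < MIN_BIDDERS),    # binary
        "n_bids": n_bids,                                  # count
        "award_to_estimate_ratio": ratio,                  # ratio
    }


# Illustrative record (made-up values).
example = {
    "publication_date": date(2019, 3, 1),
    "closing_date": date(2019, 3, 6),
    "bids": ["Supplier A"],
    "estimated_amount": 1_000_000,
    "awarded_amount": 1_200_000,
}

indicators = compute_indicators(example)
print(indicators)
```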
Once we had extracted, calculated, and loaded the red flag variables into a new database, we normalized the data to feed it into our model. Specifically, we used four instances of a k-means algorithm – one for each stage of the procurement process – to cluster the tenders into five distinct risk groups. We then refined and interpreted the model taking into account factors such as the size and entropy of our final clusters.
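The normalize-then-cluster step can be sketched as follows. This is a simplified, standard-library illustration rather than our actual pipeline: it assumes min-max normalization, a plain Lloyd-style k-means, and entropy measured over the cluster-size distribution, and it uses a tiny made-up data set with two clusters instead of five. The function names are our own inventions for this example.

```python
import math
import random
from collections import Counter


def min_max_normalize(rows):
    """Rescale every column to [0, 1] so no indicator dominates the distance."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means: assign points to the nearest centroid, then recompute."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def size_entropy(labels):
    """Shannon entropy (in bits) of the cluster-size distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


# Made-up indicators for one procurement stage: [short_period, n_bids].
rows = [[0, 5], [0, 6], [1, 1], [1, 1], [0, 5], [1, 2]]
points = min_max_normalize(rows)
labels, centroids = kmeans(points, k=2)
entropy = size_entropy(labels)
print(labels, entropy)
```

In a setup like ours, the same routine would be run once per procurement stage with five clusters. Very low entropy signals that nearly all tenders fell into one group, which is one hint that the clustering needs refining.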
Challenges and next steps
Data quality and availability were the main challenges we faced throughout the project. Regrettably, many of our original indicators proved impossible to calculate once we began digging through the data and bumping into limitations. For example, changes made to tender specifications or amounts – often an indication of misconduct – are logged either unclearly or not at all in the database. Monetary amounts are reported in different currencies without their equivalents in either of Chile’s widely used currency units (Unidad de Fomento and Unidad Tributaria Mensual). Many data fields are simply not populated. Compounding all of these issues, many tables in the database lack an accompanying data dictionary. Idealistic as it may be to expect a spotless database to emerge from hundreds of different users inputting data, there is certainly room for improvement here.
Looking forward, we plan to apply the risk model to real-time procurement data, flagging risky tendering activity based on the model’s understanding of past procurement behavior. Eventually, we hope to develop comparable risk models to move beyond public tenders and assess Chile’s other modes of public procurement: private tenders, direct awards, and framework agreements.
We also hope to integrate beneficial ownership data into the risk model to increase transparency about the natural persons behind bidding entities. Among other things, this would shed light on the connections between buyers and sellers, as well as between bidders (e.g., the submission of multiple bids by distinct, but commonly owned, legal entities).
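Since this work is still ahead of us, the following is only a sketch of how commonly owned bidders might be detected once ownership data is available. The company names, owners, tender identifiers, and the function `commonly_owned_pairs` are all hypothetical.

```python
from itertools import combinations

# Hypothetical beneficial ownership register: company -> set of natural persons.
owners_by_company = {
    "Company A": {"Person 1", "Person 2"},
    "Company B": {"Person 2"},
    "Company C": {"Person 3"},
}

# Hypothetical bidders per tender.
bidders_by_tender = {
    "T-001": ["Company A", "Company B", "Company C"],
    "T-002": ["Company A", "Company C"],
}


def commonly_owned_pairs(bidders, owners):
    """Return pairs of bidders in one tender that share at least one owner."""
    pairs = []
    for a, b in combinations(sorted(bidders), 2):
        shared = owners.get(a, set()) & owners.get(b, set())
        if shared:
            pairs.append((a, b, sorted(shared)))
    return pairs


flagged = {
    tender_id: commonly_owned_pairs(bidders, owners_by_company)
    for tender_id, bidders in bidders_by_tender.items()
}
print(flagged)
```

As with the other red flags, a shared owner is a reason for review rather than proof of collusion.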
As our model begins to generate findings in real time, effectively communicating these results to the government will be as important as the results themselves. So we plan to build a coalition of civil society partners to periodically review the model’s findings and, jointly, present them to government stakeholders. And, to continuously improve the model, we would welcome feedback on the quality and relevance of our results.