“These are my contracts, and if you don’t like them… I have others,” reads the Todos Los Contratos website which scrutinizes Mexico’s public contracts, riffing off a joke by the comedian Groucho Marx. “Mexican authorities seem to follow this maxim when awarding contracts and reporting on them,” it continues, before explaining that around 1% of the country’s GDP is wasted due to corruption in procurement, according to conservative estimates.
PODER, the civil society organization behind the platform, is committed to bringing more transparency and accountability to contracting processes in Mexico. To that end, it has built an innovative solution centered on a single database of public contracts, created by aggregating millions of records from three different government sources. Covering almost MXN 30 trillion (US$1.5 trillion) worth of public spending, the database connects to several digital tools that PODER developed to explain corruption and mismanagement in the Mexican procurement system. The team relied on open contracting technology and free software, and published detailed guidelines on how the software works so that journalists, programmers and analysts elsewhere can easily develop their own versions of the tools.
It’s an extraordinary approach, employed by an interdisciplinary team of developers, journalists and researchers, so it’s no surprise that the project recently won a Sigma Award, which recognizes outstanding data journalism from around the world.
PODER wanted to show that “it is possible to measure corruption based on public contracting data,” the team said. “We are starting to see the possibility of one day no longer relying on corruption perceptions surveys.”
Creating the database was a painstaking process that involved cleaning data on four million public contracts awarded by the federal government between 2001 and 2019, and standardizing them according to the Open Contracting Data Standard (OCDS) format.
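To give a sense of what this standardization involves, here is a minimal sketch in Python that cleans one flat contract record and maps it to a simplified OCDS-style release. It is not PODER's importer: the source column names (`codigo_contrato`, `dependencia`, `proveedor`, `importe`, `fecha_firma`, `moneda`), the DD/MM/YYYY date format and the `ocds-xxxxxx` prefix are all assumptions made for illustration, and a real release would carry more fields (awards, tender details, stable party identifiers) than are shown here.

```python
from datetime import datetime

# Placeholder only: real OCDS publishers register their own ocid prefix.
OCID_PREFIX = "ocds-xxxxxx"


def clean_name(raw: str) -> str:
    """Collapse whitespace and upper-case a name so spelling variants match."""
    return " ".join(raw.split()).upper()


def to_iso_date(raw: str) -> str:
    """Convert a DD/MM/YYYY date (an assumed source format) to ISO 8601."""
    parsed = datetime.strptime(raw.strip(), "%d/%m/%Y")
    return parsed.strftime("%Y-%m-%dT00:00:00Z")


def to_ocds_release(row: dict) -> dict:
    """Map one flat source row (hypothetical columns) to an OCDS-style release."""
    buyer = clean_name(row["dependencia"])
    supplier = clean_name(row["proveedor"])
    amount = float(row["importe"].replace(",", ""))  # strip thousands separators
    contract_id = row["codigo_contrato"].strip()
    signed = to_iso_date(row["fecha_firma"])
    return {
        "ocid": f"{OCID_PREFIX}-{contract_id}",
        "id": f"{contract_id}-contract",
        "date": signed,
        "tag": ["contract"],
        "initiationType": "tender",
        # Simplification: the cleaned name doubles as the party identifier.
        "parties": [
            {"id": buyer, "name": buyer, "roles": ["buyer"]},
            {"id": supplier, "name": supplier, "roles": ["supplier"]},
        ],
        "buyer": {"id": buyer, "name": buyer},
        "contracts": [
            {
                "id": contract_id,
                "dateSigned": signed,
                "value": {"amount": amount, "currency": row.get("moneda", "MXN")},
            }
        ],
    }


# An invented record, used only to exercise the mapping.
example_row = {
    "codigo_contrato": " 12345 ",
    "dependencia": "  Secretaría   de Salud ",
    "proveedor": "Acme, S.A. de C.V.",
    "importe": "1,250,000.50",
    "fecha_firma": "03/02/2015",
}
release = to_ocds_release(example_row)
```

Once every source is mapped to the same release structure, records from different government systems can be stored, searched and analyzed together, which is what makes a single database possible.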
The website TodosLosContratos.mx (“All The Contracts” in Spanish) answers important questions about Mexico’s procurement with explainers and simple tables outlining how the procurement process works. It also describes typical “bad” practices and ranks procuring entities using algorithmic scores, while the search engine QuiénEsQuién.wiki and an API offer access to the full database.
PODER says the project has simplified the work of conducting investigations about public contracts, with at least a dozen Mexican and international news outlets (such as El Universal, Proceso, and EMEEQUIS) using the tools for their own reporting. It has also encouraged greater transparency in public contracting: three government agencies approached PODER with an interest in improving their data on the platform or uploading new data, and the parties are discussing how improvements to the agencies’ open data strategies could feed back into the civil society organization’s tools.
Ordinary people are better informed about public procurement too. PODER says visits to the QuiénEsQuién.wiki platform are on the rise and every week they receive messages from people who have concerns or questions about contracts or their participants.
Key technical features:
- Data import: An importer and a web scraper orchestrator were developed on top of the free software Apache NiFi. Its modular design makes it simple to set up reusable components such as the data cleaning module or the data update module.
- Platform and API: QuiénEsQuién.Wiki is built on MongoDB and Node.js. All the data is hosted in a Kubernetes cluster of MongoDB databases and exposed through a public API, which is documented in both Spanish and English. A model client written in Node.js is also available through the npm package registry. The website consumes the API and works on desktop, tablet and mobile devices.
- Algorithmic analysis: The «groucho» engine analyzes open contracting data published in the OCDS format. It is written in Node.js and released under a GPL license, which makes it reusable and transparent.
- Data analysis: To fine-tune the parameters of the algorithmic analysis engine, we combed through the data with Kibana, an open source data visualization dashboard built on Elasticsearch, which helped us quickly recognize patterns and detect deviations.
- Data visualization: Our data is presented in custom-designed, web-based interactive graphs and maps, built primarily with the D3.js library.
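To illustrate the kind of calculation an algorithmic ranking of procuring entities might perform, here is a minimal sketch in Python. It does not reproduce the «groucho» engine or its rules, which are written in Node.js and are not described in this article. The three red flags used here (share of direct awards, share of contracts with no title, and share of spending going to a single supplier), their equal weighting, the flat record layout and the function name `score_buyers` are all assumptions chosen for illustration.

```python
from collections import defaultdict


def score_buyers(contracts: list[dict]) -> list[tuple[str, float]]:
    """Rank buyers from best to worst on three illustrative red flags."""
    by_buyer = defaultdict(list)
    for contract in contracts:
        by_buyer[contract["buyer"]].append(contract)

    scores = []
    for buyer, items in by_buyer.items():
        total = sum(c["amount"] for c in items)
        # Flag 1: share of contracts awarded without competition.
        direct_share = sum(c["method"] == "direct" for c in items) / len(items)
        # Flag 2: share of contracts published without a title.
        missing_share = sum(not c.get("title") for c in items) / len(items)
        # Flag 3: share of the buyer's spending that went to its top supplier.
        per_supplier = defaultdict(float)
        for c in items:
            per_supplier[c["supplier"]] += c["amount"]
        top_share = max(per_supplier.values()) / total if total else 0.0
        # Equal weights are an arbitrary choice; 1.0 means no flags raised.
        score = 1 - (direct_share + missing_share + top_share) / 3
        scores.append((buyer, round(score, 3)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Invented records, used only to exercise the scoring.
sample_contracts = [
    {"buyer": "A", "supplier": "X", "amount": 100.0, "method": "open", "title": "Road repair"},
    {"buyer": "A", "supplier": "Y", "amount": 100.0, "method": "open", "title": "School supplies"},
    {"buyer": "B", "supplier": "Z", "amount": 300.0, "method": "direct", "title": ""},
    {"buyer": "B", "supplier": "Z", "amount": 100.0, "method": "direct", "title": "Consulting"},
]
ranking = score_buyers(sample_contracts)
```

In this invented sample, buyer A ranks above buyer B, which awarded everything directly to one supplier and left a title blank. Choosing which flags to include and how to weight them is the hard part of any such ranking, which is where exploring the data in a dashboard becomes useful.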
TodosLosContratos.mx code on GitHub
API documentation in Spanish and English
OCDS extensions developed for QuiénEsQuién.Wiki cases