How Datanomix.pro is using AI to fight fraud and misconduct in Kazakhstan

This blog post is part of our series exploring the impact of AI on public procurement and cutting through the buzz. It was written by Jasmine Kendall and Joram Brouwer of Oxford Insights, who spoke to Vitaliy Trenkenshu, a managing partner at Datanomix.pro in Kazakhstan, about the company’s tool, Red Flags Management.

The problem: misconduct in procurement processes

Procurement processes are prone to fraud and misconduct. The European Commission calculated that Member States lose around €120 billion (US$135 billion) each year to corruption through illegal practices, such as specifications tailor-made for certain companies, conflicts of interest in evaluating bids, collusive bidding, and unclear selection or evaluation criteria.

The government of Kazakhstan has made it a priority to fight inefficient use or theft of public funds. By 2015, Kazakhstan had developed a centralized e-procurement portal and made its use mandatory for every contracting authority. This means all data about transactions is kept on record, in theory enabling full auditing of transactions. Vitaliy explains: ‘We have a digital footprint of everything. It is 100 million rows of data, but this makes it impossible to audit it all manually.’

What Datanomix.pro did about it

To enable data-driven auditing, Vitaliy and the team at Datanomix.pro developed a new tool, Red Flags Management. Its purpose is to help the State Audit institutions identify procurement transactions that are deemed risky (raise a “red flag”), and save taxpayers’ money by detecting fraud.

A significant share of public procurement information in Kazakhstan is unstructured and consists of documents like technical specifications in PDF format. Such documents must be converted into structured data before they can be analyzed. Red Flags Management does this by feeding the unstructured text about a procurement transaction into a large language model (LLM), which extracts specific data elements. Doing this by hand would take far too long, whereas a computer can process vast amounts of information quickly, and recent LLMs can perform the task with acceptable accuracy.

*The user interface of Red Flags Management*

How Red Flags Management works

The tool works by calculating 43 ‘red flag’ characteristics that indicate procurement risk, for each procurement transaction.

The system scans the massive procurement dataset each day and consists of two parts. The first part uses rule-based algorithms: it determines whether a transaction should be red-flagged using corruption risk indicators for overpricing, collusion, and irregular invoicing, among others. For example, the price associated with a transaction is compared to the average price for its product group. This part does not depend on AI.
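To make the rule-based comparison concrete, here is a minimal sketch of a check against the product-group average. It is an illustration only, not Datanomix.pro's implementation: the field names (`id`, `product_group`, `unit_price`), the function name `flag_overpriced`, and the threshold of twice the group average are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def flag_overpriced(transactions, threshold=2.0):
    """Return ids of transactions priced above `threshold` times their group average.

    The field names and the default threshold are assumptions for illustration.
    """
    prices_by_group = defaultdict(list)
    for t in transactions:
        prices_by_group[t["product_group"]].append(t["unit_price"])

    # Average unit price per product group.
    group_avg = {group: mean(prices) for group, prices in prices_by_group.items()}

    return [
        t["id"]
        for t in transactions
        if t["unit_price"] > threshold * group_avg[t["product_group"]]
    ]


# Hypothetical sample data.
transactions = [
    {"id": "T1", "product_group": "office paper", "unit_price": 5.0},
    {"id": "T2", "product_group": "office paper", "unit_price": 6.0},
    {"id": "T3", "product_group": "office paper", "unit_price": 7.0},
    {"id": "T4", "product_group": "office paper", "unit_price": 8.0},
    {"id": "T5", "product_group": "office paper", "unit_price": 500.0},
]

print(flag_overpriced(transactions))  # ['T5']
```

Note that a plain average is itself pulled upwards by extreme prices, which is one reason a real system would probably rely on more robust statistics than this sketch does.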

The second part uses an LLM. Any data that could support procurement auditing is fed into the model in an unstructured format: procurement data from Kazakhstan’s e-procurement portal, technical specifications of products in PDF format, and additional databases State Audit Institutions have access to such as invoice data, tax data, ownership data, and citizen records. The PDF files of technical specifications are downloaded and parsed into text. Then, the LLM extracts specific data. For example, the LLM extracts attributes such as product type and quality differences (e.g. A4 paper versus photo paper), which are critical for accurate price comparison. For each transaction, the LLM evaluates whether it exhibits any red-flag characteristics. If it does, the transaction is flagged for further review by a government official.
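The extraction step could look roughly like the sketch below. The prompt wording, the attribute names (`product_type`, `quality_grade`, `unit`) and the `call_llm` parameter are assumptions made for illustration; `call_llm` stands in for whichever LLM API is used, and the article does not describe the actual prompts or output format.

```python
import json

# Hypothetical prompt; the real prompts used by the tool are not public.
PROMPT_TEMPLATE = (
    "Extract the following attributes from the technical specification below "
    "and answer with a single JSON object with the keys "
    '"product_type", "quality_grade" and "unit". '
    "Use null when an attribute is not stated.\n\n"
    "Specification:\n{spec_text}"
)

REQUIRED_KEYS = ("product_type", "quality_grade", "unit")


def extract_attributes(spec_text, call_llm):
    """Ask an LLM for structured attributes and validate its answer.

    `call_llm` is a hypothetical callable (prompt string -> response string).
    Returns a dict with the required keys, or None if the answer is unusable.
    """
    raw = call_llm(PROMPT_TEMPLATE.format(spec_text=spec_text))
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # The model did not return valid JSON.
    if not isinstance(data, dict):
        return None
    # Keep only the expected keys; missing ones become None.
    return {key: data.get(key) for key in REQUIRED_KEYS}


def fake_llm(prompt):
    """Stand-in for a real model so the example runs offline."""
    return '{"product_type": "photo paper", "quality_grade": "glossy", "unit": "ream"}'


attributes = extract_attributes("Glossy photo paper, A4, 500 sheets per ream", fake_llm)
print(attributes)
```

Passing the model in as a function keeps the extraction logic independent of any one provider, which fits the point that models can be swapped when better ones appear.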

Different LLMs (such as OpenAI’s GPT-3.5 Turbo, accessed through an API) are used depending on the exact use case. If better-performing models become available, it is relatively straightforward to switch to another LLM.

What the tool can detect

Part of Red Flags Management’s value lies in its ability to detect shell companies. A shell company is a registered company that has no significant assets, operations, or employees of its own, and which can be used as a front for illegal activities. Signs of a shell company that the system might pick up on include a customer buying everything, from toilet paper to IT systems, from the same supplier via single-source procurement. The tool can also use tax data to detect shell companies, such as a business with a large number of sales but no employees.
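The two shell-company signals described above can be expressed as simple checks. This is a sketch under assumptions: the record layouts, the function names, and the cut-offs (`min_categories`, `min_sales`) are hypothetical and not taken from the tool.

```python
from collections import defaultdict


def broad_single_source_pairs(contracts, min_categories=3):
    """Find (buyer, supplier) pairs whose single-source contracts span many categories.

    `min_categories` is an assumed cut-off for "buying everything from one supplier".
    """
    categories = defaultdict(set)
    for c in contracts:
        if c["method"] == "single_source":
            categories[(c["buyer"], c["supplier"])].add(c["category"])
    return sorted(pair for pair, cats in categories.items() if len(cats) >= min_categories)


def sales_without_employees(tax_records, min_sales=100):
    """Find suppliers that report many sales but no employees (assumed fields)."""
    return sorted(
        r["supplier"]
        for r in tax_records
        if r["employees"] == 0 and r["sales_count"] >= min_sales
    )


# Hypothetical sample data.
contracts = [
    {"buyer": "School 1", "supplier": "Alpha LLP", "category": "toilet paper", "method": "single_source"},
    {"buyer": "School 1", "supplier": "Alpha LLP", "category": "IT systems", "method": "single_source"},
    {"buyer": "School 1", "supplier": "Alpha LLP", "category": "furniture", "method": "single_source"},
    {"buyer": "School 1", "supplier": "Beta LLP", "category": "furniture", "method": "open_tender"},
]
tax_records = [
    {"supplier": "Alpha LLP", "employees": 0, "sales_count": 250},
    {"supplier": "Beta LLP", "employees": 40, "sales_count": 300},
]

print(broad_single_source_pairs(contracts))  # [('School 1', 'Alpha LLP')]
print(sales_without_employees(tax_records))  # ['Alpha LLP']
```

Either signal alone is weak evidence; a supplier that triggers both would be a stronger candidate for human review.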

It can also detect cases where input and output invoices do not match. In addition, by using data provided by the government on family relationships and cohabitation, it can detect cases of nepotism and corruption.

However, when it comes to saving taxpayers’ money, the most important feature of the tool is that it can detect overpricing (available only for goods, not works or services). Using data on competitive procurement procedures, the tool calculates a normal range for different goods and compares this to invoices to identify suspicious transactions.
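The article does not say how the normal range is calculated. As one possible approach, the sketch below derives a range from competitive prices using the interquartile range (IQR) and flags invoices above the upper bound. The IQR rule, the multiplier `k`, the `min_samples` cut-off, and all field names are assumptions.

```python
from statistics import quantiles


def normal_range(competitive_prices, k=1.5):
    """Return (low, high) bounds from the quartiles of competitive prices.

    Uses the common IQR rule as a stand-in; the tool's real method is not public.
    """
    q1, _, q3 = quantiles(competitive_prices, n=4)
    iqr = q3 - q1
    return max(0.0, q1 - k * iqr), q3 + k * iqr


def suspicious_invoices(invoices, competitive_prices_by_good, min_samples=5):
    """Return ids of invoices priced above the normal range for their good."""
    flagged = []
    for inv in invoices:
        prices = competitive_prices_by_good.get(inv["good"], [])
        if len(prices) < min_samples:
            continue  # Too little competitive data to judge this good.
        _, high = normal_range(prices)
        if inv["unit_price"] > high:
            flagged.append(inv["id"])
    return flagged


# Hypothetical sample data: prices per ream from competitive procedures.
competitive_prices_by_good = {"A4 paper, ream": [5, 6, 6, 7, 8]}
invoices = [
    {"id": "INV-1", "good": "A4 paper, ream", "unit_price": 7},
    {"id": "INV-2", "good": "A4 paper, ream", "unit_price": 500},
    {"id": "INV-3", "good": "toner cartridge", "unit_price": 90},
]

print(normal_range(competitive_prices_by_good["A4 paper, ream"]))  # (2.5, 10.5)
print(suspicious_invoices(invoices, competitive_prices_by_good))   # ['INV-2']
```

Because the baseline comes only from competitive procedures, a single inflated invoice cannot distort the range it is compared against.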

*Overview of transactions that have been red-flagged for overpricing*

Where the Red Flags Management tool is used for law enforcement purposes in Kazakhstan, humans are involved: an official decides which cases to follow up on after a transaction has been flagged. The goal of the tool is to identify risky transactions with a human always making the final decision.

Future improvements

Vitaliy plans to improve the tool by employing a machine learning-based price prediction model, replacing the current rule-based approach to improve accuracy. This model will draw on data from competitively awarded contracts via the e-procurement API, including details like product names, product characteristics, and contract amounts. For products traded in small quantities, additional attribute data will be extracted from technical specification PDFs using an LLM. Historical and current exchange rate data will also serve as an input, to make sure that international price fluctuations are taken into account.

Another planned feature is a system for detecting cartels and antimonopoly law violations using supplier data such as addresses, details about founders and executives, and IP addresses used to access the e-procurement portal. With this data, Vitaliy expects to be able to identify collusive networks. In addition, by analysing bid data across suppliers, the tool will detect patterns like cross-participation in tenders, suspicious price reductions, and other unusual pricing trends that may indicate anti-competitive behaviour.
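Since this feature is still planned, its design is not known. One way such networks could be identified is to link suppliers that share an address, a founder, or an IP address, and then group everything that is connected. The sketch below does this with a small union-find structure; the data layout and the function name are hypothetical.

```python
from collections import defaultdict


def linked_supplier_groups(suppliers):
    """Group suppliers that share an address, founder, or IP address.

    `suppliers` maps a supplier id to a dict with the assumed keys
    'address' (str), 'founders' (list) and 'ips' (list).
    Returns groups of two or more linked suppliers.
    """
    parent = {sid: sid for sid in suppliers}

    def find(x):
        # Follow parent links to the group's root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # attribute value -> first supplier seen with it
    for sid, info in suppliers.items():
        keys = [("address", info["address"])]
        keys += [("founder", name) for name in info["founders"]]
        keys += [("ip", ip) for ip in info["ips"]]
        for key in keys:
            if key in first_seen:
                parent[find(sid)] = find(first_seen[key])  # merge the two groups
            else:
                first_seen[key] = sid

    groups = defaultdict(list)
    for sid in suppliers:
        groups[find(sid)].append(sid)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical sample data (IP addresses are from a documentation range).
suppliers = {
    "A": {"address": "1 Abay Ave", "founders": ["F1"], "ips": ["192.0.2.1"]},
    "B": {"address": "1 Abay Ave", "founders": ["F2"], "ips": ["192.0.2.2"]},
    "C": {"address": "9 Other St", "founders": ["F3"], "ips": ["192.0.2.2"]},
    "D": {"address": "5 Far Rd", "founders": ["F4"], "ips": ["192.0.2.9"]},
}

print(linked_supplier_groups(suppliers))  # [['A', 'B', 'C']]
```

Here A and C share nothing directly but end up in the same group through B, which is the kind of indirect link that is hard to spot by hand. A shared attribute is only a lead: shared office buildings or network providers can produce innocent matches.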

The result: saving taxpayers’ money and a move to competitive tendering processes

The use of Red Flags Management has had a tangible impact on the public procurement process. For example, when a supplier’s bid for a government contract is significantly higher than the statistically derived fair price range, the tool flags this anomaly, prompting a closer investigation. In a specific example Vitaliy shared with us, the tool found that reams of paper that would sell for between US$5 and US$8 in a competitive market were sold for US$500. Detecting such cases has helped to prevent overpricing and corruption before funds are misallocated.

Vitaliy shared some impressive high-level results: US$22 billion in spending is analyzed annually, saving an estimated US$86 million.

The tool also contributed to government bodies adopting more competitive tendering as part of a broader reform.

Lessons for the community

We can learn several valuable lessons from Datanomix.pro about how to use AI in public procurement.

1. Government data availability is crucial

The success of such a tool largely depends on the availability of high-quality procurement data and the ability to combine this with broader datasets from government departments, such as beneficial ownership and tax reports. Governments that want to use similar tools must focus on improving data completeness, as well as on the ability to combine different data sources, including through unique identifiers.

2. Promote cross-functional collaboration

Developing AI solutions for procurement requires conversations between people who might not usually talk to each other. Vitaliy suggests starting ‘cross-functional project brainstorming groups’ to combine expertise from business and data analysis teams. Bringing together and learning from multiple groups helps to set well-rounded goals and to determine the operational steps needed to successfully implement a solution like the Red Flags Management tool.

3. Be very concrete about the role AI plays in a process. Give it a clear goal and determine how it works together with humans.

If used, AI must have a clear and specific purpose. In Kazakhstan, the Red Flags Management tool is designed to identify risky transactions for further human review rather than replacing human judgment. This collaboration between AI and human oversight ensures that tasks beyond human capability, such as analyzing data and documents of millions of transactions, are automated, while nuanced decisions remain in an official’s hands.