Insights from Digital Data

Only in the last decade have innovative sources of digital data begun to be used to gather insights that survey data typically cannot provide, writes Kanika Mahajan

Kanika Mahajan

12 September, 2023

Economists and policymakers have used survey data for many decades to derive insights about various economic indicators over time. However, only in the last decade have innovative sources of digital data begun to be used to gather insights that survey data typically cannot provide. Advances in technology have made it possible to leverage these non-traditional data sources.

While a rich body of survey data is already available to researchers and policymakers in easily readable, downloadable and ready-to-visualise formats, at CEDA we have been supplementing it with granular administrative data that are difficult to extract from government websites. The administrative data collected by the government are an extremely rich source of granular information, but they are generally not available in an easily usable form. In CEDA's early years, we realised that these data should form an important pillar of our data products. Advances in digital technology have made it possible to collect, collate and present these data almost instantaneously. We currently offer two products, Agricultural Market Data and Daily Food Prices Data, which present real-time data on economic aggregates.

The Data Products

The first dataset, Agricultural Market Data, provides commodity-level daily arrival quantities and producer prices across government-regulated agricultural markets in India since 2001. It is updated daily, as the Ministry of Agriculture and Farmers Welfare records these data in real time on its website.

CEDA, however, provides the data in a form that lets users download a state's daily data as a spreadsheet with one click. A user interface makes it easy to visualise trends in product supply and prices. These variables are also available at a monthly level for visualisation at the all-India, state and district levels, showing both their geographic dispersion and their trends over time.
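To make the monthly, multi-level aggregation concrete, here is a minimal Python sketch. It is not CEDA's pipeline: the `DailyRecord` fields, the `monthly_aggregate` function and the choice to weight prices by arrivals are all hypothetical, chosen only to show how daily market rows can be rolled up to district, state or all-India monthly figures.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class DailyRecord:
    """One hypothetical row of daily market data (field names are illustrative)."""
    day: date
    state: str
    district: str
    commodity: str
    arrivals_tonnes: float   # quantity arriving at the market that day
    price_per_quintal: float  # producer price recorded that day


def monthly_aggregate(records, level="district"):
    """Roll daily rows up to monthly totals at a chosen geographic level.

    Assumptions (not from the source): arrivals are summed over the month,
    and the monthly price is an arrivals-weighted average of daily prices.
    """
    if level not in ("district", "state", "india"):
        raise ValueError(f"unknown level: {level}")

    # key -> [total arrivals, total of arrivals * price]
    totals = defaultdict(lambda: [0.0, 0.0])
    for r in records:
        if level == "district":
            geo = (r.state, r.district)
        elif level == "state":
            geo = (r.state,)
        else:  # all-India: no geographic component in the key
            geo = ()
        key = geo + (r.commodity, r.day.year, r.day.month)
        totals[key][0] += r.arrivals_tonnes
        totals[key][1] += r.arrivals_tonnes * r.price_per_quintal

    result = {}
    for key, (qty, value) in totals.items():
        # Leave the price undefined for months with zero recorded arrivals.
        avg_price = value / qty if qty > 0 else None
        result[key] = {"arrivals_tonnes": qty, "avg_price": avg_price}
    return result
```

Weighting by arrivals means high-volume days count for more in the monthly price; a simple unweighted mean of daily prices would be an equally defensible alternative, and the source does not say which CEDA uses.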

The second dataset, Daily Food Prices Data, collates daily retail and wholesale prices of 22 essential commodities from more than 100 centres across India since 2009, collected by the Price Monitoring Cell under the Department of Consumer Affairs (DoCA). The DoCA again publishes these data daily, but only as PDF reports. The CEDA data portal scrapes these reports and provides the data in easily downloadable formats. It also visualises daily prices at both the commodity-centre-zone level and the all-India level, with the price data updated daily. This enables real-time monitoring of granular changes in food prices in the Indian economy. Using these data, CEDA has also constructed two daily food price indices, one each for the retail and wholesale markets of urban India.
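The source does not describe how the two daily indices are built. As a hedged illustration only, one common approach is a fixed-weight index of price relatives (Laspeyres form), with the base day set to 100. The function name `price_index`, the weights and the rule of skipping incomplete days are all hypothetical assumptions.

```python
def price_index(prices, base_day, weights):
    """Fixed-weight index of price relatives, base day = 100.

    prices:  {day: {commodity: price}} for one market type (retail or wholesale)
    weights: {commodity: expenditure share}; normalised here so they sum to 1.
    Hypothetical rule: days missing any weighted commodity are skipped,
    not imputed.
    """
    base = prices[base_day]
    total_w = sum(weights.values())
    index = {}
    for day in sorted(prices):
        today = prices[day]
        if not all(c in today for c in weights):
            continue  # incomplete day: skip rather than guess a price
        # Weighted average of each commodity's price relative to the base day.
        relative = sum(w * today[c] / base[c] for c, w in weights.items()) / total_w
        index[day] = 100.0 * relative
    return index
```

Running the same function once on retail prices and once on wholesale prices would yield two separate daily series; in practice, the choice of weights (for example, consumption shares from a household survey) matters far more than the arithmetic.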

Uses of the Data

These data have been used by individual researchers in academic papers, by institutions such as the International Food Policy Research Institute for research purposes, and by state government departments, such as those in Odisha, where officials use them to monitor granular price changes. The media have also used them in articles on the country's price situation, which has come under considerable pressure from recent price rises (Mint). CEDA's data narratives regularly use these data to report on changing prices across the country. In a recent piece, we used them to discuss the rise in tur dal prices in 2023, which has outpaced the average increase in food prices in the country. The increase in 2023 has been much higher than in other years (Figure 1), and especially so in the southern region of the country (Figure 2). We hope to add more such data to the portal in the coming years to contribute to emerging policy issues.

Across the world, such data are gaining traction. For instance, the Billion Prices Project, which started in the US in 2008, has been collecting price data from hundreds of online retailers. Other data sources gaining traction include social media platforms such as Twitter and Facebook, job-matching platforms, payroll records, online reviews, tax records, and data on retail payments and financial transactions, among others.

However, using such data to draw economy-wide insights is hampered by problems of representativeness. Data from selected markets or selected private entities, unless they cover a very large share of the relevant market, are likely to be unrepresentative and to capture trends idiosyncratic to their user base. Tracking aggregate indicators with such data therefore requires pooling data from multiple players or across many geographies. This remains a promising venture for the future.

(Kanika Mahajan is an Associate Professor of Economics at Ashoka University)

