This is the seventh article in the weekly series of Expert’s Opinion, where we talk to academics who study AI or other emerging technologies and their impact on society and the world.
This week, we spoke to Sanjana Krishnan, Partner, and Christian Franz, Co-Founder and Partner, at CPC Analytics, a data-driven policy-consulting firm with offices in Pune and Berlin.
Sanjana Krishnan is a graduate in Urban Policy and Governance from the Tata Institute of Social Sciences. She has worked with state governments on open data and is a member of the DataMeet Open Data Community in Pune. Christian Franz has a Master's in Public Policy from the Hertie School of Governance and has worked on several healthcare and data projects.
Analytics India Magazine caught up with the duo to get insights into their recent research on the Cross-border Data Trade For AI done in collaboration with Bertelsmann Stiftung in Germany.
AIM: What is the significance of cross-border data trade for AI? Does Europe need it in the light of various digital strategies the EU has employed over the last few years?
Krishnan and Franz: The European Commission has certainly launched an impressive set of policies over the past few years. It will take some time until the policies are filled with life in the 27 member states of the Union. With its data strategy, the Commission has shown that it has understood a truism of the data and AI economy: next to computational power, a key ingredient of current AI applications is access to vast amounts of data. Having access to datasets that fit the demanding requirements of machine learning, the current workhorse technology of AI, is becoming a necessity.
Firms adopt several methods to build such datasets. Large Silicon Valley-based firms have succeeded because of data network effects: with a large user base for their products, the accretion of data increases, resulting in a better-trained product that attracts further users. Chinese firms have built their databases mostly from their vast domestic (and highly competitive) market activity. For specific use cases such as training autonomous vehicles (identifying pedestrians, road signals and traffic lights), or identifying cancerous tumours and brain lesions, the data is first manually labelled and annotated by human ‘data labellers’ so that the AI systems can identify and learn from it. Where firms cannot build such datasets themselves, AI-relevant data could also be ‘traded’ between firms at a global scale by easing barriers to cross-border data transfer.
The EU’s push for common data markets within different industries could simply be seen as an approach to ensure that the wealth stored in data no longer remains locked in separate data silos but can be used by companies to build more innovative products.
AIM: What are some of the major roadblocks when it comes to cross-border data trade and what would it take to establish a global data sharing act?
Krishnan and Franz: How challenging cross-border data sharing is can also be observed in the European Union. Data protection and data privacy are just one important piece of governance that needs to be in place. Let me give you an example. Just think of the difficulties we face in any given hospital around the world: a myriad of different IT systems that are not designed to speak to each other. Data is entered in many different ways and situations, making much of the human-generated data unlikely to be useful for analysis that goes beyond mere monitoring. And lastly, data generated in the regular processes of a hospital represents a valuable asset – what incentive does the hospital have to share the data?
I am giving you this example because it is vital to consider the challenges at a micro level before elevating the entire discourse to the geopolitical level. Yes, we see a push to make technology another dimension of industrial policy and geopolitics. And rightly so! But let’s face it: there are solutions for establishing cross-border transfers of data. In our conversations with experts from India, we have often heard concerns about GDPR compliance. At the same time, when asked whether the experts’ companies have found ways to establish the cross-border transfer of such data, they all said yes. In reality, I do not believe it is an insurmountable obstacle.
A global data sharing act is unrealistic at this point. What we argue is that it will suffice for now to build bilateral data exchange platforms, support the exchange of knowledge among practitioners and build common standards from within communities of practice.
AIM: From a bilateral perspective, do both countries or regions need to have strong data protection rules and frameworks or can one country’s laws be leveraged for the trade?
Krishnan and Franz: The most important action would be to pass and enforce robust regulation on data protection. Even though the Supreme Court has recognised privacy as a fundamental right under Article 21, the current state of data protection is not adequate to ensure the data collected in India and crossing Indian borders adheres to the expected ethical and data protection standards. Strong implementation of the PDP bill and establishment of a DPA promise considerably higher levels of data protection in the country. This is a necessity to ensure adequate protection for data from India and increase the trust of its trading partners, specifically countries in the EU that have set high standards with the GDPR.
Europe’s data economy is estimated to reach a volume of €550 billion by 2025 (4% of the overall EU GDP). A MeitY report projected that the digital economy of India (a somewhat broader definition) will be worth $1 trillion by 2025. The opportunity of building strong connections between these two markets is too big to pass up merely because data privacy standards in India are not established or are insufficiently enforced.
AIM: Considering both parties have robust privacy laws, what are some of the major roadblocks when it comes to cross-border data sharing?
Krishnan and Franz: The first requirement is to enhance the quality of data and ensure that it meets the technical specifications that AI requires. Most of the data generated in India is, in its current form, not fungible or tradeable, and hence not usable by AI across borders and contexts. A concerted effort to produce large, comprehensive, labelled datasets is needed. Simultaneously, the possibility of bias also needs to be recognised and accounted for, especially when AI directly interacts with humans (such as chatbots) or impacts their lives. The risk of bias – if not considered – will significantly damage trust in the application.
Second, India needs to urgently and strategically beef up its internal capacity to become a relevant player in the AI landscape. While India ranks low on digital skills, a larger percentage of people are enrolling in AI courses, which shows the anticipated demand for these skills in the market as well as the readiness of talent to adapt to the change. A high hiring rate for those with AI skills (higher than in the US, Germany and France) indicates dynamism in the demand for AI talent. In the private sector, while India ranks high on the number of AI-based start-ups (higher than China), the US and China far outstrip India when it comes to private investment in AI. India also needs to increase its investment in research.
AIM: The study mentions the threat of ‘data colonisation’. Going ahead, how do we avoid the exploitation of India’s data resources and ensure both parties get equal benefits from the cross-border data exchange?
Krishnan and Franz: When speaking of India as a “data-rich” country – a statement that has motivated many actions in addition to the publication of the study that we did – we always run the risk of following such a narrative. A one-dimensional framing of India as a strategic reservoir of data for the development of AI for the benefit of the trading partners’ domestic AI firms comes dangerously close to echoing historical colonial practices of economic extraction. It is no coincidence that researchers and activists from various disciplines and regions have warned against an increasing “digital colonialism,” “data colonialism” and “algorithmic colonisation.”
Any meaningful cross-border data partnership for AI will have to consider these different power dimensions in the digital world. Such partnerships should help create a common knowledge base for a future, balanced partnership of equals. AI is a technology that can lead to discriminatory, marginalising and outright violent outcomes. As India has done in its National Strategy for AI, partner countries should subscribe to a vision of AI that reinforces ethical values and benefits all of society. This vision also needs to be embedded into the efforts meant to create a closer data exchange for AI development.