
With the emergence of Big Tech and AI, we’ve seen that our data is no longer merely data; it is a reflection of our lives and of the patterns through which we function in society. Datafication is carried out by both public and private bodies for policymaking, for understanding patterns, and for introducing and advancing technologies, among other purposes; but the practice of data collection isn’t new. For centuries, such data has been used to exclude certain groups and communities and to build discriminatory policies and practices outside the scope of human rights. It is therefore important to understand how datafication may not always be fair, how it can disadvantage certain groups, and how it is, in fact, a practice as old as colonisation itself, particularly in the Global South.

To acknowledge the full scope of how data collection may have affected certain groups and how it can be linked to colonisation, one must ask: how long have we been giving out our data, and how long has it been used against us?

Data for Colonisation

Collecting data has always been of central importance to colonisers, a form of colonial capitalism in which the aim was not merely to defeat populations but to gain access to their land and power and to establish systems of military control. To build a strong base, colonisers needed an equally strong understanding of, and data on, the communities they sought to access and control. The atrocities and violence that this data enabled against indigenous communities across the globe have left a mark on society that persists today. The social imbalances stemming from the datafication of populations led researchers to coin a new term: ‘data justice’.

Data justice is a fairly new term that sheds light on how datafication and the practice of data collection have had adverse effects on marginalised groups across the globe, leaving these groups with less autonomy over their data and exposing them to more surveillance and social injustice. Shmyla Khan, Research and Policy Director at the Digital Rights Foundation in Pakistan, believes that colonialism is embedded in data collection practices. “You can see colonialism imbued into many of the major data collection practices in post-colonial societies. Data colonialism in the present moment involves the dispossession of data from its subjects, i.e. the bodies whose data is represented, and vesting control in capitalist, patriarchal and white supremacist structures.” She adds, “For instance, data collection mechanisms in post-colonial societies often draw directly from a history of colonial governance and replicate these systems. The state still employs practices such as the census and relies on the same categories developed by colonial administrations for exercising control over the general population. Even when the systems are new and employ digital technologies, they often replicate the logics of colonial governance. This can be seen as the dispossession of data through surveillance and platform capitalism structures.”

Shyam Krishna, a Postdoctoral Researcher at The Alan Turing Institute working on the Advancing Data Justice Research and Practice project, believes, “There is historical evidence that colonial practices in the South Asian region aimed at solidifying identities as tools of population management had a significant impact on data collection approaches. The European colonial approach, for instance, frequently relied on moral condemnations of gender identities beyond their heteronormative framing and employed racialised perspectives that effectively criminalised entire populations. These classifications found their way into the legal frameworks of postcolonial states, such as the debates around the impact of caste codification in India, the CMIO categorisation in Singapore or the legal regression due to gendered rights throughout the South Asian region.”

Shyam further adds, “All of these continue to influence data practices in the present day. Ultimately, colonial influence often conflated racial, religious, national, and cultural markers, flattening identities written into datasets and overlooking the diverse complexities within these communities.” 

Today we see how this colonial legacy, while not entirely the same in form, has been carried forward by advanced digital technologies, particularly through AI and algorithmic bias, leading to an even more complicated societal structure in which certain voices are heard while most are stifled and controlled through surveillance. Data colonialism, as we see it today, has led to the erasure and misrepresentation of marginalised communities, particularly gendered and religious minorities living in the Asia Pacific region.

Data Injustice in Governance and Policymaking

According to Shmyla, “While data colonialism in the contemporary moment does not neatly map onto previous colonial structures, many of the communities dispossessed through colonialism, including indigenous communities, continue to be on the receiving end of colonialist oppression. Take, for instance, the national census: the ways in which these systems are designed lead to the erasure of entire communities that are systemically undercounted.” Shmyla highlights that in Pakistan, during the digital census in 2023, there was severe undercounting of the transgender community. It was reported that the “population of trans people has dropped by almost 35% in the province [Sindh], which shows the bias of the organisers against this community [as] the general population is exploding, while that of the trans people is imploding.” She believes, “Conscious and systemic biases regarding who is ‘counted’ and who is invisibilised mean that the data we collect and take important decisions on is not only incomplete, it excludes and harms certain communities. Census data has direct implications for resource allocation, voter registration and policymaking. Data is rarely neutral; it is informed by biases and legacies such as colonialism.”

Shyam also believes that data collection practices have had an adverse effect on marginalised gendered minorities in South Asia, particularly the transgender community. He adds, “One example of potential misrepresentation or erasure of marginalised communities’ experiences and voices is the treatment of transgender communities in countries like India, Pakistan, and Bangladesh. The colonial legacy contributed to rigid categorisations and classifications of these communities, perpetuating stereotypes and marginalising their lived experiences. For instance, in India, these colonial practices even upended welfare and support systems made available to transgender communities by pre-British era provincial states. The lasting colonial influences have meant that it is only in recent years that we have witnessed positive movements, such as the recognition of a ‘third gender’ as a data category and the use of LGBTQIA+ inclusive terminology in datasets.”

Furthermore, examples of data injustice in South Asia are not limited to the experiences of gendered minorities; they extend to other marginalised groups and individuals in society. States’ deployment of new systems and practices built on advanced technologies affects people’s real lives and the resource allocation promised to them. The Aadhaar biometric database in India is one such example; known as the world’s largest biometric database, it was launched in 2009 to combat welfare fraud and assist people below the poverty line with entitlements. The system also collects the biometric data of individuals who sign up as beneficiaries under the government’s ration programmes. The biometric scanners have, in the past, failed to authenticate the fingerprints of those working with stone, cement and limestone, and of people over the age of 60, owing to the effect their work and/or age has on their fingerprints. The system also allows only a family’s single registered claimant to draw rations, barring other family members from receiving them if the primary recipient is unavailable. In addition, the system requires the recipient to authenticate through a password sent to their registered mobile number, on the assumption that everyone has access to a device, which is not always true in poverty-stricken households.
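
To make the exclusion points in such a design concrete, the sketch below walks through a hypothetical version of the authentication chain described above. It is not Aadhaar’s actual system or API; the field names, threshold and flow are invented purely to illustrate how each design assumption (a readable fingerprint, a single registered claimant, access to a phone) becomes a separate point at which a legitimate beneficiary can be turned away.

```python
# Hypothetical illustration of the authentication chain described above.
# This is NOT Aadhaar's real system or API; names, threshold and flow are
# invented to show how each design assumption becomes a point of exclusion.
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Beneficiary:
    is_registered_claimant: bool      # only one claimant per family may draw rations
    fingerprint_match_score: float    # degraded by manual labour or old age
    registered_phone_reachable: bool  # assumes the household has a working phone


FINGERPRINT_THRESHOLD = 0.8  # arbitrary value, for illustration only


def can_draw_rations(person: Beneficiary) -> Tuple[bool, Optional[str]]:
    """Return (allowed, reason_for_denial) for the hypothetical flow."""
    if not person.is_registered_claimant:
        # Family members other than the single registered claimant are barred,
        # even when the primary recipient is unavailable.
        return False, "not the registered claimant"
    if person.fingerprint_match_score < FINGERPRINT_THRESHOLD:
        # Worn fingerprints (stone, cement or limestone work, old age) fail here.
        return False, "fingerprint authentication failed"
    if not person.registered_phone_reachable:
        # The password sent by SMS assumes access to a mobile device.
        return False, "could not receive the password"
    return True, None


# A legitimate beneficiary can be denied at any one of the three steps:
print(can_draw_rations(Beneficiary(True, 0.55, True)))   # fingerprint failure
print(can_draw_rations(Beneficiary(False, 0.95, True)))  # not the registered claimant
print(can_draw_rations(Beneficiary(True, 0.95, False)))  # no phone access
```

None of these denials requires malice; they follow from a design that never modelled the people it ends up excluding.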

In recent times, we’ve also seen states adopting algorithmic decision-making systems (ADS): systems that analyse personal data and use it to make decisions and policies that affect oppressed people and communities. One such example is the Dutch government’s use of a secret algorithm, ‘SyRI’, to try to detect social welfare fraud. The system helped the government investigate suspected fraud by reviewing people’s taxes and allowances, and it mostly targeted individuals from poor neighbourhoods. After a push from civil rights groups, the District Court of The Hague ruled that the system violated privacy and ordered a halt to its use. While the pushback against this invasive and discriminatory system was a success, the episode also illustrates how countries in the Global North, particularly in the EU, are keen on adopting systems with far-reaching implications to understand and surveil the movements of people from the Global South. Using satellite images and surveillance technologies to track how and where refugees move, and to control them when they reach shore, are just some of the new surveillance methods being adopted by states. It is also worth noting that these tools are being used to control the movements of people from countries once colonised by these very powers.
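
SyRI’s internal logic was never disclosed, so the sketch below is only a generic illustration of how a welfare-fraud risk score of this kind can end up targeting poor neighbourhoods: once an area-level variable such as a postcode enters the model, the score starts to proxy for poverty rather than for fraud. All field names, weights and thresholds here are invented for the purpose of illustration.

```python
# Generic sketch of a welfare-fraud risk score (NOT SyRI's actual, undisclosed
# logic). Field names, weights and threshold are invented for illustration.
RISK_WEIGHTS = {
    "benefit_claims_last_year": 0.2,   # behavioural signal
    "irregular_income_reports": 0.5,   # behavioural signal
    "lives_in_flagged_postcode": 1.5,  # area-level proxy: encodes poverty, not fraud
}
FLAG_THRESHOLD = 1.0  # cases scoring above this are handed to investigators


def risk_score(record: dict) -> float:
    """Weighted sum over whatever linked administrative data is available."""
    return sum(weight * float(record.get(key, 0)) for key, weight in RISK_WEIGHTS.items())


residents = [
    # Poor neighbourhood, no behavioural signals of fraud.
    {"name": "A", "benefit_claims_last_year": 1, "irregular_income_reports": 0,
     "lives_in_flagged_postcode": 1},
    # Affluent neighbourhood, the only behavioural signal in the data.
    {"name": "B", "benefit_claims_last_year": 1, "irregular_income_reports": 1,
     "lives_in_flagged_postcode": 0},
]

for person in residents:
    score = risk_score(person)
    status = "flagged" if score >= FLAG_THRESHOLD else "not flagged"
    print(person["name"], round(score, 2), status)
# Output: A 1.7 flagged / B 0.7 not flagged. A is investigated purely on the
# strength of where they live, despite B showing the only behavioural signal.
```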

Data Injustice Amplified by Algorithms

The adoption of these technologies and tools has added another layer to the data injustice we see today. While data is being controlled and used for policymaking, we now also see Big Tech emerging as an ally of states, helping to advance the erasures that post-colonial societies subject marginalised groups to. According to an Amnesty International report, in 2017 Meta’s algorithm amplified and promoted content that incited violence and hatred against the Rohingya Muslims in Myanmar. Similarly, in 2021, Meareg Amare, a chemistry professor in Ethiopia, was gunned down because of his Tigrayan ethnicity after a series of Facebook posts targeted him, accusing him of stealing and selling university equipment. Even though his son, Abraham Amare, requested the removal of these posts when they started appearing, they were not taken down. He later filed a lawsuit against Meta for ignoring his requests and for amplifying the ethnic violence against his father that led to his murder.

Nikolett Aszodi, Policy & Advocacy Manager at AlgorithmWatch, a non-profit research and advocacy organisation that analyses automated decision-making (ADM) systems and their impact on society, adds, “AI systems should not be [treated as] technical tools but [as] socio-technical tools, and should be focusing on the larger picture and the context of deployment. There must be specific standards and transparency on how these systems are being used, particularly [by] adopting Fundamental Rights Impact Assessments (FRIAs).”

She says that information collected through these systems does not represent the world and only certain groups benefit from it. “There must be transparency around certain aspects of data and more accountability on who uses what systems and for what purpose.”

The EU is already leading the way on AI legislation with the AI Act, which is said to be a model for policymakers across the globe on how to regulate AI. However, Nikolett argues, “The Act needs to be more socially and economically fair. [It] offers protection to people, but the Act also identifies certain high-risk areas, which include migration and political asylum, and has more stringent regulation in these fields. There need to be protections in place particularly when AI-powered surveillance technologies like lie detectors, motion detectors and drone surveillance are being used by the EU to regulate refugees.”

Nikolett raises an important point about the scope of AI technologies and the absence of adequate regulatory protections when these technologies are used elsewhere. She says, “There is also not much said [in the AI Act] about what happens when these systems are exported, particularly to autocratic regimes where the AI Act does not apply and does not provide people with the same protections.”

Making Data Just

Data collection and AI have created complications for marginalised groups and continue to promote bias and violence against them across the globe. Regulating the systems that drive AI and algorithmic bias is important; however, that regulation must be in line with the fundamental rights of every person around the world. Data collection needs to be looked at differently, with a sensitive lens not only on how data is collected but also on how it reflects diverse opinions, while making sure it doesn’t lead to surveillance by the state or amplified violence by tech companies.

Shmyla Khan of DRF says, “We need to recognise that systemic exclusions and mechanisms cannot be undone overnight, and certainly cannot be undone at an individual level. We need to work towards collective and equally systemic actions to dismantle these structures. We need to invest in and support community-led initiatives, [and plan] how we [can] empower these communities to take charge of their own data. This requires actively working to provide resources directly to communities and dismantle power structures and hierarchies when working with these communities.”

Emphasising the need to involve stakeholders, including marginalised communities, in designing technology and the systems and policies that regulate it, Shmyla says, “Co-creation does not merely mean entering a community and conducting a focus group discussion; steps need to be taken to create an equal relationship between all parties. This often means working twice as hard to undo existing power structures and barriers that these communities face in participating in discussions around data justice and system design.”

Shyam of The Alan Turing Institute echoes this and adds that to foster culturally sensitive and collaborative data practices, researchers, data scientists and technology practitioners need to reflect on their positionality when engaging with the communities that their actions will impact. “Currently, the dominant view in data collection and analysis is to treat decision-makers’ or developers' priorities and logics, and technical objectivity as central aspects of the project or product. A shift is needed from this narrow view to a more comprehensive understanding that places context, communities and collaboration at the forefront. This shift reframes bias as not just an error to be eliminated but also as an opportunity to reflect on underlying social structures that caused the injustice, requiring ongoing engagement and dialogue with the communities affected.” Shyam adds that these strategies would help, for instance, coders and developers to understand that their role in society is both social and technical. “Communities should become active decision-makers in the design process, and their input should be sought in setting focus areas. For example, co-production exercises like citizens' assemblies and participatory workshops should be used to actively engage stakeholders in the decision-making process.”

It’s important for us as a society to reflect not only on our data collection practices but also on the ethics of why we’re collecting data and who it benefits. We know that, owing to the quantifiable nature of data, it is open to interpretation and can be used in a manner that benefits one particular party. Data justice should be a practice we implement not just in the work that we do as civil society but also a demand we make of states and Big Tech as a whole, so that we can build a more impartial and fair society.
