LegalTech for Consumer Protection

Legal Technologies for Consumer Protection is an interdisciplinary research project at Maastricht University, conducted by the Maastricht European Private Law Institute (M-EPLI), the Institute of Data Science (IDS), and the Netherlands Authority for Consumers and Markets (ACM). The research explores the legal and technical means for developing innovative and transparent LegalTech software for consumer protection purposes.



Background


The project builds upon work carried out for DG Justice and Consumers, "Exploring IT/AI tools for monitoring online markets for consumer policy purposes" (JUST/2018/CONS/PR/CO01/0123), and on a follow-up project in collaboration with the Dutch Authority for Consumers and Markets. The research idea behind this endeavour, "Using AI for Consumer Protection: creating AI-based persona for mystery shopping", was granted an NWO-IDG award. The project aimed to develop and improve methods for monitoring e-commerce and online advertising, with a view to detecting infringements of existing legal rules and supporting further policy development in the fields of consumer law, anti-discrimination law and privacy protection, building on interdisciplinary law and technology research.

The 2019 Consumer protection regulation allows consumer authorities to conduct data analysis and online mystery shopping. For example, a consumer authority can determine how an online seller behaves by pretending to be a consumer. In particular, it can investigate whether consumers are shown different prices or adverts by comparing personas with different traits (e.g. a "deal hunter", "negligent shopper", or "elderly female").

These selling practices are made possible by algorithmically personalized content, built with data analytics and computer engineering. They often cross the line into prohibited unfair commercial practices and infringe the rights of consumers.

A new breed of legal technologies appears to be equipped to perform investigations on these issues. Consumer law enforcement bodies are the primary beneficiaries of these technologies, but scholars benefit as well. The goal is to study challenging questions and implement prototypes from both a legal and a technical perspective.


Publications


Editorial: Discrimination in Online Advertising
Maastricht Journal of European and Comparative Law - Open Access
Read the paper: DOI: 10.1177/1023263X211022526

A New Order: The Digital Services Act and Consumer Protection
European Journal of Risk Regulation - License CC BY 4.0
Read the paper: DOI: 10.1017/err.2021.8

An exploratory study of information technology (IT) and artificial intelligence (AI) tools for consumer protection in digital markets
Symposium Consumer and Data Privacy 2020: Conference
Poster and Findings: DOI: 10.13140/RG.2.2.10144.30724
Read the report: Public version

Researching algorithmic unintentional discrimination in online advertisement and e-commerce
Working paper: Preprint
Presentation: Slides


Software


By definition, online automated systems that use algorithmic personalization or recommendation techniques are trained on users' internet browsing data. This data comes from different sources (or from a combination of sources that the tech industry usually does not disclose). For example:

  • Google Ads impressions are based mainly on users' browsing history or IP locations.
  • Google Search results use hardware data (e.g. which operating system you are using).
  • Product prices on some retail websites are based on the user's interaction with the website, tracked through cookies.
  • Social media ads are mainly based on the user's behaviour on that social media platform.
  • etc.

Therefore, as of 2021, no one-size-fits-all technique can be used to create AI-based Personas for online mystery shopping. However, it is possible to implement separate LegalTech software for investigations and compose user interfaces to unite them all.


Facebook Ads Auditor

This technology is a ready-to-use Ads Analysis Auditing Tool written in Python that uses the official Facebook Graph API to access the Facebook Ads library and gather Ads data at scale. The methodology and limitations of the code were tested. As a use case, we performed a pilot data analysis on more than 2,000 Facebook Ads from the Netherlands on the topics of housing and credit.

What can it be used for?

  • It can be used to investigate particular Facebook Ads campaigns suspected of being scams, misinformation campaigns or ads of a discriminatory nature. The analysis gives insights into the targeted audience demographics, the money spent and who created the campaign.
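As a rough illustration of this kind of request and analysis, the sketch below builds a query URL for the Ad Library endpoint of the Graph API and averages the reported audience share per gender over the returned records. It is not the project's code: the API version, the selected fields, the helper names and the sample records are assumptions made for illustration, and the field names should be checked against the current Ad Library API documentation.

```python
from collections import defaultdict
from urllib.parse import urlencode

# Assumption: the version segment is illustrative; use the one your app targets.
AD_LIBRARY_ENDPOINT = "https://graph.facebook.com/v12.0/ads_archive"


def build_ad_library_url(search_terms, country, access_token, limit=100):
    """Build a GET URL that searches political/issue ads shown in one country."""
    params = {
        "search_terms": search_terms,
        "ad_reached_countries": f'["{country}"]',
        "ad_type": "POLITICAL_AND_ISSUE_ADS",
        # Field names as documented around 2021; verify before use.
        "fields": "page_name,funding_entity,spend,demographic_distribution",
        "limit": limit,
        "access_token": access_token,
    }
    return f"{AD_LIBRARY_ENDPOINT}?{urlencode(params)}"


def audience_share_by_gender(ads):
    """Average the reported audience share per gender over a list of ad records."""
    totals = defaultdict(float)
    for ad in ads:
        for segment in ad.get("demographic_distribution", []):
            totals[segment["gender"]] += float(segment["percentage"])
    count = len(ads) or 1  # avoid dividing by zero on an empty result
    return {gender: share / count for gender, share in totals.items()}


# Hypothetical records, shaped like the demographic_distribution field.
sample_ads = [
    {"page_name": "Example Lender", "demographic_distribution": [
        {"age": "25-34", "gender": "male", "percentage": "0.5"},
        {"age": "25-34", "gender": "female", "percentage": "0.5"},
    ]},
    {"page_name": "Example Housing", "demographic_distribution": [
        {"age": "35-44", "gender": "male", "percentage": "0.75"},
        {"age": "35-44", "gender": "female", "percentage": "0.25"},
    ]},
]
```

Calling `audience_share_by_gender(sample_ads)` on the sample records gives a per-gender average that can be compared across campaigns; a strongly skewed share is a signal for closer inspection, not proof of discrimination.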

Limitations

  • No GUI (graphical user interface)
  • The user needs to have a potentially suspicious topic or campaign in mind.
  • The Facebook Ads API only gives access to ads in the Social Issues, Elections or Politics category.

Facebook Ads CrawlerBot

This technology is a Python script, built in-house, that simulates user-agent browsing behaviour in the Facebook news feed. The script is based on the Selenium WebDriver family and aims to scrape and collect ads in the form of screenshots and raw HTML ad source code. The methodology was tested by collecting ads based on different traits.

What can it be used for?

  • It can be used to investigate discriminatory ads shown to certain demographic groups (e.g. middle-aged female gamer), given the over-personalization of users' news feeds on Facebook.
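The collection loop described above can be sketched as follows. This is a minimal sketch, not the project's script: the function name, the output layout and the scrolling strategy are assumptions, and logging in, waiting for content to load and isolating the ad elements are deliberately left out. The function only relies on standard Selenium WebDriver methods (`get`, `execute_script`, `save_screenshot`, `page_source`).

```python
from pathlib import Path


def capture_feed(driver, persona, out_dir, scrolls=3,
                 url="https://www.facebook.com/"):
    """Scroll a feed and save one screenshot and one HTML dump per scroll step.

    `driver` is any object exposing the Selenium WebDriver methods used below.
    `persona` is a hypothetical label used only to name the output folder.
    """
    out = Path(out_dir) / persona
    out.mkdir(parents=True, exist_ok=True)
    driver.get(url)
    saved = []
    for step in range(scrolls):
        # Scroll one viewport down so that new feed items are requested.
        driver.execute_script("window.scrollBy(0, window.innerHeight);")
        png = out / f"step_{step:03d}.png"
        html = out / f"step_{step:03d}.html"
        driver.save_screenshot(str(png))
        html.write_text(driver.page_source, encoding="utf-8")
        saved.append((png, html))
    return saved


# Usage with a real browser (requires the selenium package and a Chrome driver):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   capture_feed(driver, "middle_aged_female_gamer", "captures")
#   driver.quit()
```

Passing the driver in as an argument keeps the loop independent of the browser, so the same function can be exercised with a stand-in object before it is pointed at a logged-in session.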

Limitations

  • No GUI (graphical user interface)
  • The user needs to have access to Facebook accounts that can represent the groups of interest at least once.

Google Ads PersonaBot

This technology is a Python script, built in-house, that simulates user-agent browsing behaviour on a set of predefined websites that serve as proxies for consumer traits. The script is based on the Selenium WebDriver family and aims to train user-agents representing consumer traits. The methodology was tested in a small set-up where screenshots were collected based on different traits.

What can it be used for?

  • It can be used to investigate discriminatory Google ads shown to certain demographic groups (e.g. religious elderly male) on popular websites.
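One way to picture the training idea is as a visit plan: a persona first browses websites that act as proxies for its trait, and only then visits the websites where ads are observed. The sketch below is an assumption about how such a plan could be represented, not the project's implementation. The `Persona` class, the proxy URLs and the `build_visit_plan` helper are hypothetical; the four target websites are the ones named in this README, with an assumed `https://www.` prefix.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Persona:
    name: str
    training_sites: list = field(default_factory=list)


# Hypothetical persona: the proxy sites are placeholders, not the project's list.
DEAL_HUNTER = Persona(
    "deal_hunter",
    ["https://example.org/coupons", "https://example.org/price-comparison"],
)

TARGET_SITES = [
    "https://www.nu.nl",
    "https://www.msn.com",
    "https://www.buienradar.nl",
    "https://www.bbc.com",
]


def build_visit_plan(persona, targets, rounds=2, seed=0):
    """Return (phase, url) pairs: shuffled training visits, then the targets."""
    rng = random.Random(seed)  # seeded so that a session can be reproduced
    plan = []
    for _ in range(rounds):
        sites = list(persona.training_sites)
        rng.shuffle(sites)
        plan.extend(("train", url) for url in sites)
    plan.extend(("measure", url) for url in targets)
    return plan
```

Each `("train", url)` entry would be opened by the browser driver to build up a browsing history, and a screenshot would be taken on each `("measure", url)` entry.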

Limitations

  • No GUI (graphical user interface)
  • The user needs to define the websites to investigate; we used:
    nu.nl, msn.com, buienradar.nl, bbc.com
  • The tool considers around 20 Personas. If the user wants a new Persona, it has to be added manually.
  • Some predefined Persona groups might not be 100% accurate.

Dutch Prices CrawlerBot

This technology is an automated web scraper, highly configurable and integrated with CI/CD GitHub deployments. The library, built in-house, uses a Python-based Selenium WebDriver to automatically visit popular Dutch retail websites and collect product information, including descriptions, prices and sellers. We tested the methodology by collecting data on more than 200 Dutch products.

What can it be used for?

  • It can be used to investigate algorithmic pricing and online commercial practices of different sellers on retail websites.
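A small part of such a pipeline is turning scraped price text into numbers before writing the table. The sketch below shows one possible way to do this, assuming the Dutch convention of a dot as thousands separator and a comma as decimal separator; the function names and the column layout are illustrative and not taken from the project. Websites that print prices with a decimal dot would need a different rule.

```python
import csv
import io
import re
from decimal import Decimal


def parse_dutch_price(text):
    """Convert a Dutch-formatted price such as '€ 1.299,00' to a Decimal."""
    cleaned = re.sub(r"[^\d,.\-]", "", text)  # drop currency sign and spaces
    cleaned = cleaned.replace(",-", ",00")     # '12,-' means twelve euros
    cleaned = cleaned.replace(".", "").replace(",", ".")
    return Decimal(cleaned)


def rows_to_csv(rows):
    """Write (seller, product, price_text) tuples as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["seller", "product", "price_eur"])
    for seller, product, price_text in rows:
        writer.writerow([seller, product, parse_dutch_price(price_text)])
    return buffer.getvalue()


# Hypothetical scraped rows, for illustration only.
example_rows = [
    ("shop-a", "Koffiebonen 1 kg", "€ 12,-"),
    ("shop-b", "Laptop", "€ 1.299,00"),
]
```

Using `Decimal` rather than `float` keeps prices exact, which matters when small differences between sellers or sessions are the object of the investigation.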

Limitations

  • No GUI (graphical user interface)
  • The tool only collects data and puts it in tabular format. The analysis is not included.
  • The tool only works for the following websites:
    ah.nl, bol.com, bonprix.com, coolblue.com, lidl.com, mediamark.nl
  • Web scraping technologies depend entirely on the HTML source code of the website. Therefore, when one of those websites changes, the tool will not work for the updated website.
  • The tool can be executed on the GitHub cloud or from a local device. If run locally, the IP address will be revealed. If the user wants to change the geolocation, this has to be done manually or by combining the tool with a VPN.

Repurposing BI tools

BI (business intelligence) technologies are licensed software for data analysis (e.g. Tableau, KNIME). These tools are popular because they allow big data manipulation with powerful visualizations without requiring coding skills. We tested them on consumer protection issues using the data extracted by the Dutch Prices CrawlerBot. The combination of both tools is an excellent example of how to enhance online consumer investigations.

What can it be used for?

  • It can be used to investigate algorithmic pricing by comparing the price trends of selected products among different sellers on retail websites.
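For readers who prefer a scripted check before opening a BI tool, the comparison can be approximated in a few lines. The sketch below is an illustration under assumptions: the tuple layout, the function name and the sample observations are hypothetical, and prices are given in euro cents to avoid rounding issues.

```python
from collections import defaultdict


def price_spread(observations):
    """For each product, compare the latest observed price per seller.

    `observations` holds (date, seller, product, price_cents) tuples with ISO
    dates, so that string comparison orders them chronologically.
    """
    latest = defaultdict(dict)  # product -> seller -> (date, price)
    for date, seller, product, price in observations:
        current = latest[product].get(seller)
        if current is None or date > current[0]:
            latest[product][seller] = (date, price)

    report = {}
    for product, sellers in latest.items():
        prices = {seller: price for seller, (_, price) in sellers.items()}
        report[product] = {
            "prices": prices,
            "cheapest": min(prices, key=prices.get),
            "spread": max(prices.values()) - min(prices.values()),
        }
    return report


# Hypothetical observations, for illustration only.
example_observations = [
    ("2021-03-01", "seller_a", "headphones", 5999),
    ("2021-03-08", "seller_a", "headphones", 6499),
    ("2021-03-08", "seller_b", "headphones", 5899),
]
```

The same grouping (product, seller, date) is what a BI dashboard would use for a price-trend chart; the script only reports the most recent snapshot.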

Limitations

  • It requires a paid license
  • The user must have ready-to-use, clean data on consumer-related issues.

Spoofing Extensions

"Spoof Timezone" and "User-Agent Switcher and Manager" are external open-source tools that can be used to fake one's location and user-agent (device, browser, operating system). The tools come as browser extensions; we have tested them and created tutorials for investigation purposes.

What can it be used for?

  • It can be used to conduct manual investigations of anything related to personalization on the web where the personalization algorithms rely specifically on the user's hardware (e.g. mobile device) or the user's location (e.g. Maastricht).
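The extensions are meant for manual use, but the underlying idea of presenting a different device to a website can also be shown in a script. The sketch below is an assumption-laden illustration using only the Python standard library: the profile names and user-agent strings are examples, and a plain HTTP request can only change headers such as `User-Agent` and `Accept-Language`. The timezone is exposed through JavaScript in the browser and is not covered here.

```python
from urllib.request import Request

# Hypothetical device profiles; the user-agent strings are illustrative examples.
PROFILES = {
    "desktop_windows": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36"
    ),
    "mobile_android": (
        "Mozilla/5.0 (Linux; Android 11; Pixel 5) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/90.0.4430.91 Mobile Safari/537.36"
    ),
}


def build_request(url, profile, language="nl-NL"):
    """Prepare a request that identifies itself as the chosen device profile."""
    headers = {
        "User-Agent": PROFILES[profile],
        "Accept-Language": language,
    }
    return Request(url, headers=headers)


# To send the request: urllib.request.urlopen(build_request(url, "mobile_android"))
```

Fetching the same page once per profile and comparing the responses is a simple first check of whether a website varies its content by device.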

Limitations

  • They only work in the Google Chrome browser.
  • These are browser extensions and do not collect data. If the user wants to collect data, this has to be done manually.

Emulators

An emulator allows the user to run programmes built for an operating system other than the one available on the user's device. Android or iOS emulators can be used on a desktop or laptop to simulate a user working on a mobile phone or tablet.

What can it be used for?

  • It can be used to conduct manual investigations of commercial practices in the application's interface (UI/UX, or user interface/user experience) that the consumer experiences due to personalization, for example dark patterns or personalized pop-ups (e.g. ads).

Limitations

  • Some are paid, others are free. Some need to be downloaded, others run online.
  • The legality of emulators is accepted in the US. In the EU, the matter is still unsettled. Yet it has been argued in legal scholarship that creating emulators for operating systems without the authorisation of the rightholder of the operating system is permitted under the conditions of Article 6 of the Software Directive.

Can consumer protection authorities use these technologies anonymously to conduct investigations? Yes.
More information on spoofing technologies is available in the "Undercover agent anonymity" section.



Webinar on Researching Discrimination in E-Commerce and Online Advertising:
4 and 5 March 2021: Original event
Webinar report: Read the report

DSRI: Data Science Research Infrastructure
Website
All our software is built and tested at scale on the DSRI cluster for computer science research.

M-EPLI: Maastricht European Private Law Institute
Website and M-EPLI talks

Related Work

AdAnalyst is a browser extension that collects information about Facebook Ads in order to investigate advertising practices at scale. This work is the result of a research project; learn more here.

OpenWPM is a web privacy measurement framework that makes it easy to collect data for privacy studies on a scale of thousands to millions of websites. OpenWPM is built on top of Firefox, with automation provided by Selenium. Learn more here.


People


Pedro V Hernandez Serrano | Data Scientist/Data Policy Advisor, Faculty of Science and Engineering
p.hernandezserrano@maastrichtuniversity.nl

Caroline Cauffman | Associate Professor in Consumer and Competition Law, Faculty of Law
caroline.cauffman@maastrichtuniversity.nl

Research Assistants

  • Kirill Shchervakov ‣ Computer Science Graduate
  • Pranav Bapat ‣ Computer Science Graduate
  • Laura Robinson ‣ Data Science Graduate