March 10, 2022
What even is a cookie?
Data collection techniques made possible by the internet: cookies, trackers and everything else.
In my last post, I touched on the internet's inherent capacity for collecting data and feeding it back to advertisers. I provided some context about the two-sided market model and how it changed and grew with each technological invention. Here, I explain further the technical capabilities for data collection that the internet allows for, which have so drastically changed the mechanics of the two-sided market model.
Online media can learn valuable information about their audiences, at both the aggregate and the individual level. The methods for collecting this information vary significantly in complexity, reach, and detail. The earliest ways of segmenting audiences on websites were very similar to the methods used in offline advertising. By placing ads on themed websites and even online directories, a business could reach a small segment of users who presumably shared those particular interests. By advertising on search engines against specific queries, a business could be seen by people whose query matched its offering. These rudimentary segmentation tactics were commonplace in the early days of online advertising (Evans, 2009), and more complex methods have gradually taken over.
User Provided Data
The most obvious mechanism for data collection by websites is user-provided data. Collection takes place when a visitor creates an account, fills out a form or purchases a product, actively providing their personal information to the website owner. This type of data is precious in the two-sided market model, and it stands as the most transparent form of data collection; it is the equivalent of a customer survey with significantly better reach. For Facebook, Google, Amazon and other large online platforms, it makes it possible to give advertisers access to very specific, highly segmented and accurately described groups of users.
“In the world of digital market and behavioural algorithms, we cannot ignore the value of this user-provided personal content. It is already considered as digital currency, often without full awareness of final users (consumers, data subjects).” (Malgieri, 2018, p. 119)
IP stands for Internet Protocol. Every internet connection is assigned an IP address, which reveals the user's approximate location, usually down to the city and sometimes more precisely than that. An IP address typically stays the same for some time, but Internet Service Providers reassign their customers' addresses at varying intervals, which limits how long a single address can identify the same user.
On a single website, an IP address can be used to recognize visitors over time (for example, to tell one-time visits from repeat visits) and to identify their geographic location. At an aggregate level, ad networks can match this information across multiple sites and identify patterns for a user or a household. Among the context information that advertising stakeholders leverage, location information is certainly one of them. However, when this information is not directly available from the end users, advertising stakeholders infer it using geolocation databases, matching IP addresses to a position on earth.
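To make the two uses of an IP address more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the geolocation table, the city names and the access log are all made up (the address ranges are ones reserved for documentation), and real geolocation databases are far larger and, as Callejo et al. (2021) discuss, not always accurate.

```python
import ipaddress
from collections import Counter
from typing import Optional

# Hypothetical geolocation table: address range -> city.
# The ranges are reserved for documentation and the cities are invented.
GEO_DB = {
    ipaddress.ip_network("192.0.2.0/24"): "Lisbon",
    ipaddress.ip_network("198.51.100.0/24"): "Toronto",
}


def locate(ip: str) -> Optional[str]:
    """Return the city whose range contains the address, if any."""
    address = ipaddress.ip_address(ip)
    for network, city in GEO_DB.items():
        if address in network:
            return city
    return None


# Hypothetical access log of a single website: one entry per page view.
access_log = ["192.0.2.17", "198.51.100.4", "192.0.2.17"]

# Counting how often each address appears separates one-time from repeat visitors.
visits = Counter(access_log)
repeat_visitors = {ip for ip, count in visits.items() if count > 1}

for ip in sorted(visits):
    print(ip, locate(ip), "repeat" if ip in repeat_visitors else "one-time")
```

The same lookup run by an ad network over the logs of many sites is what allows patterns to be matched for a user or a household.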
Another well-known means of collecting data online, and one used widely in the online advertising industry, is cookies. Although cookies are often spoken about in the media and the ad industry, there is no widespread understanding of what they are, how they work, and which kinds are in use today.
Cookies are small text files that a web server saves on a user’s computer, and many of them are necessary for the basic functioning of a website. Through cookies, a website can identify if a visitor is logged in, if they have changed settings or preferences on the site, or if they have items in their shopping cart, to name a few examples. These cookies are placed in the web browser by the website that the user is visiting and are known as first-party cookies. Another type of cookie saved on visitors’ computers is the third-party cookie, which is set by a domain other than the one the user is visiting. Any site can leave behind anything from ten to over a hundred cookies on a visitor’s computer, most of them coming from third parties.
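For readers who want to see what a cookie looks like in practice, the sketch below uses Python's standard `http.cookies` module to build the header a server might send to set a first-party cookie, and to read back the cookies a browser returns on a later request. The cookie names (`session_id`, `theme`) and their values are assumptions chosen for illustration, not taken from any particular website.

```python
from http.cookies import SimpleCookie
from typing import Optional


def build_set_cookie(session_id: str) -> str:
    """Build the response header a server could send to store a cookie."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id          # hypothetical cookie name
    cookie["session_id"]["path"] = "/"         # valid for the whole site
    cookie["session_id"]["max-age"] = 3600     # expires after one hour
    cookie["session_id"]["httponly"] = True    # hidden from page scripts
    return cookie.output(header="Set-Cookie:")


def read_cookie(cookie_header: str, name: str) -> Optional[str]:
    """Read one value from the Cookie header a browser sends back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(name)
    return morsel.value if morsel is not None else None


# First visit: the server asks the browser to remember the session.
print(build_set_cookie("abc123"))

# Later visit: the browser returns its stored cookies for this site.
returned = "session_id=abc123; theme=dark"
print(read_cookie(returned, "session_id"))
print(read_cookie(returned, "theme"))
```

This is how a site can tell that a visitor is logged in or has changed a preference: the value it stored earlier comes back with every request to the same site.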
Websites that include content from other providers, for example by displaying advertisement banners, embedding videos, or using pixels, widgets and other scripts, give those providers the opportunity to add tracking cookies to a visitor’s computer. Ad networks use these cookies to track users across websites, building a detailed picture of someone’s interests, purchases and interactions on all the websites where the network has a presence. This data is later aggregated and offered to advertisers for a profit.
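The cross-site tracking described above can be illustrated with a small simulation. This is a simplified sketch, not how any real ad network is implemented: the `AdNetwork` class, the `tracker_id` cookie name and the `.example` sites are all hypothetical, and a plain dictionary stands in for the cookies a browser stores for the ad network's domain.

```python
import uuid
from collections import defaultdict


class AdNetwork:
    """Hypothetical third party whose banners are embedded on many sites."""

    def __init__(self) -> None:
        # tracker ID -> list of sites where that browser loaded a banner
        self.profiles = defaultdict(list)

    def serve_banner(self, browser_cookies: dict, embedding_site: str) -> None:
        # The browser sends back any cookie this network set earlier.
        tracker_id = browser_cookies.get("tracker_id")
        if tracker_id is None:
            # First encounter: assign a random ID and store it as a cookie.
            tracker_id = uuid.uuid4().hex
            browser_cookies["tracker_id"] = tracker_id
        # Record which site the banner was loaded on.
        self.profiles[tracker_id].append(embedding_site)


network = AdNetwork()
browser_jar = {}  # cookies one browser holds for the ad network's domain

# The same browser visits three unrelated sites that embed the network's banner.
for site in ["news.example", "shoes.example", "travel.example"]:
    network.serve_banner(browser_jar, site)

profile = network.profiles[browser_jar["tracker_id"]]
print(profile)
```

Because the same cookie is returned on every site that embeds the banner, the three visits end up in a single profile even though the sites have nothing to do with one another.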
Other forms of data collection include tracking of mobile phones, interactions with virtual assistants, and access to publicly available or purchased databases. Data from these sources can be aggregated and matched to generate an even more complete picture of an individual. It is collected primarily to satisfy the data requirements of advertisers rather than to serve the users’ best interests.
- Bergemann, D., & Bonatti, A. (2015). Selling Cookies. American Economic Journal: Microeconomics, 7(3), 259–294. https://doi.org/10.1257/mic.20140155
- Callejo, P., Gramaglia, M., Cuevas, R., & Cuevas, Á. (2021). A deep dive into the accuracy of IP Geolocation Databases and its impact on online advertising. arXiv:2109.13665
- Evans, D. S. (2009). The Online Advertising Industry: Economics, Evolution, and Privacy. The Journal of Economic Perspectives, 23(3), 37–60. https://doi.org/10.1257/jep.23.3.37
- Hormozi, A. M. (2005). Cookies and Privacy. EDPACS, 32(9), 1–13. https://doi.org/10.1201/1079/45030.32.9.20050301/86855.1
- Malgieri, G. (2018). User-provided personal content in the EU: digital currency between data protection and intellectual property. International Review of Law, Computers & Technology, 32(1), 118–140. https://doi.org/10.1080/13600869.2018.1423887