Filter Bubbles and the Surveillance Model

This was co-authored with Anushree Gupta who blogs at

[Disclaimer: We don’t claim to know the exact workings of the internet so we are working with whatever knowledge we do have. Also, this touches upon quite a few things that are complexly interrelated and form feedback loops and vicious circles. Hence a linear flow may not always be found. Please bear with it.
Companies generally mean normal (non-internet) companies. “Website” has been used again and again instead of internet companies to keep the writers sane.]

The internet is basically just a lot of servers (data storage devices) that are interconnected, forming a network. Hence, interconnected + network = internet. Okay, maybe not, but essentially the internet isn’t a magical thing. It is just a collection of devices that are connected. Now, keeping all these things working requires many resources, which means there is a need for financial support for the infrastructure as well as the intellect working behind the scenes. In the current scenario, the Web as we know it is run by companies providing services, including social media, news and search among many others, to anyone who has access to the internet. Like regular companies, they need money to operate. So, how do these internet companies get money?

Unfortunately, in the case of the internet, the main financial source is advertising[1]. To understand why this is unfortunate, we’ll have to look at certain implications of using advertisements as a means to make money. Let us first look at why advertisements on an online platform are an amazing idea for companies. The main aim of companies when making advertisements is to reach as many people as possible. Websites provide quite a good medium to do this. The potential users (viewers of the advertisement) consist of pretty much anyone who has access to the internet, which is a humongous number. So, potentially, everyone who is part of the internet world may see your advertisement. Another reason why advertisers like the internet is that you can target your advertisements according to the age, sex, income, location, etc. of the users[2]. These two factors give websites a significant advantage over conventional media like newspapers or magazines, which print advertisements for a generic population.

Companies come to the Internet market looking for advertising space. Like the conventional channels of advertising, the Internet has some ground rules, and to capture a larger market share the companies need to make the right choices in selecting the websites on which to display their advertisements, as per their product. Both the number of websites and the number of companies wanting to advertise online are ever-growing. But we can safely assume that there are more websites than companies. Hence, the websites have to vie with other websites to sell their advertising space. So, they need to attract these companies. The only reason companies would choose one website over another is if the first could promise more outreach and maybe a more targeted outreach; basically better exposure and a solid guarantee that it will help the company gain more users and hence a bigger profit. So, this gives websites a pretty good incentive to get more people to visit, keep people on their sites longer, and figure out how to better target their users.

  1. How to get more people to visit the website? Advertise!
  2. How to keep people on the site longer? Give them content they’ll like!
  3. How to better target? Understand the user and advertise accordingly!

So, the first of these obviously leads to more adverts. What about the other two? The second way leads to what are referred to as filter bubbles while the third results in what has been called a surveillance model of the internet (more on this later).

A filter bubble is a result state in which a website algorithm selectively guesses what information a user would like to see based on information about the user (such as location, past click behavior and search history) and, as a result, users become separated from information that disagrees with their viewpoints, effectively isolating them in their own cultural or ideological bubbles. Prime examples are Google’s personalized search results and Facebook’s personalized news stream.

Now, what the filter bubble does is basically what it says: it creates a bubble around you that filters all that you see on the internet. What do you think that chat thing on the right of your Facebook page does? It divides your friends into two: the top part contains people you ‘interact’ with frequently, while the “More friends” part contains anyone else who happens to be online. And it is generally aligned and set so that you end up focusing only on the top section. Try stalking someone and that person will end up in the top section, as will any new friends you make. Chat enough and you’ll see that person there even if they aren’t online. “Connect with friends and the world around you on Facebook.” That’s taken straight from the Facebook login page. The ‘around you’ part is grossly under-emphasized. Like enough HONY (Humans of New York) pictures and your newsfeed is going to be bombarded with HONY pictures whenever any of your friends like one. The newsfeed is built in a way that excludes stuff that you don’t click on and includes stuff that you frequently click, like or comment on. What this means is that instead of giving you a “news” feed, it gives you something akin to a “what-Facebook-thinks-you-want-to-see” feed, thanks to the awesome algorithm working in the background.

Similar things happen in your Google search too. Try searching for the same thing when you are logged in and then when you aren’t. The results may vary drastically. This link detailing the differences in the news seen by Israelis and Palestinians provides a very good example of how filter bubbles work on news websites in much the same way as on social media and search engines. You don’t even have to look that far to see instances of filter bubbles. They are visible closer to home, in the popular posts and recent comments on Entelechy. They might not be intended that way but they do play very similar roles.
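To make the newsfeed example above concrete, here is a toy sketch in Python. It is emphatically not Facebook's actual algorithm, which we have no knowledge of; the function name `build_feed`, the `min_clicks` cutoff and the sample sources are all made up for illustration. It only shows the general idea: sources you never click on vanish, and sources you click on a lot float to the top.

```python
from collections import Counter

def build_feed(posts, click_history, min_clicks=1):
    """Toy engagement filter (hypothetical, for illustration only).

    posts: list of (source, title) tuples in chronological order.
    click_history: list of source names the user clicked or liked before.
    """
    clicks = Counter(click_history)
    # Sources the user never engaged with are dropped entirely: the "bubble".
    visible = [post for post in posts if clicks[post[0]] >= min_clicks]
    # Most-clicked sources come first; sorted() is stable, so ties keep their order.
    return sorted(visible, key=lambda post: clicks[post[0]], reverse=True)

posts = [
    ("LocalNews", "Council votes on water supply"),
    ("HONY", "Portrait of the day"),
    ("WorldNews", "Ceasefire talks resume"),
    ("HONY", "Another portrait"),
    ("FriendA", "Holiday photos"),
]
history = ["HONY", "HONY", "HONY", "FriendA"]

feed = build_feed(posts, history)
for source, title in feed:
    print(source, "-", title)
```

Note that the two news items never reach the user at all, and nothing in the output tells the user that anything was left out.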

Now, why do these things exist in the first place? There are two reasons for this. The first is the need to keep users on the site longer. The more comfortable and familiar the content you get, the longer you want to stay on the site. When the picture of the web that you get is one which has a lot of people who are shown to have similar interests, you want to know more and stay longer. The second is that filter bubbles give us more personalization, which also incentivizes a longer stay on that website. By restricting, or rather outright excluding, certain things from ‘your internet’, these filter bubbles put the proverbial blinders on you. It is not only what these websites actively choose to show you but also all the information that you don’t get, which is the most intimidating bit. Often, without the average user realizing it, the websites provide the user with a very skewed version of events (take the Israel-Gaza example), and that is the real demon in this entire situation.

Some people argue that this sort of personalization aids in the use of the internet, at least for the average user. Say, for example, that based on previous searches and other data, Google has determined that X is a travel-loving person. So, when X types Egypt to, say, find out general information about the country, the top search results that turn up all point to travel information and travel blogs, with very little information about the country per se. As a short-term consequence, X will be interested and will visit the websites that Google shows. It’s an added bonus if X is actually looking forward to travelling to the country, in which case the personalization is convenient. But in the long run, say X could not travel to Egypt; what happens then is that Google fails to provide the information that X wants and makes it cumbersome for X to search for it. Moreover, the personalization limits the scope of information that X would otherwise have had at his/her disposal, and this proves to be (albeit in a very minor way, with respect to this particular example) detrimental.

Now that we have understood the what and the why of this issue, let’s look at why filter bubbles are wrong. I could go into the vast array of social problems created due to this, but for now I’ll just stick to the principles on which the Web was made. One of the reasons the internet is such an awesome place is the openness and more importantly the connectivity it facilitates with anyone else in the world. You can pretty much connect with any other source that exists in the cyberworld. Anything “on the net” should be accessible. Now filter bubbles essentially go against this philosophy. What filter bubbles create are exactly what the name implies, little bubbles outside of which one doesn’t see anything. It is like restricting your world to a two-dimensional plane (Flatland anyone?). But you don’t know this because when you look around (Google stuff) all you see is the two-dimensional plane that Google creates for you, based on your previous searches. It becomes very difficult to find anything outside this filter bubble. The open World Wide Web is no longer truly open or world wide.

While looking into filter bubbles, our research led us to the mechanics of how they are implemented. Obviously these websites have to keep data about all your activity on the web to understand what you like and what you don’t. To get a good picture of you and to try and predict what you’ll like, the websites cannot afford to lose any data about you; every single click gets stored somewhere in the process. This leads to what Schneier calls the surveillance model of the internet. Schneier’s article is an absolutely gorgeous read, managing to cover a lot about such a big issue concisely, along with bringing in a new way of looking at things:

If these features don’t sound particularly beneficial to you, it’s because you’re not the customer of any of these companies. You’re the product, and you’re being improved for their actual customers: their advertisers.

This particular framing of the whole situation gives one goosebumps and is a striking revelation to anyone who uses the internet extensively. Also, this new outlook explains a lot of things regarding the surveillance model. The websites running on this model aren’t really doing much for us. They are doing all that they do for the companies that give them money.

Let’s not get pulled into a discussion of whether the companies should be doing such a thing (that would require a whole separate venue) and instead have a look at the general repercussions of this model in the context of all users. As mentioned earlier, for filter bubbles to work properly, a large amount of data needs to be stored about every user of the product. Now, just storing this data isn’t going to be enough. There is an obvious need to analyse all that data and try to figure out patterns of usage to implement the “filter bubbling” efficiently. The fact that there is a whole discipline dedicated to just doing this is a frightening prospect in itself. That discipline is Big Data: an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications.
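As a rough illustration of what "figuring out patterns of usage" can mean at the smallest possible scale, here is a sketch in Python that turns a log of clicks into a per-user interest profile. Everything in it is an assumption made for the example: the log format (user, topic), the function names `build_profiles` and `top_interests`, and the sample data. Real systems are vastly larger and more sophisticated, but the principle of counting what you click on is the same.

```python
from collections import Counter, defaultdict

def build_profiles(click_log):
    """Count how often each user clicked on each topic (hypothetical log format)."""
    profiles = defaultdict(Counter)
    for user, topic in click_log:
        profiles[user][topic] += 1
    return profiles

def top_interests(profile, n=2):
    """Return the n most-clicked topics with their share of all the user's clicks."""
    total = sum(profile.values())
    return [(topic, round(count / total, 2)) for topic, count in profile.most_common(n)]

# Made-up click log: every single click is kept.
click_log = [
    ("x", "travel"), ("x", "travel"), ("x", "food"), ("x", "travel"),
    ("y", "politics"), ("y", "sports"), ("y", "politics"),
]

profiles = build_profiles(click_log)
print(top_interests(profiles["x"]))
print(top_interests(profiles["y"]))
```

A profile like this is exactly what a filter such as the travel-loving X example would feed on: the more clicks are logged, the more confident the guess about what to show next.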

Without going into much detail about the mechanics of storing and analysing Big Data and the extraordinary investments made to do so, the main cause for concern should be the consequences of this process. The problem isn’t with Big Data itself; it is what is done with all this data that raises the scarier questions. The answer is an obvious and easy one: the results of the analysis feed the surveillance models of the particular companies, which ultimately results in more stringent and suffocating filter bubbles.

The primary consequence of Big Data analysis that calls for our attention is the definite loss of privacy due to the extensive data collection. So much of your data is in the hands of these companies that Schneier says:

I used to say that Google has a more intimate picture of what I’m thinking of than my wife does. But that’s not far enough: Google has a more intimate picture than I do. The company knows exactly what I am thinking about, how much I am thinking about it, and when I stop thinking about it: all from my Google searches. And it remembers all of that forever.


These two images show the default settings of personal data on Facebook in the years 2005 and 2010. Taken from an infographic that may be updated at any time. Content credits to Matt McKeon.

And if this isn’t a scary enough scenario, then think about the power that this gives the government over you. If it is possible for a company, however big, to collect so much data about you, governments with deeper pockets and more resources (especially legal) would easily be able to make more invasive intrusions into your private life. Instead of trying to reduce this problem of Big Data, Maciej Ceglowski says, arguments like this are being made in the companies’ decision-making processes:

If the algorithms don’t work, that’s a sign we need more data. If the algorithms do work, then imagine how much better they’ll work with more data. There’s only one outcome allowed: collect more data.

Both of these lead to a model of collecting more data.

These aren’t new problems that popped up yesterday, nor are people unable to come up with solutions. Some multi-faceted solutions have been suggested to take care of this problem.

  • Awareness (and concern) about the privacy issues among the general public.
  • Get purchasing power.
  • Government regulations
  • De-centralize and De-Americanize

Now, talking about all of these things is, of course, a very easy thing to do. Creating awareness definitely seems to be a good idea. Unless people know that there exists a problem, no one is going to do anything about it. Further, we can always hope that aware and concerned people will succeed at getting a few changes that wouldn’t have happened otherwise. Getting purchasing power is a solution to the problem that we have become the products of these companies and are no longer the customers. The whole idea behind getting some purchasing power is the very simple economics principle of supply and demand. We are the supply currently. The websites use us while all the time claiming to be working for us. This status quo needs to change. Whatever the way in which we get it, we need some sort of purchasing power to counter this model in which we are the ignorant products.

The two solutions I have talked about are mainly related to the issue of advertisements. The last two in the list are more focused on the surveillance model. Regulation primarily tries to control or at least limit the quantity and quality of the information that is stored by websites. Limiting the collection of data altogether is going to be tiresome and absolutely impractical. What can be done is to try and limit the kind of ‘behavioral data’ (browsing history, clicks, etc.) that is stored and for how long it is stored. Also, there is a need for openness about the kind of data that is being stored. I mean, I have no idea what Facebook knows about my life. There should be a mechanism via which I can find this out and, if I want, delete it. Freedom is fine as long as it stays on the internet, but when the freedom on the internet starts to transcend boundaries and enter the physical world, with implications that are much more visible and tangible, it can end up causing problems. This is the part where regulation should come into play.
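The three mechanisms suggested above (a time limit on stored behavioral data, a way to see what is stored about you, and a way to delete it) are simple to express in code, even if they are hard to enforce in practice. The Python sketch below is purely illustrative: the 90-day limit, the event format and the function names are assumptions of ours, not any real regulation or any company's actual system.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=90)  # assumed limit, chosen only for the example

def purge_old_events(events, now, retention=RETENTION):
    """Keep only behavioral events newer than the retention limit."""
    cutoff = now - retention
    return [event for event in events if event["timestamp"] >= cutoff]

def export_user_data(events, user):
    """Openness: return everything stored about one user."""
    return [event for event in events if event["user"] == user]

def delete_user_data(events, user):
    """Deletion on request: drop everything stored about one user."""
    return [event for event in events if event["user"] != user]

# Made-up stored events for two hypothetical users.
events = [
    {"user": "asha", "kind": "click", "timestamp": datetime(2014, 8, 20)},
    {"user": "asha", "kind": "search", "timestamp": datetime(2014, 1, 5)},
    {"user": "ravi", "kind": "click", "timestamp": datetime(2014, 8, 30)},
]
now = datetime(2014, 9, 1)

kept = purge_old_events(events, now)      # the January search is too old and is dropped
asha_data = export_user_data(kept, "asha")
after_delete = delete_user_data(kept, "ravi")
print(len(kept), len(asha_data), len(after_delete))
```

The hard part, of course, is not writing these few lines but getting a company to run them honestly, which is why this falls under regulation.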

Now, decentralization and de-Americanization are two alternatives that not only hope to get rid of some problems but also try to take the internet back to its original ideal of a decentralized global thing. Explaining decentralization or de-Americanization properly would require an article unto itself, but the whole argument for decentralizing has two facets. The first is principle based and goes back to the way the internet was supposed to be. The internet is one of the most democratic tools that humans have ever imagined and yet the start of this article says that “…the Web as we know it is run by companies providing…”. That goes against the whole idea of a free and open web for all. The second is a more practical one: a concern about the amount of power that this handful of companies has quietly amassed and the possible misuse of it. When we are so dependent on these companies to keep the internet working, if they ever go bad or if they decide to make policies that we aren’t happy with, we’ll not be able to do much about it. This problem has already started taking effect in the case of Facebook and Google, where we just accept whatever changes they make in their policies, because we are inevitably dependent on them.

These solutions are not simple in themselves. The root reason behind attempting them may be easy enough to figure out, but understanding their consequences, to make sure that they don’t lead to more problems, is not. Also, considering the winner-takes-all market structure of the internet, figuring out how to implement these solutions properly and efficiently is a complex task. It is a very messed up model, full of pitfalls that are very easy to fall into, both with respect to filter bubbles and the model as a whole.

This whole discussion arose from the question of how the internet functions and runs, or more accurately the economic model that it uses. Although there are no other obvious or easy alternatives, it should be pretty clear by now that the current model that is followed is a very problematic one and needs to be fixed.


  1. The Internet’s Original Sin – Ethan Zuckerman
  2. Maciej Ceglowski’s Talk: The internet with a human face(transcript):
  3. Surveillance as a Business Model – Schneier on Security


Other interesting links:

Israel/Gaza link:

Filter Bubble TED Talk:


One thought on “Filter Bubbles and the Surveillance Model”

  1. Now, see, at the end of the day, Facebook and Google are indeed businesses. The guys who are coding, sitting in their offices in Mountain View, are not going to think that one lone guy half a world away is going to want to look at the world not from the bubble but from the outside of the bubble.

    As you guys said, they know that a common man goes to Facebook and chats with friends, a common man goes to Google Search and types the place he/she wants to visit, a common man goes to YouTube and clicks on the ‘recommended for you’ just to get the same kind of music or video he/she likes.

    Internet isn’t a utility anymore. It has been exploited deeply and thus it has become a cover. Compare people going to Quora and Github to people going to Facebook and Youtube everyday. Of course, they want to increase the number! Because they are already getting more than half of the total population of the world! People are not intelligent! They just play by the tools given by these giants and give their information happily in return.

    However, I wanted to know whether the information I give to Facebook or Google (or even Yahoo for that matter) gets shared. Suppose I’ve something in my G+ profile and my previous searches that makes Google give a certain kind of result to me. So, Google filters the information and gives me the desired one. So, does this ‘desired one’ have access to my information or a part of it?

    Nicely written, by the way!
