Data Analysis, Models, and Knowledge (Applied Epistemology)

Posted on February 21, 2013 by

Most of my recent philosophical research has been in epistemology, the theory of knowledge. Knowledge is a familiar concept to most people, but mainstream epistemology has turned it into something shared only within a small circle of philosophers. As a consequence, epistemologists have robbed common folk of knowledge!


The transformation of knowledge into a “philosophers-only” concept sometimes irritates me. I want to study epistemology to understand knowledge and belief so that I can put that understanding to work in everyday matters. But the concepts analyzed in mainstream epistemology have restricted domains. Because of those restrictions, much of the philosophical knowledge I have does me little good when I try to apply it to practical matters.

Recently, however, I have imagined applying some of the traditional epistemological concepts to a growing field that has crossed over the domains of business, economics, politics, social networking, and many other things. The field that I’m referring to is data analysis. With increasing numbers of Internet users in the past decade, there has been a large flow of data in cyberspace. Because there is so much data available on the Internet, data analysis has become essential to managing the growth of information.

With all of the data on the web, businesses have found that they can use it to predict behaviors and trends. This information is crucial for marketing, strategic planning, and development. But there is a problem that everyone should be aware of. The overwhelming amount of data makes it difficult to find real patterns. Having so much data is like having stacks of unorganized files in a warehouse and hoping to fish a single document out of the mess. So what is the solution for sifting through the mess? Data analysis and data mining. Data analysts use specific methods to identify patterns and trends. They build models that filter out the bad data and locate the important data.

OK, so how is epistemology relevant to data analysis? Sorting through information is an idea that epistemology welcomes. Epistemologists study the quality of the information that humans have. Quality matters for making good inferences in our daily lives, so it is important to understand what type of information is reliable enough to infer from. The obvious response from the epistemologist is that knowledge, not belief or opinion, is the best type of information one can have. Knowledge is key to making good inferences. But acquiring knowledge is not an easy task. Humans have to acquire good information in order to have knowledge.

Acquiring good information may be explained through a view called reliabilism. Reliabilism holds that an agent acquires knowledge, in part, through reliable belief-forming processes. Alongside reliabilism, we can add virtue epistemology. Virtue epistemology broadly holds that there are valuable intellectual virtues with “getting things right” at their core. By combining reliable belief-forming processes with intellectual virtues, we can claim that an agent has knowledge when she forms a belief via a reliable belief-forming process and gets it right.

In following the same line of thought, we might say that data analysis is similar. I think that the same concepts apply. First, data analysts want to get things right. Not only do their jobs depend on it, but I can’t imagine that they hope that they are wrong. So the intellectual virtue of getting things right applies to data analysis. Second, and this is the more difficult aspect, data analysis depends on reliable processes that will achieve the intellectual virtue of getting things right.

Data analysts build models that take in data and produce an output. Models are the mechanisms that generate the information from which we form our beliefs about whatever is being analyzed. If a model gets disrupted or is faulty to begin with, we can say that it is unreliable. And if the model is unreliable, then it won’t get things right (barring any sort of luck). So it is important to have good models that are reliable and get things right so that we can acquire knowledge from them. Otherwise, why even bother doing data analysis if it doesn’t produce good information?
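
To make the idea of a reliable model a bit more concrete, here is a minimal Python sketch. It assumes, purely for illustration, that reliability can be approximated as the share of cases in which a model’s output matches what is actually true. The function name `reliability`, the two toy models, and the data are all hypothetical and are not drawn from any real analysis.

```python
from typing import Callable, Sequence


def reliability(
    model: Callable[[float], bool],
    inputs: Sequence[float],
    truths: Sequence[bool],
) -> float:
    """Return the fraction of cases where the model's output matches the truth."""
    if len(inputs) != len(truths) or len(inputs) == 0:
        raise ValueError("inputs and truths must be non-empty and the same length")
    hits = sum(model(x) == t for x, t in zip(inputs, truths))
    return hits / len(inputs)


# Hypothetical model 1: responds to the data it is given.
def threshold_model(x: float) -> bool:
    return x > 0.5


# Hypothetical model 2: ignores the data and always answers True.
def constant_model(x: float) -> bool:
    return True


# Made-up observations and the (assumed known) true answers for each.
inputs = [0.1, 0.4, 0.6, 0.9]
truths = [False, False, True, True]

print(reliability(threshold_model, inputs, truths))
print(reliability(constant_model, inputs, truths))
```

On this toy data the threshold model matches the truth in every case, while the constant model is right only half the time, and only because half of the true answers happen to be True. A hit rate on a small sample is of course a rough stand-in for the reliabilist’s notion of a reliable process, but it shows the kind of check the argument has in mind.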

Assuming that data analytics and models are reliable, there is much instrumental value in data analysis and model construction. Because the world is rapidly transforming as the amount of data on the web increases, we are beginning to change how we learn. Examples of this include reading up-to-date news, Googling things that one doesn’t understand, and staying in the loop with family and friends through social networking. It is not unreasonable to assume that in the future, we will become entirely reliant on the Internet for all of our information. But as I have said, there is good and bad information out there. So the best way for us to get the good information is for data analysts to embrace the philosophical views presented above when developing models and analyzing data. These sound practices will significantly aid our ability to acquire knowledge. Eventually, knowledge is going to depend on more than just what’s in our heads. It’s going to depend on mechanisms external to us. How good those mechanisms are will depend on their reliability.

If what I have said so far is plausible and knowledge in the future will depend on having reliable external mechanisms, then there is an open philosophical question to ask. Should we treat models and data analytics as a source of testimony, or should we treat them as an extension of us? In treating models and data analytics as a source of testimony, we would regard them the same way we regard a witness in a courtroom, a teacher in the classroom, or a bystander in the street giving us directions. In treating models and data analytics as an extension of us, we would be admitting that our knowledge and cognition are not located just in the head. Our mental lives step out into the environment and hook on to external objects. Both conceptual treatments are highly controversial, but I’m going to leave it at that for now.