
Data Module #1: What is Research Data?

Difference Between Data & Statistics

While the terms ‘data’ and ‘statistics’ are often used interchangeably, in scholarly research there is an important distinction between them.  

Data are individual pieces of factual information recorded and used for the purpose of analysis. They are the raw information from which statistics are created. Statistics are the results of data analysis, that is, the interpretation and presentation of the data. In other words, some computation has taken place that provides some understanding of what the data mean. Statistics are often, though not always, presented in the form of a table, chart, or graph.
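As a minimal illustration, the Python sketch below treats a short list of hypothetical daily rider counts (invented values, not real data) as the raw data and computes a few summary statistics from them with the standard-library `statistics` module.

```python
from statistics import mean, median

# Hypothetical raw data: riders counted on each of seven days (invented values).
daily_riders = [3120, 2980, 3345, 3010, 2875, 1990, 1840]

# Statistics: summaries computed from the raw data above.
summary = {
    "days_observed": len(daily_riders),
    "mean_riders": round(mean(daily_riders), 1),
    "median_riders": median(daily_riders),
    "min_riders": min(daily_riders),
    "max_riders": max(daily_riders),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Each number in `daily_riders` is a single datum; each entry in `summary` is a statistic, because it is the result of a computation over the data.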

Both statistics and data are frequently used in scholarly research. Statistics are often reported by government agencies, for example unemployment statistics or educational literacy statistics. These types of statistics are often referred to as 'statistical data'.

A Closer Look

Using the Twin Cities Light Rail Transit Green Line as an example, let's briefly explore the difference between data and statistics.

FROM DATA...
The Metro Transit Commission (MTC) provides raw data files for many of its operations. These data are made available in a machine-readable format, so they can be used directly with statistical analysis software.

...TO STATISTICS
These raw data were used to create a graph of average daily boardings by month on the Green Line in a news story published online by the Star Tribune on July 27, 2015. This statistical information helped readers make sense of the raw data. Remember, statistics are created once data have been analyzed and computations performed, and they are often (although not always) presented as a table, chart, or graph. In this case, the Star Tribune reporter used the graph to show the average number of riders per day who boarded the LRT at each station along the Green Line during each month of 2014. The raw MTC data, which counted the number of riders who boarded at each stop on each day, were used to calculate the average daily number of riders boarding at each station during each month.

SOURCE: “Green Line LRT Ridership Surpasses Blue Line in 2015.” Star Tribune, July 27, 2015. Accessed November 14, 2016. https://www.startribune.com/green-line-lrt-ridership-surpasses-blue-line-in-2015/309420891/.
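The step from daily counts to monthly averages in the Green Line example can be sketched in Python. This is a minimal illustration under stated assumptions: the CSV layout, the column names `date`, `station` and `boardings`, the station names, and all counts are hypothetical and are not the actual format or contents of the MTC files. The sketch groups each daily row by station and month, then divides each group's total boardings by its number of days.

```python
import csv
import io
from collections import defaultdict

# Hypothetical machine-readable raw data: one row per station per day.
# Column names, station names, and counts are invented for illustration;
# they are NOT the actual layout or contents of the MTC files.
RAW_CSV = """date,station,boardings
2014-01-02,Station A,410
2014-01-03,Station A,390
2014-01-02,Station B,1210
2014-01-03,Station B,1150
2014-02-03,Station A,455
2014-02-04,Station A,425
2014-02-03,Station B,1302
2014-02-04,Station B,1248
"""


def monthly_average_boardings(csv_text):
    """Return average daily boardings for each (station, month) pair."""
    totals = defaultdict(int)  # (station, "YYYY-MM") -> summed boardings
    days = defaultdict(int)    # (station, "YYYY-MM") -> number of daily rows
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["station"], row["date"][:7])  # "2014-01-02" -> "2014-01"
        totals[key] += int(row["boardings"])
        days[key] += 1
    return {key: totals[key] / days[key] for key in totals}


averages = monthly_average_boardings(RAW_CSV)
for (station, month), avg in sorted(averages.items()):
    print(f"{month}  {station}: {avg:.1f} average daily boardings")
```

Because the divisor is the number of days that actually appear in the file, a month with missing days is averaged over the observed days only; a real analysis would need to decide how to handle such gaps.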

Data Tools

A number of statistical data analysis and visualization software tools are available on campus, including the following:

ATLAS.ti
Licensed software used mostly, but not exclusively, for qualitative research and qualitative data analysis. It is available in many public and departmental computer labs on campus as well as on library computer workstations.

JMP
Data analysis and visualization software package used in many scientific fields. It is available for faculty and students to download, and it is installed in most computer labs on campus as well as on library computer workstations.

R
Programming language and development environment for statistical computing and graphics. Available in public computer labs and on library workstations.

SPSS
Statistics and data analysis program available on most public lab computers and on library workstations.