
Data Module #1: What is Research Data?

Types of Research Data

Research data can be grouped into four main types based on how they are collected: observational, experimental, simulation, and derived. The type of data you collect may affect how you manage it. For example, data that are hard or impossible to replace (e.g. the recording of an event at a specific time and place) require extra backup procedures to reduce the risk of data loss. Similarly, if you need to combine data points from different sources, you should follow best practices to prevent data corruption.

Observational Data

Observational data are captured by observing a behavior or activity. They are collected through methods such as human observation, open-ended surveys, or instruments and sensors that monitor and record information, for example sensors that measure noise levels at the Minneapolis/St. Paul airport. Because observational data are captured in real time, they would be very difficult or impossible to re-create if lost.

Experimental Data

Experimental data are collected through active intervention by the researcher, who alters a variable to produce and measure change or to create a difference. Experimental data typically allow the researcher to establish a causal relationship, and the results can often be generalized to a larger population. Experimental data are often reproducible, but repeating an experiment can be expensive.

Simulation Data

Simulation data are generated by using computer models to imitate the operation of a real-world process or system over time. Simulations are used, for example, to predict weather conditions, economic trends, chemical reactions, or seismic activity. The goal is to determine what would, or could, happen under certain conditions. The model used is often as important as, or even more important than, the data the simulation generates.
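One reason the model matters so much is that, if the model code, its parameters, and its random seed are kept, the output can often be regenerated. The sketch below is a deliberately simplistic, hypothetical "toy" model (not a real weather model): each day's temperature is the previous day's plus random noise. Using a dedicated, seeded random generator makes the run repeatable.

```python
import random


def simulate_daily_temps(days, start_temp, seed):
    """Toy model (hypothetical): today's temperature = yesterday's + random noise."""
    rng = random.Random(seed)  # seeded generator, so the same seed gives the same run
    temps = [start_temp]
    for _ in range(days - 1):
        # Add normally distributed noise with mean 0 and standard deviation 1.5.
        temps.append(temps[-1] + rng.gauss(0.0, 1.5))
    return temps


# Same model, same parameters, same seed: identical simulated data.
run1 = simulate_daily_temps(days=7, start_temp=20.0, seed=42)
run2 = simulate_daily_temps(days=7, start_temp=20.0, seed=42)
print(run1 == run2)
```

In practice this means that documenting and preserving the model, its input parameters, and the seed can matter as much as storing the simulated output itself.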

Derived / Compiled Data

Derived data are created by transforming existing data points, often from different sources, into new data, for example by applying an arithmetic formula or aggregating values. For instance, population and land-area data for the Twin Cities metro area can be combined to calculate population density (population divided by area). Derived data can usually be re-created if lost, but doing so may be very time-consuming and possibly expensive.
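The population-density example is a simple formula applied after joining two sources. The sketch below assumes two hypothetical source tables keyed by region name, filled with made-up figures (not real Twin Cities data). It shows how a derived dataset is produced from existing data.

```python
# Hypothetical source tables; in practice these would come from two
# different datasets (e.g. a population table and a land-area table).
population = {"Region A": 120_000, "Region B": 45_000}   # people (illustrative)
area_km2 = {"Region A": 150.0, "Region B": 300.0}        # square kilometres (illustrative)


def population_density(population, area):
    """Join two sources on region name and derive people per square kilometre."""
    density = {}
    for region, people in population.items():
        if region not in area:
            # Skip regions missing from the second source rather than guessing a value.
            continue
        density[region] = people / area[region]
    return density


density = population_density(population, area_km2)
print(density)
```

Because the result depends only on the inputs and the transformation, keeping both the source files and the script is what makes derived data replaceable. The explicit check for regions missing from one source is one simple way to avoid introducing errors when combining datasets.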