Data Modules
Table of Contents
#1 - What is Research Data?
#2 - Planning for Your Data Use
#3 - Finding & Collecting Data
#4 - Keeping Your Data Organized
#5 - Intellectual Property & Ethics
#6 - Storage, Backup, & Security
#7 - Documentation
Module created by Aaron Albertson, Beth Hillemann, & Ron Joslin.
Finding and collecting data is an early and important stage in all data lifecycle models. There are three broad strategies for obtaining data: collecting the data yourself, using data collected by someone else, or combining the two. In this module, we discuss each strategy in more detail and provide best practices for each.
Summary of Data Collection Strategies
Collecting the Data Yourself
When collecting the data yourself, you should design a data collection plan that covers why, when, and how you will collect your data. Collection methodologies vary widely and include running experiments in a lab, observing research subjects, and conducting interviews or focus groups. Planning helps ensure, as much as possible, that the data you collect will match your research needs. It also improves accuracy and integrity by giving you greater control over, and understanding of, the data collection process. As a student, you may or may not have the time or means to collect your own data. Evaluate your situation to determine whether collecting your own data is feasible.
Using Existing Data
You might find that using existing data is the most efficient strategy for acquiring the data you need. A lot of data can be found freely online, in print, or for purchase. If the data you need is readily available, this can greatly speed up your research process. However, you may not be able to find existing data that completely matches your research needs. You might need to adjust your research question or make do with less-than-perfect data. In addition, you are less likely to be aware of any problems that occurred during the data collection process. It is important to thoroughly evaluate existing data and its documentation to assess its quality and to see whether it meets your research needs.
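One simple first step in evaluating an existing dataset is to count how many values are missing in each column. The sketch below assumes the data is available as CSV text. The sample rows and the function name `count_missing` are hypothetical and serve only as an illustration. It is a starting point, not a complete quality assessment.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded dataset (not real data).
SAMPLE_CSV = """port,year,cargo
Port A,2020,1500
Port B,2020,
Port C,,1200
"""


def count_missing(csv_text):
    """Return {column name: number of empty cells} for CSV given as text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row[name]
            # A cell counts as missing if it is absent or only whitespace.
            if value is None or value.strip() == "":
                missing[name] += 1
    return missing


print(count_missing(SAMPLE_CSV))
```

Columns with many missing values are worth checking against the dataset's documentation before you rely on them.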
Combination of Collecting Your Own Data and Using Existing Data
The best strategy might be a combination of the first two. Collecting some data yourself can overcome or lessen the impact of any shortfalls in the existing data you are using. For example, you might be able to collect a variable that is missing from the existing data. Keep in mind that when you combine datasets into one large dataset, the variables must be equivalent. For instance, you may have two datasets on the tonnage of cargo shipped out of major ports, one from the United States and the other from China. Different countries may define a ton differently: a short ton is 2,000 pounds, a long ton is 2,240 pounds, and a metric ton (tonne) is 1,000 kilograms. Be sure you are combining variables that reflect the same unit of measurement. Combining your own data with existing data can be time-consuming or costly. You should consider whether a mixed-strategy approach will result in enough benefit to justify the extra effort.
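The unit problem described above can be handled by converting every dataset to one common unit before combining. The following is a minimal sketch, assuming each dataset's documentation states which kind of ton it uses. The dataset names, port names, and cargo figures are hypothetical and are not real port statistics.

```python
# Standard definitions: 1 short ton = 2,000 lb, 1 long ton = 2,240 lb,
# 1 lb = 0.45359237 kg, 1 metric ton (tonne) = 1,000 kg.
KG_PER_LB = 0.45359237
TONNES_PER_UNIT = {
    "short_ton": 2000 * KG_PER_LB / 1000,
    "long_ton": 2240 * KG_PER_LB / 1000,
    "tonne": 1.0,
}


def to_tonnes(amount, unit):
    """Convert a cargo weight to metric tons; fail loudly on unknown units."""
    if unit not in TONNES_PER_UNIT:
        raise ValueError(f"Unknown unit: {unit!r}")
    return amount * TONNES_PER_UNIT[unit]


def combine(datasets):
    """Merge rows from several datasets into one list with a common unit."""
    combined = []
    for source, unit, rows in datasets:
        for port, amount in rows:
            combined.append({
                "source": source,
                "port": port,
                "cargo_tonnes": round(to_tonnes(amount, unit), 1),
            })
    return combined


# Hypothetical figures for illustration only.
dataset_a = ("dataset_a", "short_ton", [("Port A", 1000.0)])
dataset_b = ("dataset_b", "tonne", [("Port B", 1000.0)])

for row in combine([dataset_a, dataset_b]):
    print(row)
```

Keeping the `source` field in each combined row makes it possible to trace any value back to its original dataset if a question about units comes up later.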