Data Module #7: Documentation

Describing the Analysis Data

Analysis Data

What Information You Should Record

Create metadata for the data used in your analysis. Some of this metadata may be the same as the metadata for your original data (e.g., author). However, you will also create metadata about any changes you made to the original data and about the steps you took for analysis. For example, if you have altered your original data, record or describe the steps taken to alter it (e.g., changing Fahrenheit to Celsius, merging data files into one, removing extraneous data you are not using). Also provide metadata about what you did for your analysis (e.g., the computer code used to run the analysis, any new variables created). Finally, include citations for your original data files.
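
One way to keep this kind of record is to log each change at the moment you make it. The following is a minimal Python sketch, not a prescribed method: the variable names (temp_f, temp_c), the sample rows, and the layout of the metadata record are all assumptions made for illustration.

```python
import json


def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


# Hypothetical original data: two temperature readings in Fahrenheit.
original_rows = [
    {"day": "Mon", "temp_f": 32.0},
    {"day": "Tue", "temp_f": 212.0},
]

# Every change made to the original data gets one entry in this log.
change_log = []

# Step 1: build the analysis data with temperatures in Celsius.
analysis_rows = [
    {"day": row["day"], "temp_c": fahrenheit_to_celsius(row["temp_f"])}
    for row in original_rows
]
change_log.append({
    "step": 1,
    "description": "Converted temp_f (Fahrenheit) to temp_c (Celsius)",
    "formula": "temp_c = (temp_f - 32) * 5 / 9",
    "new_variables": ["temp_c"],
    "dropped_variables": ["temp_f"],
})

# Assumed layout for the analysis metadata: a citation placeholder plus the log.
analysis_metadata = {
    "original_data_citation": "PLACEHOLDER: citation for the original data file",
    "changes": change_log,
}

# Text that could be saved as a metadata document next to the analysis data.
metadata_text = json.dumps(analysis_metadata, indent=2)
print(metadata_text)
```

Logging the change in the same script that makes it keeps the description and the actual transformation from drifting apart.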


As you did with the original data, keep the analysis metadata with the analysis data files. A simple way to do this is to create a document containing your analysis metadata and keep it in the folder for your analysis data files. If there are multiple metadata documents or files, you might want to create a subfolder called "Analysis Metadata." For some projects, you might consider separating the metadata about your analysis data from the metadata about your analysis process. Project TIER provides more detailed recommendations, particularly for more complex projects.
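
The folder layout described above can also be set up with a short script. This is only a sketch under stated assumptions: the folder name "Analysis Data" and both file names are hypothetical, and the example writes into a temporary directory so it does not touch a real project.

```python
import tempfile
from pathlib import Path


def create_analysis_folders(project_root):
    """Create an analysis data folder with an "Analysis Metadata" subfolder."""
    analysis_dir = Path(project_root) / "Analysis Data"
    metadata_dir = analysis_dir / "Analysis Metadata"
    metadata_dir.mkdir(parents=True, exist_ok=True)

    # Two hypothetical documents: one describing the analysis data,
    # one describing the analysis process.
    (metadata_dir / "analysis_data_description.txt").write_text(
        "Variables, units, and citations for the original data files.\n",
        encoding="utf-8",
    )
    (metadata_dir / "analysis_process_description.txt").write_text(
        "Steps taken to build the analysis data from the original data.\n",
        encoding="utf-8",
    )
    return metadata_dir


# Use a temporary directory as a stand-in for the project folder.
with tempfile.TemporaryDirectory() as project_root:
    metadata_dir = create_analysis_folders(project_root)
    created_files = sorted(p.name for p in metadata_dir.iterdir())
    print(created_files)
```

Keeping the data description and the process description in separate files mirrors the optional split between metadata about the analysis data and metadata about the analysis process.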

Research Scenario

Research Scenario, Part 3

Data Analysis Metadata Description

For our voter participation research project, we analyzed polling data from the 2016 presidential election and voter participation rate data from before 1974, when same-day voter registration went into effect in Minnesota. We created a folder for our analysis data, and within that folder we created an analysis metadata folder. In a document saved in the analysis data folder, we described our analysis data: the variables (from both the polling data and the voter participation data) and citations for the two original data files. In a second document, we described the process we followed to create our analysis data file. For example, we did not use all of the variables from the original data, so we selected only data that met particular criteria: polling data for those who registered on voting day in 2016, and participation rate data from 1960 to 1973.
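
The selection criteria in this scenario could be applied and documented together, as in the Python sketch below. Everything in it is illustrative: the column names, the helper functions, and the rows are hypothetical stand-ins for the original files, and the participation rates are left empty rather than filled with real or invented figures.

```python
# Hypothetical rows standing in for the original files. These are
# placeholders for illustration, not real election data.
participation_rows = [
    {"year": 1956, "participation_rate": None},
    {"year": 1960, "participation_rate": None},
    {"year": 1972, "participation_rate": None},
    {"year": 1976, "participation_rate": None},
]
polling_rows = [
    {"respondent_id": 1, "election_year": 2016, "registered_on_election_day": True},
    {"respondent_id": 2, "election_year": 2016, "registered_on_election_day": False},
]


def select_participation(rows, first_year=1960, last_year=1973):
    """Keep participation rate rows whose year is within the given range."""
    return [r for r in rows if first_year <= r["year"] <= last_year]


def select_same_day_registrants(rows, election_year=2016):
    """Keep polling rows for people who registered on voting day that year."""
    return [
        r for r in rows
        if r["election_year"] == election_year and r["registered_on_election_day"]
    ]


selected_participation = select_participation(participation_rows)
selected_polling = select_same_day_registrants(polling_rows)

# Notes for the second metadata document (the analysis process description).
process_notes = [
    "Kept participation rate data from 1960 to 1973 "
    f"({len(selected_participation)} of {len(participation_rows)} rows).",
    "Kept polling data for respondents who registered on voting day in 2016 "
    f"({len(selected_polling)} of {len(polling_rows)} rows).",
]
print("\n".join(process_notes))
```

Recording how many rows each criterion kept gives a reader of the process document a quick way to check that they have reproduced the same analysis data file.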