Data Module #7: Documentation

Describing the Original Data

What Information You Should Record

Record metadata for all data, whether you collected it yourself or used someone else's.

Metadata should include:

  • Who collected the data (might be multiple people or even an organization)
  • When it was collected
  • How it was collected
  • A basic description
  • How to access/find it
  • Names and definitions of variables

Record anything a researcher would need in order to use your data. The specifics will vary depending on your data, but might include coding schemes, units of measurement, details of the sampling method and weight variables, descriptions of how any variables were constructed, how to access any additional metadata, or anything else you think would be helpful for a researcher.

Existing Data

If you are using existing data, always include a basic citation for the data. For example, here is a citation for data published by the National Institute of Education:

National Institute of Education (1992) Safe School Study, 1976-1977. Inter-university Consortium for Political and Social Research. https: //

Note the use of a DOI (digital object identifier). If one is available for the data, use it as in the example above. If no DOI is available, but the data are on the web, provide a URL. You should also include a brief description of the data and how to access the original data's metadata.
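As a minimal sketch of the rule above (prefer a DOI, fall back to a URL), the citation could be assembled with a small helper. The function name `format_data_citation`, its parameters, and the DOI value shown are hypothetical placeholders, not part of the example citation; the layout simply mirrors the example above.

```python
def format_data_citation(author, year, title, distributor, doi=None, url=None):
    """Build a basic data citation, preferring a DOI over a plain URL."""
    citation = f"{author} ({year}) {title}. {distributor}."
    if doi:
        # A DOI is turned into a resolvable link through the doi.org resolver.
        return f"{citation} https://doi.org/{doi}"
    if url:
        # No DOI available, but the data are on the web.
        return f"{citation} {url}"
    return citation


# The DOI below is a made-up placeholder, not the real identifier for this study.
example = format_data_citation(
    author="National Institute of Education",
    year=1992,
    title="Safe School Study, 1976-1977",
    distributor="Inter-university Consortium for Political and Social Research",
    doi="10.0000/placeholder",
)
print(example)
```

If neither a DOI nor a URL is supplied, the helper returns the citation without a link, which is a signal to add access instructions in the description instead.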


Metadata should always be kept with the data files they describe. There are many systems for organizing your metadata. At a minimum, place a document containing your metadata in your original data folder. If you have several metadata files, they could go in a metadata subfolder within your original data folder. In earlier modules, we mentioned the TIER Protocol; under that standard, you would create an entire Metadata Guide. Choose the system that best suits your needs and the needs of your data.

Research Scenario, Part 2

Metadata description

For our voter participation research project, we have both our own data and data from others. We need to create metadata for all of our data sets. We have decided to create a text document (.txt) to record our metadata for each dataset. For example, we have our original polling data in a folder called "Polling Data," which is a subfolder of "Original Data." We will create a folder called "Polling Metadata" and place it in the "Polling Data" folder. We will also create a folder called "Voter Turnout Metadata" that will reside within the "Voter Turnout" folder within "Original Data."
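The folder layout described above could be created by hand, or with a short script such as the following sketch. The folder names come from the scenario; the function name `create_metadata_folders` is hypothetical, and the demonstration runs in a temporary directory so that it does not touch a real project.

```python
import tempfile
from pathlib import Path


def create_metadata_folders(project_root):
    """Create one metadata folder inside each original data folder."""
    original = Path(project_root) / "Original Data"
    folders = [
        original / "Polling Data" / "Polling Metadata",
        original / "Voter Turnout" / "Voter Turnout Metadata",
    ]
    for folder in folders:
        # parents=True also creates "Original Data" and the dataset folders.
        folder.mkdir(parents=True, exist_ok=True)
    return folders


# Demonstration in a throwaway directory; pass your own project path instead.
with tempfile.TemporaryDirectory() as tmp:
    created = create_metadata_folders(tmp)
    relative = [folder.relative_to(tmp).as_posix() for folder in created]
    all_exist = all(folder.is_dir() for folder in created)

print(relative)
```

Because `exist_ok=True` is set, running the script again on an existing project leaves folders that are already there unchanged.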

We create a metadata folder for each dataset, rather than just a file, because a dataset may have more than one file of metadata. Our particular research project example is straightforward and fairly simple, so one text document per dataset is enough.

Now that we have made a decision about our organization of metadata, it is time to consider what we will record as metadata. 

Polling data:

  • Names of the people who ran the poll
  • Date: 11/8/2016
  • Methodology (i.e., how it was collected): in this case, in-person interviews
  • Description of the data (e.g. "brief interview about why people chose to vote")
  • Names and definitions of variables (e.g. the interview questions)
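One possible way to turn the polling list above into the .txt metadata document is sketched below. The date, methodology, and description come from the scenario; the file name `polling_metadata.txt`, the helper names, and every value marked PLACEHOLDER are assumptions to be replaced with your own details. The demonstration writes to a temporary directory.

```python
import tempfile
from pathlib import Path

# Values marked PLACEHOLDER are not given in the scenario; fill in your own.
polling_metadata = {
    "Collected by": "PLACEHOLDER: names of the people who ran the poll",
    "Date collected": "11/8/2016",
    "Methodology": "In-person interviews",
    "Description": "Brief interview about why people chose to vote",
}

# Variable names and definitions (here, the interview questions).
polling_variables = {
    "PLACEHOLDER_variable_name": "PLACEHOLDER: text of the interview question",
}


def render_metadata(fields, variables):
    """Render metadata as plain text, one labeled entry per line."""
    lines = [f"{label}: {value}" for label, value in fields.items()]
    lines.append("Variables:")
    lines.extend(f"  {name}: {definition}" for name, definition in variables.items())
    return "\n".join(lines) + "\n"


def write_metadata(folder, filename, text):
    """Write the metadata text into the given folder and return the file path."""
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / filename
    path.write_text(text, encoding="utf-8")
    return path


# Demonstration in a throwaway directory; use your real project folder instead.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "Original Data" / "Polling Data" / "Polling Metadata"
    saved = write_metadata(
        folder,
        "polling_metadata.txt",
        render_metadata(polling_metadata, polling_variables),
    )
    saved_text = saved.read_text(encoding="utf-8")

print(saved_text)
```

Plain text is used here because it can be opened without special software, which matches the decision in the scenario to record metadata in a .txt document.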

Voter turnout data from the Minnesota Secretary of State's office:

  • Citation for the data (in APA style): Office of the Minnesota Secretary of State. (n.d.). Minnesota Election Statistics 1950-2016. Retrieved
  • Description of the data: Spreadsheet showing voter turnout for both primary and general elections from 1950-2016 in the state of Minnesota.
  • Names and definitions of variables: Because we don't have the definitions of the variables in the downloaded data files, we decided to put in a link to the source: More information available at the Office of the Minnesota Secretary of State website: