BIOL 359: Big Data in Ecology

This research guide includes sources and research guidance for students in BIOL 359: Big Data in Ecology.

Dataset Sources: Ecology-Specific Repositories

How to Find a Data Set: a step-by-step guide

If you're not sure how or where to start looking for a dataset that you can use for your assignment, follow the steps below.

Step 1. Identify your topic of interest

First, think about what topic interests you: would you like to analyze data on sports, US presidential elections, climate change, or public health?

However, keep an open mind and keep your interests somewhat broad at this stage, since it can be tricky to find a dataset that includes the specific variables you are interested in. You may already have an idea for a research question, but it’s best to wait to finalize it until after you’ve identified a dataset that meets your needs. Start with a broad interest, then narrow it as you review the variables in available datasets.

Step 2. List your data requirements

Keeping your topic of interest in mind, create a data requirements checklist for your assignment to help guide your search.

This checklist should include the attributes a dataset must have for you to complete your assignment, such as the minimum number of observations (rows), the type of dependent (outcome) variable, and the types of independent variables you need. It should also cover other considerations, such as your previous experience and skill with preparing data for analysis.
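One way to make the checklist concrete is to write it down as a small data structure before you start searching. The sketch below is illustrative only: the class name `DataRequirements`, its fields, and the example values are assumptions, not requirements taken from your assignment.

```python
from dataclasses import dataclass, field


@dataclass
class DataRequirements:
    """A hypothetical checklist; replace the values with your assignment's own."""

    min_rows: int                     # minimum number of observations (rows)
    outcome_type: str                 # type of dependent (outcome) variable
    predictor_types: list = field(default_factory=list)  # types of independent variables
    max_cleaning_effort: str = "low"  # how much data preparation you can take on

    def summary(self) -> str:
        """Return the checklist as one readable line."""
        predictors = ", ".join(self.predictor_types) or "any"
        return (
            f"at least {self.min_rows} rows; "
            f"{self.outcome_type} outcome; "
            f"predictors: {predictors}; "
            f"cleaning effort: {self.max_cleaning_effort}"
        )


# Example values are placeholders, not assignment requirements.
requirements = DataRequirements(
    min_rows=100,
    outcome_type="numeric",
    predictor_types=["numeric", "categorical"],
)
print(requirements.summary())
```

Keeping the checklist in one place makes it easier to compare each candidate dataset against the same criteria.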

Step 3. Search for possible datasets

Where you start your search for a dataset for your assignment will depend on your topic. There are a couple of different ways you can search for a dataset on your topic that meets your data requirements.

Option 1. Search data repositories and databases. You will find a selection of sources for microdata and datasets on a variety of topics in our Social Sciences Data Research Guide and our Natural Sciences Research Guide.

Option 2. Search for datasets cited in articles on your topic. You can also find articles in library databases that analyze your topic and check which dataset they used. This information is typically found in the methods section, appendix, and/or reference list.

Step 4. Evaluate possible datasets: are your data requirements met?

To find out if your data requirements are met:

  • First, review the dataset titles and consider the author and the purpose for which the dataset was created
     
  • Next, if a dataset seems like it might fit your requirements based on the title, author and context, take a look at the metadata, including the data dictionary, to find out more
     
  • Finally, you will need to download and explore the dataset to be sure
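The final step, downloading and exploring a dataset, can be partly automated. The following is a minimal sketch using only Python's standard library; the function name `profile_csv`, the report keys, and the sample data are all hypothetical, and the numeric check is a rough heuristic rather than a full type inference.

```python
import csv
import io


def profile_csv(handle, min_rows, required_columns):
    """Report whether a CSV file meets some basic data requirements."""
    reader = csv.DictReader(handle)
    columns = reader.fieldnames or []
    n_rows = 0
    missing = {name: 0 for name in columns}     # empty cells per column
    numeric = {name: True for name in columns}  # assume numeric until disproved

    for row in reader:
        n_rows += 1
        for name in columns:
            value = (row[name] or "").strip()
            if value == "":
                missing[name] += 1
                continue
            try:
                float(value)
            except ValueError:
                numeric[name] = False

    return {
        "n_rows": n_rows,
        "enough_rows": n_rows >= min_rows,
        "absent_columns": [c for c in required_columns if c not in columns],
        "missing_values": missing,
        # A column counts as numeric only if it has at least one value
        # and every non-empty value parses as a number.
        "numeric_columns": [
            c for c in columns if numeric[c] and missing[c] < n_rows
        ],
    }


# Made-up sample standing in for a downloaded file.
sample = io.StringIO(
    "site,species,count\n"
    "A,oak,12\n"
    "A,maple,\n"
    "B,oak,7\n"
)
report = profile_csv(sample, min_rows=3, required_columns=["species", "count", "year"])
print(report)
```

For a real download, you would pass an open file instead of the sample, for example `open("my_dataset.csv", newline="")`, where the filename is a placeholder. A report like this does not replace reading the data dictionary, but it quickly shows whether the row count and variables you need are present.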

Note that you will probably have to review many datasets before you find one that comes close to meeting your requirements.

If you’re having trouble finding a dataset that fits your data requirements, consider trying a different search strategy, looking in other sources (e.g., different repositories or articles), revising some of your data requirements, or consulting a librarian or your instructor.

Step 5. Finalize your research question

It’s best to wait to finalize your research question until after you’ve identified a dataset that meets your needs. Once you’ve found a dataset and confirmed that it meets your data requirements, go ahead and finalize your research question.

Sometimes finding a dataset that meets your needs is tricky. If you can't find one that meets your data requirements, try these three strategies to get back on track:

  • Change your search strategy and try other repositories.
  • Consider different variables for your research question.
  • Ask for help from your librarian or instructor.