Need to find a dataset for STAT 155? The library can help!
This guide has some strategies for getting your search started—including where to look, what to look for, and how to look for it.
Don't see what you need? Schedule an appointment with a librarian! Brigid McCreery and Shannon Merillat are available to help.
If you're not sure how or where to start looking for a data set that you can use for your assignment, follow the steps below. Be sure to take a look at the data requirements checklist in the next tab too.
Steps for finding a data set for your assignment
Step 1. Identify your topic of interest: First, determine what topic interests you. Would you like to analyze data on sports, US presidential elections, climate change, or public health?
Step 2. Identify your data requirements: Keeping your topic of interest in mind, create a data requirements checklist for your assignment to help guide your search.
This checklist should include the attributes a dataset must have for you to complete your assignment, along with other considerations, such as the minimum number of observations (rows), the type of dependent (outcome) variable you need, the type of independent variables you need, and your previous experience and skill with preparing data for analysis. For details, see the Data Requirements Checklist tab in this Research Guide.
Step 3. Search for possible data sets: There are two ways to search for a data set on your topic that meets your data requirements. Where you start your search will depend on your topic.
Option 1. Search the resources in this Research Guide. In the Data Set Sources section of this Research Guide, you will find a selection of databases of microdata and data sets on a variety of topics from a variety of sources, such as ICPSR, IPUMS, and Data.gov. You will also find specific high-quality data sets, such as the CDC's National Health Interview Survey (NHIS), and some helpful guides from other organizations that include links to sources for data sets.
Option 2. Search for datasets cited in articles on your topic. You can also find articles that have conducted analysis on your topic and check what dataset they used. This information is typically found in the methods section, appendix, and/or reference list.
Step 4. Review the metadata and data dictionary to determine if your requirements are met. Once you have identified a dataset that looks like it may meet your requirements based on the title and summary, take a look at the data dictionary and metadata to see if all of your requirements are met. You may need to download and explore the dataset to be sure.
Checklist Item | Description |
---|---|
Outcome (dependent) variable | |
Independent variables | |
Number of observations (rows) | Determine the sample size you need to conduct your analysis and find a data set that meets or exceeds that number. For this assignment, you will need a data set with at least 30 observations that include your variables. |
Skill level | Consider your skill level and experience in data preparation. For example, some datasets require extensive cleaning before they are ready for any kind of analysis, while others may require only minimal preparation. If you do not have much experience in data preparation, you may want to stick to file formats that won't require conversion to be used in the software you'll be using to analyze your data. |
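
Once you have downloaded a candidate file, you can check it against the checklist above with a short script. The following is a minimal sketch in Python (standard library only), assuming the data is a CSV file with a header row. The function name check_dataset, the column names score and group, and the in-memory sample data are all hypothetical placeholders; substitute your own file and variables. The numeric/categorical guess is rough and does not replace reading the data dictionary.

```python
import csv
import io

MIN_OBSERVATIONS = 30  # minimum number of observations from the checklist


def is_number(value):
    """Return True if the text can be read as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def check_dataset(rows, outcome, predictors, min_rows=MIN_OBSERVATIONS):
    """Summarize whether a list of row dicts meets the checklist (hypothetical helper)."""
    needed = [outcome] + list(predictors)
    columns = set(rows[0].keys()) if rows else set()
    missing_columns = [name for name in needed if name not in columns]
    if missing_columns:
        return {
            "missing_columns": missing_columns,
            "complete_rows": 0,
            "enough_rows": False,
            "variable_types": {},
        }

    # Keep only observations that have a value for every needed variable.
    complete = [
        row for row in rows
        if all((row[name] or "").strip() for name in needed)
    ]

    # Rough type guess: numeric if every remaining value parses as a number.
    variable_types = {}
    for name in needed:
        values = [row[name].strip() for row in complete]
        if values and all(is_number(v) for v in values):
            variable_types[name] = "numeric"
        else:
            variable_types[name] = "categorical"

    return {
        "missing_columns": [],
        "complete_rows": len(complete),
        "enough_rows": len(complete) >= min_rows,
        "variable_types": variable_types,
    }


# Made-up sample: 32 rows, two of which are missing the outcome value.
lines = ["id,score,group"]
for i in range(32):
    score = "" if i in (3, 17) else str(50 + i)
    group = "A" if i % 2 == 0 else "B"
    lines.append(f"{i},{score},{group}")

# With a real file, open it instead, e.g. open("your_file.csv", newline="").
rows = list(csv.DictReader(io.StringIO("\n".join(lines))))
report = check_dataset(rows, outcome="score", predictors=["group"])
print(report)
```

The count uses only rows where every variable you need is present, because the 30-observation minimum applies to observations that include your variables, not to the raw row count of the file.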
Have questions about the research process? Don't struggle; reach out to a librarian for help! Students interested in research support can book a meeting with a librarian.
To make an appointment, reach out to a subject librarian specializing in your topic.