Data Guide: Working with Data

A guide to citing, managing, and incorporating data into your research.

Where to Search

While this guide includes more than 50 different sources for datasets, these are our favorite interdisciplinary data repositories that cover a wide range of topics. Many (but not all) datasets in these sources have great documentation (i.e., data dictionaries and other information about how the data was collected). They have a lot of datasets and are relatively easy to use and search.

When searching for data, you may want to look for repositories dedicated to the subject area of your research. For a list of data repositories sorted by subject, see the “List of Data Sources by Topic” section of this guide.

You can also search interdisciplinary repositories.

Development Finance Institution

A development finance institution, also known as a development bank, is a national or regional financial institution designed to provide medium- and long-term capital for productive investment, often accompanied by technical assistance, in low-income countries (for more information on the history of words used to describe a country’s economic status, see this post from the Borgen Project, 5 Terms to Use as an Alternative to “Third World”). For more information on development finance institutions, see this encyclopedia article from Britannica, Development Bank.


Nonprofit

Non-profit organizations do not pay out any of their earnings to benefit any private shareholder or individual. There are many types of non-profits, including social welfare organizations, public charities, private foundations, and recreational clubs. Non-profit status doesn’t necessarily indicate an organization's information is trustworthy. ProPublica’s Nonprofit Explorer is a great place to search for information about a specific non-profit, and features a list of non-profit types.


IGO (Intergovernmental Organization)

Intergovernmental organizations (IGOs) are a type of international organization (IO). Their mission is often related to addressing economic, social, and security issues. For more information on IGOs and IOs, see the International Encyclopedia of Political Science’s International Organizations entry.


NGO (Non-Governmental Organization) or Think Tank

Non-Governmental Organizations are non-profit, mission-driven organizations. U.S. and international NGOs represent virtually every conceivable ideology, political cause, religion, social issue, and interest group. Think tanks are considered NGOs. Some are political and others are nonpartisan, involved only in social issues. For more information on NGOs, see the U.S. Department of State’s NGO Fact Sheet.

Government Data

Government data is available on a wide variety of topics related to the missions of government agencies, including health, employment, wildlife, climate change, pollution, education, food security, and much more. Depending on the agency that collected it and the purpose for collection, data is available at the state, county, region, city, or neighborhood level. 

Note that not all government microdata is publicly accessible: certain types of census data and health data are restricted due to confidentiality concerns. Researchers can apply to access restricted data for legitimate research purposes. If you have questions about accessing restricted government data (or any other type of restricted data), please email a librarian.

We've included the specific data repositories that students might find most useful in this guide, but this is not a complete list. We have also included some repositories of European Union and Canadian government data.


U.S. Federal Government Data

In the U.S. and other democracies, government information is a public resource. The free flow of information between the government and the public that it serves is essential to maintaining an informed citizenry. The public's right to know about government operations and functions is essential in holding government accountable to its citizenry. For more information, see the American Library Association’s webpage on Key Principles of Government Information.

You can access almost all U.S. government data from data.gov, or directly from the government agency data repository. 


Minnesota State Government

International Governments

Large Indexes & Repositories


Search Engines

If you aren’t finding a dataset in the subject-specific repositories relevant to your topic, or you have an interdisciplinary topic, you can use a search engine, which searches the internet for datasets.

How to Search

If you're not sure how or where to start looking for a dataset to use for your assignment, follow the steps below:

Step 1. Identify your topic of interest

First, think about what topic interests you: would you like to analyze data on sports, US presidential elections, climate change, or public health?

However, keep an open mind and keep your interests somewhat broad at this stage since it can be tricky to find a data set that includes the specific variables you are interested in. While you may have an idea for a research question at this point, it’s best to wait to finalize your research question until after you’ve identified a dataset that meets your needs. You may need to identify a broad interest, and then focus your interest as you review variables in available datasets.

Step 2. List your data requirements

Keeping your topic of interest in mind, create a data requirements checklist for your assignment to help guide your search.

This checklist should include attributes that a dataset must have in order for you to use it to complete your assignment and other considerations you should take into account, such as minimum number of observations (rows), type of dependent (outcome) variable you need, type of independent variables you need, and your previous experience and skill with preparing data for analysis.
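One way to keep such a checklist concrete is to write it down as structured data. The sketch below is only an illustration: the class name, the field names, and the example values (such as the 500-row minimum) are hypothetical and should be replaced with the requirements of your own assignment.

```python
from dataclasses import dataclass, field


@dataclass
class DataRequirements:
    """Hypothetical checklist for one assignment; all values are examples."""
    topic: str
    min_observations: int              # minimum number of rows needed
    dependent_variable_type: str       # e.g. "quantitative", "binary", "categorical"
    independent_variable_types: list[str] = field(default_factory=list)
    prep_experience: str = "beginner"  # self-assessed data preparation skill

    def summary(self) -> list[str]:
        """Return the checklist as printable lines."""
        return [
            f"Topic: {self.topic}",
            f"At least {self.min_observations} observations (rows)",
            f"Dependent variable: {self.dependent_variable_type}",
            "Independent variables: " + ", ".join(self.independent_variable_types),
            f"Data preparation experience: {self.prep_experience}",
        ]


# Example values only; adjust them to your assignment.
checklist = DataRequirements(
    topic="public health",
    min_observations=500,
    dependent_variable_type="binary",
    independent_variable_types=["quantitative", "categorical"],
)
for line in checklist.summary():
    print(line)
```

Writing the requirements down this way makes it easier to compare each candidate dataset against the same list.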

Step 3. Search for possible datasets

Where you start your search for a dataset depends on your topic. There are a couple different ways you can search for a dataset that meets your data requirements.

Option 1. Search data repositories and databases. You will find a selection of sources for microdata and datasets on a variety of topics in our Social Sciences Data Research Guide and our Natural Sciences Research Guide.

Option 2. Search for datasets cited in articles on your topic. You can also find articles in library databases that have conducted analysis on your topic and check what dataset they used. This information is typically found in the methods section, appendix, and/or reference list.

Step 4. Evaluate possible datasets: are your data requirements met?

To find out if your data requirements are met:

  • First, review the dataset titles and consider the author and the purpose for which each dataset was created.
  • Next, if a dataset seems like it might fit your requirements based on the title, author, and context, take a look at the metadata, including the data dictionary, to find out more.
  • Finally, you will need to download and explore the dataset to be sure.

Note that you will probably have to review many datasets before you find one that comes close to meeting your requirements.
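Once a dataset is downloaded, a short script can give a first answer to "are my data requirements met?". The sketch below assumes the dataset is a CSV file with a header row. The function name, the column names, and the sample rows are invented for illustration and do not come from a real dataset.

```python
import csv
import io


def check_dataset(csv_text, required_columns, min_rows):
    """Return a list of unmet requirements (an empty list means none were found)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    rows = list(reader)

    problems = []
    for name in required_columns:
        if name not in columns:
            problems.append(f"missing variable: {name}")
    if len(rows) < min_rows:
        problems.append(f"only {len(rows)} rows, need at least {min_rows}")
    return problems


# Made-up sample standing in for the text of a downloaded file.
sample = "region,year,rate\nA,2020,10.0\nB,2020,12.5\nC,2020,11.0\n"

print(check_dataset(sample, ["region", "rate", "income"], min_rows=3))
```

A check like this only confirms that the variables and rows exist. You still need the data dictionary to confirm what each variable means.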

If you’re having trouble finding a dataset that fits your data requirements, consider trying a different search strategy, looking at other sources (i.e. different repositories or articles), revising some of your data requirements, or consulting a librarian or your instructor.

Step 5. Finalize your research question

It’s best to wait to finalize your research question until after you’ve identified a dataset that meets your needs. Once you’ve found a dataset and confirmed that it meets your data requirements, go ahead and finalize your research question.

Creating a data requirements checklist before you start searching for data sets will help you search more efficiently. Because it can be difficult to find secondary data that will answer your research question, it’s important to build some flexibility into your checklist early on. Note what things you can and cannot compromise on.

Item Description
Dependent (outcome) variable

You'll need a dataset with a variable or set of variables that you can use as your dependent variable or use to create your dependent variable. 

You will want a dataset that includes a dependent variable that:

  • Is relevant to your topic
  • Fits the type of data analysis you will be doing
    • For example, you may need a quantitative or binary variable, or you may need a categorical variable.
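A rough first check of which kind of variable a column holds can be sketched as follows. This is a heuristic under stated assumptions, not a definitive rule: the function name is hypothetical, values are assumed to arrive as text (as they do when read from a CSV file), and blank cells are treated as missing.

```python
def describe_variable(values):
    """Roughly classify a column as 'binary', 'quantitative', or 'categorical'."""
    # Ignore missing cells (None or blank strings).
    present = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not present:
        return "empty"
    if len(set(present)) == 2:
        return "binary"
    try:
        for v in present:
            float(v)  # raises ValueError for non-numeric text
    except ValueError:
        return "categorical"
    return "quantitative"


print(describe_variable(["yes", "no", "yes", ""]))
print(describe_variable(["1.5", "2", "3.25"]))
print(describe_variable(["red", "blue", "green"]))
```

Numeric codes can stand for categories (for example, 1, 2, and 3 for three regions), so always confirm the result against the data dictionary.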

Independent variable(s)

You will also need variables that influence your dependent variable. 

You will want a dataset that includes independent variables that:

  • Are relevant to your topic and associated with your dependent variable
  • Fit the type of data analysis you will be doing

Number of observations (rows)

Determine the number of observations (i.e., rows or sample size) you need to conduct your analysis, and find a dataset that meets or exceeds that number.

Make sure there are enough observations for all of the variables that you need to include in your analysis!
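The total row count can overstate how much usable data you have, because rows with missing values in any variable you need are often dropped during analysis. The sketch below counts only the rows that are complete for a chosen set of variables. The function name, the column names, the sample rows, and the choice of "" and "NA" as missing-value markers are all assumptions for illustration; check the data dictionary for how your dataset actually codes missing values.

```python
import csv
import io


def count_complete_rows(csv_text, needed_variables, missing_markers=("", "NA")):
    """Count rows that have a non-missing value for every needed variable."""
    reader = csv.DictReader(io.StringIO(csv_text))
    complete = 0
    for row in reader:
        is_complete = True
        for name in needed_variables:
            value = row.get(name)
            # A missing column, a short row, or a missing marker all count as missing.
            if value is None or value.strip() in missing_markers:
                is_complete = False
                break
        if is_complete:
            complete += 1
    return complete


# Made-up sample: 4 rows, but only rows 1 and 4 have both age and income.
sample = "id,age,income\n1,34,52000\n2,,41000\n3,29,NA\n4,51,67000\n"

print(count_complete_rows(sample, ["age", "income"]))
```

Compare this count, not the raw row count, with the minimum number of observations on your checklist.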

Skill level

Consider your skill level and experience in data preparation.

Some datasets require extensive cleaning before they are ready for any kind of analysis, while others may only require minimal preparation.

If you do not have much experience in data preparation, you may want to stick to file formats that won't require merging or conversion, and files with clean data.

Sometimes finding a dataset that meets your needs is tricky. If you are having trouble, try these strategies to get back on track.

Can't find a dataset that meets your data requirements? Try these strategies:

  • Change your search strategy and try other repositories.
  • Consider different variables for your research question.
  • Ask for help from your librarian or instructor.