Data Guide: Working with Data

A guide to citing, managing, and incorporating data into your research.

Aggregate Data

Data that has already been analyzed (that is, tabulated or summarized).

Data

Any information that has been collected for analysis or reference. Data can take the form of numbers and statistics, text, symbols, or multimedia such as images, videos, sounds and maps. Data that has been collected but not yet processed, cleaned or analysed is known as ‘raw’ or ‘primary’ data.

Source: The Alan Turing Institute

Dataset

A collection of data that can be analysed to obtain information. Datasets are often collected and stored in a tabular format, with each column corresponding to a different variable (e.g. height, weight, age) and each row corresponding to a different entry or ‘record’ (e.g. a different person). The data might come from real-life observations and measurements, or it can be generated artificially (see ‘synthetic data’).

Source: The Alan Turing Institute
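As a small illustration of the tabular layout described above, the sketch below reads a three-row table in which each column is a variable and each row is a record. The names and values are invented for this example and do not come from any real dataset.

```python
import csv
import io

# Hypothetical dataset invented for illustration: one row per person,
# one column per variable.
RAW_CSV = """name,height_cm,weight_kg,age
Ana,162,58,34
Ben,180,82,45
Chloe,171,65,29
"""

reader = csv.DictReader(io.StringIO(RAW_CSV))
variables = reader.fieldnames   # the column headings, i.e. the variables
records = list(reader)          # one dictionary per row, i.e. per record

print("Variables:", variables)
print("Number of records:", len(records))
for record in records:
    # csv reads every value as text, so numbers must be converted before analysis.
    print(record["name"], int(record["age"]))
```

Reading the file into one dictionary per row keeps each record together, which mirrors the "one row per entry" structure of the definition.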

Repository

Repositories are online resources for accessing and sharing data; they are often referred to as “databases” or “archives.” Repositories allow researchers to place their data in a secure location in order to enhance sharing and to meet the mandates of granting agencies. Many repositories are dedicated to a particular subject, geographic location, or organization/institution.

Microdata

Microdata are unit-level data obtained from sample surveys, censuses, and administrative systems. They provide information about characteristics of individual people or entities such as households, business enterprises, facilities, farms or even geographical areas such as villages or towns. They allow in-depth understanding of socio-economic issues by studying relationships and interactions among phenomena. Microdata are thus key to designing projects and formulating policies, targeting interventions and monitoring and measuring the impact and results of projects, interventions and policies.

Microdata files provide unanalyzed, “raw” data consisting of individual records. You can analyze them with statistical software (e.g., R or Stata) to create custom statistics using the specified weights, universe, and geography. They come in a variety of formats, including ASCII, CSV, and formats designed for specific statistical analysis software, such as R, Stata, SPSS, and SAS.

Source: World Bank Knowledge Base
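To make "custom statistics using the specified weights, universe, and geography" more concrete, here is a minimal sketch in Python. The records, the field names (region, age, employed, weight), and the choice of adults aged 18 and over as the universe are all assumptions made up for this example; real microdata files document their own variables and weighting rules in a codebook.

```python
from collections import defaultdict

# Hypothetical microdata: one record per surveyed person.
# "weight" is assumed to be the number of people each record represents.
records = [
    {"region": "North", "age": 34, "employed": True,  "weight": 120.0},
    {"region": "North", "age": 16, "employed": False, "weight": 100.0},
    {"region": "North", "age": 51, "employed": False, "weight": 80.0},
    {"region": "South", "age": 45, "employed": True,  "weight": 150.0},
    {"region": "South", "age": 29, "employed": True,  "weight": 50.0},
    {"region": "South", "age": 67, "employed": False, "weight": 200.0},
]


def weighted_employment_rate(rows, min_age=18):
    """Return the weighted share of employed people in each region."""
    total_weight = defaultdict(float)
    employed_weight = defaultdict(float)
    for row in rows:
        # Universe: keep only people at or above the minimum age.
        if row["age"] < min_age:
            continue
        # Geography: accumulate weights separately for each region.
        total_weight[row["region"]] += row["weight"]
        if row["employed"]:
            employed_weight[row["region"]] += row["weight"]
    return {
        region: employed_weight[region] / total_weight[region]
        for region in total_weight
    }


rates = weighted_employment_rate(records)
for region, rate in sorted(rates.items()):
    print(f"{region}: {rate:.0%} of adults employed (weighted)")
```

Changing `min_age` or grouping by a different field gives a different custom statistic from the same individual records, which is what published aggregate tables cannot offer.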

Search Engine

A search engine is designed to find answers to queries in a collection of information, most commonly the World Wide Web. A Web search engine, such as Google, produces a list of web pages or websites that contain or relate to the terms in a query entered by the user into a field called a search bar.

Source: Britannica Academic

Study

In a study, all the information is collected at a single time or for a single purpose or by a single principal investigator. A study consists of one or more files. Examples include the General Social Survey and the Latino Immigrant National Election Study (LINES).

A study series is a named collection of related studies. Examples include the American Public Opinion and United States Foreign Policy Series and the Los Angeles Family and Neighborhood Survey.

Statistic

In the context of data, statistics are data that has already been analyzed, as opposed to raw data, which has been collected and entered into a spreadsheet or database but not yet analyzed. Statistics can take the form of averages, percentages, counts, or differences, and are often accompanied by charts, figures, and simple tables.
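The difference between raw data and statistics can be shown with a short Python sketch. The ages below are invented for illustration; the list stands for raw data, and the count, average, and percentage computed from it are statistics.

```python
from statistics import mean

# Hypothetical raw data: ages as they might be entered into a spreadsheet.
raw_ages = [34, 45, 29, 52, 40]

# Statistics derived from the raw data.
count = len(raw_ages)                                    # a count
average_age = mean(raw_ages)                             # an average
share_40_plus = 100 * sum(age >= 40 for age in raw_ages) / count  # a percentage

print(f"Respondents: {count}")
print(f"Average age: {average_age}")
print(f"Aged 40 or over: {share_40_plus:.0f}%")
```

Once only the three summary values are published, the individual ages can no longer be recovered, which is why statistics and aggregate data support fewer kinds of analysis than the raw records.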