
Data Module #4: Keeping Your Data Organized

Keeping Your Data Organized

Keeping things neat and organized, whether it be your dorm room or your class notes, makes your life much easier. The same is true with your research data. 

You work hard gathering your data. It may be very difficult or impossible to recreate if it is lost or corrupted. Successful completion of your research depends on your data being readable. Take the time to understand best practices for data organization.

Proper data organization makes it easier to use your data. An organized file structure and standard naming tell you what data are contained in a particular file and when it was created. Version control helps you keep track of the steps taken in your research process.

Having organized data also makes your research more accessible to other researchers. People who work with your data will want to know what is in each data file, how to access it, when it was created, and much more. Organized data allows others to replicate your research or use your data for further research.

This module lists tips and best practices for data organization.

Avoiding Pitfalls: Some Data Organization Case Studies

Here are some data organization problems that have been experienced by other students:

"I found the data that I wanted about tobacco use at the Centers for Disease Control and Prevention website. In the course of my research, I decided to include exposure to secondhand smoke as part of my analysis. Because I didn't include any variables about secondhand smoke in my original data download, I had to go back and download data again. This time, I included the additional variables that I wanted. Unfortunately, I saved the new data file in the same place as the first. Also, I gave them both similar names that didn't distinguish between the two versions. I accidentally used the wrong data file for part of my analysis and now have to redo a bunch of work."

Clear and consistent naming of data files is essential to avoid confusion during your research. Always give your files descriptive names that include information such as the download date and version.
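One way to keep names consistent is to build them from the same pieces every time. The short Python sketch below is only an illustration: the pattern (source, topic, download date, version), the function name build_data_filename, and the example dates are all assumptions, not a standard prescribed by this module.

```python
import re
from datetime import date


def build_data_filename(source: str, topic: str, download_date: date,
                        version: int, extension: str = "csv") -> str:
    """Return a name shaped like source_topic_YYYY-MM-DD_vNN.ext (hypothetical pattern)."""

    def slug(text: str) -> str:
        # Lowercase the text and replace runs of other characters with a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return (f"{slug(source)}_{slug(topic)}_"
            f"{download_date.isoformat()}_v{version:02d}.{extension}")


# Two downloads of the same topic get clearly different names.
# The dates here are made up for the example.
first_download = build_data_filename("CDC", "tobacco use", date(2024, 3, 15), 1)
second_download = build_data_filename("CDC", "tobacco use", date(2024, 4, 2), 2)
print(first_download)
print(second_download)
```

With names built this way, the date and version number make it much harder to confuse an earlier download with a later one, even when both files sit in the same folder.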

"Last year when I was a junior, I did a survey about people's sleep habits. This year as a senior, I'm planning to use that data again along with collecting some new data on the same topic for my honors project. Unfortunately, I did not give my variables clear names and I did not create any documentation on what steps I took with the data for my analysis. The data I have from last year is just my final analysis results. Now I don't remember what all the variable names mean or all the steps that I took to do my data analysis."

In addition to giving files and folders clear and descriptive names, it is important to do the same within a dataset. Clearly labeling fields and variables within your dataset will make it easier to identify what is what when you (or others) use it again in the future. Write down the steps that you took with the data, and save those notes in a documentation folder.
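As a rough sketch of what such documentation could look like, the Python example below writes a simple list of variable descriptions and a running log of processing steps into a documentation folder. The file names, the variable names, and the helper functions write_data_dictionary and log_step are hypothetical, chosen only for illustration. The demonstration runs in a temporary folder so that it leaves nothing behind.

```python
import csv
import tempfile
from pathlib import Path


def write_data_dictionary(doc_folder: Path, variables: dict) -> Path:
    """Save one row per variable: its name and a plain-language description."""
    doc_folder.mkdir(parents=True, exist_ok=True)
    path = doc_folder / "data_dictionary.csv"
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["variable", "description"])
        for name, description in variables.items():
            writer.writerow([name, description])
    return path


def log_step(doc_folder: Path, step: str) -> Path:
    """Append one line describing a step taken with the data."""
    doc_folder.mkdir(parents=True, exist_ok=True)
    path = doc_folder / "processing_steps.txt"
    with path.open("a", encoding="utf-8") as handle:
        handle.write(step + "\n")
    return path


# Demonstration with made-up survey variables, run in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp) / "documentation"
    dictionary_path = write_data_dictionary(docs, {
        "hrs_sleep_weeknight": "Self-reported hours of sleep on a typical weeknight",
        "class_year": "Respondent's class year at the time of the survey",
    })
    log_step(docs, "Removed responses with no answer for hrs_sleep_weeknight")
    steps_path = log_step(docs, "Computed mean hrs_sleep_weeknight by class_year")
    dictionary_text = dictionary_path.read_text(encoding="utf-8")
    steps_text = steps_path.read_text(encoding="utf-8")

print(dictionary_text)
print(steps_text)
```

Plain text and CSV files are used here because they can be opened without special software, which helps when you return to the project a year later.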