Research Data Management

RDM Resources at Case Western Reserve University

Best Practices

Keep Your Data Accessible to You

  • Store your temporary working files somewhere easily accessible, like on a local hard drive or shared server.
  • While cloud storage is convenient for storing and sharing files, it often raises concerns about data privacy and preservation. Only put data in the cloud if you are comfortable doing so and your funding agency and/or departmental requirements allow it.
  • For long-term storage, data should be put into preservation systems that are well-managed. Long-term data storage options are available with [U]Tech. Explore those options here: https://case.edu/utech/departments/research-computing/services/research-data-storage
  • Don't keep the only copy of your data on a thumb drive or portable hard drive, as these are easily lost or stolen.
  • Think about file formats that have a long life and can be read by many programs. Formats such as plain ASCII text (.txt), .csv, and .pdf are good choices for long-term accessibility.
  • Whenever converting your data from one format to another, keep a copy of the original file and format to avoid loss or corruption of your important files.
  • Online platforms like the Open Science Framework (OSF) can help your group organize, version, share, and preserve your data.
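The advice about keeping the original file when converting formats can be sketched in Python. This is a minimal illustration, assuming the source is a tab-delimited, UTF-8 .txt file; the function name `convert_tsv_to_csv` is hypothetical. It writes a new .csv file into a separate folder and never opens the original for writing.

```python
import csv
from pathlib import Path


def convert_tsv_to_csv(src: Path, dest_dir: Path) -> Path:
    """Write a .csv copy of a tab-delimited file, leaving the original untouched."""
    dest = dest_dir / (src.stem + ".csv")
    # Refuse to overwrite an existing file (including the source itself).
    if dest.exists():
        raise FileExistsError(f"{dest} already exists; not overwriting")
    # The source is opened read-only; only the new file is written.
    with src.open(newline="", encoding="utf-8") as fin, \
         dest.open("w", newline="", encoding="utf-8") as fout:
        reader = csv.reader(fin, delimiter="\t")
        writer = csv.writer(fout)
        for row in reader:
            writer.writerow(row)
    return dest
```

For example, `convert_tsv_to_csv(Path("survey_raw.txt"), Path("converted"))` would produce `converted/survey_raw.csv` while `survey_raw.txt` stays exactly as it was.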

Backup, Backup, Backup

  • The general rule is to keep 3 copies of your data: 2 copies onsite, 1 offsite.
  • Back up your data regularly and frequently, and automate the process if possible. This may mean duplicating your working files to a separate drive every week, syncing your folders to Box, or dedicating a block of time every week to make sure you've copied everything to another location.
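One way to automate the weekly duplication described above is a small script that copies a working folder into a dated snapshot. This is a minimal sketch under stated assumptions: `snapshot_backup` is a hypothetical name, and `backup_root` is assumed to point at a separate drive or synced folder. Scheduling the script (for example with cron or Windows Task Scheduler) is not shown.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def snapshot_backup(working_dir: Path, backup_root: Path,
                    today: Optional[date] = None) -> Path:
    """Copy working_dir to backup_root/<folder name>_YYYY-MM-DD and return that path."""
    today = today or date.today()
    dest = backup_root / f"{working_dir.name}_{today.isoformat()}"
    # copytree creates backup_root if needed and raises FileExistsError
    # if this snapshot already exists, so earlier copies are never overwritten.
    shutil.copytree(working_dir, dest)
    return dest
```

Keeping one dated folder per run, rather than overwriting a single backup, means an accidental deletion or corruption in the working files is not silently copied over your only backup.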

Data Management Plan

  • Use a data management plan (DMP) to plan your project and data collection and to ensure you are complying with the requirements of your department and/or funding agency.
  • A DMP is not a replacement for good data management practices, but it can set you on the right path if it is consistently followed.

Preservation

  • There is a difference between storing and preserving your data. True preservation is the ongoing process of making sure your data are secure and accessible for future generations. The National Library of Medicine has a great breakdown and set of resources about data preservation here: https://nnlm.gov/data/thesaurus/data-preservation
  • Identify data with long-term value. Preserve the raw data and any intermediate/derived products that are expensive to reproduce or can be directly used for analysis. Preserve any scripted code that was used to clean and transform the raw data.
  • Software changes, so be sure to preserve a copy of the software necessary to process your data if it is in danger of becoming outdated. 
  • Save tabular data in a simple format like .csv to ensure its future accessibility.
  • If possible, save copies of your data in uncompressed and unencrypted formats, since corruption can occur during decompression and decryption.
  • If you need long-term data preservation and storage, contact [U]Tech to discuss your options.
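One common way to confirm that a stored or transferred copy has not been silently corrupted, a technique this guide does not itself prescribe, is to compare checksums of the original and the copy. The sketch below uses SHA-256 from Python's standard `hashlib` module; the function names `sha256_of` and `copies_match` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks by default."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original: Path, preserved: Path) -> bool:
    """True if the preserved copy is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(preserved)
```

In practice you would record the digest when the data are first deposited and re-check it periodically; a mismatch means the stored copy has changed.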

Organization

  • Establish a consistent, descriptive filing system that is intelligible to future researchers and does not rely on your own inside knowledge of your research.
  • A descriptive directory and file-naming structure should guide users through the contents to help them find whatever they are looking for.

Naming conventions

  • Use consistent, descriptive filenames that reliably indicate the contents of the file.
  • If your discipline requires or recommends particular naming conventions, use them!
  • Some best practices for naming conventions include:
    • Do not use spaces between words. Use either camelCase or underscores to separate words.
    • Include LastnameFirstname descriptors where appropriate.
    • Use a consistent date format: YYYY-MM-DD, YYYY_MM_DD, or YYYYMMDD.
      • Avoid MM-DD-YYYY formats, which do not sort chronologically.
  • Do not append vague descriptors like "latest" or "final" to your file versions. Instead, append the version's date or a consistently iterated version number.
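The naming rules above can be applied consistently with a small helper. This is a hypothetical sketch: the function name `make_filename`, the field order (project, LastnameFirstname, date, version), and the two-digit `v01` version format are assumptions to adapt to your own or your discipline's conventions. Words within a field are joined in camel case, and fields are separated by underscores.

```python
import re
from datetime import date


def make_filename(project: str, lastname: str, firstname: str,
                  collected: date, version: int, ext: str) -> str:
    """Build a name like 'SoilSurvey_SmithJane_2024-03-05_v02.csv' (no spaces)."""
    def camel(text: str) -> str:
        # Split on anything that is not a letter or digit, then capitalize each word.
        words = re.split(r"[^A-Za-z0-9]+", text)
        return "".join(w[:1].upper() + w[1:] for w in words if w)

    # isoformat() gives YYYY-MM-DD; the version is zero-padded so v02 sorts before v10.
    return (f"{camel(project)}_{camel(lastname)}{camel(firstname)}_"
            f"{collected.isoformat()}_v{version:02d}.{ext.lstrip('.')}")
```

For example, `make_filename("soil survey", "Smith", "Jane", date(2024, 3, 5), 2, "csv")` returns `SoilSurvey_SmithJane_2024-03-05_v02.csv`. Because the date is written as YYYY-MM-DD, sorting these filenames alphabetically also sorts them chronologically.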

Clean your data!

  • Mistakes happen, and researchers often don't notice them at first. If you are manually entering data, be sure to double-check the entries for consistency and duplication. A fresh set of eyes will often catch errors before they become problems.
  • Tabular data can often be error-checked by sorting the fields alphanumerically to catch simple typos, extra spaces, or extreme outliers. Be sure to save your data before sorting it so that you do not disrupt the records!
  • Programs like OpenRefine (http://openrefine.org/) are useful for checking that records and variables are coded consistently, catching missing values, transforming data, and much more.
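Some of the checks described above can also be scripted alongside a tool like OpenRefine. The sketch below assumes a UTF-8 CSV file with a header row; `check_column` is a hypothetical name. It only reads the file, so the original record order is never disturbed.

```python
import csv
from collections import Counter
from pathlib import Path


def check_column(path: Path, column: str) -> dict:
    """Report common data-entry problems in one column of a CSV file (read-only)."""
    with path.open(newline="", encoding="utf-8") as f:
        # Short rows yield None for missing fields; treat those as blank.
        values = [row[column] or "" for row in csv.DictReader(f)]
    counts = Counter(values)
    return {
        # Entries with leading or trailing spaces, e.g. "Ohio " vs "Ohio".
        "extra_spaces": sorted({v for v in values if v != v.strip()}),
        # Number of blank or whitespace-only cells.
        "missing": sum(1 for v in values if not v.strip()),
        # Entries that appear more than once (most useful for ID columns).
        "duplicates": sorted(v for v, n in counts.items() if n > 1),
        # Distinct entries in sorted order; near-duplicate typos tend to land close together.
        "sorted_values": sorted(counts),
    }
```

Running it on a state column containing `Ohio`, `Ohio `, a blank, and `Ohoi` would flag the trailing space and the blank cell, and the sorted list places the typo `Ohoi` near the correct spelling.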