Good data practices

Best practices for creating reusable data publications

So, you want to share your research data in Dryad, but are unsure where to start or what you 'should' share? Don't worry, it's not always clear how to craft a dataset with reusability in mind.

We want to help you share your research with the scientific community to increase its visibility and foster collaborations. The following guidelines will help make your Dryad datasets as Findable, Accessible, Interoperable, and Reusable (FAIR) as possible.

No time to dig into the details? Check out our Quickstart guide to data sharing.

Gather all relevant data needed for reanalysis

Consider all of the information someone would need to reuse your dataset and replicate the analyses in your publication. Gather and organize everything; this may include experimental method details, raw data files, organized data tables, scripts, data visualizations, and statistical output. There are often several levels of data processing involved in a project, and it is important to provide adequate detail. That said, don't hesitate to edit out superfluous or ambiguous content that would confuse others. If applicable, please do not include data visualizations or other supplementary material that already appear in the published article or manuscript.
Unprocessed and processed data: Providing both unprocessed and processed data can be valuable for re-analysis, assuming the data are of a reasonable size. Including unprocessed raw digital data from a recording instrument or database ensures that no details are lost, and any issues in the processing pipeline can be discovered and rectified. Processed data are cleaned, formatted, organized and ready for reuse by others.
Code: Programming scripts communicate to others all of the steps in processing and analysis. Including them ensures that your results are reproducible by others. Informative comments throughout your code will help future users understand its logic.
External resources: Links to associated data stored in other data repositories, code in software repositories, and associated publications can be included in "Related works".

Make sure your data are shareable

All files submitted to Dryad must abide by the terms of the Creative Commons Zero v1.0 Universal waiver. Under these terms, the author releases the data to the public domain.
- Review all files and ensure they conform to CC0 terms and are not covered by copyright claims or other terms of use. We cannot archive any files that carry licenses incompatible with CC0 (GNU GPL, MIT, CC-BY, etc.), but we can link to content in a dedicated software repository (GitHub, Zenodo, Bitbucket, CRAN, etc.).
- For more information see Good data practices: Removing barriers to data reuse with CC0 licensing, Why Does Dryad Use CC0, and Some dos and don'ts for CC0.
Human subjects data must be properly anonymized and prepared under applicable legal and ethical guidelines (see tips for human subjects data).
If you work with vulnerable or endangered species, it may be necessary to mask location to prevent any further threat to the population. Please review our recommendations for responsibly sharing data collected from vulnerable species (see tips for endangered species data).

Make sure your data are accessible

To maximize accessibility, reusability and preservability, share data in non-proprietary open formats when possible (see preferred formats). This ensures your data will be accessible by most people.
Review files for errors. Common errors include missing data, misnamed files, mislabeled variables, incorrectly formatted values, and corrupted file archives. It may be helpful to run data validation tools before sharing. For example, if you are working with tabular datasets, a tool like Frictionless validation can identify missing data and data type formatting problems.
File compression may be necessary to reduce large file sizes or to bundle directories of files. Files can be bundled together in compressed file archives (.zip, .7z, .tar.gz). If you have a large directory of files, and there is a logical way to split it into subdirectories and compress those, we encourage you to do so. We recommend that each archive not exceed 10 GB.
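
As a rough illustration of the error review described above, the sketch below scans a small table for missing values and for non-numeric entries in columns that should hold numbers. It uses only the Python standard library and is a stand-in for a dedicated validation tool, not the Frictionless API. The function name, the list of missing-data codes, and the example column names are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical set of strings treated as "missing"; adjust to your own codes.
MISSING_CODES = {"", "NA", "N/A", "NaN", "null"}


def find_table_problems(text, numeric_columns=()):
    """Return a list of (row_number, column, problem) tuples for CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    # Row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        for column, value in row.items():
            if column is None:
                # DictReader stores values beyond the header width under None.
                problems.append((row_number, None, "extra value"))
                continue
            if value is None or value.strip() in MISSING_CODES:
                problems.append((row_number, column, "missing value"))
            elif column in numeric_columns:
                try:
                    float(value)
                except ValueError:
                    problems.append((row_number, column, "not numeric"))
    return problems


# Example table with one empty cell and one mistyped value (made-up data).
example = "site,mass_g\nA,1.5\nB,\nC,heavy\n"
for problem in find_table_problems(example, numeric_columns=("mass_g",)):
    print(problem)
```

A check like this is only a first pass; it will not catch mislabeled variables or values that are well-formed but wrong.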

Organize files in a logical schema

File naming

Name files and directories in a consistent and descriptive manner. Avoid vague and ambiguous filenames. Filenames should be concise, informative, and unique (see Stanford's best practices for file naming).

Avoid blank spaces and special characters (such as ! @ # $ % ^ & " ') in filenames because they can be problematic for computers to interpret. Use a common letter case pattern, because these patterns are easily read by both machines and people:

Kebab-case: The-quick-brown-fox-jumps-over-the-lazy-dog.txt
CamelCase: TheQuickBrownFoxJumpsOverTheLazyDog.txt
Snake_case: The_quick_brown_fox_jumps_over_the_lazy_dog.txt

Include the following information when naming files:

Date of study
Project name
Type of data or analysis
File extension (.csv, .txt, .R, .xls, .tar.gz, etc.)
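
One possible way to apply these naming rules consistently is to build filenames programmatically. The sketch below assembles a snake_case name from the four pieces of information listed above. The function name `build_filename`, the choice of a compact YYYYMMDD date, and the example values are assumptions for illustration, not a Dryad requirement.

```python
import re
from datetime import date


def build_filename(study_date, project, data_type, extension):
    """Join the parts into a snake_case filename without spaces or special characters."""

    def clean(part):
        # Replace every run of characters other than letters and digits with one underscore.
        return re.sub(r"[^A-Za-z0-9]+", "_", part).strip("_")

    # Assumed convention: date first, written as YYYYMMDD so files sort chronologically.
    stem = "_".join([study_date.isoformat().replace("-", ""), clean(project), clean(data_type)])
    return f"{stem}.{extension.lstrip('.')}"


# Hypothetical project and data type.
print(build_filename(date(2023, 5, 14), "Fox Survey", "raw counts!", ".csv"))
```

Putting the date first is a design choice that keeps related files grouped in directory listings; any consistent order works.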

Examples

A) Organized by File type

DatasetA.tar.gz
|- Data/
|  |- Processed/
|  |- Raw/
|- Results/
|  |- Figure1.tif
|  |- Figure2.tif
|  |- Models/

B) Organized by Analysis

DatasetB.tar.gz
|- Figure1/
|  |- Data/
|  |- Results/
|  |  |- Figure1.tif
|- Figure2/
|  |- Data/
|  |- Results/
|  |  |- Figure2.tif
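
A layout such as example A can be created and bundled into a single compressed archive with a few lines of Python. The following is a minimal sketch using the standard library; the function name `build_archive` is hypothetical, and the folder names simply mirror example A.

```python
import tarfile
import tempfile
from pathlib import Path


def build_archive(root, archive_path):
    """Create the example-A folders under root and bundle root into a .tar.gz archive."""
    root = Path(root)
    for sub in ("Data/Processed", "Data/Raw", "Results/Models"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname keeps paths inside the archive relative to the dataset folder.
        archive.add(root, arcname=root.name)
    return Path(archive_path)


# Usage: build the archive in a temporary directory and list its contents.
with tempfile.TemporaryDirectory() as tmp:
    archive_file = build_archive(Path(tmp) / "DatasetA", Path(tmp) / "DatasetA.tar.gz")
    with tarfile.open(archive_file) as archive:
        member_names = archive.getnames()
print(member_names)
```

In practice you would copy your data files into the folders before archiving, then check the size of the resulting file.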

Describe your dataset in a README file

A README is a documentation file that helps others interpret and reanalyze your data. Your README should be a clear and concise description of all the components of your dataset. The Dryad submission process includes the creation of a README.

README files created or imported in Dryad are included in all downloads of your full dataset, so they identify and explain your dataset to users whether or not it is accessed through the Dryad web portal. So that your README can be read both on the web and as part of a download, Dryad README files are delivered in Markdown, a text-formatting language that remains legible when opened as plain text.

If you wish to create a README using your preferred Markdown editor, we provide a README template to guide you through the creation of your file.

Details to include:

Summary of experimental efforts underlying this dataset
Description of file structure and contents
Definitions of all variables, abbreviations, missing data codes, and units
Links to other publicly accessible locations of the data
Other sources, if any, from which the data were derived
Any other details that may influence reuse or replication efforts

Details not to include:

Author names, or any other potentially identifying information, if the data are being submitted to a journal with a double-anonymous review process in place
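
If you prefer to draft a README outside Dryad, a short script can lay out a Markdown skeleton with one section per item in the "Details to include" list. The sketch below is illustrative only: the function name and the section headings are assumptions, and the result is not Dryad's official README template. It deliberately has no author field, in keeping with the note on double-anonymous review.

```python
def readme_skeleton(title, files):
    """Return Markdown text with one section per detail to include.

    `files` maps each filename to a one-line description.
    """
    lines = [
        f"# {title}",
        "",
        "## Summary of experimental efforts",
        "",
        "TODO",
        "",
        "## File structure and contents",
        "",
    ]
    lines += [f"- `{name}`: {description}" for name, description in files.items()]
    # Hypothetical headings paraphrasing the remaining items in the list above.
    for heading in (
        "Variables, abbreviations, missing data codes, and units",
        "Other publicly accessible locations of the data",
        "Sources the data were derived from",
        "Other details relevant to reuse or replication",
    ):
        lines += ["", f"## {heading}", "", "TODO"]
    return "\n".join(lines) + "\n"


# Made-up dataset title and file description.
print(readme_skeleton("Fox survey", {"counts.csv": "Raw counts per site"}))
```

Each TODO marks a section to fill in by hand before submission.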

Ready to get started? Log in and go to "My datasets" to begin your data submission now!
