Local Help for Data Management
File Formats
Best Practices:
- Choose formats that will remain accessible in the future: non-proprietary and commonly used by the research community
- Keep files unencrypted and uncompressed
- Prefer open formats over proprietary ones: PDF not Word, XML or RDF not RDBMS, CSV not XLS
Resources:
- File Formats Table - From UK Data Archive - covers a variety of file formats
- Issues around file formats - California Digital Library (CDL)
Organizing Files/Data
Best Practices:
Folder Structure
- Data and documentation files are in separate folders
- Data files organized according to data type and then according to research activity
- Documentation files are also organized the same way.
- Limit folder nesting to 3 or 4 levels, with no more than 10 items in each folder.
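The folder layout described above can be set up with a short script. This is a minimal sketch, not a prescribed layout: the folder names ("data", "documentation"), the data types and the research activities are all hypothetical placeholders to replace with your own.

```python
from pathlib import Path
import tempfile


def create_project_tree(root, data_types, activities):
    """Create parallel data/ and documentation/ trees: type first, then activity."""
    root = Path(root)
    created = []
    for top in ("data", "documentation"):  # hypothetical top-level folder names
        for data_type in data_types:
            for activity in activities:
                folder = root / top / data_type / activity
                folder.mkdir(parents=True, exist_ok=True)
                created.append(folder)
    return created


def max_depth(root):
    """Return the deepest folder nesting level found below root."""
    root = Path(root)
    depths = [len(p.relative_to(root).parts) for p in root.rglob("*") if p.is_dir()]
    return max(depths, default=0)


# Demonstration in a temporary directory, with made-up types and activities.
demo_root = Path(tempfile.mkdtemp()) / "project"
folders = create_project_tree(demo_root, ["survey", "images"], ["collection", "analysis"])
depth = max_depth(demo_root)
```

Here the tree is three levels deep (for example data/survey/collection), which stays within the suggested limit of 3 or 4.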
File Naming
- Use brief and meaningful file names
- Avoid spaces and special characters
- Include file versioning in the naming scheme

Resources:
- Organizing Data - from UK Data Archive
Create a Data Register
Create a text document or table that includes:
- what data you're collecting
- format(s)
- naming convention
- location you're storing the data
- owner (who's collecting, creating, or responsible for the data)
- access (who is allowed access)
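A data register can be kept as a simple CSV table. The sketch below is one possible way to do that; the column names are an assumed mapping of the items listed above, and the sample entry is invented for illustration.

```python
import csv
import io

# Assumed column names, one per item in the list above.
REGISTER_FIELDS = ["data", "formats", "naming_convention", "location", "owner", "access"]


def write_register(entries, handle):
    """Write register entries as CSV, rejecting entries with empty fields."""
    writer = csv.DictWriter(handle, fieldnames=REGISTER_FIELDS)
    writer.writeheader()
    for entry in entries:
        missing = [field for field in REGISTER_FIELDS if not entry.get(field)]
        if missing:
            raise ValueError(f"register entry is missing: {missing}")
        writer.writerow(entry)


# Hypothetical example entry.
entries = [
    {
        "data": "soil pH measurements",
        "formats": "csv",
        "naming_convention": "project_content_location_yyyymmdd_vNN",
        "location": "shared drive, folder data/soil",
        "owner": "field team lead",
        "access": "project members",
    }
]

buffer = io.StringIO()  # stands in for a file opened with newline=""
write_register(entries, buffer)
register_text = buffer.getvalue()
```

To write to disk instead, open the target with `open(path, "w", newline="")` and pass that handle in place of the buffer.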
About this guide
Acknowledgements: Sara Rutter, University of Hawaii at Manoa, for sharing her guide; UC3 (University of California Curation Center).
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Define Your Data Dictionary

Example from Hook, Les A., et al. 2010. Best Practices for Preparing Environmental Data Sets to Share and Archive. Available online (http://daac.ornl.gov/PI/BestPractices-2010.pdf) from Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee, U.S.A. doi:10.3334/ORNLDAAC/BestPractices-2010
Establish a Descriptive File and Dataset Naming Convention
A consistent convention will help you easily identify your files and what they contain. Use abbreviated descriptive information such as
- project
- content or parameter
- location, date and/or time (yyyymmdd for easy sorting; hhmmssTZD for time)
- version number (establish numbering system for versions)
Use numbers, letters, dashes, underscores. Do not use spaces or special characters. Stay concise to be practical.
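A naming convention of this kind is easy to enforce in code. The following sketch assumes one particular ordering of the parts (project, content, location, date, version), an underscore separator and a two-digit version number; none of these choices come from the guide itself, and the example values are hypothetical. Only the date is handled here, not the time component.

```python
import re
from datetime import date

# Only numbers, letters, dashes and underscores are allowed in each part.
ALLOWED = re.compile(r"[A-Za-z0-9_-]+")


def build_name(project, content, location, day, version, extension="csv"):
    """Assemble a file name such as project_content_location_yyyymmdd_vNN.ext."""
    parts = [project, content, location, day.strftime("%Y%m%d"), f"v{version:02d}"]
    for part in parts:
        if not ALLOWED.fullmatch(part):
            raise ValueError(f"invalid characters in name part: {part!r}")
    return "_".join(parts) + "." + extension


# Hypothetical project, parameter and site names.
name = build_name("soilsurvey", "ph", "oahu", date(2010, 3, 15), 2)
print(name)  # soilsurvey_ph_oahu_20100315_v02.csv
```

Because the date is written as yyyymmdd, an alphabetical sort of the file names is also a chronological sort.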
Using Excel
Best Practices:
- Use in conjunction with a "Data Dictionary" (similar to that listed above) containing information about:
  - Variable name
  - Variable types
  - Codes and Ranges
  - Missing values
- Place variable names in row 1
- Always have a unique identifier per entity
- Keep track of changes made to worksheet
- Format columns to match the variable type (date, numeric, text, etc.)
- Data entry guidelines:
- Freeze column headings so they will not scroll off the screen
- Enter string variables in a consistent case
- Do not leave any blank rows in the spreadsheet
- Do not include unessential text or fancy formatting in the spreadsheet
- Remove formulas: copy the entire spreadsheet into a new sheet using the "Values" paste option
- Sort data with caution (always SAVE first)
- Verify data using double data entry
- Save as .csv for forward compatibility and interoperability
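Several of the checks above lend themselves to automation once the sheet has been saved as .csv. The sketch below tests three of them: variable names in row 1, a unique identifier per entity, and no blank rows. The function name, the identifier column "plot_id" and the sample data are all hypothetical.

```python
import csv
import io


def check_table(handle, id_column):
    """Return a list of problems found in a CSV table (an empty list means none)."""
    rows = list(csv.reader(handle))
    if not rows:
        return ["file is empty"]
    header = rows[0]
    if id_column not in header:
        return [f"no '{id_column}' column in row 1"]
    id_index = header.index(id_column)
    problems = []
    seen = set()
    # Data starts on spreadsheet row 2, directly below the header.
    for row_number, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            problems.append(f"row {row_number}: blank row")
            continue
        identifier = row[id_index] if id_index < len(row) else ""
        if identifier in seen:
            problems.append(f"row {row_number}: duplicate identifier '{identifier}'")
        seen.add(identifier)
    return problems


# Made-up sample with one blank row and one repeated identifier.
sample = "plot_id,site,year\nP1,north,2010\n,,\nP1,south,2010\n"
problems = check_table(io.StringIO(sample), "plot_id")
```

For this sample, the function reports the blank row 3 and the duplicate identifier on row 4.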
Resources:
- DataUp - An Excel add-in that assists individuals in documenting and preparing Excel files for archiving and sharing
- Elliott, A C. (2006). Preparing data for analysis using Microsoft Excel. Journal of investigative medicine, 54(06), 334-341.
Data Documentation and Metadata
Best Practices:
- Make good use of "readme.txt" files for documenting details
- Document:
  - Data collection methods
  - Context of data collection
  - Variable names and description
  - Algorithms used
  - Transformations of data from the raw data through analysis
  - Software and systems used for analysis
- Use discipline-specific metadata standards
- Use a script rather than a GUI for data analysis; a script documents each step and makes results easier to reproduce
- Incorporate a workflow tool such as Kepler, Taverna or VisTrails
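One way to make sure a readme.txt covers the items listed above is to start from a generated skeleton. This is only a sketch: the section titles paraphrase the list, and the layout and placeholder text are assumptions rather than a standard.

```python
# Section titles paraphrased from the documentation checklist above.
README_SECTIONS = [
    "Data collection methods",
    "Context of data collection",
    "Variable names and descriptions",
    "Algorithms used",
    "Transformations from raw data through analysis",
    "Software and systems used for analysis",
]


def readme_skeleton(title):
    """Return the text of a readme with one empty section per checklist item."""
    lines = [title, "=" * len(title), ""]
    for section in README_SECTIONS:
        lines += [section + ":", "  (to be completed)", ""]
    return "\n".join(lines)


# Hypothetical dataset title.
readme_text = readme_skeleton("Soil survey dataset")
```

The returned text can be saved as readme.txt next to the data files and filled in as the project progresses.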
Resources:
- Metadata editors - a good list from DataONE
- Metadata: Distributed Active Archive Center (Oak Ridge National Laboratory)
- Schema Library (Altova) of common industry and cross-industry standards
- Seeing Standards: A Visualization of the Metadata Universe documents 105 standards used by the cultural heritage community
- DDI: Metadata specification for the social and behavioral sciences
Effective Data Practices: References
- Data Management 101 - California Digital Library's DataUp project
- Best Practices for Preparing Environmental Data Sets to Share and Archive (pdf) by Hook et al., 2010.
- DataONE Best Practices database.
- UK Data Archive: how-to, resources on data management.
- Some Simple Guidelines for Effective Data Management by Elizabeth T. Borer et al., Bulletin of the Ecological Society of America 90(2) 205-214, including:
  - store a copy of your original raw data as a read-only file, and make copies to use in analysis
  - provide descriptive filenames and designate the first row of tables as a header
  - organize records in rows, using column headings that will allow analysis within columns rather than across columns, for example: SITE YEAR RAIN TEMP SPEC_NAME POP
  - set up your tables so that you do not have to add columns when adding data
  - use ASCII characters to minimize translation problems with software programs
  - keep only data in your data tables; put comments in a readme text file that accompanies the table
- DataCite on why and how to cite data
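The first of the Borer et al. guidelines above, keeping the original data read-only and analysing copies, can be scripted. The sketch below is one possible approach under stated assumptions: the function name, folder names and sample file are hypothetical, and on Windows `chmod` only toggles the read-only flag.

```python
import shutil
import stat
import tempfile
from pathlib import Path


def protect_and_copy(raw_path, work_dir):
    """Copy a raw data file into work_dir, then make the original read-only."""
    raw_path = Path(raw_path)
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    working_copy = work_dir / raw_path.name
    # copyfile copies contents only, so the working copy stays writable.
    shutil.copyfile(raw_path, working_copy)
    # Remove all write permission bits from the original.
    raw_path.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return working_copy


# Demonstration in a temporary directory with a made-up data file.
demo_dir = Path(tempfile.mkdtemp())
raw_file = demo_dir / "raw" / "counts.csv"
raw_file.parent.mkdir()
raw_file.write_text("SITE,YEAR,POP\nA,2010,12\n")
working_file = protect_and_copy(raw_file, demo_dir / "analysis")
```

All edits and analysis then happen on the working copy, while the protected original remains available for comparison.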


