6 Data and File Management
FORRT relies on open, accessible, and well-organized data and documentation to support collaboration across its distributed, volunteer-based teams. This chapter outlines best practices for managing files, documentation, and any data generated through FORRT projects.
6.1 File Storage and Collaboration
All files should be stored in centralized, team-specific folders using approved cloud-based services. Commonly used platforms include:
- Google Drive (for editable documents, slides, planning spreadsheets)
- OSF (for archival storage, public access, and preprints)
- GitHub (for websites, lesson plans, or code)
6.1.1 Principles
- Use shared team folders, not personal accounts.
- Share links instead of downloading/uploading versions to avoid version conflicts.
- Name files and folders clearly (e.g., “FORRT_ReplicationHub_WorkshopPlanning_2025-04”).
- Keep public and private files clearly separated and labeled.
[INSERT LINK TO CURRENT SHARED FOLDER DIRECTORY or ACCESS REQUEST INFO]
[OPEN QUESTION: Should we designate a lightweight “documentation steward” role in each team?]
6.2 Naming and Version Control
To make files easier to find and track:
- Use clear, descriptive titles with project names and dates.
- Add version numbers or initials for clarity when appropriate.
- Use platform versioning (e.g., Google Drive revision history, GitHub commits) rather than creating duplicate copies.
Templates for file names and folders are available here:
[INSERT LINK TO NAMING GUIDELINES OR TEMPLATE]
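Until the naming template is linked, the convention can be illustrated with a minimal sketch. It assumes a pattern of `FORRT_<Project>_<Title>_<YYYY-MM>` with an optional `_vN` suffix, inferred only from the example name in 6.1.1. The function names `build_name` and `is_valid_name` and the pattern itself are hypothetical, not an official FORRT rule.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical pattern inferred from the example
# "FORRT_ReplicationHub_WorkshopPlanning_2025-04"; not an official rule.
NAME_PATTERN = re.compile(
    r"FORRT_[A-Za-z0-9]+_[A-Za-z0-9]+_\d{4}-(0[1-9]|1[0-2])(_v\d+)?"
)


def _compact(text: str) -> str:
    """Join the words of `text`, capitalizing each word's first character."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    return "".join(w[0].upper() + w[1:] for w in words)


def build_name(project: str, title: str, when: date,
               version: Optional[int] = None) -> str:
    """Assemble a name as FORRT_<Project>_<Title>_<YYYY-MM>[_vN]."""
    parts = ["FORRT", _compact(project), _compact(title), f"{when:%Y-%m}"]
    if version is not None:
        parts.append(f"v{version}")
    return "_".join(parts)


def is_valid_name(name: str) -> bool:
    """Check whether `name` (without file extension) follows the assumed pattern."""
    return NAME_PATTERN.fullmatch(name) is not None


name = build_name("Replication Hub", "workshop planning", date(2025, 4, 1))
print(name, is_valid_name(name))
```

Version suffixes are optional here because platform versioning (revision history, commits) is preferred over duplicate copies.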
6.3 Data Management for Projects
Most FORRT work is educational or resource-based and does not involve sensitive data. However, when data is collected (e.g., through surveys, interviews, analytics), teams must ensure ethical handling and documentation.
6.3.1 What to Consider
- What data is being collected (e.g., feedback, demographic info)?
- How will it be stored securely?
- Who has access?
- Will it be shared, published, or deleted?
A simple Data Management Plan (DMP) template is available and should be used for any project involving participant or user data. [INSERT LINK TO DMP TEMPLATE]
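As an illustration only, the four questions in 6.3.1 could be captured as a small structured record so that unanswered items are easy to spot. The class `DataPlan`, its field names, and the helper `unanswered` are hypothetical and do not reflect the actual DMP template.

```python
from dataclasses import dataclass, fields


@dataclass
class DataPlan:
    """One free-text answer per question in 6.3.1 (hypothetical field names)."""
    data_collected: str = ""        # What data is being collected?
    storage: str = ""               # How will it be stored securely?
    access: str = ""                # Who has access?
    sharing_or_deletion: str = ""   # Will it be shared, published, or deleted?


def unanswered(plan: DataPlan) -> list:
    """Return the names of questions that are still blank."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]


# Made-up example values for demonstration.
plan = DataPlan(
    data_collected="workshop feedback survey",
    storage="restricted shared team folder",
)
print(unanswered(plan))
```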
6.4 Open Access and Licensing
All final outputs (documents, slides, lesson plans, tools) should be:
- Shared publicly via OSF, GitHub, or the FORRT website
- Accompanied by a clear usage license (typically CC BY-NC-SA 4.0)
Where possible, include:
- A README file explaining what the resource is, how it was created, and how to use it
- Metadata (title, creators, keywords, version, date, link to related projects)
For code and software, use an OSI-approved open source license (e.g., MIT, GPL) and host on GitHub or similar.
[INSERT LINK TO LICENSING GUIDANCE OR METADATA CHECKLIST]
6.5 Backups and Archiving
Teams are encouraged to:
- Regularly review shared folders to clean up outdated drafts or duplicates
- Back up critical files in at least one additional location (e.g., OSF)
[OPEN QUESTION: Should we implement a quarterly “digital housekeeping” check-in with each team?]
6.6 Summary
| Area | Practice / Tool | Responsible Party |
|---|---|---|
| File storage | Google Drive, OSF, GitHub | All contributors |
| Naming/version control | Clear titles + platform versioning | All contributors |
| Data collection projects | Fill out simple DMP | Project/team leads |
| Licensing and access | CC BY-NC-SA for docs, OSI license for code | Team leads |
| Archiving | Publish to OSF, tag final versions | Team leads or stewards |
[INSERT LINKS: DMP template, license guidance, metadata checklist (could be derived from the below), shared folders]
6.7 iRISE Minimal Metadata Standards
[OPEN QUESTION: If we retain or adopt these standards, should they move to a separate file?]
All data outputs will be complemented with metadata, as outlined in the DMP. The following elements are required:
- Title: title describing the data output at hand.
- Principal Investigator or Creator: the main person(s) responsible for the intellectual content, with affiliation(s).
- Contributor(s): any other person(s) who contributed to the data output with affiliation(s).
- Funding: funding source of the project leading to the data output (iRISE and additional funding sources must be acknowledged here).
- References and citations: Citations to relevant work or other objects/material leading to the data output or using the data output. Only cite those articles or material that are important for the data output to be reusable and interpretable. Specifically, if applicable, cite any software or material needed to interact with the data.
- Summary / Description: A textual description of the aims of data collection and a summary of the data output itself (in the form of a short abstract).
- Keywords: List of relevant keywords making the metadata findable.
- Coverage: when and where the data collection (or the project) started, and when it was finalized.
- Date of publication: date of data deposition (for the first version and for each new version).
- Unit of observation
- Population: information on the population of interest represented or targeted in the data output.
- Data type and format: information on the type and format of the data collected.
- Sampling and weighting: information on whether any sampling or weighting was used in the data acquisition, and if so, which type or method of sampling and/or weighting was used.
- Mode of Collection: information on the method used to collect the data.
- DOI
- Licenses and restrictions
- Ethical considerations: if ethical approval was needed and acquired, the metadata should link or cite the ethics approval.
- Description of variables: if possible, this should be done in a separate code book or data dictionary.
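The element list above can be turned into a simple completeness check. The following is a minimal sketch, assuming metadata is held as a dictionary. The key names in `REQUIRED_ELEMENTS` and the function `missing_elements` are hypothetical spellings of the listed elements, not a published iRISE schema. It treats every element as required, as the list states, although teams may want to relax conditional items such as ethical approval.

```python
# Hypothetical key names, one per element in the list above.
REQUIRED_ELEMENTS = (
    "title", "creator", "contributors", "funding", "references",
    "description", "keywords", "coverage", "date_of_publication",
    "unit_of_observation", "population", "data_type_and_format",
    "sampling_and_weighting", "mode_of_collection", "doi",
    "licenses_and_restrictions", "ethical_considerations",
    "description_of_variables",
)

# Values that count as "not filled in".
_EMPTY = (None, "", [], {})


def missing_elements(record: dict) -> list:
    """Return the required elements that are absent or empty in `record`."""
    return [key for key in REQUIRED_ELEMENTS if record.get(key) in _EMPTY]


# Made-up record for demonstration; the DOI is deliberately left blank.
record = {
    "title": "Workshop feedback 2025",
    "keywords": ["open science"],
    "doi": "",
}
for key in missing_elements(record):
    print("missing:", key)
```

A check like this could run before depositing an output, so that gaps are caught while the people who know the answers are still active on the project.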