6 Data and File Management

FORRT relies on open, accessible, and well-organized data and documentation to support collaboration across its distributed, volunteer-based teams. This chapter outlines best practices for managing files, documentation, and any data generated through FORRT projects.

6.1 File Storage and Collaboration

All files should be stored in centralized, team-specific folders using approved cloud-based services. Commonly used platforms include:

  • Google Drive (for editable documents, slides, and planning spreadsheets)
  • OSF (for archival storage, public access, and preprints)
  • GitHub (for websites, lesson plans, or code)

6.1.1 Principles

  • Use shared team folders, not personal accounts.
  • Share links instead of downloading/uploading versions to avoid version conflicts.
  • Name files and folders clearly (e.g., “FORRT_ReplicationHub_WorkshopPlanning_2025-04”).
  • Keep public and private files clearly separated and labeled.

[INSERT LINK TO CURRENT SHARED FOLDER DIRECTORY or ACCESS REQUEST INFO]

[OPEN QUESTION: Should we designate a lightweight “documentation steward” role in each team?]

6.2 Naming and Version Control

To make files easier to find and track:

  • Use clear, descriptive titles with project names and dates.
  • Add version numbers or initials for clarity when appropriate.
  • Use platform versioning (e.g., Google Drive revision history, GitHub commits) rather than creating duplicate copies.
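A minimal Python sketch of a name builder that follows the example in 6.1.1 ("FORRT_ReplicationHub_WorkshopPlanning_2025-04"). It assumes the convention is FORRT_<Team>_<Topic>_<YYYY-MM>, with an optional _v<N> suffix when a version number is wanted. The helper names `to_camel_case` and `make_file_name` and the `_v<N>` suffix are hypothetical, not an agreed FORRT standard.

```python
import re
from datetime import date
from typing import Optional


def to_camel_case(text: str) -> str:
    """Turn free text such as 'workshop planning' into 'WorkshopPlanning'."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    return "".join(word[0].upper() + word[1:] for word in words)


def make_file_name(team: str, topic: str, when: date, version: Optional[int] = None) -> str:
    """Build a name following the hypothetical pattern FORRT_<Team>_<Topic>_<YYYY-MM>[_v<N>]."""
    parts = ["FORRT", to_camel_case(team), to_camel_case(topic), when.strftime("%Y-%m")]
    if version is not None:
        parts.append(f"v{version}")  # optional version suffix (an assumption, not a FORRT rule)
    return "_".join(parts)


print(make_file_name("Replication Hub", "workshop planning", date(2025, 4, 1)))
# → FORRT_ReplicationHub_WorkshopPlanning_2025-04
print(make_file_name("Replication Hub", "workshop planning", date(2025, 4, 1), version=2))
# → FORRT_ReplicationHub_WorkshopPlanning_2025-04_v2
```

Putting the date in YYYY-MM form means names sort chronologically in a file listing. For ongoing edits to the same file, the platform's revision history is still preferable to creating new versioned copies.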

Templates for file names and folders are available here:
[INSERT LINK TO NAMING GUIDELINES OR TEMPLATE]

6.3 Data Management for Projects

Most FORRT work is educational or resource-based and does not involve sensitive data. However, when data is collected (e.g., through surveys, interviews, analytics), teams must ensure ethical handling and documentation.

6.3.1 What to Consider

  • What data is being collected (e.g., feedback, demographic info)?
  • How will it be stored securely?
  • Who has access?
  • Will it be shared, published, or deleted?

A simple Data Management Plan (DMP) template is available and should be used for any project involving participant or user data. [INSERT LINK TO DMP TEMPLATE]

6.3.2 Roles and Responsibilities

  • Team/project leads are responsible for coordinating data handling practices.
  • If unsure, consult Team Ethics or the Steering Council for guidance.

[OPEN QUESTION: Should we create a central log or register of ongoing data-collecting projects across FORRT?]

6.4 Open Access and Licensing

All final outputs (documents, slides, lesson plans, tools) should be:

  • Shared publicly via OSF, GitHub, or the FORRT website
  • Accompanied by a clear usage license (typically CC BY-NC-SA 4.0)

Where possible, include:

  • A README file explaining what the resource is, how it was created, and how to use it
  • Metadata (title, creators, keywords, version, date, link to related projects)
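The README and metadata recommendations can live in a single file. The Python sketch below flags missing metadata fields and assembles a plain-text README skeleton. The field names mirror the metadata list above; the snake_case keys, the functions `missing_fields` and `render_readme`, and the section layout are hypothetical choices for illustration.

```python
from typing import Dict, List

# Metadata fields recommended above for final outputs (snake_case keys are an assumption).
RECOMMENDED_FIELDS = ["title", "creators", "keywords", "version", "date", "related_projects"]


def missing_fields(meta: Dict[str, object]) -> List[str]:
    """Return the recommended fields that are absent or empty."""
    return [field for field in RECOMMENDED_FIELDS if not meta.get(field)]


def render_readme(meta: Dict[str, object], what: str, how_created: str, how_to_use: str) -> str:
    """Assemble a plain-text README: three description sections, then a metadata block."""

    def fmt(value: object) -> str:
        # Lists such as creators or keywords are joined with commas.
        return ", ".join(map(str, value)) if isinstance(value, list) else str(value)

    lines = [
        str(meta.get("title") or "Untitled resource"),
        "",
        "What this is: " + what,
        "How it was created: " + how_created,
        "How to use it: " + how_to_use,
        "",
        "Metadata",
    ]
    for field in RECOMMENDED_FIELDS:
        value = meta.get(field)
        lines.append(f"- {field}: {fmt(value) if value else 'MISSING'}")
    return "\n".join(lines)


# Illustrative values only.
meta = {
    "title": "Example Lesson Plan",
    "creators": ["A. Author", "B. Author"],
    "keywords": ["open science", "teaching"],
    "version": "1.0",
    "date": "2025-04-01",
}
print(missing_fields(meta))  # → ['related_projects']
print(render_readme(meta, "A one-session lesson plan.", "Drafted collaboratively in Google Drive.",
                    "Adapt and reuse under CC BY-NC-SA 4.0."))
```

Marking gaps as "MISSING" rather than silently omitting them keeps incomplete metadata visible at review time.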

For code and software, use an OSI-approved open source license (e.g., MIT or GPL) and host the code on GitHub or a similar platform.

[INSERT LINK TO LICENSING GUIDANCE OR METADATA CHECKLIST]

6.5 Backups and Archiving

Teams are encouraged to:

  • Regularly review shared folders to clean up outdated drafts or duplicates
  • Back up critical files in at least one additional location (e.g., OSF)

[OPEN QUESTION: Should we implement a quarterly “digital housekeeping” check-in with each team?]

6.6 Summary

| Area | Practice / Tool | Responsible Party |
|---|---|---|
| File storage | Google Drive, OSF, GitHub | All contributors |
| Naming/version control | Clear titles + platform versioning | All contributors |
| Data collection projects | Fill out the simple DMP | Project/team leads |
| Licensing and access | CC BY-NC-SA for docs, OSI license for code | Team leads |
| Archiving | Publish to OSF, tag final versions | Team leads or stewards |

[INSERT LINKS: DMP template, license guidance, metadata checklist (could be derived from Section 6.7), shared folders]


6.7 iRISE Minimal Metadata Standards

[OPEN QUESTION: If we retain or adopt these standards, should they go into a separate file?]

All data outputs will be accompanied by metadata, as outlined in the DMP. The following elements are required:

  1. Title: a title describing the data output.
  2. Principal Investigator or Creator: the main person(s) responsible for the intellectual content, with affiliation(s).
  3. Contributor(s): any other person(s) who contributed to the data output, with affiliation(s).
  4. Funding: funding source of the project leading to the data output (iRISE and additional funding sources must be acknowledged here).
  5. References and citations: citations to relevant work or other objects/material leading to the data output or using the data output. Cite only the articles or materials needed for the data output to be reusable and interpretable. Specifically, if applicable, cite any software or material needed to interact with the data.
  6. Summary/Description: a textual description of the aims of data collection and a summary of the data output itself (in the form of a short abstract).
  7. Keywords: a list of relevant keywords that make the metadata findable.
  8. Coverage: when and where the data collection (or the project) started, and when it was finalized.
  9. Date of publication: date of data deposition (for the first version and any new versions).
  10. Unit of observation
  11. Population: information on the population of interest represented or targeted in the data output.
  12. Data type and format: information on the type and format of the data collected.
  13. Sampling and weighting: information on whether any sampling or weighting was used in the data acquisition, and if so, which type or method of sampling and/or weighting was used.
  14. Mode of collection: information on how the data was collected, i.e., the method used for data collection.
  15. DOI
  16. Licenses and restrictions
  17. Ethical considerations: if ethical approval was needed and obtained, the metadata should link to or cite the ethics approval.
  18. Description of variables: if possible, this should be done in a separate code book or data dictionary.
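A short script can check a metadata record against the 18 required elements before deposit. This Python sketch assumes the record is a dictionary keyed by snake_case versions of the element names above; those keys and the function `check_irise_metadata` are hypothetical and not part of the iRISE standard. Beyond presence checks, it runs two light sanity checks: the DOI against a pattern modeled on the one Crossref recommends for modern DOIs, and the publication date as an ISO 8601 date.

```python
import re
from datetime import date
from typing import Dict, List

# Hypothetical snake_case keys for the 18 required iRISE elements listed above.
IRISE_ELEMENTS = [
    "title", "creators", "contributors", "funding", "references", "summary",
    "keywords", "coverage", "publication_date", "unit_of_observation",
    "population", "data_type_and_format", "sampling_and_weighting",
    "mode_of_collection", "doi", "licenses", "ethics", "variables",
]

# Rough sanity-check pattern for modern DOIs (modeled on Crossref's recommended pattern).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def check_irise_metadata(record: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the record passed these checks."""
    problems = [f"missing: {key}" for key in IRISE_ELEMENTS if not record.get(key)]

    doi = record.get("doi")
    if doi and not DOI_PATTERN.match(doi):
        problems.append(f"doi does not look like a DOI: {doi!r}")

    published = record.get("publication_date")
    if published:
        try:
            date.fromisoformat(published)  # accepts ISO 8601 dates such as 2025-04-01
        except ValueError:
            problems.append(f"publication_date is not an ISO date: {published!r}")

    return problems


# Illustrative record: every element filled with a placeholder, then one removed.
record = {key: "TODO" for key in IRISE_ELEMENTS}
record["doi"] = "10.1234/example"  # placeholder DOI, not a real deposit
record["publication_date"] = "2025-04-01"
del record["funding"]
print(check_irise_metadata(record))  # → ['missing: funding']
```

Entries such as "not applicable" count as filled, so elements like sampling/weighting or ethics can be stated explicitly even when they do not apply; the DOI field is the exception, since it must look like a DOI once filled in. The DOI pattern is deliberately rough and may reject some older but valid DOIs.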