The data management pilot addresses three levels
The objective of this pilot is to make the datasets collected in EMPHASIS installations available to a large community of plant scientists, following the FAIR principles (findable, accessible, interoperable, reusable). The rationale is that a user can reanalyze published datasets and/or perform meta-analyses that compile several datasets. Three levels are considered to achieve this.
Level 1: Data identification and organization ('Reusable')
Sensors, plants or plots, and vectors are identified with persistent and unambiguous identifiers (e.g. URIs); in particular, the spatial positions of sensors, plants and plots are traced. Environmental variables are organized so that all steps from sensor outputs to time courses and/or spatial distributions of variables are traced and a user can understand them. Phenotypic variables are traced via entities (e.g. 'meristem' or 'air'), qualities (e.g. 'temperature'), methods (e.g. 'thermocouple' or 'thermal infrared') and units (e.g. °C), and comply with MIAPPE specifications whenever relevant. Software tools help to map them onto public ontologies. Events during experiments are traced using public or local ontologies. Datasets are organized in files with the necessary metadata. These processes are compatible with those of the ELIXIR infrastructure and of a common MIAPPE working group.
The pilot organizes seminars and hands-on courses, and proposes software tools, for example for generating URIs or for naming variables.
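As an illustration of the kind of tooling mentioned above, the following minimal Python sketch generates URIs and builds variable names from an entity, a quality, a method and a unit. It is not the pilot's actual software: the namespace `http://example.org/emphasis-demo`, the function `make_uri`, the class `Variable` and the naming pattern are all assumptions made for this example.

```python
import uuid
from dataclasses import dataclass

# Hypothetical namespace, used only for this illustration.
BASE_URI = "http://example.org/emphasis-demo"


def make_uri(kind: str, local_name: str) -> str:
    """Return a deterministic URI: the same inputs always give the same URI."""
    ident = uuid.uuid5(uuid.NAMESPACE_URL, f"{BASE_URI}/{kind}/{local_name}")
    return f"{BASE_URI}/{kind}/{ident}"


def slug(text: str) -> str:
    """Lower-case a label and replace spaces with hyphens."""
    return text.strip().lower().replace(" ", "-")


@dataclass(frozen=True)
class Variable:
    """A variable described by entity, quality, method and unit."""
    entity: str
    quality: str
    method: str
    unit: str

    @property
    def name(self) -> str:
        # Assumed naming pattern: entity_quality_method_unit (unit kept as written).
        return "_".join([slug(self.entity), slug(self.quality),
                         slug(self.method), self.unit.strip()])


meristem_temp = Variable("meristem", "temperature", "thermocouple", "°C")
air_temp = Variable("air", "temperature", "thermal infrared", "°C")

print(make_uri("sensor", "thermocouple-01"))
print(meristem_temp.name)
print(air_temp.name)
```

Deriving the identifier from the object's kind and local name, rather than from a random value, means that re-running the tool on the same inventory does not create duplicate identifiers. This is one possible design choice among others.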
Level 2: Local information system for environmental and phenotypic datasets ('Findable', 'Accessible', 'Reusable' at the level of a single local infrastructure)
The local infrastructure installs an information system that connects information so that any combination of information can be queried rapidly (e.g. trait values for plants of a given genotype in a given range of environments, across experiments) and problems associated with an experiment can be detected rapidly. This is based on stabilized ontologies and the use of semantic web technologies.
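The example query above (trait values for plants of a given genotype in a given range of environments, across experiments) can be sketched in plain Python. This is only an illustration of the query logic over in-memory records: it does not use ontologies or semantic web tools, it is not the API of any existing information system, and all names (`Observation`, `trait_values`) and data values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Observation:
    """One trait measurement linked to its plant, genotype and environment."""
    experiment: str
    plant_uri: str
    genotype: str
    trait: str
    value: float
    mean_air_temp_c: float  # environmental summary for the experiment


def trait_values(observations: Iterable[Observation], genotype: str,
                 trait: str, temp_range: Tuple[float, float]) -> List[Observation]:
    """Select observations of one trait and genotype within a temperature range."""
    low, high = temp_range
    return [o for o in observations
            if o.genotype == genotype
            and o.trait == trait
            and low <= o.mean_air_temp_c <= high]


# Hypothetical records spanning three experiments.
observations = [
    Observation("exp-2019", "plant/1", "G1", "leaf_area", 120.0, 22.5),
    Observation("exp-2019", "plant/2", "G2", "leaf_area", 98.0, 22.5),
    Observation("exp-2020", "plant/3", "G1", "leaf_area", 101.0, 28.0),
    Observation("exp-2021", "plant/4", "G1", "leaf_area", 131.0, 21.0),
]

hits = trait_values(observations, "G1", "leaf_area", (20.0, 25.0))
for o in hits:
    print(o.experiment, o.plant_uri, o.value)
```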
The projects EPPN2020 and EMPHASIS have developed such information systems (PHIS, PIPPA), which are open source. Three training sessions, with follow-up, were organized. Information systems are currently installed or in the process of installation in several local infrastructures (Montpellier, Juelich, Ghent, Louvain-la-Neuve, Wageningen, Toulouse, Clermont-Ferrand, Angers). The long-term goal is to spread these information systems to new EMPHASIS countries.
Level 3: A multi-local information system facilitating meta-analyses ('Findable', 'Accessible', 'Reusable', 'Interoperable' at the level of the EU)
This third step connects local infrastructures that have reached level 2. It allows any user to perform the same queries as in level 2, but at a multi-local level. A software tool, the 'Emphasis layer', is currently under development for this purpose using existing tools; it is being tested in three locations.
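The idea of running the same query at a multi-local level can be sketched as follows. This is a minimal illustration under stated assumptions, not the design of the 'Emphasis layer': each local information system is modelled as a Python callable, and the names `make_local_system`, `multi_local_query`, the site labels and the data values are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
# A local system is modelled as a function answering (genotype, trait) queries.
LocalQuery = Callable[[str, str], Iterable[Row]]


def make_local_system(rows: List[Row]) -> LocalQuery:
    """Build a stand-in for one local information system from in-memory rows."""
    def query(genotype: str, trait: str) -> List[Row]:
        return [r for r in rows
                if r["genotype"] == genotype and r["trait"] == trait]
    return query


def multi_local_query(systems: Dict[str, LocalQuery],
                      genotype: str, trait: str) -> List[Row]:
    """Send the same query to every site and merge the answers, tagged by site."""
    merged: List[Row] = []
    for site, query in sorted(systems.items()):
        for row in query(genotype, trait):
            merged.append({**row, "site": site})
    return merged


# Two hypothetical sites with hypothetical data.
systems = {
    "site-a": make_local_system([
        {"genotype": "G1", "trait": "leaf_area", "value": 120.0},
    ]),
    "site-b": make_local_system([
        {"genotype": "G1", "trait": "leaf_area", "value": 101.0},
        {"genotype": "G2", "trait": "leaf_area", "value": 98.0},
    ]),
}

results = multi_local_query(systems, "G1", "leaf_area")
for row in results:
    print(row["site"], row["value"])
```

Tagging each row with its site of origin keeps the provenance of the merged records visible, which a meta-analysis would typically need.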