1 Overview

The aim of this Delphi study is to generate consensus on how best to share psychoneuroendocrine data openly. The results of this study will inform us a) whether or not to proceed with the development of a standard data format for hormonal data and, if so, b) which aspects to address when developing the format. Before releasing the newly developed standard data format, we will externally validate its templates and provide supporting infrastructure.

For more information on the NODES project please visit: https://www.nodes-pne.eu.

Please note:

The following two code chunks are provided for transparency. They document which entries were kept in the final sample.

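# dplyr provides %>% and filter()
library(dplyr)

# Drop Maria's pilot entry; keep rows where the text field is empty or holds anything else
data_full <- data_full %>%
  filter(is.na(PNE_hormones_text) | PNE_hormones_text != "Pilot maria")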

# Relevant columns = columns that are not open text (i.e., sample description, statement ratings on a Likert scale)
relevant_cols <- c("PNE_researcher", "PNE_experience", "PNE_articles", "PNE_species", "PNE_datashare", "G1_hormones", "G2_questionnaire", "G3_physio", "G4_wet", "G5_uni_sample", "G6_uni_hormones", "G7_react_basal", "G8_exp_observe", "F1_file_type", "F2_codebook_varnames", "F3_rawdata", "F4_average", "F5_addonfile", "F6_unit", "F7_standreldtime", "F8_EUtemp", "F9_missings", "D1_sex", "D2_gender", "D3_age", "D4_educat", "D5_occu", "D6_work", "D7_ethn", "D8_nation", "D9_geno", "D10_daynight", "D11_mens_cycle", "D12_oral_contra", "D13_oral_contra_type", "D14_nicotine_t", "D15_alcohol_t", "D16_drugs_t", "D17_fasting", "D18_nicotine_s", "D19_alcohol_s", "D20_drugs_s", "D21_meds_disease", "D22_time_sampling", "D23_time_awakening", "M1_number", "M2_timestamp_global", "M3_timestamp_stand", "M4_exp_baseline_name", "M5_obs_baseline_def", "M6_metadata_core", "M7_manipulation", "M8_assay", "M9_assayCV", "M10_recruitment", "M11_in_exclusion", "M12_biospecimen", "M13_sampling_proc", "M14_hairwash", "M15_hairchem", "M16_hairvol", "M17_storage", "M18_species")

# Preregistration: Data of experts who fill in more than 30 % of the survey will be included.
min_answers <- length(relevant_cols) * 0.3

# Count the answered (non-missing) relevant items per participant and keep those above the threshold
n_answered <- rowSums(!is.na(data_full[, relevant_cols]))
data_full <- data_full[n_answered > min_answers, ]
write.csv(data_full, file = "processed/NODES_data_r1_clean.csv")

2 The survey

The questionnaire covered the following topics:

  • Which data is considered “psychoneuroendocrine data”?
  • How should the data format be organized?
  • Which participant data is needed to interpret the data correctly?
  • Which metadata is needed to interpret the data correctly?
  • Which practical aspects should be kept in mind when developing a standard data format?

The survey contained 65 items rated on a Likert scale, 45 open text fields in which participants could explain their Likert ratings, and 45 open text fields for suggesting alternative wordings.

3 The sample

Overall, N = 52 participants filled in the survey. On average, they had 14.5 years of experience in the field (SD = 9.19, range: 1 to 44 years, missing: 0).

All participants (N = 52) answered the question “Have you ever worked, or are you currently working, in the field of psychoneuroendocrinology or related fields that work with hormones (in the following, PNE)?” with “yes”.

3.1 PNE articles published

The following figure depicts the answers to the question “How many articles have you published in the field of PNE or related fields?”

3.2 Species worked with

The following figure depicts the answers to the question “Which species are you working with?”

3.3 Data sharing experience

The following figure depicts the answers to the question “How often have you shared (raw or preprocessed) data openly alongside publications, e.g. on an open repository like the Open Science Framework, in the past?”

3.4 Hormones

When asked “Which hormones and markers do you commonly work with?” (multiple selection possible), participants answered as follows:

  • n = 48 worked with cortisol.
  • n = 19 worked with testosterone.
  • n = 16 worked with progesterone.
  • n = 21 worked with estradiol.
  • n = 5 worked with adrenaline.
  • n = 6 worked with noradrenaline.
  • n = 9 worked with oxytocin.
  • n = 7 worked with insulin.
  • n = 3 worked with leptin.
  • n = 1 worked with ghrelin.
  • n = 3 worked with melatonin.
  • n = 2 worked with aldosterone.
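Because multiple selections were possible, the counts above add up to more than N = 52. A minimal sketch of how such counts can be derived, assuming (hypothetically) that each participant's selections are stored as one semicolon-separated string; the function name `count_selections()` and the example data are illustrative only:

```r
# Hypothetical export format: one semicolon-separated string per participant
count_selections <- function(answers, sep = ";") {
  # Split each participant's answer; drop missing answers and duplicate picks
  picks <- lapply(strsplit(answers, sep, fixed = TRUE),
                  function(x) unique(trimws(x[!is.na(x)])))
  # Count how many participants selected each option
  sort(table(unlist(picks)), decreasing = TRUE)
}

hormones <- c("cortisol;testosterone", "cortisol", "cortisol;estradiol;progesterone", NA)
counts <- count_selections(hormones)
counts
```

Each participant contributes at most once per option, so a count can never exceed the number of respondents, while the sum over all options can.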

The following hormones and markers were additionally named in the open text box:

  • Alpha-amylase (n = 5)
  • DHEA (n = 3)
  • Cortisone (n = 2)
  • Thyroid hormones (n = 2)
  • Cytokines (n = 2)
  • Pregnenolone (n = 1)
  • Endocannabinoids (n = 1)
  • Insulin-like growth factor 1 (IGF1, n = 1)
  • Growth hormone (GH, n = 1)
  • Vasopressin (n = 1)
  • ACTH (n = 1)
  • Somatostatin (n = 1)

4 Results

Consensus was defined as more than 70% of the ratings for a statement falling into either ‘agree’/‘strongly agree’ or ‘disagree’/‘strongly disagree’ (cf. preregistration: https://osf.io/c3y4p). When calculating these percentages, we excluded ratings from participants who stated that they had “no expertise” on the respective item.
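A minimal sketch of this consensus rule for a single statement. The rating labels (including the middle category "neutral") and the function name `consensus_index()` are hypothetical and may differ from the actual survey export:

```r
# Hypothetical labels: five-point scale plus a "no expertise" option
consensus_index <- function(ratings, threshold = 0.70) {
  # Exclude missing answers and "no expertise" ratings before computing percentages
  valid <- ratings[!is.na(ratings) & ratings != "no expertise"]
  if (length(valid) == 0) {
    return(list(agree = NA_real_, disagree = NA_real_, consensus = NA))
  }
  agree    <- mean(valid %in% c("agree", "strongly agree"))
  disagree <- mean(valid %in% c("disagree", "strongly disagree"))
  # Consensus: strictly more than 70% of valid ratings on one side of the scale
  list(agree = agree, disagree = disagree,
       consensus = agree > threshold || disagree > threshold)
}

ratings <- c("strongly agree", "agree", "agree", "neutral", "agree",
             "no expertise", "strongly agree", "disagree", "agree", "agree", NA)
res <- consensus_index(ratings)
res$agree      # 7 of 9 valid ratings
res$consensus
```

Consensus can thus be reached in either direction: a statement that more than 70% of participants reject counts as consensus as well.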

In the first round of the Delphi study, we reached a consensus on the majority of items, with only 11 items not reaching the criterion.

4.1 General questions

Original statements:
  1. A standardized data format for PNE should provide the possibility to include hormonal data (e.g., salivary, blood, urine, or hair concentrations).
  2. A standardized data format for PNE should provide the possibility to include questionnaire data (e.g., self-reported stress levels).
  3. A standardized data format for PNE should provide the possibility to include physiological data (e.g., heart rate [variability], blood pressure, etc.).
  4. A standardized data format for PNE should provide the possibility to include other wet data (e.g., enzymes like salivary alpha amylase, immune markers like cytokines, or endocannabinoids).
  5. A standardized format for PNE research should be universal for all kinds of sample origins and species (e.g., samples from humans, animals and in vitro research).
  6. A standardized data format for PNE should be universal for all kinds of hormones (e.g., testosterone, cortisol, ghrelin, progesterone, oxytocin, etc.).
  7. A standardized data format for PNE should apply to reactive hormone measurements, e.g., in response to a stimulation (like the Trier Social Stress Test), as well as to basal measurements, long-term conditions, and measurements of diurnal rhythm.
  8. A standardized data format for PNE should be flexible enough to accommodate both experimental and observational datasets.

The consensus rates are depicted below:

Statement 5 did not reach the consensus criterion and was rephrased for the second round of the Delphi study.

4.1.1 Rephrasing of statement 5

The original statement read:

A standardized format for PNE research should be universal for all kinds of sample origins and species (e.g., samples from humans, animals and in vitro research).

The following comments were made:

Neutral:

  • Important, but details on species are needed
  • Format should include flexible species modules

Agreeing:

  • Important to support comparative and translational research
  • Important to maintain similarity across species so that researchers could comprehend it
  • Important to support data synthesis

Disagreeing:

  • Difficult to provide a universal approach for all kinds of sample origins without missing important information, as human, animal, and in vitro research are too distinct
  • Species-specific formats considered more helpful

Based on the comments provided in the open text, we rephrased the statement to:

“A standardized format for PNE research should consist of flexible modules that can accommodate diverse sample origins and species-specific variables (e.g., human, animal, and in vitro samples).”

4.2 Data Format

Original statements:
  1. A standardized data format for PNE should define a standard file type that is readable by different statistical programs and allows data and metadata to be stored efficiently (e.g., comma-separated values [CSV] files or other text files integrated in a JSON, XML, or HDF5 file).
  2. A standardized data format for PNE must come with a codebook template and use standardized variable names.
  3. A standardized data format for PNE should include the raw, untransformed data.
  4. A standardized data format for PNE should include the averaged hormone values obtained from replicates (e.g., duplicates, triplicates).
  5. Other data beyond the scope of the standardized format should be stored in additional files with a codebook included.
  6. Standardization should include standards for units (e.g., nmol/L).
  7. Standardization should include standards for relative timestamp referencing (e.g., first sample indexed as T1).
  8. Adopting and extending an existing format (e.g., the Stress-EU template) is preferable to developing a new standard from scratch.
  9. Missing data should be explicitly coded using standardized notations to distinguish between different types of missing data (e.g., not collected vs measurement error).
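Statements 6 and 9 can be illustrated with a small sketch in which the unit is encoded in the variable name and each missing value carries a typed code that is converted to `NA` plus a separate reason column. The numeric codes (-97, -98), the variable name `cortisol_nmol_l`, and the function `decode_missing()` are hypothetical and not part of the statements rated in the survey:

```r
# Hypothetical missing-data codes; negative numbers cannot be valid concentrations
missing_codes <- c("-97" = "not collected", "-98" = "measurement error")

# Replace coded values by NA and keep the type of missingness in a separate column
decode_missing <- function(values, codes = missing_codes) {
  key <- as.character(values)
  is_code <- key %in% names(codes)
  data.frame(
    value = ifelse(is_code, NA_real_, values),
    missing_type = ifelse(is_code, unname(codes[key]), NA_character_),
    stringsAsFactors = FALSE
  )
}

# The unit (nmol/L) is part of the hypothetical variable name
cortisol_nmol_l <- c(12.4, -97, 8.9, -98, 15.2)
decoded <- decode_missing(cortisol_nmol_l)
decoded
```

Choosing codes outside the range of possible concentrations keeps them from being mistaken for real measurements, while the decoded `NA` values work directly with standard statistical functions.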

The consensus rates are depicted below:

Statements 4, 5, and 8 did not reach the consensus criterion and were thus rephrased for the second round of the Delphi study.

4.2.1 Rephrasing of statement 4

The original statement read:

A standardized data format for PNE should include the averaged hormone values obtained from replicates (e.g., duplicates, triplicates).

The following comments were made:

Neutral:

  • As long as the intra- and inter-assay CVs are also reported.

Agreeing:

  • Average values from replicates ensure consistency and reduce noise.

Disagreeing:

  • Raw data preferred so that intra- and inter-assay CVs can be inspected
  • Average can be calculated from raw data
  • Replicates should be recorded in full
  • Only if original data is also provided
  • Averaging duplicates is not helpful due to different preprocessing

Based on the comments provided in the open text, we rephrased the statement to:

“A standardized data format for PNE should include the hormone values obtained from each assay replicate (e.g., duplicates, triplicates), rather than only the averaged values.”
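The rephrased statement can be illustrated with a sketch in which each replicate is stored as its own row, so that averaged values and intra-assay coefficients of variation (CVs) can still be derived from the raw data, as several participants noted. The column names and values are hypothetical, and averaging the per-sample CVs is only one common way to summarize intra-assay precision:

```r
# Hypothetical long format: one row per sample and assay replicate
replicates <- data.frame(
  sample_id = c("S01", "S01", "S02", "S02"),
  replicate = c(1, 2, 1, 2),
  cortisol_nmol_l = c(10.0, 12.0, 20.0, 20.0),
  stringsAsFactors = FALSE
)

# Derive the per-sample mean and intra-assay CV (in %) from the raw replicates
means <- tapply(replicates$cortisol_nmol_l, replicates$sample_id, mean)
cvs <- tapply(replicates$cortisol_nmol_l, replicates$sample_id,
              function(x) 100 * sd(x) / mean(x))
per_sample <- data.frame(sample_id = names(means),
                         mean_nmol_l = as.numeric(means),
                         cv_percent = as.numeric(cvs),
                         stringsAsFactors = FALSE)

# One common summary: the average of the per-sample CVs
mean(per_sample$cv_percent)
```

The reverse is not possible: once only averages are stored, the replicate values and their CVs cannot be recovered.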

4.2.2 Rephrasing of statement 5

The original statement read:

Other data beyond the scope of the standardized format should be stored in additional files with a codebook included.

The following comments were made:

Neutral:

  • Should be optional, guided by research question and analysis status
  • Clear examples, scope definitions, and templates are needed so that users know what counts as “other data” and how to format it, including conversion formulas

Agreeing:

  • Additional files enhance clarity and reproducibility and prevent data loss while ensuring a rich set of information
  • No additional information should be included, to ensure that scripts can be applied

Disagreeing:

  • This may make the format harder to adopt

Based on the comments provided in the open text, we rephrased the statement to:

“Optionally and where appropriate, other data beyond the scope of the standardized data format (e.g., semi-structured interview data, qualitative data) should be stored in additional files with a codebook included.”

4.2.3 Rephrasing of statement 8

The original statement read:

Adopting and extending an existing format (e.g., the Stress-EU template) is preferable to developing a new standard from scratch.

The following comments were made:

Neutral:

  • Depends on the suitability and limitations of the databases in question
  • The Stress-EU format needs to be extended to meet the requirements of PNE data in general
  • The Stress-EU template is not known to all participants

Agreeing:

  • Leveraging an existing standard can speed adoption and avoid duplicate work

Disagreeing:

None

Based on the comments provided in the open text, we rephrased the statement to:

“Adopting and extending elements of an existing format with proven suitability and good documentation is preferable to developing a new standard from scratch.”

4.3 Subject Info

In this section, participants rated which subject information should be included in the standard data format.

The consensus rates are depicted below:

4.3.1 Rephrasing of statements

Statements 4 (education), 5 (occupation), 6 (work), and 8 (nation) did not reach the consensus criterion and were thus rephrased for the second round of the Delphi study so that these variables can be included optionally.

The rephrased statement reads:

“Optionally, researchers may provide variables indicating subject information like education, occupation, working hours or nation.”

4.4 Metadata

These questions aimed to identify which metadata are necessary to interpret the data correctly.

Original statements:
  1. A standardized data format for PNE should be suitable for a flexible number of hormone samples.
  2. Relative timestamps should use a similar annotation across measures (e.g., hormonal samples and repeated questionnaire data).
  3. A standardized data format for PNE should have a standardized way to timestamp measurements.
  4. For observational studies involving, e.g., the measurement of diurnal profiles, the time point (t0) should be defined as the first sample collected during the study.
  5. Observational and experimental datasets should - whenever possible - adhere to the same core metadata requirements (e.g., variable naming, timing, assay details) to facilitate data comparison and integration.
  6. The metadata should include information about the experimental manipulations and interventions (e.g., treatment, experimental stress induction, pharmacological manipulation, etc.).
  7. The metadata should include information about the biochemical assay used (e.g., manufacturer, version).
  8. The metadata should include information about the observed inter- and intra-assay coefficients of variation, if replicates are used.
  9. The metadata should contain information on the recruitment of participants (e.g., student sample, community sample, patient sample, etc.). (*in human studies)
  10. The metadata should contain a clear description of the inclusion and exclusion criteria used.
  11. The metadata should include the information on the analyzed biospecimen (e.g., plasma/serum, saliva, hair).
  12. The metadata should include the information on the sampling procedure (e.g., unstimulated/stimulated saliva, use of catheter/repeated venipuncture for blood draw, location, length, and thickness of the hair strand).
  13. The metadata for hair should include the information on the frequency of hair washing.
  14. The metadata for hair should include information on whether the hair was dyed, bleached, or treated otherwise.
  15. The metadata for hair should include information on the mass of hair used for analysis.
  16. The metadata should include information on sample storage and manipulation (e.g., freezing/thawing cycles).
  17. The metadata should include the species investigated.

The consensus rates are depicted below:

Statement 5 (definition of t0 in observational studies) did not reach the consensus criterion and was thus rephrased for the second round of the Delphi study.

4.4.1 Rephrasing of statement 5

The original statement read:

For observational studies involving, e.g., the measurement of diurnal profiles, the time point (t0) should be defined as the first sample collected during the study.

The following comments were made:

Neutral:

  • Applicability to studies with sampling over several days needs consideration

Agreeing:

  • Ensures alignment and facilitates longitudinal analyses and analyses of diurnal patterns

Disagreeing:

  • Difficult to apply in studies with multiple interventions
  • Continuous timestamping would be less ambiguous
  • Different standardizations may be useful depending on context
  • Standardized variable names are not necessary, but comprehensible naming is key

Based on the comments provided in the open text, we rephrased the statement to:

“The standardized data format for PNE should include a variable with a clear description of what the respective timepoint represents relative to interventions, circadian rhythm, etc. (e.g., first sample before intervention, first sample of day 1, sample of day 2 assessed at 2 PM, etc.).”
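A minimal sketch of such a timepoint variable, stored alongside a relative index (T1, T2, ...) and the continuous time since each participant's first sample. All column names, dates, and descriptions are hypothetical:

```r
# Hypothetical example: two participants, sampling times in UTC
samples <- data.frame(
  participant_id = c("P01", "P01", "P01", "P01", "P02", "P02"),
  sample_time = as.POSIXct(c("2024-03-04 08:00", "2024-03-04 08:30",
                             "2024-03-04 14:00", "2024-03-05 14:00",
                             "2024-03-11 09:00", "2024-03-11 09:45"),
                           format = "%Y-%m-%d %H:%M", tz = "UTC"),
  timepoint_description = c("first sample of day 1, before intervention",
                            "day 1, 30 min after intervention onset",
                            "sample of day 1 assessed at 2 PM",
                            "sample of day 2 assessed at 2 PM",
                            "first sample of day 1, after awakening",
                            "day 1, 45 min after first sample"),
  stringsAsFactors = FALSE
)

# Sort chronologically within each participant
samples <- samples[order(samples$participant_id, samples$sample_time), ]

# Relative index per participant (T1, T2, ...)
samples$timepoint <- paste0("T", ave(seq_len(nrow(samples)), samples$participant_id,
                                     FUN = seq_along))

# Continuous time in minutes since each participant's first sample
secs <- as.numeric(samples$sample_time)
samples$minutes_since_first <- (secs - ave(secs, samples$participant_id, FUN = min)) / 60
samples
```

Keeping the description, the relative index, and the continuous time side by side avoids forcing a single definition of t0, which was one of the concerns raised in the comments above.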

4.5 Convenience

These questions aimed to assess how much convenience users need in order to adopt the new format and which support measures should be developed.

Original statements:
  1. I would always use a PNE standardized data format if it existed.
  2. I would use a PNE standardized data format only if it can be integrated with existing infrastructures (e.g., BIDS, GitHub, OpenNeuro etc.).
  3. I would re-use data from others more frequently if it was shared in a standardized format.
  4. I would encourage others to use a PNE standardized data format if it existed.
  5. I would learn new software/coding to use a PNE standardized data format.
  6. I would change my data storage habits if a PNE standardized data format is available.
  7. I would prefer a website (or something else to click through) over coding myself to transfer my own data into a standardized format.
  8. The existence of a standard data format would encourage me to share my data.

The consensus rates are depicted below:

Statements 2 and 5 did not reach the consensus criterion and were thus rephrased for the second round of the Delphi study.

4.5.1 Rephrasing of statement 2

The original statement read:

I would use a PNE standardized data format only if it can be integrated with existing infrastructures (e.g., BIDS, GitHub, OpenNeuro etc.).

The following comments were made:

Neutral:

  • Integration is helpful but not a strict requirement

Agreeing:

  • Integration enhances usability, encourages adoption, and avoids redundant efforts

Disagreeing:

None

Based on the comments provided in the open text, we rephrased the statement to:

“The possibility to integrate the new standard with existing infrastructures (e.g., BIDS, GitHub, OpenNeuro etc.) is nice-to-have, but not a must.”

4.5.2 Rephrasing of statement 5

The original statement read:

I would learn new software/coding to use a PNE standardized data format.

The following comments were made:

Neutral:

  • Depends on benefits (e.g., quality, sharing)
  • Depends on effort that is required
  • Depends on guidance of how to use the tools

Agreeing:

  • If it improves organization and supports sharing

Disagreeing:

  • Learning new tools should not be necessary

Based on the comments provided in the open text, we rephrased the statement to:

“My willingness to learn new skills to be able to adopt the new standard data structure highly depends on effort, benefits, and available guidance.”

4.6 Addition of new statements

Additionally, based on the open text fields, the following variables are now included in the questionnaire:

The subject information should include:

  • BMI
  • Weight
  • Stressor exposure and interventions
  • Sampling adherence
  • Timing deviations
  • Caffeine/stimulants
  • Sleep patterns
  • Information on time-varying covariates
  • Information on nested structures (e.g., cohorts, sites, repeated measures)
  • Species (for animals)
  • Strain (for animals)
  • Housing (for animals)
  • Circadian time (for animals)
  • Environment (for animals)
  • Season
  • Year
  • Societal context (e.g., COVID-19, war, discrimination)
  • Puberty
  • Bedtime and awakening time (for humans)