Content provided by Winfried Adalbert Etzel - DAMA Norway. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Winfried Adalbert Etzel - DAMA Norway or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://no.player.fm/legal.
2#18 - Scientific Data Management (Eng)

37:10
Manage episode 366481102 series 2940030

«How can we consolidate data and describe it in a standardized way?»

Scientific data management has some unique challenges, but it also offers lessons for other sectors. We focused on Data Storage and Operations, a knowledge area in DMBOK that is often viewed as basic and rarely in focus, yet is a fundamental part of data operations.

I talked to Nicolai Jørgensen at NMBU - Norwegian University of Life Sciences. Nicolai has a really diverse background; his journey in data started in 1983! In his free time, he enjoys photography and AI text-to-image generation.

Here are my key takeaways:

Scientific Data Management

  • To describe data in a unified way, we need standards, like Dublin Core or Darwin Core for scientific data.
  • Data is an embedded part of Science and Research - you can’t have those without data.
  • You need to make sure you collect the right data, the right amount of data, valid data, and so on.
  • You need to optimize the time, energy, and expense spent collecting and validating data.
  • You need to standardize the way you collect data, to ensure that it can be verified.
  • There needs to be an audit trail (lineage) between the data you have collected and the result presented in a publication.
  • Data needs to be freely available for research and for testing hypotheses.
  • Data needs to be findable, accessible, and interoperable, but also reusable (FAIR).
  • ML algorithms can help extract and detect changes in scientific data that is available internationally.
  • Describing data is key to tapping into knowledge - for that you need metadata.
  • In the age of AI and ML, metadata is still the key to discovering data.
  • The development of AI models is a race - maybe we need to pause and get a better picture of cause and effect, and most of all risk.

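The standards mentioned above, such as Dublin Core, describe data through a small, fixed set of elements (title, creator, date, format, and so on). A minimal sketch of what that looks like in practice, using only Python's standard library - the field values here are hypothetical examples, not a real dataset:

```python
# Sketch: describing a dataset with Dublin Core elements and serializing
# it to XML, a format both machines and humans can read.
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)           # serialize with the "dc:" prefix

def dublin_core_record(fields: dict) -> str:
    """Build an XML record from a mapping of Dublin Core element -> value."""
    root = ET.Element("record")
    for element, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")

record = dublin_core_record({
    "title": "Bird observations, site A",    # hypothetical dataset
    "creator": "Example Research Group",
    "date": "2023-06-01",
    "format": "text/csv",
})
print(record)
```

Because every dataset is described with the same element names, records from different projects can be indexed and searched in a unified way - which is exactly the consolidation question quoted at the top.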
Standardizing Infrastructure

  • How can we standardize the infrastructure for research projects?
    • Minimize or get rid of volatile data storage and infrastructure
    • Standardize data storage solutions
    • Secure what needs to be secured
    • Split out sensitive or classified data and store it separately (e.g. personal data)
    • Train your end users and educate data stewards
  • Have good guidelines for researchers on how to store, use and manipulate data.
  • There is a direct correlation between disk-space use and sustainability.
  • «Storage is cheap» is true if you look at storage in isolation - but in the bigger picture, the cost is just moved elsewhere.
  • Just adding more storage doesn’t solve your problems - it might even increase them.

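One bullet above - splitting out sensitive data and storing it separately - can be sketched in a few lines. This is a simplified illustration, not NMBU's actual setup; the field names and the sensitivity classification are assumptions for the example:

```python
# Sketch: route fields classified as personal data to a separate store,
# so the open part of a record can live on ordinary research storage.
SENSITIVE = {"name", "email", "national_id"}   # assumed classification list

def split_record(record: dict) -> tuple[dict, dict]:
    """Return (open_part, sensitive_part) of a single record."""
    open_part = {k: v for k, v in record.items() if k not in SENSITIVE}
    sensitive_part = {k: v for k, v in record.items() if k in SENSITIVE}
    return open_part, sensitive_part

open_part, sensitive_part = split_record(
    {"sample_id": 42, "species": "Parus major", "name": "Jane Doe"}
)
# open_part can go to the shared platform; sensitive_part to secured storage.
```

In a real project the classification would come from a governed data catalog rather than a hard-coded set, but the principle - separate at ingestion, not as an afterthought - is the same.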
Long-term Preservation & Integrity

  • To preserve data for the long term, you need to:
    • Encapsulate data at a certain level
    • Standardize the way you describe the data
    • Upload data package to a common governed platform
    • Find out whether there is a government body that can take responsibility for preserving your data for as long as necessary
    • Ensure that metadata is machine-readable
    • Formats like XML make the data readable by both machines and humans
  • Research integrity: conducting research in a way that allows others to have trust and confidence in the methods used and the findings that result.
  • Ensure lineage and audit trails for your scientific data.
  • Fake data and data fabrication are serious issues in research - keeping data integrity at the highest possible level is not getting easier, but it is increasingly important.
  • Changes to data (change logs, change data capture, etc.) can be studied as well; you can build models to explore scenarios around data changes.
  • You can fetch data from other sources to enrich the quality of your data.

