
Workshops

will take place on Monday, 11th September 2023

Overview:

WS1) 1001 reasons for a Data Management Plan
WS2) How to analyze spatial transcriptomics data (Visium 10x) with R and Python
WS3) Standardizing and harmonizing NGS analysis workflows
WS4) How to interpret multi-omics single-cell and spatial transcriptomics data (Visium, 10x, MERFISH) with R and Python (Advanced level)
WS5) Federated Ensemble Learning for Biomedical Data
WS6) Bioinformatics education
WS7) Constraint-based modelling in Python applied to plant systems
WS8) Build your own interactive, online network-based drug repurposing application using Drugst.One
WS9) FeatureCloud: privacy-aware federated learning in biomedicine

 

Detailed Workshop Programme:

WS1) 1001 reasons for a Data Management Plan

Organizers: Helena Schnitzer (ELIXIR Germany); Daniel Wibberg (Forschungszentrum Jülich)

Description: Have you ever wondered what research data management really is? Why is it so
important? Then ELIXIR-DE has created the perfect training for you!

More and more funders require research data management plans as a condition for
awarding their grants. Over the course of just three hours, you will get the
chance to learn how to translate research project proposals into proper Data
Management Plans (DMPs). Data management experts will guide you through the
basics of research data management, the dos and don'ts, and how to improve the
management of the data produced in research projects.

After a short introduction of the research data life cycle and the FAIR data principles,
we will explore in multiple hands-on sessions what a data management plan (DMP)
is. We will sink our teeth into components, language, software and examples of
DMPs. In light of the FAIR principles, we will evaluate possible problems and
solutions and self-assess a drafted DMP for our own projects.

Learning goals:
● Know the research life cycle
● Know the FAIR principles and understand their importance
● Know the expectations for a DMP
● Be able to sketch a DMP
● Know the RDMkit

Provisional schedule:
● Basics: RDM & FAIR
● Parts of a DMP
● Short Break
● Write a DMP sketch
● ELIXIR RDMkit and other DMP tools

WS2) How to analyze spatial transcriptomics data (Visium 10x) with R and Python
Organizers: Sonja Hänzelmann; Fabian Hausmann; Robin Khatri (Universität Hamburg)

Description: Spatial transcriptomics represents a seminal development in the field of molecular biology, offering new possibilities for analyzing gene expression patterns within the context of tissue architecture. The Visium 10x platform is a commercially available and widely used platform providing researchers with a toolkit for comprehending cellular biology in its anatomical context.
In this workshop, attendees will receive a short introduction to spatial transcriptomics and the
Visium 10x platform and a comprehensive hands-on session to help understand the data
analysis process. The workshop will cover key concepts, including data visualization,
interpretation of gene expression patterns, and clustering analyses.
Overall, this workshop is designed for researchers, students, and technicians who are
interested in utilizing the latest advances in spatial transcriptomics and data analysis tools to
advance their knowledge of cellular biology. Whether you are a seasoned scientist or a
newcomer to the field, this workshop provides a unique opportunity to expand your skills and
gain a deeper understanding of this fascinating subject.

The workshop is organized in two parts:
1. Introduction to spatial transcriptomics, background, and technology
2. Hands-on data analysis using SquidPy, including annotation, clustering, visualization, and enrichment analysis (a minimal sketch of such an analysis follows below).
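
As a rough illustration of what the hands-on part may look like, the following Python sketch clusters a public 10x Visium dataset with scanpy and SquidPy. The sample ID, parameter choices, and the leidenalg dependency are assumptions for this sketch and not part of the official workshop material.

    import scanpy as sc
    import squidpy as sq

    # Download a public 10x Visium dataset (spot-by-gene counts plus tissue image)
    adata = sc.datasets.visium_sge(sample_id="V1_Human_Lymph_Node")
    adata.var_names_make_unique()

    # Basic filtering and normalization
    sc.pp.filter_genes(adata, min_cells=3)
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)

    # Dimensionality reduction and Leiden clustering (requires the leidenalg package)
    sc.pp.pca(adata)
    sc.pp.neighbors(adata)
    sc.tl.leiden(adata, key_added="cluster")

    # Visualize the clusters in their spatial context on the tissue image
    sc.pl.spatial(adata, color="cluster")

    # Neighborhood enrichment between clusters with squidpy
    sq.gr.spatial_neighbors(adata)
    sq.gr.nhood_enrichment(adata, cluster_key="cluster")
    sq.pl.nhood_enrichment(adata, cluster_key="cluster")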

Technical Requirements:
The workshop does not require prior knowledge of spatial transcriptomics, but participants
are expected to be comfortable with basic Python programming. Attendees have to bring their
own laptops.

GitHub link for materials: https://github.com/sonjaha/workshop_spatial
Please check the link above one week before the workshop for further announcements.


WS3) Standardizing and harmonizing NGS analysis workflows

Organizers:  Dr. Christian Mertes (Technical University of Munich, Workflow Coordinator for GHGA); Dr. Kübra Narci (German Cancer Research Center); Dr. Florian Heyl (German Cancer Research Center); Dr. Julia Philipp (European Molecular Biology Laboratory, Training Coordinator for GHGA)

Description: With increasing amounts of human omics data, there is an urgent need for adequate resources for data sharing while also standardizing and harmonizing the processing of the data. Within the federated European Genome-Phenome Archive (EGA), the German Human Genome-Phenome Archive (GHGA) strives to provide (i) the necessary secure IT infrastructure for Germany, (ii) an ethico-legal framework to handle omics data in a data-protection-compliant but open and FAIR (Findable, Accessible, Interoperable, and Reusable) manner, (iii) a harmonized metadata schema, and (iv) standardized workflows to process the incoming omics data uniformly.

GHGA is aiming to be more than an archive. GHGA will build on cloud computing infrastructures managed in a network of data generators. Based on GA4GH and FAIR standards, researchers will have controlled access to raw and processed sequencing data via recognized analysis workflows. For this, GHGA is developing and standardizing workflows for data analysis, benchmarking, statistical analysis, and visualization in collaboration with the research community, particularly with the nf-core community. In such an international environment, it is essential to follow the principles of continuous integration and deployment (CI/CD) to test and benchmark the workflows with synthetic and experimental datasets, such as CHM cell lines and Genome in a Bottle (GiaB), in order to ensure the highest quality of the developed workflows on both the technical and the biological side. Thus, through harmonized data processing, GHGA will enable cross-project analysis and hence promote new collaborations and research projects.

In this tutorial, we will explore how FAIR principles enable the standardization and
harmonization of nf-core-based NGS analysis workflows within GHGA. We will start with a short introduction to GHGA, its need for harmonized workflows in the international context, and an overview of the FAIR data principles. We will then demonstrate the adaptability of nf-core workflows and discuss the importance of workflow standardization. Finally, we will demonstrate how to make workflows scalable, robust, and automated for continuous
benchmarking, with hands-on exercises using a subset of a public dataset under a variety of
configurations such as local and cloud settings.
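
As a hedged illustration of such an exercise, the Python sketch below launches an nf-core pipeline under different Nextflow configuration profiles via subprocess. The pipeline (nf-core/rnaseq), the profile names, and the availability of Nextflow and Docker on the machine are assumptions for this sketch, not the tutorial's actual setup.

    import subprocess

    def run_nf_core(pipeline, profiles, extra_args=()):
        """Launch an nf-core pipeline with the given Nextflow configuration profiles."""
        cmd = ["nextflow", "run", pipeline, "-profile", ",".join(profiles), *extra_args]
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # Local smoke test using the pipeline's bundled test profile and Docker
    run_nf_core("nf-core/rnaseq", ["test", "docker"], ["--outdir", "results_test"])

    # The same pipeline under a site- or cloud-specific profile (name is illustrative)
    # run_nf_core("nf-core/rnaseq", ["my_cloud_profile"], ["--outdir", "results_cloud"])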

Learning Objectives for Tutorial:
● FAIR principles and their relevance for workflow standardization and harmonization
● The adaptability, portability and scalability of workflows within nf-core
● Automating robust workflows to ensure reproducibility and the highest code quality


WS4) How to interpret multi-omics single-cell and spatial transcriptomics data (Visium, 10x, MERFISH) with R and Python (Advanced level)

Organizers: Tore Bleckwehl; Rafael Kramann; Sikander Hayat (Institute of Experimental Medicine and Systems Biology, Uniklinik Aachen)

Description: Technological advances in sequencing technologies have made it possible to generate multi-omics data from diverse modalities at single-cell resolution. These modalities, such as single-cell transcriptomics, ATAC-seq, and spatial transcriptomics, capture different aspects of the underlying biological system in healthy and disease states. However, optimally integrating, analyzing, and interpreting such datasets poses unprecedented challenges.
Our team has generated several multi-modal disease datasets (1–3) and developed many computational tools (4–7) to tackle these issues in a scalable, time-efficient, and resource-efficient manner. Additionally, biological interpretability is critical to leveraging the ever-increasing multi-modal data landscape.
In our tutorial, we propose to present a set of computational pipelines and machine learning tools to efficiently process, integrate, automatically annotate, identify disease-relevant signatures in, and interactively visualize large-scale multi-omics datasets. We will focus on analyzing single-cell transcriptomics, 10x Visium, and high-resolution in-situ spatial transcriptomics data. We will showcase data analysis pipelines using our human myocardial infarction spatial atlas, consisting of single-cell RNA-seq, ATAC-seq, and spatial transcriptomics, and additional data obtained from fibrosis models of heart and kidney diseases. Finally, we will highlight the utility of our computational framework to better understand cardiometabolic diseases and identify potential targets.


1. Schreibing, F. et al. Dissecting CD8+ T cell pathology of severe SARS-CoV-2 infection by single-cell immunoprofiling. Front. Immunol. 13, 1066176 (2022).
2. Kuppe, C. et al. Spatial multi-omic map of human myocardial infarction. Nature 608, 766–777 (2022).
3. Hoeft, K. et al. Platelet-instructed SPP1+ macrophages drive myofibroblast activation in fibrosis in a CXCL4-dependent manner. Cell Rep. 42, 112131 (2023).
4. Xu, Y., Kramann, R., McCord, R. P. & Hayat, S. MASI enables fast model-free standardization and integration of single-cell transcriptomics data. Commun Biol 6, 465 (2023).
5. Xu, Y., Baumgart, S. J., Stegmann, C. M. & Hayat, S. MACA: marker-based automatic cell-type annotation for single-cell expression data. Bioinformatics 38, 1756–1760 (2022).
6. Jain, D. et al. SciViewer: an interactive browser for visualizing single-cell datasets. Preprint at https://doi.org/10.1101/2022.02.14.480435.
7. Li, X., Chakraborty, J., Jameson, M. & Moore, H. Minuteman: a versatile cloud computational platform for collaborative research. bioRxiv (2023).
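
As an illustration of the marker-based automatic annotation step mentioned in the description (refs. 4 and 5), the following Python sketch scores illustrative marker gene sets with scanpy and assigns each cell its highest-scoring label. The marker lists, the input file name, and the scoring approach are assumptions for this sketch and do not reproduce the organizers' MASI/MACA tools.

    import scanpy as sc

    # Illustrative marker sets; a real analysis would use curated marker databases
    markers = {
        "Cardiomyocyte": ["TNNT2", "MYH7", "ACTC1"],
        "Fibroblast": ["DCN", "COL1A1", "PDGFRA"],
        "Endothelial": ["PECAM1", "VWF", "CDH5"],
    }

    # Hypothetical preprocessed (normalized, log-transformed) single-cell dataset
    adata = sc.read_h5ad("heart_snRNAseq.h5ad")

    # Score every cell against each marker set
    for cell_type, genes in markers.items():
        sc.tl.score_genes(adata, gene_list=genes, score_name=f"score_{cell_type}")

    # Assign each cell the label of its highest-scoring marker set
    scores = adata.obs[[f"score_{ct}" for ct in markers]]
    adata.obs["auto_annotation"] = scores.idxmax(axis=1).str.replace("score_", "", regex=False)
    print(adata.obs["auto_annotation"].value_counts())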

WS5) Federated Ensemble Learning for Biomedical Data

Organizers: Prof. Dr. Anne-Christin Hauschild; Hryhorii Chereda; Youngjun Park; Maryam Moradpour (University Medical Center Göttingen)

Description: The digital revolution in healthcare, fostered by novel high-throughput sequencing technologies and electronic health records (EHRs), is transitioning the field of medical
bioinformatics towards an era of big data. While machine learning (ML) methods have proven
advantageous in such settings for a multitude of medical applications, they generally depend
on the centralization of datasets. Unfortunately, this is not suited for sensitive medical data,
which is often distributed across different institutions, comprises intrinsic distribution shifts,
and cannot be easily shared due to privacy and security concerns.

Initially proposed by Google in 2017, federated learning allows the training of machine
learning models on geographically or legally separated datasets without sharing sensitive data.
When combined with additional privacy-enhancing techniques, such as differential privacy or
homomorphic encryption, it is a privacy-aware alternative to central data collection while still
enabling the training of machine learning models on the whole dataset.
However, in such federated settings, both infrastructure and algorithms become much more
complex compared to centralized machine learning approaches.
Some of the most intuitive implementations rely on ensemble learning approaches, where
only the model parameters are transferred. For example, we can exchange split values of tree
nodes as in federated random forest or combine local subgraph-based graph neural network
(GNN) models into a global federated Ensemble-GNN.

We will first introduce the theory of federated ensemble learning in general and federated
graph neural networks in particular, using real-world examples in Python. The participants will
apply the acquired knowledge with the help of published tools, namely federated random forest and
Ensemble-GNN. These tools allow federated algorithms to be implemented and executed in a
distributed setting and will be used to provide the participants with practical hands-on
experience in implementing a prediction model on biomedical health data. In summary, this
tutorial will provide the participants with both theoretical and practical expertise in
federated ensemble learning and privacy-enhancing techniques in the context of biomedical
healthcare data, including implementation and deployment in a federated system.
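
The following Python sketch illustrates the federated-ensemble principle on synthetic data: each simulated site trains a random forest locally, only the fitted models are shared, and a global ensemble averages their predictions. It is a minimal illustration under these assumptions, not the workshop's federated random forest or Ensemble-GNN implementation.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for a biomedical classification dataset
    X, y = make_classification(n_samples=3000, n_features=30, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    # Simulate three hospitals holding disjoint parts of the training data
    site_indices = np.array_split(np.arange(len(y_train)), 3)

    # Each site trains locally; only the fitted model (parameters) is shared
    local_models = []
    for idx in site_indices:
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(X_train[idx], y_train[idx])
        local_models.append(clf)

    # Global ensemble: average the class probabilities of the local models
    proba = np.mean([m.predict_proba(X_test) for m in local_models], axis=0)
    accuracy = (proba.argmax(axis=1) == y_test).mean()
    print("Federated ensemble accuracy:", accuracy)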

Target audience: Bioinformaticians, Data scientists, Medical informaticians

WS6) Bioinformatics education

Organizers: Jan Grau (MLU Halle); Stefan Kurtz (ZBH Hamburg); Kay Nieselt (University Tübingen); Sven Rahmann (Universität des Saarlandes); Ralf Zimmer (LMU München)

Description: This workshop shall bring together people involved in bioinformatics education. Currently, bioinformatik.de lists 38 B.Sc. programs with prominent bioinformatics content (14 with bioinformatics as the major subject) and 35 M.Sc. programs (17 with bioinformatics as the major subject) in Germany. These programs put varying emphasis on certain bioinformatics topics and skills, and have different access requirements. In addition, there are initiatives to bring bioinformatics content to schools, for instance in specific summer schools or by training teachers in bioinformatics topics. In this workshop, we will present some examples of bioinformatics programs (B.Sc. and M.Sc.) in Germany. Participants are welcome to present programs or initiatives they are involved in.

Having a broader overview of commonalities and specifics of different bioinformatics
programs, we would like to discuss with all participants, for instance,

  • What might be considered “essential” topics and skills in bioinformatics?
  • How can/should we balance basic topics (biology, chemistry, mathematics, computer
    science), practical skills (programming, application of tools), and original bioinformatics
    topics on the B.Sc. and M.Sc. level?
  • What are participants’ perspectives and experience with students starting an M.Sc. in
    bioinformatics having different educational backgrounds (biology, computer science,
    bioinformatics) on the B.Sc. level?
  • What is and should be the role of Ph.D. students and PostDocs in bioinformatics
    education?
  • How could bioinformatics topics be implemented in school education?
  • What are the implications of AI models (GPT and the like) for teaching, and for
    exercises in particular?

Target audience: Persons involved in bioinformatics education on the organizational and practical level.

Provisional schedule:

10 am – 11 am    Presentation of B.Sc. bioinformatics (and closely related) programs (Ralf Zimmer, Sven Rahmann, Stefan Kurtz, Jan Grau, participants)
11 am – 12 pm    Joint discussion: What makes a good bioinformatics B.Sc. program? How do we differentiate between bioinformatics programs and bioinformatics-related programs (computer/data science, biology) with a minor in bioinformatics? (all participants)
12 pm – 1 pm     Lunch break
1 pm – 2 pm      Presentation of M.Sc. bioinformatics programs (Ralf Zimmer, Sven Rahmann, Stefan Kurtz, Jan Grau, participants)
2 pm – 3 pm      Joint discussion: What makes a good bioinformatics M.Sc. program? What are the requirements for starting an M.Sc. in bioinformatics, and how do we handle different backgrounds on the B.Sc. level? (all participants)
3 pm – 3.30 pm   Coffee break
3.30 pm – 4 pm   Presentation of initiatives to bring bioinformatics education to schools/pupils (Jan Grau, participants)
4 pm – 4.30 pm   Joint discussion: What are the pros and cons of bioinformatics education in schools, and which topics could/should be covered, and in which manner?


WS7) Constraint-based modelling in Python applied to plant systems

Organizers: Stefano Camborda; Tiago Machado; Nadine Töpfer (University of Cologne)

Description: Part I: Introduction to Flux Balance Analysis and Metabolic Network Modeling

Metabolic network modeling can help us understand constraints and capacities in metabolic systems and guide metabolic engineering efforts. Starting from a generic model of plant metabolism, we introduce the concept of stoichiometry-based metabolic models. We further introduce flux balance analysis (FBA) as a technique to simulate flux distributions in metabolic networks. To accomplish this, we introduce and use COBRApy as an in silico workbench for constraint-based metabolic network modeling and simulation in Python.
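
A minimal FBA sketch with COBRApy is shown below; it uses the small E. coli "textbook" model bundled with COBRApy purely because it is readily available, whereas the workshop works with plant models.

    from cobra.io import load_model

    # Load a small demo model that ships with COBRApy ("textbook" = E. coli core)
    model = load_model("textbook")

    # Flux balance analysis: maximize the model's default (biomass) objective
    solution = model.optimize()
    print("Predicted growth rate:", solution.objective_value)
    print(solution.fluxes.head())  # flux values for the first few reactions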

Part II: Flux simulation

Simulation of in vivo conditions in a constraint-based modeling framework can be achieved through the manipulation of flux boundaries in a given metabolic model and by adding additional constraints. This allows the exploration of flux solutions for a wide range of environmental and biochemical conditions. In this session, we introduce a range of constraint-based modeling techniques which employ these manipulations, and we demonstrate how to apply them to questions related to plant metabolism.
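
The following sketch illustrates this kind of manipulation by switching the carbon source through flux bounds; the reaction IDs are specific to COBRApy's bundled "textbook" model and serve only as an example of the technique.

    from cobra.io import load_model

    # Same bundled "textbook" E. coli core model as in the previous sketch
    model = load_model("textbook")

    with model:  # changes made inside the block are reverted afterwards
        # Switch the carbon source: block glucose uptake, allow fructose uptake
        model.reactions.get_by_id("EX_glc__D_e").lower_bound = 0.0
        model.reactions.get_by_id("EX_fru_e").lower_bound = -10.0
        print("Growth on fructose:", model.optimize().objective_value)

    print("Growth on glucose:", model.optimize().objective_value)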

Part III: Advanced modeling and network extensions

Network curation tools enable the curation and extension of pre-existing metabolic models.
Here, we use CobraMod, a tool for pathway-centric curation of metabolic models built upon the COBRApy toolbox. We demonstrate its utility to create multi-cellular network models of plant metabolic systems in a systematic and reproducible manner.

WS8) Build your own interactive, online network-based drug repurposing application using Drugst.One

Organizers: Michael Hartung; Andreas Maier (University of Hamburg)

Description: Drugst.One is a customizable plug-and-play solution integrating multiple interaction databases to enable interactive modeling and analysis of the associations between proteins, drugs, and diseases. With the capacity to convert any systems medicine software into an interactive web tool for identifying drug repurposing candidates with just three lines of code, Drugst.One provides a unique, powerful, yet accessible resource for advancing drug discovery efforts. Drugst.One is freely available at https://drugst.one.

Requirements: Please bring your own laptop with Wi-Fi.

WS9) FeatureCloud: privacy-aware federated learning in biomedicine
Organizers: Mohammad Bakhtiari; Mohammad Mahdi Kazemi; Julian Klemm; Niklas Probul
Description: AI holds great promise for the advancement of biomedicine, but it requires access to large amounts of data, which is hampered by privacy laws such as the General Data Protection Regulation (GDPR). Federated learning, as a potential solution to this dilemma, has attracted much attention in recent years. In federated learning, data remains within its secure, original location, such as a clinic, and the AI models are trained locally. The non-identifying model parameters are then aggregated into the global model. As such, federated learning holds the power to train effective AI models on big data while complying with privacy regulations.
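
The following plain-NumPy sketch illustrates this principle with simple sample-size-weighted parameter averaging across three simulated sites; it is a generic illustration on synthetic data and does not use the FeatureCloud API.

    import numpy as np

    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0, 0.5])  # ground truth used to simulate site data

    def local_fit(n_samples):
        """One 'site': generate private data and fit a least-squares model locally."""
        X = rng.normal(size=(n_samples, 3))
        y = X @ true_w + rng.normal(scale=0.1, size=n_samples)
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        return w, n_samples  # only parameters and a sample count leave the site

    # Three sites of different sizes train locally
    local_results = [local_fit(n) for n in (200, 500, 300)]

    # Coordinator: sample-size-weighted average of the local parameters
    counts = np.array([n for _, n in local_results], dtype=float)
    params = np.stack([w for w, _ in local_results])
    global_w = (params * counts[:, None]).sum(axis=0) / counts.sum()
    print("Aggregated global model parameters:", global_w)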

FeatureCloud, a platform for federated learning that was developed as part of the Horizon 2020 project of the same name, offers a wide range of solutions for applying federated learning in real-world biomedical scenarios. FeatureCloud provides a user-friendly, secure, and privacy-aware platform that facilitates federated collaborations via transparent workflows and secure handling of data communication. It also provides additional privacy-enhancing technologies to further increase the privacy level based on user needs. At the same time, FeatureCloud simplifies app development for federated learning tools. The final apps are published in the app store (https://featurecloud.ai/) and certified by the development team. To further facilitate this, we recently added support for external app validation. This allows independent, authorized companies to certify FeatureCloud apps with regard to patient privacy, legal aspects, and code safety.

 

DECHEMA e.V.

 

Supported by

Universität Hamburg · Deutsches Elektronen-Synchrotron DESY · Institute for Computational Systems Biology · German Network for Bioinformatics Infrastructure (de.NBI) · DFG · FaBI · Gesellschaft für Biochemie und Molekularbiologie (GBM) · International Society for Computational Biology (ISCB) · Syte – Strategy Institute for Digital Health