Data Sharing and Analysis in Collaborative Open Research Environments
Highlights of ICOR Public Meeting #4
“Data Sharing and Analysis in Collaborative Open Research Environments” held May 1, 2024
Hosts: Kristen Ratan and Michael Markie
Please see the streaming video and chat record for rich details from speakers and meeting attendees
Community members: suggest topics or volunteer to host future public meetings on this Google Form
Introduction
A major barrier to collaboration amongst research groups is the lack of data sharing in the early stages of data creation. This is arguably the optimal time to get expert feedback and analysis, both to maximise the utility of the data and to point toward alternative interpretations. Today, the act of doing research (generating data) culminates in communicating research (publishing). These are two distinct processes that should be connected, and made seamless, through collaboration.
This meeting explored how collaborative research environments can help connect the generation and communication of early findings. The data we produce have enormous value, but that value is only realised if the data can be mined in different ways by multiple groups. The goal is to create spaces where teams can discuss, share and combine data from different instruments and multiple digital spaces. Collaborative open research environments can help us rethink how data generation and dissemination intersect and how they can be more reciprocal in the future.
The meeting hosted three talks covering cutting-edge cloud technologies, data infrastructures, and research initiatives that are creating a collaborative culture around data sharing. In each talk, the speakers described their experience with environments that build and strengthen research communities.
2i2c: A global network of community hubs for interactive learning and discovery
Chris Holdgraf, Executive Director, 2i2c [slides; streaming video 7-39 min]
The International Interactive Computing Collaboration (2i2c) is an open-source computing platform providing communities with a digital home to collaborate, share and analyse data. Chris explained that in a data-driven world, interactive computing environments are essential for creating and sharing knowledge. 2i2c enables research communities to design a cloud-based computing hub for their research workflows and provides frictionless access to tools and interactive interfaces for exploring data and creating new ideas. These hub environments can be easily shared, and 2i2c’s combined global network has enabled community growth and enhancements to spread rapidly. Chris gave an overview of the impact and journey of 2i2c over the past three years, explaining how it has grown a collaborative network of more than 80 communities and 6,000 monthly users.
2i2c is an excellent example of an organisation supporting and creating cloud-based data sharing environments for collaboration. One impactful case study of a 2i2c hub is a recent collaboration between 2i2c and Stratos, working alongside the Frank lab at HHMI. Together, they are developing the Spyglass framework, which provides virtual representations of the entire work of the lab, enabling others to compare or combine their own data as well as mine the lab's data for new findings.
Data commons and mesh infrastructure: Using Gen3 to share and analyse biomedical data
Michael Fitzsimons and Aarti Venkat from the Center for Translational Data Science (CTDS), University of Chicago [slides; streaming video 40-68 min]
Effective collaboration requires an environment where data is structured, has rich metadata, and can be analysed using a variety of commonly used, interoperable tools that are easily accessible. To create such an environment, the CTDS has developed a cloud-based, open-source software platform, Gen3, which allows communities to build data commons and data meshes for sharing and analysing large or complex biomedical data. Michael explained how a data mesh combines multiple data commons, data repositories, computing resources, and other applications that can interoperate using a common set of framework services. Aarti followed up with a detailed case study and demo of the Biomedical Research Hub (BRH), a cloud-based data mesh built to search and discover across multiple data commons, and to facilitate managing, analysing, and sharing patient data. The BRH provides access to over 6 PB of data from over 400,000 research participants. The system lets each resource that shares patient data operate its own governance rules, so that different platforms can seamlessly interoperate when authorised.
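To make the data mesh idea concrete, here is a minimal Python sketch of several data commons searched through one mesh, with each commons applying its own governance rule. This is an illustration only and not Gen3's or the BRH's actual API: every class, function, role name and dataset below is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Dataset:
    dataset_id: str
    title: str


@dataclass
class DataCommons:
    """One member of the mesh (hypothetical model, not a Gen3 class)."""
    name: str
    datasets: list[Dataset]
    # Governance hook owned by this commons: given a user's roles,
    # decide whether the user may see this commons' data.
    is_authorised: Callable[[set[str]], bool]

    def search(self, keyword: str, user_roles: set[str]) -> list[Dataset]:
        # Each commons enforces its own rule before returning anything.
        if not self.is_authorised(user_roles):
            return []
        return [d for d in self.datasets if keyword.lower() in d.title.lower()]


@dataclass
class DataMesh:
    """Searches across all member commons through one entry point."""
    commons: list[DataCommons] = field(default_factory=list)

    def search(self, keyword: str, user_roles: set[str]) -> dict[str, list[Dataset]]:
        results: dict[str, list[Dataset]] = {}
        for member in self.commons:
            hits = member.search(keyword, user_roles)
            if hits:
                results[member.name] = hits
        return results


def build_example_mesh() -> DataMesh:
    """Two made-up commons: one open, one requiring an approved role."""
    open_commons = DataCommons(
        name="open-commons",
        datasets=[Dataset("o1", "Asthma cohort summary")],
        is_authorised=lambda roles: True,
    )
    controlled_commons = DataCommons(
        name="controlled-commons",
        datasets=[Dataset("c1", "Oncology cohort genomes")],
        is_authorised=lambda roles: "approved-researcher" in roles,
    )
    return DataMesh([open_commons, controlled_commons])
```

In this sketch the mesh never decides who may see what. It only forwards the query, and each commons answers according to its own rule, which mirrors the point that resources keep their own governance while still interoperating.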
The BRH case study is a great example of a cloud-based environment providing a structured and governed service for generating and assessing findable, accessible, interoperable, and reusable (FAIR) patient data across multiple sources.
The UCSF Bakar ImmunoX initiative: A collaborative research model
Vincent Chan, Director of ImmunoX Office of Collaborative Research [slides; streaming video 68-92 min]
ImmunoX has a unique approach to performing research with collaboration at its core. Its vision is to embrace connected science through a collaborative community to build immune profiles for untapped streams of human diseases. To achieve this, the initiative has designed a novel concept called CoProjects, a series of investigator-led shared projects that take advantage of world-class staff and technologies that reside in cutting-edge “CoLabs” environments. Vincent demonstrated the entire collaborative process through the example of a clinician-scientist who contributed human specimens from her clinic and was able to enter an established data pipeline spanning multiple connected CoLabs. The result was large sets of data that had been carefully curated and stored in the UCSF data library, which could be instantly analysed by the clinician and then released to the larger UCSF community after a set embargo period. The ImmunoX “data sharing trust” model is an example of rapid, collaborative science that maximises the value of large datasets for all those with access to the UCSF data library.
Vincent further demonstrated how the UCSF Office of Collaborative Research is a strategic hub for fostering collaborations that bridge the clinic to the lab. This collaborative environment enables ImmunoX to maximise its resources by producing valuable specimens, with open access to vast swaths of data across diseases.
Summary
The meeting sparked much conversation with real-world examples of collaborative research environments that are changing the way scientists generate and share their data. We encourage the ICOR community to share more case studies (via our “Submit your Project” form) on collaborative spaces that address the larger objective of finding innovative solutions through harmonising the conduct and communication of research.
See the streaming video and chat record for more details from these speakers and meeting attendees
Feature photo by Marvin Meyer on Unsplash