REPORT on 5th ETC Auditorium with Dr. Juan Antonio Vizcaino

29 Jun 2023 10:57 AM | Anonymous

Open data practices in proteomics: the why, the how and the what for?

The goal of the ETC Auditorium "Stylish Academic Writing" professional development webinar series is to help students and trainees improve their scientific writing skills. The 5th webinar featured Dr. Juan Antonio Vizcaino, Proteomics Team Leader at the European Bioinformatics Institute (EMBL-EBI), as the lecturer. The discussion focused on the latest trends in open data practices in proteomics. Dr. Deepti Jaiswal Kundu, a Scientific Curator at the PRIDE database (EMBL-EBI), served as the host, and Dr. Tiannan Guo, a Tenured Associate Professor at Westlake University, participated as a panelist.

The webinar focused on the advantages of open data sharing in proteomics and its potential to drive research, collaboration, and innovation. Dr. Vizcaino highlighted the importance of data repositories, such as PRIDE, MassIVE, jPOST, iProX, and Panorama, in facilitating open data sharing. He emphasized the FAIR data principles (Findable, Accessible, Interoperable, and Re-usable) and encouraged researchers to contribute their proteomics data and promote collaboration within the scientific community. Inspiring examples of data re-use were showcased, along with the bottlenecks associated with using public proteomics data, such as data complexity and lack of metadata annotation. The webinar stressed the significance of proper metadata documentation and introduced the Sample and Data Relationship Format (SDRF) file format, which is intended to improve metadata annotation and enable meaningful re-use of proteomics data. Lastly, the challenges of data privacy, intellectual property rights, data standardization, and data curation were addressed, with strategies and recommendations provided to promote responsible and effective sharing of proteomics data.

Dr. Vizcaino also addressed some live questions (Q & A):

1. Q: How do you envision the future of data repositories, given that datasets now contain more and more samples (for example, from single-cell experiments) and that instruments produce ever larger amounts of data?

A: It's a continuous struggle. We keep innovating in terms of infrastructure for storing and handling large data. We draw on the experience and suggestions of other EBI repositories (e.g. those devoted to DNA/RNA sequencing data) about how they manage large datasets.

2. Q: The availability of sample metadata is crucial for identifying samples, which is an important aspect of open data practices. A large amount of the data in PRIDE is missing metadata. What are your suggestions for PRIDE submitters?

A: When ProteomeXchange started, the emphasis was put on data provision. But it is now increasingly clear that more metadata is needed. That is why the SDRF annotation has been proposed, and PRIDE now promotes it to submitters.

3. Q: Many journals have dedicated data availability sections where accessions are mentioned. Have you considered using these to detect publications with PRIDE IDs?

A: We do that. We check the abstract of the publication and the full text (in the case of open-access journals).

4. Q: Are there any options in PRIDE to keep the RAW data private even if the article is public?

A: The policy of ProteomeXchange is that as soon as the publication is out, the data need to be public. There are some exceptional cases where the data are sensitive, such as clinical data. We do not yet have any mechanism for controlled-access data, but in the future we might have to include one.

5. Q: Some data can be re-used. Can you say what type of data has been re-used?

A: The datasets that are re-used the most are the ones that are more scientifically relevant, e.g. those published in high-profile journals. A second criterion is annotation quality: better-annotated datasets are re-used more, because people don't have to do the annotation work themselves.
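As an illustration of the accession lookup mentioned in the answer to question 3, the following is a minimal Python sketch and not PRIDE's actual pipeline. It assumes that ProteomeXchange dataset accessions take the form "PXD" followed by six digits; the function name `find_pxd_accessions` and the sample sentence are hypothetical and exist only for this example.

```python
import re

# Assumption: dataset accessions look like "PXD" followed by exactly six digits.
PXD_PATTERN = re.compile(r"\bPXD\d{6}\b")


def find_pxd_accessions(text: str) -> list[str]:
    """Return the unique PXD-style accessions in text, in order of first appearance."""
    seen: dict[str, None] = {}
    for match in PXD_PATTERN.finditer(text):
        # A dict keeps insertion order, so duplicates are dropped without reordering.
        seen.setdefault(match.group(0), None)
    return list(seen)


# Hypothetical data availability statement, used only to exercise the function.
sample = (
    "Data are available via ProteomeXchange with identifier PXD000001 "
    "(see also PXD012345 and PXD000001)."
)
print(find_pxd_accessions(sample))
```

The same function could be run over an abstract or, for open-access articles, over the full text. A real workflow would still need to confirm that each candidate accession exists in the repository.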

A full video recording of the session, including the talk and the Q&A, is available on the HUPO YouTube channel via the HUPO website (https://www.hupo.org/Webinars-and-Virtual-Presentations).

For those with no access to YouTube, an alternative link is: https://www.bilibili.com/video/BV1Ph411c7ij/?spm_id_from=333.788.recommend_more_video.-1&vd_source=052ff6e1ca06b197e00a9a80affeda05.