Poster Session

Start Date

April 6, 2011 1:00 PM

End Date

April 6, 2011 2:00 PM

Description

OBJECTIVE: To collaborate with researchers and curators from the Harvard School of Public Health Bioinformatics Core (HSPH/HBC) in annotating experimental descriptions and data sets.

METHODS: ISA-Tab is an open metadata format, supported by the open source ISA tools software suite, that can be used to annotate experimental data. HSPH/HBC curators create ISA-Tab records that tie together information from PubMed papers and their associated data sets (GEO files). Curators annotate and describe both the raw and the derived data files for each investigation, and supply metadata for the investigation as a whole. Once annotated, the records are validated and sent to an internal data management system.
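The curation steps described above (link a paper to its data set, describe each file, validate, then hand off) can be illustrated with a minimal sketch. This is not the ISA tools API or the ISA-Tab file layout: the class names, field names, and validation rules below are hypothetical and chosen only to show the kind of checks a record might pass before being sent on. The identifiers in the usage lines are placeholders, not real records.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """One annotated data file (hypothetical structure)."""
    name: str
    kind: str  # assumed vocabulary: "raw" or "derived"
    description: str = ""


@dataclass
class InvestigationRecord:
    """Investigation-level metadata tying a paper to its data set."""
    title: str
    pubmed_id: str
    geo_accession: str
    files: list = field(default_factory=list)


def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if not record.title.strip():
        problems.append("missing investigation title")
    if not record.pubmed_id.isdigit():
        problems.append("PubMed ID must be numeric")
    # Assumption: only GEO series (GSE) or curated data set (GDS) accessions.
    if not record.geo_accession.startswith(("GSE", "GDS")):
        problems.append("GEO accession should start with GSE or GDS")
    if not record.files:
        problems.append("no data files annotated")
    for f in record.files:
        if f.kind not in ("raw", "derived"):
            problems.append(f"{f.name}: kind must be 'raw' or 'derived'")
        if not f.description.strip():
            problems.append(f"{f.name}: missing description")
    return problems


# Placeholder values for illustration only.
record = InvestigationRecord(
    title="Example investigation",
    pubmed_id="12345678",
    geo_accession="GSE0000",
    files=[DataFile("sample1.cel", "raw", "unprocessed array data")],
)
print(validate(record))
```

Returning a list of problems rather than raising on the first one lets a curator fix every gap in a record in a single pass.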

RESULTS: As of January 2011, HBC had collected over 50 annotated public studies, comprising more than 900 assays, in its internal data management system. The ultimate goal is to make curated, metadata-enriched data sets openly available in public repositories, allowing further data analysis and integration.

CONCLUSIONS: Researchers and curators in this group grapple with many of the same data curation and discovery issues that librarians do. For example, how much metadata is adequate to ensure discovery, and where is the sweet spot between too much and too little? Where are ontologies necessary? Do all experiments comprising a published work need to be described, or just a selection? My experience working as a curator with HSPH/HBC has given me insight into how librarians can be involved in e-science in ways that benefit all concerned.

Keywords

e-science, data curation, bioinformatics, public health, ISA tools

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.

Description and annotation of biomedical experimental data sets: work in progress


 
