The Fourth Paradigm: How Big Data is Changing Science
This talk will describe how science is changing as a result of the vast amounts of data we are collecting, from gene sequencers to telescopes and supercomputers. This “Fourth Paradigm of Science,” predicted by Jim Gray, is moving at full speed and is transforming one scientific area after another. The talk will present various examples of the similarities among the emerging challenges and of how Jim Gray’s vision is being realized by the scientific community. Scientists are increasingly limited by their ability to analyze the large amounts of complex data available. These datasets are generated not only by instruments but also by computational experiments; the sizes of the largest numerical simulations are on par with data collected by instruments, crossing the petabyte threshold this year. Large synthetic datasets are increasingly important, as scientists compare their experiments to reference simulations. All disciplines need a new “instrument for data” that can deal not only with large datasets but also with the cross product of large and diverse datasets. There are several multi-faceted challenges related to this transition, e.g., how to move, visualize, analyze, and in general interact with petabytes of data.