3:30 pm Main Conference Registration

4:40 Welcoming Remarks

Stan Gloss, Founding Partner, Chief Executive Officer, BioTeam, Inc.

4:45 Moderator’s Remarks

Allison Proffitt, Editorial Director, Bio-IT World & Clinical Informatics News

4:50 OPENING KEYNOTE PRESENTATION: Convergence, Culture, and the Acceleration of Cancer Research

Matthew Trunnell, Vice President and Chief Information Officer, Fred Hutchinson Cancer Center

As we seek to incorporate larger and more diverse data into cancer research, and to shorten the effective distance between research and clinic, we face issues of interoperability at levels ranging from technical to cultural. These issues have drawn the attention of the Vice President’s Cancer Moonshot program, which has raised both the level of urgency and the level of opportunity in achieving new levels of collaboration. This talk will discuss the establishment of the Hutch Data Commonwealth, a new organization within the Fred Hutch Cancer Center created to accelerate the convergence of capabilities and competencies in data science that will propel the development of cancer cures and preventions.

5:50 Exascale Opportunities for Healthcare

Patricia Kovatch, Associate Dean for Scientific Computing, Mount Sinai School of Medicine

High performance computing has already been enlisted in the quest to better understand, diagnose and treat human disease. Through the expert guidance of computational scientists and advanced computing and data analytic infrastructures, advances have been made in such areas as drug discovery and genomic sequencing. However, enormous scientific challenges lie ahead to realize the promise of personalized medicine. To achieve personalized medicine’s full potential, commensurate advances to reach exascale also need to be made. This talk will outline the specific scientific challenges and impacts for three areas of medicine: personalized cardiac therapy, precision medicine and real-time accurate imaging diagnosis. Then it will discuss the limitations of existing HPC along with the expected computational and data parameters and new capabilities needed for each of these areas in 2025.

6:20 Welcome Reception with Exhibit Viewing

7:20 Close of Day


7:30 am Continental Breakfast


8:00 Welcoming Remarks

Stan Gloss, Founding Partner, Chief Executive Officer, BioTeam, Inc.

8:05 Moderator’s Remarks

Allison Proffitt, Editorial Director, Bio-IT World & Clinical Informatics News

8:10 KEYNOTE PRESENTATION: A Space Odyssey: One Decade of Scaling Research Computing from 200 to Over 60,000 Processors

James Cuff, Assistant Dean and Distinguished Engineer for Research Computing, Harvard University

Since 2006, Harvard University has been scaling its computing environment to support the demands and requirements of its advanced scientific research. It has seen unprecedented growth in research storage, from 20TB to over 55,000TB, and this isn’t slowing down any time soon. Additional demands for GPGPU computing, for example, have forced scaling to over 1.4 million CUDA cores. Harvard has not been alone: four other research-intensive universities in the Northeast (Boston University, MIT, Northeastern, and the University of Massachusetts), alongside Harvard, the state government, and private industry, came together to build a state-of-the-art LEED Platinum data center dedicated to research computing. James will tell the story of their voyage, how they got to where they are today, and what lies ahead for converged data and compute, and for the people and skill sets needed to continue supporting the world’s very best research, science, and scholarly output.

9:10 Cyberinfrastructure Architecture: Designing a Framework for Science Progress

Dan Stanzione, Ph.D., Executive Director, Texas Advanced Computing Center, The University of Texas at Austin

Modern Cyberinfrastructure is a large collection of enormously complicated parts and partially overlapping disciplines. Successfully delivering science results requires not only combining bioinformatics, algorithms, data, data integration, libraries, APIs, storage, cloud, and high performance computing, but combining them in some systematic way. This talk will examine lessons from TACC in evolving an architecture and ecosystem for cyber-enabled solutions to modern large scientific challenges, at the level of systems, software, and most importantly, people. Examples will be pulled from the iPlant/Cyverse project, the Araport information resource, the DesignSafe CI, and other projects, and organizing principles that can be extracted across these projects. In addition, some recent data will be included on the deployment of a new large scale supercomputer using Intel’s latest many core technology, and early experiences exploiting the huge numbers of available cores on scientific applications.

9:40 Science Gateways and Today’s Research Landscape

Nancy Wilkins-Diehr, Associate Director, San Diego Supercomputer Center

Science gateways, also known as web portals, virtual research environments, and virtual laboratories, are a fundamental part of today’s research landscape. In this talk, I will provide several examples of science gateways that are having a tremendous impact on how research is conducted. I will highlight major NSF investments such as the Science Gateways Community Institute, which will further the development of sustainable gateways. Finally, I will highlight international activities such as the recently launched International Coalition on Science Gateways.

10:10 Coffee Break with Exhibit Viewing


10:45 Harvesting Knowledge: Opportunities & Challenges for Big Data to Address Food Security Sustainably

Doreen Ware, Computational Biologist, USDA Agricultural Research Service

Technology advances in sequencing and imaging have ended the “data drought”: it is now possible to generate sequence-based reference models, molecular data, and phenotypes for any species. While these technologies have ended the drought, they have created new challenges in utilizing the resulting information, limiting the insight that could be gained to support increasing gains in production through improved germplasm, management practices, and education. In this talk I will discuss recent work in genome sequencing and comparative genomics, focusing on the data stewardship, infrastructure, and cultural changes that will be needed to address the global challenge of food security.

11:15 High Performance Analysis Tools for Whole Genome Sequencing: Integrating Systems from the Bench to the Clinic

Shawn Levy, Faculty Investigator, HudsonAlpha Institute for Biotechnology

The field of genomics continues to rapidly evolve in the scale and scope of data production and resolution. These technological developments have resulted in significant computing and storage challenges. This presentation will discuss the lessons learned in the development of a distributed compute and storage system to enhance performance for research and clinical whole-genome sequencing.

11:45 Scientific Computing in the Cloud: Speeding Access for Cancer Drug Discovery

Bret Martin, Principal Research Computing Architect, Data Science and Information Technology, H3 Biomedicine

Jeff Tabor, Senior Director, Product Management and Marketing, Avere Systems

H3 Biomedicine has built a cloud infrastructure that reduces latency and provides storage flexibility, and does so in a way that helps save money and support their business strategy. H3 Biomedicine will discuss cloud technology and cloud services that have enabled application migration to the cloud in a hybrid IT environment.

12:15 pm Luncheon Presentation (Sponsored by Aspera): Time for Better Things: How to Spend Less Time Transferring Research Data

Charles Shiflett, Senior Software Engineer, Cloud, Aspera, an IBM Company

In this presentation, you’ll learn how Aspera can be used to enable faster movement of genomic research data, enabling researchers to spend more time working with it. Charles will also explore new ways Aspera can be used to support file and streaming data transfers, and review on-premises and cloud models for sharing, collaborating on, and distributing genomic data.

1:00 Dessert Break with Exhibit Viewing

1:30 Roundtable Discussions I

Join one of these interactive sessions designed to provoke thought and discussion in connection with specific topics facing IT and Life Science professionals. Each group will have a moderator to ensure focused conversations around key issues within the topic. This small group format allows participants to informally meet potential collaborators, share examples from their own work and discuss ideas with peers. Discussion topics may include:

  • IT Organizational Challenges
  • Molecular Modeling
  • Collaborative Science
  • Cybersecurity
  • Data Management Solutions
  • Infrastructure
  • Data Centers
  • Science Gateways
  • Networking
  • Cloud


2:15 Chairperson’s Remarks

2:20 Wearable Sensors in Applied Digital Health Research

Job G. Godino, Ph.D., Assistant Professor, Center for Wireless and Population Health Systems; Director, Research and Applied Technology, Exercise and Physical Activity Resource Center, Calit2's Qualcomm Institute, University of California, San Diego

Mobile and wearable technologies that capture data and contextualize health behaviors in real-time provide a mechanism through which theory-based interventions can be personalized and widely disseminated. Furthermore, they empower individuals to collect large amounts of personal health data, which, if appropriately leveraged, has the potential to transform the conduct of epidemiological research and improve clinical practice. However, before these goals can be achieved, there is much work to be done to fully understand the validity of data generated by mobile and wearable technologies. Additionally, an increase in the volume and intensity of clinically relevant labeled data is required to precisely automate sensor-based treatment decision aids. In my presentation, I will discuss how my group is working to address these issues, with examples drawn from a large longitudinal study of healthy children and a small longitudinal study of pancreatic cancer patients.

2:50 Data Lakes, Ponds, Pools and Wetlands – and Channeling the Data Canals

Don Preuss, Chief Technology Officer, Starfish Storage

Over recent years, big data has made a big splash, with alarms and warnings of the big data deluge. Countless talks on the three, four, or five Vs of big data, the Internet of Things, and streaming data have been presented. Articles have been written and research on the effects of big data pronounced. It makes for great copy. And it’s true: as the amount of data has increased, what used to be limited to the very top tier of organizations, like the Library of Congress, NIH, Facebook, and Google, is now a reality for many groups. Due to the growth of instruments generating large amounts of data, the ease of creating new video content, and the falling price of storage, more organizations have petabytes of data, with billions of files, whether on site or remote. Managing this ensemble of data is time-consuming, risky, expensive, and usually boring. What is simple to do with 1 million files gets complex as you approach 1 billion. The outcome is that people try to ignore the problem, letting files accumulate and buying more storage. How do we handle this quantity of data?

3:20 Meeting the Storage Challenge of Exponential Genomics Data Growth (Sponsored by General Atomics)

Robert Murphy, Big Data Program Manager, Magnetic Fusion Energy, General Atomics

This presentation describes General Atomics’ hardware-agnostic, data-aware storage software, Nirvana, which implements intelligent, automated tiering across flash, parallel file systems, NAS, object, and cloud storage. With Nirvana, genomics data can be moved across multiple storage tiers, reducing costs by 75%. Nirvana also provides metadata extraction and search capabilities through an intuitive graphical user interface, making genomics data easy to find and manage throughout its lifecycle while ensuring data provenance across global workflows.

3:50 Transition to Buses

4:30 Reception and Tour at San Diego Supercomputer Center

Visit the San Diego Supercomputer Center, a leader in advanced computation and all aspects of “Big Data”, for a complimentary reception and facility tour. With its two newest supercomputers, Gordon, a data-intensive system, and Comet, a petascale system that entered production in 2015, SDSC is a partner in XSEDE (eXtreme Science and Engineering Discovery Environment), a National Science Foundation (NSF) program that comprises the most advanced collection of integrated digital resources and services in the world. SDSC has also pioneered advances in data storage and cloud computing, and now houses several “centers of excellence” in the areas of large-scale data management, predictive analytics, health IT services, workflow automation, and Internet analysis.

Transportation between the conference venue and SDSC will be provided for all registered attendees.

6:00 Close of Day


7:15 am Breakfast Presentation (Sponsorship Opportunity Available) or Morning Coffee


8:00 Moderator’s Remarks

Allison Proffitt, Editorial Director, Bio-IT World & Clinical Informatics News

8:10 KEYNOTE PRESENTATION: Trends from the Trenches

Chris Dagdigian, Co-Founder and Principal Consultant, BioTeam, Inc.

Chris delivers a candid assessment of the best, the worthwhile, and the most overhyped information technologies (IT) for life sciences. He’ll cover what has changed (or not) in the past year around infrastructure, storage, computing, and networks. This presentation will help you understand the IT needed to build and support data-intensive science.

9:10 The NIH Data Commons – Digital Ecosystems for Using and Sharing Biomedical FAIR Data at Scale

Vivien Bonazzi, Ph.D., Senior Advisor for Data Science Technologies, National Institutes of Health

The challenges of using biomedical big data are now blocking scientists’ ability to do research and to replicate and build on previous work. We need to consider a digital ecosystem approach where biomedical big data is the central currency that can be easily accessed, shared and reused by others. The Data Commons is a platform that allows producers and consumers of scientific data to connect, interact, exchange, create value and generate new discoveries, creating the basis for a digital ecosystem that can support scientific discovery in the era of biomedical big data.

10:00 Coffee Break with Exhibit Viewing

10:30 Roundtable Discussions II

Join one of these interactive sessions designed to provoke thought and discussion in connection with specific topics facing IT and Life Science professionals. Each group will have a moderator to ensure focused conversations around key issues within the topic. This small group format allows participants to informally meet potential collaborators, share examples from their own work and discuss ideas with peers. Discussion topics may include:

  • Data Centers
  • Scientific Analytics
  • Cloud
  • Data Science
  • Storage
  • Data Transfers
  • Science Gateways
  • Cybersecurity

11:30 am Enjoy Lunch on Your Own

12:45 Dessert Break with Exhibit Viewing


1:15 Chairperson’s Remarks

1:20 Extending Pharma R&D IT to the Cloud

David Ficenec, Global Head of Architecture, R&D IS/IT, Takeda Pharmaceuticals

Pharma and Biopharma IT departments are increasingly being pressured to provide necessary services to R&D business units requesting help managing data flow and data analytics stemming from complex projects and studies generating large volumes of primary data. Examples include image data analyses and clinical sample biomarker processing. Takeda Pharmaceuticals has historically provided these services using internal applications hosted on company infrastructure; however, the size and number of these studies has been increasing over time and a project was undertaken late in 2015 to leverage cloud services as a solution to these business drivers and challenges. This presentation will describe the process that was developed within the Takeda R&D IT group to develop a direct connection to Amazon Web Services and the process of working with the business to create a project governance model and IT support structure to manage this new environment.

1:50 iRODS, Data Commons, and Converged IT

Charles Schmitt, CTO & Director of Informatics, RENCI

The iRODS technology has been providing a data commons solution for large-scale research programs to manage their data for many years. In recent years, the shift towards data-driven research and the generation of large-scale research data sets, as well as new technologies for analysis, are further shaping the role of data management in supporting research. In this talk, I will present directions that iRODS is taking to support converged IT solutions through integration with other enabling data- and cloud-technologies that provide for data and knowledge representation, workflow analysis, and collaborative platforms.

2:20 Networking Refreshment Break


2:40 Roundtable Report-Outs

3:20 NCI Genomic Data Commons, Cloud Pilots, and FAIR

Warren Kibbe, Director, NCI Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute

The NCI Genomic Data Commons went live on June 6th, with genomic data on 14,000 cancer tumors and associated (but limited) clinical phenotype data. The NCI has also been exploring the use of commercial clouds to create a very different access, curation, analytics, and visualization model for these data, with the intention to democratize access and provide recognition and credit for data submitters, curators, algorithm creators, and software developers. Central to this model is the support of FAIR and broad data sharing of patient-level data.

3:50 CLOSING KEYNOTE PRESENTATION: Data Bloat Spectrum Disorder: Home Remedies and Alchemy for Life Sciences

Ari Berman, Vice President & General Manager, Consulting Services, BioTeam, Inc.

Data generation throughout the life sciences research and healthcare domains has risen at a rate far beyond that predicted by Moore’s Law. As a result, organizations are accumulating tens to hundreds of petabytes (PB) of data, spending millions on storage systems, and doing it all in a manner consistent with IT practices and policies from 2005. These practices include little to no data management, ineffective or nonexistent data lifecycle policies, no metadata standards, and very few automated analysis pipelines that yield better understanding of the data. The result is a mass of unannotated data whose content is known only to the researcher who created it. This fosters tribal knowledge of the data and renders merging it with other datasets for large-scale modeling and discovery nearly impossible. In this presentation, we will discuss the general scope of the data bloat problem, how organizations have been approaching it, current solutions, and how it might be approached in the future through the convergence and abstraction of storage, network, and compute infrastructure into common analytics platforms that enable discovery and ease data management.

4:35 Close of Converged IT Summit


Founding Sponsors