International Clinical Trials, Autumn 2012

Big Data, Big Results

The growing availability of data from electronic medical records and plummeting genomic sequencing costs present a significant opportunity in the UK for pharmaceutical companies and academic medical centres. Effective management of this data will attract industry collaborators and ultimately lead to improved patient outcomes.
‘Big data’ is a hot topic in the healthcare and pharmaceutical industries, fuelled by the impact of low-cost genomic sequencing, the adoption of electronic medical records, growth in personalised medicine approaches and the necessity of collecting more ‘real world’ data to support post-market drug surveillance. The industry wants to better stratify populations and monitor drug performance and adverse events; hospitals want to improve clinical decision making and outcomes; and academic medical centres (AMCs) want to improve disease understanding, explore potential biomarkers and attract industrial collaborators. Big data provides the foundation for meeting all of these requirements.

However, it can also be viewed as new, complex, rapidly evolving and sometimes high risk because of the potential impact on resources and working practices, and the demands of complying with patient privacy requirements. Knowing what data to collect, when to use it and how to measure and maintain quality are further complicating issues. How can hospitals, AMCs and industry build appropriate informatics and data management infrastructures for big data to meet their respective requirements? The answer is having the right information infrastructure that can support research studies, clinical decision support and industry collaboration. It should be flexible enough for research, provide a validated clinical decision-making system that makes consistent use of genomics and patient data for improved patient outcomes, and offer secure data sharing for collaboration. In short, it should be a data landscape optimised to support translational medicine in all its guises.

Historically, meeting these individual requirements has led to data silos, bespoke solutions and a lack of interoperability. Nevertheless, new holistic approaches are demonstrating that meeting the requirements of each group can be achieved in a single solution. By systematically capturing, integrating and analysing high-quality patient and genomic data, while also ensuring compliance with patient confidentiality, organisations can support biomarker analysis, patient stratification, comparative effectiveness, outcomes analysis and clinical decision support and diagnosis in the same solution. This article looks at how big data in healthcare is impacting the UK and examines how organisations are leveraging its potential.

Access to High Quality Data

In the UK, investment in early-stage clinical trials has been falling steadily and the government is promoting major initiatives to make the UK more attractive to the pharmaceutical industry and other new entrants looking to leverage the health and population benefits of their products (1). With a shortage of high-quality clinical samples and a hunger for post-market research, population studies and outcome analysis, there is a clear opportunity here: well-characterised patient and genetic data sets of appropriate scale will attract collaborators for clinical studies. In the UK there are significant investments in the following patient data initiatives and biobanks to help attract industry collaborators:
  • The Clinical Practice Research Datalink (CPRD) is the new English NHS observational data and interventional research service, jointly funded by the NHS National Institute for Health Research (NIHR) and the Medicines and Healthcare products Regulatory Agency (MHRA). CPRD services are designed to maximise the way anonymised NHS clinical data can be linked to enable many types of observational research, and deliver research outputs that are beneficial to improving and safeguarding public health
  • Academic Health Science Centres such as Manchester, Cambridge, University College London, Imperial College and King’s Health Partners were formed to bring acute hospitals and universities closer together to support translational medicine. This is now being extended to Academic Health Science Networks, where hospital groups will be brought together to share data and best practices (2)
  • Large scale biobanks are also being developed and commercialised (for example, UK Biobank and Abcodia)

Translational Medicine Maturity Model

To be part of this growth in big data from patient records and genomics, organisations need to assess their current and future requirements. Hospitals and academic medical centres in the UK are at varying stages of implementing their information infrastructure for translational medicine and the big data challenge (see Figure 1), owing to differing levels of investment, intent and governmental guidance. Some organisations are running smaller-scale studies that do not require significant investment in clinical trials management systems (CTMS), and their existing tools work perfectly well for this purpose. As the number of studies grows, more data management is required to track subjects and samples, leading to investments in CTMS, laboratory information management systems and electronic lab notebooks. The first two stages of the maturity model are essentially research and study focused, and include their own data capture for the specific disease area.

The managed level of the maturity model is where integration with hospital records comes into play to bring in wider and more complex data from multiple diseases and disparate systems. Information governance and patient confidentiality are critical in this transition, with organisations selecting de-identification or pseudonymisation of patient-identifiable data depending on what future access is required. Access to wider patient populations enables organisations to assess cohorts more effectively for internal study or to respond quickly to requests for clinical trials. Systems are inherently based on a research data warehouse that is fed data from multiple clinical systems, often requiring disease focus and sophisticated procedures to manage the complexity of clinical terminologies and harmonise the disparate source systems.

As larger organisations progress to the integrated level of maturity, their systems support multiple diseases from the same infrastructure, including self-service analytics for patient clustering, genomic analysis and annotation, and a master patient index to ensure all patients are uniquely identified across source systems. The pervasive level works across hospitals and is geared towards collaboration: between research groups, between research and the clinic, and between research and industry. This is the goal, where big data can drive clinical decision making based on a knowledge base of comparative genomic and clinical attributes, and external information.

Most groups in the UK are clustered towards the left of the maturity model, due to the slower adoption of electronic medical records, compared to the US where the stimulus package has significantly impacted uptake. Within the UK, there are many key examples of the more sophisticated patient systems discussed here:

  • North West eHealth is a successful non-profit organisation providing access to integrated primary and secondary patient data sets across Manchester’s hospitals
  • Scottish Health Informatics Programme (SHIP) is a Scotland-wide collaboration between the NHS and Scottish universities that provides a platform for the management, analysis and linkage of patient records
  • Secure Anonymised Information Linkage (SAIL) was developed at the Health Information Research Unit for Wales in Swansea and links data on health, environment and education (3)
  • Oncology Research Information System (ORIS) was developed for King’s Health Partners and is focused on integrating the clinical, biobank and genomic worlds for 25 per cent of London’s oncology patients
  • Cancer Research UK Stratified Medicine Project is gathering 9,000 patient and biospecimen sets to demonstrate the potential for genetics in medical treatment in the UK (4)

Case Study: ORIS

The ORIS project was initiated at King’s Health Partners in 2009 as part of a strategic initiative to accelerate clinical research and attract increased industry collaboration in cancer. Building on successful investments in clinical information systems at the Integrated Cancer Centre, King’s aim was to build a platform to support translational medicine across the King’s Health Partners partnership, comprising King’s College London, Guy’s and St Thomas’, King’s College Hospital and South London and Maudsley NHS Trusts. An extensive requirements-gathering project, commercial procurement project and implementation resulted in the system going live at the start of 2012.

The resulting platform has a significant effect on research productivity: the time taken to select cohorts of patients for research studies or potential clinical trials is reduced from 16 weeks to less than one day. The time taken to analyse genomic data is similarly reduced from 12 weeks to three days, with most gains coming from self-service access to analyses rather than requiring support from the busy bioinformatics team.

The architectural approach selected involves a number of functional layers that give clinicians access to identifiable patient information and provide researchers with de-identified information for research purposes. It was a fundamental requirement to bring together information from clinical systems, biobanks and genomic data; these are complex domains in their own right, and require a unique approach to combine them in a flexible and scalable manner. The initial deployment has been for breast cancer, with other cancers and disease areas to follow. The following sections describe the architectural layers (see Figure 2).


Data Capture and Source System Integration
ORIS integrates data from five different systems spanning the clinical, diagnostic, pathology, biopsy, research and genomics domains, including an in-house cancer information system, electronic patient records, diagnostic imaging and breast pathology sources. Information from these systems is mapped into an XML format that can be loaded into ORIS. Considerable effort has been focused on making this a repeatable and scalable process that provides traceability back to the source data. Due to the complexities of the disease, there are two breast cancer ontologies in the system, which are managed by a terminology service that allows data to be loaded or viewed via ICD-10 or an internal definition. Patient consent and information governance rules are managed in the system, with a sophisticated data-transfer methodology being used to enable a third party to perform pseudonymisation of the patient data. Pseudonymisation is the process by which patient identifiers are removed and replaced with a research identifier before clinical data can be used for research purposes. If patients in the system are suitable for clinical trials they can be re-identified, but only by clinicians with appropriate permissions and with a public key infrastructure (PKI) certificate and private key. A master patient index is maintained as part of this process to ensure an individual’s journey across the three hospitals is properly captured and loaded into the right research record. Extensive data-quality and completeness checks are incorporated into the system.
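
The pseudonymisation step described above can be illustrated with a minimal Python sketch. It is not the ORIS implementation: the keyed-hash approach, the field names, the "R-" prefix and the lookup table are all assumptions made for illustration. The idea shown is that identifying fields are dropped, a stable research identifier is derived from the patient identifier, and only the party holding the key and the lookup table can map a research identifier back to a patient.

```python
import hashlib
import hmac


def research_id(patient_id: str, secret_key: bytes) -> str:
    """Derive a stable research identifier from a patient identifier.

    A keyed hash (HMAC-SHA256) is used so that the same patient always
    receives the same research ID, while the ID cannot be recomputed
    without the secret key. This scheme is an assumption for illustration.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return "R-" + digest.hexdigest()[:16]


def pseudonymise(record: dict, secret_key: bytes, identifying_fields: set) -> dict:
    """Return a copy of the record without identifying fields, plus a research ID."""
    cleaned = {k: v for k, v in record.items() if k not in identifying_fields}
    cleaned["research_id"] = research_id(record["patient_id"], secret_key)
    return cleaned


# Hypothetical field names, key and record, used only for this example.
IDENTIFYING_FIELDS = {"patient_id", "name", "date_of_birth"}
KEY = b"held-only-by-the-pseudonymisation-service"

record = {
    "patient_id": "H1-000123",
    "name": "Jane Example",
    "date_of_birth": "1960-01-01",
    "diagnosis_icd10": "C50.9",
}

pseudo = pseudonymise(record, KEY, IDENTIFYING_FIELDS)

# Lookup table kept by the pseudonymising party so that an authorised
# clinician could later request re-identification.
reidentification_table = {pseudo["research_id"]: record["patient_id"]}
```

In this sketch the research copy (`pseudo`) keeps the clinical content but carries no direct identifiers, and re-identification depends entirely on access to `reidentification_table`, which stands in for the permission and certificate checks the article describes.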

Research Data Management Layer
The complexity of the data domains and necessity to support the changing needs of research resulted in a research data repository formed of separate data models for clinical, biobanking and genomic data. This was chosen to ensure that additional domains could be easily added, for example for preclinical data, and enable the full richness of source systems to be captured. Of particular importance is the management of genomic results from every experiment so that highly regulated genes or interesting variants are searchable and shareable across research groups, providing a central store of knowledge.

Analysis Layer
The analysis layer also needed to be highly flexible to easily support the construction of analytical workflows ranging from copy number variation (CNV) and gene expression analysis in genomics, to running survival analysis to support outcomes assessment based on clinical data. In all cases, the analysis is built by experienced informaticians either from scratch or based on existing analytics in R. These services are provided to end-user researchers and clinicians as self-service, standardised analytics to ensure that any analysis is run properly. This approach removes significant bottlenecks in analysis and access to data. Configuration parameters are exposed at different levels depending on the capability of the user.
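
As an indication of what a standardised survival analysis service might compute, the following is a minimal sketch of the textbook Kaplan-Meier estimator. The article states that the ORIS analytics are built in R; Python is used here purely for illustration, and the function name and the follow-up data are hypothetical.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Estimate the survival curve with the Kaplan-Meier method.

    durations: follow-up time for each patient (for example, in months).
    events: True if the event was observed at that time, False if censored.
    Returns a list of (time, survival_probability) pairs, one for each
    distinct time at which at least one event was observed.
    """
    event_counts = Counter(t for t, e in zip(durations, events) if e)
    leaving_counts = Counter(durations)  # events and censorings both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving_counts):
        d = event_counts.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving_counts[t]
    return curve


# Hypothetical follow-up data: five patients, two of them censored.
durations = [1, 2, 2, 3, 4]
events = [True, True, False, True, False]
curve = kaplan_meier(durations, events)
```

Wrapping an analysis like this behind a fixed interface, with only a few parameters exposed, is one way to offer it as a self-service analytic that end users cannot run incorrectly.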

Presentation Layer
For clinicians and researchers, the web interface to ORIS is accessed via a dual authentication login to the externally hosted system. Users are able to easily slice and dice the clinical data to select a cohort of interest and then save the group into a project-working environment for further study. This can be shared with internal and external collaborators. Cohorts can be selected based on clinical parameters, sample availability, condition and, when data is available, genomic variants. Significantly, an individual patient’s journey can be represented as a timeline and compared with other patients in the cohort (see Figure 3). Cohorts can then be exported with covariate data for analysis in external packages or run against ORIS services based on the data type.
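
The cohort selection described above amounts to filtering de-identified patient summaries on several criteria at once. The sketch below shows one way this could look in Python; the record structure, field names, variant labels and function signature are all hypothetical and are not taken from ORIS.

```python
from dataclasses import dataclass, field


@dataclass
class PatientSummary:
    """De-identified summary of one patient (hypothetical structure)."""
    research_id: str
    icd10: str
    age_at_diagnosis: int
    has_stored_sample: bool
    variants: set = field(default_factory=set)


def select_cohort(patients, icd10_prefix, min_age=0, max_age=200,
                  require_sample=False, variant=None):
    """Return the patients matching every supplied criterion."""
    cohort = []
    for p in patients:
        if not p.icd10.startswith(icd10_prefix):
            continue  # wrong condition
        if not (min_age <= p.age_at_diagnosis <= max_age):
            continue  # outside the requested age range
        if require_sample and not p.has_stored_sample:
            continue  # no biobank sample available
        if variant is not None and variant not in p.variants:
            continue  # requested genomic variant not recorded
        cohort.append(p)
    return cohort


# Hypothetical records; the variant labels are placeholders, not real nomenclature.
patients = [
    PatientSummary("R-001", "C50.9", 54, True, {"VARIANT_A"}),
    PatientSummary("R-002", "C50.4", 61, False, {"VARIANT_A"}),
    PatientSummary("R-003", "C34.1", 58, True),
    PatientSummary("R-004", "C50.1", 72, True, {"VARIANT_B"}),
]

breast_cohort = select_cohort(patients, "C50")
trial_candidates = select_cohort(
    patients, "C50", min_age=50, max_age=65,
    require_sample=True, variant="VARIANT_A",
)
```

A saved cohort in this sketch is simply the list of matching research identifiers, which could then be shared with collaborators or passed to an analysis service.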


If properly managed, big data can have a positive impact on patient outcomes, improve research and increase collaborations. Investments are needed to assess how best to ensure that the appropriate quality and coverage of data is provided without major impacts on clinical care. However, it is almost inevitable that healthcare will transition to a more data-driven industry for the benefit of research and patients. Organisations that move decisively to capture and use healthcare data and associated samples and genetics will gain a competitive advantage in attracting industry investment as this approach becomes pervasive.


  1. Available at: clinical_trials_in_dramatic_decline_NHS_Confederation_ says.aspx
  2. Available at:
  3. Available at: healthinformationresearchunit
  4. Peach JHF and Tuff A, Cancer Research UK Shows Way for Genetic Testing in Cancer, Oncology News 6(5): pp8-9, 2011


Simon Beaulah is Marketing Director of Translational Medicine at IDBS, and is responsible for the promotion of the organisation’s market-leading capabilities in personalised medicine across the pharmaceutical, diagnostic and academic medical centre space. Simon has been working in life science and healthcare informatics for more than 20 years, initially in research and over the past 14 years for informatics vendors including LION bioscience, BioWisdom and InforSense. Simon has degrees from Aston University and Cranfield Institute of Technology.

As Director of Translational Medicine Solutions, Robin Munro is responsible for the direction of the IDBS Translational Medicine Solution and heads the global team of solution analysts and scientific presales. He has over 15 years of experience in bioinformatics and extensive knowledge of the pharmaceutical and healthcare industries, with in-depth knowledge of translational medicine, biomarker discovery and molecular analytics, as well as the drug discovery process, and clinical and healthcare information management systems. Robin holds a Computational Biochemistry PhD from University College London and an MSc in Biological Computation from York University.

Paul Denny-Gouldson is Vice President of Translational Medicine at IDBS, and leads the IDBS Healthcare Group’s strategic planning and execution on the advancement of personalised medicines in the healthcare and pharmaceutical industry. He joined IDBS in 2005 as part of the acquisition of his ELN company and has spearheaded the drive to make E-WorkBook the acknowledged technology and market leader in this space. Prior to this, he was Senior Scientist at Sanofi-Synthelabo (now Sanofi) for just under five years. Paul obtained his PhD in Computational Biology from Essex University in 1996, and has authored over 25 scientific papers and book chapters. 

©2000-2011 Samedan Ltd.