International Clinical Trials, Autumn 2012

Quest for Quality

Good clinical research depends heavily on the quality of the data collected, particularly for clinical studies. Unfortunately, it is less obvious how to assure good quality standards for real world data while keeping the time and cost of collecting and managing them at an acceptable level.

The challenge in observational research is to achieve reliable and valid data with the smallest possible effort in order not to affect the overall study feasibility. The application of a strategy that foresees the use of aDAptive Real world daTa management (DART) could be a solution for overcoming the dilemma.

According to a definition published by the EMA in February 2012, observational studies (ObS) are “defined by the methodological approach used and not by the scientific objectives” and they “include database research or review of records, where all the events of interest have already happened, for example, case-control, cross-sectional and cohort studies” (1). Following the same definition, ObS also include “those studies involving primary data collection, such as prospective studies and registries in which the data collected derive from routine clinical care.” Moreover, ObS very often include patient-reported outcomes (for example, a self-administered questionnaire on quality of life) or interviews performed by caregivers. The ObS research environment is very different from that of classical randomised clinical trials (RCTs), and many of these differences affect data collection and management (see Table 1). RCTs are usually conducted on a selected population that is fully controlled during the study, whereas observational studies provide data from a real life setting. ObS are therefore usually conducted by clinicians or other healthcare providers in the context of their clinical routine and working practice, which means that data recording and cleaning are not always the first daily priority. Because the data are very often recorded after the visit, and may first be recorded on different media (such as the hospital’s own medical charts), the quality requested by the study protocol may be affected.

Moreover, site monitoring visits are standard practice in RCTs, according to Good Clinical Practice, and they often take place before the completed CRFs are sent for data entry, so that investigators can improve data quality beforehand. The effect is that the number of inconsistencies, and therefore of queries, is limited. This is not always the case for ObS. In fact, Good Epidemiological Practice does not give specific advice on site monitoring visits, as they do not occur at all sites (3). An obvious disadvantage is that data are not checked until the clinical data manager receives them, and thus a high number of queries per patient may be sent back to investigators.

Last, but not least, in large multinational observational studies it is not unusual to experience a significant amount of missing data, because not all of the variables requested by the study protocol are collected at every site involved.

Different Aims, Same Quality

After considering how RCTs and ObS differ in their methods of data collection, it is interesting to focus on the differences in study aims. Good Clinical Data Management Practices state that it is not practical to design a quality check for every possible error (2). Quality checks performed as part of data processing should target fields critical to the analysis, where errors are expected to be frequent and where it is reasonable to expect a high percentage of error resolution. There will always be errors that are not addressed by quality checks and that slip through the quality check process undetected.

For a large ObS designed to describe drug utilisation patterns in clinical practice, should researchers apply the same approach as the one used for Phase 3 RCTs aimed at registering a new drug? The latter are characterised by a comparison group, which is not necessary for such a cross-sectional study (of course, other types of ObS, such as case-control and cohort studies, have more similarities with randomised studies). These aspects shape the risk analysis that the researcher follows in evaluating which target fields are critical to the analysis, and which errors are expected to be frequent and can be resolved. Thus, conducting a solid risk analysis before setting up the CRF and the cleaning rules is critical in order to achieve effective and efficient data cleaning. However, the study aim and its related variables are not the only elements that should be taken into account – the recipient of the research results also needs to be considered, whether this is the FDA/EMA in the case of a PASS or PAES, or the scientific community in the case of epidemiological studies.
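
The three criteria quoted above (critical to the analysis, frequent errors, good chance of resolution) can be written down as a simple screening rule during the risk analysis. The following Python sketch is only an illustration: the field names, the estimated rates and the two thresholds are hypothetical and would have to be agreed for each study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRisk:
    """Risk-analysis entry for one CRF field (all values are illustrative)."""
    name: str
    critical: bool               # needed for the analysis of the main objective?
    expected_error_rate: float   # anticipated share of wrong or missing values (0-1)
    expected_resolution: float   # anticipated share of queries a site can resolve (0-1)


def select_fields_to_check(fields, min_error_rate=0.05, min_resolution=0.5):
    """Return the names of fields meeting all three criteria, in input order.

    The thresholds are assumptions made for this sketch, not values taken
    from any guideline.
    """
    return [
        f.name
        for f in fields
        if f.critical
        and f.expected_error_rate >= min_error_rate
        and f.expected_resolution >= min_resolution
    ]


fields = [
    FieldRisk("prescribed_dose", True, 0.10, 0.8),
    FieldRisk("quality_of_life_score", False, 0.20, 0.6),
    FieldRisk("date_of_first_diagnosis", True, 0.15, 0.2),
]
print(select_fields_to_check(fields))  # ['prescribed_dose']
```

In this example the diagnosis date is critical and error-prone, but it is left out because the sites are not expected to be able to resolve most queries on it.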

The Five Ws of Real World Data Cleaning

When conducting ObS, clinical data managers should consider the ‘five Ws’ of data cleaning: what, who, when, where and why (see Figure 1).

Firstly, think about what to check and what not to – for example, in a case-control study the matching data could be as important as the randomisation process in an RCT and thus should be constantly monitored. Besides, ObS protocols usually have a primary objective and many secondary objectives, which are very often desirable but not essential. The service provider should define with the sponsor how data related to the secondary objectives will be checked. This is crucial in order to minimise the burden of the cleaning activities on the clinical sites. Moreover, as already mentioned, not all of the variables are generally collected at all the clinical sites and not always in the same manner, thus it is important to deal with a large degree of heterogeneity – will all missing data be queried? Will inconsistencies or out-of-range data be the object of cleaning?
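
One way to make the answers to these questions explicit is to record them as a cleaning policy that is applied to each record. The sketch below assumes a hypothetical rule table in which every variable is tagged as serving the primary or a secondary objective. In this illustration, out-of-range values are always queried, while missing values on secondary variables are queried only if the sponsor and service provider have agreed to do so.

```python
def generate_queries(record, rules, query_missing_secondary=False):
    """Return a list of (field, reason) queries for one patient record.

    rules maps a field name to {"tier": "primary" | "secondary",
    "range": (low, high) or None}. The policy encoded here is an assumption
    of this sketch: missing values are queried for primary variables only,
    unless query_missing_secondary is set.
    """
    queries = []
    for field, rule in rules.items():
        value = record.get(field)
        if value is None:
            if rule["tier"] == "primary" or query_missing_secondary:
                queries.append((field, "missing"))
            continue
        bounds = rule.get("range")
        if bounds is not None and not (bounds[0] <= value <= bounds[1]):
            queries.append((field, "out of range"))
    return queries


# Hypothetical variables and plausibility ranges.
rules = {
    "age_years": {"tier": "primary", "range": (18, 110)},
    "bmi": {"tier": "secondary", "range": (10, 70)},
    "qol_score": {"tier": "secondary", "range": (0, 100)},
}
record = {"age_years": 17, "bmi": None, "qol_score": 55}

print(generate_queries(record, rules))
print(generate_queries(record, rules, query_missing_secondary=True))
```

With the default policy only the implausible age is queried. Switching the flag on adds a query for the missing BMI, which shows how a single decision changes the burden on the site.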

Consider who will receive the queries, as this will drive data cleaning accordingly. If the investigator is the sole recipient, he or she is supposed to correct or justify all the queried data. Researchers should also take into consideration who the data cleaning will weigh on (clinicians in most cases). Thus, a plan to minimise the number of queries is strongly recommended, for example by constantly reviewing the data before queries are issued or during remote site monitoring activities.

The timeframe needs to be considered as well. When should queries be sent? Constantly, or only at the conclusion of the study? The latter option could be the case for a retrospective study. For longitudinal studies, will it be at the end of enrolment and at the end of follow-up, monthly, or after a specific number of enrolled patients? The type of data capture also plays a significant role. Most current electronic data capture systems are able to send queries automatically when data are entered in the case report form; a different approach to data cleaning has to be followed with paper case report forms, taking into account the delay between recording and checking data.
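
For a longitudinal study, the choice between a monthly cycle and a cycle based on the number of enrolled patients can be expressed as a small trigger rule. This is a minimal sketch; the function name and the default limits of 30 days and 20 patients are hypothetical.

```python
from datetime import date


def query_batch_due(today, last_batch, new_patients, max_days=30, max_patients=20):
    """Return True when a new batch of queries should be sent to a site.

    A batch is due when either max_days have passed since the last batch or
    max_patients new patients have been entered since then. Both limits are
    illustrative defaults.
    """
    days_elapsed = (today - last_batch).days
    return days_elapsed >= max_days or new_patients >= max_patients


# Ten days after the last batch, with five new patients: not due yet.
print(query_batch_due(date(2012, 3, 1), date(2012, 2, 20), new_patients=5))   # False
# Same date, but twenty new patients have been entered: due.
print(query_batch_due(date(2012, 3, 1), date(2012, 2, 20), new_patients=20))  # True
```

A retrospective study cleaned only at its conclusion would not need such a rule, and an electronic data capture system that raises queries at data entry would apply its checks immediately instead.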

The location of checks is also important. Some queries could be generated automatically, but a considerable part of clinical data management is also based on manual checks performed by evaluating single cases; as a result, a certain delay occurs between data entry and the issuing of queries. For example, a retrospective study involving advanced stage cancer patients will often force the clinician or the site data manager to search for clinical records which could have already been archived. In many hospitals, medical records are kept out of the archive only for a short time. If data queries are sent a long time after recording the last patient, the investigator has to go back and forth to the archive several times. Could such an approach be successful? Maybe, but it would not be efficient. The timing of queries and the place where the source data are archived have an impact on how data cleaning is conducted and should be taken into account when planning a study.

Finally, why was the study implemented? This question brings together all the previous ‘W’ questions. Data cleaning is performed in order to provide reliable data, but the combination of the issues to resolve, the correct recipient, the timeframe of the study, and the type of source data archiving could all have a tremendous impact on the final result that the researcher aims to achieve.

Adaptive Real World Data Management

Having said that, it is clear that the traditional techniques and standards used to obtain high quality data in an RCT are not applicable to ObS. DART management is a way to take all of these aspects into account in order to optimise and mould real world data cleaning for each specific type of ObS. DART management is a mixture of site management and data cleaning techniques – in fact, many aspects of cleaning strongly overlap with site management and vice versa. A single method does not fit all possible cases and one allocation of components may not be suitable for all of the sites involved in the study.

DART management could be compared to a sound equaliser. Equalisation is the process commonly used in sound recording and reproduction to correct the response of microphones, instrument pick-ups, loudspeakers and room acoustics. Equalisers have the flexibility to edit the frequency content of an audio signal to eliminate unwanted sounds, to make certain instruments or voices more prominent, to enhance particular aspects of an instrument’s tone and so forth. The best correction could be found for a specific instrument or room, but it may be necessary to change it for another instrument or another room.

The same occurs in DART management. If an interim analysis has been planned during a study (for example, a five-year registry on a specific disease), ongoing cleaning might be useful in order to have updated, cleaned data immediately available for statistical analysis and preliminary scientific communications. However, the type and extent of the checks and the frequency of queries should be constantly adapted based on recruitment status, the date of the data extraction and the type of data necessary for each individual analysis. They might also be adapted based on the results obtained from previous interim analyses. In doing so, site management activities should be integrated into data management activities. Contacts with the investigators and remote reviews of the emerging queries will vary according to the objective of the interim analysis and the time available to generate the necessary data. Moreover, remote and on-site monitoring can be balanced in order to create the best environment for clinical investigators to ‘play their best music’; in other words, to collect reliable and valid data in the most efficient way.
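
As an illustration of adapting the extent of the checks to a single interim analysis, the sketch below restricts cleaning to the patients enrolled by the data extraction date and to the variables that the analysis actually needs. The function, the data layout and the example values are hypothetical and are not part of any published DART specification.

```python
from datetime import date


def cleaning_scope(patients, checks, analysis_variables, cutoff):
    """Select what to clean ahead of one interim analysis.

    patients: list of dicts with an "id" and an "enrolled" date.
    checks: dict mapping a variable name to a description of its check.
    Returns the ids of patients enrolled on or before the cutoff and the
    subset of checks covering variables used by this analysis.
    """
    in_scope_patients = [p["id"] for p in patients if p["enrolled"] <= cutoff]
    in_scope_checks = {
        variable: check
        for variable, check in checks.items()
        if variable in analysis_variables
    }
    return in_scope_patients, in_scope_checks


patients = [
    {"id": "P001", "enrolled": date(2012, 1, 10)},
    {"id": "P002", "enrolled": date(2012, 6, 2)},
]
checks = {
    "diagnosis_date": "must not be in the future",
    "qol_score": "must be between 0 and 100",
    "current_therapy": "must not be missing",
}

ids, active_checks = cleaning_scope(
    patients, checks, {"diagnosis_date", "current_therapy"}, date(2012, 3, 31)
)
print(ids)                    # ['P001']
print(sorted(active_checks))  # ['current_therapy', 'diagnosis_date']
```

Calling the same function again with a later cutoff or a different set of variables gives the scope for the next analysis, which is the sense in which the settings are moved like the sliders of an equaliser.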


No single approach is best, as the situation constantly changes during the natural evolution of the study. Using the ‘five Ws’ approach together with the DART management method, researchers can find an effective solution for breaking the efficiency barrier in observational research.

  1. EMA, Guideline on Good Pharmacovigilance Practices (GVP), Annex I: Definitions, 20 February 2012
  2. Society for Clinical Data Management, Inc, Good Clinical Data Management Practices Committee, Good Clinical Data Management Practices, Version 4, 2005
  3. Good Epidemiological Practice – Revision 2, April 2007. Available at:

Lucia Simoni is the Clinical Data Management and Biostatistics Unit manager at MediData srl. Lucia obtained an MSc degree in Statistics and a PhD in Population Genetics at Bologna University. She completed a two-year post-doctorate at Geneva University. She has been working at MediData srl for the past 12 years as biostatistician, and subsequently as manager.

Giovanni G Fiori is the founder and Scientific Director of MediData srl. He is a member of the Board of Directors of the Italian Society for Applied Pharmacological Sciences and is the national coordinator of the Observational Studies Working Group. Giovanni is a member of the Late Phase (coordinator) and of the paediatric working groups within the European CRO Federation (EUCROF). He is regularly invited as a teacher at the Master in Experimental Medicine, held annually at the University of Milan Bicocca.