Where Do Target Trials Come From?
Adigens Health
Adigens Health helps sponsors design evidence strategies that meet the standard, from target trial specification to decision-ready confirmatory evidence packages. Get in touch at info@adigenshealth.com.
We talk a lot about data quality in real-world evidence. Is the dataset large enough? Are the outcomes reliably coded? Is there sufficient follow-up time? These are legitimate concerns, but they are downstream of a more fundamental problem: most real-world evidence studies never clearly define the causal question they are trying to answer.
The target trial framework exists to fix this. A recent publication by Hernán and colleagues, “Where Do Target Trials Come From? Specifying the Protocol of a Target Trial When Repurposing Data for Causal Inference”, makes an argument that deserves wider attention: specifying a target trial is not a one-time act of pre-study planning. It is an iterative, data-informed process, and how that process is managed determines whether your real-world evidence is credible.
The framework works by making the causal question explicit before you touch the data. You specify each component of a hypothetical trial (eligibility criteria, treatment strategies, assignment procedures, outcomes) and then attempt to emulate it using available data. This forces you to commit to your definitions before the results can influence them and prevents common design-related biases. But the protocol is always constrained by what the data can support: in practice, it must be developed iteratively as investigators learn what the database actually contains, a process that Hernán and colleagues examine in detail.
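To make that concrete, here is a minimal sketch of a protocol written down as a structured object before any data are touched. It assumes a Python workflow; the component names follow the framework, while the class, field names, and example values are our own illustration.

```python
# A minimal sketch of a target trial protocol as a structured object.
# The component names follow the target trial framework; the concrete
# values are illustrative and not taken from any real study.
from dataclasses import dataclass


@dataclass
class TargetTrialProtocol:
    eligibility_criteria: list[str]
    treatment_strategies: list[str]
    assignment_procedure: str
    outcome: str
    follow_up: str
    causal_contrast: str


protocol_v1 = TargetTrialProtocol(
    eligibility_criteria=[
        "age >= 18",
        "renal function measured by glomerular filtration rate (GFR)",
        "no prior stroke",
    ],
    treatment_strategies=["A", "B", "C"],
    assignment_procedure="randomization at baseline (emulated by adjustment)",
    outcome="ischemic stroke within 5 years",
    follow_up="from baseline until outcome, death, or 5 years",
    causal_contrast="intention-to-treat effect",
)
```

Writing the protocol down in this form has a side benefit: every later modification becomes a visible diff rather than a silent rewrite.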
The data will always push back – what matters is what you do next
Here’s how the story goes. An investigator specifies a target trial with three treatment strategies (A, B, and C), eligibility criteria that require renal function measured by glomerular filtration rate (GFR), and ischemic stroke as the primary outcome.
Then they look at the data. Nobody in the dataset received treatment C. GFR is not recorded; instead, there is only a binary flag for stage 3 or higher chronic kidney disease (CKD). And the database cannot distinguish ischemic stroke from other stroke types.
At this point the investigator faces a series of decisions that the final study report rarely acknowledges: can treatment C be dropped? Is the CKD flag an acceptable proxy for GFR? Is a question about any stroke, rather than ischemic stroke specifically, still clinically meaningful? Or has the study drifted into answering a question that is neither important nor appropriate?
The iterative nature of target trial specification is an inherent feature of working with data. What matters is that this process of adaptation is made visible: the original protocol, the data-driven modifications that led to the final target trial, and the arguments defending each choice should all be explicit.
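One way to make the adaptation visible is a structured amendment log kept alongside the protocol. Below is a sketch continuing the running example; the schema is our own illustration, not a prescribed standard.

```python
# A sketch of a structured amendment log: each data-driven change to the
# target trial protocol is recorded with its trigger and justification.
# The schema and entries are illustrative, not a prescribed standard.
amendment_log = [
    {
        "component": "treatment_strategies",
        "original": ["A", "B", "C"],
        "revised": ["A", "B"],
        "trigger": "nobody in the database received treatment C",
        "justification": "the contrast of A versus B remains clinically relevant",
    },
    {
        "component": "eligibility_criteria",
        "original": "renal function measured by GFR",
        "revised": "no CKD stage 3+ flag",
        "trigger": "GFR is not recorded; only a binary CKD flag exists",
        "justification": "CKD flag accepted as a coarse proxy for renal function",
    },
    {
        "component": "outcome",
        "original": "ischemic stroke within 5 years",
        "revised": "any stroke within 5 years",
        "trigger": "database cannot distinguish ischemic from other stroke types",
        "justification": "the broader outcome judged clinically meaningful",
    },
]
```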
Even after investigators have settled on a particular estimand, every component of which can be mapped to the data, further modifications may be required when considering the identifying assumptions needed to learn about it. Consider an investigator who discovers that treatment A is prescribed preferentially to patients with high blood pressure, a known stroke risk factor. If the database only records a binary high/low blood pressure indicator, confounding adjustment may not be possible. The investigator may have to restrict the analysis to patients with normal blood pressure, which adds an eligibility criterion not in the original protocol. The causal question has changed, and the investigator only discovered this was necessary by looking at the data. This is the reality of observational research, and it needs to be presented as part of the evidence itself.
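Continuing the illustrative log from above, an amendment driven by identifying assumptions rather than missing measurements would be recorded the same way:

```python
# Continues the illustrative amendment_log defined above: an amendment
# driven by identification concerns rather than missing measurements.
amendment_log.append({
    "component": "eligibility_criteria",
    "original": "all blood pressure levels",
    "revised": "normal blood pressure only",
    "trigger": "treatment A preferentially prescribed to patients with high "
               "blood pressure, recorded only as a binary high/low flag, so "
               "confounding adjustment is not possible",
    "justification": "the effect is identifiable in the normal blood pressure "
                     "stratum; the causal question has narrowed accordingly",
})
```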
Knowing your data is not optional
Understanding not just what a dataset contains but how it came to contain it is an underrated discipline in observational research. Is “stroke” recorded when a physician confirmed the diagnosis, or when they suspected it and ordered the test? Are prescriptions captured at the point of prescribing or of dispensing? Are contraindications documented consistently, or only when they are clinically acted upon? Answering these questions requires understanding the clinical workflow, the coding practices, and, ideally, the validation studies that were conducted on the dataset.
Investigators with prior experience of a dataset can be significantly more efficient at the target trial specification step because they can anticipate which components are emulatable before the iterative process begins. It does not take them weeks or months to discover that GFR is missing; they know before they write the first draft of the protocol.
Knowing your data also means knowing what you are allowed to look at. Reviewing the data dictionary, understanding variable coding and completeness, and examining the distributions of potential eligibility criteria are widely accepted as appropriate feasibility steps. Obtaining treatment-outcome associations and designing the target trial around the strongest ones is a clear no-go. In complex databases where proxies and related variables are deeply intertwined, the line between the two can blur, and Hernán and colleagues make the point that a consensus still needs to be developed here. For those interested, the FDA has published guidance on assessing the relevance and reliability of real-world data that speaks to many of these same feasibility questions; we have summarized it in an earlier post: Real-world evidence needs real-world discipline: the FDA raises the bar
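As a rough illustration of where that line sits, here is a sketch of feasibility checks that stay on the acceptable side. It assumes a pandas DataFrame with hypothetical column names; a real feasibility review would of course run against the licensed database.

```python
# A sketch of acceptable data feasibility checks. Column names and values
# are hypothetical; note what is deliberately absent: nothing here links
# treatment to outcome.
import pandas as pd

# Hypothetical extract standing in for the licensed database
df = pd.DataFrame({
    "age": [72, 54, 67, 81],
    "ckd_stage3_flag": [0, 1, 0, None],
    "treatment": ["A", "B", "A", "B"],
    "stroke_code": ["I63", None, "I64", "I61"],
})

# Acceptable: variable availability and completeness
print(df.isna().mean())

# Acceptable: distributions of potential eligibility criteria
print(df["age"].describe())
print(df["ckd_stage3_flag"].value_counts(dropna=False))

# Acceptable: how many patients would a draft criterion retain?
eligible = df[(df["age"] >= 18) & (df["ckd_stage3_flag"] == 0)]
print(f"{len(eligible)} of {len(df)} patients meet the draft criteria")

# Not acceptable at this stage: anything linking treatment to outcome,
# e.g. stroke rates by treatment group.
```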
On pre-specification: What is feasible
Full pre-specification of a target trial emulation using existing data is, in most cases, not feasible. What is feasible is pre-specifying the process of adaptation: not every possible modification, but the rules by which modifications will be evaluated and documented. Which elements of the protocol are fixed? Under what conditions would an eligibility criterion be added or removed? What constitutes an acceptable proxy for a missing variable, and what does not? This mirrors adaptive trial protocols in randomised research: pre-specified flexibility bounded by pre-specified constraints. It is harder to accomplish in observational settings, however, because there is no experiment to adapt, the treatment strategies are often more complex, and the research team’s familiarity with the data varies enormously.
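What might such pre-specified rules look like? A sketch, with rules of our own invention rather than any established template:

```python
# A sketch of pre-specified adaptation rules: not every possible
# modification, but the conditions under which modifications may be made
# and how each must be documented. The rules themselves are illustrative.
adaptation_rules = {
    "fixed_components": ["causal_contrast", "follow_up"],
    "modifiable_components": {
        "eligibility_criteria": {
            "allowed": "add a restriction needed for confounding control",
            "forbidden": "remove a safety-motivated exclusion",
        },
        "treatment_strategies": {
            "allowed": "drop a strategy with zero observed initiators",
            "forbidden": "drop a strategy based on outcome data",
        },
        "outcome": {
            "allowed": "broaden to a composite if subtype codes are unreliable",
            "forbidden": "change outcome after any treatment-outcome "
                         "association has been seen",
        },
    },
    "documentation": "every modification logged with its trigger and "
                     "justification",
}
```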
Our take
The implication for anyone conducting or evaluating real-world evidence is this: the protocol of your study is not just a methods-section artifact. It is the primary record of your causal reasoning. How it was specified, how it changed, and why it changed are as important as the effect estimate it produces.
Studies that document this process transparently are more likely to influence decisions than those that present a polished protocol with no trace of how it got there. Real-world evidence is evolving, and what it needs is more honesty about the messy, iterative, data-dependent process by which causal questions are formed. The target trial framework supports exactly that.
