Non-financial conflict of interest: what is it, and why people who worship at its altar should generate some evidence
And other ways conflict of interest with Pharma is downplayed.
Many physicians think that our focus on financial conflicts of interest (FCOI) ignores non-financial conflicts. One example given by Lisa Rosenbaum in her controversial 3 part NEJM series is that, while on call, she had to decide if a patient should be transferred to her hospital and/ or receive fibrinolytics (reperfusion drugs). Although she wished the decision she made were based purely in the patient’s best interest, she worried that her conflict was that some decisions meant less sleep. Perhaps on the margin then, she was more likely to make the decisions that increased the chance of a restful night.
Many people who consider themselves ‘scientific’ or ‘logical’ love to tell anecdotes like this. Unfortunately, even when pressed, they are unable to provide any data showing that even a single non-financial conflict leads to different behavior. Believe me, I have asked (repeatedly)! (And I still ask. If you have such data, please send it to me via the contact link.)
Sometimes, these ‘non-financial conflict people’ reference data that interventional cardiologists (as opposed to other physicians) are more likely to oppose a clinical trial that contradicts routine stenting for stable angina. But, last I checked, this does not totally separate financial from non-financial conflict of interest, as (spoiler alert) interventional cardiologists get paid (!!) to place these stents. At other times, they invoke the ‘non-financial COI’ of researchers caught in misconduct, like Anil Potti.
What about Anil Potti? they whine. He is an example of non-financial COI.
Presumably meaning: he did it for the fame.
Well, Anil Potti actually did have some financial ties, but more importantly, anecdotes are not data, and we really need data.
Just contrast this situation with the ROBUST data that financial conflicts affect the interpretation, publication, reporting, and even results of clinical evidence (I could put dozens of refs, but can’t waste time on a blog-post; just Google it). And there is evidence that financial conflict of interest is linked to increased brand name prescribing of drugs [3,4], and more favorable views of products. Non-financial conflicts have no single study meeting this mark, while FCOI has hundreds. Is it just that FCOI is measurable, or is that a pathetic excuse?
Why we need data not just BS stories
The anecdote by Rosenbaum is potentially testable. All one needs is to study a system where some decision used to be made by a third party doctor, but it was switched to a doctor who would presumably have a non-financial conflict. A colleague told me of a story that fit this bill.
In a hospital where he trained, the decision of whether an ED patient went to general medicine or the ICU used to be made by a resident on call for just that purpose (the triage resident), but the hospital switched it to the ICU resident. Thus before: a person with no skin in the game decided where a patient went, and after: a resident, who would get more work if the patient came to the ICU, began making the decision.
By Rosenbaum’s logic, we would expect a before and after study to show fewer ICU admissions. Or that patients admitted to the general medicine service would be of higher acuity, because the ICU resident was pushing off cases on the margin.
Yet, my colleague told me that at his hospital this policy was analyzed and presented at Grand Rounds. The change in policy resulted in more, not fewer, ICU admissions, and the cases on the margin went to the unit. Researchers felt that perhaps the ICU resident recognized the potential conflict and overcompensated the other way.
My point here isn’t about this particular story, as these results have never been published, and I can’t confirm them, though the person who told the story has a steel-trap mind. But even hearing the story makes the point that we have to study these things. The real effects may not be what we think. Maybe they go the other way.
It is unscientific, lazy, and even ignorant to sit back and say we cannot study non-financial conflict. It can be done easily, as in this case. Countless natural experiments in residency fit the bill, and could be testing ground. One can think of other ways.
We may even find (god forbid) no relationship. Perhaps there is no net bias when the doctor deciding whether to accept a transfer is the one who has to do the work if the transfer occurs and fibrinolysis is not given. Or maybe there is a bias, but it runs towards accepting the transfer. Who the hell knows? That’s why we engage in science and not just tell stories.
When we think of non-financial conflicts, we have to think about net bias. By that I mean that with financial conflicts it is clear: the net bias is towards more favorable point estimates of activity, more favorable cost effectiveness, more favorable benefits, diminished or underemphasized harms, more treatment, less observation etc etc. That’s what dozens of studies have shown.
With non-financial conflict what is the net bias? We have to hypothesize a direction (as in the resident examples) and test for it. Some people speculate (again anecdotes) that doctors who develop chemotherapy combinations are more likely to favor the combinations they developed! But what is the net bias?
The NCI likes dose-adjusted R-EPOCH, and MD Anderson likes HyperCVAD. Neither group makes money from picking one over the other (as all drugs are old), and in the absence of head to head data that could adjudicate which is better, each favors its own. So a study could show that institutions like to use regimens developed there.
But then I would say, what is the net bias? In one place it goes to one regimen, and in the other it goes to the other? Who cares? What’s the point: Patients should be told that other institutions do things differently, but neither has robust data theirs is best? Sure, yes, but fundamentally there is no net bias in one direction, as with financial conflict.
And, at some point it becomes absurd. People either like Tylenol or Ibuprofen for simple headaches, and doctors who like one or the other give the one they like. Again, what is the net bias? In contrast, with financial bias it is toward the more costly, more novel, and more aggressive option (rather consistently).
Teasing out financial bias
Some bias may be both financial and non-financial. There may be several layers even. Urologists cling to the horrible screening test of the PSA like their wages depend on it. Of course, their wages do! Joking aside, studies show they are less willing to abandon this test as compared to general doctors.
So to tease apart financial conflict from non-financial conflict we might make 3 groups: general doctors, salaried urologists, fee-for-service urologists. The difference between fee-for-service urologists and salaried urologists tells you the financial component, and the difference between salaried urologists and general doctors tells you the professional (non-financial bias) component.
But, this isn’t totally right because even salaried urologists justify their wages, in part, through the volume of surgery they get from PSA screening. If PSA vanished completely, salaried urologists would be hurting for sure. So bias in favor of PSA is somewhat financial and non-financial for (even salaried) urologists, and teasing these apart will require creativity.
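To make the three-group comparison concrete, here is a minimal sketch in Python. The support rates are made-up numbers purely for illustration (I have no such data), and the function name `decompose_bias` is hypothetical. It also assumes the two components simply add, which, for the reason just given about salaried urologists, is at best a first approximation.

```python
def decompose_bias(general, salaried_uro, ffs_uro):
    """Split urologists' excess support for PSA screening into two parts.

    Each argument is the fraction of a group that favors continued PSA
    screening. Assumes the components are additive (a simplification).
    """
    financial = ffs_uro - salaried_uro      # fee-for-service vs. salaried
    professional = salaried_uro - general   # salaried vs. general doctors
    return {"financial": financial, "professional": professional}


# HYPOTHETICAL survey results, not real data.
parts = decompose_bias(general=0.40, salaried_uro=0.60, ffs_uro=0.75)
for name, value in parts.items():
    print(f"{name} component: {value * 100:.0f} percentage points")
```

Because salaried urologists are not free of financial interest, the "professional" number from a comparison like this would likely overstate the purely non-financial part.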
Financial bias in writing books/ Directionality of conflict.
Jeff Drazen, the editor of NEJM, tends to minimize the issue of financial conflict of interest, in part because of the 3 part series he published and his accompanying editorial, but he was quick to point out the financial conflict of interest in his critics.
About Ben Goldacre, author and researcher, who is a proponent of data sharing and transparency, Drazen says this: Goldacre is “trying to sell his books and he’s trying to tell the world that clinical trials aren’t reliable.”
When it comes to multi-billion dollar companies bending data, selectively reporting data, using medical writers, etc, to create a favorable impression of products, Drazen thinks FCOI is oversold; but when it comes to greedy Ben Goldacre selling his books (now in paperback), he understands it clearly!
Of course, writing a book is both similar to and different from selling a drug. In both cases, once it is done, you get money from selling it. Though, as an author of a book, I can tell you, you don’t get much money. In fact, I would be richer if I had gotten minimum wage for the hours I spent on it.
But, the big problem with books is the directionality of the bias. Ben Goldacre could have written a book extolling the virtues of pharma as easily as he has written a book cautioning about the limits. It is hard to believe Ben Goldacre’s or any author’s views espoused in (these kinds of) books are based, largely, on the revenue they expect to make from books. There are books praising Pharma and damning Pharma, books that sing the praises of cancer screening and those that condemn it. If one authors a book in one direction, and later makes statements in that direction: is that a financial conflict? Similar to the drug maker paying the medical writer?
Perhaps celebrity doctor writers like Dr. Oz, etc. think about what moronic pandering advice will appeal most to their audience before writing and that is a non-financial bias. But, at the outset, I doubt that—for most books—and for Ben Goldacre—there is financial conflict in his work. He believes what he believes and is consistent about it. Again, though it would be nice to have data here.
Getting ahead by bashing pharma or siding with it?
There is a faction of people who believe that academics get ahead by bashing pharma and that is the real motivation to criticize the industry.
Of course, anecdotally, there are many prominent academics who are critical of the industry, and many (think USC economists) who worship the industry and hail its actions. In my field, oncology, it is almost as if the more you work with industry, the better your career (as you get more trials).
Nevertheless, I actually happen to think it is a provocative idea, and wish we had data. I must admit to doing some work right now in this space. But, again, believing that this is true without data, and being happy to continue believing it without data, is extremely unscientific.
Financial conflict of interest is smoking
Just as there are many potential carcinogens, but it is clear smoking is one; there are many potential conflicts, but it is clear financial conflict of interest leads to skewed outcomes. No one would advocate ignoring smoking while we flesh out whether Gatorade is a conflict, yet many think we ought to ignore financial COI while we flesh out whether accepting a transfer on call when you are sleepy is a conflict.
This is a distraction.
If you think non-financial COI is a problem, why don’t you dry your tears, and generate some credible evidence of it? And, if you don’t think it can be measured, or documented in any form, but you are still scared of it, then I have some ghost stories for you. Also, please don’t consider yourself someone who practices ‘science,’ because scientists like to devise ways to measure things that hitherto were thought unmeasurable. Non-financial COI is ripe for measuring. Trouble is, data may get in the way of preconceived notions. So, next time I ask you for data supporting non-financial COI, just say you believe in ‘faith based’ not ‘evidence based’ conflict of interest.
DeJong C, Aguilar T, Tseng C, Lin GA, Boscardin W, Dudley R. Pharmaceutical industry–sponsored meals and physician prescribing patterns for Medicare beneficiaries. JAMA Internal Medicine 2016;176:1114-22.
Fleischman W, Agrawal S, King M, et al. Association between payments from manufacturers of pharmaceuticals to physicians and regional prescribing: cross sectional ecological study. BMJ 2016;354:i4189.
Lerner TG, Miranda Mda C, Lera AT, et al. The prevalence and influence of self-reported conflicts of interest by editorial authors of phase III cancer trials. Contemp Clin Trials 2012;33:1019-22.
9/19/16 The Statin Debate
Do we have enough evidence to recommend statins as primary prevention?
Sadly, the answer is no
This debate is about whether or not doctors should recommend statins as primary prevention. Let’s be clear, it isn’t whether or not a person/ patient should heed that recommendation and take a statin; that remains a personal choice. It is about whether, based on what we know, it is good public policy/ medicine to advise a person to take (and encourage them to stick with) statins for PRIMARY PREVENTION, when they have not had and do not have evidence of cardiovascular damage, such as angina or a history of heart attack or stroke. It is for those who otherwise feel fine with no cardiac history. In fact, by some estimates, it is a recommendation that would affect nearly a billion healthy people globally.
In this debate, there are two sides. One side thinks the answer is yes, we ought to recommend it, the data are clear, and shut up, critics. The other side thinks the answer is no, we should not recommend it (or we are not obligated to recommend it), either because the data are unclear (this is where I fall), or the data are clear that it doesn’t offer a net benefit.
In this debate both sides also agree on the rules. In order to recommend a pill taken daily (probably for life), doctors should have evidence that the pill either A. Improves length of life (without sacrificing quality) or B. improves quality of life (without sacrificing length). This is true for all medical practices, but especially true in primary prevention—where you muck about in the life of someone who already felt fine. Sadly, the simple fact is that as of 2016 we do not have clear and convincing evidence of either A or B. I am sorry that we do not because if we did, statins would be a great thing for many people globally.
Before I launch into the data, let me again clarify this is a discussion of primary prevention. If one more person says but what about people who had a heart attack, I will say FOR THE LOVE OF GOD WE ARE NOT TALKING ABOUT SECONDARY PREVENTION, WHY CAN YOU NOT UNDERSTAND THESE ARE DIFFERENT THINGS? Just like chemotherapy may extend life for someone with lung cancer, but not someone without it, can you not understand that perhaps a pill may extend life for someone who has declared himself to have a plaque rupture phenotype in the setting of elevated LDL, but not someone who hasn’t demonstrated that phenotype?
Ok, now the data for primary prevention
Do statins A. Improve longevity without a decrement of quality of life?
ANS: Probably not
It is a tough question to address, and in just 5 years we were able to identify 24 meta-analyses of randomized trials that tried to tackle it. The short answer is if you just focus on primary prevention (AS WE SHOULD BECAUSE THAT IS WHAT WE ARE TALKING ABOUT), you have to exclude all the secondary prevention patients from randomized trials that allowed them to be included. When you do this, you find that statins do not reduce all-cause mortality (Deaths statin v placebo 4.13% vs 4.44%, p > 0.05). As I will explain later, even this raw, non-significant difference of 0.31% is likely to be inflated.
You may be puzzled to see this non-significant result, and instead cite the Cochrane report, which found a SIGNIFICANT improvement in overall mortality (Deaths statin v placebo 4.41% vs. 5.17%). But this analysis did not exclude all secondary prevention patients. So, why should we look at this irrelevant number?
Other analyses, like the one by the Cholesterol Treatment Trialists (CTT), choose to report overall mortality based on LDL reduction. In other words, the risk of all-cause mortality (RR 0·91, 95% CI 0·88–0·93, p<0·0001) is reported “per 1·0 mmol/L reduction” in LDL (Supplement p. 13).
Let me say, this is the most moronic way to report mortality. It tells you statins’ effects if you take them, tolerate them, and based on (proportionate to) the LDL reduction you get from them. That’s not what I want to know. I want to know if you get a mortality benefit from being assigned to take them versus not taking them, not knowing what LDL reduction you will get. In other words, I want to know the data as it pertains to the clinical question. The fact that the CTT keeps analyzing their data based on LDL reduction is, to me, just another clear sign that someone else should really have access to these data.
The bottom line is that these estimates are not too far apart: an absolute risk reduction of 0.31% (not significant) or 0.76% (significant). But the truth is that both estimates are probably inflated. The Cochrane estimate is inflated in part because they include secondary prevention patients (and they should know better). The 0.31% estimate is inflated because it includes trials that use drug run in periods, odd inclusion criteria, weird control treatments, and antiquated data.
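For anyone who wants to check the arithmetic, here is a small Python sketch that recomputes the two absolute risk reductions from the death rates quoted above. The number needed to treat (NNT = 1/ARR) is my own addition for scale, not a figure from either meta-analysis, and it means little when the underlying difference is not statistically significant. The function names are just illustrative.

```python
def absolute_risk_reduction(control_rate, treated_rate):
    """ARR = control event rate minus treated event rate (both as fractions)."""
    return control_rate - treated_rate


def number_needed_to_treat(arr):
    """NNT = 1 / ARR; only defined when the ARR is positive."""
    if arr <= 0:
        raise ValueError("NNT is undefined when there is no risk reduction")
    return 1.0 / arr


# (placebo death rate, statin death rate) as quoted in the text.
estimates = {
    "primary prevention only": (0.0444, 0.0413),
    "Cochrane (some secondary prevention included)": (0.0517, 0.0441),
}

for label, (placebo, statin) in estimates.items():
    arr = absolute_risk_reduction(placebo, statin)
    nnt = number_needed_to_treat(arr)
    print(f"{label}: ARR = {arr * 100:.2f} percentage points, NNT about {nnt:.0f}")
```

Both numbers are taken at face value here; if the estimates are inflated for the reasons that follow, the real ARR would be smaller and the real NNT larger.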
Drug run in periods
Drug run in periods are shitty trial design. The only reason they were developed was some investigator must have once said, “I am still nervous that this trial could be negative, I wish we could do something to make it more likely to be positive… what could we do..? I got it!!”
In a drug run in period, you don’t take people and randomize them to the intervention or the control. In other words, you don’t just replicate the clinical situation and question you want to answer. Instead, you enroll patients, put them all on either a placebo or the active drug, wait to see who doesn’t take the drug, who has bad side effects, and throw them out of the trial, and then randomize whoever remains. Doing this 1. creates inclusion criteria that cannot be articulated at the outset or replicated, 2. enriches your population with people who are not like the real world, but much more likely to stick with the treatment and not complain, and, as a result, 3. exaggerates benefits and minimizes harms.
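A toy calculation shows why a run in pushes the numbers in the drug's favor. This is a deliberately crude sketch with invented inputs: it assumes a fixed fraction of people cannot tolerate the drug, that those people stop taking it and get no benefit, and that an active-drug run in removes all of them before randomization. None of these numbers come from a real trial, and `trial_estimates` is a hypothetical name.

```python
def trial_estimates(intolerant_fraction, arr_if_adherent, use_run_in):
    """Return (intention-to-treat ARR, excess side-effect rate) in the treated arm.

    intolerant_fraction: share of the source population who get side effects
        and stop the drug (ASSUMED, hypothetical).
    arr_if_adherent: absolute risk reduction among people who keep taking
        the drug (ASSUMED, hypothetical).
    use_run_in: if True, intolerant people are screened out before
        randomization (assumed to work perfectly).
    """
    randomized_intolerant = 0.0 if use_run_in else intolerant_fraction
    itt_arr = (1.0 - randomized_intolerant) * arr_if_adherent
    excess_side_effects = randomized_intolerant
    return itt_arr, excess_side_effects


for run_in in (False, True):
    arr, harms = trial_estimates(0.20, 0.010, use_run_in=run_in)
    label = "with run in" if run_in else "no run in"
    print(f"{label}: ARR = {arr * 100:.2f} points, excess side effects = {harms:.0%}")
```

Under these assumptions the trial with a run in reports a bigger benefit and no excess harm, even though the drug and the underlying population are identical.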
The Heart Protection Study (included in the CTT & Cochrane analysis), and other statin trials utilized a drug run in period. In the HPS:
Potentially eligible people entered a prerandomisation “run-in” phase, which was intended chiefly to limit subsequent randomisation to those likely to take the randomly allocated study treatment for at least 5 years. Run-in treatment involved 4 weeks of placebo (to allow review of liver enzymes, creatinine, and creatine kinase by the central laboratory before starting any simvastatin) followed by 4–6 weeks of a fixed dose of 40 mg simvastatin daily (to allow a prerandomisation assessment of the LDL-lowering “responsiveness” of each individual). The general practitioner was informed during the run-in of their patient's lipid profile, including LDL cholesterol… If the general practitioner considered there to be a clear indication for (or, conversely, a clear contraindication to) statin therapy then that person was not to be offered randomisation. Compliant individuals who did not have a major vascular event or other serious problem during the run-in, and agreed to participate in the study for several years, were randomly allocated to receive 40 mg simvastatin daily or matching placebo tablets.
I could not think of a more useless, limiting, and biasing inclusion protocol.
Other inclusion criteria that weed out normal people
A run in isn’t the only way to cherry pick the people you randomize, and make your trial less generalizable and more shitty. In the WOSCOPS trial, ~160,000 men ranging in age from 45 to 64 years were invited to attend the clinics to discuss the study. ~80,000 made the first visit, had their lipids checked, were given dietary advice and asked to return. ~20,000 made the second visit, had their cholesterol checked, and were asked to return if they met eligibility, and then on the third visit, ~13,000 returned, had an EKG checked, and were again asked to return. On the fourth visit, ~7000 were randomized. All this sounds great, but what the hell does that do to inform a doctor who has to make this decision on the first or second visit with a person? Should they also demand 4 visits? These sorts of non-pragmatic inclusion criteria screen out normal people, and leave behind super people, probably people who complain a lot less. Because, I am pretty sure that I would not be able to make all 4 visits, and would complain a lot about it.
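To see how steep that funnel is, here is a quick Python sketch using the approximate counts given above (the stage labels are my shorthand, not the trial's terminology).

```python
# Approximate counts at each step, as described in the text.
stages = [
    ("invited", 160_000),
    ("first visit", 80_000),
    ("second visit", 20_000),
    ("third visit", 13_000),
    ("randomized", 7_000),
]


def retention(stages):
    """Return (stage, fraction kept from previous step, fraction of all invited)."""
    invited = stages[0][1]
    rows = []
    for (_, prev_n), (name, n) in zip(stages, stages[1:]):
        rows.append((name, n / prev_n, n / invited))
    return rows


for name, step_frac, overall_frac in retention(stages):
    print(f"{name}: {step_frac:.1%} of previous step, {overall_frac:.1%} of invited")
```

On these rounded figures, fewer than 1 in 20 of the men invited ended up randomized.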
Other useless (or harmful) drugs
In some randomized trials, like the Japanese MEGA trial, “Physicians could prescribe mild hypolipidaemic drugs (eg, γ-oryzanol, riboflavin butyrate, pantethine) to patients in the diet group if they deemed that such treatment would be useful to prevent dropout”
But such a design may mask the harms of statins, as the control patients are exposed to whatever harms these drugs carry. It may even exaggerate the benefits, as these drugs may not be mere placebos, but perhaps even harmful. We have no idea what these drugs do, so why is this part of the trial design?
A lot has changed
Another thing we have to mention is the vast differences in people over time. WOSCOPS enrolled patients from 1989-91. A lot has changed since then, and not just my hairstyle. We eat differently than we used to, we smoke less (which robs statins of benefit), but we are fatter. Do these secular trends alter the risk benefit calculus of statins? Do the old data apply equally to today’s patients? Can we re-analyze trials with weighting towards people with risk factors more akin to modern people? These are all important questions.
When it comes to living longer, statins either have not met this bar, or have a sliver of an absolute risk reduction that is on the cusp of significance and comes from very biased trials that likely overestimate the benefits. This is a hard sell. What’s next, you want me to sell Mary Kay Cosmetics?
Do statins improve quality of life without losing longevity?
Without a doubt, statins prevent vascular events that are bad, like stroke or myocardial infarction. They also cause side effects that leave patients miserable, like aching, cramping legs, and a litany of complaints that every primary care doctor has heard. They also increase blood glucose, and rates of diabetes.
Some docs look at all of these and say that this is obviously a benefit. A heart attack is way worse than a little leg pain or a touch of sugar. But this is simply an assumption and not a proven fact. It is possible statins prevent just mild troponin elevations, and the leg cramps are far greater burden to people. Anyone who has ever given statins knows the challenge of repeatedly coaxing a miserable person to keep taking the pill. As Richard Lehman so wisely put it, “muscle pain and fatigability are not a figment of misattribution and public misinformation. They are too prevalent and recurrent in people who desperately want to stay on statins. Rather than discount a widely observed phenomenon, we should ask why there is such a mismatch with reporting in the trials. There is an urgent need for studies in the elderly, to test the hypothesis that their borderline daily functioning may be impaired by statins, tipping people into deconditioning and dependency.”
The way to sort all of this out is with well-done quality of life instruments. Unfortunately, for the most part, statin trials did not include such scales, something statin proponents acknowledge: “Few of the large long-term randomised placebo controlled trials of statin therapy specifically assessed quality of life”. For this reason, we cannot say honestly and for certain that statins improve quality of life (even if they don’t improve longevity).
Data sharing will surely help this debate, but it won’t settle it, unfortunately, because you cannot share data you never collected, and that is a big part of the problem here. We cannot know the true harms of therapy because trialists did not meticulously seek them out. To be fair, many of these trials were conducted in a bygone era where how patients felt on therapy was as irrelevant to doctors as the cost of the drug.
Why do so many say there is no debate?
As I have framed it, it is clear there should be a debate here, but I understand why many think this is settled. It must be hard to contemplate that something you do every day, a veritable backbone of primary care and cardiology, may be in error. I think it is much easier to go to work believing that the benefits must outweigh the harms. So I understand why many ‘experts’ cannot even entertain the idea that this entire paradigm is erroneous. Some call this phenomenon cognitive dissonance.
But, I suspect that some of the protest is disingenuous. Some experts know that there are legitimate questions/ disagreement here, but they compare the debate to that of vaccines and autism to stifle it. I would argue this is a mean-spirited tactic.
The abuse of statins
Let’s talk about the great big elephant in the room, the ABUSE of statins. How many of us have seen a 40-year-old woman with no cardiac history and no family history and fantastic blood pressure on a statin because her LDL was a tad high?? —forget that her HDL is through the roof and her vascular risk is beneath the floorboard. And she is complaining of muscle aches, or just forgetting to take a statin.
Or the 30 year old male cardiology fellow who takes a statin because he had an uncle who had a heart attack. And the uncle was a smoker, oh yeah, and he was 78.
Or a doctor re-checking a yearly cholesterol in a 27 year old woman, whose last LDL was 151… with plans to do what exactly if it is 162 this time?
We have all seen these totally irrational uses of statins; in America, in many places, statins are being passed out like skittles to many people with very low vascular risk who will almost surely not benefit. And the passers have an irrational zeal that they are saving the world.
Would I take a statin?
Yes, if I had a heart attack and high cholesterol, but OF COURSE NOT for primary prevention—are you crazy! The present data is far too shitty to have any reasonable certainty of this. As for Zetia or god help us fenofibrate or niacin—get the F&*k outta here. Data for all of those range from total garbage to outright contradicted. As for the PCSK9 drugs, the jury is still out, but taking a gamble is not best suited for primary prevention.
But I do think this is a great topic to study. We ought to do a very large (DOUBLE BLIND—no more open label) randomized trial paid for by NHLBI using generic atorvastatin or placebo (NO DRUG RUN IN and pragmatic, 2016 inclusion criteria – that don’t allow other unproven drugs). We should collect detailed quality of life and mortality. We should try to settle this question in our lifetime, so at least we can stop wasting so much page space talking about it.
Things I won’t talk about because of space/ time constraints
Low vs high dose trials & whether to include them
Importance of statins as LDL reducers vs. other pleiotropic effects
Harms in RCT vs. Observational studies
Treat to target LDL/ based on LDL/ Treat based on CV risk/ Fire and forget
Conclusion: When there is clear data that recommending statins improves survival (without decrement in quality of life) or quality (without loss of longevity), then there will be no debate. Unfortunately, as of 2016, we are not there yet.
Statins will continue to be debated; and trialists won’t yet be able to add them to the water supply. Also, share the data.
Errors in the NEJM’s portrayal of Randomized Controlled Trials
Wrongly Tarnishing the Gold Standard
Rumors of the demise of RCTs have been greatly exaggerated
It’s time to End the Misleading Rhetoric about Moving beyond the RCT.
This week the NEJM published, “Assessing the Gold Standard — Lessons from the History of RCTs.” The article rather boringly describes the history of RCTs, and makes some uncontroversial points, but at the same time it systematically denigrates the role of the RCTs and undermines their importance to justify future medical treatments.
Most of the criticisms of RCTs made in the article are completely in error. For this reason, the article is a subtle threat to the pursuit of evidence-based medicine and a threat to better decisions for patients. And, precisely because it is subtle, its danger is all the greater. It is likely to be swallowed hook, line, and sinker. Furthermore, the article comes at a time when the fundamental editorial direction of the Journal has been questioned. This article is likely further evidence of the NEJM’s regressive thinking, and is a strategic move by the Journal to undermine evidence.
Before I dismantle the erroneous arguments in this piece, let me make a comparison to Eric Lander’s Heroes of CRISPR. Mike Eisen astutely noted that Lander’s history was a work of calculated genius in that it subtly but steadily denigrated the role of Berkeley and other scientists, and elevated the work of Broad researcher Zhang, which of course would benefit Broad leader Lander, who is on the cusp of a billion dollar patent ruling. The present essay is similar. It subtly undermines the role of RCTs in future regulatory decisions, and would undermine the use of the RCT to adjudicate precision medicine.
First let me start by saying what we have to focus on. There are lots of things doctors do. We make diagnoses. We give prognoses. We try to understand the underlying biology, but the bread and butter of what we do for people is making recommendations: You should take this pill, have this surgery, undergo this screening test, follow up with that blood test, undergo that surveillance PET/CT. In short, we recommend interventions to make people better off. And that is something that really demands an RCT. Maybe not of the precise situation you are in, but ideally at least some proof of principle that what you are doing can work as you intend.
Along these lines, the article gets some things right. They say RCTs minimize bias, particularly confounding by indication. RCTs were used to strengthen the rigor of medical science. RCTs have debunked many charlatan claims. RCTs have gotten very bureaucratic, and many cost much more than they ought to. This is all true.
But throughout the paper the authors also advance their other argument: “the past seven decades also bear witness to many limitations of this new ‘gold standard.’” Most of these limits are mistaken, red herrings used to push a lesser evidence agenda.
The first one is this early criticism of RCTs: “Some critics worried about the ethics of withholding promising new interventions from control groups.” The article nicely points out that RCTs are able to show if interventions actually work, but fails to mention that many RCTs have revealed that interventions in use actually HARMED people. The CAST trial showed anti-arrhythmics killed. Trials of auto-transplant for breast cancer showed the intervention dramatically raised harms and did not improve survival. Hormone therapy for women several years post-menopause increased heart attacks and strokes, the precise things it purported to lower. In fact, in many RCTs it was the control group who bailed out the treatment arm, allowing us to note that we were harming people. RCTs also took the wind out of the sails of things many were damn sure would work: renal artery denervation, Pro-MACE-Cytabom for lymphoma. I could go on and on, but I don’t need to. We wrote a book loaded with examples of this, called Ending Medical Reversal.
Moreover, even negative RCTs (no benefit, no harm) show we were wasting time and effort, and of course, RCTs are rarely continued until definite proof of harm.
Other data is valuable too
The NEJM authors argue that other data is valuable, “A quick scan of the medical literature reveals that older methods, including case series and even case reports, continue to be valuable.” Here the authors are correct. They are valuable. A case series of a few patients alerted us to PCP and AIDS.
But they are totally wrong when it comes to the main question. Case series and case reports are rarely valuable when it comes to MAKING A THERAPEUTIC RECOMMENDATION TO A PATIENT AND KNOWING YOU AREN’T SELLING THEM BULLSHIT.
Paul Glasziou and colleagues keep a list of interventions that are universally considered beneficial without RCTs. There are real examples here. Good examples. But we have to acknowledge that this list is very small. Compared to the hundreds of thousands of things doctors do day in and day out, the list is a tiny fraction. In fact, if you are recommending something without an RCT, it is much more likely not to work than to work.
The elephant we have to mention on this topic is a tongue-in-cheek paper in the BMJ saying there are no RCTs of parachutes. Yes, great point, and quite funny. You don’t need RCTs of parachutes. But parachutes improve survival from <1% to >99.99%; there are just not interventions of that magnitude in medicine. Let me repeat. There is NO INTERVENTION IN ALL OF MEDICINE WITH THIS MAGNITUDE OF BENEFIT.
Ioannidis showed that only 1/80000 interventions improve mortality 5 fold in all of Cochrane’s database (empirical effects). Gleevec may be a “parachute”-like drug (3 yr survival 50 to 95%), but most of what doctors do is not a parachute. Most interventions have effect sizes that require RCTs to separate bias from signal.
Observational research is generating important data
The authors make the point that new observational research is generating important data: “New methods of observational research continue to emerge — for instance, using large databases of patients to produce comparative effectiveness data on various treatment outcomes relatively efficiently in settings of routine care.”
Sure, yes, and observational studies are great for showing that things we thought worked don’t work in the real world. For example, sorafenib in liver cancer has a marginal 2-3 month survival benefit in a contrived randomized trial of otherwise very healthy people. But in the real world, the toxic and marginal drug shows no benefit in a Medicare data set, with older and sicker people. And of course, the absolute survivals are much lower.
But, at the same time, the opposite is far more tenuous. Recommending a practice based solely on observational data is an uncertain business.
For example, observational research is notoriously prone to confounding by indication—aka healthier people get the intervention. Let me give one example.
Recently, oncologist Maurie Markman asked ‘Do we really need an RCT?’ of adjuvant therapy in fully resected small cell lung cancer. First, know that small cell lung cancer is a highly aggressive and lethal disease with very few cures obtained, and only if the disease is localized on presentation. Second, know that the disease is almost never localized on presentation. Third, know that even more rarely is the disease so limited that someone can try to cut it all out. Fourth, know that there is a legitimate question of whether, in this situation (after cutting out something very localized), chemotherapy and radiotherapy (no walk in the park) are worth it. Fifth, know that probably most oncologists in 2016 would try to offer this intervention to people who could tolerate it, knowing what we know about how fast small cell lung cancer grows. For this reason, I think adjuvant treatment is a great question. I don’t know the answer. I would love to see an RCT on it. I don’t think an obs study will cut it because of confounding by indication. Maurie doesn’t want that RCT. He thinks he knows the answer. Why?
Well, a recent observational study showed that patients who got chemo or radiation after surgery did better than those who didn’t. Median overall survival was 66 months versus 42 months for patients who had not received adjuvant treatment, with a 5-year overall survival of 53% versus 40% (P < .01). The authors even adjusted for a handful of covariates.
Of course, patients who didn’t get adjuvant therapy were older, less likely to have private insurance, and probably sicker in many ways that the poor capture of covariates misses. This is something the authors understand (“there is a possibility that selection bias contributed to the much higher survival”) but something Maurie doesn’t, as he argues that no RCT is needed because we already know the answer. In fact, if you test such an intervention in an RCT, don’t be surprised if it turns out to be negative, or if the benefits are far less than in observational work. It has happened so often… well, we wrote a book on it.
When it comes to establishing that an intervention can benefit people under SOME or ANY circumstances, there is no substitute for an RCT. Observation just doesn’t cut it. Even newer techniques to match covariates, like propensity score, fall short.
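To make confounding by indication concrete, here is a minimal simulation sketch in Python. Everything in it is made up for illustration: the `frailty` variable, the survival formula, and the rule that healthier patients are more likely to be treated are my assumptions, not data from the small cell study. The treatment is given zero true effect, so any gap between treated and untreated patients is pure bias.

```python
import random


def simulate(n=20000, seed=0):
    """Compare a null treatment under observational vs. randomized assignment.

    All numbers are hypothetical. The treatment has NO effect on survival.
    """
    rng = random.Random(seed)
    obs = {True: [], False: []}  # survival (months) by treatment, doctor's choice
    rct = {True: [], False: []}  # survival (months) by treatment, coin flip

    for _ in range(n):
        frailty = rng.random()  # unmeasured: 0 = robust, 1 = frail
        # Survival depends only on frailty, never on treatment.
        mean_survival = 60 - 40 * frailty
        survival = rng.expovariate(1.0 / mean_survival)

        # Observational world: robust patients are more likely to be treated.
        obs[rng.random() < 1 - frailty].append(survival)
        # Randomized world: a coin flip decides, regardless of frailty.
        rct[rng.random() < 0.5].append(survival)

    def mean(xs):
        return sum(xs) / len(xs)

    return {
        "obs_diff": mean(obs[True]) - mean(obs[False]),
        "rct_diff": mean(rct[True]) - mean(rct[False]),
    }


if __name__ == "__main__":
    result = simulate()
    print(f"Observational 'benefit': {result['obs_diff']:.1f} months")
    print(f"Randomized 'benefit':    {result['rct_diff']:.1f} months")
```

Under these assumptions the observational comparison credits a do-nothing treatment with a survival gap of many months, while the randomized comparison hovers near zero. Adjusting for covariates would only help here if frailty were actually measured, which is exactly what registry data tends to miss.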
Proof that propensity score falls short
One way to validate observational studies is to compare concordance between observational studies and subsequent RCTs. Of course, this is a very select group (probably skewed towards inflated concordance), but it can still be instructive. On this topic, the largest empirical study, published nearly 15 years ago by John Ioannidis, shows marked discordance. In other words, observational studies and RCTs reach conclusions that diverge beyond chance in a non-trivial percentage of cases. The real trouble, though, is not the percentage, but that no one knows when they will diverge without doing the RCT.
Now, consider the concordance of propensity score matched studies and RCTs on the same question. In 2014, Dahabreh and Kent examined this and again found a similar amount of discordance. Discordance again had no predictable pattern. For this reason, for the time being it is unlikely that even propensity score analysis is sufficient to justify a medical practice.
Almost alluding to precision oncology and precision medicine, the NEJM authors say, “critics have argued that it is inappropriate, and sometimes impossible, to evaluate such long-term, highly individualized interventions”—like psychotherapy. But this is a pathetic objection. One could easily randomize people with psychiatric ailments to psychotherapy or a control arm of a similar amount of time spent with a provider, and follow them. This is child’s play.
They then argue that there is little funding for such trials. But this reflects not a limit on RCTs’ ability to judge evidence, but rather our collective failure to run a public, non-conflicted trials agenda. As long as we have obstructionist politicians, we will not have enough funds for science. Solution: vote them out of office.
Along these lines, many say RCTs are not needed for precision oncology, where drugs are given for specific mutations at an N of 1 level. How can we test an individualized therapy? Easy: you randomize patients to a precision oncology strategy or to usual care. And, thus far, when you do it, you get no benefit, as seen in the Lancet Oncology RCT SHIVA. So, hello people!! We are already doing RCTs on this topic. If you oppose RCTs as impossible but some have been reported, something is wrong with your thinking.
We need to prove that the strategy of individualized oncologic care is superior to empirical treatments. How can we do that? Well, we are going to need an RCT, or many RCTs, to show which strategy can work (because expect some more null results).
Even though the NEJM authors acknowledge that sham controls debunked ligation of the internal mammary artery and many orthopedic interventions, they argue, “Sham controls could not be used for major operations, which limited opportunities for blinded trials.”
But why is this the case? Only because of hubris among surgeons and interventionalists that what they are doing actually works. If the profession understood the history of placebo effect, sham surgery, and subjective outcomes, it might be more willing to embark on trials that could assess whether it offers valuable treatments or expensive, invasive placebos.
If you want proof of this, see a recent Twitter string started by John Mandrola, where he questioned whether afib ablation requires a sham RCT to demonstrate QoL benefits beyond placebo. The response from EP doctors was fierce. Many tweets in reply showed images of afib that terminated after ablation, but of course, this is beside the point. No one doubts that you can terminate afib (a surrogate) in some patients; the question is whether this translates into an improvement in quality or quantity of life beyond the effect of the procedure itself. The same is true for stenting for stable coronary disease. No one doubts it opens the narrowed artery, but it doesn’t decrease MIs or mortality (COURAGE), and are gains in symptoms real, or simply an expensive placebo effect? (COURAGE did not have a sham control.) A sham trial will tease it out. Sham controls aren’t needed for OBJECTIVE outcomes, but they are needed for SUBJECTIVE ones.
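Here is a rough sketch of why the sham matters for a subjective outcome, written as a toy Python simulation. The model is hypothetical: symptom improvement is noise, plus a `placebo_effect` for anyone who believes they had the procedure, plus a `specific_effect` for actually having it. I set the specific effect to zero purely for illustration; that is an assumption of the sketch, not a claim about what ablation or stenting really do.

```python
import random


def symptom_trial(sham_controlled, n_per_arm=5000, specific_effect=0.0,
                  placebo_effect=10.0, seed=1):
    """Return the mean difference in symptom improvement (procedure - control).

    Hypothetical model: improvement = noise
        + placebo_effect  if the patient believes they had the procedure
        + specific_effect if they actually had it (assumed 0 by default).
    """
    rng = random.Random(seed)

    def arm_mean(had_procedure, believes_treated):
        total = 0.0
        for _ in range(n_per_arm):
            improvement = rng.gauss(0, 15)  # natural variation in symptoms
            if believes_treated:
                improvement += placebo_effect
            if had_procedure:
                improvement += specific_effect
            total += improvement
        return total / n_per_arm

    procedure = arm_mean(had_procedure=True, believes_treated=True)
    # With a sham, controls also believe they may have been treated;
    # without one, they know they got nothing.
    control = arm_mean(had_procedure=False, believes_treated=sham_controlled)
    return procedure - control


if __name__ == "__main__":
    print(f"Unblinded trial:       {symptom_trial(sham_controlled=False):.1f}")
    print(f"Sham-controlled trial: {symptom_trial(sham_controlled=True):.1f}")
```

In this toy model the unblinded trial reports roughly the full placebo effect as if it were a treatment benefit, and the sham-controlled trial reports roughly nothing. An objective outcome like death would not carry the belief term, which is why the sham is only needed for the subjective ones.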
Now what about sham trials of total knee arthroplasty for pain—in other words big surgeries. Well, I bet if we bucked up, wiped away surgeon tears, and ran them (control arm gets sedation and a long incision on the skin, and maybe some superficial hardware for them to palpate to simulate replacement), we would find that there may be a benefit for people with true joint instability, but maybe not for those merely with pain. Don’t get me wrong-- I don’t know what such a trial will show, but it is entirely plausible that we will find that the procedure is no more than a sham operation for a large portion of customers. It will probably take another 10 years for orthopedic surgeons to muster the courage to do these trials, as it would threaten 30 billion a year in revenue. But such a trial would be a tremendous good for the public. Even if it confirmed benefit it would assure us that hundreds of billions and many complications were not in vain (and give us real $ per QALY figures not guesses). I however do not assume benefit here.
Trials are outdated by the time they are done
NEJM authors argue, “One long-standing, possibly intractable, concern has been the discrepancy between the time frame of RCTs and the fast pace of innovation”
This is another silly argument made by proponents of therapies that have never shown benefit. Of course, advances in radiation or imaging may change over time, but if you can’t show a benefit in an RCT at any point, you really have to wonder.
Instead of whining, prove that an intervention improves the outcome at any time point.
RCTs do not always influence practice
“Even well-conducted RCTs sometimes failed to influence medical practice,” they write. Like COURAGE, or the negative trials of arthroscopic surgeries: we still do these procedures! But this is not a limit of RCTs, but a failure to educate doctors on how to interpret and evaluate evidence. It is a failure of payers to push for de-adoption of debunked procedures. It is testimony to how hard it is to put the genie back inside the bottle once it is out, and if anything it should make us more worried about approving, using and paying for therapies that have not shown upfront benefit in an RCT.
“RCT results have been accepted as fact but have later proved lacking in external validity.” They don’t give examples, but of course, the sorafenib example I gave is one. The solution, though, is not to embrace observational research, but to actually conduct PRAGMATIC RCTs with broad criteria and few restrictions. In fact, a recent pragmatic registry based RCT (called TASTE) was conducted for just $50 per person, undermining the “it costs too much” complaint.
Pragmatic RCTs probably offer divergent (and stronger) truth claims than observational registry based studies. How can we find out? We have to conduct many and then perform concordance analysis studies.
Selective reporting bias
The authors complain of selective reporting bias for RCTs: “In addition, by the 1990s, it became clear that positive results tended to be published more often than negative results, to the detriment of medical knowledge.” Sure, yes, but this is why we need (1) trial registries (got em), (2) data sharing, and (3) mandates to publish. Do the authors think selective reporting of case reports or observational studies was a lesser problem? No, it was worse. In 1982, Sacks noted that on the same questions 77% of historically controlled studies concluded the treatments worked, but just 20% of RCTs did.
Coronary stents for STABLE ANGINA (I am not talking about plaque rupture people!!)
The article gets some of this right. Stenting rose to prominence based on pathophysiology alone. Patients and doctors still think it saves lives, but the best RCT on the topic, COURAGE, finds no benefit on MI or mortality. Proponents still think that some subgroups or high-risk groups benefit, but instead of showing this with positive RCTs they continue to harp on “limits” of COURAGE.
This strategy of saying negative data is not good enough is akin to asking someone to prove Santa Claus does not exist. After the person does a census of all people in the world, and finds no Santa Claus, believers note that he didn’t look under the ocean, in all of the deserts, etc. You cannot prove an intervention does not work under ANY circumstances; you must instead show that it can work under SOME circumstances. Proponents of stable coronary stenting have FAILED to do that, and now are sabotaging the ongoing ISCHEMIA trial's ability to enroll enough patients. After all, ISCHEMIA can only erode their 10-15 billion dollar market share.
“The idea that RCTs would be the only authoritative arbiter to resolve medical disputes has given way to more pragmatic approaches. Experimentalists continue to seek new methods of knowledge production, from meta-analyses to controlled registry studies that can easily include large numbers of diverse patients.”
Uh, meta-analysis is most often pooled RCTs—so this makes no sense. You can’t say we don’t need potatoes because we have French fries—what are fries made of?
Again, registry based RCTs have been pragmatic (like TASTE), but observational studies without randomization remain an inadequate basis for therapy.
They write that before RCTs, “a single physician, drawing on clinical experience, could write an article that might change clinical practice”. But who gives a shit whether it takes a single person or many doctors to prove something works? Patients don’t care about individual doctors’ glory; they want treatments that work. Period. This is silly.
“By the 21st century, a single phase 3 RCT could cost $30 million or more”. Correct, and in the case of the TASTE RCT it can be just 50 bucks a participant, costing $300 grand total for a several thousand person trial. So, get creative, and find ways to lower the price. If we had the same attitude about gene sequencing as we do about RCTs (just give up and say it is too expensive rather than lower the price), then, well, we might not have so many fool’s errands in precision medicine. Ha ha kidding kidding.
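For anyone who wants the back-of-the-envelope version, here is the arithmetic as a small Python sketch. The three dollar figures are simply the ones quoted in this post and by the NEJM authors, not audited trial budgets, and `participants_funded` is a helper name I made up for the illustration.

```python
def participants_funded(budget_usd, cost_per_participant_usd):
    """How many participants a budget covers at a given per-person cost."""
    return budget_usd // cost_per_participant_usd


REGISTRY_COST_PER_PERSON = 50    # per-participant figure quoted in the post for TASTE
REGISTRY_TOTAL = 300_000         # "$300 grand total", as quoted in the post
CONVENTIONAL_TOTAL = 30_000_000  # "$30 million or more", as quoted by the NEJM authors

registry_n = participants_funded(REGISTRY_TOTAL, REGISTRY_COST_PER_PERSON)
cost_ratio = CONVENTIONAL_TOTAL / REGISTRY_TOTAL
same_money_n = participants_funded(CONVENTIONAL_TOTAL, REGISTRY_COST_PER_PERSON)

print(registry_n)    # 6000
print(cost_ratio)    # 100.0
print(same_money_n)  # 600000
```

On these quoted figures, one conventional phase 3 budget would pay for about a hundred registry based trials, or a single registry trial of 600,000 people.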
“Furthermore, in part because of high trial costs, researchers and their funders have had substantial interests in achieving positive trial results”
Wrong: companies have an interest in positive results not because the trial is costly, but because they can make a billion dollars off of a marginal drug if they squeeze out a p of 0.047. Whether the trial is cheap or costly, they still stand to make a fortune and will distort, pervert, and bias trial design insofar as they can. The cost of the trial is not the issue; the gain from success is. This statement is illogical.
They write “Economist Angus Deaton, for example, argues that RCTs “cannot automatically trump other evidence, they do not occupy any special place in some hierarchy of evidence, nor does it make sense to refer to them as ‘hard’ while other methods are ‘soft.’”” This is called appeal to authority; it is an empty persuasive tactic. I could not give two shits what Angus Deaton thinks about patient care until he becomes my intern and I see whether he knows anything about medicine.
When you really don’t need RCTs
There are times when you don’t need RCTs, but I won’t rewrite the last chapter of our book. We discuss them there. Ending Medical Reversal. These are truly situations where even the most ardent proponents of EBM will agree we ought not be fundamentalists. I am not an RCT absolutist, but I do think the present article does a disservice to RCTs.
If you want to tell a person to do something, you better have an RCT showing that the practice actually does what you think it does; otherwise you are treading on thin ice.
The rumors of the demise of RCTs are greatly overstated. In 2016, they are needed, now more than ever. The limitations of RCTs: cost, external validity, and straw man comparators are not solved by idolatry of lesser evidence. They are solved by developing registry based RCTs (which lower cost), by designing pragmatic studies, and by setting a rigorous trials agenda by non-conflicted people.
“RCTs are now just a part — though perhaps the most critical part — of a broad arsenal of investigative tools used to adjudicate efficacy and regulate the therapeutic marketplace”
No, that is wrong. They are the most important part, and efforts to reduce their importance will lead to a deluge of unreliable evidence.
Over the last 15 years, in addition to other failures and flawed leadership, the NEJM has failed to lead in evidence-based medicine. JAMA, JAMA IM, BMJ, PLOS Medicine, the Lancet, and the Annals have published better articles, more provocative articles on EBM and meta-research. As an anecdote, John Ioannidis has now published over 800 articles, but only a stray letter to the editor ever landed in NEJM. Here the NEJM continues to further its agenda to replace better evidence with lesser evidence. History will show that doctors, the public, and most importantly patients will be worse off because of it.
The solution to a bad RCT is a better RCT, not no RCT. But history will show that the RCT is the backbone of reliable evidence, and NEJM will lose this fight only because they are wrong.
The NEJM’s timid editorials on data sharing consist mostly of self serving, inconsistent arguments
A Full Throated Defense of Trial Data Sharing WITHOUT GATEKEEPERS
The NEJM has published 4 related editorials on data sharing. Despite some occasional flashes of reason, most of the ideas presented are timid and some are outright restrictive; at least 2 rely on fear mongering regarding the threat of incorrect analyses (to preserve status quo or endorse a weak solution), and the NEJM appears committed to publishing both sides of the story: the “we better not change too fast”, and the begrudging “change is good, but let's make it painfully incremental”. Of course, a full-throated defense of data sharing without a gatekeeper is absent. Let me make that case.
First, what are we talking about? We are talking about TRIAL data sharing. TRIAL data. There are many types of data, but some data pertain to the conduct of clinical trials. Trials are experiments where human beings subject themselves to treatments in the hopes of better outcomes for themselves (sometimes), and improving treatments for patients like them in the future (always).
With this in mind, there are 2 motivations for the current ICMJE proposal and other calls for data sharing from clinical trials.
The two reasons for DATA SHARING of TRIAL DATA
First, there is the practical motivation. Many clinical trials have inadequate reporting of harms, huge delays in reporting secondary findings, and distortion and spin (even error and fraud) surrounding the primary analysis. Data sharing has potential to expand the number of scientifically valid findings from these trials, to explore ideas that the primary authors could not even imagine, and to correct & clarify the scientific record.
Second, there is the ethical motivation. Patients consent to these trials with the implicit and often explicit idea that their contribution will be used maximally to 'improve outcomes for patients like them' in the future, develop better therapies, and help scientists. How many patients would consent to a trial if the researchers disclosed that a main purpose is to allow them to create a database so they can milk secondary publications for the rest of their miserable, self serving careers?
I say miserable in part as a joke, but also because of the reality that many secondary publications by primary investigators deal with trivialities and appear in very low tier journals. Some journals are so bad even mothers would not allow their reprints on refrigerators-- forget about readers and influence.
In other words, there is a tension between a researcher's desire to hog or hoard data for their career glory, and patients' and society's desire to maximally leverage the data for the greater good.
In these articles, some authors worry that data sharing may threaten their way of life, which has existed nicely by hoarding data, or as they call it, “remove the incentive” to do trials. This is perhaps the most dishonest, self serving argument I have heard. Here is why:
Why Data Sharing “removing the incentive” to do trials is a 99% BS argument.
Imagine if a contractor for a federal bridge says after finishing the bridge, which was paid for by the US government, that only the contractor can use the bridge for 2 years before others are allowed to use the bridge---because that is part of the incentive to build it!! You would say, but you were paid to build it, that was the incentive! You would also follow that with a string of expletives.
Same is true here. Researchers are paid by the NIH to run the trial: that is the incentive. You got salary support, so you could do this work, and not see patients. You may have gotten promoted, or used it to leverage a nice, new job.
In other words, you are being paid to do your job, have a nice career, and that is your compensation. Hoarding data for years to milk secondary publications can no longer be a part of the deal. Sorry it ever was, but that was a silly precedent.
It is just like the bridge: you got paid to build that public bridge, and you don't get 2 years to sit on top all by yourself thinking about all the cars you intend to let through, if only you had time. But, and here is the big but, we will still give you the first drive across the bridge. You still get the BIG PRIMARY paper.
But I am a realist, and maybe some investigators were also counting on having 2 years of their own private bridge, so I am willing to play ball. Do you want more money to build bridges? We can negotiate; eventually you will agree. But we can't negotiate your exclusive license to sit on the bridge. Especially if we look around and see most bridges are just rotting and not allowing traffic, and if bridges were built by public volunteer laborers. So if you want more salary support in NIH trial funding, that's ok, but you don't get the data.
With data sharing some may object, well we also DESIGNED the bridge. But, many actual bridges are contracted and have architects design them, so I see no difference. NIH funded trials are public bridges.
It is really just the culture of medicine, which is arbitrary and a historical accident, where researchers "own" their data. It is time to end that culture. Patients own the data. We just help build the bridge, and we get paid for that. Let the bridge be used widely.
Now let me take issue with the misuse argument.
These articles keep complaining that poor scientists will misuse the data. This is silly, unfounded, and frankly fear mongering. Mostly because these same authors say nothing of the kind as they conduct their own secondary analyses (potentially wrong) on datasets that are already widely available.
These hypocrites are silent about public data sets that you can just download in 2016. Is anyone misusing genomic data? SEER data? What about MEDICARE data you can just buy? What about Internet data? Census data? -- Maybe there is some misuse, but no one argues these require extra gatekeepers. The same people who say trial data should have a gatekeeper who reviews the analysis plan, don’t say the same for their Medicare analysis. Both MEDICARE and trial data can be used to cast aspersions on a marginal cancer drug, for instance. So why a gatekeeper for one, but not the other? There is no real distinction here.
Also consider that when Wikipedia first debuted, many argued that it would lead to widespread misinformation (You can’t have average people curate an encyclopedia!!), but time has shown that the truth wins out, and Wikipedia is pretty accurate.
Republicans currently say crime is up (it's not) and jobs are down (they are not). Sharing trial data seems neither necessary nor sufficient for erroneous claims or conclusions. Science is won by convincing other scientists, so trial data sharing is UNLIKELY to lead to widespread false beliefs if the analyses are truly wrong. Or at least no more likely than any other data that is shared. I have a good example coming.
Extra Gatekeepers are not necessary for clinical trial data, run the risk of perpetuating the harms of the current system we seek to remedy, and in time I am sure they will be removed. History moves towards progress, so there is no way EXTRA gatekeepers will last. I say extra because we already have many, many gatekeepers.
Current gatekeepers are journal editors and peer reviewers, and these have been thought sufficient for people who 'create' data, and have been sufficient for thousands of papers using publicly available data, yet are somehow inadequate for people using trial data?-- such a bizarre distinction. Why don't people submit requests before data mining Medicare or SEER data? They don't. They just purchase the data, or download it for free (respectively), and last I checked, there are some papers that are disputed, but that is part of the scientific process.
Here is an example:
Recently shared data was used to make a misleading claim about prostate cancer, and it was publicized, but then the NY Times and Otis Brawley ripped them apart (http://www.nytimes.com/2016/07/21/health/advanced-prostate-cancer-false-alarm.html). Science is used to dealing with people who make claims while failing to understand how to conduct proper analyses (they are called nutritional epidemiologists haha kidding, not really). At other times, we may do a bad job at this (consider vaccines/autism).
But the key point is that in either case the problem exists whether or not you allow trial data sharing. We already deal with this well or poorly, depending on how you see it.
Here is a bold idea:
Post de-identified data sets from clinical trials on the web when the primary article is posted.
AKA true data sharing.
At least as an option, we must experiment with true data sharing: posting de-identified data alongside the primary article. Anyone with Excel or Stata or SAS can run any analysis they wish, and we will rely on all the usual gatekeepers (journal editors/reviewers) to police claims.
You may say-- what a crazy idea!-- but many articles in PLOS ONE, right now, post the data set of the paper. And there is no deluge of dubious re-analysis. The top econ journal, the American Economic Review, posts the data set (https://www.aeaweb.org/journals/policies/data-availability-policy) for many papers, but there is no widespread re-writing of claims, though I found this policy helpful as I work on re-analyzing an AER paper.
I don't know the best way to data share, but at least some proposal or some pilot program should be TRUE data sharing. I would support a variety of pilot programs to see what works best. But all of these authors EXCLUDE this at the outset. That is unscientific and timid.
Why is the NEJM afraid of data sharing?
Let's talk about the elephant in the room: why is the NEJM scared of data sharing? True data sharing may result in many of NEJM's randomized trials (especially industry sponsored trials) being completely dismantled in re-analysis. NEJM papers have broad influence, support regulatory approval, and will be the first ones whose hoods people want to look under. The NEJM is right to be worried that entire issues may collapse like a house of cards.
If that happens it will be good for patients—who want to know the truth about their treatments—and bad for academics and the NEJM who would prefer to be unquestioned.
In these articles, the persistent fear mongering that data sharing will lead to false analyses strikes me as disingenuous, hypocritical and self serving. As with any data, even those generated by investigators, sometimes a bad analysis is published; sometimes the scientific record works to correct itself, and at other times it doesn’t, but c'est la vie. In all cases, easier access to data makes this no more likely to be a problem, or if it does, then we should hold MEDICARE data to the same standard, and investigators can get in line to submit their proposals to a central committee who can review them. The only end result will be fewer MEDICARE papers, not better ones, I suspect.
It is time to dry the crocodile tears about TRIAL data sharing. The reasons to oppose it are BS. If you want to experiment with ways to do it, great, but at least some proposals should experiment with TRUE, UNFETTERED data sharing. I suspect the fears are vastly overblown, and simply posting the data alongside the paper will prevail as the easiest, most ethical, most logical, and most informative practice.
I am 100% confident that in 20 years posting the de-identified data alongside the published paper WILL BE THE NORM. No gatekeeper, and the sky won’t fall. Once again, the NEJM and these authors are largely on the wrong side of history.
The FDA generally errs on the side of approving drugs: Why the analysis by Montazerhodjat and Lo gets it wrong.
A recent paper by Montazerhodjat and Lo is called “Is the FDA Too Conservative or Too Aggressive?: A Bayesian Decision Analysis of Clinical Trial Design.” In the paper, the authors argue that the FDA’s bar for approval is sometimes too high, and sometimes too low. Unfortunately, the paper contains so many errors regarding cancer drug development that it cannot be used to say anything useful about the present FDA approval standards. Besides saying something entirely obvious in principle, it says nothing of practical value.
First, let's get very clear what Montazerhodjat and Lo set out to do. The authors believe the FDA should vary its threshold of statistical significance (specifically alpha error) in pivotal trials according to the lethality of the disease. Why does it have to be p<0.05 (or p<0.025 for a one sided test) for everything, they ask. For diseases with high lethality, Lo thinks we should accept a larger type 1 error, and for indolent conditions a smaller one.
Here is what Lo says: “So do you really want to be as stringent in those cases where patients are going to die anyway? You'd take a bigger chance of making a mistake.”  In another article Lo expands, “Imagine if I had pancreatic cancer,” Lo said. “I’m willing to take a 1 in 4 chance the drug you give me is not going to work. Because the alternative is: I’m dead.”  And from their paper, "one of the design principles called for by  is less stringent statistical significance levels to be employed in efficacy trials for drugs targeting life-threatening diseases and/or rare conditions. Our BDA framework provides an explicit quantitative method for implementing this principle."
Then on the flip side, the authors contend that for other cancers the bar is too low. Consider prostate cancer. Here is how Lo’s analysis would treat it; nicely summarized on fivethirtyeight:
“On the other hand, the FDA’s 2.5 percent threshold is too high, according to this metric, for trials of drugs that treat less severe diseases. Take prostate cancer: Lo’s method says that the FDA’s standard leads to the approval of too many ineffective drugs for treating it and that a false positive rate of 1.2 percent should be used instead.” 
The fivethirtyeight website has a great graphic to explain the central thesis: that as the severity of a disease rises, we should be willing to accept drugs with a higher false positive rate, i.e., a higher chance the approved drug truly doesn’t work.
In principle, this argument is reasonable, although the gold standard would be asking whether changing our false positive rate threshold leads to approvals that improve patient outcomes on average compared with alternative schemes (this is an empirical question). But the fact is that so many things are wrong with the Lo analysis that it provides no useful information for modern regulators. It is not worth reading by anyone at FDA. Here is why:
What does it mean for a cancer drug to work? And how does the FDA actually approve drugs?
Cancer drugs work if they improve survival or quality of life. Period. The FDA approves some drugs based on pivotal trials that test these endpoints, but many more approvals (about 2/3) are based on surrogate endpoints. Surrogates stand in for the endpoints that matter.
We also know that surrogates sometimes get it wrong. Bevacizumab was one drug that improved progression free survival (by many months) in metastatic breast cancer, with a p value of <0.001, though at the time of approval its effects on overall survival were unknown (p value of 0.16).
Side note: here is exactly an example of the FDA approving a drug based on a type 1 error higher than 0.025 for a highly lethal condition, yet Lo does not appear to be aware that such approvals occur.
Back to the example: Then, just a few years later, multiple studies showed that bevacizumab did not improve overall survival and added toxicity, and its breast cancer indication was withdrawn. It didn't work.
Bevacizumab was a great example of the FDA approving a drug with a higher false positive rate than 2.5% for a lethal disease. Since most approvals are based on surrogates, the FDA is approving drugs for diseases based on higher false positive rates ALL THE TIME. That is most of what they do.
So, if we approve 100 cancer drugs based on progression free survival, and then later test them rigorously for overall survival— how many later meet this mark? The answer is— we don’t know, because follow up studies are often not done.
Montazerhodjat and Lo talk about statistical significance in trials as if it is the largest driver of real world uncertainty, but it is not; the choice of endpoints, controls, and patients is. Ioannidis famously recognized that the false positive rate of research findings is often a lot bigger than the p-value suggests.
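The gap between a significance threshold and the real world false positive rate can be sketched with standard positive-predictive-value arithmetic. The Python below is only a minimal illustration, not a calculation taken from Montazerhodjat and Lo or from any FDA data: the function name and the inputs (the assumed fraction of tested drugs that truly work, and the trial power) are hypothetical values chosen to show the shape of the relationship.

```python
def false_positive_share(alpha, power, prior_true):
    """Share of 'statistically significant' trials where the drug truly does not work.

    alpha      -- type 1 error threshold used in each trial (e.g. 0.025)
    power      -- probability a trial detects a drug that truly works
    prior_true -- ASSUMED fraction of tested drugs that truly work (hypothetical)
    """
    for name, value in (("alpha", alpha), ("power", power), ("prior_true", prior_true)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    false_positives = alpha * (1.0 - prior_true)  # ineffective drugs that pass
    true_positives = power * prior_true           # effective drugs that pass
    total_positives = false_positives + true_positives
    if total_positives == 0:
        raise ValueError("no trial can be positive with these inputs")
    return false_positives / total_positives


if __name__ == "__main__":
    # Same alpha every time; only the assumed prior changes (illustrative values).
    for prior in (0.5, 0.1, 0.02):
        share = false_positive_share(alpha=0.025, power=0.8, prior_true=prior)
        print(f"prior={prior:.2f}  false positive share={share:.3f}")
```

With alpha held fixed, the share of positive trials that are false positives grows as the assumed prior shrinks. This sketch still leaves out surrogate endpoints, uncontrolled trials, and unrepresentative patients, all of which push the real world number higher in ways a p-value cannot capture.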
Cancer drug trials aren’t representative of real world patients
Pivotal cancer drug trials include patients much younger than average cancer patients. We know that the effect of drugs on these patients is often not replicated in real world settings. Since the FDA looks at trials of younger patients, the real world false positive rate (approving a drug that in reality has no benefit) is again higher than any statistical test can tell you.
We don’t approve drugs for "prostate cancer", we approve drugs for INDICATIONS
The analysis by Montazerhodjat and Lo is used to make a specific statement about the threshold for approval in “prostate cancer.” But we don’t approve a single drug for all of prostate cancer. There is early prostate cancer, which is treated with active surveillance (often the best choice), prostatectomy, or radiation. There is biochemically recurrent prostate cancer, where some drugs are used. There is metastatic castrate sensitive prostate cancer, and castrate resistant prostate cancer. There is pre-chemo and post-chemo prostate cancer. There is symptomatic and asymptomatic metastatic castrate resistant prostate cancer. The point is that the survival of a group of patients with prostate cancer depends on the details of where they are, and at what point they are in their disease. Cancer drugs are approved for only a portion of the process.
So, when Montazerhodjat and Lo look at aggregate statistics on how all patients with prostate cancer do, they are mostly looking at the outcomes of patients with early stage prostate cancer, who do well and aren’t getting drugs, while most approvals are for late stage prostate cancer, which can be a devastating and lethal disease. So their basic conclusion about the FDA’s threshold for prostate cancer makes no sense, because neither the FDA nor any cancer practitioner treats all “prostate cancer” the same.
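A toy calculation shows why the aggregate misleads. The numbers below are invented purely for illustration (they are not prostate cancer statistics, and the two groups are a deliberate simplification): when most patients sit in a low mortality group, the pooled mortality lands near that group's value, far from the group that actually receives the approved drugs.

```python
# Hypothetical patient counts and mortality by disease setting.
# These figures are MADE UP for illustration only; they are not real data.
groups = {
    "early stage (no drug therapy)": {"patients": 900, "mortality": 0.02},
    "metastatic castrate resistant": {"patients": 100, "mortality": 0.70},
}


def pooled_mortality(groups):
    """Patient-weighted average mortality across all groups."""
    total_patients = sum(g["patients"] for g in groups.values())
    expected_deaths = sum(g["patients"] * g["mortality"] for g in groups.values())
    return expected_deaths / total_patients


pooled = pooled_mortality(groups)

if __name__ == "__main__":
    print(f"pooled mortality: {pooled:.3f}")
    for name, g in groups.items():
        print(f"{name}: {g['mortality']:.3f}")
```

Under these made-up inputs, a severity estimate built on the pooled figure would describe neither group well, and it would badly understate lethality in the late stage setting where drugs are actually approved.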
The FDA is already flexible.
The FDA’s standard for approval is already very flexible. Approvals are often granted in trials without control arms (what is the type 1 error there?) and often granted based on surrogates (what does that do to the false positive rate?).
Name a drug that should be approved and one that shouldn’t
The value of any model such as this lies in making a claim that our current system should have done something differently. Many drugs are halted in the drug development pipeline (in fact, most are). These failures are well known, and these drugs are well known to drug developers. I suggest that Montazerhodjat and Lo name a single drug that should have been approved, but wasn’t, or vice-versa. (I personally can name some that I think shouldn’t be approved, because, as I’ve hinted, the real world permitted false positive rate of approval is much, much higher than any statistical test can tell us.) But let's see what they say.
Andrew Flowers asks whether this is akin to the stem cell research ban: “how is this not like asking ‘name a major breakthrough in stem cell research under the ban’” (Twitter). The difference is that, unlike stem cell research prohibitions where research does not get done, thousands of compounds have been made over the last decade that are not on the market. Many of these have entered clinics in trials that I and others have been a part of, so their names are known. Some have shown preliminary measures of benefit, but failed to meet the mark in subsequent trials. Surely among these thousands, there is at least 1 drug for a lethal disease that Montazerhodjat and Lo believe exceeds the FDA’s threshold, but falls within theirs.
I will even help them start. Liver cancer is high on their list (they say we should accept a high false positive rate). Indeed, advanced or metastatic hepatocellular carcinoma is a deadly disease, and there have been dozens of published randomized trials in the last decade that were unsuccessful. Surely one of these has a p-value that Montazerhodjat and Lo would accept. Can they point out which of these drugs they want on the market?
If there is, tell us, so we can chat about it. And if there isn’t, the model has no value, as it makes no different predictions than what we are presently doing. We don’t need a new model that tells us to fill up the gas tank when it is low, and not when it is high.
The analysis of Montazerhodjat and Lo does not account for the fact that many approved drugs are approved in trials without control arms, most are approved based on surrogate endpoints, and nearly all are tested in unrepresentative populations. What this does to the real world false positive rate of improving survival or quality of life is unknown. Their analysis lumps together early and late stage prostate cancer to say something about where the bar for prostate cancer approval should be, but we approve drugs for specific indications, not blanket approvals for a cancer. This isn’t statins, for Christ's sake (just kidding).
And, the biggest conclusion is that conducting such 30,000 foot view economic analyses that ignore all the realities of cancer drug approvals is simply silly— a pointless exercise. Another example of a great model with no useful predictions. The fact that this can get press coverage and shape a debate is simply disappointing.