Saturday, 11 July 2015

Question Time

I don't watch Question Time, because decades ago when I did I used to find myself shouting at the television.  But last night, driving down the A1, I listened to it on the radio (and when at one point I turned it off, my daughter said she was interested in it so I switched it on again).  You can watch it here (but you ought to have something better to do).

The panel was Conservative minister Anna Soubry, Labour shadow minister Chuka Umunna, SNP MP Tommy Sheppard, UKIP MEP Louise Bours, and journalist Rachel Johnson, with David Dimbleby as presenter.

The politicians are better than I remember at seeming likeable, but the level of debate is no higher.  The programme started with a discussion of the effects of the "living wage" and tax credit measures in this week's budget.  Umunna started by pointing out that Cameron promised, a few months ago on the same programme, not to cut tax credits.  Soubry flatly denied it.  Of course she was wrong, and Dimbleby might have said so, since it was he who put the question to Cameron, but either he'd forgotten or he chickened out.  None of the panel talked about the effect of the new minimum wage structure on employers and the patterns of employment they will now favour, or the effective marginal tax rate of 78% imposed on some claimants by the sharper taper of tax credits (it's OK to examine these things from the left).

Later, Bours and Johnson gave us their analysis of the Greek referendum.  The proposed package, Johnson informed us, would be "a bailout for the bankers, again".  "You're right, absolutely", Bours confirmed.  Well, the 2012 package sorted out the debts to commercial banks, in return for their accepting a write-down of 53.5% of face value, together with reduced coupons and extended maturities.  So that was a fraction of a bailout.  This time the debts are owed almost entirely to the Eurozone and the IMF, so if a bailout benefits the creditors, those creditors are the people of the contributing countries.  A member of the audience wanted to "tax the bankers" to pay for it, and Sheppard suggested specifically a levy on bailed-out banks: he didn't say if he would exempt Greek banks from that.  Johnson and Bours agreed that Greece would have no great problems reverting to the Drachma (except, I say, that they'd have nothing to pay for imports with in the months it took for the new currency to find a foreign exchange level).  Umunna then observed that "it's not as simple as...bankers against the Greek people" and he and Soubry spoke sensibly but vaguely.

Then there was a discussion of student grants and loans, the lowlight of which was Bours arguing with a nursing student in the audience about the content of her university course - not what the content should be, but what it is.  Bours then repeatedly disparaged a degree in "David Beckham studies" - a horror largely of her own imagining.

At the end the panel discussed the next leader of the Labour Party, most of them expressing their disappointment that Umunna is no longer standing.  Bours put her hand on his shoulder "we couldn't wait to see Chuka sitting in a working men's club drinking a pint of stout".

Two years ago I wrote a post in response to Nigel Farage's question about his party's potential MPs "could we be worse?".  Having listened to this programme, and watched the ending, I can add that, criminality and defections aside, the answer is "yes you are".

Sunday, 28 June 2015

The end of the lifeline

Three years or so ago I wrote several posts about the financial crises in Greece, in which my view of the bail-out was that "it can't work and it won't work".  Well, it couldn't, but I was wrong to think that "this could break down very quickly."   In fact, the Eurozone did an outstanding job of kicking the can down the road, keeping Greece afloat for three years without ever coming close to fixing anything.

The difficulty with this strategy was that the electorate in Greece hated the austerity imposed on it, and the electorates in the rest of Europe hated paying for Greek pensions.  The beginning of the end was the election in Greece early this year of the anti-austerity Syriza party, which was committed not to accept the inevitable terms of the next bail-out package.  Meanwhile, the rest of the Eurozone had successfully reduced the vulnerability of its banks and debtor nations to a Greek default.

The recent bail-out negotiations between the Eurozone finance ministers and the Greek government (represented by prime minister Alexis Tsipras and finance minister Yanis Varoufakis) have been a game of chicken.  Both sides know that a Greek default would be horrible for both of them, but the negotiations have failed because the Eurozone isn't quite scared enough of the consequences of default, and Tsipras is scared enough of the consequences from his party and his electorate of giving in to Eurozone demands.  Tsipras has now fallen back on holding a referendum next Sunday to find out if the Greeks want him to give up the policies they elected him on, which would be admirably democratic were it not for the fact that the government needs to pay €1.5bn to the IMF on Tuesday, and apparently it hasn't got it.

Oh well, not paying the IMF is embarrassing — Greece is richer than most IMF members — and could be treated as a default by the Eurozone (page 33 here.)  But neither the IMF nor the EU need rush to do anything about it.  So in practice a Yes vote in the referendum could still result in a bail-out which enables Greece to meet its interest payments.

But there are more immediate problems.  First Robert Peston for the BBC reported that "the European Central Bank's governing council is expected to turn off Emergency Liquidity Assistance (ELA) for Greek banks at its meeting later today", and later the ECB issued a statement, saying that it would "maintain the ceiling to the provision of emergency liquidity assistance (ELA) to Greek banks at the level decided on Friday (26 June 2015)".  Which is a gentle way of saying that it would not increase the ceiling — it would allow no new ELA.

ELA is necessary because there's a run on the Greek banks, whose depositors reasonably fear that the banks may run out of cash or the government may redenominate deposits into a new, rapidly depreciating currency.  Withdrawals take two forms: cash or transfers to banks outside Greece, and both need bank reserves at the Bank of Greece, which the Greek banks haven't got, either to buy banknotes or to fund interbank transfers via the Eurozone's TARGET2 system.

However, contrary to many reports, ELA is not money supplied to Greece by the ECB.  It's provided by the Bank of Greece; all the ECB does is to authorize it.  And it doesn't cost the Bank of Greece anything either - the money it lends the banks is just an accounting entry, the banknotes it can print in exchange for owing money to the ECB, and TARGET2 is the same without the printing.  The banks borrowing the money have to post collateral for the money they're borrowing, because it's supposed to help banks which are illiquid but not insolvent.  I suppose the Greek banks are in fact solvent so long as their billions of Greek government T-bills are treated as sound (which is absurd, but the ECB hasn't said so up to now).

So what ELA means in practice is the Bank of Greece incurring increasing 'Eurosystem' debts to the ECB.  But Eurosystem debts are not like ordinary debts: there is no requirement for them ever to be settled, and they carry currently a very low interest rate — I haven't found a clear statement from the ECB on interest charges, but this helpful paper says they pay the Main Refinancing Operation rate currently 0.05%, whereas this press release says the deposit facility rate, currently -0.2%, is applicable to TARGET2 balances (presumably only if positive), and this report quotes an ECB spokesman denying that.  (Frances Coppola says the balance isn't even a debt: I don't quite agree, but the difference between her view and mine seems not very important in practice.)

I wonder what would happen if Greece chose simply to continue with ELA, without the ECB's permission.  That would be a gross breach of the rules, but when you're about to default, you might live with that.  What would the ECB do?  I suppose it would suspend Greek access to TARGET2, but it's not clear to me what it could do beyond that, other than promise future non-cooperation.

If it's not willing to defy the ECB, Greece can keep its banks afloat only by redenominating their liabilities into a new currency - Grexit.  I suppose the government won't take that step before the referendum, so there would have to be severe limits on withdrawals and transfers, or simply bank closures, for the next week.

The other problem is that the Greek government has salaries and pensions to pay.  Varoufakis said a month ago that he'd rather pay the pensioners than the IMF, and my guess is that they've got the money somewhere for this month.  If I'm wrong, they're faced with the same two choices: raise money by selling the banks T-bills paid for with money created and loaned by the Bank of Greece, all in defiance of the ECB, or pay the pensions in a new currency.

The Euro has been a remarkable experiment - a currency shared by dissimilar countries with independent governments, and run by independent national banks.  I confess to being intrigued by the way in which it's in part unravelling.  If only there weren't more at stake than entertaining bloggers...

Tuesday, 6 January 2015

Two-thirds of cancers - collected links

News sites getting the meaning of "two thirds of cases" wrong: Independent, Telegraph, Mail, Express, Mirror, Huffington Post
News site getting it wrong in the headline but right in the text without one having to scroll down: Reuters.
News sites getting it right: BBC, Guardian.

The press release.


The Science abstract, with paywalled link to the paper
Free preview of the paper
Supplement on the data and methodology


Long critical review of the paper and its reporting: David Gorski
Discussion of the reporting: Andrew Maynard, Science-Presse (in French)
Criticism of the interpretation of correlation: Guardian, statsguy, Antonio Rinaldi (in Italian), with his own model, me, with a toy model
Criticism of the correlation calculation: StatsChat
Criticism of the clustering methodology: Understanding Uncertainty (with discussion of the reporting), statsguy, me, with discussion of the methodology generally
Criticism of the message: Cancer Research UK (with discussion of the reporting and the paper)
Expressing doubts about the accuracy of the data: Paul Knoepfler

A few comments on the paper: Science

Support for the paper: Steven Novella
Support for the message: PZ Myers, expressing disdain for those reluctant to accept the role of random chance

Monday, 5 January 2015

Cancer risk - an analysis

My previous post discussed this paper, and its claim that two thirds of cancer types are largely unaffected by environmental or hereditary carcinogenic factors.  While I'm unimpressed by the paper, the idea behind it is interesting, so here's my analysis of its data.

The hypothesis is that "many genomic changes occur simply by chance during DNA replication rather than as a result of carcinogenic factors.  Since the endogenous mutation rate of all human cell types appears to be nearly identical, this concept predicts that there should be a strong, quantitative correlation between the lifetime number of divisions among a particular class of cells within each organ (stem cells) and the lifetime risk of cancer arising in that organ."

So let's suppose that each stem cell division gives rise to cancer with a small probability p.  Then if there are n lifetime divisions, the probability that none of them leads to cancer is (1-p)^n, so the lifetime risk of cancer, R, is 1 - (1-p)^n.  We can rearrange that to find an expression for p: ln(1-p) = ln(1-R)/n.  For very small p, ln(1-p) ≈ -p, so p ≈ -ln(1-R)/n.  If we plot -ln(1-R) against n we should expect to find that for all the organs where carcinogenic factors are absent the values fall on the same straight line through the origin, with slope p.
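A minimal Python sketch of that rearrangement, for anyone who wants to check it numerically.  The function names and the example values of p and n are mine, picked purely for illustration; they are not taken from the paper.

```python
import math


def lifetime_risk(p: float, n: float) -> float:
    """R = 1 - (1-p)^n, computed in a way that stays accurate for tiny p."""
    return -math.expm1(n * math.log1p(-p))


def implied_p(risk: float, n: float) -> float:
    """p = -ln(1-R)/n, the small-p approximation derived in the text."""
    return -math.log1p(-risk) / n


# Hypothetical inputs: a one-in-a-trillion chance per division,
# and ten billion lifetime divisions.
p_true = 1e-12
n_divisions = 1e10

risk = lifetime_risk(p_true, n_divisions)
p_back = implied_p(risk, n_divisions)

print(f"lifetime risk R = {risk:.6f}")
print(f"implied p       = {p_back:.3e}")
```

Going from p to R and back again recovers the starting value almost exactly, which is all the approximation needs to do when p is this small.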

However, the values of n range through several orders of magnitude, so we can't create this plot unless we're willing to make all the rare cancers invisibly close to the origin.  Instead, let's take logs again, giving log(p) = log(-ln(1-R)) - log(n).  So on a graph of log(-ln(1-R)) against log(n), all the cancers satisfying our hypothesis should fall on a straight line with slope one crossing the y axis at log(p).  (I've switched to base-10 logarithms for this step, to make the powers of ten easier to follow.)
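As a rough check on the slope-one claim, here is a Python sketch using synthetic data: every tissue is given the same per-division probability, and a least-squares line is fitted through the log-log points.  The value of P, the range of divisions and the helper names are assumptions for illustration only, not the paper's figures.

```python
import math


def log_coords(risk: float, n: float) -> tuple[float, float]:
    """Return (log10 n, log10(-ln(1-R))), the coordinates used in the text."""
    return math.log10(n), math.log10(-math.log1p(-risk))


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


P = 1e-13  # hypothetical per-division probability, the same for every tissue
divisions = [10.0 ** k for k in range(6, 13)]  # 1e6 ... 1e12 lifetime divisions
risks = [-math.expm1(n * math.log1p(-P)) for n in divisions]  # R = 1 - (1-P)^n

points = [log_coords(r, n) for r, n in zip(risks, divisions)]
slope, intercept = fit_line([x for x, _ in points], [y for _, y in points])

print(f"slope     = {slope:.6f}")
print(f"intercept = {intercept:.6f}  (log10 P = {math.log10(P):.6f})")
```

With a common p the fitted slope comes out at one and the intercept at log10(p), so a fitted slope well away from one is evidence against the common-p hypothesis.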

Here's the graph, which looks not unlike the one in the paper.  The correlation between the x and y data series is 0.787, again not unlike in the paper.  But the slope of a line through the points is not unity, nor is there a subset of points at the bottom of the envelope of points for which the slope is unity.

(I've arbitrarily given FAP colorectal a cancer risk of one millionth less than one, because the method doesn't allow a risk of exactly one.  Its point could be moved vertically by choosing a different number.)

To explore further how well the data fit the model, I've backed out implied values of p for each cancer type.

Here's the problem.  If the data matched the theory, there would be a group of cancer types at the left end of the chart with similar implied probabilities.  It seems in particular that the risk of small-intestine adenocarcinoma is anomalously low.

[A commentator points out that there is a group of cancer types near the left end of the chart which do have similar implied probabilities (the same eight cancers lie roughly in a straight line in the scatter plot).  But the theory in the paper is that there's a background rate of cancer in any tissue type, depending only on the number of stem cell divisions, because "the endogenous mutation rate of all human cell types appears to be nearly identical".  This theory can't be casually modified to allow for a background rate of cancer in all tissue types except for in the small intestine.  (Oncologists are of course aware that small-bowel cancers are strangely rare.)]

Let's try an alternative theory: that for every tissue type, some fraction of stem cell divisions, call it α, are affected by environmental or hereditary influences in a way which gives them a probability, call it q, of causing cancer.  q is the same for all tissue types.  The remaining divisions carry negligible risk by comparison.  Somewhat arbitrarily, we'll assume α is one for the cancer with the highest implied probability in our previous analysis: that is, q is equal to the p implied for Gallbladder non-papillary adenocarcinoma.  We can now back out a value of α for each cancer.
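The backing-out step can be sketched in a few lines of Python.  Under this theory only α·n divisions carry risk, so the implied p for a tissue is roughly α·q, and α is the implied p divided by the largest implied p.  The tissue names and numbers below are invented for illustration and are not the paper's data.

```python
import math

# Hypothetical (lifetime risk, lifetime stem cell divisions) pairs.
# These are made-up values, NOT the figures from the paper's supplement.
tissues = {
    "tissue A": (0.003, 1e8),
    "tissue B": (0.05, 1e12),
    "tissue C": (0.0002, 1e9),
}


def implied_p(risk: float, n: float) -> float:
    """Per-division probability implied by the simple model: -ln(1-R)/n."""
    return -math.log1p(-risk) / n


implied = {name: implied_p(r, n) for name, (r, n) in tissues.items()}

# Assumption from the text: alpha is one for the tissue with the highest implied p,
# so q is simply that highest value.
q = max(implied.values())
alpha = {name: p / q for name, p in implied.items()}

for name in sorted(alpha, key=alpha.get, reverse=True):
    print(f"{name}: implied p = {implied[name]:.2e}, alpha = {alpha[name]:.4f}")
```

Because α is defined by division, any set of data can be fitted this way, which is rather the point of the exercise.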

Well, it's a simplistic theory, but it does have the advantage over our previous model that it fits the data.

It seems to me that picking out gallbladder cancer as high-alpha is a plus for this model, because that cancer has a peculiar geographic spread which can only be due to environmental or hereditary factors.

And I've been mischievous.  In this theory, despite the correlation in the input data between stem cell divisions and cancer risk, every cancer is caused by environmental or hereditary factors.

Saturday, 3 January 2015

Science by press release

Yesterday's Times has a front page story "Two thirds of cancer cases are the result of bad luck rather than poor lifestyle choices...". (paywall)

That doesn't match my preconceptions, so I looked for the story online.    The Independent and the Telegraph agree.  So does the Mail.  And the Express. And the Mirror.

Reuters' headline agrees, but its story suggests something a bit different - that two thirds of an arbitrary selection of cancer types occur mainly at random.

The BBC speaks unambiguously of "most cancer types" and so does The Guardian.

The press release which must have given rise to this story features the phrase "two thirds of adult cancer incidence across tissues can be explained primarily by 'bad luck'".  I can't make much sense of "cancer incidence across tissues", so I can't blame the journalists for stumbling over it likewise.  But the reporters who got the story right must have managed to scan down to the paragraph where the press release explains that the researchers "found that 22 cancer types could be largely explained by the “bad luck” factor of random DNA mutations during cell division. The other nine cancer types had incidences higher than predicted by "bad luck" and were presumably due to a combination of bad luck plus environmental or inherited factors."

I emphasize that "two thirds of cancer types" is not at all the same as "two thirds of cancer cases".  Two rare cancers apparently unrelated to environmental factors will count for far fewer cases than one common cancer in the other category.
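A toy calculation in Python makes the gap concrete.  The three cancer types and their case counts are invented for the purpose; nothing here comes from real incidence data.

```python
# Hypothetical annual case counts for three cancer types (invented numbers).
cases = {
    "rare type 1": 1_000,
    "rare type 2": 1_000,
    "common type": 50_000,
}

# Assumption for the example: the two rare types are the "bad luck" ones.
random_types = {"rare type 1", "rare type 2"}

share_of_types = len(random_types) / len(cases)
share_of_cases = sum(cases[t] for t in random_types) / sum(cases.values())

print(f"share of cancer types: {share_of_types:.0%}")
print(f"share of cancer cases: {share_of_cases:.1%}")
```

In this made-up example two thirds of the types account for under 4% of the cases.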

So what of the paper behind the press release?  Here's the abstract, with a paywalled link to the whole paper.  Or you can 'preview' the paper for free here, to the extent your conscience permits.  Supplementary data and methodology descriptions are here.

The hypothesis behind the paper is that cancer is to a large extent caused by errors arising during stem cell division, at a rate which is independent of the tissue type involved.  The researchers therefore obtain estimates of the lifetime number of stem cell divisions in various tissue types, and plot that against lifetime cancer incidence, obtaining a significant-looking scatter plot (Figure 1 in the published paper).  So far so good.

But they've used a log-log plot, necessary to cover the orders of magnitude variations in the data.  Now, if you think, as the researchers apparently do, that cancer risk is proportional to number of stem cell divisions, it follows that the slope of a log-log plot should be unity.  It isn't: by eye it's more like two thirds.  The researchers, busy calculating a linear correlation between the log values, seem not to have noticed this surprising result.  Instead they square the correlation to get an R^2 of 65%, which may (it's not clear) be the source of the "two-thirds of cancer types" claim.

If so, that claim is based on a total failure of comprehension of what correlation means.  Imagine a hypothetical world in which cancer occurs during stem cell division with some significant probability only if a given environmental factor is present, and that environmental factor is present equally in all tissue types.  In this world cancer incidence across tissue types is perfectly correlated with the number of stem cell divisions, but nevertheless all cancer is caused by the environmental factor.
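That hypothetical world is easy to simulate.  The Python sketch below gives every tissue the same per-division risk Q when the environmental factor is present and zero risk when it is absent, then computes both the correlation and the share of risk attributable to the factor.  All names and numbers are assumptions made up for the illustration.

```python
import math

Q = 1e-13  # hypothetical per-division cancer probability when the factor is present

# Hypothetical lifetime stem cell divisions per tissue (invented values).
divisions = {"tissue A": 1e6, "tissue B": 1e8, "tissue C": 1e9, "tissue D": 1e11}


def lifetime_risk(p: float, n: float) -> float:
    """R = 1 - (1-p)^n."""
    return -math.expm1(n * math.log1p(-p))


def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


risk_with_factor = {t: lifetime_risk(Q, n) for t, n in divisions.items()}
risk_without_factor = {t: 0.0 for t in divisions}  # no factor, no cancer

xs = [math.log10(divisions[t]) for t in divisions]
ys = [math.log10(risk_with_factor[t]) for t in divisions]
r = pearson(xs, ys)

# Fraction of each tissue's risk that disappears if the factor is removed.
attributable = {
    t: (risk_with_factor[t] - risk_without_factor[t]) / risk_with_factor[t]
    for t in divisions
}

print(f"correlation of log risk with log divisions: {r:.6f}")
print(f"share of risk due to the factor: {attributable}")
```

The correlation is as near to one as makes no difference, yet every case in this imagined world is attributable to the environmental factor.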

It's simply impossible to say anything about the importance of environmental factors in a statistical analysis without including those factors as an input to the analysis.

However, the press release also features the paragraph I quoted about 22 out of 31 cancer types being largely explained by bad luck.  Perhaps that's what they mean by two thirds.  To get this number, they devised an Extra Risk Score - ERS for short.  Then they used AI methods to divide cancer types into two types based on the ERS values.  So what's the ERS?  The Supplement describes it as "the (negative value of the) area of the rectangle formed in the upper-left quadrant of Fig. 1 by the two coordinates (in logarithmic scale) of a data point as its sides." That is, it's the product of the {base-10 logarithm of stem cell divisions} and the {base-10 logarithm of lifetime cancer risk}.   (The cancer risk logarithm is negative (or zero) since lifetime risk is less than (or equal to) one.)

Shorn of the detail, it's the product of two logarithms.  How does that make sense?  Multiplying two logarithms is bizarre; for all ordinary purposes you're supposed to add them.  For this analysis, a simple measure would seem to be the ratio of lifetime incidence to stem cell divisions, or you might prefer the log of that ratio, which would be the log of the incidence minus the log of the stem cell divisions.
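To see why the product is an odd choice, take two imaginary tissues with exactly the same risk per stem cell division.  This Python sketch computes the ERS as the Supplement describes it (product of the two base-10 logarithms) alongside the log of the ratio.  The two data points and the function names are invented for illustration.

```python
import math


def ers(n: float, risk: float) -> float:
    """Product of log10(divisions) and log10(lifetime risk), per the Supplement's description."""
    return math.log10(n) * math.log10(risk)


def log_ratio(n: float, risk: float) -> float:
    """log10 of risk per division: log10(risk) - log10(divisions)."""
    return math.log10(risk) - math.log10(n)


# Two hypothetical tissues, both with lifetime risk / divisions = 1e-12.
examples = {
    "tissue A": (1e8, 1e-4),
    "tissue B": (1e10, 1e-2),
}

for name, (n, risk) in examples.items():
    print(f"{name}: ERS = {ers(n, risk):.1f}, log ratio = {log_ratio(n, risk):.1f}")
```

The log ratio treats the two tissues identically, as you'd hope, while the product gives them quite different scores (about -32 against about -20).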

(On further reflection, the number I'd use would be {log(1-incidence)/divisions}.  That doesn't give a defined answer for lifetime incidence of unity, but you can get a number by using an incidence of just less than unity.  Among the other cancer types, it picks out gallbladder cancer as having the highest environmental or hereditary risk, which is consistent with that cancer's unusual geographical variation of incidence.)

The Supplement attempts to justify multiplying the logarithms by explaining why dividing them wouldn't make sense.  Which is a bit like advocating playing football in ballet shoes because it would be foolish to wear stilettos.

Whatever ERS calculation you used, the clustering method would still divide the cancers into two groups, because that's what clustering methods do, but different calculations would put different cancer types in the high-ERS group.  If you want, as the senior author does, to draw conclusions from the composition of the high-ERS cluster, you need a sound justification for your ERS calculation.
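Here is a small Python sketch of that point.  It uses a deliberately simple stand-in for the clustering step (a single threshold chosen to minimise the within-group sum of squares), not whatever method the paper actually used, and six invented data points rather than the paper's data.  The same split rule is applied to two different scores.

```python
import math

# Hypothetical (lifetime divisions, lifetime risk) pairs, invented for illustration.
tissues = {
    "A": (1e6, 1e-5),
    "B": (1e8, 1e-4),
    "C": (1e10, 1e-2),
    "D": (1e12, 1e-2),
    "E": (1e7, 1e-2),
    "F": (1e11, 1e-4),
}


def sum_sq(values: list[float]) -> float:
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def two_group_split(scores: dict[str, float]) -> tuple[set[str], set[str]]:
    """Split into (low, high) groups at the cut minimising within-group sum of squares."""
    ordered = sorted(scores.items(), key=lambda item: item[1])
    best_cut, best_cost = 1, float("inf")
    for cut in range(1, len(ordered)):
        cost = sum_sq([v for _, v in ordered[:cut]]) + sum_sq([v for _, v in ordered[cut:]])
        if cost < best_cost:
            best_cut, best_cost = cut, cost
    low = {name for name, _ in ordered[:best_cut]}
    high = {name for name, _ in ordered[best_cut:]}
    return low, high


# Score 1: product of the two logarithms, as the Supplement describes the ERS.
product_score = {t: math.log10(n) * math.log10(r) for t, (n, r) in tissues.items()}
# Score 2: log of risk per division.
ratio_score = {t: math.log10(r) - math.log10(n) for t, (n, r) in tissues.items()}

_, product_high = two_group_split(product_score)
_, ratio_high = two_group_split(ratio_score)

print("high group, product score:  ", sorted(product_high))
print("high group, log-ratio score:", sorted(ratio_high))
```

Both scores yield two tidy groups, but with these made-up numbers tissue D lands in the high group under the product score and in the low group under the log-ratio score.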


To its credit, The Guardian has published a piece pointing out the correlation misunderstanding.  This piece is also highly unimpressed by the paper, and this review of it has mixed feelings.

Me, I suppose the underlying idea has some truth in it.  But the methodology is the worst I've ever seen in a prominently published paper.

Update: more commentary from Understanding Uncertainty and StatsGuy 

Update: Bradley J Fikes, author of this piece in the San Diego Union-Tribune, complains in comments here that the title of this post is ill-chosen.  He points out that he didn't write his story simply from the press release, but checked it with Johns Hopkins before it was published.  He's got a point about the title: more than half of what I say here is criticism of the paper, not of the press release.