Monday, December 1, 2014

Planck at Ferrara

There is a conference starting today in Ferrara on the final results from Planck.

Though actually these won't be the final results from Planck, since although all scientists in the Planck team have been scrambling like mad to prepare for this date, they haven't been able to get all their results ready for presentation yet. So the actual release of most of the data and the scientific papers is scheduled for later this month. December 22nd, in fact — for European scientists, almost the last working day of the year (Americans tend to have some conferences between Christmas and New Year) — so at least we will technically have the results in 2014.

Except even that isn't really it, because the actual Planck likelihood code will only be released in January 2015. Or at least, I'm pretty sure that's what the Planck website used to say: now it doesn't mention the likelihood code by name, referring instead to "a few of the derived products."

If you're confused, well, so am I. The likelihood code is one of the most important Planck products for anyone planning to actually use Planck data for their own research: doing so properly normally means re-running fits to the data for your favourite model, which means you need the likelihood code. (Of course, some people do take the short cut of simply quoting Planck constraints on parameters derived in other contexts, and this is not always wrong.) This means that having the final, correct version of the likelihood code is rather important even for Planck scientists themselves, if they are to be completely confident in the results they are presenting. So it would make more sense to me if the likelihood code were released at the same time as the rest of the data. Perhaps that is what is actually going to happen; I suppose we'll find out soon.

Incidentally, my information is that the "final, correct" version of the likelihood code was distributed for internal use within the Planck collaboration about 4 weeks ago. Considering that it is only after this happens that proper model comparison projects can begin, that obtaining parameter constraints for each model can take a surprisingly large amount of computing time, that the various Planck teams responsible for this step had scores of different models to investigate, that the "final, correct" version may well have undergone a subsequent revision, and that the process of drafting each paper at the end of the analysis must itself take a couple of weeks minimum ... I suppose I'm not very surprised that the date for data release has been pushed back.

There's some uncertainty about whether the videos from the conference will be made available, as a statement on the website saying this would happen has been removed. For those interested, here is a YouTube channel purporting to provide video from the conference, but disappointingly it doesn't appear to actually work.

Wednesday, November 26, 2014

Quasar structures: a postscript

A few days ago I discussed the purported 'spooky' alignment of quasar spins and the cosmological principle here. So as to focus better on the main point, I left a few technical comments out of that discussion which I want to mention here. These don't have any direct bearing on the main argument made in that post — rather they are interesting asides for a more expert audience.

Quasars can't prove the Universe is homogeneous


Readers of the original post might have noticed that I was quite careful to always state that the distribution of quasars was statistically homogeneous, but not that the quasars showed the Universe was homogeneous. The reason for this lies in the properties of the quasar sample itself.

There are two main ways of constructing a sample of galaxies or quasars to use for further analysis, such as testing homogeneity. The first is that you simply include every object seen by the survey instruments within a certain patch of sky that lies between two redshifts of interest. But these objects will vary in their intrinsic brightness, and the survey instruments have a limited sensitivity, so can only record dim objects when they are relatively close to us. Intrinsically bright objects are rarer, but at large distances they are the only ones we are able to see. So this strategy results in a sample with very many, but largely dim, galaxies or quasars relatively close to us, and fewer but brighter objects far away. This is known as a flux-limited sample.

The other strategy is to correct the measured brightness of each object for the distance from us, to determine its 'intrinsic' brightness (otherwise known as its absolute magnitude), and then select a sample of only those objects which have similar absolute magnitudes. The magnitude range is chosen in accordance with the range of distances such that within the volume of the Universe surveyed, we can be confident we have seen every object of that magnitude that exists. This is called a volume-limited sample.
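To make the distinction concrete, here is a minimal Python sketch of the two selections applied to a toy catalogue. Everything in it is an illustrative assumption rather than anything taken from a real survey: Euclidean distances with no redshift or K-corrections, a Gaussian spread of absolute magnitudes standing in for a real luminosity function, and made-up values for the flux limit (`M_LIM`) and the survey depth (`D_MAX`).

```python
import math
import random

M_LIM = 19.0     # hypothetical faintest apparent magnitude the survey records
D_MAX = 1000.0   # hypothetical survey depth in Mpc

def apparent_magnitude(abs_mag, distance_mpc):
    """Distance modulus in static Euclidean space (a deliberate simplification)."""
    return abs_mag + 5.0 * math.log10(distance_mpc) + 25.0

rng = random.Random(1)
# Toy homogeneous universe: distances uniform in volume, Gaussian absolute magnitudes.
catalogue = [(D_MAX * (1.0 - rng.random()) ** (1.0 / 3.0), rng.gauss(-20.0, 1.5))
             for _ in range(20000)]

# Flux-limited sample: everything bright enough in apparent magnitude.
flux_limited = [(d, M) for d, M in catalogue
                if apparent_magnitude(M, d) <= M_LIM]

# Volume-limited sample: only objects that would still be visible at D_MAX.
ABS_MAG_CUT = M_LIM - 5.0 * math.log10(D_MAX) - 25.0
volume_limited = [(d, M) for d, M in flux_limited if M <= ABS_MAG_CUT]

def outer_half_fraction(sample):
    """Fraction of objects in the outer half of the survey volume (0.5 if homogeneous)."""
    d_half = D_MAX * 0.5 ** (1.0 / 3.0)
    return sum(1 for d, _ in sample if d > d_half) / len(sample)

print(f"flux-limited:   {outer_half_fraction(flux_limited):.2f}")
print(f"volume-limited: {outer_half_fraction(volume_limited):.2f}")
```

In this toy setup the underlying distribution is homogeneous by construction, so the volume-limited sample should put about half its objects in the outer half of the volume, while the flux-limited sample is visibly depleted at large distances.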

Testing the homogeneity of the Universe requires a volume-limited survey of objects. For a flux-limited sample the distribution in redshift (i.e., in the line-of-sight direction) would not be expected to be uniform in the first place: the number density of objects would ordinarily decrease sharply with redshift. But looking out away from Earth also involves looking back in time; so if the redshift range of the survey is large, the farthest objects are seen as they were at an earlier time than the closest ones. If the objects in question had evolved significantly in that time, near and far objects could represent significantly different populations even in a volume-limited sample, and once again we wouldn't expect to see homogeneity along the line of sight, even if the Universe were homogeneous.

So to really test the cosmological principle without having to assume homogeneity at the outset,1 we really need a volume-limited sample of galaxies that cover a very large volume of the Universe but span a relatively narrow range of redshifts. Such surveys are hard to come by. For example, the study confirming homogeneity in WiggleZ galaxies (see here and here) actually used a flux-limited sample, so required additional assumptions. In this case one doesn't obtain a proof, rather a check of the self-consistency of those assumptions — which people may regard as good enough, depending on taste.

Anyway, the key point is that the DR7QSO quasar sample everyone uses is most definitely flux-limited and not volume-limited (I was myself reminded of this point by Francesco Sylos Labini). Despite this, the redshift distribution of quasars is remarkably uniform (between redshifts 1 and 1.8). So what's going on? Well, unlike certain types of galaxies that live much closer to home, distant quasar populations are expected to evolve rather quickly with time. And the age difference between objects at redshifts 1 and 1.8 is more than 2 billion years!
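As a rough check of that age difference, here is a small Python sketch using the standard closed-form age of a flat universe containing only matter and a cosmological constant. The parameter values (`H0 = 70`, `OMEGA_M = 0.3`) are my own assumptions for illustration, and radiation is neglected.

```python
import math

H0 = 70.0        # Hubble constant in km/s/Mpc (assumed value)
OMEGA_M = 0.3    # matter density (assumed); flat, so Omega_Lambda = 1 - OMEGA_M
HUBBLE_TIME_GYR = 977.8 / H0   # 1/H0 converted to Gyr

def age_at_redshift(z):
    """Age of a flat LCDM universe at redshift z, in Gyr (radiation neglected)."""
    omega_l = 1.0 - OMEGA_M
    x = math.sqrt(omega_l / OMEGA_M) * (1.0 + z) ** -1.5
    return HUBBLE_TIME_GYR * 2.0 / (3.0 * math.sqrt(omega_l)) * math.asinh(x)

age_gap = age_at_redshift(1.0) - age_at_redshift(1.8)
print(f"age difference between z=1 and z=1.8: {age_gap:.1f} Gyr")
```

For these parameter choices the gap comes out at a little over 2 Gyr, and it does not change much for other reasonable values of the parameters.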

It would appear that this effect and the flux-limited nature of the survey coincidentally roughly cancel each other out for the sample in question. A volume-limited subset of these quasars would be (is) highly inhomogeneous — but then because of the time evolution the homogeneity or otherwise of any sample of quasars says nothing much about the homogeneity or otherwise of the Universe in general.

Luckily this is only incidental to the main argument. The fact that the distribution of these (flux-limited) quasars is statistically homogeneous on scales of 100-odd Megaparsecs despite claims for the existence of Gigaparsec-scale 'structures' simply demonstrates the point that the existence of single structures of any kind doesn't have any bearing on the question of overall homogeneity. Which is the main point.

Homogeneity is sample-dependent 


Of course the argument above cuts both ways.

Let's imagine that a study has shown that the distribution of a particular type of galaxy — call them luminous red galaxies — approaches homogeneity above a certain distance scale, say 100 Megaparsecs. Such a study was done by David Hogg and others in 2005. From this we may reasonably conclude (though not, strictly speaking, prove) that the matter distribution in the Universe is homogeneous above at most 100 Mpc. But we are not allowed to conclude that the distribution of some other sample of objects — radio galaxies, quasars, blue galaxies etc. — approaches homogeneity above the same scale, or indeed at all!

Even in a Universe with a homogeneous matter distribution, the scale above which a volume-limited sample of galaxies whose properties are constant with time approaches homogeneity depends on the galaxy bias. This number depends on the type of galaxies in question, and so too to a lesser extent will the expected homogeneity scale. Of course if the sample is not volume-limited, or does evolve with time, all bets are off anyway.

More generally, for each sample of galaxies that we wish to use for higher order statistical measurements, the statistical homogeneity of that particular sample must in general be demonstrated first. This is because higher order statistical quantities, such as the correlation function, are conventionally normalized in units of the sample mean, but in the absence of statistical homogeneity this becomes meaningless.

There was a time when the homogeneity of the Universe was less well accepted than it is today, and the possibility of a fractal distribution of matter was still an open question. At that time demonstrating the approach to homogeneity on large scales in a well-chosen sample of galaxies was worth a publication (even a well-cited publication) in itself. This is probably no longer the case, but it remains a necessary sanity check to perform for each galaxy survey.

1Properly speaking, even the creation of a volume-limited sample requires an assumption of homogeneity at the outset, since the determination of absolute magnitudes requires a cosmological model, and the cosmological model used will assume homogeneity. In this sense all "tests" of homogeneity are really consistency checks of our assumption thereof.

Sunday, November 23, 2014

A 'spooky alignment' of quasars, or just hype?

In the news this week we've had a story on the alignment of quasar spins with large-scale structure, based on this paper by Hutsemekers et al. The paper was accompanied by this press release from the European Southern Observatory, which was then reproduced in various forms in a number of blogs and news outlets, almost all of which stress the 'spooky' or 'mysterious' nature of the claimed alignment 'over billions of light years'.

At least one of these blogs (the one at The Daily Galaxy) explicitly claims that the alignment of these quasar spins is a challenge for the cosmological principle, which is the assumption of large-scale statistical homogeneity and isotropy of the Universe, on which all of modern cosmology is based. This claim is not contained in the press release, but originates from a statement in the paper itself, where the authors say
The existence of correlations in quasar axes over such extreme scales would constitute a serious anomaly for the cosmological principle.
I'm afraid that this claim is completely unsupported by any of the actual results contained within the paper, and is therefore one of those annoying examples of scientific hype. In this post I will try to explain why.

I have actually covered much of this ground before — in a blog post here, but more importantly in a paper published in Monthly Notices last year — and I must admit I am a little surprised at having to repeat these points (especially since my paper is cited by Hutsemekers et al.). Nevertheless, in what follows I shall try not to sound too grumpy.

The immediate story started with a paper by Roger Clowes and collaborators, who claimed to have detected the 'largest structure' in the Universe (dubbed the 'Huge-LQG') in the distribution of quasars, and also claimed that this structure violated the cosmological principle. My paper last year was a response to this, and made the following points:

  1. the detection of a single large structure has essentially no relevance to the question of whether the Universe is statistically homogeneous and isotropic;
  2. the quasar sample within which the Huge-LQG was identified is statistically homogeneous, and approaches homogeneity at the scale we expect theoretically, thus providing an explicit demonstration of point 1;
  3. the definition of 'structure' by which the Huge-LQG counts as a structure is so loose that by using it we would find equally vast 'structures' even in completely random distributions of points which (by construction!) contain no correlations and therefore no structure whatsoever; and 
  4. therefore the classification of the Huge-LQG set of quasars as a 'structure' is essentially empty of meaning.


Quasar structures don't violate homogeneity


Since I am already repeating myself, let me elaborate a little more on points 1 and 2. Our Universe is not exactly homogeneous. The fact that you exist (more generally, the fact that stars, galaxies and clusters of galaxies exist) is sufficient proof of this, so it would be a very poor advertisement for cosmology indeed if it were all founded on the assumption of exact homogeneity. Luckily it isn't. In fact our theories could be said to predict the existence of structure in the potential $\Phi$ on all scales (that's what a scale-invariant power spectrum from inflation means!), and even the galaxy-galaxy correlation function only goes asymptotically to zero at large scales.

Instead we have the assumption of statistical homogeneity and isotropy, which means that we assume that when looked at on large enough scales, different regions of the Universe are on average the same. Clearly, since this is a statement about averages, it can only be tested statistically by looking at large numbers of different regions, not by finding one particular example of a 'structure'. In fact there is a well-established procedure for checking the statistical homogeneity of the distribution of a set of points (the positions of galaxies or quasars, in this case), which involves measuring its fractal dimension and checking the scale above which this is equal to 3. I've described the procedure before, here and here, and Peter Coles describes a bit of the history of it here.
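For readers who like to see the idea in code, here is a toy Python sketch of the counts-in-spheres estimate of the fractal (correlation) dimension, applied to uniformly random points in a unit box. It is only a sketch under simplifying assumptions: the point set, the two radii and the crude edge treatment (discarding centres near the boundary) are all my own illustrative choices, and a real analysis has to deal with survey geometry and selection effects much more carefully.

```python
import math
import random

rng = random.Random(0)
N_POINTS = 1500
points = [(rng.random(), rng.random(), rng.random()) for _ in range(N_POINTS)]

def mean_count_in_spheres(points, radius, margin):
    """Average number of neighbours within `radius` of each centre,
    using only centres at least `margin` away from the box edges."""
    centres = [p for p in points if all(margin <= c <= 1.0 - margin for c in p)]
    total = 0
    for c in centres:
        total += sum(1 for p in points if p is not c and math.dist(p, c) < radius)
    return total / len(centres)

radii = [0.1, 0.2]
counts = [mean_count_in_spheres(points, r, margin=max(radii)) for r in radii]

# N(<r) scales as r**D2, so D2 is the logarithmic slope between the two radii.
d2 = math.log(counts[1] / counts[0]) / math.log(radii[1] / radii[0])

# Scaled counts: measured counts divided by the expectation for a uniform distribution.
scaled = [n / (N_POINTS * 4.0 / 3.0 * math.pi * r ** 3) for n, r in zip(counts, radii)]

print(f"D2 = {d2:.2f}, scaled counts = {[round(s, 2) for s in scaled]}")
```

For an unclustered point set like this one the slope should sit close to 3 and the scaled counts close to 1 at every radius; for a clustered sample they would only approach those values above the homogeneity scale.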

The bottom line is that, as I showed last year, the quasar distribution in question is statistically homogeneous above scales of at most $\sim130h^{-1}$Mpc. There is therefore no 'structure' you can find in this data which could violate the cosmological principle. End of story.

Scaled number counts in spheres as a measure of the fractal dimension of the quasar distribution. On scales where this number approaches 1, the distribution is statistically homogeneous. From arXiv:1306.1700.


Structures and probability


Of course, there are many different ways of being statistically homogeneous. It is perfectly possible that within a statistically homogeneous distribution one could find a particular structure or feature whose existence in our specific cosmological model (which is one of many possible models satisfying the cosmological principle) is either very unlikely or impossible. This would then be a problem for that cosmological model despite not having any wider implications for the cosmological principle. But to prove this requires some serious analysis, which should include a proper treatment of probabilities — you can't just say "this structure is big, so it must be anomalous."

In particular, any serious analysis of probabilities must take into account how a 'structure' is defined. Given infinitely many possible choices of definition, and a very large Universe in which to search, the probability of finding some 'structure' that extends over billions of light years is practically unity. In fact the definition used for the Huge-LQG would be likely to throw up equally vast 'structures' even if quasar positions were not at all correlated with each other (and we know they must be at least somewhat correlated, because of gravity). So it really isn't a very useful definition at all.
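Here is a toy Python illustration of that point. I am assuming a friends-of-friends style definition, in which any two points closer than some linking length belong to the same 'structure'; the number of points, the random seed and the choice of linking length (0.8 times the mean separation) are purely illustrative and are not the values used for the Huge-LQG.

```python
import math
import random

def friends_of_friends(points, linking_length):
    """Single-linkage grouping: points closer than the linking length are joined."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < linking_length:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(points[i])
    return list(groups.values())

rng = random.Random(3)
N_POINTS = 1000
# Completely uncorrelated positions in a unit box.
points = [(rng.random(), rng.random(), rng.random()) for _ in range(N_POINTS)]

mean_separation = N_POINTS ** (-1.0 / 3.0)
linking_length = 0.8 * mean_separation      # illustrative choice

groups = friends_of_friends(points, linking_length)
largest = max(groups, key=len)
extent = max(math.dist(p, q) for p in largest for q in largest)

print(len(largest), "members, extent of",
      round(extent / mean_separation, 1), "mean separations")
```

The points here contain no correlations at all, yet the algorithm happily returns multi-member 'structures' stretching over several mean separations, and the largest one grows rapidly as the linking length is increased.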


'Spooky' alignments


This brings us to the current paper by Hutsemekers et al. The starting assumption of this paper is that the Huge-LQG is a real structure which is somehow distinguished from its surroundings. This assumption is manifest in the decision that the authors make to try to measure the polarization of light from only those quasars that are classified as part of the Huge-LQG rather than a more general sample of quasars. This classic case of circular reasoning is the first flaw in the logic, but let's put it to one side for a minute.

The press release then tells us that the scientists
found that the rotation axes of the central supermassive black holes in a sample of quasars are parallel to each other over distances of billions of light years
and that the spins of the central black holes are aligned along the filaments of large-scale structure in which they reside.

I find this statement extremely problematic. Here is a figure from the paper in question, showing the sky positions of the 93 quasars in question, along with the polarization orientations for the 19 which are used in the actual analysis:

Quasar positions (black dots) and polarization alignments (red lines). From arXiv:1409.6098.

Do you see the alignment? No, me neither. In fact, looking at the distribution of angles in panel b, I would say that looks very much like a sample drawn from a perfectly uniform distribution.

So what is the claim actually based on? Well, for a start one has to split up the (arbitrarily defined) 'structure' into several (even more arbitrarily defined) 'sub-structures'. Each of these sub-structures then defines a different reference angle on the sky:

Chopping the data to suit the argument (Figure 4 of arXiv:1409.6098). On what basis are sub-structures 1 and 2 defined as separate from each other?

And now one has to measure the angles between the quasar polarization direction and the reference direction of the particular sub-structure, and the direction perpendicular to the reference direction, and choose the smaller of the two. In other words, rather than prove that quasars are aligned parallel to each other over distances extending over billions of light years (the claim in the press release), what Hutsemekers et al. are actually doing is attempting to show that given arbitrary choices of some smaller sub-structures and reference directions, quasars in different sub-structures are typically aligned either parallel to or perpendicular to this direction. This is a much less exacting standard.
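A short Python sketch shows how much this folding loosens the test. It assumes polarization angles drawn uniformly at random (so no alignment whatsoever), and a hypothetical tolerance of 20 degrees for calling something 'aligned'; neither number comes from the paper.

```python
import random

def acute_angle(a_deg, b_deg):
    """Smallest angle between two undirected axes, between 0 and 90 degrees."""
    diff = abs(a_deg - b_deg) % 180.0
    return min(diff, 180.0 - diff)

def folded_angle(pol_deg, ref_deg):
    """Smaller of the angles to the reference axis and to its perpendicular,
    which by construction lies between 0 and 45 degrees."""
    to_parallel = acute_angle(pol_deg, ref_deg)
    to_perpendicular = acute_angle(pol_deg, ref_deg + 90.0)
    return min(to_parallel, to_perpendicular)

rng = random.Random(7)
TOLERANCE = 20.0   # hypothetical definition of 'aligned', in degrees
angles = [rng.uniform(0.0, 180.0) for _ in range(20000)]

frac_parallel = sum(acute_angle(a, 0.0) <= TOLERANCE for a in angles) / len(angles)
frac_either = sum(folded_angle(a, 0.0) <= TOLERANCE for a in angles) / len(angles)

print(f"within tolerance of parallel only:        {frac_parallel:.2f}")
print(f"within tolerance of parallel or perpendicular: {frac_either:.2f}")
```

With purely random orientations, about 2/9 of the angles fall within 20 degrees of a given direction, but about 4/9 fall within 20 degrees of either that direction or its perpendicular, so the folded test is twice as easy to pass by chance.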

Even this claim is not particularly well supported by the evidence. That is, looking at the distribution of angles, I am really not at all convinced that this shows evidence for a bimodal distribution with peaks at 0 and 90 degrees:

Distribution of angles purportedly showing two distinct peaks at 0 and 90 degrees. Figure 5 of arXiv:1409.6098.

So in summary I think the statistical evidence of alignment of quasar spins is already pretty weak. I don't see any analysis in the paper dealing with the effects of a different arbitrary choice of sub-structures, nor do I see any error analysis (the error in measuring the polarization direction of a quasar can be as large as 10 degrees!). And I haven't even dealt with the fact that the polarization data is used for only 19 quasars out of the full 93 — in other words, for the majority of quasars in the sample the central black hole spins are aligned along some other, undetermined, direction such that we can't measure the polarization.


Extraordinary claims require extraordinary evidence


Now, it's worth repeating that we've already seen that in fact the space distribution of quasars is statistically homogeneous in accordance with the cosmological principle. That simple test has been done, the cosmological principle survives. So if you've got some more nuanced claim of an anomaly, I think the onus is on you not only to describe the measurement you made, but also say what exactly is anomalous about it. What is the theoretical prediction we should compare it to? Which model is being rejected (or otherwise) by the new data?

So, for instance, if quasar spins in sub-structures are indeed aligned either parallel or perpendicular to each other (and I still remain to be convinced that they are), is this really something 'spooky', or would we expect some degree of alignment in the standard $\Lambda$CDM model?

Such an analysis has not been presented, but even if it had, it's worth bearing in mind the principle that extraordinary claims require extraordinary evidence. I'm afraid throwing out a p-value of about 1% simply doesn't cut it. Not only is that actually not an enormously impressive number (especially given all the other things I mentioned above), but such a frequentist statistic also doesn't take account of all our prior knowledge.

Other people have banged this drum at length before, but the point is easily summarized: the p-value tells us the probability of getting data at least this extreme given the model, but doesn't tell us the probability of the model being correct despite the new data appearing to contradict it. This is the question we really wish to answer. To do this requires a Bayesian analysis, in which one must account for the prior belief in the model, which is the result of confidence built up from all other experimental results that agree with it. We have an incredible amount of observational evidence in favour of our current model, evidence that would probably not be consistent with a model in which gigantic structures could exist (I say 'probably' because no such model actually exists at present).
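To see how the prior enters, here is a minimal Python sketch of Bayes' theorem for a comparison between our model and a single alternative. The numbers are entirely hypothetical: a prior of 0.999 for the standard model and a Bayes factor of 100 in favour of the alternative are chosen only for illustration, and note that a p-value of 1% is not the same thing as a Bayes factor of 100.

```python
def posterior_probability(prior, bayes_factor_against):
    """Posterior probability of a model, given its prior probability and the
    Bayes factor P(data | alternative) / P(data | model) for one alternative."""
    return prior / (prior + (1.0 - prior) * bayes_factor_against)

# Hypothetical numbers: strong prior for the model, data favouring the alternative 100:1.
posterior = posterior_probability(prior=0.999, bayes_factor_against=100.0)
print(f"posterior probability of the model: {posterior:.3f}")
```

Even with the data favouring the alternative by a hundred to one in this made-up example, the model still ends up with a posterior probability of about 0.91, because the prior weight against the alternative was so large to begin with.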

So my prior in favour of $\Lambda$CDM is pretty high — 19 quasars and an analysis so full of holes are not going to change that so quickly.

Monday, September 22, 2014

Biting the dust

Sorry about the obvious pun in the title. Today's important announcement is of course the long-awaited Planck verdict on the level at which the BICEP2 "discovery" of primordial gravitational waves had been contaminated by foreground dust. That verdict does not look good for BICEP.

(Incidentally, back in July I reported a Planck source as saying this paper would be ready in "two or three weeks". Clearly that was far too optimistic. But interestingly many members of the Planck team themselves were confidently expecting today's paper to appear about 10 days ago, and the rumour is that the current version has been "toned down" a little, perhaps accounting for some of the additional delay. Despite that it's still pretty devastating.)

Let me attempt to summarize the new results. Some important points are made right in the abstract, where we read:
"... even in the faintest dust-emitting regions there are no "clean" windows in the sky where primordial CMB B-mode polarization measurements could be made without subtraction of foreground emission"
and that
"This level [of the dust power in the BICEP2 window, over the multipole range of the primordial recombination bump] is the same magnitude as reported by BICEP2 ..."
(my emphasis). Although
"the present uncertainties are large and will be reduced through an ongoing, joint analysis of the Planck and BICEP2 data sets,"
from where I am looking unfortunately it now does not look as if there is a realistic chance that what BICEP2 reported was anything more than a very precise measurement of dust.

The Planck paper is pretty thorough, and actually quite interesting in its own right. They make use of the fact that Planck observes the sky at many frequencies to study the properties of dust-induced polarization. Whereas BICEP2 was limited to a single frequency channel at 150 GHz, the Planck HFI instrument has 4 polarization-sensitive frequency channels, of which the most useful here is at 353 GHz. Previous Planck results have already shown that dust emission behaves sort of like a (modified) blackbody spectrum at a temperature of 19.6 Kelvin. Since this is a significantly higher temperature than the CMB temperature of 2.73 K, dust emission dominates at higher frequencies, which means that the 353 GHz channel essentially sees only dust and nothing else. Which makes it perfect for the task at hand, since in this particular situation roles are reversed and it is the dust that is the signal and the primordial CMB is noise!
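A back-of-the-envelope Python sketch shows why the higher frequency is so much more dust-dominated. It assumes the dust follows a modified blackbody, $\nu^{\beta} B_\nu(T)$, with the 19.6 K temperature quoted above and an emissivity index `BETA_DUST = 1.6` that is my own assumed round number, and it compares this with the frequency dependence of a small CMB temperature fluctuation. Overall normalizations are dropped, so only the ratios between frequencies mean anything.

```python
import math

H_OVER_K = 4.799e-11   # Planck constant over Boltzmann constant, in K per Hz
T_DUST = 19.6          # K, from the text
T_CMB = 2.73           # K, from the text
BETA_DUST = 1.6        # assumed emissivity index of the modified blackbody

def dust_intensity(nu_hz):
    """Modified blackbody nu**beta * B_nu(T_DUST), up to a constant factor."""
    x = H_OVER_K * nu_hz / T_DUST
    return nu_hz ** (3.0 + BETA_DUST) / math.expm1(x)

def cmb_anisotropy(nu_hz):
    """dB_nu/dT at T_CMB, up to a constant factor: the intensity response
    to a fixed small fluctuation in the CMB temperature."""
    x = H_OVER_K * nu_hz / T_CMB
    return nu_hz ** 4 * math.exp(x) / math.expm1(x) ** 2

dust_ratio = dust_intensity(353e9) / dust_intensity(150e9)
cmb_ratio = cmb_anisotropy(353e9) / cmb_anisotropy(150e9)
dust_boost = dust_ratio / cmb_ratio

print(f"dust 353/150: {dust_ratio:.1f}, CMB 353/150: {cmb_ratio:.2f}, "
      f"relative boost: {dust_boost:.0f}")
```

Under these assumptions the dust signal grows by more than an order of magnitude between 150 and 353 GHz while the CMB fluctuation signal slightly shrinks, so the dust-to-CMB contrast in intensity is of order twenty times larger at 353 GHz (and larger still in power, which goes as the square).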

The analysis proceeds in a number of steps. First, they study the power spectra of the two polarization modes (EE and BB) in several different large regions in the sky:

The different large sky regions studied are shown as increments of red, orange, yellow, green and two different shades of blue. The darkest blue region is always excluded. Figure from arXiv:1409.5738.
In all these different regions, both power spectra $C_\ell^{EE}$ and $C_\ell^{BB}$ are proportional to $\ell^{\alpha}$, consistent with a value of $\alpha=-2.42\pm0.02$. Fixing $\alpha$ to this value, the amplitude of the power spectra in the different large regions then shows a characteristic dependence on the mean intensity of the dust emission — i.e. regions with more dust overall also show more polarization power — and this purely empirical relationship is characterized by
$$A^{EE,BB}\propto\langle I_{353}\rangle^{1.9},$$
though with a bit of uncertainty in the fit. The amplitudes of the polarization power spectra then also show a dependence on frequency from 353 GHz down to 100 GHz which matches previous Planck results (the dependence is something close to a blackbody spectrum at 19.6 K, but with a specific modification).
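To get a feel for these two empirical scalings, here is a tiny Python sketch. The indices are the ones quoted above; the reference multipole, reference intensity and reference amplitude (`ell_ref`, `intensity_ref`, `amp_ref`) are arbitrary normalizations of my own and not Planck values, so only ratios are meaningful.

```python
ALPHA = -2.42           # power-law index of the dust power spectra, from the text
INTENSITY_INDEX = 1.9   # scaling of the amplitude with mean 353 GHz intensity, from the text

def dust_power(ell, mean_intensity, ell_ref=80.0, intensity_ref=1.0, amp_ref=1.0):
    """Dust polarization power relative to an arbitrary reference region and multipole."""
    amplitude = amp_ref * (mean_intensity / intensity_ref) ** INTENSITY_INDEX
    return amplitude * (ell / ell_ref) ** ALPHA

# Effect of doubling the mean dust intensity at fixed multipole.
doubling_dust = dust_power(80.0, 2.0) / dust_power(80.0, 1.0)
# Effect of doubling the multipole at fixed dust intensity.
doubling_ell = dust_power(160.0, 1.0) / dust_power(80.0, 1.0)

print(f"twice the dust intensity: power x {doubling_dust:.2f}")
print(f"twice the multipole:      power x {doubling_ell:.3f}")
```

So a region with twice the mean dust intensity has nearly four times the polarization power, and the power falls by a factor of about five for every doubling of $\ell$.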

It then turns out that if the sky is split into very many much smaller regions close to the poles rather than the 6 large ones above, the same results continue to hold on average, though obviously there is some scatter introduced by the fact that dust in different bits of the sky behaves differently. So this allows the Planck team to take the measured dust intensity in any one of these smaller regions and extrapolate down to see what the contribution to the BB power would be if measured at the BICEP2 frequency of 150 GHz. The result looks like this:

The level of dust contamination across the sky in measurements of the primordial B-mode signal. Blue is good, red is bad. The BICEP2 window is the black outline on the right.
This really sucks for BICEP2, who chose their particular patch of the sky precisely because, according to estimates of the 1990s and early 2000s, it was supposed to have very little dust. Planck is now saying that isn't true, and that there is a better region just a little further south. Even that better region isn't perfect, of course, but it may be clean enough to see a primordial GW signal of $r\sim 0.1$ to $0.2$ — if such a signal exists, and if we're lucky and/or figure out cleverer ways of subtracting the dust foreground.

The problem with the BICEP2 region is that Planck's estimate of the dust contribution there looks like this:

Planck's estimate of the dust contribution to the BB power spectrum at 150 GHz and in the BICEP2 sky window. The first bin is the one that's most relevant. The black line is the contribution primordial GW with $r=0.2$ would make, if they existed.
So it appears that in the BICEP2 window, in the $\ell$ region where primordial gravitational waves produce a measurable BB signal (and BICEP2 has measured something), dust is expected to produce the same amplitude of signal as would primordial gravitational waves with $r=0.2$. In fact, even accounting for the uncertainties in the Planck analysis (the extent of the pink error bars on the plot) it is clear that (a) dust will be contributing significantly to the BICEP2 measurement, and (b) it's pretty likely that only dust is contributing.

Planck avoid explicitly saying that BICEP2 haven't seen anything but dust. This is because they haven't directly measured the dust contribution in that window and at 150 GHz. Rather what's shown in the plot above is based on a number of little steps in the chain of inference:
  1. generally, the BB polarization amplitude is dependent on the average total dust intensity in a region;
  2. the relationship between these two doesn't vary too much across the sky;
  3. generally, the frequency dependence of the amplitude shows a certain behaviour;
  4. and again this doesn't appear to vary too much across the sky;
  5. Planck have measured the average dust intensity in the BICEP2 window, and this gives the value shown in the plot above when extrapolated to 150 GHz;
  6. and the BICEP2 window doesn't appear to be a special outlier region on the sky that would wildly deviate from these average relationships;
  7. so, the dust amplitude calculated is probably correct.
Update: See the correction in the comments — the Planck paper actually does better than this. That is to say, they present one analysis that relies on all steps 1-7, but in addition they also measure the BB amplitude directly at 353 GHz and extrapolate that down to 150 GHz relying only on steps 3 and 4. The headline result is the one based on the second method, which actually gets a lower number for the dust amplitude. 

So they leave open the small possibility that despite having been unlucky in the original choice of the BICEP2 window, we've somehow ultimately got very lucky indeed and nevertheless measured a true primordial gravitational wave signal. 

Time will tell if this is true ... but the sensible betting has now got to be that it is not.

Incidentally, I have just learned that in two days' time I will be presenting a 30 minute lecture to a group of graduate students about this result. The lecture is not supposed to be very detailed, but I'm also not very much of an expert on this. So if you spot any errors or omissions above, please do let me know through the comments box!

Monday, August 25, 2014

A Supervoid cannot explain the Cold Spot

In my last post, I mentioned the claim that the Cold Spot in the cosmic microwave background is caused by a very large void — a "supervoid" — lying between us and the last scattering surface, distorting our vision of the CMB, and I promised to say a bit more about it soon. Well, my colleagues (Mikko, Shaun and Syksy) and I have just written a paper about this idea which came out on the arXiv last week, and in this post I'll try to describe the main ideas in it.

First, a little bit of background. When we look at sky maps of the CMB such as those produced by WMAP or Planck, obviously they're littered with very many hot and cold spots on angular scales of about one degree, and a few larger apparent "structures" that are discernible to the naked eye or human imagination. However, as I've blogged about before, the human imagination is an extremely poor guide to deciding whether a particular feature we see on the sky is real, or important: for instance, Stephen Hawking's initials are quite easy to see in the WMAP CMB maps, but this doesn't mean that Stephen Hawking secretly created the universe.

So to discover whether any particular unusual features are actually significant or not we need a well-defined statistical procedure for evaluating them. The statistical procedure used to find the Cold Spot involved filtering the CMB map with a special wavelet (a spherical Mexican hat wavelet, or SMHW), of a particular width (in this case $6^\circ$), and identifying the pixel direction with the coldest filtered temperature with the direction of the Cold Spot. Because of the nature of the wavelet used, this ensures that the Cold Spot is actually a reasonably sizable spot on the sky, as you can see in the image below:

The Cold Spot in the CMB sky. Image credit: WMAP/NASA.

Well, so we've found a cold spot. To elevate it to the status of "Cold Spot" in capitals and worry about how to explain it, we first need to quantify how unusual it is. Obviously it is unusual compared to other spots on our observed CMB, but this is true by construction and not very informative. Instead the usual procedure quite rightly compares the properties of the cold spots found in random Gaussian maps using exactly the same SMHW technique to the properties of the Cold Spot in our CMB. It is this procedure which results in the conclusion that our Cold Spot is statistically significant at roughly the "3-sigma level", i.e. only about 1 in every 1000 random maps has a coldest spot that is as "cold" as* our Cold Spot.** (The reason why I'm putting scare quotes around everything should become clear soon!)
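The logic of that comparison can be sketched in a few lines of Python. This is only a one-dimensional toy under my own assumptions: white-noise 'maps' instead of CMB skies with the right power spectrum, a flat Mexican hat kernel instead of the SMHW on the sphere, and a stand-in 'observed' map that is itself just another random draw. The map size, filter scale and number of simulations are arbitrary.

```python
import math
import random

def mexican_hat_kernel(scale, half_width):
    """1D Mexican hat (second derivative of a Gaussian), shifted to have zero mean."""
    xs = range(-half_width, half_width + 1)
    k = [(1.0 - (x / scale) ** 2) * math.exp(-0.5 * (x / scale) ** 2) for x in xs]
    mean = sum(k) / len(k)
    return [v - mean for v in k]

def coldest_filtered_value(sky, kernel):
    """Minimum of the filtered map, using periodic boundaries."""
    n, h = len(sky), len(kernel) // 2
    best = float("inf")
    for i in range(n):
        val = sum(kernel[j] * sky[(i + j - h) % n] for j in range(len(kernel)))
        best = min(best, val)
    return best

def gaussian_map(n, rng):
    """Toy random Gaussian 'map': independent unit-variance pixels."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

rng = random.Random(42)
N_PIX, N_SIMS = 256, 200
kernel = mexican_hat_kernel(scale=4.0, half_width=16)

observed_map = gaussian_map(N_PIX, rng)          # stand-in for the real sky
observed_min = coldest_filtered_value(observed_map, kernel)

# Apply exactly the same filter-and-find-the-coldest procedure to every simulation.
sim_minima = [coldest_filtered_value(gaussian_map(N_PIX, rng), kernel)
              for _ in range(N_SIMS)]
p_value = sum(m <= observed_min for m in sim_minima) / N_SIMS

print(f"fraction of random maps with a colder coldest spot: {p_value:.3f}")
```

The important design point is that the coldest spot of each random map is found with the same procedure as the real one, so that the look-elsewhere effect of searching the whole map is automatically included in the quoted significance.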

So there appears to be a need to explain the existence of the Cold Spot using additional new physics of some kind. One such idea is that of the supervoid: a giant region hundreds of millions of light years across which is substantially emptier than the rest of the universe and lies between us and the Cold Spot. The emptiness of this region has a gravitational effect on the CMB photons that pass through it on their way to us, making them look colder (this is called the integrated Sachs-Wolfe or ISW effect), hence the Cold Spot.

Now this is a nice idea in principle. In practice, unfortunately, it suffers from a problem: the ISW effect is very weak, so to produce an effect capable of "explaining" the Cold Spot the supervoid would need to be truly super — incredibly large and incredibly empty. And no such void has actually been seen in the distribution of galaxies (a previous claim to have seen it turned out to not be backed up by further analysis).

It was therefore quite exciting when in May a group of astronomers, led by Istvan Szapudi of the Institute for Astronomy in Hawaii, announced that they had found evidence for the existence of a large void in the right part of the sky. Even more excitingly, in a separate theoretical paper, Finelli et al. claimed to have modeled the effect of this void on the CMB and proven that it exactly fit the observations, and that therefore the question had been effectively settled: the Cold Spot was caused by a supervoid.

Except ... things aren't quite that simple. For a start, the void they claimed to have found doesn't actually have a large ISW effect — in terms of central temperature, less than one-seventh what would be needed to explain the Cold Spot. So Finelli et al. relied on a rather curious argument: that the second-order effect (in perturbation theory terms) of this void on CMB photons was somehow much larger than the first-order (i.e. ISW) effect. A puzzling inversion of our understanding of perturbation theory, then!

In fact there were a number of other reasons to be a bit suspicious of the claim, among which were that N-body simulations don't show this kind of unusual effect, and that several other larger and deeper voids have already been found that aren't aligned with Cold Spot-like CMB features. In our paper we provide a fuller list of these reasons to be skeptical before diving into the details of the calculation, where one might get lost in the fog of equations.

At the end of the day we were able to make several substantive points about the Cold Spot-as-a-supervoid hypothesis:
  1. Contrary to the claim by Finelli et al., the void that has been found is neither large enough nor deep enough to leave a large effect on the CMB, either through the ISW effect or its second-order counterpart — in simple terms, it is not a super enough supervoid.
  2. In order to explain the Cold Spot one needs to postulate a supervoid that is so large and so deep that the probability of its existence is essentially zero; if such a supervoid did exist it would be more difficult to explain than the Cold Spot currently is!
  3. The possible ISW effect of any kind of void that could reasonably exist in our universe is already sufficiently accounted for in the analysis using random maps that I described above.
  4. There's actually very little need to postulate a supervoid to explain the central temperature of the Cold Spot — the fact that we chose the coldest spot in our CMB maps already does that!
Point number 1 requires a fair bit of effort and a lot of equations to prove (and coincidentally it was also shown in an independent paper by Jim Zibin that appeared just a day before ours), but in the grand scheme of things it is probably not a supremely interesting one. It's nice to know that our perturbation theory intuition is correct after all, of course, but mistakes happen to the best of us, so the fact that one paper on the arXiv contains a mistake somewhere is not tremendously important.

On the other hand, point 2 is actually a fairly broad and important one. It is a result that cosmologists with a good intuition would perhaps have guessed already, but that we are able to quantify in a useful way: to be able to produce even half the temperature effect actually seen in the Cold Spot would require a hypothetical supervoid almost twice as large and twice as empty as the one seen by Szapudi's team, and the odds of such a void existing in our universe would be something like a one-in-a-million or one-in-a-billion (whereas the Cold Spot itself is at most a one-in-a-thousand anomaly in random CMB maps). A supervoid therefore cannot help to explain the Cold Spot.***

Point 3 is again something that many people probably already knew, but equally many seem to have forgotten or ignored, and something that has not (to my knowledge) been stated explicitly in any paper. My particular favourite though is point 4, which I could — with just a tiny bit of poetic licence — reword as the statement that
"the Cold Spot is not unusually cold; if anything, what's odd about it is only that it is surrounded by a hot ring"
I won't try to explain the second part of that statement here, but the details are in our paper (in particular Figure 7, in case you are interested). Instead what I will do is to justify the first part by reproducing Figure 6 of our paper here:

The averaged temperature anisotropy profile at angle $\theta$ from the centre of the Cold Spot (in red), and the corresponding 1 and $2\sigma$ contours from the coldest spots in 10,000 random CMB maps (blue). Figure from arXiv:1408.4720.

What the blue shaded regions show is the confidence limits on the expected temperature anisotropy $\Delta T$ at angles $\theta$ from the direction of the coldest spots found in random CMB maps using exactly the same SMHW selection procedure. The red line, which is the measured temperature for our actual Cold Spot, never goes outside the $2\sigma$ equivalent confidence region. In particular, at the centre of the Cold Spot the red line is pretty much exactly where we would expect it to be. The Cold Spot is not actually unusually cold.
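The selection effect at work here can be illustrated with a back-of-the-envelope calculation. The sketch below assumes that a filtered map behaves like some number of independent Gaussian patches; that number (`n_patches`) is purely hypothetical, and real filtered maps have correlated pixels, so this shows only the trend and is not the calculation in our paper.

```python
from statistics import NormalDist

def prob_coldest_below(threshold_sigma, n_patches):
    """P(coldest of n independent standard-normal patches <= -threshold_sigma)."""
    p_single = NormalDist().cdf(-threshold_sigma)
    return 1.0 - (1.0 - p_single) ** n_patches

def median_coldest(n_patches):
    """Median value (in sigma) of the coldest of n independent patches."""
    # Solve (1 - p)^n = 1/2 for the single-patch tail probability p.
    p_single = 1.0 - 0.5 ** (1.0 / n_patches)
    return NormalDist().inv_cdf(p_single)
```

For a single patch, a $3\sigma$ cold value is rare; once you pick the coldest of many patches, `median_coldest` shows that a value several sigma below the mean is simply what you should expect to find.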

Just before ending, I thought I'd also mention that Syksy has written about this subject on his own blog (in Finnish only): as I understand it, one of the points he makes is that this form of peer review on the arXiv is actually more efficient than the traditional one that takes place in journals.

Update: You might also want to have a look at Shaun's take on the same topic, which covers the things I left out here ...

* People often compare other properties of the Cold Spot to those in random maps, for instance its kurtosis or other higher-order moments, but for our purposes here the total filtered temperature will suffice.

** Although as Zhang and Huterer pointed out a few years ago, this analysis doesn't account for the particular choice of the SMHW filter or the particular choice of $6^\circ$ width; in other words, it doesn't account for what particle physicists call the "look-elsewhere effect", which means the significance is actually much less impressive.

*** If we'd actually seen a supervoid which had the required properties, we'd have a proximate cause for the Cold Spot, but also a new and even bigger anomaly that required an explanation. But as we haven't, the point is moot.

Monday, July 14, 2014

Short news items

Over the past two months I have been on a two-week seminar tour of the UK, taken a short holiday, attended a conference in Estonia and spent a week visiting collaborators in Spain. Posting on the blog has unfortunately suffered as a result: my apologies. Here are some items of interest that have appeared in the meantime:
  • The BICEP and Planck teams are to share their data — here's the BBC report of this news. The information I have from Planck sources is that Planck will put out a paper with new data very soon (about a week ago I heard it would be "maybe in two weeks", so let's say two or three weeks from today). This new data will then be shared with the BICEP team, and the two teams will work together to analyse its implications for the BICEP result. From the timescales involved my guess is that what Planck will be making available is a measurement of the polarised dust foreground in the BICEP sky region, and the joint publication will involve cross-correlating this map with the B-mode map measured by BICEP. A significant cross-correlation would indicate that most (or all) of the signal BICEP detected was due to dust.
  • What Planck will not be releasing in the next couple of weeks is their own measurement of the polarization of the CMB, in particular their own estimate of the value of $r$. The timetable for this release is still October: this is a deadline imposed by the fact that ESA requires Planck to release the data by December, but another major ESA mission (I forget which) is due to be launched in November and ESA don't like scheduling "competing" press conferences in the same month because there's only so much science news Joe Public can absorb at a time. From what I've heard, getting the full polarization data ready for October is a bit of a rush as it is, so it's fairly certain that's not what they're releasing soon.
  • By the way, I think I've recently understood a little better how a collaboration as enormous as Planck manage to remain so disciplined and avoid leaking rumours: it's because most of the people in the collaboration don't know the full details of the results either! That is to say, the collaboration is split into small sub-groups with specified responsibilities, and these sub-groups don't share results with each other. So if you ask a randomly chosen Planck member what the preliminary polarization results are looking like, chances are they don't know any better than you. (Though this may not stop them from saying "Well, I've seen some very interesting plots ..." and smiling enigmatically!)
  • The conference I attended in Estonia was the IAU symposium in honour of the 100th birth anniversary of the great Ya. B. Zel'dovich, on the general topic of large-scale structure and the cosmic web. I'll try to write a little about my general impressions of the conference next time. In the meantime all the talks are available for download from the website here.
  • A science news story you may have seen recently is "Biggest void in universe may explain cosmic cold spot": this is a claim that a recently detected region with a relative deficit of galaxies (the "supervoid") explains the existence of the unusual Cold Spot that has been seen in the CMB, without the need to invoke any unusual new physics. The claim of the explanation is based on this paper. Unfortunately this claim is wrong, and the paper itself has several problems. My collaborators and I are in the process of writing a paper of our own discussing why, and when we are done I will try to explain the issues on here as well. In the meantime, you heard it here first: a supervoid does not explain the Cold Spot!
Update: It has been pointed out to me that last week Julien Lesgourgues gave a talk about Planck and particle physics at the Strong and Electroweak Matter (SEWM14) symposium, in which he also discussed the timeline of forthcoming Planck and BICEP papers. You can see this on page 12 of his talk (pdf) and it is roughly the same as what I wrote above (except that there's a typo in the year — it should be 2014 not 2015!).
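The cross-correlation test I guessed at in the first item above can be sketched with toy numbers. This is only an illustration of the idea: the "maps" are lists of independent Gaussian pixels, `dust_fraction` is a hypothetical knob for how much of the observed variance comes from dust, and a real analysis would work with power spectra and proper noise models.

```python
import math
import random

def pearson_correlation(map_a, map_b):
    """Pearson correlation coefficient between two equal-length pixel lists."""
    n = len(map_a)
    mean_a = sum(map_a) / n
    mean_b = sum(map_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(map_a, map_b))
    var_a = sum((a - mean_a) ** 2 for a in map_a)
    var_b = sum((b - mean_b) ** 2 for b in map_b)
    return cov / math.sqrt(var_a * var_b)

def simulate_correlation(dust_fraction, n_pix=5000, seed=0):
    """Correlate a toy dust template with a toy observed map in which
    `dust_fraction` of the variance is dust and the rest is something else."""
    rng = random.Random(seed)
    dust = [rng.gauss(0.0, 1.0) for _ in range(n_pix)]
    other = [rng.gauss(0.0, 1.0) for _ in range(n_pix)]
    observed = [
        math.sqrt(dust_fraction) * d + math.sqrt(1.0 - dust_fraction) * o
        for d, o in zip(dust, other)
    ]
    return pearson_correlation(dust, observed)
```

In this toy setup the correlation coefficient tracks the square root of the dust fraction, so a value near 1 would point to a signal that is mostly dust, and a value near 0 to a signal that is mostly something else.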

Friday, May 16, 2014

BICEP and listening to real experts

First up, I'd like to provide a health warning for all people landing here after following links from Sean Carroll or Peter Woit (thanks for the traffic!): I am not a CMB data analysis expert. What I provide on this blog is my own interpretation and understanding of the news and papers I have read, largely because writing such things out helps me understand them better myself. If it also helps people reading this blog, that's great, and you're welcome. But there are no guarantees that any of what I have written about BICEP is correct! If you truly want the best expert opinions on CMB analysis issues, you should listen to the best CMB experts — in this case, probably people who were in the WMAP collaboration, but are not in either Planck or BICEP. Also, if you want to ask somebody to write a scholarly review article on BICEP (yes, I get strange emails!), please don't ask me.

Having said that, I'm not sure whether any WMAP scientists write blogs, so I can at least try to provide some sources for the non-expert reader to refer to. One thing that you definitely should look at is Raphael Flauger's talk (slides and video) at Princeton yesterday. I think it is this work which was the source of the "is BICEP wrong" rumours first publicly posted at Resonaances, and indeed I see that Resonaances today has a follow-up referring to these very slides.

There are several interesting things to take away from this talk. The first is to do with the question of whether BICEP misinterpreted the preliminary Planck data that they admit having taken from a digitized version of a slide shown at a meeting. Here Flauger essentially simulates the process by digitizing the slide in question (and a few others) himself and analyzing them both with and without the correct CIB subtraction. His conclusion is that with the correct treatment, the dust models appear to predict higher dust contamination than BICEP accounted for; the inference being, I guess, that they didn't subtract the CIB correctly.

How important is this dust contribution? Here there is a fair amount of uncertainty: even if the digitization procedure were foolproof, one of the dust models underestimates the contamination and another one overestimates it. Putting the two together, "foregrounds may be OK if the lower end of the estimates is correct, but are potentially dangerous" (page 40). Flauger tries another method of estimation based on the HI column density, using yet more unofficial Planck "data" taken from digitized slides. This seems to give much the same bottom line.

A key point here is that everybody who isn't privy to the actual Planck data is really just groping in the dark, digitizing other people's slides. Flauger acknowledges this by trying to estimate the effect of the process of converting real data into a gif image, converting that into a pdf as part of a talk, somebody nicking the pdf and converting it back to gif and then back to useable data. As you can imagine, the amount of noise introduced in this version of Chinese Whispers is considerable! So I think the following comment from Lyman Page towards the end of the video (as helpfully transcribed by Eiichiro Komatsu for the Facebook audience!) is perhaps the most relevant:
"This is, this is a really, peculiar situation. In that, the best evidence for this not being a foreground, and the best evidence for foregrounds being a possible contaminant, both come from digitizing maps from power point presentations that were not intended to be used this way by teams just sharing the data. So this is not - we all know, this is not sound methodology. You can't bank on this, you shouldn't. And I may be whining, but if I were an editor I wouldn't allow anything based on this in a journal. Just this particular thing, you know. You just can't, you can't do science by digitizing other people's images."
Until Planck answers (or fails to definitively answer) the question of foregrounds in the BICEP window, or some other experiment confirms the signal, we should bear that in mind.

There are some other issues that remain confusing at the moment: the cross-correlation of dust models with BICEP signal doesn't seem to support the idea that all the signal is spurious (though there are possibly some other complicating factors here), and the frequency evidence — such as it is — from the cross power with BICEP1 also doesn't seem to favour a dust contaminant. But all in all, the BICEP result is currently under a lot of pressure. Having seen this latest evidence, I now think the Resonaances verdict ("until [BICEP convincingly demonstrate that foregrounds are under control], I think their result does not stand") is — at least — a justifiable position.

Footnote: I should also perhaps explain that throughout my physics education I have been taught, and had come to believe, that the types of models of inflation BICEP provided evidence for (those with inflaton field values larger than the Planck scale) were fundamentally unnatural and incomplete, and that the small-field models that BICEP apparently ruled out were much more likely to be true. So perhaps my conscious attempts to compensate for this acknowledged theoretical prejudice could have biased me too far in the opposite direction in some previous posts!

Wednesday, May 14, 2014

New BICEP rumours: nothing to see here

This week there has been a minor kerfuffle about some rumours, originally posted on Adam Falkowski's Resonaances blog, regarding the claimed gravitational wave detection by BICEP. The rumours asserted that Planck had proven BICEP had made a mistake, BICEP had admitted the mistake, and that this might mean that all the excitement about the detection of gravitational waves was misplaced and all that BICEP had seen was some foreground dust emission contaminating their maps. (Since then there has been a strong public denial of this by the BICEP team.)

Now, with the greatest respect to Resonaances, which is an excellent particle physics blog, this is really a non-issue, and certainly not worth offending lots of people for (see for instance Martin Bucher's comment here). I really do not see what substantial information these rumours have provided us with that was not already known in March, and therefore why we should alter assessments of the data made at that time.

Let me explain a bit more. One of the important limitations of the BICEP2 experiment is that it essentially measured the sky at only one frequency (150 GHz) — the data from BICEP1, which was at 100 GHz, was not good enough to see a signal, and the data from the Keck Array at 100 GHz has not yet been analysed. When you only have one frequency it is much harder to rule out the possibility that the "signal" seen is not due to primordial gravitational waves at all but due to intervening dust or other contamination from our own Galaxy.

The way that BICEP addressed this difficulty was to use a set of different models for the dust distribution in that part of the sky, and to show that all of them predict that the possible level of dust contamination is an order of magnitude too small to account for the signal that they see. Now, some of these models may not be correct. In fact none of them are likely to be exactly right, because they may be based on old and likely less accurate measurements of the dust distribution or rely on a bit of extrapolation, wishful thinking, whatever. But the point is that they all roughly agree about the order of magnitude of dust contamination. This does not mean that we know there is or isn't any foreground contamination; this is merely a plausibility argument from BICEP (that is supported by and supports some other plausibility arguments in the paper).

Now the "new" rumour is based on the fact that it turns out that one of the dust models was based on BICEP's interpretation of preliminary Planck data, and that this data was not officially sanctioned but digitally extracted from a pdf of a slide shown at a talk somewhere. This is not exactly news, since the slide in question is in fact referenced in the BICEP paper. What's new is that now somebody unnamed is suggesting that the slide was in fact misinterpreted, and therefore this one dust model is more wrong than we thought, though we already accepted it was probably somewhat wrong. This is not the same as proving that the BICEP signal has been definitively shown to be caused by dust contamination! In fact I don't see how it changes the current picture we have at all. Ultimately the only way we can be sure about whether the observed signal is truly primordial or due to dust is to have measurements that combine several different frequencies. For that we have to wait a bit for other experiments — and that's the same as we were saying in March.
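To see why several frequencies help, consider an idealised single-pixel example. In suitable (thermodynamic temperature) units the CMB signal is the same at every frequency, while dust emission changes with frequency. The sketch below assumes a noise-free measurement and a known dust scaling factor `dust_ratio` between the two bands; both are simplifying assumptions of mine, and the function name is made up for illustration.

```python
def separate_two_frequencies(map_low, map_high, dust_ratio):
    """Split two single-pixel measurements into CMB and dust parts.

    Assumes map_low = cmb + dust and map_high = cmb + dust_ratio * dust,
    with both in units where the CMB signal is frequency independent.
    """
    if dust_ratio == 1.0:
        raise ValueError("dust_ratio must differ from 1 to separate components")
    dust = (map_high - map_low) / (dust_ratio - 1.0)
    cmb = map_low - dust
    return cmb, dust
```

With only one frequency there is one equation and two unknowns, which is exactly the position BICEP2 is in; the dust models are a way of guessing the second unknown in place of measuring it.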

It's worth noting that when BICEP quote their result in terms of the tensor-to-scalar ratio r, the headline number $r=0.2$ assumes that there is literally zero foreground contamination. This was always an unrealistic assumption, but that hasn't stopped some 300 theorists from writing papers on the arXiv that take the number at face value and use it to rule out or support their favourite theories. The foreground uncertainty means that while we can be reasonably confident that the gravitational wave signal does exist (see here), model comparisons that strongly depend on the precise value of r are probably going to need some revision in the future.

So what new information have we gained since March? Well, Planck released some more data, this time a map of the polarized dust emission close to the Galactic plane.

The polarization fraction at 353 GHz observed by Planck. From arXiv:1405.0871.

Since these maps do not include the part of the sky that BICEP looked at (which is mostly in the grey region at the bottom), they don't tell us a huge amount about whether that part of the sky is or is not contaminated by polarized dust emission! Some people have speculated that this is something to do with the rivalry between Planck and BICEP, which is a bit over-the-top. Instead the reason is more scientific: the mask excludes areas where the error in determining the polarisation fraction is high, or the overall dust signal itself is too small. So the fact that the BICEP patch is in the masked region indicates that the dust emission does not dominate the total emission there, at least at 353 GHz (dust emission increases with frequency). This means there is not a whole lot of dust showing up in the BICEP region — if anything, this is good news! But even this interpretation should be treated with caution: dust doesn't contribute too much to the total intensity in that region, but it may well still contribute a large fraction of whatever B-mode polarization is seen. Based on my understanding and things I have learned from conversations with colleagues, I don't think Planck is going to be sensitive enough to make definitive statements about the dust in that specific region of the sky.

Another interesting paper that has come out since March has been this one, which claims evidence for some contamination in the CMB arising from the "radio loops" of our Galaxy. It also has the great benefit of being an actual scientific paper rather than a rumour on somebody's blog. (Full disclaimer: one of the authors of this paper was my PhD advisor, and another is a friend who was a fellow student when I was at Oxford.) 

The radio loops are believed to be due to ejected material from past supernova explosions; the idea is that if this dust contains ferrimagnetic molecules or iron, it would contribute polarized emission that might be mistaken for true CMB when it is in fact more local. What this paper argues is that there does appear to be some evidence that one of the CMB maps produced by the WMAP satellite (which operated before Planck) shows some correlation between map temperature and the position of one of these radio loops ("Loop I"). In particular, synchrotron emission from Loop I appears to be correlated with the temperature in the WMAP Internal Linear Combination (or ILC) map. I'm not going to comment on the strength of the statistical evidence for this claim; doubtless someone more expert than I will thoroughly check the paper before it is published. For the time being let us treat it as proven.

The relevance of this to BICEP is somewhat intricate, and proceeds like this: given our physical understanding of how the radio loops formed, it seems likely that they produce both synchrotron and dust emission which follow the same pattern on the sky. Therefore perhaps the correlation of the synchrotron emission from Loop I with the ILC map is because both are correlated with dust emission from the loop. If the correlation is because of dust emission, this might be polarized because of the postulated ferrimagnetic molecules etc., leading to a correlation between the WMAP polarization and Loop I. And if Loop I is contaminating the WMAP ILC map, it is perhaps plausible that a different radio loop, called the "New Loop", is also contaminating other CMB maps, in particular those of BICEP. Whereas Loop I doesn't get very close to the BICEP region, the New Loop goes right through the centre of it (see the figure below), so it is possible that there is some polarized contamination appearing in the BICEP data because of the New Loop. At any rate, the foreground dust models that BICEP used didn't account for any radio loops, so likely underestimate the true contamination.

Position of some Galactic radio loops and the BICEP window. "Loop I" is the large one in the upper centre, which only skims the BICEP window; the "New Loop" is the one in the lower centre that passes through the centre of it. Figure from Philipp Mertsch.

So far so good, but this is quite a long chain of reasoning and it doesn't prove that it is actually dust contamination that accounts for any part of the BICEP observation. Instead it makes a plausible argument that it might be important; further investigation is required.

At the end of the day then, we are left in pretty much the same position we were in back in March. The BICEP result is exciting, but because it is only at one frequency, it cannot rule out foreground contamination. Other observations at other frequencies are required to confirm whether the signal is indeed cosmological. One scenario is that Planck, operating on the whole sky at many frequencies but with a lower sensitivity than BICEP, confirms a gravitational wave signal, in which case pop the champagne corks and prepare for Stockholm. The other scenario is that Planck can't confirm a detection, but also can't definitively say that BICEP's detection was due to foregrounds (this is still reasonably likely!), in which case we wait for other very sensitive ground-based telescopes pointed at that same region of sky but operating at different frequencies to confirm whether or not dust foregrounds are actually important in that region, and if so, how much they change the inferred value of r.

Until then I would say ignore the rumours.

Monday, March 24, 2014

BICEP2: reasons to be sceptical, part 2

This is the second part of three posts in which I wanted to lay out the various possible causes of concern regarding the BICEP2 result, and provide my own opinion on how seriously we should take these worries. I arranged these reasons to be sceptical into three categories, based on the questions
  • how certain can we be that BICEP2 observed a real B-mode signal?
  • how certain can we be that this B-mode signal is cosmological in origin, i.e. that it is due to gravitational waves rather than something less exciting?
  • how certain can we be that these gravitational waves were caused by inflation?
The first post dealt with the first of the three questions, this one addresses the second, and a post yet to be written will deal with the third.

How certain can we be that the observed B-mode signal is cosmological? 


Let's take it as given that none of the concerns in the previous post turn out to be important, i.e. that the observed B-mode signal is not an artefact of some hidden systematics in the analysis, leakage or whatever. From my position of knowing a little about data in general, but nothing much about CMB polarization analysis, I guessed that the chances of any such systematic being important were about 1 in 100.

The next question is then whether the signal could be caused by something other than the primordial gravitational waves that we are all so interested in. The most important possible contaminant here is other nearby sources of polarized radiation, particularly dust in our own Galaxy. We don't actually know how much polarized dust or synchrotron emission there might be in the sky maps here, so a lot of what BICEP have done is educated guesswork.

To start with, the region of the sky that BICEP looks at was chosen on the basis of a study by Finkbeiner et al. from 1999, which extrapolated from measurements of dust emission at certain other frequencies to estimate that, at the frequency of relevance to CMB missions such as BICEP, that particular part of the sky would be exceptionally "clean", i.e. with exceptionally low foreground dust emission. Whether this is actually true or not is not yet known for certain, but there exist a number of models of the dust distribution, and most of these models predict that the level of contamination to the B-mode detection from polarized dust emission would be an order of magnitude smaller than the observed signal. Similar model-dependent extrapolation to the observation frequency based on WMAP results suggests that synchrotron contamination is also an order of magnitude too small.

Predictions for foreground contamination for different dust models (the coloured lines at the bottom) versus the actual B-mode signal observed by BICEP2 (black points).


Now one real test of these assumptions will come from Planck, because Planck will soon have the best map of dust in our Galaxy and therefore the best limits on the possible contamination. This is one of the reasons to look forward to Planck's own polarization results, due in about October or November. In the absence of this information, the other thing that we would like to see from BICEP in order to be sure their signal is cosmological is evidence that the signal exists at multiple frequencies (and has the expected frequency dependence).

BICEP do not detect the signal at multiple frequencies. The current experiment, BICEP2, operates at 150 GHz only, and that is where the signal is seen. A previous experiment, BICEP1, did run at 100 GHz as well, but BICEP1 did not have the same sensitivity and could only place an upper limit on the B-mode signal. Data from the Keck Array will eventually also include observations at 100 GHz, but this is not yet available. Until we have confirmation of the signal at different frequencies, most cosmologists will treat the result very carefully.

In the absence of this, we must look at the cross-correlation between B2 and B1. Remember that although B1$\times$B1 did not have the sensitivity to make a detection of non-zero power, B2$\times$B1 can still tell us something useful. If B1 maps were purely noise, or B2 maps were due to dust, we would not expect them to be correlated. If both were due to synchrotron radiation, we would expect them to be strongly correlated. In fact the B2$\times$B1 cross power is non-zero at the $3\sigma$ level or about 99.7% confidence, which is something Peter Coles' sceptical summary ignores. This is indeed evidence that the signal seen at 150 GHz is cosmological.

Still, some level of cross-correlation could be produced even if both B2 and B1 were only seeing foregrounds. Combining the B2$\times$B1 data with B2$\times$B2 and B1$\times$B1 means that polarized dust or synchrotron emission of unexpected strength are rejected as explanations – though at a not-particularly-exciting significance of about $2.2-2.3\sigma$.
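For readers who want to translate these "sigma" figures into confidence levels, here is a small helper. It assumes Gaussian statistics and the two-sided convention (a fluctuation of either sign counts); a given paper may use a different convention, so treat the numbers as a rough guide.

```python
import math

def two_sided_confidence(n_sigma):
    """Probability that a Gaussian variable lies within n_sigma of its mean."""
    return math.erf(n_sigma / math.sqrt(2.0))

def one_in_n(n_sigma):
    """Chance of a fluctuation at least this large (either sign), as 1-in-N."""
    return 1.0 / math.erfc(n_sigma / math.sqrt(2.0))
```

For example, `two_sided_confidence(3.0)` is about 0.997, while `two_sided_confidence(2.2)` is about 0.97, which is why a $2.2$ to $2.3\sigma$ rejection is not particularly exciting.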

Verdict 


It's fair to say, on the basis of models of the distribution of polarized dust and synchrotron emission, that the BICEP2 signal probably isn't due to either of these contaminants. However, we don't yet have confirmation of the detection at multiple frequencies, which is required to judge for sure. At the moment, the frequency-based evidence against foreground contamination is not very strong, but we'd still need some quite unexpected stuff to be going on with the foregrounds to explain the amplitude of the observed signal.

Overall, I'd guess the odds are about 100 to 1 against foregrounds being the whole story. (This should still be compared with the quoted headline result of 300,000,000,000 to 1 against $r=0$ assuming no foregrounds at all!)

The chances are much higher (I'd be tempted to say better than even money) that foregrounds contribute a part of the observed signal, and that therefore the actual value of the tensor-to-scalar ratio will come down from $r=0.2$, perhaps to as low as $r=0.1$, when Planck checks this result using their better dust mapping.

Friday, March 21, 2014

BICEP2: reasons to be sceptical, part 1

As the dust begins to settle following the amazing announcement of the discovery of gravitational waves by the BICEP2 experiment, physicists around the world are taking stock and scrutinizing the results.

Remember that the claimed detection is enormously significant, in more ways than one. The BICEP team have apparently detected an exceedingly faint B-mode polarization pattern in the CMB, at an order of magnitude better sensitivity than any previous experiment probing the same scales. They have then claimed to have been able to ascribe this B-mode signal unambiguously to cosmological gravitational waves, rather than any astrophysical effects due to intervening dust or other sources of radiation. And finally they have interpreted these results as direct evidence for the theory of inflation, which is really the source of all the excitement, because if true it would pin down the energy scale of inflation at an incredibly high level, with extensive and dramatic consequences for our understanding of high energy particle physics.

However, as all physicists have been saying, with results of this magnitude it is important to be very careful indeed. Speculating about who should get the Nobel Prize (or Prizes) for this is still premature. The paper containing the results will of course be subjected to anonymous peer review when it is submitted to a journal, but it has also already faced a rather extraordinary open peer review by social media, with a live group on Facebook, and all sorts of other discussion on blogs, Twitter and the like. (And to the great credit of the scientists on the BICEP team, they have patiently responded to questions and comments on these forums, and the whole process has been carried out very civilly!)

What I wanted to do today is to possibly contribute to that by gathering together all the main points of concern and reasons to be sceptical of the BICEP result. This is partly for my own purposes, since writing things down helps to clarify my thoughts. I will divide these concerns into three main categories, addressing the following questions:

  • how certain can we be that BICEP2 observed a real B-mode signal?
  • how certain can we be that this B-mode signal is cosmological in origin, i.e. that it is due to gravitational waves rather than something less exciting?
  • how certain can we be that these gravitational waves were caused by inflation?

I'll discuss the first category of concerns in part 1 of this post and the other two in parts 2 and 3. I do not claim that any of the concerns I raise here are original; however, any mistakes are definitely mine alone. I'd like to encourage discussion of any of these points via the comments below.

How certain can we be that BICEP2 observed a real B-mode signal?


This is obviously the most basic issue. The general reason for concern here (and this applies to any B-mode detection experiment) is that the experimental pipeline has to be able to decompose the polarization signal seen into two components, the E-mode and the B-mode, and the level of the signal in the B-mode is orders of magnitude smaller than in E. Now, as Peter Coles explains here, the E and B polarization components are in principle orthogonal to each other when the spherical harmonic decomposition can be performed over the whole sky, but this is in practice impossible. BICEP observes only a small portion of the sky, and therefore there is the possibility of "leakage" from E to B when separating out the components. It would not take much leakage to spoil the B-mode observation.

Obviously the BICEP team implemented many tests of the obtained maps to check for such systematics. One of the ways to do this is to cross-correlate the E and B maps: if there is no leakage the cross-correlation should be consistent with zero. Another important test is the jackknife technique, also nicely explained here: you split your data into two equal halves, and subtract the signal found in one half from that in the other; the answer should also be consistent with zero.
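For anyone who hasn't met a jackknife test before, the idea can be sketched in a few lines. This is only an illustration with made-up numbers: the "data" are simulated repeated measurements of a single hypothetical bandpower, and the split is by alternating samples, whereas real splits are by detector, observing season, scan direction and so on.

```python
import random
import statistics

rng = random.Random(1)
n_obs = 2000

# Hypothetical repeated measurements of one bandpower: true signal plus noise.
signal = 0.05
data = [signal + rng.gauss(0.0, 1.0) for _ in range(n_obs)]

# Split the data into two equal halves.
half_a = data[0::2]
half_b = data[1::2]

# The signal is common to both halves, so it cancels in the difference;
# only noise (or a systematic that differs between the halves) is left.
diff = statistics.fmean(half_a) - statistics.fmean(half_b)

# Standard error of the difference of the two means.
err = (statistics.variance(half_a) / len(half_a)
       + statistics.variance(half_b) / len(half_b)) ** 0.5

n_sigma = diff / err
print(f"jackknife difference = {diff:.4f} +/- {err:.4f} ({n_sigma:.2f} sigma)")
```

A systematic that affected one half of the data more than the other would show up as a difference that is many error bars away from zero.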

Now one source of concern arises because of a combination of these two tests. The blue points in the following figure show the results of a jackknife test on the BB power:


These points are consistent with zero ... but they are possibly too consistent with zero! The $1\sigma$ error bars of every one of them pass through zero, whereas it would be more natural to expect some more scatter. In fact from the number on the plot you can see that there is only a 1% chance that all 9 blue points should be so close to zero.
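As a rough sanity check on why this looks odd, one can ask how often 9 points would all land within $1\sigma$ of zero if they were independent and Gaussian with correctly sized error bars. Both of those are my simplifying assumptions, and this is not how the number on the plot was obtained, so the two should only be expected to agree in order of magnitude.

```python
import math

def prob_within(n_sigma):
    """Probability that a Gaussian variable lies within n_sigma of its mean."""
    return math.erf(n_sigma / math.sqrt(2.0))

n_points = 9

# Chance that a single point lies within 1 sigma of zero.
p_one = prob_within(1.0)

# Chance that all points do so, assuming they are independent.
p_all = p_one ** n_points

print(f"one point within 1 sigma : {p_one:.3f}")
print(f"all {n_points} within 1 sigma    : {p_all:.3f}")
```

This crude estimate comes out at a few per cent, so a set of points hugging zero this closely is unusual but hardly impossible.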

This raises the possibility, pointed out by Hans Kristian Eriksen, that the error bars on the blue points are overestimated. It may then be the case that the error bars on other points in other jackknife tests are also too large. If that were the case then reducing those errors might mean that some of the other jackknife tests now fail, i.e. the points are no longer consistent with zero. As it happens, of the 168 jackknife test results listed in the table in the paper, quite a large number (about 7) of them already "fail" by the stricter standards (2% probability) some other experiments such as QUIET might apply. Obviously some number of tests are always expected to fail (at a 2% threshold, about 3 out of 168 by chance alone), but 7 or more out of 168 starts to look like quite a large number. This then becomes a little worrying.
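To put a rough number on "quite a large number", here is the binomial arithmetic, treating the 168 tests as independent, each with a 2% chance of failing by accident. The independence is an assumption on my part (the real tests share data and so are correlated), which makes this indicative only.

```python
import math

def binom_tail(n, p, k_min):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )

n_tests = 168
threshold = 0.02

# Expected number of chance failures at this threshold.
expected = n_tests * threshold

# Probability of seeing 7 or more failures purely by chance.
p_seven_or_more = binom_tail(n_tests, threshold, 7)

print(f"expected chance failures : {expected:.2f}")
print(f"P(7 or more failures)    : {p_seven_or_more:.3f}")
```

Under these assumptions 7 or more failures happens by chance roughly one time in twenty, which is the sort of thing that raises an eyebrow rather than an alarm.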

On the other hand, this extrapolation may be a little exaggerated, because we are surmising that the error bars might be too large purely on the basis of the one figure above. Clearly if you do a large number of jackknife tests, it becomes less surprising that one of them gives a surprising result, if you see what I mean. Looking through the table for the other BB jackknife results, the particular example from the figure is the only one that stands out as being odd, so it is hard to conclude from this that the error bars are too large. Overall I'm not convinced that there is necessarily a problem here, but it is something that deserves a little more quantitative attention.

The second source of concern that has been highlighted is that the data at large multipole values appear to be doing something odd. Look at the 5th, 6th and 7th black points from the figure above, which are quite a long way from the theoretical expectation. Peter Coles helpfully drew a little blue circle around them:


The worry here is that even if the data appear to be passing jackknife tests for internal consistency and null tests for EB cross power, the fact that these points are so high suggests that there is still some undetected systematic that has crept in somewhere. This hypothesized systematic could account for the measured values of the crucial first four points, which constitute the detection of the gravitational waves.

Similarly, people are worried about the EE power spectrum, which appears to be too high in the $50 < \ell < 100$ region. Again this could be a sign of leakage from temperature into polarization, which could perhaps be contaminating the B-mode maps despite not explicitly showing up in the jackknife consistency checks.

Now, the BICEP response to this is that you shouldn't judge things simply "by eye". The EE excess does not appear to be statistically significant. It's also not incredibly unlikely that the final two of the circled BB data points could simultaneously be as high as they are just due to random chance — they say "their joint significance is $<3\sigma$", which means that the chance is about 1%. (Of course the chance that all three of the circled points could simultaneously be high is smaller than that, and so presumably less than 1% ... )

Another justification some people have been providing (mostly people from outside the BICEP collaboration to be fair, though some from within it as well) is that the preliminary data from the Keck Array, which is a similar instrument to BICEP but with higher sensitivity, appear to show no anomaly in that region. I think this is a somewhat dangerous argument, because the Keck data also don't seem to be quite so high in the region of the crucial first four bandpowers! In any case, the "official" word from BICEP is that any such speculation on the basis of Keck is to be discouraged, because the Keck data are still very preliminary and have not been properly checked.

Verdict

I'm a little bit worried about the various issues raised here, though overall I would say the odds are in favour of the B-mode detection being secure (this is a different issue to whether this detected signal is due to gravitational waves! More on that in the next post). I would not, however, put those odds at anywhere near 1 in 300,000,000,000 against there being an error, which is the headline significance claimed for the detection of a non-zero tensor-to-scalar ratio ($7\sigma$). If I were forced to quantify my belief, I would say something more like 1 or 2 in 100. That's not particularly secure, but luckily there are follow-up experiments, such as Keck and Planck itself, which should be able to reassure us on that score soon.
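For reference, this is how a significance quoted in sigmas translates into a "1 in N" chance. I'm assuming a two-sided Gaussian tail here; a one-sided convention halves the probability, so the exact headline figure depends on which convention is used.

```python
import math

def two_sided_p(n_sigma):
    """Two-sided Gaussian tail probability for an n_sigma deviation."""
    return math.erfc(n_sigma / math.sqrt(2.0))

def one_in(n_sigma):
    """The same probability expressed as '1 in N'."""
    return 1.0 / two_sided_p(n_sigma)

for n_sigma in (2.2, 3.0, 7.0):
    print(f"{n_sigma} sigma: p = {two_sided_p(n_sigma):.3g}, "
          f"about 1 in {one_in(n_sigma):.3g}")
```

At $7\sigma$ the result is a few hundred billion to one, which is where headline numbers of that size come from, and why my own "1 or 2 in 100" is so very much less confident.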

A final point: seeing the preliminary Keck data shown in a figure in the paper suggests to me that perhaps the final analysis of Keck data will now not be done "blind". I hope that's not the case; it would be very disturbing indeed if it were.