Pubsplained #2: How many forams for a good climate signal?

Citation

Thirumalai, K., J. W. Partin, C. S. Jackson, and T. M. Quinn (2013), Statistical constraints on El Niño Southern Oscillation reconstructions using individual foraminifera: A sensitivity analysis, Paleoceanography, 28(3), 401–412, doi:10.1002/palo.20037. (Free Access!)

Summary

We provide a method to quantify uncertainty in estimates of past climate variability using foraminifera. The technique analyzes the geochemistry of numerous individual shells within a sediment sample to reconstruct seasonal and year-to-year variations in environmental conditions.

Here is a link to our code.

Pubsplainer

Figure: This plot shows how uncertainty in IFA statistics decreases (but not all the way!) as you increase the number of foraminiferal shells analyzed.

Planktic foraminifera are tiny, unicellular zooplankton that are widely found in the open ocean and can tolerate a large range of environmental conditions. During their short (2-4 week) lifespan, they build shells (or tests) made of calcium carbonate. These tests sink to the seafloor, where they are gradually buried by sediments over time. We can retrieve foraminiferal tests from sediment cores and analyze their geochemistry to unravel all sorts of things about past ocean conditions.

Typically, ~10-100 shells of a particular species are taken from a sediment sample and analyzed collectively for their isotopic or trace-metal composition. This procedure is repeated for each subsequent sample as you move down the core. Each of these measurements provides an estimate of the "mean climatic state" during the time represented by the sediment sample. In contrast, individual foraminiferal analyses (IFA), i.e. the geochemistry of each shell within a sample, can provide information about month-to-month fluctuations in ocean conditions during that time interval. The statistics of IFA have been used to compare and contrast climate variability between various paleoclimate time periods.

There are many questions regarding the uncertainty and appropriate interpretation of IFA statistics, and we addressed some of them in this publication. We provided a code that forward-models modern observations of ocean conditions and approximates, with uncertainty, the minimum number of foraminiferal tests required for a skillful reconstruction. In other words: "how many shells are needed for a good climate signal?"
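To make the "how many shells" question concrete, here is a minimal Monte Carlo sketch in MATLAB of the kind of experiment the paper describes. This is not the published code: the synthetic seasonal amplitude, interannual noise level, and number of years below are assumptions chosen purely for illustration.

```matlab
% Minimal sketch (not the published code): how does the uncertainty of an
% IFA statistic shrink as more shells are picked per sediment sample?
rng(1);                                   % reproducibility
nyears   = 50;                            % years spanned by one sediment sample
t        = (1:12*nyears)';                % monthly time axis
seasonal = 1.5*sin(2*pi*t/12);            % assumed seasonal cycle (deg C)
interann = 0.8*randn(size(t));            % assumed interannual/ENSO-like noise (deg C)
sst      = 25 + seasonal + interann;      % pseudo-SST that each shell could record

nShells = [20 40 70 100 150];             % picking scenarios
nTrials = 2000;                           % Monte Carlo repetitions
spread  = zeros(nTrials, numel(nShells));
for k = 1:numel(nShells)
    for j = 1:nTrials
        picks = sst(randi(numel(sst), nShells(k), 1));  % each shell = one random month
        spread(j,k) = std(picks);                       % IFA statistic for this draw
    end
end
s  = sort(spread);                        % empirical 95% range of the IFA statistic
lo = s(round(0.025*nTrials), :);
hi = s(round(0.975*nTrials), :);
disp([nShells' (hi - lo)'])               % the uncertainty narrows, but never to zero
```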

Armed with our algorithm, we tested various cases in the Pacific Ocean to obtain better estimates of past changes in the El Niño/Southern Oscillation, a powerful mode of present-day climate variability. We found that the interpretation of IFA statistics is tightly linked to the climate signal at the study location: the ratio of seasonality[1] to interannual variability[2] at a site controls the IFA signal for a species that occurs throughout the year. We then demonstrated that this technique is far more sensitive to changes in El Niño amplitude than to changes in its frequency.
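As a rough, hedged illustration of that diagnostic, the snippet below gauges the seasonality-to-interannual ratio from a monthly SST series; the synthetic series is only a stand-in, and in practice you would use observed or reanalysis SSTs for your core site.

```matlab
% Gauging the seasonality-to-interannual-variability ratio at a site
% (illustrative only; replace the synthetic series with real monthly SSTs).
t    = (1:12*50)';
sst  = 25 + 1.5*sin(2*pi*t/12) + 0.8*randn(size(t));  % assumed monthly series
sstM = reshape(sst, 12, []);              % months x years
clim = mean(sstM, 2);                     % mean seasonal cycle (climatology)
anom = sstM - clim;                       % anomalies about the seasonal cycle
seasAmp = max(clim) - min(clim);          % seasonality: summer-winter contrast
anomStd = std(anom(:));                   % interannual (anomaly) spread
ratio   = seasAmp/anomStd                 % >> 1: IFA mostly records the seasonal cycle
```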

In the central equatorial Pacific, where the seasonal cycle is minimal and year-to-year changes are strong, we showed that IFA is skillful at reconstructing El Niño. In contrast, the eastern equatorial Pacific surface ocean is a region where El Niño anomalies are superimposed on a large annual cycle. Here, IFA is better suited to estimating past seasonality, and attempting to reconstruct El Niño is problematic. Such a pursuit is further complicated by possible past changes in the timing of El Niño events relative to the seasonal cycle.

Our results also suggest that different species of foraminifera, found at different depths in the water column, or with a particular seasonal preference for calcification, might have more skill at recording past changes in El Niño. However, care should be taken in these interpretations too because these preferences (which are biological in nature) might have changed in the past as well (with or without changes in El Niño).

You can use our MATLAB™ code, called INFAUNAL, to generate your own probability distributions of the sensitivity of IFA to seasonality or interannual variability for a given sedimentation rate, number of foraminifera, and climate signal at a core location in the Pacific. Do let me know if you have any difficulties running the code!

[1] The difference in environmental conditions between summer and winter, averaged over multiple years.

[2] Changes from year to year (winter-to-winter or summer-to-summer, etc.) within the time period represented by the sediment sample.

Pubsplained #1: How to fit a straight line through a set of points with uncertainty in both directions?

Publication

Thirumalai, K., Singh, A., & Ramesh, R. (2011). A MATLAB™ code to perform weighted linear regression with (correlated or uncorrelated) errors in bivariate data. Journal of the Geological Society of India, 77(4), 377–380, doi:10.1007/s12594-011-0044-1.

Summary

We present a code that fits a line through a set of points (“linear regression”). It is based on math first described in 1966 that provides general and exact solutions to the line-fitting problem that the multitude of linear regression methods out there only approximate. Here is a link to our code.

Pubsplainer

Figure: Fitting a straight line through a bunch of points with X and Y uncertainty.

My first peer-reviewed publication in the academic literature described a procedure to perform linear regression, or, in other words, build a straight line (of “best fit”) through a set of points. We wrote our code in MATLAB and applied it to a classic dataset from Pearson (1901).

“Why?”, you may ask, perhaps followed by “doesn’t MATLAB have linear regression built into it already?” or “wait a minute, what about polyfit?!”

Good questions, but here’s the kicker: our code numerically solves this problem when there are errors in both x and y variables… and… get this, even when those errors might be correlated! And if someone tells you that there is no error in the x measurement or that errors are rarely correlated - I can assure you that they are most probably erroneous.

York was the first to find general solutions for the “line of best fit” problem when he was working with isochron data where the abscissa (x) and ordinate (y) axis variables shared a common term (and hence resulted in correlated errors). He first published the general solutions to this problem in 1966 and subsequently published the solutions to the correlated-error problem in 1969.

If these solutions were published so long ago, why are there so many different regression techniques detailed in the literature? Well, it’s always useful to have different approaches to solving numerical problems, but as Wehr & Saleska (2017) point out in a nifty paper from last year, the York solutions have largely remained internal to the geophysics community (in spite of 2000+ citations), escaping even the famed “Numerical Recipes” textbooks. Furthermore, they state that there is abundant confusion in the isotope ecology & biogeochemistry community about the myriad available linear regression techniques and which one to use when. I can somewhat echo that feeling when it comes to calibration exercises in the (esp. coral) paleoclimate community. A short breakdown of these methods follows.

Ordinary Least Squares (OLS) or Orthogonal Distance Regression (ODR) or Geometric Mean Regression (GMR): which one to use?!

Although each of these techniques might be more appropriate for certain datasets than others, the ultimate take-home message is that all of them are approximations of York’s general solutions that hold only when particular criteria are met (or, worse, unknowingly assumed). A quick numerical comparison of the three slope estimates follows the list below.

  • OLS provides unbiased slope and intercept estimates only when the x variable has negligible errors and when the y error is normally distributed and does not change from point to point (i.e. no heteroscedasticity).

  • ODR, formulated by Pearson (1901), works only when the variances of the x and y errors do not change from point to point and when the errors themselves are not correlated. ODR also fails to handle rescaled data, i.e. slopes and intercepts derived from ODR do not simply rescale if the x or y data are multiplied by some factor. Note that ODR is also called “major axis regression”.

  • GMR (also known as reduced major axis regression) standardizes the x and y data and is therefore scale-invariant, but it is only appropriate when the ratio of the standard deviation of x to the standard deviation of the error in x equals that same ratio for the y coordinate.
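To see how the three approximations differ in practice, here is a quick side-by-side comparison of their slope formulas on the same synthetic data (the dataset and noise levels are assumptions for illustration, not Pearson's data). Note that none of these formulas take the per-point measurement uncertainties as input.

```matlab
% Slopes from OLS, ODR (major axis), and GMR on the same synthetic data.
rng(2);
x = (1:20)' + 0.3*randn(20,1);            % x with some noise
y = 2 + 0.5*(1:20)' + 0.3*randn(20,1);    % roughly y = 2 + 0.5*x, plus noise
sxx = var(x);                             % sample variances and covariance
syy = var(y);
sxy = sum((x - mean(x)).*(y - mean(y)))/(numel(x) - 1);

bOLS = sxy/sxx;                                               % ignores x errors
bODR = (syy - sxx + sqrt((syy - sxx)^2 + 4*sxy^2))/(2*sxy);   % major axis slope
bGMR = sign(sxy)*sqrt(syy/sxx);                               % geometric mean slope
disp([bOLS bODR bGMR])                    % three different answers from one dataset
```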

Most importantly, and perhaps quite shockingly, NONE of these methods uses the actual measurement uncertainty of each point in the construction of the ensuing regression. Essentially, each method is an algebraic approximation of York’s equations, and although his equations have to be solved numerically in their most general form, they provide the least-biased estimates of the slope and intercept for a straight line. In 2004, York and colleagues showed that his 1969 equations (based on least-squares estimation) were also consistent with (newer) methods based on maximum-likelihood estimation when dealing with (correlated or uncorrelated) bivariate errors. Our paper in 2011 provides a relatively fast way to iteratively solve for the slope and intercept.
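For the curious, here is a compact sketch of that iterative solution, following the unified equations in York et al. (2004); the function name, interface, and convergence criterion here are mine and differ from the published 2011 code. Save it as yorkfit.m.

```matlab
function [a, b] = yorkfit(x, y, sx, sy, r)
% Sketch of an iterative best-fit line with errors in both x and y, after
% York (1969) and York et al. (2004). sx, sy are 1-sigma uncertainties per
% point; r is the error correlation per point (scalar or vector).
x = x(:); y = y(:); sx = sx(:); sy = sy(:);
r  = r(:).*ones(size(x));
wx = 1./sx.^2;  wy = 1./sy.^2;            % weights from the stated uncertainties
p  = polyfit(x, y, 1);  b = p(1);         % OLS slope as the starting guess
for it = 1:200
    alpha = sqrt(wx.*wy);
    W    = wx.*wy./(wx + b^2*wy - 2*b*r.*alpha);   % combined weight per point
    Xbar = sum(W.*x)/sum(W);  Ybar = sum(W.*y)/sum(W);
    U = x - Xbar;  V = y - Ybar;
    beta = W.*(U./wy + b*V./wx - (b*U + V).*r./alpha);
    bNew = sum(W.*beta.*V)/sum(W.*beta.*U);        % updated slope estimate
    if abs(bNew - b) < 1e-12, b = bNew; break; end
    b = bNew;
end
a = Ybar - b*Xbar;                        % intercept from the weighted centroid
end
```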

In our publication, besides the Pearson data, we also applied our algorithm to perform “force-fit” regression - a special case where one point is almost exactly known (i.e. very little error and hence near-infinite weight) - on meteorite data and showed that our results were consistent with published values.
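Using the yorkfit sketch above, a force-fit amounts to assigning the well-known point a vanishingly small uncertainty so that it carries near-infinite weight; the numbers below are invented for illustration and are not the meteorite data.

```matlab
% Force-fit through a nearly exactly known point by giving it near-zero error.
x  = [0.02; 1.1; 2.0; 2.9; 4.1];   sx = [0.20; 0.20; 0.20; 0.20; 0.20];
y  = [0.05; 2.3; 3.9; 6.2; 8.1];   sy = [0.30; 0.30; 0.30; 0.30; 0.30];
sx(1) = 1e-6;  sy(1) = 1e-6;       % near-infinite weight pins the line near point 1
[a, b] = yorkfit(x, y, sx, sy, 0); % assuming uncorrelated errors (r = 0)
```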

All in all, if you want to fit a line through a bunch of points in an X-Y space, you won’t be steered too far off course by using our algorithm.


#Pubsplained

I am introducing a new series on this blog called Pubsplained, where I plan on breaking down my peer-reviewed publications into (more) digestible blog posts. The motivation for this is threefold:

  1. To see if it's possible to broaden the audience of some of these manuscripts
  2. To be more productive on Paleowave
  3. To “keep in touch” with my older publications.

The idea is to provide an accessible summary (perhaps a tweet-length synopsis) of our publications, and also a little more background on the topic, including problems and challenges, for those who might be interested.