How much is enough? A new technique for quantifying precision of community surveys

Being an ecologist is all about the trade-off between effort on one hand and time and money on the other. Given infinite amounts of both time and money, we would undoubtedly sample the heck out of nature. But it would be an exercise in diminishing returns: after a certain amount of sampling, there are no new stones left to turn. Thus, ecology is a balancing act of effort: too little, and we have no real insight; too much, and we’ve wasted a lot of time and money.

I’m always looking for ways to improve my balance, which is why I was interested to see a new paper in Ecology Letters called “Measures of precision for dissimilarity-based multivariate analysis of ecological communities” by Marti Anderson and Julia Santana-Garcon.

In a nutshell, the paper introduces a method for “assessing sample-size adequacy in studies of ecological communities.” Put a slightly different way, the authors have devised a technique for determining when additional sampling no longer meaningfully improves one’s ability to describe whole communities, in terms of both the number of species and their relative abundances. Perfect for evaluating when enough is enough, and adjusting the outlay of time and money accordingly!

In this post, I dig into this technique, show its applications using an example, and introduce a new R function to assess multivariate precision quickly and easily.
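To give a flavor of the quantity involved before the jump, here is a rough sketch of a dissimilarity-based pseudo multivariate standard error. This is my own back-of-the-envelope version using vegan’s example data, not the paper’s or the post’s code; the R function in the post implements the full method from the paper.

```r
# A rough sketch (not the paper's or the post's code): a pseudo multivariate
# standard error, sqrt(V/n), where V is a pseudo variance computed from the
# among-sample dissimilarities.
library(vegan)

data(dune)                                        # example community matrix
d <- as.matrix(vegdist(dune, method = "bray"))    # Bray-Curtis dissimilarities
n <- nrow(dune)

V <- sum(d[lower.tri(d)]^2) / (n * (n - 1))       # pseudo multivariate variance
multSE <- sqrt(V / n)                             # pseudo multivariate standard error
multSE
```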

Continue reading

A practical guide to machine learning in ecology

Recently, I was exploring techniques to interpolate some missing environmental data and stumbled across something called ‘random forest’ analysis. Random what now? I did a little digging and found myself in the massive and insanely complicated field of machine learning. I couldn’t find a concise guide to the different machine learning techniques, or to when I might want to use one over another, so I thought I would cobble one together on my own. Below is a rough stab at explaining and exploring different machine learning techniques, from CARTs to GBMs, using R.
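As a taste of the kind of gap-filling that sent me down this rabbit hole, here is a minimal sketch assuming the randomForest package and the built-in airquality data; it is an illustration, not code from the post.

```r
# A minimal sketch (not the post's code): use a random forest to predict
# missing values of one environmental variable from the others.
library(randomForest)

data(airquality)
obs <- airquality[complete.cases(airquality), ]   # rows with all variables observed

# Fit a random forest relating Ozone to the other measured variables
rf <- randomForest(Ozone ~ Solar.R + Wind + Temp, data = obs, ntree = 500)

# Predict Ozone where it is missing but the predictors are present
gaps <- airquality[is.na(airquality$Ozone) &
                     complete.cases(airquality[, c("Solar.R", "Wind", "Temp")]), ]
predict(rf, newdata = gaps)
```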

Continue reading

A basic tutorial to version control using git


I’ve been hacking away at this post for a while now, for a few reasons. First, I’m a git novice, so I’m still learning my way around the software. Second, this is an intimidating topic for those who are not used to things like the command line, so it was a challenge to identify which ideas were critical to cover and which could be ignored without too much loss of functionality. Finally, there are always lots of little kinks to work out, especially in software that is cross-platform. So please take the following with a grain of salt, and let me know if anything is unclear, needs work, or is flat-out wrong!

Continue reading

Using parallel processing in R

Lately I’ve been running a lot of complex models on huge datasets, which grinds my computer to a halt for hours. Streamlining code only goes so far, and R is further limited because the default session runs on just one core. In a time when computers have at least two cores, if not more, why not take advantage of that extra computing power? (Heck, even my phone has two cores.*)

Luckily, R comes bundled with the “parallel” package, which helps to distribute the workload across multiple cores. It’s a cinch to set up on a local machine:
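For a quick taste before the jump, here is a minimal sketch, not the post’s exact code, that farms a simple bootstrap out across local cores with the parallel package:

```r
# A minimal sketch (not the post's exact code) of local parallelization
# using the built-in parallel package.
library(parallel)

n_cores <- detectCores() - 1            # leave one core free for the system
cl <- makeCluster(n_cores)              # start a local cluster

# Example task: refit a simple model to 1,000 bootstrapped datasets
boot_coefs <- parLapply(cl, 1:1000, function(i) {
  d <- mtcars[sample(nrow(mtcars), replace = TRUE), ]
  coef(lm(mpg ~ wt + hp, data = d))
})

stopCluster(cl)                         # always shut the cluster down when finished
```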

Continue reading

R^2 for linear mixed effects models

Linear mixed effects models are a powerful technique for the analysis of ecological data, especially in the presence of nested or hierarchical variables. But unlike their purely fixed-effects cousins, they lack an obvious R²-like criterion for assessing model fit.

[Updated October 13, 2015: Development of the R function has moved to my piecewiseSEM package, which can be found here under the function sem.model.fits]
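For context, one widely used approach (due to Nakagawa & Schielzeth) partitions the variance into fixed-effect, random-effect, and residual components to give “marginal” and “conditional” R² values. Here is a minimal sketch assuming lme4 and its built-in sleepstudy data; it is an illustration of the idea, not the function from the package above.

```r
# A minimal sketch (illustration only): variance-partitioning R2 for a
# random-intercept model fit with lme4.
library(lme4)

m <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

var_fixed <- var(as.vector(model.matrix(m) %*% fixef(m)))   # fixed-effect variance
var_rand  <- sum(sapply(VarCorr(m), function(v) v[1]))      # random-intercept variance
var_resid <- attr(VarCorr(m), "sc")^2                       # residual variance

r2_marginal    <- var_fixed / (var_fixed + var_rand + var_resid)               # fixed only
r2_conditional <- (var_fixed + var_rand) / (var_fixed + var_rand + var_resid)  # fixed + random
c(marginal = r2_marginal, conditional = r2_conditional)
```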

Continue reading

Black theme for ggplot2

[EDIT 06/05/16: See updated version of this code here for ggplot2 version 2.X.X: https://gist.github.com/jslefche/eff85ef06b4705e6efbc]

I’ve long extolled the virtues of ggplot2 as a graphing tool for R for its versatility and huge feature set. One of my favorite aspects of ggplot2 is the ability to tweak every element of the plot using intuitive commands. With the release of version 0.9.2, though, creator Hadley Wickham overhauled the theme options, which broke my preferred black theme, theme_black(), found here. I’ve updated theme_black() to work with the current version, ggplot2 0.9.3.1. Enjoy!
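To illustrate the post-0.9.2 theme system the rewrite had to target, here is a minimal sketch of a dark theme built from element_*() functions; it is not the full theme_black() from the post or the gist linked above.

```r
# A minimal sketch (not the full theme_black() from the post/gist) showing the
# element_*() based theme system introduced around ggplot2 0.9.2.
library(ggplot2)

theme_black_minimal <- function(base_size = 12) {
  theme_grey(base_size = base_size) +
    theme(
      plot.background   = element_rect(fill = "black", colour = NA),
      panel.background  = element_rect(fill = "black", colour = NA),
      panel.grid.major  = element_line(colour = "grey30"),
      panel.grid.minor  = element_line(colour = "grey20"),
      axis.text         = element_text(colour = "white"),
      axis.title        = element_text(colour = "white"),
      legend.background = element_rect(fill = "black"),
      legend.text       = element_text(colour = "white"),
      legend.title      = element_text(colour = "white"),
      plot.title        = element_text(colour = "white")
    )
}

ggplot(mtcars, aes(wt, mpg)) + geom_point(colour = "white") + theme_black_minimal()
```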

Continue reading

Mapping in R, Part II

The other day, I posted an introductory demo of mapping in R using some of the built-in maps. But of course only a few regions are represented in the “maps” package: the US states, the US as a whole, Italy, France, and the world. Even then, there are some limitations: for instance, the USSR is still alive and kicking in the world of “maps.” If you are living in the 21st century, or working somewhere other than these locations, you may want to supply your own, more up-to-date maps. The most popular filetype is, of course, the GIS shapefile. But to access, visualize, manipulate, and plot on shapefiles, it was formerly necessary to use ArcGIS, which is proprietary and thus costly. I’ll show you how to do it all in R!
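As a flavor of where the post goes, here is a minimal present-day sketch using the sf package; the original post predates sf and may use a different toolchain, and the file name below is only a placeholder.

```r
# A minimal sketch (not the post's code; the file name is a placeholder):
# read a shapefile into R and plot it, no ArcGIS required.
library(sf)

states <- st_read("my_region.shp")                  # any shapefile on disk
plot(st_geometry(states), col = "grey90", border = "black")
```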

Continue reading

Mapping in R, Part I

I’m consistently amazed at the capabilities of R: if it can be done, it can be done in R. And so it is with mapping. Recently, I needed to do some complicated geospatial analysis, and I wanted to do it in R for the obvious reasons: it’s free, it’s open-source, and there is a great support community. As it turns out, R has much of the functionality of ArcGIS, albeit with a lot less flash and a lot more hair-pulling. But once you’ve done the legwork to get the data in and formatted and the functions set up, it’s a snap to run through all sorts of data.
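As a small taste, here is a minimal sketch, not the post’s code, using the built-in “maps” package that Part II above refers back to; the plotted coordinates are purely illustrative.

```r
# A minimal sketch (not the post's code) using the built-in "maps" package.
library(maps)

map("state")                                        # outline of the lower 48
map("state", "virginia", add = TRUE, fill = TRUE, col = "grey80")
points(-76.5, 37.3, pch = 19, col = "red")          # illustrative site coordinates
```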

Continue reading

Dealing with multicollinearity using VIFs

Besides normality of residuals and homogeneity of variance, one of the biggest assumptions of linear modeling is independence of the predictors. If two or more predictors in a model are correlated with one another, the model may produce unstable parameter estimates with highly inflated standard errors, sometimes resulting in an overall significant model with no individually significant predictors. In other words, bad news if your goal is to determine the contribution of each predictor in explaining the response. But there is hope!
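One common diagnostic, the variance inflation factor (VIF), is what the post explores. Here is a minimal sketch assuming the car package and simulated data rather than anything from the post itself:

```r
# A minimal sketch (simulated data, not the post's): variance inflation
# factors flag predictors that are highly correlated with the others.
library(car)

set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.1)    # nearly a copy of x1
x3 <- rnorm(100)                   # independent predictor
y  <- x1 + x3 + rnorm(100)

m <- lm(y ~ x1 + x2 + x3)
vif(m)    # x1 and x2 show large VIFs; x3 sits near 1
```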

Continue reading