Besides normality of residuals and homogeneity of variance, one of the biggest assumptions of linear modeling is independence of predictors. If two or more of the predictors in a model are strongly correlated, then the model may produce unstable parameter estimates with highly inflated standard errors, which can result in an overall significant model with no significant predictors. In other words, bad news if your goal is to determine the contribution of each predictor in explaining the response. But there is hope!
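To make the "inflated standard errors" point concrete, here is a minimal sketch of the variance inflation factor (VIF) for the simplest case of two predictors, where VIF reduces to 1 / (1 - r²) and r is the Pearson correlation between them. It is written in plain Python rather than R, and the variable names and numbers (`depth`, `temperature`, `wave_exposure`) are invented purely for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def vif_two_predictors(x1, x2):
    """VIF when a model has exactly two predictors: 1 / (1 - r^2).

    With more predictors, r^2 would be replaced by the R^2 from
    regressing one predictor on all of the others.
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical data: temperature falls almost linearly with depth,
# while wave exposure is essentially unrelated to depth.
depth = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
temperature = [20.1, 18.9, 18.2, 16.8, 16.1, 15.0]
wave_exposure = [3.0, 1.0, 4.0, 1.0, 5.0, 2.0]

vif_collinear = vif_two_predictors(depth, temperature)
vif_independent = vif_two_predictors(depth, wave_exposure)
print(f"depth vs temperature VIF:   {vif_collinear:.1f}")
print(f"depth vs wave exposure VIF: {vif_independent:.2f}")
```

A VIF near 1 means the predictor's variance is barely inflated by its companions. A common rule of thumb treats values above roughly 5 to 10 as a warning sign, though that threshold is a convention and not a hard rule.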
I recently finished a two-week stint on beautiful Maria Island, a national park off the coast of Tasmania, to work up the data collected in the first round of the Reef Life Survey co-run by Drs. Graham Edgar and Rick Stuart-Smith at the Institute for Marine and Antarctic Studies at the University of Tasmania. The Reef Life Survey network is a non-profit entity whose aim is to “improve biodiversity conservation and the sustainable management of marine resources through the collection of high-quality biodiversity information at spatial and temporal scales beyond those possible by scientific dive teams.” They do this by utilizing a network of enthusiastic trained recreational divers who perform standardized transects at hard-substrate systems worldwide. Currently there are slightly over 1800 sites in the network from Tierra del Fuego (50° S) to Svalbard (nearly 80° N), although a good number of sites are concentrated in Australia where the network was trialed over the past few years.
Jeremy Fox, ecologist and blogger extraordinaire at Dynamic Ecology, has just made available a new publication called “Can blogging change how ecologists share ideas? In economics, it already has” for Ideas in Ecology and Evolution. He has shared the pre-print version (where else) on his blog, and you can check it out here: http://dynamicecology.files.wordpress.com/2012/11/4457-8353-1-ce-1.pdf
He makes a number of good points. Here is my admittedly biased take:
This is a bit of an aside to comment on a topic that is becoming increasingly relevant not only to my graduate career, but I think to ecology in general: large-scale collaborations. The days when an ecologist could just throw some tools and some PVC in a rucksack and head down to the ol’ rocky intertidal are slowly being usurped by complex and sophisticated national and international networks (even though we still use a heck of a lot of PVC).
While some might lament the death of the “backyard ecologist,” I for one welcome the change. Large-scale networks generally preserve the inference gained from local-scale experiments–after all, many are simply conglomerates of the same experiments done in a bunch of different locations–with the added benefit of being able to investigate the generality of patterns and processes in nature as a whole. And isn’t generality kind of the goal of all science? As humans, we would like to think that the natural world obeys some base set of laws that apply regardless of where you’re working or what you’re working with (even though on some dark and rainy days, I think that may not be the case).
For those (few?) of you who are marine benthic ecologists, you may be familiar with the size-fractionated abundance-to-biomass conversion equations proposed by Graham Edgar for benthic/epifaunal invertebrates. Basically, he derived a general equation relating epifaunal abundance to biomass (in mg AFDM), and biomass to rates of secondary production (in μg AFDM per day). For those of you working with small benthic invertebrates, these equations can cut out a lot of work in having to ash each species, with the added benefit of being able to preserve and retain all specimens. However, they can be a tad unwieldy to implement, unless of course, you’re dealing with R!
Often in ecological research, we are interested not only in comparing univariate descriptors of communities, like diversity (such as in my previous post), but also in how the constituent species — or the composition — changes from one community to the next.
One common tool to do this is non-metric multidimensional scaling, or NMDS. The goal of NMDS is to collapse information from multiple dimensions (e.g., from multiple communities, sites, etc.) into just a few, so that they can be visualized and interpreted. Unlike other ordination techniques that rely on (primarily Euclidean) distances, such as Principal Coordinates Analysis, NMDS uses only the rank order of the pairwise dissimilarities between samples, and thus is an extremely flexible technique that can accommodate a variety of different kinds of data.
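To show what "rank orders" means in practice, here is a small sketch of the first step only: computing pairwise dissimilarities between communities and converting them to ranks, which is the information an NMDS tries to preserve in its low-dimensional map. It is not a full NMDS. The choice of Bray-Curtis dissimilarity, the site names, and the abundance counts are all assumptions made for illustration, and it is written in plain Python rather than R.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical)."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator


def rank_pairs(communities):
    """Rank every pair of sites from most similar (rank 1) to least similar.

    Ties are not handled specially here; tied pairs simply keep the order
    in which they were generated.
    """
    dissimilarity = {
        (i, j): bray_curtis(communities[i], communities[j])
        for i, j in combinations(sorted(communities), 2)
    }
    ordered = sorted(dissimilarity, key=dissimilarity.get)
    return {pair: rank for rank, pair in enumerate(ordered, start=1)}


# Hypothetical abundances of four species at three sites.
communities = {
    "site_A": [10, 5, 0, 1],
    "site_B": [8, 6, 1, 0],
    "site_C": [0, 1, 12, 9],
}

ranks = rank_pairs(communities)
for pair, rank in sorted(ranks.items(), key=lambda item: item[1]):
    print(rank, pair)
```

Because only the ranks are carried forward, any transformation of the dissimilarities that keeps their order (a square root, say) would lead to the same ranking, which is part of why the method copes with so many kinds of data.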
For decades, biologists and ecologists have largely characterized biological diversity using metrics based on entropy, a concept rooted in information theory that suggests one can quantify the degree of uncertainty associated with predicting bits and pieces of information. In ecology, this has boiled down to determining whether species drawn from a community are the same or different. The metrics will sound familiar to anyone who has taken an introductory ecology class–the Shannon index, Simpson diversity–but Lou Jost, Anne Chao, and others have highlighted the fact that the non-linearity of these indices may lead researchers to grossly misinterpret the underlying diversity of the community in question.
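A quick way to see the non-linearity problem is to compare the raw indices with their "effective number of species" conversions, the approach Jost advocates: exp(H) for Shannon entropy and 1 / Σp² for Simpson. The sketch below is plain Python rather than R, and the two perfectly even communities are made-up inputs chosen so the arithmetic is easy to follow.

```python
import math


def proportions(abundances):
    """Relative abundances, skipping species with zero individuals."""
    total = sum(abundances)
    return [n / total for n in abundances if n > 0]


def shannon_entropy(abundances):
    """Shannon index H = -sum(p * ln p)."""
    return -sum(p * math.log(p) for p in proportions(abundances))


def gini_simpson(abundances):
    """Gini-Simpson index, 1 - sum(p^2)."""
    return 1.0 - sum(p ** 2 for p in proportions(abundances))


def effective_species_shannon(abundances):
    """Effective number of species from Shannon entropy: exp(H)."""
    return math.exp(shannon_entropy(abundances))


def effective_species_simpson(abundances):
    """Effective number of species from Simpson concentration: 1 / sum(p^2)."""
    return 1.0 / sum(p ** 2 for p in proportions(abundances))


# Hypothetical communities: 10 and 20 equally abundant species.
ten_species = [5] * 10
twenty_species = [5] * 20

for name, community in [("10 spp", ten_species), ("20 spp", twenty_species)]:
    print(
        name,
        f"H={shannon_entropy(community):.3f}",
        f"Gini-Simpson={gini_simpson(community):.3f}",
        f"exp(H)={effective_species_shannon(community):.1f}",
        f"1/sum(p^2)={effective_species_simpson(community):.1f}",
    )
```

Doubling the number of equally common species doubles both effective numbers, but the Shannon index rises by only about 30% and the Gini-Simpson index barely moves. Reading the raw indices as if they were linear measures of diversity would badly understate the difference between the two communities.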
I’ve decided to start a blog as a repository for all the interesting papers, statistics tips, R code, and other snippets of useful miscellany that I come across on a daily basis. Hopefully this blog will be of interest to others thinking about the same concepts and struggling with the same issues as I am.