Linear mixed effects models are a powerful technique for the analysis of ecological data, especially in the presence of nested or hierarchical variables. But unlike their purely fixed-effects cousins, they lack an obvious criterion to assess model fit.
[Updated October 13, 2015: Development of the R function has moved to my piecewiseSEM package, which can be found here under the function sem.model.fits.]
In the fixed-effects world, the coefficient of determination, better known as R2, is a useful and intuitive tool for describing the predictive capacity of your model: it's simply the proportion of variance in the response explained by all the predictors in your model.
In a least squares regression, R2 is 1 minus the ratio of the sum of squared differences between the observed and fitted values to the sum of squared deviations of each observation from the mean of the response:
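In symbols (reconstructed from the description above, with y_i the observed values, ŷ_i the fitted values, and ȳ the mean of the response):

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$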
For models estimated using maximum likelihood, this equation has been generalized as 1 minus the ratio of the likelihood of the intercept-only model to that of the full model, raised to the power 2/N:
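In symbols, with L_0 the likelihood of the intercept-only model, L_full the likelihood of the full model, and N the sample size (again reconstructed from the description):

$$R^2 = 1 - \left( \frac{L_0}{L_{full}} \right)^{2/N}$$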
Both equations can be interpreted identically.
R2 has a number of useful properties. It's independent of sample size, bounded between 0 and 1, and dimensionless, which makes it ideal for comparing fits across different datasets.
I should note, however, that it's a poor tool for model selection, since it almost always favors the most complex models. If the goal is to select among competing models, an information criterion approach (such as AIC or BIC) is preferred, since these indicators penalize for the number of predictors.
Unfortunately (and this is a common pitfall), the best model as determined by AIC is not synonymous with a good model. In other words, AIC provides a great test of relative model fit, but not absolute model fit. AIC values are also not comparable across models fit to different datasets. Thus, R2 and AIC both have their place in ecological statistics.
Often, researchers using mixed models report the R2 as simply the squared correlation between the fitted and observed values (see here), but this is a pseudo-R2 and is technically incorrect.
Why? Because a mixed effects model yields a variance associated with each random factor and the residual variance, so it's not entirely clear which to use when calculating the R2 (the approach above sidesteps this issue by calculating R2 relative to only the residual variance, which omits any structure in the data imposed by the random factor).
When hierarchical models with more than one level are considered, this problem is exacerbated, since we are now dealing with variances at multiple levels, plus the residual error. Some people have discussed calculating an R2 value for each level of the random factor, but this can lead to negative R2 values when the addition of predictors reduces the residual error while increasing the variance of the random component (or vice versa), even though the sum of the variance components remains unchanged.
So what have we done, historically speaking? We've compared AIC values for the full and null models, or calculated all manner of pseudo-R2s, and quietly (and, for some, perhaps ashamedly) gone about our business.
Which is why I was so happy to see a paper with the title “A general and simple method for obtaining R2 from generalized linear mixed-effects models” by Shinichi Nakagawa and Holger Schielzeth appear in my newsfeed. Basically, they have derived two easily interpretable values of R2 that address the above issues, specifically that of negative pseudo-R2s with more predictors, while still honoring the random structure of the data.
The first is called the marginal R2 and describes the proportion of variance explained by the fixed factor(s) alone:
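Written out (following the notation of Nakagawa & Schielzeth 2013, with $\sigma^2_f$ the fixed-effects variance, $\sigma^2_l$ the variance of the l-th of u random effects, $\sigma^2_e$ the additive dispersion variance, and $\sigma^2_d$ the distribution-specific variance):

$$R^2_{GLMM(m)} = \frac{\sigma^2_f}{\sigma^2_f + \sum_{l=1}^{u} \sigma^2_l + \sigma^2_e + \sigma^2_d}$$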
The fixed-effects variance is in the numerator. The denominator is the total variance explained by the model, including (in order): the fixed-effects variance, the random-effects variance (partitioned by level l), and the residual variance, which comprises the last two terms: the additive dispersion component (for non-normal models) and the distribution-specific variance.
The second is the conditional R2, which describes the proportion of variance explained by both the fixed and random factors:
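Using the same notation as above:

$$R^2_{GLMM(c)} = \frac{\sigma^2_f + \sum_{l=1}^{u} \sigma^2_l}{\sigma^2_f + \sum_{l=1}^{u} \sigma^2_l + \sigma^2_e + \sigma^2_d}$$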
In this case, the numerator contains both the variance of the fixed effects, as well as the sum of random variance components for each level l of the random factor. You’ll note the denominator is identical.
This approach was later modified by Johnson (2014) to include random slope models.
While this may not be the holy grail of fit statistics (and the authors concede this point as well), in my opinion it does address many of the problems of pseudo-R2s, and so it is going to be my preferred statistic.
The authors also note (and I echo) that looking at R2 and AIC values is not a replacement for testing basic model assumptions, such as normality of the residuals and homogeneity of variance; the two should be used in tandem.
I've written a function in R called sem.model.fits (along with an amiable fellow named Juan Sebastian Casallas) that calculates the marginal and conditional R2s, as well as AIC values for a complementary approach to model comparison. The input is either a single model or a list of models of the most common classes, including models from the lme4 and nlme packages. The output is a data.frame where each row is a model in the same order as in the input list, with the model class, marginal and conditional R2s, and (if the response is identical across all models) the AIC value as columns.
You can find an example of this function in action here: https://github.com/jslefche/piecewiseSEM/blob/master/README.md#get-r2-for-individual-models
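As a minimal usage sketch (assuming the function has been sourced or the piecewiseSEM package loaded; the models below use lme4's built-in sleepstudy data):

library(lme4)
# library(piecewiseSEM) # provides sem.model.fits
m1 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
m2 <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
# Returns a data.frame with one row per model: class, marginal and
# conditional R2s, and AIC (since both models share the same response)
sem.model.fits(list(m1, m2))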
I’ve since incorporated this function into my piecewiseSEM package for piecewise structural equation modeling. Since the function found here will no longer be actively developed by me, I recommend using the function in the package to minimize the potential for issues.
[Edit: Also check out this awesome tutorial for quantifying uncertainty around these estimates!]
References
Nakagawa, S., and H. Schielzeth. 2013. A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in Ecology and Evolution 4(2): 133-142. DOI: 10.1111/j.2041-210x.2012.00261.x
Johnson, P.C.D. 2014. Extension of Nakagawa & Schielzeth's R2GLMM to random slopes models. Methods in Ecology and Evolution. DOI: 10.1111/2041-210X.12225.
Awesome paper, I'm looking forward to reading it! By the way, any chance you'll write such code for PQL objects from the MASS package? Or, are marginal and conditional R^2 for PQL-estimated GLMMs still unavailable?
Nice looking function!
Just a heads-up – the development version of lme4 no longer uses the "mer" class (I believe it's changed to "merMod"), nor @ for accessing slots (although a quick look suggests that this doesn't matter – you don't use it). I'm not sure if attr names have changed or not though…
Hi again,
I just tried your function out on a list of lme objects and it broke for 3 reasons:
1) you didn't do the Rc.list part for lme models;
I copied the line from the mer part to fix this, but…
2) colSums only works if there is more than 1 random effect;
I added an ifelse statement to sort this, and…
3) you include AIC.list when you bind the results together but you never create it, so I removed that from the call.
Here's my edited version of the lme section for you. Hope you don't mind my tinkering with your code!
if(modclass[1]=="lme") { # For models fit using lme
# Get design matrix of fixed effects from model
Fmat.list=lapply(modlist,function(i) model.matrix(eval(i$call$fixed)[-2],i$data) )
# Get variance of fixed effects by multiplying coefficients by design matrix
VarF.list=lapply(seq_along(modlist),function(i) var(as.vector(fixef(modlist[[i]]) %*% t(Fmat.list[[i]]))) )
# Get variance of random effects by extracting variance components
VarComp.list=lapply(modlist,function(i) VarCorr(i) )
VarRand.list=lapply(VarComp.list,function(i) as.numeric(i[rownames(i)!="Residual" & rownames(i)=="(Intercept)","Variance"]) )
# Get residual variance
VarResid.list=lapply(VarComp.list,function(i) as.numeric(i[rownames(i)=="Residual","Variance"]) )
# Calculate marginal R-squared (fixed effects / total variance)
Rm.list=lapply(seq_along(modlist),function(i) VarF.list[[i]]/(VarF.list[[i]]+sum(VarRand.list[[i]])+VarResid.list[[i]]) )
# Calculate conditional R-squared ((fixed + random effects) / total variance);
# the ifelse sums the random-effect variances (a numeric vector here, so sum
# rather than colSums) when there is more than one random effect
Rc.list=lapply(seq_along(modlist),function(i) (VarF.list[[i]]+ifelse(length(VarRand.list[[i]])==1,VarRand.list[[i]][1],sum(VarRand.list[[i]])))/(VarF.list[[i]]+ifelse(length(VarRand.list[[i]])==1,VarRand.list[[i]][1],sum(VarRand.list[[i]]))+VarResid.list[[i]]) )
# Bind R^2s into a matrix and return
Rsquared.mat=do.call(cbind,list(Rm.list,Rc.list)); colnames(Rsquared.mat)=c("Marginal","Conditional")
return(Rsquared.mat)
}
Great, thanks so much for catching these errors! I think you can see where I was going with expanding the function to return AIC scores but since it requires ML vs REML estimation, the implementation was a little too lengthy for what I wanted to do here. Also, nice catch with the colSums: I went through this function with the example provided in the Nakagawa & Schielzeth supplements, which of course used multiple random effects.
Not a problem… your function actually came along at an ideal time for me! For the AIC scores you could hijack aictab in the AICcmodavg package. It produces a table of AIC (or AICc, QAIC, QAICc) values. It only provides the AIC for REML or ML, whichever was used in the fitting process, but using lapply and update you could get around this (this is for lme; lmer should be very similar, though):
REML.AIC <- aictab(mods, 1:8, second.ord=FALSE)[,"AIC"] # for AICc second.ord=TRUE and [,"AICc"]
ML.AIC <- aictab(lapply(mods, function(i) update(i, method="ML")), 1:8, second.ord=FALSE)[,"AIC"]
Although it might be sensible to do a logical test on whether the model was fit with ML or REML first.
HTH
When I add two crossed random factors in nlme, I get the following result:
# Marginal Conditional
#[1,] 0.5606305 NA
I think (based on the same model in lme4) that only the conditional R2 is given. Anyone have any idea how to solve this?
Try this again, I updated the function to better incorporate multiple random effects!
Thanks for writing this blog and a function based on our paper! Also, I note that the author of the MuMIn package has written a function based on our paper. By the way, my name is Shinichi Nakagawa (not Shinichia N.); it should be S. Nakagawa. Holger's name is correct (see http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210x.2012.00261.x/abstract).
I did notice the r.squaredGLMM function in the MuMIn package, but only after I wrote this post. At any rate, either function will work fine. Sorry about the reversal of names, it’s since been fixed!
An error using the lmer package / lme function:
> r.squared.lme(X)
Error in r.squared.lme(X) :
could not find function ".rsquared.glmm"
Can you help me please? Thanks!
Great code, thanks for sharing! In linear regression we get an r2 and a p-value:
"Multiple R-squared: 0.7239, Adjusted R-squared: 0.6319
F-statistic: 7.867 on 5 and 15 DF, p-value: 0.0008221"
Based on the r2 uncertainty (or another method), is it possible to estimate a p-value for the r2 of a mixed model? Thanks!
If I recall, the F-statistic is simply a test between the full and reduced (null) models. You can achieve the same test by fitting the null model (random effects only) and passing both models to anova():
library(lme4)
fm1=lmer(Reaction~Days+(Days|Subject),sleepstudy)
fm0=lmer(Reaction~(Days|Subject),sleepstudy)
anova(fm1,fm0)
Note that this is an analysis of deviance. Cheers, Jon
Excellent. I would appreciate if you could send me your pre-print on meta-analysis (Ecology paper).
Hi there,
Good job with the function.
I'm just getting started with mixed models. I'm trying to use the function, but I always get the following error:
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function 'VarCorr' for signature '"lme"'
What does this mean?
Hi dema,
Try updating your packages in R. Have you tried running the example code I provided? I just updated it so it's a little more straightforward.
Cheers!
Hi, it worked nicely now!
Thank you very much for the code and the help!
Cheers!
Hello,
Thanks very much for writing this function. I tried it out on some models I built using lmer(), but I keep getting this error:
> rsquared.lme(list(lmer.GPA.0a))
Error in as.vector(fixef(i) %*% t(i@X)) :
error in evaluating the argument 'x' in selecting a method for function 'as.vector': Error in UseMethod("fixef") :
no applicable method for 'fixef' applied to an object of class "mer"
Any idea what could be going wrong here? The model is not particularly complicated:
> lmer.GPA.0a <- lmer(GPAMean.2 ~ DLABScore.cent + (1|Class) + (1|Language), data=onlyesqdata.sub.novice)
Cheers,
Charles
Hi Charles,
Hmm, fixef should work with lmer. Try closing R and clearing your workspace, updating R and your packages, and re-running your code!
Cheers, Jon
I am getting a similar error message as Charles:
Error in as.vector(fixef(i) %*% t(i@X)) :
error in evaluating the argument 'x' in selecting a method for function 'as.vector': Error in fixef(i) :
error in evaluating the argument 'object' in selecting a method for function 'fixef': Error: object 'i' not found
I am new to making functions in R; did I go wrong somewhere in putting this function into R?
Hi Jon,
I managed to solve my last comment, but now when I run the function it gives me NA for the marginal and conditional R2. I am trying to run this on glmer models, but from reading the package documentation they should still fall into class mer. Any thoughts? I appreciate your reply.
Thanks,
Michelle
Hi Michelle, Are you using the development version of lme4? I wrote this using 0.999999-2. You may see the comment by @bulumakao for a fix for later versions. Cheers, Jon
Hi Jon,
I am quite a beginner at mixed-effects models, so my questions might seem a bit odd.
First, is this function able to take into account variables other than the intercept in the random effect? I mean something like: y~1+var+(1+var|grouping_var).
Another question is about the validity of decomposing the random effects' share of the variance: that is, instead of summing the columns of VarRand, doing something like varRand/(varF+colSums(varRand)+varRes) to get an estimate of the explained variance per random effect?
Thanks.
Cheers,
basile
Hi basile,
This method should be able to account for varying slope/varying intercept models.
As for your second question, I think what you’re looking for are the variance components. They should be given in the summary table for the models. In this scenario, one is typically interested in the fixed effects, so determining the proportion of variance attributed to the random effects may not be very informative in terms of inference. Did you have a specific application in mind?
Cheers, Jon
Hi,
Thanks for making this function available.
I’m also asking about how the function deals with random intercept+slope models, and other models where the dimension of the covariance matrix is >1.
I’m fitting a Gaussian GLMM using lme4 so the line in the function that sums the random effect variances is:
# Get variance of random effects by extracting variance components
VarRand=colSums(do.call(rbind,lapply(VarCorr(i),function(j) j[1])))
This part…
lapply(VarCorr(i),function(j) j[1])
…extracts the variance of each random effect, but where j is a covariance matrix (as with a random slope+intercept model) j[1] extracts only the [1,1] element, which for a random slope+intercepts model will be the variance of the random intercept, omitting the slope variance and 2 x the covariance. I don’t think this can be right. You can illustrate this by simply adding a constant to the x variable in (x | group), or changing the reference category when x is a factor. Rm and Rc will change (but shouldn’t) because each term in the covariance matrix depends on where the y-axis crosses the x-axis, while the sum of the covariance matrix is independent of this choice. To include all the random effect variances by summing the entire covariance matrix of each RF you could try:
VarRand=sum(sapply(VarCorr(i),function(j) sum(j)))
I noticed that r.squaredGLMM in the MuMIn package sums the RF variances in the same way so I’m outvoted on this, but I’m still fairly confident that the entire covariance matrix is required.
PS sum(sapply(VarCorr(i),function(j) sum(j))) is of course unnecessarily long. Better is
sum(sapply(VarCorr(lm0.randslopeplot),sum))
or
sum(unlist(VarCorr(lm0.randslopeplot)))
Sorry, that should be:
sum(sapply(VarCorr(i),sum))
or
sum(unlist(VarCorr(i)))
Hi Jon – I’ve changed my mind… My proposed solution above is wrong, i.e. it doesn’t make the method work for random slopes models. The entire covariance matrix is required, but so is the model matrix. The correct random effects variance component for each random effect is mean(diag(X %*% Sigma %*% t(X))), where X is the model matrix for the random effects formula, including a column of 1s for the random intercept and a data column for each random slope, and Sigma is the random effect covariance matrix, with dimensions corresponding to the columns of X. The diag(…) part calculates the random effect variances for each observation in the data set. When there are random slopes, these observation-specific variances will differ, hence the need to average them. For a random intercepts model there will be no inter-observation variation, and each element of diag(…) will be the random intercept variance, so the code will return the same result as the original method.
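To make that concrete, here is a minimal sketch of this calculation in R, assuming a fitted lmer random-slopes model with a single grouping factor (the model and data are purely illustrative, using lme4's built-in sleepstudy):

library(lme4)
m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
Sigma <- VarCorr(m)$Subject                   # random-effect covariance matrix (2 x 2)
X <- model.matrix(~ Days, data = sleepstudy)  # column of 1s plus the random-slope covariate
# Per-observation random-effect variances, averaged over all observations
VarRand <- mean(diag(X %*% Sigma %*% t(X)))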
Hi Jon,
I was asking because, in the case of a slope in the random effect, the following line only takes the intercept value of the VarCorr output:
VarRand.list=lapply(modlist,function(i) do.call(rbind,lapply(VarCorr(i),function(j) j[1])) )
VarCorr output is:
$`site_id:subject_id`
(Intercept) from_bl
(Intercept) 0.005091415 -0.0018503245
from_bl -0.001850324 0.0006724458
attr(,"stddev")
(Intercept) from_bl
0.07135415 0.02593156
attr(,"correlation")
(Intercept) from_bl
(Intercept) 1 -1
from_bl -1 1
$site_id
(Intercept)
(Intercept) 3.198851e-05
attr(,"stddev")
(Intercept)
0.005655839
attr(,"correlation")
(Intercept)
(Intercept) 1
attr(,"sc")
[1] 0.07231825
My goal is to evaluate how much variance is explained by the fixed effects (dx groups, …), the random effects (subjects and sites), and the residuals, to show how small the fixed effects (and also the inter-site random effect) are relative to inter-subject variability and unexplained variance.
I have also looked at LMERConvenienceFunctions::pamer.fnc ( http://www.inside-r.org/node/78388 ), which outputs an R_squared per fixed effect, but it seems that the sum of these is not equal to the marginal value.
Thanks for your insight.
Cheers.
basile
Hi Basile, Why not use an AIC-based model selection procedure? Build models with your fixed effects, your random effects, and both fixed and random effects, and compare them to get a sense of how much explanatory power each component has. I interpret the R^2 method above as a way to evaluate the fit of a model that has already been vetted. What do you think? Cheers, Jon
Hello Jon,
Thanks for putting together this very useful code! Any idea why I get the below error, whether I use your example or my own data? I’ve tried updating packages, restarting R, and clearing the workspace.
[1] "Error: Function requires list of objects of class 'mer' or 'lme'"
Thanks,
Kristine
Hi Kristine, Are all your models built using lmer (or lme)? If you have just a regular lm, this function will kick back that error! Cheers, Jon
Yes, I'm using lmer, and I wasn't successful [after loading the required packages] even with your example code. Thanks for your help.
Hey Kristine, If you're using the development version of lme4 from the rforge repository, your model will have class merMod (I think there might be a capital letter somewhere). You could check str(model) to find out if your model is actually of the mer class or not.
Thanks for the input. My model is in the mer class, so I’m not sure what wasn’t working, but I got it to run without errors this morning! Maybe restarting R and my machine was the key. I plan to use this code regularly, so thanks again for posting the code.
In case anyone else encounters this issue, I figured out the issue. I had loaded the package lmerTest, which is not compatible with the rsquared.lme function, since lmerTest masks the lmer object from lme4.
Good to know! I just started messing with lmerTest…good for the migrant SAS user, but will probably get some flak from the R community.
Hi, I am running mixed models and have tried to calculate the R2 using your function. I only load the lme4 package and still get the following message:
Error in as.vector(fixef(i) %*% t(i@X)) :
error in evaluating the argument 'x' in selecting a method for function 'as.vector': Error in UseMethod("fixef") :
no applicable method for 'fixef' applied to an object of class "mer"
Any ideas how to overcome this problem?
Like Kristine, I would like to use this function frequently with several more models.
In fact, when I run the example test in either R or RStudio I get the same message. I checked, and my models are both "mer" objects.
Thank you very much,
Hi Ivan, Try the function again, I just updated it to address some of the issues raised in the comments and added a few more useful bits!
Hi,
Awesome, we can finally assess model fit for mixed models, so thanks again.
For me, the lme4 version of the function is still not working. It works with the nlme package but not when I use lme4; I get the same messages when I run the examples written after the function. It works for nlme objects but not for lme4 ones. If it still does not work, I could always re-write my models using nlme, but that would take a while since I have a lot of them, and having a function that works with lme4 would be just perfect.
Hi Ivan,
The best thing to do is to start a fresh session. Sometimes when you load nlme after lme4 some of the required functions are masked and no longer recognise mer objects. Could that be the problem?
Thanks a lot for the script, Mr. Lefcheck! A referee asked me for a variance partition of several nlme analyses and I didn't know how to obtain the variance explained by the fixed factors.
Great function, thank you! I was trying to run the function on a model involving a subset of my original data. As it turns out, you must drop the unused factor levels from your subset with the lapply function like this (where "D" is a category within the variable "type"):
> new.set new.set[ ]<-lapply(new.set, function(type) if (is.factor(type)) type[drop=T] else type)
In hopes this saves some folks the few hours of hair-pulling it caused me!
Somehow my code got messed up upon posting…
Should be:
new.set<-subset(data,type=="D")
new.set[ ]<-lapply(new.set,function(type) if(is.factor(type)) type[drop=T] else type)
Hmm, interesting. It appears to be more of an issue of the way you subset your data than with the function, though. Good to know, regardless–thanks for the tip!
Copied the code above yesterday, lme4 didn't work but lmer did.
Here are 2 modifications for models fit using lme4 version 0.99999911-5
1) add class(i)=="lmerMod" to the list
2) change VarF=var(as.vector(fixef(i) %*% t(i@X)))
to VarF=var(as.vector(fixef(i) %*% t(i@pp$X)))
Then tested it on the example code, both lme4 and lmer models give the same results.
Cheers, SLT.
Thanks for passing this along. Looks like you’re using a dev version from their repo (or did you compile from Github?). At any rate, it appears lme4 version 1.0 is set to be released soon. I will probably update the function when that goes live as it appears they’ve changed not only the name of the class but also the S4 slots. Cheers, Jon
Ran into the same problem using lme4 version 1.0-4 (submitted to CRAN). I suggest, instead of adding lmerMod to the list, changing class(i)=="merMod" to inherits(i, "merMod"), as lmerMod is a subclass of merMod.
Just grabbed lme4 version 1.0-4 and confirmed. I've updated the function to reflect this! Looks like lmerTest no longer works with lme4 1.0+.
lmerTest works with lme4 1.0+. It is on CRAN now.
I had found the Nakagawa and Schielzeth paper and then your code, and thought: aha, my solution! Unfortunately, I get results but they don't make sense.
I am trying to calculate a 'local' effect size (ES) measure, and I need it to be standardised so that it can be compared across dependent variables and experiments. I found a paper by Selya et al. (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3328081/) which proposed a Cohen's f2 ((R2 for a treatment model - R2 for a null model)/(1 - R2 for a treatment model)). So I wanted to use your code to calculate the R2 for these models to then calculate the Cohen's f2. My data are analysed in a mixed model framework with one random effect and two to three fixed effects, but I am only interested in the ES associated with one of the fixed effects. I am using the nlme library and R version 3.0.1.
What's the problem with the results? Well, I know I have a highly significant treatment effect, and yet there is no difference in the R2 between the treatment and null models.
> LM_R2_null
Class Marginal Conditional AIC
1 lme 0.5510203 0.64475 1865.515
> LM_R2_genotype=rsquared.lme(list(leanMass_genotype_Model))
> LM_R2_genotype
Class Marginal Conditional AIC
1 lme 0.5648557 0.6442183 1855.959
> Cohenf=(LM_R2_genotype$Conditional-LM_R2_null$Conditional)/(1-LM_R2_genotype$Conditional)
> Cohenf
[1] -0.00149466
I wondered if it was that the random effect, in the absence of the treatment variable, is accounting for more of the variance. However, this sort of goes against my whole testing of the treatment effect by anova(model_treatment, model_null). Any thoughts would be greatly appreciated.
Best
Natasha
Hi Natasha, I suspect the trouble may come from the way you are specifying your null model. It should just be the random intercept. Shoot me an email if you need more specific help. Cheers, Jon
Hi Jon,
Thanks for your thoughts. You are correct that my null model isn't an intercept-only model; however, this was deliberate, with the goal of calculating Cohen's f2 in mind.
In the manuscript (link provided in earlier post), it suggests the use of a Cohen’s f to estimate the local effect size for a MM situation as follows:
"f2 = (R2AB - R2A)/(1 - R2AB) [2]
where B is the variable of interest (i.e., either smoking quantity or nicotine dependence score), A is the set of all other variables (i.e., gender and, depending on what B is at the moment, nicotine dependence score or smoking quantity), R2AB is the proportion of variance accounted for by A and B together (relative to a model with no regressors), and R2A is the proportion of variance accounted for by A (relative to a model with no regressors). Thus, the numerator of (2) reflects the proportion of variance uniquely accounted for by B, over and above that of all other variables"
So you can see that the R2A model (which I called the null model) isn't an intercept-only model; rather, it just excludes the fixed effect of interest. Now, I know the data, and whilst the remaining two fixed effects contribute largely to the variation, the treatment effect of interest is highly significant and should account for a significant proportion of the variance. So I would expect R2A to be lower than R2AB.
So it suggests that either my main effect is minor, or the R2 is not a sensitive enough measure of the main effect's influence, or… any ideas?
Best wishes
Natasha
Thanks for putting together and sharing your code. It’s really valuable to have people around sharing their knowledge!
I'm using generalized linear mixed models (family = poisson), and I saw in your most recent version of the code that "GLMM fit to Poisson distribution not yet supported". Why is that? Is it something to do with the glmer function or with your code?
Right now, I'm using an old version of the lme4 package, where I use the function lmer() instead of glmer(), because the new one gives me the error "Error in `[[<-.data.frame`(`*tmp*`, i, value = integer(0)) : replacement has 0 rows, data has 4211". I found somewhere on the internet that this is related to a bug in the new version of lme4. In this old version I can fit my data with a Poisson error just fine.
I was wondering if you would recommend adapting your code using the commented section of the code for glmer (family=poisson). Do you have a better option?
Many thanks in advance for your help!
natalia-
Hi Natalia, Thanks for checking out the function. If you refer to the Nakagawa & Schielzeth paper on which this is based (link in post) you will see the calculations for generalized linear mixed effects models (binomial and poisson) are a little different than for general linear mixed models. The problem is that if you use my function on, say, a Poisson model, R will still generate a value, but that value will be incorrect. So best to steer clear for the time being, although I plan on taking a little closer look at this in the near future! Cheers, Jon
Hi Natalia, I updated the function this morning. It should work for Poisson models now, but I’m still not sure if I’m constructing the null model correctly. If you encounter any errors, let me know. Cheers, Jon
Hi Jon, Thank you sooo much! I really appreciate your help. I’ll keep you posted about how it works. Cheers. Natalia
Hi Jon, Sorry to bother you again. The residual variance I get is NA [attr(VarCorr(i),"sc") is NA]. I tried to find out why on the web, but couldn't find anything. Do you have any idea?
Thanks again!
cheers – n
Hi Jon,
We have applied this code as part of a mixed model analysis to 800 datasets, and the results are not quite what we expected. We used the conditional R2, but I was wondering why you provided both the marginal and conditional R2? In what situation would you want to use the marginal R2, which considers just the fixed factors alone?
Hi Natasha, I recommend reading the original paper (Nakagawa & Schielzeth, reference listed at the end of the post). They recommend reporting both the conditional and marginal R^2, which have slightly different interpretations. They also provide a worked example that you should find helpful. Send me an email if you are still having issues. Cheers, Jon
Jon,
Wanted to thank you very much for creating this function; it's extremely helpful to have these new methods readily available. I'm glad that you added capabilities for the Poisson, but I seem to be getting strange output. For a single model with an interaction between two variables and an extra covariate (a total of 3 variables and 4 coefficients in the glmer model), I get three different R^2 values when I use the rsquared.glmm function. I imagine that it's having trouble creating a single null model, if I'm not mistaken. Any advice you could give would be well appreciated.
By three different R^2 values, I mean 3 different lines, each with a separate marginal and conditional R^2, as well as an accompanying AIC.
I knew this would be an issue. Extracting the fixed effects from the full model is more difficult than I anticipated, as `fixef` has some weird quirks. I will work on it and post an update when I’ve found a solution. To StackOverflow! Thanks, Jon
Done and done. Check the model now!
I’ve been playing with your code this morning but it’s failing for me on objects from nlme(). I ran your example code:
> detach(package:lme4,unload=T) #Parts of this package conflict with lme4
> require(nlme)
> mod0=lm(y~fixed1,data)
> mod1=lme(y~fixed1,random=~1|rand2/rand1,data)
> mod2=lme(y~fixed1+fixed2,random=~1|rand2/rand1,data)
> rsquared.glmm(list(mod0,mod1,mod2))
Error in if (class(i) == "glmerMod" & summary(i)$family == "binomial") { :
argument is of length zero
>
Hi Robert, They keep switching up model classes. I ran into a similar error for lmer models built in lmerTest. I’m planning on looking into it soon but in the interim, you can extract the relevant error structure from the function (e.g., normal, binomial, etc.) and run the code by hand! Cheers, Jon
Had the same issue yesterday and used:
rsquared.lme(list(mod0,mod1,mod2))
*as opposed to rsquared.glmm, albeit with the code from before the most recent update… so I'm not sure if it works with the current version… Jon? I have the code if you need it…
Dear Jon
Thanks for your code! Very useful. I used it for an LMM and it worked out fine. I am new to mixed models and I am not sure how to interpret getting the same values for the marginal and conditional R2 in one model. I gather that means there is not much variability in the random effect, but I am not sure if there are further implications. I have repeated measures in my experiment, which is why I am using mixed models (my model has two fixed effects and one random effect, with a random intercept). Any advice would be more than appreciated.
Thank you very much!
Hi Bea, Check out the paper (linked at the bottom of the post) for an interpretation of the two types of R2. Also check the variance components of the random effect in the model output (should be the top item, including the residual error for the model). If this is low, you may consider dropping the random term and comparing the full vs. reduced model using something like AIC. It may be that there is not as much autocorrelation as you thought. You can also test using something like Moran's I to better decide whether to retain the random term in your model. Cheers, Jon
Thank you very much Jon! I really appreciate your advice!
Cheers
Hi Jon,
Thanks for accepting my changes in the code! I think the link to the github repo got lost in the blog post update, do you mind posting it again?
Thanks!
Right you are, fixed!
Thanks again for helping with the function–it’s much improved and works great!
Hello,
and thank you so much for sharing this – it really saved me! I first tried the examples from the paper, but they wouldn't work for me (and I was going nuts trying to find out what I did wrong), and then I found your script, and now I am able to get the R2 values for my analysis, which is awesome!
So, how do I properly cite your script?
Cheers,
Angela
Hi Angela, I think it’s sufficient to simply cite the original publication. Happy to have helped! Cheers, Jon
Alright, I planned on doing that anyway, but thought you should get some more credit for this.
Would you mind answering another question I have on the calculation of R2 for mixed models? The thing is, one of the reviewers of my paper actually asked me if I could provide not only R2 values for my mixed models, but partial ones for each factor. Now, I am wondering if this is actually a reasonable request. With all the reading I've done, I feel like even the calculation of the R2 values is itself still at a developmental stage for these models, and I don't think it would be wise to try and further split them up.
So, do you have any thoughts about this?
Cheers,
Angela
Hi Angela, You can gain a sense for the contribution of each of the fixed factors to the explained variance by examining the variance components, or better yet, simply reporting parameter estimates. An alternative would be to perform some kind of model selection to identify the predictors that are not contributing significantly to the overall model. Hope that helps! Jon
Wow, thank you Sir, you just made my day! This function yields the same results as I found, only it will save me a lot of time, and it works like a charm! Great work, and a nice paper by Nakagawa et al., by the way.
Greetings from the Netherlands
Hi Jon,
Thanks for posting this, it has been extremely helpful. Any pointers/ideas on how to deal with negative binomial distributions?
Thanks
Safra
Hey,
Did you get any pointers in the end? I am experiencing the same problem and can’t seem to find an answer. I would be very interesting in how you managed to compute (if you did so) your R2s with a negative binomial distribution.
Thanks so much for any input!
Gabrielle
Hi Gabrielle, Not yet…but I opened an issue on GitHub and will hopefully get to this soon. Cheers, Jon
Hi Jon
We are using your function on a multi-lake productivity and nutrient concentration data set. Some of the data are from multiple years from the same lake, while some lakes are only represented once. We are using mixed effect models (lme) to remove the pseudoreplication caused by using the same lake over multiple years. We are finding that the conditional R-squared is up to two to four times the marginal value (one extreme case had a jump from 0.15 to 0.82!). This is making us wonder if we are overlooking something in our data. Have you heard of anybody else having a similar experience?
We also find that the AIC from your script is different from the one provided in the lme output. Any thoughts on the discrepancy?
Many thanks!
Paul
Hey Paul, regarding the AIC discrepancy, note that the r.squared.lme function updates the model's method to "ML" (instead of "REML") to calculate the AIC, by doing:
AIC(update(mdl, method="ML"))
HTH,
Juan Sebastian
It is possible, particularly if the lakes are highly variable both within and between years. You should consider specifying the correlation structure using lme, as you have a repeated measures design. Also, Juan is correct in stating that the AIC score is calculated using ML, while models for R2 are estimated using REML.
Hi Jon,
I’ve just discovered your function and I’m quite excited to be able to use it. I have a question though. Imagine a model like this:
m1 <- lmer(y ~ x + (x | subject))
Now, x is a categorical variable here. If I fit the model using treatment contrasts or sum contrasts for x, the deviance, AIC, etc. are always the same. But in general the variance of the random slope(s) will be larger for the treatment contrast version of the model.
The result is that the R-squared function (both marginal and conditional) comes out slightly different for the two parametrizations. This seems unfortunate and I don't know what to do.
Any advice?
Thanks a lot,
Daniel
Oh, also I think in general the variance of the random intercept will also be larger for the treatment contrasts than the sum contrasts. I would be happy to provide a real example if you would like.
Yes, the r.squaredGLMM function in the current version of MuMIn (1.9.13) uses the original N&S formulae, so is just for random intercepts models, but Kamil tells me that the next version (1.10.0) will incorporate the code above [mean(diag(X %*% Sigma %*% t(X)))] so will work for random slopes models as well. It will also work for reparameterisations of random slopes models e.g. where inter-individual variance depends on x (typically a categorical x), e.g. random regression model applied to animal behaviour.
Kamil Barton said the version of MuMIn on R-Forge incorporates the new code for random slopes, but it doesn’t seem to – I’m getting the same results as the old version.
Could you help me in terms of incorporating this: mean(diag(X %*% Sigma %*% t(X)))
into your code: VarRand <- colSums(do.call(rbind, lapply(lme4::VarCorr(mdl), function(x) x[1])))
I see where to get the X matrix but the Sigma is not something I see as part of the model structure.
Thanks!
Dan
Hi Dan,
MuMIn 1.9.26, which is the latest development version (currently) on
http://r-forge.r-project.org/R/?group_id=346,
should work for random slopes, so I’d be surprised if it gave the same results as the version on CRAN (1.9.13) for a random slopes model fit. 1.9.26 is the version that should (currently) install following
install.packages("MuMIn", repos="http://R-Forge.R-project.org", type="source"). Check installed.packages()["MuMIn","Version"] after installing.
Alternatively, you can calculate the random effects variance directly using mean(diag(X %*% Sigma %*% t(X))). Sigma is returned by VarCorr(m1)$subject for your model. For models with >1 random effect level, the list returned by VarCorr will have length >1, so the total random effect variance will need to be summed across this list. The colSums code that you quoted doesn't work for random slopes models and should be replaced by sapply-ing and summing over mean(diag(X %*% Sigma %*% t(X))), which works for both. I haven't given generally usable code because in some cases the X matrix needs to be formed with care (as Kamil has done in MuMIn >= 1.9.26).
Good luck,
Paul
Hi again Paul,
Thanks for all your help.
I am not sure what's going on, because I do get the same results, as you'll see here:
> installed.packages()["MuMIn", "Version"]
[1] "1.9.13"
> r.squaredGLMM(cefr.0)
R2m R2c
0.04337719 0.60337371
> installed.packages()["MuMIn", "Version"]
[1] "1.9.26"
> r.squaredGLMM(cefr.0)
R2m R2c
0.04337719 0.60337371
I’ll keep trying, I guess.
Hi Dan – I emailed you to suggest resolving this over email, then summarising the resolution on this thread.
Paul
Hi Daniel,
It looks like your problem is caused by your random slope. The Nakagawa & Schielzeth method is intended for random intercepts models but not random slopes models. If you try to use it on a random slopes model you’ll find that the results will depend on arbitrary covariate choices such as the reference category of a categorical variable or covariate centering.
I can think of three solutions:
(1) Estimate the random effect variance component from the random slopes model manually using the solution I posted above (after a few erroneous suggestions) https://jonlefcheck.net/2013/03/13/r2-for-linear-mixed-effects-models/comment-page-1/#comment-1314
(2) Wait for the version >=1.10.0 of Kamil Bartoń’s excellent MuMIn package, which will have an updated r.squaredGLMM function that will work for both random intercepts and random slopes models. I think Kamil intends this to be within weeks but you’d need to check with him.
(3) Follow N&S's advice and remove the random slope: "we recommend calculating R2GLMM (both marginal and conditional) from corresponding random intercept models for random-slope models". If your design is balanced (similar numbers and covariate distribution in each subject), then the approximation to the marginal R^2GLMM should be good. How close the conditional R^2GLMM is to the true random slopes value will depend on how much additional residual variance is absorbed by fitting the random slopes.
Good luck,
Paul
Thanks for the very helpful reply! All three of your options sound like good possibilities. Does the MuMIn package calculate the R^2 using the same theory as in Nakagawa & Schielzeth?
Dan
Hello,
I just came back to this, and now I am wondering how to extract the variance components for my fixed and random factors from your script.
I understand that your code is calculating them (the VarF, VarRand and VarResid, right?), but I don't understand how to extract them for the individual factors. I tried all the things I could think of based on my meager R knowledge, but sadly nothing worked. Would you mind telling me how to do this?
Angela
Excellent work, thank you very much. Good luck for all further projects!
Hi Jon, and thank you very much for this very interesting article.
I would like to know how it is possible to obtain the partial R2 for every fixed factor in the model when you run a mixed model. I used to report partial R2 in classic ANOVA, and I would like to do this with a mixed model analysis. Could you help me?
Thank you very much.
Hi Sebastien, Check out the procedure in Legendre & Legendre (1998), pages 521-522. It makes me a little wary, since these are technically pseudo-R2s and you are doing a stepwise procedure, which has its own set of criticisms. Cheers, Jon
Thank you very much, Jon, for your answer. I'll try it and let you know…
Sébastien.
Hi,
In order to be more specific about my last post, I have tried two solutions:
– r.squaredGLMM(list(model1,model2))
In this case R returns an error message: Error in UseMethod("r.squaredGLMM") :
no applicable method for 'r.squaredGLMM' applied to an object of class "list"
– rsquared.glmm(list((model1,model2))
In this case, I also get an error message indicating that it's not possible to find the "rsquared.glmm" function. I have searched for this function, but without success…
Thank you very much.
Hello Sebastien. First, make sure you are using the latest code from GitHub (you will need to use the function install_github in the devtools package, or copy and paste the raw code yourself). The function is rsquared.glmm, and you have an extra parenthesis in your second bit of code. Drop me an email if it still doesn’t work. Cheers, Jon
Hi Sébastien,
`r.squaredGLMM` within the MuMIn package doesn't work with lists of models. If you want to use it as such, you would need to do something like `lapply(list(model1,model2), r.squaredGLMM)`. Otherwise, if you want to use Jon's function, you can call `devtools::source_url("https://raw.githubusercontent.com/jslefche/rsquared.glmer/master/rsquaredglmm.R")`, or equivalently, copy and paste the code within this blog post into your R terminal.
Good luck
Thank you very much, Juan Sebastian. I copied and pasted the function and it works very well!
However, it gives me the marginal and conditional R2 for different models, and what I want is the marginal and conditional R2 for every factor… Jon has given me a reference (Legendre & Legendre, 1998); I hope I will find a solution…
Sébastien
Hi,
I am wondering if there is a way to calculate the R2 from a GLMM run with glmmADMB with a negative binomial distribution.
Thank you,
Diego
If I recall, we do not support glmmADMB. Try using glmmPQL in the MASS package; that should work and allow you to fit quasi distributions (if that's your goal). Cheers, Jon
Thank you for your quick reply!
As you suggested, I fit the model with glmmPQL with a negative binomial distribution, but when I try to calculate the R2 I get the following message: "Error in eval(expr, envir, enclos) : could not find function "ML"". I do not understand why, because glmmPQL uses ML by default. Do you have any idea what the problem could be?
Thank you very much,
Diego
Thanks for this post. One question: for the Gaussian case, instead of calculating var(fixed) using the design matrix, can we instead calculate it by first calculating the total variance of the response (var(y)) and subtracting from it both var(random) and var(residual)? I.e., var(fixed) = var(total) - var(random) - var(residual)? It seems easier, since the total variance is easy to calculate.
Hi Jon, Thanks again for the provided code. I am, however, experiencing some issues when comparing the R^2 values of my random slope binomial GLMMs calculated from your code and from the code in the MuMIn package. While the marginal R^2 values are consistent between your code and the MuMIn package, the conditional R^2 is lower using your code compared to the MuMIn package. Any suggestions as to how this is possible? Thanks for any clarification. Maurits
I got the answer myself; it appears that the different outcomes are due to the inclusion of a dispersion parameter in my model, introduced as a random effect.
This is a very useful function. Thanks!
I am just wondering whether it is possible to use this function on a programmatically generated list of models. Here is what I tried:
> mod.func vars mods mod.fit.stats <- rsquared.glmm(mods)
Error in paste("outcome.var", "~", var.name) :
object 'var.name' not found
I’m not sure why the code in my previous post got messed up, but I will try it one more time:
mod.func <- function(var.name) {
lme(as.formula(paste("outcome.var", "~", var.name)),
~ 1| group, data = dat, method = "ML")
}
vars <- names(dat[ , setdiff(names(dat), c("group", "outcome.var"))])
mods <- lapply(vars, mod.func)
mod.fit.stats <- rsquared.glmm(mods)
Error in paste("outcome.var", "~", var.name) :
object 'var.name' not found
Actually, I found a solution for this problem. If I use do.call() in the function definition, then it works:
mod.func <- function(var.name) {
formula <- as.formula(paste("outcome.var", "~", var.name))
do.call(lme,
list(fixed = formula,
random = ~ 1| group,
data = dat,
method = "ML"))
}
vars <- names(dat[ , setdiff(names(dat), c("group", "outcome.var"))])
mods <- lapply(vars, mod.func)
mod.fit.stats <- rsquared.glmm(mods)
Hi! Thanks so much for the code!
With one of my models I get this error:
>
Error in X[, rownames(Sigma)] : subscript out of bounds
Do you know why I would get this error? The code works for other versions of this same model, with the same data; it just doesn't seem to like this particular model (random intercept and slope correlated, intercept only).
Thanks!
Hi Amanda, What package are you using to create your models? Can you share the code for the specific model? Does the model run without any errors? Cheers, Jon
Hi Jon! Thanks for your reply!
I am using lme4.
Here is some info about the model. I am super new at this, so it is entirely possible that I have made a mistake somewhere. I am like a newly licensed driver behind the wheel! I have tried starting over in R and also downloading your script from GitHub to make sure I have the latest version.
lmer.best= lmer(percent.combined ~1+(Sum.of.basal.area |site), data=se.size, REML=T)
Linear mixed model fit by REML ['lmerMod']
Formula: percent.combined ~ 1 + (Sum.of.basal.area | site)
Data: se.size
REML criterion at convergence: 193.2
Scaled residuals:
Min 1Q Median 3Q Max
-1.76774 -0.44322 0.09153 0.46918 1.59555
Random effects:
Groups Name Variance Std.Dev. Corr
site (Intercept) 28338.243 168.340
Sum.of.basal.area 33.335 5.774 -1.00
Residual 3.855 1.963
Number of obs: 36, groups: site, 6
Fixed effects:
Estimate Std. Error t value
(Intercept) 71.352 3.611 19.76
Thanks in advance for any observations!
I suspect this is failing because you have no fixed effects in your model. To assess model 'fit', I would recommend examining the variance components using varcomp(), or comparing to the fixed + random model using AIC (make sure to change your estimator to ML). Cheers, Jon
A colleague forwarded me this this morning, which might also be useful: http://www.maths.bath.ac.uk/~jjf23/mixchange/rbd.html, and https://github.com/fabian-s/RLRsim
Awesome post and piece of R code. Thanks a lot for sharing this.
Hi Jon, thanks very much for taking the time to post this.
I'm trying to calculate R^2 for a negative binomial GLMM using lme4 and the negative.binomial family provided by MASS. I was looking to extend your code to support this, but I'm having trouble determining how to calculate the distribution-specific variance. Any idea where I should look for that kind of information?
Thanks,
Chris
Hey,
I am experiencing much the same issue here (a GLMM with a negative binomial distribution) and can't seem to find the answer to my problem. Did you figure out how to compute conditional and marginal R2 for your models? It doesn't seem to be a very common issue…
Thanks so much for any input!
Gabrielle
Thanks very much for this function. Is there a memory limit on how large a data set it can operate on? I ask because I have a training (~185k observations) and a testing (~125k observations) data set that glmer is able to fit just fine, but which generate the following error when I use your function:
Error in diag(Z %*% Sigma %*% t(Z)) :
error in evaluating the argument 'x' in selecting a method for function 'diag': Error: cannot allocate vector of size 707.7 Gb
Calculating the AIC by itself is not an issue for these merMods, so I'm curious why this function would be running out of memory. Any thoughts?
It's because the code inside diag(…) is very wasteful of memory. Kamil Bartón has recently implemented an efficient version of this step in r.squaredGLMM in his MuMIn package. I'm pretty sure this version is on CRAN; otherwise it'll be in the development version on R-Forge. I would paste the improved code line here, but I don't have access to it right now. It's in one of the hidden functions behind r.squaredGLMM, if you know how to dig around in these.
PS just to clarify, that memory-wasting line of code is my responsibility, not Jon’s. I should get an opportunity to post Kamil’s improved version over the next few days.
Here’s Kamil Bartoń’s memory-efficient fix for this problem. Replace
diag(Z %*% Sigma %*% t(Z))
with
rowSums((Z %*% Sigma) * Z)
The problem with the original code is that it creates an n X n matrix, of which only the diagonal is used. The new code calculates the diagonal directly, without wasting memory on storing the rest of the matrix.
I’ve checked and this fix is implemented in the latest version of MuMIn on CRAN (version 1.12.1).
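As a quick sanity check that the two expressions agree, here is an illustrative snippet with stand-in matrices (in practice, Z and Sigma would come from the fitted model):

set.seed(1)
Z <- matrix(rnorm(20), nrow = 10)        # stand-in for the random-effects model matrix
Sigma <- crossprod(matrix(rnorm(4), 2))  # an arbitrary valid 2 x 2 covariance matrix
# The new line computes the diagonal directly, without forming the n x n matrix
all.equal(diag(Z %*% Sigma %*% t(Z)), rowSums((Z %*% Sigma) * Z))  # TRUE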
Thanks Paul for taking care of this (been a bit occupied as of late). Glad there was an easy (and implementable) solution!
Paul,
Thanks very much! I’ll give it a try.
Jonathan
Thanks for the code. It works great on a linear mixed model; however, when trying to run r.squared.merMod on a model fitted with glmer (gaussian family, log link), I received the following error:
Error in family_link.stop(family, link) :
Don't know how to calculate variance for gaussian family and log link.
Does anyone know if it is possible to fix this?
Hello, and thanks for the very useful code. I was wondering whether you think it is more appropriate to report the marginal or conditional R2 for a structural equation model. If I am really interested in how the paths explain variation in my response variable, would it make sense to just report the marginal?
Good question, Maya. I have always reported both, under the assumption that reporting one or the other might be misleading, and let the reader decide if the increase in variance explained by fixed + random vs. fixed only is interesting / relevant. If you are using my piecewiseSEM package, I just pushed a new update that includes a function, get.model.fits, that acts as a wrapper for this function for SEMs. You can find it on GitHub. Cheers, Jon
Thanks! I am using piecewiseSEM and will try get.model.fits. I really appreciate all of your contributions!
Hi Jon. I have one more question about generalized linear mixed models in SEM. What are your thoughts about including standardized coefficients when you report SEMs that include GLMMs with binomial or poisson models? Do the standardized coefficients lose their meaning in a glmm context (and can you even calculate them)? Many thanks for your input!
Yes, this is a bit tricky. I believe in my package I do not scale those responses, but do scale the predictors and put out a warning. In that case, the coefficients would represent the % change in success (or failure) per unit change in the standard deviation of X1. Whether this is appropriate depends on the structure of your SEM (principally whether the binomial or poisson distributed variable appears as a predictor in another model). Shoot me an email if you want to discuss in more detail. Cheers, Jon