R^2 for linear mixed effects models

Linear mixed effects models are a powerful technique for the analysis of ecological data, especially in the presence of nested or hierarchical variables. But unlike their purely fixed-effects cousins, they lack an obvious criterion to assess model fit.

[Updated October 13, 2015: Development of the R function has moved to my piecewiseSEM package, which can be found here under the function sem.model.fits]

In the fixed-effects world, the coefficient of determination, better known as R2, is a useful and intuitive tool for describing the predictive capacity of your model: its simply the total variance in the response explained by all the predictors in your model.

In a least squares regression, R2 is the sum of differences in the observed minus the fitted values, over the sum of the deviation of each observation from the mean of the response, all subtracted from 1:

r2_ss

For models estimated using maximum likelihood, this equation was generalized to the ratio of the likelihood of the intercept-only model over the full model, raised to the 2/N, and all subtracted from 1:

r2_ml

Both equations can be interpreted identically.

R2 has a number of useful properties. Its independent of sample size, bound (0,1), and dimensionless, which makes it ideal for comparing fits across different datasets.

I should note, however, that its a poor tool for model selection, since it almost always favors the most complex models. If the goal is to select among the best models, an information criterion approach (such as AIC or BIC) is preferred, since these indicators penalize for the number of predictors.

Unfortunately (and this is a common pitfall), the best model as determined by AIC is not synonymous with a good model. In other words, AIC provides a great test of relative model fit, but not absolute model fit. AIC values are also not comparable across models fit to different datasets. Thus, R2 and AIC both have their place in ecological statistics.

Often, researchers using mixed models report an R2 from a linear mixed model as simply the squared correlation between the fitted and observed values (see here), but this is a pseudo-R2 and is technically incorrect.

Why? Because a mixed effects model yields a variance associated with each random factor and the residual variance, so its not entirely clear which to use when calculating the R2 (the approach above ignores this issue by choosing to calculate R2 relative to only the residual variance, which omits any structure in the data imposed by the random factor).

When hierarchical models with >1 levels are considered, this problem is exacerbated since now we are dealing with variances at multiple levels, plus the residual error. Some people have discussed calculating an R2 value for each level of the random factor, but this can lead to negative R2 values when addition of predictors reduces the residual error while increasing the variance of the random component (or vice versa), even though the sum of the variance components remains unchanged.

So what have we done, historically-speaking? We’ve compared AIC values for the full and null models, or calculated all manner of pseudo-R2s, and quietly (and perhaps for some, ashamedly) went about our business.

Which is why I was so happy to see a paper with the title “A general and simple method for obtaining R2 from generalized linear mixed-effects models” by Shinichi Nakagawa and Holger Schielzeth appear in my newsfeed. Basically, they have derived two easily interpretable values of R2 that address the above issues, specifically that of negative pseudo-R2s with more predictors, while still honoring the random structure of the data.

The first is called the marginal R2 and describes the proportion of variance explained by the fixed factor(s) alone:

r2_mixed_marginal

The fixed-effects variance is in the numerator. The denominator is the total variance explained by the model, including (in order): the fixed-effects variance, the random variance (partitioned by level l), and the last two terms add up the residual variance and are the additive dispersion component (for non-normal models) and the distribution-specific variance.

The second is the conditional R2, which describes the proportion of variance explained by both the fixed and random factors:

r2_mixed_conditional

In this case, the numerator contains both the variance of the fixed effects, as well as the sum of random variance components for each level l of the random factor. You’ll note the denominator is identical.

This approach was later modified by Johnson (2014) to include random slope models.

While this may not be the holy grail of fit statistics (and the authors concede on this point as well), in my opinion, it does address many of the problems of pseudo-R2s and so is going to be my preferred statistic.

The authors also note (and I echo) that looking at R2 and AIC values is not a replacement for testing basic model assumptions, such as normality of predictors and non-constant variance, and should be used in tandem.

I’ve written a function in R called sem.model.fits (along with an amiable fellow named Juan Sebastian Casallas) that calculates the marginal and condition R2s, as well as AIC values for a complementary approach to model comparison. The input is either single model, or a list of models of most common classes, including models from the lme4 and nlme packages. The output is a data.frame where each row is a model in the same order as in the input list, with the model class, marginal and conditional R2s, and the AIC value as columns if the response is identical across all models.

You can find an example of this function in action here: https://github.com/jslefche/piecewiseSEM/blob/master/README.md#get-r2-for-individual-models

I’ve since incorporated this function into my piecewiseSEM package for piecewise structural equation modeling. Since the function found here will no longer be actively developed by me, I recommend using the function in the package to minimize the potential for issues.

[Edit: Also check out this awesome tutorial for quantifying uncertainty around these estimates!]

References

Nakagawa, S., and H. Schielzeth. 2013. A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in Ecology and Evolution 4(2): 133-142. DOI: 10.1111/j.2041-210x.2012.00261.x

Johnson, Paul C.D. 2014. Extension of Nakagawa & Schielzeth’s R2GLMM to random slopes models. Methods in Ecology and Evolution. DOI: 10.1111/2041-210X.12225.

229 thoughts on “R^2 for linear mixed effects models

  1. Lee Gutowsky says:

    Awesome paper, I’m looking forward to reading it! By the way, any chance you’ll write such a code for PQL objects from the MASS package? Or, are marginal and conditional R^2 for PQL estimated GLMMs still unavailable?

  2. aghaynes says:

    Nice looking function!
    Just a heads-up – the development version of lme4 no longer uses the “mer” class (I believe its changed to “merMod”), nor @ for accessing slots (although a quick look suggests that this doesnt matter – you dont use it). I’m not sure if attr names have changed or not though…

    • aghaynes says:

      Hi again,
      I just tried your function out on a list of lme objects and it broke for 3 reasons
      1) you did’nt do the Rc.list part for lme models;
      I copied the line from the mer part to fix this, but…
      2) colSums only works if there are >1 random effect;
      I added an ifelse statement to sort this, and…
      3) you include AIC.list when you bind the results together but you never make it so I removed that from the call.

      Here’s my edited version of the lme section for you. Hope you dont mind my tinkering with your code!

      if(modclass[1]==”lme”) {#For models fit using lme
      #Get design matrix of fixed effects from model
      Fmat.list=lapply(modlist,function(i) model.matrix(eval(i$call$fixed)[-2],i$data) )
      #Get variance of fixed effects by multiplying coefficients by design matrix
      VarF.list=lapply(seq_along(modlist),function(i) var(as.vector(fixef(modlist[[i]]) %*% t(Fmat.list[[i]]))) )
      #Get variance of random effects by extracting variance components
      VarComp.list=lapply(modlist,function(i) VarCorr(i) )
      VarRand.list=lapply(VarComp.list,function(i) as.numeric(i[rownames(i)!=”Residual” & rownames(i)==”(Intercept)”,”Variance”]) )
      #Get residual variance
      VarResid.list=lapply(VarComp.list,function(i) as.numeric(i[rownames(i)==”Residual”,”Variance”]) )
      #Calculate marginal R-squared (fixed effects/total variance)
      Rm.list=lapply(seq_along(modlist),function(i) VarF.list[[i]]/(VarF.list[[i]]+sum(VarRand.list[[i]])+VarResid.list[[i]]) )
      Rc.list=lapply(seq_along(modlist),function(i) (VarF.list[[i]]+ifelse(length(VarRand.list[[i]][1])==1,VarRand.list[[i]][1],colSums(VarRand.list[[i]])))/(VarF.list[[i]]+ifelse(length(VarRand.list[[i]][1])==1,VarRand.list[[i]][1],colSums(VarRand.list[[i]]))+VarResid.list[[i]]) )
      #Bind R^2s into a matrix and return
      Rsquared.mat=do.call(cbind,list(Rm.list,Rc.list)); colnames(Rsquared.mat)=c(“Marginal”,”Conditional”)
      return(Rsquared.mat)
      }

      • jslefche says:

        Great, thanks so much for catching these errors! I think you can see where I was going with expanding the function to return AIC scores but since it requires ML vs REML estimation, the implementation was a little too lengthy for what I wanted to do here. Also, nice catch with the colSums: I went through this function with the example provided in the Nakagawa & Schielzeth supplements, which of course used multiple random effects.

      • aghaynes says:

        Not a problem…your function actually came along at an ideal time for me! For the AIC scores you could hijack aictab in the AICcmodavg package. It produces a table of AIC (or AICc, QAIC, QAICc) values. It only provides the AIC for REML or ML, whichever was used in the fitting process, but using lapply and update you could get around this (for lme, lmer should be very similar though):

        REML.AIC <- aictab(mods, 1:8, second.ord=FALSE)[,"AIC"] # for AICc second.ord=TRUE and [,"AICc"]
        ML.AIC <- aictab(lapply(mods, function(i) update(i, method="ML")), 1:8, second.ord=FALSE)[,"AIC"]

        Although it might be sensible to do a logical test on whether the model was fit with ML or REML first.
        HTH

      • Johan Craeymeersch says:

        When I add two crossed random factors in nlme, I do get following result:
        # Marginal Conditional
        #[1,] 0.5606305 NA
        I think (based on same model in lme4) that only the conditional R2 is given. Anyone any idea how to solve this?

    • jslefche says:

      I did notice the r.squaredGLMM function in the MuMIn package, but only after I wrote this post. At any rate, either function will work fine. Sorry about the reversal of names, it’s since been fixed!

      • Daniel says:

        An error using the lmer package / lme function:

        > r.squared.lme(X)
        Error in r.squared.lme(X) :
        could not find function “.rsquared.glmm”

        Can you help me please? Thanks!

    • Daniel says:

      Great code, thanks for sharing! In linear regression we get a r2 and p-value:
      “Multiple R-squared: 0.7239, Adjusted R-squared: 0.6319
      F-statistic: 7.867 on 5 and 15 DF, p-value: 0.0008221”

      Based on the r2 uncertainty (or other method) is it possible to estimate a p-value for the r2 of a mixed model? thanks!

      • Jon Lefcheck says:

        If I recall, the F-statistic is simply a test between the full and reduced (null) models. You can achieve the same test by fitting the null model (random effects only) and passing both models to anova():

        library(lme4)
        fm1=lmer(Reaction~Days+(Days|Subject),sleepstudy)
        fm0=lmer(Reaction~(Days|Subject),sleepstudy)
        anova(fm1,fm0)

        Note that this is an analysis of deviance. Cheers, Jon

  3. dema says:

    Hi there,
    Good job with the function.
    I´m initiating my experience with mixed models. I’m trying to use the function but I always get the following error:
    Error in (function (classes, fdef, mtable) :
    unable to find an inherited method for function ‘VarCorr’ for signature ‘”lme”’

    What does this mean?

  4. Charles Chang says:

    Hello,

    Thanks very much for writing this function. I tried it out on some models I built using lmer(), but I keep getting this error:

    > rsquared.lme(list(lmer.GPA.0a))
    Error in as.vector(fixef(i) %*% t(i@X)) :
    error in evaluating the argument ‘x’ in selecting a method for function ‘as.vector’: Error in UseMethod(“fixef”) :
    no applicable method for ‘fixef’ applied to an object of class “mer”

    Any idea what could be going wrong here? The model is not particularly complicated:

    > lmer.GPA.0a <- lmer(GPAMean.2 ~ DLABScore.cent + (1|Class) + (1|Language), data=onlyesqdata.sub.novice)

    Cheers,
    Charles

    • jslefche says:

      Hi Charles,
      Hmm, fixef should work with lmer. Try closing R and clearing your workspace, updating R and your packages, and re-running your code!
      Cheers, Jon

      • michellemarraffini says:

        I am getting a similar error message as Charles, Error in as.vector(fixef(i) %*% t(i@X)) :
        error in evaluating the argument ‘x’ in selecting a method for function ‘as.vector’: Error in fixef(i) :
        error in evaluating the argument ‘object’ in selecting a method for function ‘fixef’: Error: object ‘i’ not found

        I am new to making functions in R, did I go wrong some where in putting this function into R?

      • michellemarraffini says:

        Hi Jon,
        I managed to solve my last comment, but now when I run the function it gives me NA as the Marginal and Conditional R2. I am trying to run this on glmer functions, but from reading the package they should still fall into class mer. Any thoughts? I appreciate your reply.
        Thanks,
        Michelle

      • Jon Lefcheck says:

        Hi Michelle, Are you using the development version of lme4? I wrote this using 0.999999-2. You may see the comment by @bulumakao for a fix for later versions. Cheers, Jon

  5. basile P says:

    Hi Jon,

    I am quite a beginner at mixed-effect-models so my questions might seems a bit odd.
    First, does this function is able to take into account further variables than the intercept for random effect? I mean doing something like : y~1+var+(1+var|grouping_var).
    Another question is about the validity of decomposing the random effects variance share, that is instead of summing columns of VarRand, doing something like : varRand/(varF+colSums(varRand)+varRes) and get an estimate of random effects explained variance per random effect?
    Thanks.
    Cheers,
    basile

    • jslefche says:

      Hi basile,

      This method should be able to account for varying slope/varying intercept models.

      As for your second question, I think what you’re looking for are the variance components. They should be given in the summary table for the models. In this scenario, one is typically interested in the fixed effects, so determining the proportion of variance attributed to the random effects may not be very informative in terms of inference. Did you have a specific application in mind?

      Cheers, Jon

      • Paul Johnson says:

        Hi,
        Thanks for making this function available.
        I’m also asking about how the function deals with random intercept+slope models, and other models where the dimension of the covariance matrix is >1.
        I’m fitting a Gaussian GLMM using lme4 so the line in the function that sums the random effect variances is:
        # Get variance of random effects by extracting variance components
        VarRand=colSums(do.call(rbind,lapply(VarCorr(i),function(j) j[1])))
        This part…
        lapply(VarCorr(i),function(j) j[1])
        …extracts the variance of each random effect, but where j is a covariance matrix (as with a random slope+intercept model) j[1] extracts only the [1,1] element, which for a random slope+intercepts model will be the variance of the random intercept, omitting the slope variance and 2 x the covariance. I don’t think this can be right. You can illustrate this by simply adding a constant to the x variable in (x | group), or changing the reference category when x is a factor. Rm and Rc will change (but shouldn’t) because each term in the covariance matrix depends on where the y-axis crosses the x-axis, while the sum of the covariance matrix is independent of this choice. To include all the random effect variances by summing the entire covariance matrix of each RF you could try:
        VarRand=sum(sapply(VarCorr(i),function(j) sum(j)))
        I noticed that r.squaredGLMM in the MuMIn package sums the RF variances in the same way so I’m outvoted on this, but I’m still fairly confident that the entire covariance matrix is required.

      • Paul Johnson says:

        PS sum(sapply(VarCorr(i),function(j) sum(j))) is of course unnecessarily long. Better is
        sum(sapply(VarCorr(lm0.randslopeplot),sum))
        or
        sum(unlist(VarCorr(lm0.randslopeplot)))

      • Paul Johnson says:

        Hi Jon – I’ve changed my mind… My proposed solution above is wrong, i.e. it doesn’t make the method work for random slopes models. The entire covariance matrix is required, but so is the model matrix. The correct random effects variance component for each random effect is mean(diag(X %*% Sigma %*% t(X))), where X is the model matrix for the random effects formula, including a column of 1s for the random intercept and a data column for each random slope, and Sigma is the random effect covariance matrix, with dimensions corresponding to the columns of X. The diag(…) part calculates the random effect variances for each observation in the data set. When there are random slopes, these observation-specific variances will differ, hence the need to average them. For a random intercepts model there will be no inter-observation variation, and each element of diag(…) will be the random intercept variance, so the code will return the same result as the original method.

  6. basile P says:

    Hi Jon,
    I was asking because in case of a slope in random effect, the following line only take the intercept value of the VarCorr output
    VarRand.list=lapply(modlist,function(i) do.call(rbind,lapply(VarCorr(i),function(j) j[1])) )

    VarCorr output is:
    $`site_id:subject_id`
    (Intercept) from_bl
    (Intercept) 0.005091415 -0.0018503245
    from_bl -0.001850324 0.0006724458
    attr(,”stddev”)
    (Intercept) from_bl
    0.07135415 0.02593156
    attr(,”correlation”)
    (Intercept) from_bl
    (Intercept) 1 -1
    from_bl -1 1

    $site_id
    (Intercept)
    (Intercept) 3.198851e-05
    attr(,”stddev”)
    (Intercept)
    0.005655839
    attr(,”correlation”)
    (Intercept)
    (Intercept) 1

    attr(,”sc”)
    [1] 0.07231825

    My goal is to evaluate how much variance is explained by fixed effects (dx groups, …), random effects (subjects and sites) and the residuals to show how little the fixed effects (and also inter-site random effect) are regarding inter-subject variability and unexplained variance.
    I have also looked at LMERConvenienceFunctions::pamer.fnc ( http://www.inside-r.org/node/78388 ) that outputs a R_squared per fixed-effect but it seems that the sums of these is not equal to the marginal variance.

    Thanks for your insight.
    Cheers.
    basile

  7. jslefche says:

    Hi Basile, Why not use an AIC-based model selection procedure? Build a model with your fixed, your random, and your fixed and random and compare to get a sense for how much explanatory power each factor has. I interpret the R^2 method above as a way to evaluate the fit of a model that has already been vetted. What do you think? Cheers, Jon

  8. Kristine says:

    Hello Jon,
    Thanks for putting together this very useful code! Any idea why I get the below error, whether I use your example or my own data? I’ve tried updating packages, restarting R, and clearing the workspace.

    [1] “Error: Function requires list of objects of class ‘mer’ or ‘lme'”

    Thanks,
    Kristine

      • Kristine says:

        Yes, I’m user lmer and I wasn’t successful [after loading the required packages] even with your example code. Thanks for your help.

      • aghaynes says:

        Hey Kristine, If you’re using the development version of lme4 from the rforge repository, your model will have a mermod (I think there might be a captial letter somewhere). You could check str(model) to find out if your model is actually of the mer class or not.

      • Kristine says:

        Thanks for the input. My model is in the mer class, so I’m not sure what wasn’t working, but I got it to run without errors this morning! Maybe restarting R and my machine was the key. I plan to use this code regularly, so thanks again for posting the code.

      • Kristine says:

        In case anyone else encounters this issue, I figured out the issue. I had loaded the package lmerTest, which is not compatible with the rsquared.lme function, since lmerTest masks the lmer object from lme4.

      • jslefche says:

        Good to know! I just started messing with lmerTest…good for the migrant SAS user, but will probably get some flak from the R community.

  9. Iván says:

    Hi, I am running mixed models and have tried to calculate the R2 using your function. I only load the lm4 package and still get the following message:
    Error en as.vector(fixef(i) %*% t(i@X)) :
    error in evaluating the argument ‘x’ in selecting a method for function ‘as.vector’: Error en UseMethod(“fixef”) :
    no applicable method for ‘fixef’ applied to an object of class “mer”
    Any ideas how to overcome this problem?
    As Kristine, I would like to use this function frequently with several more models.
    In fact, when I run the example test either in R or Rstudio I get the same message. I checked and my models are both “mer” objects.

    Thank you very much,

      • Iván says:

        Hi,
        Awesome we can finally get to asses the model fit for mixed models so thanks again.
        For me, the lme4 version of the function is still not working, It works with the nlm package but not when I use lme4. I get the same messages when I run the examples written after the function, It works for nlm objects but not for lme4. If it still does not work, I could always re-write my models using nlm but it would take a while since I have a lot, and having a function that works with nlm4 would be just perfect.

    • aghaynes says:

      Hi Ivan,
      The best thing to do is to start a fresh session. Sometimes when you load nlme after lme4 some of the required functions are masked and no longer recognise mer objects. Could that be the problem?

  10. Iker Dobarro says:

    ¡Thanks a lot for the script, Mr Lefchek! A referee asked me for variance partition of several nlme analysis and I didn’t know how to obtain variance explained by fixed factors.

  11. Paul says:

    Great function thank you! I was trying to run the function on a model involving a subset of my original data- as it turns out you must clear the unused arguments of your subset with the lapply function like this: (where “D” is a category within the variable “type”).

    > new.set new.set[ ]<-lapply(new.set, function(type) if (is.factor(type)) type[drop=T] else type)

    In hopes this saves some folks the few hours of hair pulling it caused me!

    • Paul says:

      Somehow my code got messed up upon posting…
      Should be:

      new.set<-subset(data,type=="D")
      new.set[ ]<-lapply(new.set,function(type) if(is.factor(type)) type[drop=T] else type)

  12. bulumakao says:

    Copied the code above yesterday, lme4 didn’t work but lmer did.
    Here are 2 modifications for models fit using lme4 version 0.99999911-5
    1) add class(i)==”lmerMod” to the list
    2) change VarF=var(as.vector(fixef(i) %*% t(i@X)))

    to VarF=var(as.vector(fixef(i) %*% t(i@pp$X)))

    Then tested it on the example code, both lme4 and lmer models give the same results.
    Cheers, SLT.

    • Jon Lefcheck says:

      Thanks for passing this along. Looks like you’re using a dev version from their repo (or did you compile from Github?). At any rate, it appears lme4 version 1.0 is set to be released soon. I will probably update the function when that goes live as it appears they’ve changed not only the name of the class but also the S4 slots. Cheers, Jon

      • Roelof says:

        Ran into the same problem using version lme4 1.0-4 (submitted to CRAN). I suggest instead of adding lmerMod to the list to change class(i)==”merMod” to inherits(i, “merMod”), as lmerMod is a subclass of merMod.

  13. Natasha says:

    I had found Nakagawa and Schielzeth paper and then your code and thought ha ha my solution. Unfortunately, I get results but they don’t make sense.

    I am trying to calculate a ‘local’ effect size (ES) measure and I need it to be standardised so that it can be compared across dependent variables and experiments. I found a paper by Selya et al (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3328081/) which proposed a Cohen’s f2 ((R2for a treatment model-R2 for a null model)/(1-R2for a treatment model)). So I wanted to use your code to calculate the R2 for these models to then calculate the Cohen’s f2. My data is analysed by a mixed model framework with one random effect and two-three fixed effects but I am only interested in the ES associated with one of the fixed effects. I am using nlme library and R version 3.0.1.

    What’s the problem with the results – well I know I have a highly significant treatment effect and yet there is no differences in the R2 between the treatment and null model.
    > LM_R2_null
    Class Marginal Conditional AIC
    1 lme 0.5510203 0.64475 1865.515
    > LM_R2_genotype=rsquared.lme(list(leanMass_genotype_Model))
    > LM_R2_genotype
    Class Marginal Conditional AIC
    1 lme 0.5648557 0.6442183 1855.959
    > Cohenf=(LM_R2_genotype$Conditional-LM_R2_null$Conditional)/(1-LM_R2_genotype$Conditional)
    > Cohenf
    [1] -0.00149466

    I wondered if it was that the random effect in the absence of the treatment variable is accounting for more of the variance. However this sort of goes against my whole testing of the treatment effect by anova(model_treatment, model_null). Any thoughts would be greatly appreciated.
    Best
    Natasha

    • Jon Lefcheck says:

      Hi Natasha, I suspect the trouble may come from the way you are specifying your null model. It should just be the random intercept. Shoot me an email if you need more specific help. Cheers, Jon

  14. Natasha says:

    Hi Jon,
    Thanks for your thoughts, you are correct that my null model isn’t an intercept only model, however this was deliberate and has arisen with the goal in mind of calculating Cohen’s f.

    In the manuscript (link provided in earlier post), it suggests the use of a Cohen’s f to estimate the local effect size for a MM situation as follows:
    “f2=(R2AB−R2A)/(1−R2AB) [2]
    where B is the variable of interest (i.e., either smoking quantity or nicotine dependence score),
    Ais the set of all other variables (i.e., gender and depending on what B is at the moment, nicotine dependence score or smoking quantity), R2AB is the proportion of variance accounted for by A and B together (relative to a model with no regressors), and R2A is the proportion of variance accounted for by A (relative to a model with no regressors). Thus,the numerator of (2) reflects the proportion of variance uniquely accounted for by B , over and above that of all other variables”

    So you can see that the R2A (which I called null model) isn’t an intercept only model rather just excludes the fixed effect of interest. Now I know the data, and that whilst the remaining two fixed effects contribute largely to the variation, the treatment effect of interest is highly significant and should account for a significant proportion of the variance. So I would expect
    R2A to be lower than R2AB.

    So it suggests that either my main effect is minor or the R2 is not a sensitive enough measure for the main effects influence or…… any ideas?
    Best wishes
    Natasha

  15. natalia norden says:

    Thanks for putting together and sharing your code. It’s really valuable to have people around sharing their knowledge!
    I’m using generalized linear mixed models (family= poisson) and I saw in your most recent version of the code that the “GLMM fit to Poisson distribution not yet supported”. Why is that? Is it something to do with the glmer function or with your code?
    Right now, I’m using an old version of the lmer package, where I use the function lmer() instead of glmer(), because the new one gives me the error “Error en `[[<-.data.frame`(`*tmp*`, i, value = integer(0)) : replacement has 0 rows, data has" 4211".). I found somewhere in the internet that this is related with a bug in the new version of lme4. In this old version I can fit my data with a Poisson error just fine.
    I was wondering if you would recommend adapting your code using the commented section of the code for glmer (family=poisson). Do you have a better option?
    Many thanks in advance for your help!
    natalia-

    • Jon Lefcheck says:

      Hi Natalia, Thanks for checking out the function. If you refer to the Nakagawa & Schielzeth paper on which this is based (link in post) you will see the calculations for generalized linear mixed effects models (binomial and poisson) are a little different than for general linear mixed models. The problem is that if you use my function on, say, a Poisson model, R will still generate a value, but that value will be incorrect. So best to steer clear for the time being, although I plan on taking a little closer look at this in the near future! Cheers, Jon

    • Jon Lefcheck says:

      Hi Natalia, I updated the function this morning. It should work for Poisson models now, but I’m still not sure if I’m constructing the null model correctly. If you encounter any errors, let me know. Cheers, Jon

  16. natalia norden says:

    Hi Jon, Sorry to bother you again. The residual variance i get is NA [ attar(varCorr(i),”sc”) is NA.] I tried to find out why in the web, but couldn’t find anything. Do you have any idea??
    Thanks again!
    cheers – n

  17. Natasha says:

    Hi Jon,
    We have applied this code as part of a mixed model analysis to 800 datasets and the results are not quite what we expected. So we used the conditional R2, but I was wondering why you provided both the marginal and conditional R2? In what situation would you want to use the marginal where you consider just the fixed factors alone?

    • Jon Lefcheck says:

      Hi Natasha, I recommend reading the original paper (Nakagawa & Schielzeth, reference listed at the end of the post). They recommend reporting both the conditional and marginal R^2, which have slightly different interpretations. They also provide a worked example that you should find helpful. Send me an email if you are still having issues. Cheers, Jon

  18. Luke McCune says:

    Jon,
    Wanted to thank you very much for creating this function; extremely helpful to have these new methods readily available. I’m glad that you added capabilities for the Poisson, but I seem to be getting strange output. For a single model with an interaction between two variables and an extra covariate (total of 3 variables and 4 coefficients in the glmer model), I get three different R^2 values when I use the rsquared.glme function. I imagine that it’s having trouble creating a single null model, if I’m not mistaken. Any advice you could give would be well appreciated.

  19. Robert Froese says:

    I’ve been playing with your code this morning but it’s failing for me on objects from nlme(). I ran your example code:

    > detach(package:lme4,unload=T) #Parts of this package conflict with lme4
    > require(nlme)
    > mod0=lm(y~fixed1,data)
    > mod1=lme(y~fixed1,random=~1|rand2/rand1,data)
    > mod2=lme(y~fixed1+fixed2,random=~1|rand2/rand1,data)
    > rsquared.glmm(list(mod0,mod1,mod2))

    Error in if (class(i) == “glmerMod” & summary(i)$family == “binomial”) { :
    argument is of length zero
    >

    • Jon Lefcheck says:

      Hi Robert, They keep switching up model classes. I ran into a similar error for lmer models built in lmerTest. I’m planning on looking into it soon but in the interim, you can extract the relevant error structure from the function (e.g., normal, binomial, etc.) and run the code by hand! Cheers, Jon

    • Paul Moquin says:

      Had the same issue yesterday and used:
      rsquared.lme(list(mod0,mod1,mod2))
      *as opposed to rsquared.glmm albeit with the code before the most recent update… so not sure if it works with current version… Jon? Have the code if you need…

  20. Bea says:

    Dear Jon
    Thanks for your code! Very useful. I used it for LMM and it worked out fine. I am new in mixed models and I am not sure how to interpret that I get the same values for the marginal and conditional R2 in one model. That means that there is not much variability in the random effect, but I am not sure if there are further implications. I have repeated measures in my experiment, that’s why I am using mixed models (and my model has two fixed and one random effect with a random intercept model). Any advice would be more than appreciated.

    Thank you very much!

    • Jon Lefcheck says:

      Hi Bea, Check out the paper (linked at the bottom of the post) for an interpretation of the two types of R2. Also check the variance components of the random effect in the model output (should be the top item, including the residual error for the model). If this is low, you may consider dropping the random term and comparing the full vs reduced model using something like AIC. It may be that there is not much autocorrelation as you thought. You can also test using something like Moran’s I to better decide whether to retain the random term in your model. Cheers, Jon

  21. Juan Sebastián Casallas says:

    Hi Jon,
    Thanks for accepting my changes in the code! I think the link to the github repo got lost in the blog post update, do you mind posting it again?

    Thanks!

  22. Angela says:

    Hello,
    and thank you so much for sharing this – it really saved me! Because I first tried with the examples from the paper, but they wouldn’t work for me (and I was going nuts trying to find out what I did wrong) and than I found your script and now I am able to get the R2-values for my analysis, which is awesome!

    So, how do I properly cite your script?

    Cheers,
    Angela

      • Angela says:

        Alright, I planed on doing that anyway, but thought you should get some more credit for this.

        Would you mind answering another question I have on the calculation from R2 of mixed models? Because the thing is, one of the reviewers of my paper actually asked me if I could provide not only R2-vlaues for my mixed models, but partial ones for each factor. Now, I am wondering if this is actually a reasonable request. Because with all the reading I’ve done, I feel like even the calcualtion of the R2-values in itself is still in developmental stage for these models and I don’t think it would be wise to try and further split them up.

        So, do you have any thoughts about this?

        Cheers,
        Angela

      • Jon Lefcheck says:

        Hi Angela, You can gain a sense for the contribution of each of the fixed factors to the explained variance by examining the variance components, or better yet, simply reporting parameter estimates. An alternative would be to perform some kind of model selection to identify the predictors that are not contributing significantly to the overall model. Hope that helps! Jon

  23. Ruben says:

    Wow, thank you Sir, you just made my day! This function yields the same results as I found, only this will save me a lot of time and works like a charm! Great work and nice paper by the way from Nakagawa et al.

    Greetings from the Netherlands

  24. Safra says:

    Hi Jon,

    Thanks for posting this, it has been extremely helpful. Any pointers/ideas on how to deal with negative binomial distributions?

    Thanks
    Safra

    • Gabrielle says:

      Hey,

      Did you get any pointers in the end? I am experiencing the same problem and can’t seem to find an answer. I would be very interesting in how you managed to compute (if you did so) your R2s with a negative binomial distribution.

      Thanks so much for any input!

      Gabrielle

  25. Paul Moquin says:

    Hi Jon
    We are using your function on an multi-lake productivity and nutrient concentration data set. Some of the data is from multiple years from the same lake while some lakes are only represented once. We are using mixed effect models (lme) to remove the pseudoreplication caused by using the same lake over multiple years. We are finding that the conditional R squared is up to two to four times the marginal value (one extreme case had a jump from 0.15 to 0.82!). This is making us wonder if we are overlooking something in our data. Have you heard of anybody else having similar experience?

    We also find that the AIC from your script is different from the one provided in the lme output. Any thoughts on the discrepancy?

    Many thanks!
    Paul

    • Juan Sebastián Casallas says:

      Hey Paul, regarding the AIC discrepancy, note that the r.squared.lme function updates the model’s method to “ML” (instead of “REML”) to calculate the AIC, by doing:
      AIC(update(mdl, method=”ML”))

      HTH,
      Juan Sebastian

    • Jon Lefcheck says:

      It is possible, particularly if the lakes are highly variable both within and between years. You should consider specifying the correlation structure using lme, as you have a repeated measures design. Also, Juan is correct in stating that the AIC score is calculated using ML, while models for R2 are estimated using REML.

  26. Daniel Ezra Johnson says:

    Hi Jon,

    I’ve just discovered your function and I’m quite excited to be able to use it. I have a question though. Imagine a model like this:

    m1 <- lmer(y ~ x + (x | subject))

    Now, x is a categorical variable here. If I fit the model using treatment contrasts or sum contrasts for x, the deviance, AIC, etc. are always the same. But in general the variance of the random slope(s) will be larger for the treatment contrast version of the model.

    The result is that the R-squared function (both marginal and conditional) comes out slightly different for the two parametrizations. This seems unfortunate and I don't know what to do.

    Any advice?

    Thanks a lot,
    Daniel

      • Paul Johnson says:

        Yes, the r.squaredGLMM function in the current version of MuMIn (1.9.13) uses the original N&S formulae, so is just for random intercepts models, but Kamil tells me that the next version (1.10.0) will incorporate the code above [mean(diag(X %*% Sigma %*% t(X)))] so will work for random slopes models as well. It will also work for reparameterisations of random slopes models e.g. where inter-individual variance depends on x (typically a categorical x), e.g. random regression model applied to animal behaviour.

      • Daniel Ezra Johnson says:

        Kamil Barton said the version of MuMIn on R-Forge incorporates the new code for random slopes, but it doesn’t seem to – I’m getting the same results as the old version.

        Could you help me in terms of installing this: mean(diag(X %*% Sigma %*% t(X)))
        Into your code: VarRand <- colSums(do.call(rbind, lapply(lme4::VarCorr(mdl), function(x) x[1])))

        I see where to get the X matrix but the Sigma is not something I see as part of the model structure.

        Thanks!
        Dan

      • Paul Johnson says:

        Hi Dan,
        MuMIn 1.9.26, which is the latest development version (currently) on
        http://r-forge.r-project.org/R/?group_id=346,
        should work for random slopes, so I’d be surprised if it gave the same results as the version on CRAN (1.9.13) for a random slopes model fit. 1.9.26 is the version that should (currently) install following
        install.packages(“MuMIn”, repos=”http://R-Forge.R-project.org”,type=”source”). Check installed.packages()[“MuMIn”,”Version”] after installing.
        Alternatively, you can calculate the random effects variance directly using mean(diag(X %*% Sigma %*% t(X))). Sigma is returned by VarCorr(m1)$subject, for your model. For models with >1 random effect level, the list returned by VarCorr will have length >1 so the total random effect variance will need to be summed across this list. The colSums code that you quoted doesn’t work for random slopes models and should be replaced by sapply-ing and summing over mean(diag(X %*% Sigma %*% t(X))) which works for both. I haven’t given general usable because in some cases the X matrix needs to be formed with care (as Kamil has done in MuMIn >= 1.9.26).
        Good luck,
        Paul

      • Daniel Ezra Johnson says:

        Hi again Paul,

        Thanks for all your help.

        I am not sure what’s going on because I do get the same results, as you’ll see here:

        > installed.packages()[“MuMIn”, “Version”]
        [1] “1.9.13”
        > r.squaredGLMM(cefr.0)
        R2m R2c
        0.04337719 0.60337371

        > installed.packages()[“MuMIn”, “Version”]
        [1] “1.9.26”
        > r.squaredGLMM(cefr.0)
        R2m R2c
        0.04337719 0.60337371

        I’ll keep trying, I guess.

    • Paul Johnson says:

      Hi Daniel,

      It looks like your problem is caused by your random slope. The Nakagawa & Schielzeth method is intended for random intercepts models but not random slopes models. If you try to use it on a random slopes model you’ll find that the results will depend on arbitrary covariate choices such as the reference category of a categorical variable or covariate centering.

      I can think of three solutions:
      (1) Estimate the random effect variance component from the random slopes model manually using the solution I posted above (after a few erroneous suggestions) https://jonlefcheck.net/2013/03/13/r2-for-linear-mixed-effects-models/comment-page-1/#comment-1314
      (2) Wait for the version >=1.10.0 of Kamil Bartoń’s excellent MuMIn package, which will have an updated r.squaredGLMM function that will work for both random intercepts and random slopes models. I think Kamil intends this to be within weeks but you’d need to check with him.
      (3) Follow N&S’s advice and remove the random slope: “we recommend calculating R2GLMM (both marginal and conditional) from corresponding random intercept models for randomslope models”. If your design is balanced (similar numbers and covariate distribution in each subject) then the approximation to marginal R^2GLMM should be good. How close conditional R^2GLMM is to the true random slopes value is will depend on how much additional residual variance is absorbed by fitting the random slopes.

      Good luck,
      Paul

      • Daniel Ezra Johnson says:

        Thanks for the very helpful reply! All three of your options sound like good possibilities. Does the MuMin package calculate the R^2 using the same theory as in Nakagawa & Schielzeth?
        Dan

  27. Angela says:

    Hello,
    I just came back to this and now I am wondering how to extract the Variance components for my fixed and random factors from your script.

    I understand that your code is calcualting them (the VarF, VarRand and VarResid, right?) but I don’t understand how to extract them for the single factors. I tried all the things I could think of based on my meager R- knowledge, but sadly nothing worked. Would you mind telling me how to do this?

    Angela

  28. Sébastien says:

    Hi Jon and thank you very much for this very interresting article.
    I would like to know how it is possible to obtain the different partial R2 for every fixed factors in the model when you run a mixed model. I used to report partial R2 in classic ANOVA and I would like to do this with mixed model analyze. Could you help me?
    Thank you very much.

    • Jon Lefcheck says:

      Hi Sebastien, Check out the procedure in Legendre & Legendre (1998), page 521-522. It makes me a little wary since these are technically pseudo-R2s and you are doing a stepwise procedure, which has its own set of criticism. Cheers, Jon

  29. Sébastien says:

    Hi,
    in order to be more specific about my last post. I’have try two solutions:
    – r.squaredGLMM(list(model1,model2))
    In this case R return me an error message: Erreur dans UseMethod(“r.squaredGLMM”) :
    pas de méthode pour ‘r.squaredGLMM’ applicable pour un objet de classe “list”
    -rsquared.glmm(list((model1,model2))
    In this case, I also have an error message indicate that it’s not possible to find “rsquared.glmm” function. I have search this function, but without success…
    Thank you very much.

    • Jon Lefcheck says:

      Hello Sebastien. First, make sure you are using the latest code from GitHub (you will need to use the function install_github in the devtools package, or copy and paste the raw code yourself). The function is rsquared.glmm, and you have an extra parenthesis in your second bit of code. Drop me an email if it still doesn’t work. Cheers, Jon

    • Juan Sebastián Casallas says:

      Salut Sébastien,

      `r.squaredGLMM` within the MuMIn package doesn’t work with lists of models. If you want to use it as such, you would need to do something like `lapply(list(model1,model2), r.squaredGLMM)`. Otherwise if you want to use Jon’s function,you can call `devtools::source_url(“https://raw.githubusercontent.com/jslefche/rsquared.glmer/master/rsquaredglmm.R”)`, or equivalently, copy and paste the code within this blog post in your R terminal.

      Bon courage

      • Sébastien says:

        Thank you very Juan Sébastien. I have cope and Paste the function and it works very well!
        However, it gives me the marginal and conditional R2 for difference models and what I want it’s the marginal and conditional R2 for every factors…Jon have give me a reference (Legendre & Legendre, 1998), I hope I will find a solution…
        Sébastien

  30. Diego says:

    Hi,

    I am wondering if it is a way to calculate the R2 from a GLMM model run with glmmADMB with negative binomial distribution.

    Thank you,

    Diego

    • Jon Lefcheck says:

      If I recall we do not support glmmADMB. Try using glmmPQL in the MASS package, that should work and allow you to fit to quasi distributions (if that’s your goal). Cheers, Jon

      • Diego says:

        Thank you for your quick replay!

        As you suggested I fit the model with glmmPQL with a negative binomial distribution, but when I try to calculate the R2 I get the following message: “Error in eval(expr, envir, enclos) : could not find function “ML”” I do not understand why because glmmPQL use by default ML. Do you have any idea about what could be the problem?

        Thank you very much,

        Diego

  31. Sabri Bromage says:

    thanks for this post. one question: for the gaussian case, instead of calculating var(fixed) using the design matrix, can we instead calculate it by first calculating the total variance of the response (var(y)) and subtracting from it both var(random) and var(residual)? i.e. var(fixed)=var(total)-var(random)-var(residual)? it seems easier since total variance is easy to calculate.

  32. Maurits says:

    Hi Jon, Thanks again for the provided code. I, however, experience some issues when comparing R^2 values of my random slope binomial GLMMs calculated from your code and from the code in the MuMIn package. It seems while the marginal R^2 are consistent between your code and the MuMIn package, the conditional R^2 is lower using your code compared to the MuMIn package. Any suggestions how this is possible? Thanks for any clarification. Maurits

    • Maurits says:

      I got the answer myself, it appears that the different outcomes are due to the inclusion of a dispersion parameter in my model, introduced as a random effect.

  33. Bonnie says:

    This is a very useful function. Thanks!

    I am just wondering whether it is possible to use this function on a programmatically generated list of models. Here is what I tried:

    > mod.func vars mods mod.fit.stats <- rsquared.glmm(mods)
    Error in paste("outcome.var", "~", var.name) :
    object 'var.name' not found

    • Bonnie says:

      I’m not sure why the code in my previous post got messed up, but I will try it one more time:

      mod.func <- function(var.name) {
      lme(as.formula(paste("outcome.var", "~", var.name)),
      ~ 1| group, data = dat, method = "ML")
      }
      vars <- names(dat[ , setdiff(names(dat), c("group", "outcome.var"))])
      mods <- lapply(vars, mod.func)
      mod.fit.stats <- rsquared.glmm(mods)

      Error in paste("outcome.var", "~", var.name) :
      object 'var.name' not found

      • Bonnie says:

        Actually, I found a solution for this problem. If I use do.call() in the function definition, then it works:

        mod.func <- function(var.name) {
        formula <- as.formula(paste("outcome.var", "~", var.name))
        do.call(lme,
        list(fixed = formula,
        random = ~ 1| group,
        data = dat,
        method = "ML"))
        }

        vars <- names(dat[ , setdiff(names(dat), c("group", "outcome.var"))])
        mods <- lapply(vars, mod.func)
        mod.fit.stats <- rsquared.glmm(mods)

  34. Amanda says:

    Hi! Thanks so much for the code!

    With one of my models I get this error:
    >
    Error in X[, rownames(Sigma)] : subscript out of bounds

    Do you know why I would get this error? The code works for other versions of this same model, with the same data, it just doesnt seem to like this particular model (random intercept and slope correlated, intercept only)

    Thanks!

  35. Amanda says:

    Hi Jon! Thanks for your reply!

    I am using lme4.
    Here is some info about the model. I am super new at this, so it is entirely possible that I have made a mistake somewhere. I am like a newly licensed driver behind the wheel! I have tried starting over in R and also downloading your script from github to make sure I have the latest version.

    lmer.best= lmer(percent.combined ~1+(Sum.of.basal.area |site), data=se.size, REML=T)
    Linear mixed model fit by REML [‘lmerMod’]
    Formula: percent.combined ~ 1 + (Sum.of.basal.area | site)
    Data: se.size

    REML criterion at convergence: 193.2

    Scaled residuals:
    Min 1Q Median 3Q Max
    -1.76774 -0.44322 0.09153 0.46918 1.59555

    Random effects:
    Groups Name Variance Std.Dev. Corr
    site (Intercept) 28338.243 168.340
    Sum.of.basal.area 33.335 5.774 -1.00
    Residual 3.855 1.963
    Number of obs: 36, groups: site, 6

    Fixed effects:
    Estimate Std. Error t value
    (Intercept) 71.352 3.611 19.76

    Thanks in advance for any observations!

  36. Chris says:

    Hi Jon, thanks very much for taking the time to post this.

    I’m trying to calculate R^2 for a negative binomial glmm using lme4 and the negative.binomial family provided by MASS. I was looking to extend your code to support this but I’m having trouble determining how to calculate the distributional variance. Any idea where I should look for that kind of information?

    Thanks,

    Chris

    • Gabrielle says:

      Hey,

      I am experiencing quite the same issue here, GLMM with negative binomial, and can’t seem to find the answer to my problem. Did you figure out how to compute conditional and marginal R2 for your models? It doesn’t seem to be a very common issue…

      Thanks so much for any input!

      Gabrielle

  37. Jonathan says:

    Thanks very much for this function. Is there a memory limit on how large of a data set it can operate? I ask because I have a training(~185k) and a testing (~125k) observation that glmer is able to render just fine, but which generates the following error when I use your function:

    Error in diag(Z %*% Sigma %*% t(Z)) :
    error in evaluating the argument ‘x’ in selecting a method for function ‘diag’: Error: cannot allocate vector of size 707.7 Gb

    Calculating the AIC by itself is not an issue for these merMods, so I’m curious why this function would be running out memory. Any thoughts?

    • Paul Johnson says:

      It’s because the code inside diag(…) is very wasteful of memory. Kamil Bartón has recently mplemented an efficient version of this step in r.squaredGLMM in his MuMIn package. I’m pretty sure this version is on CRAN. Otherwise it’ll be in the development version on rforge. I would paste the improved code line here but I don’t have access to it right now. It’s in one of the hidden functions behind r.squaredGLMM if you know how to dig around in these.

      • Paul Johnson says:

        PS just to clarify, that memory-wasting line of code is my responsibility, not Jon’s. I should get an opportunity to post Kamil’s improved version over the next few days.

      • Paul Johnson says:

        Here’s Kamil Bartoń’s memory-efficient fix for this problem. Replace
        diag(Z %*% Sigma %*% t(Z))
        with
        rowSums((Z %*% Sigma) * Z)
        The problem with the original code is that it creates an n X n matrix, of which only the diagonal is used. The new code calculates the diagonal directly, without wasting memory on storing the rest of the matrix.
        I’ve checked and this fix is implemented in the latest version of MuMIn on CRAN (version 1.12.1).

  38. Becky says:

    Thanks for the code. Works great on a linear mixed model, however, when trying to run the r.squared.merMod on a model fitted with glmer (gaussian family, log link) I received the following error:

    Error in family_link.stop(family, link) :
    Don’t know how to calculate variance for gaussian family and log link.

    Does anyone know if it is possible to fix this?

  39. Maya says:

    Hello and thanks for the very useful code. I was wondering whether you think it is more appropriate to post the marginal or conditional R2 in a structural equation model. If I am really interested in how the paths explain variation in my response variable, would if make sense to just report the marginal?

    • Jon Lefcheck says:

      Good question, Maya. I have always reported both under the assumption that reporting one or the other might be misleading, and let the reader decide if the increase in variance explained by fixed + random vs fixed only is interesting / relevant. If you are using my piecewiseSEM package, I just pushed a new update that includes function, get.model.fits, that acts as a wrapper for this function for SEMs. You can find it on Github. Cheers, Jon

      • Maya says:

        Hi Jon. I have one more question about generalized linear mixed models in SEM. What are your thoughts about including standardized coefficients when you report SEMs that include GLMMs with binomial or poisson models? Do the standardized coefficients lose their meaning in a glmm context (and can you even calculate them)? Many thanks for your input!

      • Jon Lefcheck says:

        Yes, this is a bit tricky. I believe in my package I do not scale those responses, but do scale the responses and put out a warning. In that case, the coefficients would represent the % change in success (or failure) per unit change in the standard deviation of X1. Whether this is appropriate depends on the structure of your SEM (principally whether the binomial or poission distributed variable appears as a predictor in another model). Shoot me an email if you want to discuss in more detail. Cheers, Jon

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s