Often in ecological research, we are interested not only in comparing univariate descriptors of communities, like diversity (such as in my previous post), but also in how the constituent species — or the composition — changes from one community to the next.

One common tool to do this is *non-metric multidimensional scaling*, or NMDS. The goal of NMDS is to collapse information from multiple dimensions (e.g, from multiple communities, sites, etc.) into just a few, so that they can be visualized and interpreted. Unlike other ordination techniques that rely on (primarily Euclidean) distances, such as Principal Coordinates Analysis, NMDS uses rank orders, and thus is an extremely flexible technique that can accommodate a variety of different kinds of data.

Below is a bit of code I wrote to illustrate the concepts behind of NMDS, and to provide a practical example to highlight some R functions that I find particularly useful. Most of the background information and tips come from the excellent manual for the software PRIMER (v6) by Clark and Warwick.

________________________

#### What is NMDS?

First, let’s lay some groundwork**.**

Consider a single axis representing the abundance of a single species. Along this axis, we can plot the communities in which this species appears, based on its abundance within each.

Now consider a second axis of abundance, representing another species. We can now plot each community along the two axes (Species 1 and Species 2).

Now consider a *third *axis of abundance representing yet another species.

# install.packages("scatterplot3d") library(scatterplot3d) d=scatterplot3d(0:10,0:10,0:10,type="n",xlab="Abundance of Species 1", ylab="Abundance of Species 2",zlab="Abundance of Species 3"); d d$points3d(5,5,0); text(d$xyz.convert(5,5,0.5),labels="community A") d$points3d(3,3,3); text(d$xyz.convert(3,3,3.5),labels="community B") d$points3d(0,5,5); text(d$xyz.convert(0,5,5.5),labels="community C")

Keep going, and imagine as many axes as there are species in these communities.

The goal of NMDS is to represent the original position of communities in multidimensional space as accurately as possible using a reduced number of dimensions that can be easily plotted and visualized (and to spare your thinker).

**How does it work?**

NMDS does not use the absolute abundances of species in communities, but rather their *rank orders*. The use of ranks omits some of the issues associated with using absolute distance (e.g., sensitivity to transformation), and as a result is much more flexible technique that accepts a variety of types of data. (It’s also where the “non-metric” part of the name comes from.)

The NMDS procedure is iterative and takes place over several steps:

- Define the original positions of communities in multidimensional space.
- Specify the number of reduced dimensions (typically 2).
- Construct an initial configuration of the samples in 2-dimensions.
- Regress distances in this initial configuration against the observed (measured) distances.
- Determine the stress, or the disagreement between 2-D configuration and predicted values from the regression. If the 2-D configuration perfectly preserves the original rank orders, then a plot of one against the other must be monotonically increasing. The extent to which the points on the 2-D configuration differ from this monotonically increasing line determines the degree of stress. This relationship is often visualized in what is called a Shepard plot.
- If stress is high, reposition the points in 2 dimensions in the direction of decreasing stress, and repeat until stress is below some threshold.**A good rule of thumb: stress > 0.05 provides an excellent representation in reduced dimensions, > 0.1 is great, >0.2 is good/ok, and stress > 0.3 provides a poor representation.**

Additional note: The final configuration may differ depending on the initial configuration (which is often random), and the number of iterations, so it is advisable to run the NMDS multiple times and compare the interpretation from the lowest stress solutions.

To begin, NMDS requires a distance matrix, or a matrix of dissimilarities. Raw Euclidean distances are not ideal for this purpose: they’re sensitive to total abundances, so may treat sites with a similar number of species as more similar, even though the identities of the species are different. They’re also sensitive to species absences, so may treat sites with the same number of absent species as more similar.

Consequently, ecologists use the Bray-Curtis dissimilarity calculation, which has a number of ideal properties:

- It is invariant to changes in units.
- It is unaffected by additions/removals of species that are not present in two communities,
- It is unaffected by the addition of a new community,
- It can recognize differences in total abundances when relative abundances are the same,

**Let’s do it in R!**

To run the NMDS, we will use the function `metaMDS`

from the vegan package. The function requires only a community-by-species matrix (which we will create randomly).

You should see each iteration of the NMDS until a solution is reached (i.e., stress was minimized after some number of reconfigurations of the points in 2 dimensions). You can increase the number of default iterations using the argument `trymax=`

. which may help alleviate issues of non-convergence. If high stress is your problem, increasing the number of dimensions to `k=3`

might also help.

example_NMDS=metaMDS(community_matrix,k=2,trymax=100)

You’ll see that `metaMDS`

has automatically applied a square root transformation and calculated the Bray-Curtis distances for our community-by-site matrix.

Let’s examine a Shepard plot, which shows scatter around the regression between the interpoint distances in the final configuration (i.e., the distances between each pair of communities) against their original dissimilarities.

stressplot(example_NMDS)

Large scatter around the line suggests that original dissimilarities are not well preserved in the reduced number of dimensions. Looks pretty good in this case.

Now we can plot the NMDS. The plot shows us both the communities (“sites”, open circles) and species (red crosses), but we don’t know which circle corresponds to which site, and which species corresponds to which cross.

plot(example_NMDS)

We can use the function `ordiplot`

and `orditorp`

to add text to the plot in place of points to make some sense of this rather non-intuitive mess.

ordiplot(example_NMDS,type="n") orditorp(example_NMDS,display="species",col="red",air=0.01) orditorp(example_NMDS,display="sites",cex=1.25,air=0.01)

That’s it! There’s a few more tips and tricks I want to demonstrate.

Let’s suppose that communities 1-5 had some treatment applied, and communities 6-10 a different treatment. We can draw convex hulls connecting the vertices of the points made by these communities on the plot. I find this an intuitive way to understand how communities and species cluster based on treatments.

treat=c(rep("Treatment1",5),rep("Treatment2",5)) ordiplot(example_NMDS,type="n") ordihull(example_NMDS,groups=treat,draw="polygon",col="grey90",label=F) orditorp(example_NMDS,display="species",col="red",air=0.01) orditorp(example_NMDS,display="sites",col=c(rep("green",5),rep("blue",5)), air=0.01,cex=1.25)

One can also plot “spider graphs” using the function `orderspider`

, ellipses using the function `ordiellipse`

, or a minimum spanning tree (MST) using `ordicluster`

which connects similar communities (useful to see if treatments are effective in controlling community structure).

You could also color the convex hulls by treatment.

# First, create a vector of color values corresponding of the # same length as the vector of treatment values colors=c(rep("red",5),rep("blue",5)) ordiplot(example_NMDS,type="n") #Plot convex hulls with colors baesd on treatment for(i in unique(treat)) { ordihull(example_NMDS$point[grep(i,treat),],draw="polygon", groups=treat[treat==i],col=colors[grep(i,treat)],label=F) } orditorp(example_NMDS,display="species",col="red",air=0.01) orditorp(example_NMDS,display="sites",col=c(rep("green",5), rep("blue",5)),air=0.01,cex=1.25)

*You may wish to use a less garish color scheme than I

If the treatment is continuous, such as an environmental gradient, then it might be useful to plot contour lines rather than convex hulls. We can simply make up some, say, elevation data for our original community matrix and overlay them onto the NMDS plot using `ordisurf`

:

Which produces the following plot:

You could even do this for other continuous variables, such as temperature.

________________________

The full example code (annotated, with examples for the last several plots) is available below:

# Non-metric multidimensional scaling (NMDS) is one tool commonly used to # examine community composition # Let's lay some conceptual groundwork # Consider a single axis of abundance representing a single species: plot(0:10,0:10,type="n",axes=F,xlab="Abundance of Species 1",ylab="") axis(1) # We can plot each community on that axis depending on the abundance of # species 1 within it points(5,0); text(5.5,0.5,labels="community A") points(3,0); text(3.2,0.5,labels="community B") points(0,0); text(0.8,0.5,labels="community C") # Now consider a second axis of abundance representing a different # species # Communities can be plotted along both axes depending on the abundance of # species within it plot(0:10,0:10,type="n",xlab="Abundance of Species 1", ylab="Abundance of Species 2") points(5,5); text(5,4.5,labels="community A") points(3,3); text(3,3.5,labels="community B") points(0,5); text(0.8,5.5,labels="community C") # Now consider a THIRD axis of abundance representing yet another species # (For this we're going to need to load another package) # install.packages("scatterplot3d") library(scatterplot3d) d=scatterplot3d(0:10,0:10,0:10,type="n",xlab="Abundance of Species 1", ylab="Abundance of Species 2", zlab="Abundance of Species 3"); d d$points3d(5,5,0); text(d$xyz.convert(5,5,0.5),labels="community A") d$points3d(3,3,3); text(d$xyz.convert(3,3,3.5),labels="community B") d$points3d(0,5,5); text(d$xyz.convert(0,5,5.5),labels="community C") # Now consider as many axes as there are species S (obviously we cannot # visualize this beyond 3 dimensions) # Hopefully your head didn't explode # The goal of NMDS is to represent the original position of communities in # multidimensional space as accurately as possible using a reduced number # of dimensions that can be easily plotted and visualized # NMDS does not use the absolute abundances of species in communities, but # rather their RANK ORDER! # The use of ranks omits some of the issues associated with using absolute # distance (e.g., sensitivity to transformation), and as a result is much # more flexible technique that accepts a variety of types of data # (It is also where the "non-metric" part of the name comes from) # The NMDS procedure is iterative and takes place over several steps: # (1) Define the original positions of communities in multidimensional # space # (2) Specify the number m of reduced dimensions (typically 2) # (3) Construct an initial configuration of the samples in 2-dimensions # (4) Regress distances in this initial configuration against the observed # (measured) distances # (5) Determine the stress (disagreement between 2-D configuration and # predicted values from regression) # If the 2-D configuration perfectly preserves the original rank # orders, then a plot ofone against the other must be monotonically # increasing. The extent to which the points on the 2-D configuration # differ from this monotonically increasing line determines the # degree of stress (see Shepard plot) # (6) If stress is high, reposition the points in m dimensions in the #direction of decreasing stress, and repeat until stress is below #some threshold # Generally, stress < 0.05 provides an excellent represention in reduced # dimensions, < 0.1 is great, < 0.2 is good, and stress > 0.3 provides a # poor representation # NOTE: The final configuration may differ depending on the initial # configuration (which is often random) and the number of iterations, so # it is advisable to run the NMDS multiple times and compare the # interpretation from the lowest stress solutions # To begin, NMDS requires a distance matrix, or a matrix of # dissimilarities # Raw Euclidean distances are not ideal for this purpose: they are # sensitive to totalabundances, so may treat sites with a similar number # of species as more similar, even though the identities of the species # are different # They are also sensitive to species absences, so may treat sites with # the same number of absent species as more similar # Consequently, ecologists use the Bray-Curtis dissimilarity calculation, # which has many ideal properties: # It is invariant to changes in units # It is unaffected by additions/removals of species that are not # present in two communities # It is unaffected by the addition of a new community # It can recognize differences in total abudnances when relative # abundances are the same # To run the NMDS, we will use the function `metaMDS` from the vegan # package #install.packages("vegan") library(vegan) # `metaMDS` requires a community-by-species matrix # Let's create that matrix with some randomly sampled data set.seed(2) community_matrix=matrix(sample(1:100,300,replace=T),nrow=10, dimnames=list(paste("community",1:10,sep=""), paste("sp",1:30,sep=""))) # The function `metaMDS` will take care of most of the distance # calculations, iterative fitting, etc. We need simply to supply: example_NMDS=metaMDS(community_matrix, # Our community-by-species matrix k=2) # The number of reduced dimensions # You should see each iteration of the NMDS until a solution is reached # (i.e., stress was minimized after some number of reconfigurations of # the points in 2 dimensions). You can increase the number of default # iterations using the argument "trymax=##" example_NMDS=metaMDS(community_matrix,k=2,trymax=100) # And we can look at the NMDS object example_NMDS # metaMDS has automatically applied a square root # transformation and calculated the Bray-Curtis distances for our # community-by-site matrix # Let's examine a Shepard plot, which shows scatter around the regression # between the interpoint distances in the final configuration (distances # between each pair of communities) against their original dissimilarities stressplot(example_NMDS) # Large scatter around the line suggests that original dissimilarities are # not well preserved in the reduced number of dimensions #Now we can plot the NMDS plot(example_NMDS) # It shows us both the communities ("sites", open circles) and species # (red crosses), but we don't know which are which! # We can use the functions `ordiplot` and `orditorp` to add text to the # plot in place of points ordiplot(example_NMDS,type="n") orditorp(example_NMDS,display="species",col="red",air=0.01) orditorp(example_NMDS,display="sites",cex=1.25,air=0.01) # There are some additional functions that might of interest # Let's suppose that communities 1-5 had some treatment applied, and # communities 6-10 a different treatment # We can draw convex hulls connecting the vertices of the points made by # these communities on the plot # First, let's create a vector of treatment values: treat=c(rep("Treatment1",5),rep("Treatment2",5)) ordiplot(example_NMDS,type="n") ordihull(example_NMDS,groups=treat,draw="polygon",col="grey90", label=FALSE) orditorp(example_NMDS,display="species",col="red",air=0.01) orditorp(example_NMDS,display="sites",col=c(rep("green",5),rep("blue",5)), air=0.01,cex=1.25) # I find this an intuitive way to understand how communities and species # cluster based on treatments # One can also plot ellipses and "spider graphs" using the functions # `ordiellipse` and `orderspider` which emphasize the centroid of the # communities in each treatment # Another alternative is to plot a minimum spanning tree (from the # function `hclust`), which clusters communities based on their original # dissimilarities and projects the dendrogram onto the 2-D plot ordiplot(example_NMDS,type="n") orditorp(example_NMDS,display="species",col="red",air=0.01) orditorp(example_NMDS,display="sites",col=c(rep("green",5),rep("blue",5)), air=0.01,cex=1.25) ordicluster(example_NMDS,hclust(vegdist(community_matrix,"bray"))) # Note that clustering is based on Bray-Curtis distances # This is one method suggested to check the 2-D plot for accuracy # You could also plot the convex hulls, ellipses, spider plots, etc. colored based on the treatments # First, create a vector of color values corresponding of the same length as the vector of treatment values colors=c(rep("red",5),rep("blue",5)) ordiplot(example_NMDS,type="n") #Plot convex hulls with colors baesd on treatment for(i in unique(treat)) { ordihull(example_NMDS$point[grep(i,treat),],draw="polygon",groups=treat[treat==i],col=colors[grep(i,treat)],label=F) } orditorp(example_NMDS,display="species",col="red",air=0.01) orditorp(example_NMDS,display="sites",col=c(rep("green",5),rep("blue",5)), air=0.01,cex=1.25) # If the treatment is a continuous variable, consider mapping contour # lines onto the plot # For this example, consider the treatments were applied along an # elevational gradient # We can define random elevations for previous example elevation=runif(10,0.5,1.5) # And use the function ordisurf to plot contour lines ordisurf(example_NMDS,elevation,main="",col="forestgreen") # Finally, we want to display species on plot orditorp(example_NMDS,display="species",col="grey30",air=0.1,cex=1)

I am phd student at Tehran university, Iran. I need paper fpr definition of N-MDS. Guide me, please.

Thanks

Any statistical textbook will do!

the text are closed to each other, and even some are covered by each other, is there any good way to solve this problem?

Try the

`air =`

argument in`orditorp`

!I change the air=argument, then below is the result. It doesn’t work.

****

> orditorp(nmds,display=”sites”,cex=1.25,air=argument)

Error in orditorp(nmds, display = “sites”, cex = 1.25, air = argument) :

object ‘argument’ not found

******

See

`?orditorp`

, specifically:air

Amount of empty space between text labels. Values <1 allow overlapping text.

Thanks, I would try to set priority parameter.

Is there any difference between 2 D and 3D in NMDS .in Most of the biology papers NMDS with 2Dimension is used..Before going to NMDS K-MEAN analysis is need or not?

please guide me

Thanks

jasna vijayan

Yes, its entirely possible that 2 vs 3 dimensions will have a big influence on the inferences from the NMDS. Two dimensions is often used because its easier to visualize and present in a publication but I have seen 3-d presentations done well (and many done poorly). K-means clustering will provide different inferences to NMDS in that it will identify natural groupings in your data. You could in theory overlay those on the NMDS points. Hope this helps! Cheers, Jon

Thanks Jon…so in publication point of view 2D is ok

Hi Jon,

Is there a script command to get the correlations of the species data with the first and second NMDS axes?

Thanks!

Hi Jon,

I’m running metaMDS() and specifying k=3. Is there a way to plot three dimensions in three 2D graphs? E.g. one graph of NMDS1 vs. NMDS2, one graph of NMDS1 vs. NMDS3, and one graph of NMDS2 vs. NMDS3.

Thanks!

Sarah

Hi Sarah, Try the

`scatterplot3d`

package, specifically the`?plot3d`

function. Cheers, JonI have been trying to find a clear, simple explanation that goes through taking an ecological dataset through the process of nMDS graphing in R and your post has been so incredibly helpful! Thank you for putting it together and sharing.

Fantastic tutorial!

I’m just wondering, what one should do if they have species of an unknown group in their data set. Would I be able to use this? For example, I am dealing with a data set of grasshopper species among different communities. All of the grasshoppers were identified to species except for some of a genus that is particularly hard to identify at it’s earliest developmental stages. Thus, those have all been grouped together. Would it be safest to omit them? Should I omit all grasshoppers of all species below the 3rd instar? Or should I leave them as their own “species” group. Or can I even use this at all in this case?

Perhaps a very useful item to add to the tutorial would be “where there be dragons with this” aka what people should be aware of in performing NMDS.

Thanks for the help 🙂

Hi The Bug,

Its hard to say. Do you want these “unidentified” species to influence your interpretation of differences between communities? In other words, are they unidentified because they are unimportant (e.g., non-target) or because they were too destroyed to confidently ID or because they have difficult taxonomy? In all but the last case, I would probably not include them. Same goes for developmental stages: unless you are specifically interested in how the ontogenological composition changes among sites, then I would leave them out. If you treat them as separate columns the algorithm will view them as wholly independent species, which is not correct.

Heh, one day I may revisit the tutorial and add some dragons but alas, too little time!

Cheers,

Jon

Thanks, this is such a nice guide

Hi,

How can I change the treatment commands?

For example here:

treat=c(rep(“Treatment1”,5),rep(“Treatment2”,5))

I would like to replace them with pH and Conductivity instead.

I am very new with R and any help and advice is much appreciated 🙂

Hi Nas, Simply replace “Treatment1” and “Treatment2” with “pH” and “Conductivity”. Cheers, Jon

How to create a plot in 3D?

thank you very much

Hi there! I found an analyses from a NMDS in a paper, but I can’t find the way to do it.

I copy you teh explanation in the methods of that paper:

“we tested statistically whether the community composition of any of the ecosystem age groups was more homogenous than the composition among all islands. This was done for each ecosystem age group by joining the points on the plot by lines to construct minimum convex hulls, and testing whether the area of any of the group-specific hulls was smaller than random joining of points into three groups. Values of P based on 999 permutations were reported.”

Do you have any idea about the test to get that? thanks!!!

To be honest, I have never seen that before. I would use `betadisper` or `adonis` in the `vegan` package to test for differences among groups. As far as I know, neither of these procedures construct convex hulls. Hope this helps! Cheers, Jon

Hi Jon

Many thanks for the blog.

I have a related question. Is any way to calculate any “p-value” from this analysis? Or any measure of “discrimination” among clusters (if any)?

thanks again!!

F

The function

`pamk`

in the`fpc`

package will calculate the optimal number of clusters and use the Duda-Hart test to confirm. Hope this helps! JHi Jon, many thanks for the information. It is really superbly helpful. I hope you write a whole book on stats for ecologists some day. I’m using NMDS for the first time. I got a 2D solution with low stress values and I grouped the communities using hierarchical cluster analysis. A more senior ecologist (who champions DCA and always asks during conferences why am I not using it) said the ordination and groupings is wrong due to two groups overlapping on the ordination diagram when colour coded. I know from my knowledge of the sites and through ISA and MRPP that the groupings are sensible and statistically significant (MRPP). It might be a silly question but is overlap in groups in the 2D ordination diagram possible???

Hi Louise,

Thanks for the kind words! Overlap between the two groups is not only possible but common. Given adequate reproduction in reduced dimensions (i.e., a low stress value) this would indicate that the composition and relative abundances of the communities are not radically different between some data points. If you are looking at, for instance, the community before and after a disturbance, this might actually be a highly desirable result. I wouldn’t worry about a small overlap with statistically significant groupings, as you have here, because as you point out the groups make sense ecologically.

With respect to DCA and other kinds of canonical ordination, you may wish to share the following reference with your colleague:

Questionable Multivariate Statistical Inference in Wildlife Habitat and Community Studies

The point is that if the results jive with your intuition and observation about the system, that is far more valuable than statistical tests, which are informative but just as often driven by spurious correlations and confounding influence.

Cheers,

Jon

Hello Jon,

Thanks for a thorough explanation.

I have a morphometric data for 4 species (data set similar to “iris” data in R). I want to see if these morphometric characters can differentiate the species (i.e, species come out as separate clusters). I did make the plot following your code. I can see most of the characters forming a more or less tight cluster and when I map species on them they come out as different clusters. How does species form separate clusters when there are no obvious separate clusters in characters?

Also the characters which are inside the polygon, are they important characters for that species? Help me with interpretation.

Since the data used in metaMDS is species abundance, I don’t know if it can be applied to morphometric data (quantitative and qualitative). For now I have number-coded qualitative data to make it numeric.

Thanks a lot.

Hi Preeti — other techniques, like cluster analysis or machine learning (see the post here: https://jonlefcheck.net/2015/02/06/a-practical-guide-to-machine-learning-in-ecology/ ) may be more appropriate for your analysis. Have a look and let me know if you any questions! Cheers, Jon

Jon: Is there a way to plot (show) some of the ordination ellipses (or polygons) on a 2D graph but not plot (or hide/make transparent) the others? Thanks.

Hi Paul, check out some of the code to port the points over to

`ggplot2`

in the comments. You may find it a more flexible alternative that can achieve what you want to visualize! Cheers, JonHi Jon, top shelf summary of NMS, thanks :). I have seen that breakdown of stress values and their acceptability before, but not in a formal textbook. Any text I have seen has different values appropriate for say PCORD. Where do you get the breakdown of acceptable stress scores from?

Hi Graham, I pulled these values from the PRIMER V6 manual by Clarke & Warwick. The citation from Google Scholar is:

Clarke, K. R., and R. M. Warwick. “An approach to statistical analysis and interpretation.” Change in Marine Communities 2 (1994).

Hope this helps!

Cheers, Jon

Thank you Jon!

This is a great tutorial – thanks for taking the time to make it!!