Mapping in R, Part I

I’m consistently amazed at the capabilities of R: if it can be done, it can be done in R. And so is the case with mapping. Recently, I had the need to do some complicated geospatial analysis, and I wanted to do it in R for the obvious reasons: it’s free, it’s open-source, and there is a great support community. As it turns out, R has much of the functionality of ArcGIS, albeit with a lot less flash and a lot more hair-pulling. But once you’ve done the legwork to get the data in and formatted, and the functions set, it’s a snap to run through all sorts of data.

Better to start simple, though. Lately, I’ve been asked to create a lot of maps showing the location of different experimental sites, partners, etc. It turns out that R has a lot of great maps built-in, if you download the “maps” package. If your coordinates are in decimals (as opposed to degrees/minutes/seconds), it’s actually quite simple to plot them on various levels (counties, states, or countries).

The following script uses LTER sites as an example, plotting those that fall within the contiguous US and shading the states that contain one or more sites. This script helps with a common problem I have, which is determining whether a point falls within a region. Like all code I provide, it’s completely self-contained!

lter_map

library(maps) #Calls: map_data
library(maptools) #Calls: map2SpatialPolygons
library(ggplot2) #Calls: ggplot
library(plyr) #Calls: ddply
library(sp) #Calls: over

# The goal is to map the LTER sites in the US, and shade the states in which they 
# occur

# Read in coordinates of LTER sites (From: http://www.lternet.edu/sites/coordinates)
sites=read.delim(textConnection(" 	0 	0
  Andrews Forest LTER (AND) 	44.21 	-122.26
  Arctic LTER (ARC) 	66.63 	-149.60
  Baltimore Ecosystem Study (BES) 	39.10 	-76.30
  Bonanza Creek LTER (BNZ) 	64.86 	-147.85
  California Current Ecosystem LTER (CCE) 	32.87 	-120.28
  Cedar Creek LTER (CDR) 	45.40 	-93.20
  Central Arizona - Phoenix LTER (CAP) 	33.43 	-111.93
  Coweeta LTER (CWT) 	35.00 	-83.50
  Florida Coastal Everglades LTER (FCE) 	25.47 	-80.85
  Georgia Coastal Ecosystems LTER (GCE) 	31.43 	-81.37
  Harvard Forest LTER (HFR) 	42.53 	-72.19
  Hubbard Brook LTER (HBR) 	43.94 	-71.75
  Jornada Basin LTER (JRN) 	32.62 	-106.74
  Kellogg Biological Station LTER (KBS) 	42.40 	-85.40
  Konza Prairie LTER (KNZ) 	39.09 	-96.58
  LTER Network Office (LNO) 	35.08 	-106.62
  Luquillo LTER (LUQ) 	18.30 	-65.80
  McMurdo Dry Valleys LTER (MCM) 	-77.00 	162.52
  Moorea Coral Reef LTER (MCR) 	-17.49 	-149.83
  Niwot Ridge LTER (NWT) 	39.99 	-105.38
  North Temperate Lakes LTER (NTL) 	46.01 	-89.67
  Palmer Antarctica LTER (PAL) 	-64.77 	-64.05
  Plum Island Ecosystems LTER (PIE) 	42.76 	-70.89
  Santa Barbara Coastal LTER (SBC) 	34.41 	-119.84
  Sevilleta LTER (SEV) 	34.35 	-106.88
  Shortgrass Steppe (SGS) 	40.83 	-104.72
  Virginia Coast Reserve LTER (VCR) 	37.28 	-75.91 "),header=F,strip.white=T) 
sites=droplevels(sites[-1,]); names(sites)=c("sites","lat","long")

# Load US map
states=map("state",fill=T,plot=F)

# In order to understand it a site falls within a state, we need to convert both 
# to spatial objects

# Convert US map to SpatialPolygon 
states_sp=map2SpatialPolygons(states,
   IDs=sapply(strsplit(states$names, ":"),function(x) x[1]),
         proj4string=CRS("+proj=longlat +datum=wgs84"))
# Convert sites to SpatialPoints object
sites_sp=SpatialPoints(sites[,c(3,2)],proj4string=CRS("+proj=longlat +datum=wgs84"))
# Use function `over` to obtain a list of SpatialPolygons that contain the sites
index=over(sites_sp,states_sp)
# Return the state names of the SpatialPolygons containing each site
states_names=sapply(states_sp@polygons, function(x) x@ID)[index]
# Note that not all sites occur in the continental US (e.g., Arctic, Alaska, etc.)
# so they occur as NAs

# Create a dataframe for plotting from the SpatialPoints object using `fortify`
states.fort=fortify(states_sp)
# And bind a column "fill" denoting whether the site is present 
states.fort$fill=ifelse(states.fort$id %in% states_names,"true","false")

# Use ggplot to plot the dataframe as a polygon
ggplot()+
# Fill the state polygons where fill=="true" and set state borders to "black"
  geom_polygon(data=states.fort,aes(x=long,y=lat,group=group,fill=fill),col="black")+
# Set the fill colors
  scale_fill_manual(values=c("white","cornflowerblue"))+
# Plot the individual sites as red dots
  geom_point(data=sites,aes(x=long,y=lat),col="red",size=4,alpha=0.9)+
# Set labels
  labs(x="",y="",title="US LTER Sites")+
# Zoom in on the contiguous US, otherwise points are plotted elsewhere
  coord_cartesian(ylim=c(24,50),xlim=c(-65,-130))+
# Set base theme and remove legend, and fill grid with grey background
  theme_bw(base_size=18)+theme(legend.position="none",
  panel.background=element_rect(fill="grey90"))

Created by Pretty R at inside-R.org

3 thoughts on “Mapping in R, Part I

    • jslefche says:

      Ah, good catch Wes! In this case, both CA sites are coastal LTERS (California Current and Santa Barbara), which means they don’t occur within the spatial polygon. In this case you would have to manually add “california” to the `states_names` object. R does what you tell it to do, which is not always what you want it to do!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s