# All libraries herelibrary(tidyverse) # Data manipulization and visualizationlibrary(leaflet) # Create interactive maplibrary(lubridate) # Working with dates and timeslibrary(ggpubr) # Combines ggplotslibrary(here) # Set file locationlibrary(readr) # Reading files into data frameslibrary(gt) # Create tableslibrary(tufte) # Implementing Tufte-style graphicslibrary(feasts) # Time series analysislibrary(knitr) # Creates nice tables
Introduction
I currently enjoy living near the beach in Santa Barbara, but this comes with inadvertent ecological impacts as coastal development and high recreational use increase pollutants and impacts the coastal ecosystem. A major question arises: Does human development on land cause significant change in the species abundance of coastal waters? The aim of this project is to provide an understanding of sensitivities select species have to developed coastlines.
Data was used from the Santa Barbara Coastal Long Term Ecological Research dataset “Settlement of urchins and other invertebrates, ongoing since 1990.” This project collects counts on six taxonomic groups through regular summer surveys. The survey locations are throughout southern California with one north of San Francisco. Each location is a pier, the counts are collected from the end of the pier in the littoral zone. The littoral zone is below the low-tide level and experiences the effects of tidal and longshore currents, and breaking waves.
Three of the piers are off protected coastal regions, the rest are populated and developed piers. The protected regions are all parks with minimal development and a marine conservation area protecting surrounding water: Gaviota Beach Pier with Kashtayit State Marine Conservation Area, Point Cabrillo Pier with the Point Cabrillo State Marine Reserve, Anacapa Island Pier with a variety of marine protected areas surrounding the Channel Islands chain. The maps below display the survey locations, 3 protected and 5 unprotected.
Locations of the LTER Survey Sites:
Code
# Create a data frame with the coordinates and protection statusprotection_coordinates <-data.frame(lat =c(34.0085, 39.3142, 34.4645, 37.7523, 34.4152, 34.4133, 32.865, 35.1794),lon =c(-119.4236, -123.7461, -120.2307, -122.5102, -119.8746, -119.6852, -117.2504, -120.7347),protection =c(rep("Protected", 3), rep("Unprotected", 5)),Fullname =c("Anacapa", "Point Cabrillo", "Gaviota", "Ocean Beach", "Ellwood Pier", "Stearns Wharf", "Scripps", "Avila Beach"))# Create a leaflet map centered on Californiamap_sites <-leaflet() %>%setView(lng =-119.4179, lat =35.7783, zoom =6) %>%addTiles()# Add locations to the mapmap_sites <- map_sites %>%# Protected points addCircleMarkers(data = protection_coordinates[protection_coordinates$protection =="Protected", ], lat =~lat, # use lat/long from dataframelng =~lon, color ="blue", radius =6, # set the radius of the markerpopup =~Fullname # add the site name to the popup ) %>%# Unprotected pointsaddCircleMarkers(data = protection_coordinates[protection_coordinates$protection =="Unprotected", ], lat =~lat, lng =~lon, color ="darkorange", radius =6, popup =~Fullname # add the site name to the popup )# Add a legend to the mapmap_sites <-addLegend(map = map_sites,position ="topright",title ="Protection Status",colors =c("blue", "darkorange"),labels =c("Protected", "Unprotected"))# Display the mapmap_sites
This data set explores a variety of species spanning different ecological functions:
Arthropoda: Phylum of lobsters, crabs, and barnacles which have an exoskeleton made of chitin and risks mineralizing in calcium carbonate, a risk of ocean acidification.
Bivalvia: Class belonging to phylum Mollusca, includes filter feeders such as clams, oysters, and mussels.
Gastropoda: Class belonging to phylum Mollusca including snails, conch, abalone, limpets, and whelks with a range from scavengers, predators, herbivores, and parasites.
Ophiuroidea/Asterpodea: Brittle stars and sea stars respectively belonging to phylum Echinodermata, these are carnivores, filter feeders, and scavengers.
S purpuratus: Pacific purple sea urchin, a sedentary species primarily feeding on algae.
M franciscanus: Red sea urchins, which eat kept and seaweed.
Total Urchins: Combines counts of the previous two, S purpuratus and M franciscanus.
Data Collection and Tidying
Importing and Creating Data Frames
This dataset was initially downloaded via DataOne website, link in the references. The metadata stated NA’s were written in the data as ‘-99999’, so I included this while importing the data.
Code
## Import LTER data UrchinsDF <-read_csv(here("posts", "2022-12-9-stats-proj", "Invertebrate_Settlement_All_Years_20220722.csv"), na ="-99999")
The protection data was not immediately available, I had to create a data frame with the values 1 for protected, 0 for unprotected. Protection was determined by the location of the piers. A pier is labeled unprotected if located in populated developed areas such as Ocean Beach, San Diego; protected piers are located in undeveloped parks surrounded by marine protected areas, including Channel Islands National Park and Point Cabrillo State Marine Reserve.
Here the two dataframes are joined so all of the survey rows are connected with a protection status based on location. Protection status is changed to be a binary TRUE/FALSE option, and the date class is changed. There are repeated surveys per day so these are collapsed down to one survey per day per site for my dataframe.
Code
## Join and select column fieldsUrchins <-left_join(UrchinsDF, ProtectionDF, by ="SITE") |>select(SITE, Protection, DATE_RETRIEVED, ARTHROPODA, BIVALVIA, GASTROPODA, OPHIUROIDEA_ASTERPODEA, S_PURPURATUS, M_FRANCISCANUS, TOTAL_URCHINS)## Set as date classUrchins$DATE_RETRIEVED <- lubridate::mdy(Urchins$DATE_RETRIEVED)## Collapse repeated surveys per day by siteUrchins <- Urchins |>group_by(SITE, DATE_RETRIEVED) |># Combine by date, per sitesummarise(ARTHROPODA =sum(ARTHROPODA, na.rm =TRUE),BIVALVIA =sum(BIVALVIA, na.rm =TRUE),GASTROPODA =sum(GASTROPODA, na.rm =TRUE),OPHIUROIDEA_ASTERPODEA =sum(OPHIUROIDEA_ASTERPODEA, na.rm =TRUE),S_PURPURATUS =sum(S_PURPURATUS, na.rm =TRUE),M_FRANCISCANUS =sum(M_FRANCISCANUS, na.rm =TRUE),TOTAL_URCHINS =sum(TOTAL_URCHINS, na.rm =TRUE),Protection =mean(Protection))# Change class to TRUE/FALSE for use in ggplot shape aestheticUrchins$Protection <-as.logical(Urchins$Protection)
Tidy Data
Next the entire dataframe is pivoted to be longer along each Species, so now each species per survey has its own row for count. For later analysis, I use this longer dataframe to simplify the counts for each species to 1 or 0 for presence or absence. This is saved as a new dataframe and widened to return to a survey per row.
Code
## Create Species Data FrameSpeciesDF <- Urchins |>pivot_longer(cols =c("ARTHROPODA", "BIVALVIA", "GASTROPODA", "OPHIUROIDEA_ASTERPODEA", "S_PURPURATUS", "M_FRANCISCANUS", "TOTAL_URCHINS"),names_to ="Species",values_to ="Count",values_drop_na =TRUE) # Drop NA values# Add a binary row for species presence SpeciesDF$Present[SpeciesDF$Count >=1] <-1SpeciesDF$Present[SpeciesDF$Count ==0] <-0# Dataframe displays only binary presence for each species per surveyPresentDF <- SpeciesDF |>select(-Count) |>pivot_wider(values_from ="Present",names_from ="Species")
Data Exploration
These initial visualizations can provide clues to trends in the data. This first plot shows the data distribution over time with total species count at each site, highlighting if protected or not. There are several major spikes, potential outliers, and many zeros. Overall this visualization is too crowded to infer any patterns.
Code
TotalUrchins <-ggplot(Urchins, aes(x = DATE_RETRIEVED, y = TOTAL_URCHINS)) +geom_point(aes(color = SITE, shape = Protection), alpha = .7, size =2) +theme_minimal() +labs(title ="Total Species Collected at Sites in Southern CA",y ="Total Species Count",x ="Date Collected",color ="Site", shape ="Protection") +theme(plot.title =element_text(face ="bold",hjust = .5)) +scale_color_brewer(palette ="Set1")TotalUrchins
Below is a series of maps of species abundance over time for each survey site. There are high spikes in this data with a lot of zeros, and does not appear linear in nature for any of the sites. There is also not a single site, protection status or species that is the solo cause of the data spikes.
Code
## How does each species vary per site:Ana <- SpeciesDF |>filter(SITE =="ANACAPA") |>ggplot(aes(x = DATE_RETRIEVED, y = Count, color = Species)) +geom_line() +theme_minimal() +labs(title ="Anacapa",x =NULL) +ylab("Species Count")FBPC <- SpeciesDF |>filter(SITE =="FBPC") |>ggplot(aes(x = DATE_RETRIEVED, y = Count, color = Species)) +geom_line() +theme_minimal() +labs(title ="Point Cabrillo",x =NULL,y =NULL)Gav <- SpeciesDF |>filter(SITE =="GAVIOTA") |>ggplot(aes(x = DATE_RETRIEVED, y = Count, color = Species)) +geom_line() +theme_minimal() +labs(title ="Gaviota",x =NULL) +ylab("Species Count")OB <- SpeciesDF |>filter(SITE =="OCNBCH") |>ggplot(aes(x = DATE_RETRIEVED, y = Count, color = Species)) +geom_line() +theme_minimal() +labs(title ="Ocean Beach",x =NULL,y =NULL)EL <- SpeciesDF |>filter(SITE =="SBELL") |>ggplot(aes(x = DATE_RETRIEVED, y = Count, color = Species)) +geom_line() +theme_minimal() +labs(title ="Ellwood Pier",x =NULL) +ylab("Species Count")SW <- SpeciesDF |>filter(SITE =="SBSTWRF") |>ggplot(aes(x = DATE_RETRIEVED, y = Count, color = Species)) +geom_line() +theme_minimal() +labs(title ="Stearns Wharf",x =NULL,y =NULL)Scrip <- SpeciesDF |>filter(SITE =="SIO") |>ggplot(aes(x = DATE_RETRIEVED, y = Count, color = Species)) +geom_line() +theme_minimal() +labs(title ="Scripps") +xlab("Date Collected") +ylab("Species Count")AV <- SpeciesDF |>filter(SITE =="AVILA") |>ggplot(aes(x = DATE_RETRIEVED, y = Count, color = Species)) +geom_line() +theme_minimal() +labs(title ="Avila Beach",y =NULL) +xlab("Date Collected")ggarrange(Ana, FBPC, Gav, OB, EL, SW, Scrip, AV +rremove("x.text"), ncol =2, nrow =4,common.legend =TRUE,legend ="right")
Hypothesis Testing
First, I want to check if there is a difference in means between protected and unprotected sites for each species with Welch’s Two Sample t-test. The goal is to understand if any species are sensitive to unprotected regions. The null hypothesis predicts there is no change between the abundance of each species in protected versus unprotected regions. I will repeat this for each listed species and total urchins.
This analysis found that some species are more sensitive to protection from coastal human development with low p-values, allowing the null hypothesis to be rejected for Total Urchins, Arthropoda, Bivalvia, Gastropoda, and S purpuratus. It is interesting to note that not all species were affected the same, the average means for Arthropoda and Bivalvia are higher in unprotected versus protected regions. This could be a result of changing ecological functions, a loss of predators, or another cause that human activities have been favorable for the species.
Not all species displayed a sensitivity or difference between the regions. Ophiuroidea/Asterpodea and M franciscanus each had higher p-values (.13 and .32 respectively) so unable to reject the null hypothesis of no difference between the means of protected versus unprotected regions for these groups. It is notable that Total Urchins had a significant p-value, but that changes when the two urchin species are separated, as S purpuratus does not show a significant difference but M franciscanus does between the sites. This may need to be reviewed further for significance as M franciscanus appears to have generally low abundance in the data with an average count of 0 between both sites. This signifies an importance to measure the species separately instead of in the combined fashion in the dataset.
This plot displays for each species a confidence interval, for each there is a 95% probability that the true difference in means for the species count at protected versus unprotected sites falls within the confidence range.
Finally, I want to explore if there is a time element with the presence of each species. To account for the large spikes, I utilize the dataframe with the binary present/absent data for each species per survey. The below time series linear regressions compare protected versus unprotected for each species.
Code
summary(lm(TOTAL_URCHINS ~ DATE_RETRIEVED + Protection, data = PresentDF))Time_Urch <-ggplot(PresentDF, aes(x = DATE_RETRIEVED, y = TOTAL_URCHINS, color = Protection)) +geom_jitter(height = .1, alpha = .6) +geom_smooth(method = loess, formula ='y ~ x') +theme_minimal() +labs(title ="Total Urchins",x =NULL,y =NULL) +theme(plot.title =element_text(face ="bold",hjust = .5))summary(lm(ARTHROPODA ~ DATE_RETRIEVED + Protection, data = PresentDF))Time_Art <-ggplot(PresentDF, aes(x = DATE_RETRIEVED, y = ARTHROPODA, color = Protection)) +geom_jitter(height = .1, alpha = .6) +geom_smooth(method = loess, formula ='y ~ x') +theme_minimal() +labs(title ="Arthropoda",x =NULL,y =NULL) +theme(plot.title =element_text(face ="bold",hjust = .5))summary(lm(BIVALVIA ~ DATE_RETRIEVED + Protection, data = PresentDF))Time_Biv <-ggplot(PresentDF, aes(x = DATE_RETRIEVED, y = BIVALVIA, color = Protection)) +geom_jitter(height = .1, alpha = .6) +geom_smooth(method = loess, formula ='y ~ x') +theme_minimal() +labs(title ="Bivalvia",x =NULL) +ylab("Present (1) or Absent (0) At Survey") +theme(plot.title =element_text(face ="bold",hjust = .5))summary(lm(GASTROPODA ~ DATE_RETRIEVED + Protection, data = PresentDF))Time_Gast <-ggplot(PresentDF, aes(x = DATE_RETRIEVED, y = GASTROPODA, color = Protection)) +geom_jitter(height = .1, alpha = .6) +geom_smooth(method = loess, formula ='y ~ x') +theme_minimal() +labs(title ="Gastropoda",x =NULL,y =NULL) +theme(plot.title =element_text(face ="bold",hjust = .5))summary(lm(OPHIUROIDEA_ASTERPODEA ~ DATE_RETRIEVED + Protection, data = PresentDF))Time_Oph <-ggplot(PresentDF, aes(x = DATE_RETRIEVED, y = OPHIUROIDEA_ASTERPODEA, color = Protection)) +geom_jitter(height = .1, alpha = .6) +geom_smooth(method = loess, formula ='y ~ x') +theme_minimal() +labs(title ="Ophiuroida/Asterpodea",x =NULL,y =NULL) +theme(plot.title =element_text(face ="bold",hjust = .5))summary(lm(S_PURPURATUS ~ DATE_RETRIEVED + Protection, data = PresentDF))Time_Purp <-ggplot(PresentDF, aes(x = DATE_RETRIEVED, y = S_PURPURATUS, color = Protection)) +geom_jitter(height = .1, alpha = .6) +geom_smooth(method = loess, formula ='y ~ x') +theme_minimal() +labs(title ="S Purpuratus",x =NULL,y =NULL) +theme(plot.title =element_text(face ="bold",hjust = .5))summary(lm(M_FRANCISCANUS ~ DATE_RETRIEVED + Protection, data = PresentDF))Time_Franc <-ggplot(PresentDF, aes(x = DATE_RETRIEVED, y = M_FRANCISCANUS, color = Protection)) +geom_jitter(height = .1, alpha = .6) +geom_smooth(method = loess, formula ='y ~ x') +theme_minimal() +labs(title ="M franciscanus",y =NULL) +xlab("Date Collected") +theme(plot.title =element_text(face ="bold",hjust = .5))
There is a low p-value associated with time displaying significance in this variable for Arthropoda, Bivalvia, Gastropoda, and Ophiuroidea/Asterpodea. Time was not significant for Total Urchins, S purpuratus, or M franciscanus. Interesting to note, in this linear regression time series the protection was found to be significant for Ophiuroidea/Asterpodea and M franciscanus while the earlier t-test did not. Overall these regressions have low R squared values, which signify the proportion of the species presence explained by time and protection.
Overall, I have found there are sensitivities to some species between protected and unprotected coastlines. Continued research could look further into the factors that affect these species, such as pollution levels due to high human activity or change of the ecosystem dynamics due to fishing and other recreational activities.
References
Britannica, The Editors of Encyclopaedia. “littoral zone”. Encyclopedia Britannica, 8 Aug. 2019, https://www.britannica.com/science/littoral-zone. Accessed 3 December 2022.