Prerequisites

This is the first ‘practical’ chapter of the book and comes with software requirements. For the analysis of the available data we use R (R Core Team 2013), a computer language and environment for data analysis, statistical computation and data visualization. It can be downloaded at <https://www.r-project.org>. Alongside R, we use RStudio as the IDE (RStudio Team 2020).

Some core R packages used in this chapter need to be installed so that the analyses can be reproduced as shown here. They can be installed as follows:

# Core Libraries
install.packages('tidyverse')    # Meta - dplyr, ggplot2, purrr, tidyr, stringr, forcats
install.packages('lubridate')    # date and time
install.packages('timetk')       # Time series data wrangling, visualization and preprocessing

# Extras
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools", repos = "http://cran.us.r-project.org")
devtools::install_github("boxuancui/DataExplorer", ref = "develop") # Simplifies and automates EDA process and reporting

# Data and helper functions
devtools::install_github("hydrosolutions/riversCentralAsia")

Once installed, the packages can be loaded into your R session:

library(devtools)
library(tidyverse)
library(lubridate)
library(timetk)
library(DataExplorer)
library(riversCentralAsia)
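
If one of the `library()` calls fails, it can help to find out which packages are missing before retrying the installation. The following is a minimal sketch using base R only; the vector name `required_pkgs` is our own choice, and the package names are simply those listed in the installation steps above.

```r
# Packages this chapter expects (names taken from the installation steps above)
required_pkgs <- c("devtools", "tidyverse", "lubridate", "timetk",
                   "DataExplorer", "riversCentralAsia")

# requireNamespace() returns FALSE instead of stopping when a package is absent
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still needs to be installed
missing_pkgs <- required_pkgs[!is_installed]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```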

Any additional packages are loaded in the sections where they are needed.

Please also remember the following rules when working with R dataframes in the tidyverse:

  • Every column is a variable.
  • Every row is an observation.
  • Every cell is a single value.
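
To illustrate these rules, the sketch below reshapes a small wide table into tidy form with `tidyr::pivot_longer()`. The table, its column names (`station_a`, `station_b`) and all values are made up for illustration and do not come from the data used in this book.

```r
library(tidyr)

# Hypothetical wide table: one column per gauging station (values are made up)
discharge_wide <- data.frame(
  date      = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")),
  station_a = c(12.1, 11.4, 15.8),
  station_b = c(30.5, 28.9, 41.2)
)

# Tidy form: the station becomes a variable, and each row is one observation
discharge_tidy <- pivot_longer(
  discharge_wide,
  cols      = c(station_a, station_b),
  names_to  = "station",
  values_to = "discharge"
)

print(discharge_tidy)
```

In the wide table, the station identity is hidden in the column names. After reshaping, each row holds exactly one observation (one date, one station, one discharge value), which is the layout most tidyverse functions expect.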

A final note: in what follows, we rely mostly on the data manipulation and visualization tools for time series data provided by the timetk package. The package is in active development and integrates well with the R ‘tidyverse’, which greatly facilitates work with time series data.
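
As a small taste of this integration, the following sketch aggregates a daily series to monthly means with `timetk::summarise_by_time()` inside an ordinary dplyr pipeline. The series is synthetic and the names `daily`, `monthly` and `mean_value` are assumptions chosen for this example only.

```r
library(dplyr)
library(timetk)

# Synthetic daily series for 2020 (made-up values, for illustration only)
dates <- seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day")
daily <- tibble(
  date  = dates,
  value = sin(seq_along(dates) / 58) + 2
)

# Aggregate to one row per month; the date column is floored to the month start
monthly <- daily %>%
  summarise_by_time(.date_var = date, .by = "month", mean_value = mean(value))

print(monthly)
```

Because the result is again a tibble, it can be passed straight on to other tidyverse verbs or plotting functions.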