-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathREADME.Rmd
More file actions
111 lines (83 loc) · 5.87 KB
/
README.Rmd
File metadata and controls
111 lines (83 loc) · 5.87 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
---
output: github_document
bibliography: refs.bib
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
fig.align = "center",
out.width = "100%"
)
library(htmltools)
knitr::knit_hooks$set(imgcenter = function(before, options, envir){
if (before) {
HTML("<p align='center'>")
} else {
HTML("</p>")
}
})
```
# evmissing <a href="https://paulnorthrop.github.io/evmissing/"><img src="man/figures/evmissing_logo.png" align="right" style="float:right; height:150px;" alt="evmissing logo"/></a>
[](https://github.com/paulnorthrop/evmissing/actions/workflows/R-CMD-check.yaml)
[](https://app.codecov.io/github/paulnorthrop/evmissing?branch=master)
[](https://cran.r-project.org/package=evmissing)
[](https://cran.r-project.org/package=evmissing)
[](https://cran.r-project.org/package=evmissing)
## Extreme Value Analyses with Missing Data
Performs likelihood-based extreme value inferences with adjustment for the presence of missing values. A GEV distribution is fitted to block maxima using maximum likelihood estimation, with the GEV location and scale parameter reflecting the numbers of non-missing raw values in each block. A Bayesian version is also provided. For the purposes of comparison, there are options to make no adjustment for missing values or to discard any block maximum for which greater than a percentage of the underlying raw values are missing. A plot method provides a set of standard model diagnostic plots, with appropriate adjustment made for the presence of missing values. Example datasets containing missing values are provided.
The `evmissing` package was created to accompany @SimpsonNorthrop2026.
## An example
The main function in `evmissing` is `gev_mle()`, which fits a GEV distribution to block maxima using maximum likelihood estimation, with the option to make an adjustment for the numbers of non-missing raw values in each block.
Our adjustment is based on the strong assumption that missing values occur completely at random. We suppose that a block maximum $M_n$ based on a full block of length $n$ has a GEV($\mu$, $\sigma$, $\xi$) distribution, with distribution function $G(x)$. Let $n_i$ be the number of non-missing values in block $i$ and let $M_{n_i}$ denote the block maximum of such a block. We suppose that $M_{n_i}$ has a GEV($\mu(n_i)$, $\sigma(n_i)$, $\xi$) distribution, where $\mu(n_i) = \mu + \sigma [(n_i/n)^\xi -1] / \xi$ and $\sigma(n_i) = \sigma (n_i/n)^\xi$. These expressions are based on the supposition that $M_{n_i}$ has a distribution function of $G(x)^{n_i/n}$, where $n_i/n$ reflects the effective block size for block $i$ relative to the full block size.
### Sea surge height data
We illustrate this using the data `BrestSurgeMaxima`, which is a data frame
containing annual maximum sea surge heights (in cm) at Brest, France for the years 1846-2007 inclusive.
```{r}
library(evmissing)
head(BrestSurgeMaxima)
```
In addition to the annual maxima this data frame includes `notNA`, the number of days in each year for which raw data were available, and `n`, the number of days in the year, that is, the block length. The function `block_maxima()` can be used to create data of this format from a raw time series and information about how blocks are defined. Alternatively, we can provide a raw time series directly to `gev_mle()`.
### Model fitting
```{r}
# Make the adjustment
fit_adjust <- gev_mle(BrestSurgeMaxima)
summary(fit_adjust)
# Make no adjustment
fit_no_adjust <- gev_mle(BrestSurgeMaxima, adjust = FALSE)
summary(fit_no_adjust)
```
The most obvious difference between these model fits is that the estimated location parameter with adjustment (52.27 cm) is smaller than that when the adjustment is used (52.89 cm). This is as expected because if some of the data
that could contribute to an annual maximum are missing then the observed annual
maximum is stochastically smaller than the unobserved maximum based on full
data.
Alternatively, if the argument `adjust` to `gev_mle()` is a numeric scalar then any block maximum for which greater than `adjust` percent of the underlying raw values are missing is discarded before fitting a GEV distribution, with no further adjustment made.
### Model diagnostic plots
The plot method for an object returned from `gev_mle` creates visual model diagnostics like those described in @Coles2001 and implemented in @ismev, where the values plotted have been adjusted, where necessary, for the presence of missing values.
```{r echo=FALSE}
library(htmltools)
knitr::opts_chunk$set(
fig.align = "center"
)
knitr::knit_hooks$set(imgcenter = function(before, options, envir){
if (before) {
HTML("<p align='center'>")
} else {
HTML("</p>")
}
})
```
```{r plots, echo = TRUE, eval = FALSE, fig.width = 5, fig.height = 5, out.width="50%"}
plot(fit_adjust)
```
```{r plots, echo = FALSE, imgcenter = TRUE, fig.width = 5, fig.height = 5, out.width="60%", fig.alt = "Diagnostic plots of for the Brest Sea Surge Maxima data. PP, QQ, return level and density plots."}
```
We see that overall the GEV model fits these data well, although the largest annual maximum sea surge height lies outside the profile-based 95\% confidence interval that is relevant to this observation.
## Installation
To install the current released version from CRAN:
```{r cran_installation, eval = FALSE, echo = TRUE}
install.packages("evmissing")
```
## References