mm() variance clarity #36

@leeper

Is it sufficiently clear that the standard errors returned by mm() are domain estimates, computed with the full design retained, rather than SEs computed after subsetting the data to each level?

library("survey")  # provides svydesign() and svymean()

x <- structure(list(level = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,  2L), .Label = c("John", "Kate"), class = "factor"), outcome = c(0L,  0L, 1L, 1L, 0L, 0L, 1L, 1L), weight = c(1L, 1L, 1L, 1L, 0L, 0L,  0L, 0L)), row.names = 1:8, class = "data.frame")

# what people might be expecting: the SE of the mean from the John-only subset (n = 4)
with(subset(x, level == "John"), sqrt(sum((outcome - mean(outcome))^2)/3/4))
svymean(~outcome, svydesign(ids = ~1, weights = ~ 1, data = subset(x, level == "John")))

# what is actually returned (all are equivalent)
## mm()
mm(x, outcome ~ level)

## unweighted data, subset to John
svymean(~outcome, subset(svydesign(ids = ~1, weights = ~ 1, data = x), level == "John"))

## weighted data (Kate weight == 0), subset to John
svymean(~outcome, subset(svydesign(ids = ~1, weights = ~ weight, data = x), level == "John"))

## weighted data (Kate weight == 0), full data frame
svymean(~outcome, svydesign(ids = ~1, weights = ~ weight, data = x))
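To make the difference concrete, here is a rough base-R sketch of the two calculations done by hand. It assumes the usual with-replacement linearization variance for an equal-weight, single-stage design, which is my understanding of what `svymean()` does here; the variable names (`in_domain`, `z`, `se_domain`, `se_subset`) are just for illustration and are not part of any package.

```r
# Same toy data as above, rebuilt compactly (no weights needed for this sketch).
x <- data.frame(
  level   = factor(rep(c("John", "Kate"), each = 4)),
  outcome = rep(c(0L, 0L, 1L, 1L), times = 2)
)

in_domain <- x$level == "John"
n      <- nrow(x)                      # full sample size
n_d    <- sum(in_domain)               # number of rows in the domain
ybar_d <- mean(x$outcome[in_domain])   # domain mean

# Linearized contribution of each row to the domain mean.
# Rows outside the domain stay in the sample but contribute zero.
z <- ifelse(in_domain, (x$outcome - ybar_d) / n_d, 0)

# Domain SE: the variance is taken over all n rows, zeros included.
se_domain <- sqrt(n / (n - 1) * sum((z - mean(z))^2))

# Subset SE: treats the n_d domain rows as if they were the whole sample.
se_subset <- sd(x$outcome[in_domain]) / sqrt(n_d)

print(c(domain = se_domain, subset = se_subset))
```

The point estimate is the same in both cases; only the SE differs, because the domain calculation treats the number of "John" rows as random rather than fixed.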

- [ ] Document this better, pointing to vignette: https://cran.r-project.org/web/packages/survey/vignettes/domain.pdf
- [ ] Add option to not calculate variances as if subsets are random samples of population?
