
Improve tools for creating complete dataset #509

Description

@gowerc

rbmi expects the incoming dataset to be complete, i.e. to have 1 row per patient per time point, even if the analysis value is missing for that row.

We currently offer the helper functions expand(), fill_locf() and expand_locf() to help create this dataset. In practice, however, these don't work well because they depend on LOCF to assign default values to the missing covariate values in the newly created rows: if a patient's first / baseline row is missing, there is nothing for LOCF to populate those covariate values with.
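To make the failure mode concrete, here is a minimal base R sketch. It does not call rbmi; the toy data, the column names (`id`, `visit`, `sex`, `outcome`) and the `locf()` helper are all assumptions made for illustration. Patient "B" has no "week 1" row, so after expansion the covariate in the new row has no earlier value to carry forward.

```r
# Toy data (hypothetical): patient "B" has no baseline / first-visit row
dat <- data.frame(
    id      = c("A", "A", "B"),
    visit   = factor(c("week 1", "week 2", "week 2"),
                     levels = c("week 1", "week 2")),
    sex     = c("M", "M", "F"),
    outcome = c(1.2, 1.5, 2.1),
    stringsAsFactors = FALSE
)

# Expansion step: 1 row per patient per visit
grid <- expand.grid(
    id    = unique(dat$id),
    visit = factor(levels(dat$visit), levels = levels(dat$visit)),
    stringsAsFactors = FALSE
)
full <- merge(grid, dat, by = c("id", "visit"), all.x = TRUE)
full <- full[order(full$id, full$visit), ]

# Illustrative LOCF helper (not the rbmi implementation):
# carry the last non-missing value forward, leave leading NAs as NA
locf <- function(x) {
    idx <- cumsum(!is.na(x))
    idx[idx == 0] <- NA
    x[!is.na(x)][idx]
}

# Apply LOCF to the covariate within each patient
full$sex <- ave(full$sex, full$id, FUN = locf)

# The new "week 1" row for patient "B" still has a missing covariate
full[full$id == "B", ]
```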

In internal training we currently recommend that users first create a reference row containing all the covariate values, which can then be used in combination with LOCF. However, this is clunky: the user has to add a "REFERENCE" factor level to the visit variable, and afterwards has to remember both to drop the reference row and to re-level the variable to remove that level.
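For illustration, the workaround described above might look roughly like the following base R sketch. It is not the code used in the training material; the toy data, column names and the `locf()` helper are assumptions. Note the two clean-up steps at the end that are easy to forget.

```r
# Toy data (hypothetical): patient "B" has no "week 1" row
dat <- data.frame(
    id      = c("A", "A", "B"),
    visit   = factor(c("week 1", "week 2", "week 2"),
                     levels = c("week 1", "week 2")),
    sex     = c("M", "M", "F"),
    outcome = c(1.2, 1.5, 2.1),
    stringsAsFactors = FALSE
)
# Reference data (hypothetical): covariates, strictly 1 row per patient
ref <- data.frame(id = c("A", "B"), sex = c("M", "F"),
                  stringsAsFactors = FALSE)

# 1. Add a temporary "REFERENCE" level that sorts before the real visits
visits <- c("REFERENCE", levels(dat$visit))
dat$visit <- factor(as.character(dat$visit), levels = visits)
ref_rows <- data.frame(
    id      = ref$id,
    visit   = factor(rep("REFERENCE", nrow(ref)), levels = visits),
    sex     = ref$sex,
    outcome = NA_real_,
    stringsAsFactors = FALSE
)
stacked <- rbind(ref_rows, dat)

# 2. Expand to 1 row per patient per visit (including "REFERENCE")
grid <- expand.grid(
    id    = unique(stacked$id),
    visit = factor(visits, levels = visits),
    stringsAsFactors = FALSE
)
full <- merge(grid, stacked, by = c("id", "visit"), all.x = TRUE)
full <- full[order(full$id, full$visit), ]

# 3. LOCF the covariate within patient, starting from the reference row
locf <- function(x) {
    idx <- cumsum(!is.na(x))
    idx[idx == 0] <- NA
    x[!is.na(x)][idx]
}
full$sex <- ave(full$sex, full$id, FUN = locf)

# 4. Clean-up: drop the reference rows, then remove the unused level
full <- full[full$visit != "REFERENCE", ]
full$visit <- droplevels(full$visit)
```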

I would propose a new function, expand_ref(), which does the expansion step and fills in the newly formed rows with values from a reference dataset, where the reference dataset has strictly 1 row per patient, e.g.

expand_ref(
    data = adeff,
    ref = adref,
    AVISIT = c("week 1", "week 2", ...)
)

I would not apply any LOCF here, though; if the user needs that, they can pipe the result into a call to fill_locf(). (In hindsight I think expand_locf() tries to do too much at once and shouldn't exist, but we are where we are, and I don't think we should remove it as people have already built code around it.)
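One possible shape for the proposed function, sketched in base R. This is only an illustration under stated assumptions, not an rbmi implementation: the name `expand_ref_sketch()`, the `id` argument, and the columns `USUBJID`, `SEX` and `AVAL` are hypothetical and do not come from the proposal. The sketch assumes every covariate column in `ref` also exists in `data`, and it only fills rows created by the expansion; rows that already existed are left untouched and no LOCF is applied.

```r
# Hypothetical sketch of the proposed behaviour (not part of rbmi)
expand_ref_sketch <- function(data, ref, id, ...) {
    vars <- list(...)  # e.g. AVISIT = c("week 1", "week 2")
    stopifnot(
        length(vars) >= 1,
        !is.null(names(vars)),
        all(names(vars) != ""),
        !anyDuplicated(ref[[id]]),            # strictly 1 row per patient
        all(setdiff(names(ref), id) %in% names(data))
    )
    # Coerce the expansion variables to factors with the supplied levels
    for (v in names(vars)) {
        stopifnot(all(as.character(data[[v]]) %in% vars[[v]]))
        data[[v]] <- factor(as.character(data[[v]]), levels = vars[[v]])
    }
    data$.orig <- TRUE  # marks rows that existed before expansion

    # Expansion: every patient crossed with every supplied level
    grid_vars <- c(
        setNames(list(unique(data[[id]])), id),
        lapply(vars, function(lv) factor(lv, levels = lv))
    )
    grid <- expand.grid(grid_vars, stringsAsFactors = FALSE)
    keys <- c(id, names(vars))
    out <- merge(grid, data, by = keys, all.x = TRUE)

    # Fill only the newly formed rows from the reference dataset
    is_new <- is.na(out$.orig)
    ref_idx <- match(out[[id]], ref[[id]])
    for (v in setdiff(names(ref), id)) {
        out[[v]][is_new] <- ref[[v]][ref_idx[is_new]]
    }

    out$.orig <- NULL
    out <- out[do.call(order, unname(as.list(out[keys]))), , drop = FALSE]
    rownames(out) <- NULL
    out
}

# Usage with toy data (hypothetical column names)
adeff <- data.frame(
    USUBJID = c("A", "A", "B"),
    AVISIT  = c("week 1", "week 2", "week 2"),
    SEX     = c("M", "M", "F"),
    AVAL    = c(1.2, 1.5, 2.1),
    stringsAsFactors = FALSE
)
adref <- data.frame(USUBJID = c("A", "B"), SEX = c("M", "F"),
                    stringsAsFactors = FALSE)

full <- expand_ref_sketch(adeff, adref, id = "USUBJID",
                          AVISIT = c("week 1", "week 2"))
```

Compared with the reference-row workaround, nothing has to be dropped or re-levelled afterwards, because the reference values never enter the data as an extra visit.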
