R/validate-outcome-dataset.R
ValidateOutcomeDataset.Rd
The NlsyLinks package handles much of the plumbing code needed to transform extracted NLSY datasets into a format that statistical routines can interpret. In some cases, a dataset of measured variables is needed, with one row per subject. This function validates the measured/outcome dataset to ensure it possesses an interpretable schema. For a specific list of the requirements, see Details below.
ValidateOutcomeDataset(dsOutcome, outcomeNames)
Argument | Description |
---|---|
dsOutcome | A base::data.frame with the measured variables. |
outcomeNames | The column names of the measured variables that eventually will be used by a statistical procedure. |
Returns TRUE if the validation passes.
Throws an error (with an associated descriptive message) if the validation fails.
The dsOutcome parameter must:
Have a non-missing value.
Contain at least one row.
Contain a column called 'SubjectTag' (case sensitive).
Have the SubjectTag column containing only positive numbers.
Have a SubjectTag column in which all values are unique (i.e., two rows/subjects cannot have the same value).
The outcomeNames parameter must:
Have a non-missing value.
Contain only column names that are present in the dsOutcome data frame.
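The requirements listed above could be mirrored in base R roughly as follows. This is a minimal sketch, not the NlsyLinks source: the function name `validate_outcome_sketch`, its error messages, the toy data frame, and the choice to reject NA values in SubjectTag are all assumptions made for illustration.

```r
# Hypothetical re-implementation of the documented checks in base R.
# This is NOT the package's code; it only mirrors the rules listed above.
validate_outcome_sketch <- function(dsOutcome, outcomeNames) {
  # --- dsOutcome checks ---
  if (missing(dsOutcome) || is.null(dsOutcome))
    stop("dsOutcome must have a non-missing value.")
  if (!is.data.frame(dsOutcome))
    stop("dsOutcome must be a data.frame.")
  if (nrow(dsOutcome) < 1L)
    stop("dsOutcome must contain at least one row.")
  if (!("SubjectTag" %in% colnames(dsOutcome)))  # %in% is case sensitive
    stop("dsOutcome must contain a column called 'SubjectTag'.")

  tags <- dsOutcome$SubjectTag
  # Assumption: NA is treated as a violation of "only positive numbers".
  if (!is.numeric(tags) || anyNA(tags) || any(tags <= 0))
    stop("SubjectTag must contain only positive numbers.")
  if (anyDuplicated(tags) > 0L)
    stop("SubjectTag values must be unique.")

  # --- outcomeNames checks ---
  if (missing(outcomeNames) || is.null(outcomeNames) || anyNA(outcomeNames))
    stop("outcomeNames must have a non-missing value.")
  absent <- setdiff(outcomeNames, colnames(dsOutcome))
  if (length(absent) > 0L)
    stop("Columns not found in dsOutcome: ", paste(absent, collapse = ", "))

  TRUE
}

# Toy data invented for illustration (not NLSY values).
dsToy <- data.frame(SubjectTag = c(101, 102, 201), Score = c(0.5, -1.2, 0.3))

ok <- validate_outcome_sketch(dsToy, "Score")

# Capture the error message instead of halting the session.
bad <- tryCatch(
  validate_outcome_sketch(dsToy, "ScoreMisspelled"),
  error = function(e) conditionMessage(e)
)
```

Wrapping the call in `tryCatch()` is one way to inspect the descriptive message without stopping a script; the same pattern should apply to the real ValidateOutcomeDataset, since it signals an error on failure.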
Will Beasley
library(NlsyLinks) # Load the package into the current R session.
ds <- ExtraOutcomes79
outcomeNames <- c("MathStandardized", "WeightZGenderAge")
ValidateOutcomeDataset(dsOutcome=ds, outcomeNames=outcomeNames) # Returns TRUE.
#> [1] TRUE
outcomeNamesBad <- c("MathMisspelled", "WeightZGenderAge")
# ValidateOutcomeDataset(dsOutcome=ds, outcomeNames=outcomeNamesBad) # Throws error.