This function takes a base::data.frame of subject pairs, together with a 'GroupSummary' base::data.frame (which is created by the RGroupSummary() function), and returns a cleaned base::data.frame that is used by the Ace() function.

CleanSemAceDataset(dsDirty, dsGroupSummary, oName_S1, oName_S2, rName = "R")

Arguments

dsDirty

This is the base::data.frame to be cleaned.

dsGroupSummary

The base::data.frame containing information about which groups should be included in the analyses. It should be created by the RGroupSummary() function.

oName_S1

The name of the manifest variable (in dsDirty) for the first subject in each pair.

oName_S2

The name of the manifest variable (in dsDirty) for the second subject in each pair.

rName

The name of the variable (in dsDirty) indicating the pair's relatedness coefficient.

Value

A base::data.frame with one row per subject pair. The base::data.frame contains the following variables (whose names cannot be changed by the user through optional parameters):

  • R The pair's R value.

  • O1 The outcome variable for the first subject in each pair.

  • O2 The outcome variable for the second subject in each pair.

  • GroupID Indicates the pair's group membership.

Details

The function takes dsDirty and produces a new base::data.frame with the following features:

  1. Only three existing columns are retained: the columns named by oName_S1, oName_S2, and rName. They are renamed O1, O2, and R, respectively.

  2. A new column called GroupID is created to reflect each pair's group membership (which is based on the R value). These values are sequential integers, starting at 1. The group with the weakest R is 1. The group with the strongest R has the largest GroupID (this is typically the MZ twins).

  3. Any row is excluded if it has a missing data point for O1, O2, or R.

  4. The base::data.frame is sorted by the R value. This ordering sometimes makes it easier to program against a multiple-group SEM API.
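
The four steps above can be sketched in base R on a toy data set. This is only an illustration of the described behavior, not the package's implementation: the function name clean_pairs_sketch, its r_levels argument (a plain vector of R values to retain, standing in for the information carried by dsGroupSummary), and the toy column names are all hypothetical.

```r
# Illustrative sketch only; NOT the NlsyLinks implementation.
# `r_levels` is a hypothetical stand-in for the groups that
# dsGroupSummary marks for inclusion.
clean_pairs_sketch <- function(ds, r_levels, o1_name, o2_name, r_name = "R") {
  # Step 1: keep only the three needed columns and give them fixed names.
  out <- ds[, c(r_name, o1_name, o2_name)]
  colnames(out) <- c("R", "O1", "O2")

  # Step 3: drop rows with a missing R, O1, or O2,
  # and rows whose R is not among the retained groups.
  keep <- stats::complete.cases(out) & (out$R %in% r_levels)
  out <- out[keep, ]

  # Step 4: sort by R.
  out <- out[order(out$R), ]

  # Step 2: sequential integer IDs; the weakest R gets 1.
  out$GroupID <- match(out$R, sort(unique(r_levels)))
  rownames(out) <- NULL
  out
}

# Toy input (made-up values, for illustration only).
ds_toy <- data.frame(
  R       = c(1, 0.25, 0.5, 0.5, NA, 0.25, 0.375),
  Math_S1 = c(100, 95, NA, 110, 90, 85, 101),
  Math_S2 = c(102, 90, 105, 108, 91, 88, 99)
)

ds_out <- clean_pairs_sketch(
  ds_toy,
  r_levels = c(0.25, 0.5, 1),
  o1_name  = "Math_S1",
  o2_name  = "Math_S2"
)
print(ds_out)
```

In this toy input, the rows with a missing R or outcome are dropped, as is the pair with R = 0.375, because that value is not among the retained levels. The sketch matches R values exactly; real relatedness coefficients may call for a tolerance-based comparison.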

Author

Will Beasley

Examples

library(NlsyLinks) # Load the package into the current R session.
dsLinks <- Links79PairExpanded # Start with the built-in data.frame in NlsyLinks
dsLinks <- dsLinks[dsLinks$RelationshipPath == 'Gen2Siblings', ] # Use only NLSY79-C siblings

oName_S1 <- "MathStandardized_S1" # Stands for Outcome1
oName_S2 <- "MathStandardized_S2" # Stands for Outcome2
dsGroupSummary <- RGroupSummary(dsLinks, oName_S1, oName_S2)

dsClean <- CleanSemAceDataset(dsDirty = dsLinks, dsGroupSummary, oName_S1, oName_S2, rName = "R")
summary(dsClean)
#>        R                O1               O2           GroupID
#>  Min.   :0.2500   Min.   : 65.00   Min.   : 65.0   Min.   :1.000
#>  1st Qu.:0.2500   1st Qu.: 90.00   1st Qu.: 90.0   1st Qu.:1.000
#>  Median :0.5000   Median : 98.00   Median : 99.0   Median :3.000
#>  Mean   :0.4186   Mean   : 98.24   Mean   : 98.6   Mean   :2.341
#>  3rd Qu.:0.5000   3rd Qu.:107.00   3rd Qu.:107.0   3rd Qu.:3.000
#>  Max.   :1.0000   Max.   :135.00   Max.   :135.0   Max.   :4.000

dsClean$AbsDifference <- abs(dsClean$O1 - dsClean$O2)
plot(jitter(dsClean$R), dsClean$AbsDifference, col = "gray70")