These variables are useful to many types of analyses (not just behavior genetics), and are provided to save users time.

Format

A data frame with 24,181 observations on the following 12 variables.

  • SubjectTag see the variable of the same name in Links79Pair

  • ExtendedID see the variable of the same name in Links79Pair

  • Generation Indicates if the subject is in generation 1 or 2.

  • Gender Indicates if the subject is Male or Female.

  • RaceCohort Indicates if the race cohort is Hispanic, Black or Nbnh (ie, Non-black, non-hispanic). This comes from the Gen1 variable R02147.00 and Gen2 variable C00053.00.

  • SiblingCountInNls The number of the subject's siblings, including himself/herself (a singleton has a value of one). This considers only the siblings in the NLSY. For Gen1, this can exclude anyone outside the age range. For Gen2, this excludes anyone who doesn't share the same mother.

  • BirthOrderInNls Indicates the subject's birth order among the NLSY siblings.

  • SimilarAgeCount The number of children who were born within roughly 30 days of the subject's birthday, including the subject (for instance, even an only child will have a value of 1). For Gen2 subjects, this should reflect how many children the Gen1 mother gave birth to at the same time (1: singleton; 2: twins, 3: triplets). For Gen1 subjects, this is less certain, because the individual might have been living with a similarly-aged housemate, born to a different mother.

  • HasMzPossibly Indicates if the subject might be a member of an MZ twin/triplet. This will be true if there is a sibling with a DOB within a month, and they are the same gender.

  • IsMz Indicates if the subject has been identified as a member of an MZ twin/triplet.

  • KidCountBio The number of biological children known to the NLSY (but not necessarily interviewed by the NLSY.

  • KidCountInNls The number of children who belong to the NLSY. This is nonnull for only Gen1 subjects.

  • Mob The subject's month of birth. The exact day is not available to the public. By default, we set their birthday to the 15th day of the month.

  • LastSurveyYearCompleted The year of the most recently completed survey.

  • AgeAtLastSurvey The subject's age at the most recently completed survey.

  • IsDead ##This variable is not available yet## Indicates if the subject was alive for the last attempted survey.

  • DeathDate ##This variable is not available yet## The subject's month of death. The exact day is not available to the public. By default, we set their birthday to the 15th day of the month.

Source

Gen1 information comes from the Summer 2013 release of the NLSY79 sample. Gen2 information comes from the Summer 2013 release of the NLSY79 Children and Young Adults sample. Data were extracted with the NLS Investigator (https://www.nlsinfo.org/investigator/).

See also

Download CSV If you're using the NlsyLinks package in R, the dataset is automatically available. To use it in a different environment, download the csv, which is readable by all statistical software. links-metadata-2017-79.yml documents the dataset version information.

Author

Will Beasley

Examples

library(NlsyLinks) # Load the package into the current R session.

summary(SubjectDetails79)
#>    SubjectTag        ExtendedID      Generation       Gender     
#>  Min.   :    100   Min.   :    1   Min.   :1.000   Male  :12276  
#>  1st Qu.: 314025   1st Qu.: 3139   1st Qu.:1.000   Female:11913  
#>  Median : 620050   Median : 6195   Median :1.000   NA's  :    1  
#>  Mean   : 618600   Mean   : 6180   Mean   :1.476                 
#>  3rd Qu.: 914501   3rd Qu.: 9140   3rd Qu.:2.000                 
#>  Max.   :1268600   Max.   :12686   Max.   :2.000                 
#>                                                                  
#>     RaceCohort    SiblingCountInNls BirthOrderInNls  SimilarAgeCount
#>  Hispanic: 4215   Min.   : 1.000    Min.   : 1.000   Min.   :1.00   
#>  Black   : 6362   1st Qu.: 1.000    1st Qu.: 1.000   1st Qu.:1.00   
#>  Nbnh    :13613   Median : 2.000    Median : 1.000   Median :1.00   
#>                   Mean   : 2.355    Mean   : 1.669   Mean   :1.02   
#>                   3rd Qu.: 3.000    3rd Qu.: 2.000   3rd Qu.:1.00   
#>                   Max.   :11.000    Max.   :11.000   Max.   :3.00   
#>                                                                     
#>  HasMzPossibly      KidCountBio     KidCountInNls         Mob            
#>  Min.   :0.00000   Min.   : 0.000   Min.   : 0.000   Min.   :1955-06-15  
#>  1st Qu.:0.00000   1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.:1960-08-15  
#>  Median :0.00000   Median : 1.000   Median : 0.000   Median :1964-06-15  
#>  Mean   :0.01282   Mean   : 1.253   Mean   : 0.907   Mean   :1973-01-26  
#>  3rd Qu.:0.00000   3rd Qu.: 2.000   3rd Qu.: 2.000   3rd Qu.:1985-06-15  
#>  Max.   :1.00000   Max.   :11.000   Max.   :11.000   Max.   :2009-10-15  
#>                    NA's   :3678     NA's   :11504    NA's   :2           
#>  LastSurveyYearCompleted AgeAtLastSurvey      IsMz        
#>  Min.   :1979            Min.   : 0.0082   Mode :logical  
#>  1st Qu.:2000            1st Qu.:21.9220   FALSE:24114    
#>  Median :2010            Median :29.2977   TRUE :76       
#>  Mean   :2004            Mean   :32.0523                  
#>  3rd Qu.:2010            3rd Qu.:47.3703                  
#>  Max.   :2010            Max.   :54.9213                  
#>  NA's   :1095            NA's   :1095                     

oldPar <- par(mfrow = c(3, 2), mar = c(2, 2, 1, .5), tcl = 0, mgp = c(1, 0, 0))
hist(
  SubjectDetails79$SiblingCountInNls,
  main = "",
  breaks = seq(from = 0, to = max(SubjectDetails79$SiblingCountInNls, na.rm = TRUE), by = 1)
)
hist(
  SubjectDetails79$BirthOrderInNls,
  main   = "",
  breaks = seq(from = 0, to = max(SubjectDetails79$BirthOrderInNls, na.rm = TRUE), by = 1)
)
hist(
  SubjectDetails79$SimilarAgeCount,
  main   = "",
  breaks = seq(from = 0, to = max(SubjectDetails79$SimilarAgeCount, na.rm = TRUE), by = 1)
)
hist(
  SubjectDetails79$KidCountBio,
  main   = "",
  breaks = seq(from = 0, to = max(SubjectDetails79$KidCountBio, na.rm = TRUE), by = 1)
)
hist(
  SubjectDetails79$KidCountInNls,
  main   = "",
  breaks = seq(from = 0, to = max(SubjectDetails79$KidCountInNls, na.rm = TRUE), by = 1)
)
# hist(SubjectDetails79$Mob, main="",
#     breaks=seq.Date(
#       from=min(SubjectDetails79$Mob, na.rm=TRUE),
#       to=max(SubjectDetails79$Mob, na.rm=TRUE),
#       by="year")
# )
par(oldPar)