R/survey-79-doc.R
Survey79.Rd
Each row represents a survey that a subject completed (or didn't complete). It can be very helpful when restructuring the NLS investigator extracts into a longitudinal dataset that's aligned by age (instead of by survey wave). The Age variables can help to align other response variables across subjects. While the 'SurveySource' indicates where to look for their responses.
These variables are useful to many types of analyses (not just behavior genetics), and are provided to save users time.
A data frame with 580,752 observations on the following 6 variables.
SubjectTag see the variable of the same name in Links79Pair
SurveySource The location of that subject's survey responses that year. Values are NoInterview
, Gen1
, Gen2C
or Gen2YA
.
SurveyYear The year/wave of the survey.
Survey79 The exact date of the administered survey.
Age The subject's age at the time of the survey, rounded to 1 decimal. See Details below.
Gen1 information comes from the Summer 2013 release of the NLSY79 sample. Gen2 information comes from the January 2015 release of the NLSY79 Children and Young Adults sample. Data were extracted with the NLS Investigator (https://www.nlsinfo.org/investigator/).
Age
uses a variable called AgeCalculateYears
(according to a their own
response, or their mother's response) if it's available, and uses
AgeSelfReportYears
(according to a their own response, or their mother's
response)
if not. The values usually agree, but not always.
These two separated variables are available for download,
but withheld from the R package
(otherwise we would exceed CRAN's maximum size).
The exact date of birth isn't public (only the subject's month of birth).
To balance the downward bias of two weeks, their birthday is set to the
15th day of the month to produce AgeCalculateYears
.
In the Gen2 Child dataset, self-reported age is stated by month (eg, the child is 38 months old); a constant of 0.5 months has been added to balance the downward bias. In the Gen2 YA and Gen1 datasets, self-reported age is stated by year (eg, the subject is 52 years old); a constant of 0.5 years has been added.
library(NlsyLinks) # Load the package into the current R session.
summary(Survey79)
#> SubjectTag SurveySource SurveyYear Age
#> Min. : 100 NoInterview:257047 Min. :1979 Min. : 0.00
#> 1st Qu.: 313900 Gen1 :236338 1st Qu.:1985 1st Qu.:18.50
#> Median : 619752 Gen2C : 52038 Median :1990 Median :24.70
#> Mean : 618530 Gen2YA : 35329 Mean :1992 Mean :25.31
#> 3rd Qu.: 914403 3rd Qu.:1998 3rd Qu.:32.30
#> Max. :1268600 Max. :2010 Max. :54.90
#> NA's :257047
table(Survey79$SurveyYear, Survey79$SurveySource)
#>
#> NoInterview Gen1 Gen2C Gen2YA
#> 1979 11512 12686 0 0
#> 1980 12057 12141 0 0
#> 1981 12003 12195 0 0
#> 1982 12075 12123 0 0
#> 1983 11977 12221 0 0
#> 1984 12129 12069 0 0
#> 1985 13304 10894 0 0
#> 1986 8581 10655 4962 0
#> 1987 13713 10485 0 0
#> 1988 8064 10465 5669 0
#> 1989 13593 10605 0 0
#> 1990 7971 10436 5791 0
#> 1991 15180 9018 0 0
#> 1992 8674 9016 6508 0
#> 1993 15187 9011 0 0
#> 1994 8287 8889 6042 980
#> 1996 8494 8636 5396 1672
#> 1998 8764 8399 4896 2139
#> 2000 9755 8033 3385 3025
#> 2002 9047 7724 3189 4238
#> 2004 9058 7661 2455 5024
#> 2006 8944 7654 1756 5844
#> 2008 8936 7757 1200 6305
#> 2010 9742 7565 789 6102
table(is.na(Survey79$AgeSelfReportYears), is.na(Survey79$AgeCalculateYears))
#> < table of extent 0 x 0 >
if (require(ggplot2) & require(dplyr)) {
dsSourceYear <- Survey79 %>%
dplyr::count(SurveyYear, SurveySource) %>%
dplyr::filter(SurveySource != "NoInterview")
Survey79 %>%
dplyr::filter(SurveySource != "NoInterview") %>%
dplyr::group_by(SurveySource, SurveyYear) %>%
dplyr::summarize(
age_min = min(Age, na.rm = TRUE),
age_max = max(Age, na.rm = TRUE)
) %>%
dplyr::ungroup() %>%
ggplot(aes(x = SurveyYear, ymin = age_min, ymax = age_max, color = SurveySource)) +
geom_errorbar() +
scale_color_brewer(palette = "Dark2") +
theme_minimal() +
theme(legend.position = c(0, 1), legend.justification = c(0, 1))
}
#> Loading required package: ggplot2
#> Loading required package: dplyr
#>
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#>
#> filter, lag
#> The following objects are masked from ‘package:base’:
#>
#> intersect, setdiff, setequal, union
#> `summarise()` has grouped output by 'SurveySource'. You can override using the
#> `.groups` argument.
#> Warning: A numeric `legend.position` argument in `theme()` was deprecated in ggplot2
#> 3.5.0.
#> ℹ Please use the `legend.position.inside` argument of `theme()` instead.