overweight-4y.Rmd
BDS model overweight-4y
The model predicts overweight risk around the age of 4 years given child data up to the age of 4 months. It was fitted by Mirthe Hendriks for the C4PO project using combined CBS microdata.
This implementation is based on the random forest model version `20230525_rfr_mtry.rds` and the code book `202211MH codebook JGZ predictors.xlsx`. Here’s a summary of the fitted model:
Ranger result
Call:
ranger(overweight ~ ., data = overweight_train, num.trees = 1000, importance = "impurity", probability = TRUE)
Type: Probability estimation
Number of trees: 1000
Sample size: 13149
Number of independent variables: 138
Mtry: 11
Target node size: 10
Variable importance mode: impurity
Splitrule: gini
OOB prediction error (Brier s.): 0.0761
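The out-of-bag error in the summary is reported as a Brier score, which in its textbook binary form is the mean squared difference between the predicted probability and the observed 0/1 outcome. The sketch below illustrates that definition in base R. The helper name `brier_score` and the input values are hypothetical, and how ranger aggregates the score internally is not shown here.

```r
# Brier score: mean squared difference between predicted probability
# and observed binary outcome (lower is better; 0 is a perfect score).
brier_score <- function(prob, outcome) {
  stopifnot(length(prob) == length(outcome))
  mean((prob - outcome)^2)
}

# Illustrative values only, not taken from the fitted model
prob    <- c(0.10, 0.80, 0.30, 0.05)  # predicted probability of overweight
outcome <- c(0, 1, 0, 0)              # observed outcome (1 = overweight)

brier_score(prob, outcome)
```

A model that always predicts the base rate scores p(1 - p), so a Brier score is best judged relative to the prevalence of the outcome.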
The model contains 138 predictors. The predictor variables, ordered by importance score, are:
importance
bmi_4mnd_znl 103.1
gewicht_4mnd_znl 84.0
bmi_3mnd_znl 64.8
gewicht_3mnd_znl 56.5
bmi_8wks_znl 49.6
gewicht_8wks_znl 43.7
bmi_4wks_znl 41.7
PRNL_birthweight 39.3
lengte_4mnd_znl 37.6
gewicht_4wks_znl 36.6
lengte_3mnd_znl 34.8
lengte_8wks_znl 34.1
ZVWK_sumtotal_mo 33.1
income_fa 32.9
lengte_4wks_znl 32.0
income_parents 32.0
ZVWK_hospital_mo 31.9
ZVWK_GP_other_fa 31.2
income_mo 30.9
ZVWK_GP_other_mo 30.9
ZVWK_GP_consult_mo 29.8
ZVWK_pharmacy_mo 29.1
ED_woz 29.0
ZVWK_sumtotal_fa 28.7
age_at_birth_mo 27.4
ZVWK_birth_obstetrician_mo 26.3
ZVWK_birth_maternitycare_mo 26.1
SPOLIS_wages_fa 26.0
age_at_birth_fa 25.9
ZVWK_GP_regist_mo 25.1
hh_vermogen 25.0
ZVWK_GP_regist_fa 24.1
ZVWK_GP_consult_fa 22.9
SPOLIS_wages_mo 22.8
SPOLIS_paidhours_fa 21.9
SPOLIS_paidhours_mo 21.2
ZVWK_pharmacy_fa 20.5
income_hh 18.8
educationlevel_mo 18.3
PRNL_gestational_age_week_f 17.5
ZVWK_appliances_mo 16.7
educationlevel_fa 15.2
ZVWK_hospital_fa 14.9
opl1 11.8
ZVWK_other_mo 10.4
ZVWK_other_fa 10.1
opl2 9.7
geslacht 9.2
STED 9.1
PRNL_parity 8.8
LAND_ETNG_fa 8.7
ZVWK_appliances_fa 8.1
LAND_ACHTS_fa 6.8
ZVWK_patient_transport_lie_mo 6.8
LAND_ETNG_mo 6.7
LAND_ACHTS_mo 6.7
SECM_mo 6.7
SECM_fa 6.3
ZVWK_physical_other_mo 6.2
SPOLIS_contract_mo 5.7
SPOLIS_contract_fa 5.6
LAND_ETNG_gebl2 5.4
LAND_ACHTS_gebl2 5.4
LAND_ACHTS_gebl1 5.3
ED_rentown 5.2
LAND_ETNG_gebl1 5.0
GBA_generation_fa 4.7
income_hh_source 4.6
house_ownership 4.4
ZVWK_dentalcare_mo 4.0
ZVWK_physical_therapy_mo 4.0
ZVWK_mentalhealth_bas_mo 4.0
residence_same_for_parents 3.9
GBA_generation_mo 3.8
ZVWK_mentalhealth_spec_mo 3.7
ZVWK_mentalh_spec_nostay_inst_mo 3.5
SECM_disability_mo 3.1
GBA_generation_kid 3.0
premature_birth 2.9
ZVWK_mentalh_spec_nostay_ind_mo 2.9
l_income_hh_pov_binary 2.8
SECM_selfemployed_fa 2.8
ZVWK_GP_basic_mo 2.7
SECM_student_fa 2.6
l_income_hh_min_binary 2.5
SECM_selfemployed_mo 2.5
SECM_otherwork_mo 2.5
SECM_socialassistance_mo 2.5
ZVWK_mentalhealth_bas_fa 2.5
SECM_employee_fa 2.4
NA_dummy_bmi_4wks_znl 2.4
NA_dummy_bmi_4mnd_znl 2.4
l_income_hh_pov_4j_binary 2.3
l_income_hh_min_4j_binary 2.3
SECM_unemployed_mo 2.3
ZVWK_dentalcare_fa 2.3
ZVWK_abroad_fa 2.3
ZVWK_GP_basic_fa 2.2
ZVWK_patient_transport_lie_fa 2.2
NA_dummy_SPOLIS_wages_mo 2.2
SECM_employee_mo 2.1
SECM_otherwork_fa 2.1
SECM_unemployed_fa 2.1
SECM_disability_fa 2.0
NA_dummy_SPOLIS_wages_fa 2.0
PRNL_multiples 1.9
ZVWK_abroad_mo 1.9
NA_dummy_income_hh 1.9
ZVWK_physical_therapy_fa 1.8
SECM_student_mo 1.7
ZVWK_mentalhealth_spec_fa 1.7
SECM_otherassistance_fa 1.6
NA_dummy_ZVWK_mentalh_spec_other_fa 1.6
SECM_director_fa 1.5
ZVWK_physical_other_fa 1.5
NA_dummy_ZVWK_GP_basic_fa 1.5
SECM_familywork_fa 1.4
ZVWK_mentalh_spec_nostay_inst_fa 1.4
SECM_otherassistance_mo 1.3
SECM_socialassistance_fa 1.2
NA_dummy_ZVWK_GP_basic_mo 1.1
NA_dummy_ZVWK_mentalh_spec_other_mo 1.1
SECM_familywork_mo 1.0
NA_dummy_income_mo 1.0
SECM_retirement_mo 0.9
SECM_retirement_fa 0.9
ZVWK_mentalh_spec_nostay_ind_fa 0.9
ZVWK_patient_transport_sit_mo 0.5
ZVWK_birth_obstetrician_fa 0.5
NA_dummy_income_fa 0.5
ZVWK_birth_maternitycare_fa 0.4
NA_dummy_age_at_birth_fa 0.4
SECM_director_mo 0.3
ZVWK_mentalhealth_spec_stay_fa 0.3
ZVWK_mentalhealth_spec_stay_mo 0.0
ZVWK_mentalh_spec_other_mo 0.0
ZVWK_patient_transport_sit_fa 0.0
ZVWK_mentalh_spec_other_fa 0.0
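Impurity importance scores have no natural unit, so it can help to express them relative to the strongest predictor. The sketch below does this for the first five scores in the list above. With a fitted ranger object the full vector would normally come from `fit$variable.importance`; here the values are typed in by hand, and the object names `imp` and `rel_imp` are only illustrative.

```r
# Importance scores copied from the top of the list above
imp <- c(
  bmi_4mnd_znl     = 103.1,
  gewicht_4mnd_znl = 84.0,
  bmi_3mnd_znl     = 64.8,
  gewicht_3mnd_znl = 56.5,
  bmi_8wks_znl     = 49.6
)

# Sort in decreasing order and scale so that the top predictor equals 100
imp_sorted <- sort(imp, decreasing = TRUE)
rel_imp <- round(100 * imp_sorted / max(imp_sorted), 1)
rel_imp
```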
Not all predictors are defined in the Dutch Basisdataset and many variables have low importance scores.
The following table describes the relevant fields defined by the Dutch Basisdataset for a subset of the most important predictors.
Variable | BDS | JAMES | Imp | In | Comments |
---|---|---|---|---|---|
BMI | 235, 245, 724, 20 | Y | 103 | Y | check |
weight | 245, 724, 20 | Y | 84 | Y | check |
PRNL_birthweight | 110 | Y | 39 | Y | check |
length/height | 235, 724, 20 | Y | 38 | Y | check |
ZVWK_sumtotal_mo | | N | 33 | N | not available |
income_fa | | N | 33 | N | not available |
income_parents | | N | 32 | N | not available |
income_mo | | N | 31 | N | not available |
ED_woz | | N | 29 | Y | derive from PC4 |
ZVWK_sumtotal_fa | | N | 29 | N | not available |
age_at_birth_mo | 63, 62, 20 | Y | 27 | Y | check |
age_at_birth_fa | 63, 62, 20 | Y | 26 | Y | check |
SPOLIS_paidhours_fa | | N | 22 | N | not available |
SPOLIS_paidhours_mo | | N | 21 | N | not available |
PRNL_gestational_age | 82 | Y | 18 | Y | check |
educationlevel_mo | | N | 18 | N | duplicate |
educationlevel_fa | | N | 15 | N | duplicate |
opl1 | 66, 62 | N | 12 | Y | check |
opl2 | 66, 62 | N | 10 | Y | check |
sex | 19 | Y | 9 | Y | check |
PRNL_parity | 741 | N | 9 | Y | check |
STED | 16 | Y | 9 | Y | derive from PC4 |
LAND_ETNG_gebl1 | 71, 62 | Y | 9 | Y | check |
LAND_ACHTS_gebl1 | 71, 62 | N | 9 | N | duplicate |
LAND_ETNG_gebl2 | 71, 62 | Y | 7 | Y | check |
LAND_ACHTS_gebl2 | 71, 62 | N | 7 | N | duplicate |
GBA_generation_fa | | N | 5 | N | low imp |
ED_rentown | | N | 5 | N | low imp |
GBA_generation_mo | | N | 4 | N | low imp |
residence_same_for_parents | | N | 4 | N | low imp |
GBA_generation_kid | | N | 3 | N | low imp |
premature_birth | 82 | N | 3 | N | duplicate |
SECM_{xxx} | | N | 3 | N | low imp |
PRNL_multiples | 108 | N | 2 | N | low imp |
birth date | 20 | Y | - | N | not in model |
measurement date | 724 | Y | - | N | not in model |
height mother | 238 | Y | - | N | not in model |
height father | 240 | Y | - | N | not in model |
weight mother | | N | - | N | not in model |
weight father | | N | - | N | not in model |
smoking pregnancy | 91 | Y | - | N | not in model |
The table maps the overlap between two data sources: the BDS (accessible through JAMES) and the CBS data used to create the model. Predictors are ordered by importance score. The column `In` specifies which predictors are selected for inclusion in a compressed model with 15 predictors (12 fixed, 3 time-varying). The most important variables (child BMI, weight and height) are available for personal risk prediction. Income variables are not available, but an `ED_woz` value for the PC4 postal code is a reasonable proxy for parental income. Age of parents at birth, gestational age and parental education are included, as well as some of the less important variables (sex, parity, parental birth country).
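Deriving `ED_woz` from the PC4 postal code amounts to a table lookup. The sketch below shows one way to do that in base R. The lookup table `woz_by_pc4`, its column names and all values are placeholders invented for illustration; a real implementation would need an actual PC4-level WOZ source, which this document does not specify.

```r
# Hypothetical lookup table: one WOZ value (EUR) per PC4 area.
# The values are placeholders, not real statistics.
woz_by_pc4 <- data.frame(
  pc4 = c("1011", "2312", "9711"),
  woz = c(450000, 320000, 280000)
)

# Child records with a PC4 postal code (illustrative)
children <- data.frame(
  id  = 1:3,
  pc4 = c("2312", "9999", "1011")
)

# match() keeps the row order of `children`; unknown PC4 codes yield NA
children$woz <- woz_by_pc4$woz[match(children$pc4, woz_by_pc4$pc4)]
children
```

Using `match()` rather than `merge()` keeps the child records in their original order and makes unmatched postal codes visible as `NA`.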
Note that importance scores vary depending on what other variables are in the model, so for a proper picture of variable importance, we need to refit the model using the subset of the 15 predictors.
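A refit on the reduced predictor set might look like the sketch below. It reuses the `ranger()` arguments from the model summary, but everything else is an assumption: the predictor names follow the JAMES notation proposed in this document, the training data are simulated noise standing in for the CBS microdata, and the handling of the three time-varying predictors is simplified to one column each. The fit only runs when the ranger package is installed.

```r
# The 15 selected predictors: 3 time-varying, 12 fixed (JAMES names)
selected <- c(
  "bmi_z", "hgt_z", "wgt_z",
  "bw", "woz", "agem", "agef", "ga", "eduf", "edum",
  "sex", "par", "urb", "ctrf", "ctrm"
)

# Build the model formula: overweight ~ bmi_z + hgt_z + ...
form <- reformulate(selected, response = "overweight")

# Simulated stand-in data (pure noise), NOT the CBS training set
set.seed(1)
n <- 200
train <- as.data.frame(
  matrix(rnorm(n * length(selected)), nrow = n,
         dimnames = list(NULL, selected))
)
train$overweight <- factor(rbinom(n, 1, 0.1), levels = c(0, 1))

# Refit and inspect importance only if ranger is available
if (requireNamespace("ranger", quietly = TRUE)) {
  fit <- ranger::ranger(form, data = train, num.trees = 1000,
                        importance = "impurity", probability = TRUE)
  print(sort(fit$variable.importance, decreasing = TRUE))
}
```

On simulated noise the resulting ranking is meaningless; the point is only the workflow of restricting the formula and re-reading the importance scores.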
To apply the random forest model in practice, it is vital that the variable codings used in the CBS model and the JAMES model match.
Here is a proposal for variable coding in both models:
JAMES | CBS name | Code | Description |
---|---|---|---|
bmi_z | BMI_{time}_znl | | Body Mass Index, Z-score (AGD::nl4.bmi) |
hgt_z | lengte_{time}_znl | | Length/height, Z-score (AGD::nl4.hgt) |
wgt_z | gewicht_{time}_znl | | Weight, Z-score (AGD::nl4.wgt) |
age | | | Decimal age: round((date - dob) / 365.25, 4) |
bw | PRNL_birthweight | | Birth weight (grammes) |
woz | ED_woz | | WOZ value - immovable property (EURO) |
agem | age_at_birth_mo | | Mother age at child’s birth (years) |
agef | age_at_birth_fa | | Father age at child’s birth (years) |
ga | PRNL_gestational_age | | Gestational age (completed weeks) |
eduf | opl1 | | Level of education, biol father (number) |
| | 1 | None |
| | 2 | Basis |
| | 3 | MLK/VMBO-LWOO/LBO/VBO/VMBO-BBL&KBL |
| | 4 | MAVO/VMBO/GL&TL |
| | 5 | MBO |
| | 6 | HAVO/VWO |
| | 7 | HBO/HTS/HEAO |
| | 8 | WO |
| | NA | Other (98), unknown (00) |
edum | opl2 | | Level of education, biol mother (number) |
| | | See opl1 codes |
sex | geslacht | | Sex of child (number) |
| | 1 | Male |
| | 0 | Female |
| | NA | Unknown, undetermined |
par | PRNL_parity | | Number of births (GA >=16w), inc this (number) |
urb | STED | | Urbanisation grade (number) |
| | 1 | >= 2500 addresses/km^2 |
| | 2 | 1500-2500 addresses/km^2 |
| | 3 | 1000-1500 addresses/km^2 |
| | 4 | 500-1000 addresses/km^2 |
| | 5 | < 500 addresses/km^2 |
| | NA | Unknown |
ctrf | LAND_ACHTS_gebl1 | | Country of birth, biological father (factor) |
| | 1 | Netherlands |
| | 2 | EU-15, other developed economies |
| | 3 | New EU-countries, economies in transition |
| | 4 | Northern Africa |
| | 5 | East Asia |
| | 6 | Other Africa, Asia, Latin America |
| | 7 | Surinam and (former) Netherlands Antilles |
| | 8 | Turkey |
| | 9 | Unknown |
ctrm | LAND_ACHTS_gebl2 | | Country of birth, biological mother (factor) |
| | | See LAND_ACHTS_gebl1 codes |
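Two of the codings in the table can be sketched in base R: the decimal age formula and the mapping of education codes 98 and 00 to `NA`. The helper names `decimal_age` and `recode_edu` are hypothetical, and the raw input format (dates as `Date` objects, education codes as character strings) is an assumption rather than something the code book confirms. The Z-scores would additionally need the AGD reference tables named in the table, which are not used here.

```r
# Decimal age in years, following round((date - dob) / 365.25, 4)
decimal_age <- function(date, dob) {
  days <- as.numeric(difftime(date, dob, units = "days"))
  round(days / 365.25, 4)
}

# Education level: keep codes 1-8, map everything else
# (e.g. 98 = other, 00 = unknown) to NA
recode_edu <- function(x) {
  x <- as.integer(x)
  x[!x %in% 1:8] <- NA_integer_
  x
}

# Illustrative input: a visit 120 days after birth
decimal_age(as.Date("2023-05-25"), as.Date("2023-01-25"))

# Illustrative raw education codes
recode_edu(c("3", "98", "00", "8"))
```

Applying the same helpers when preparing both the CBS training data and the JAMES input is one way to keep the two codings aligned.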