Description

BDS model overweight-4y predicts overweight risk around the age of 4 years given child data up to the age of 4 months. The model was fitted by Mirthe Hendriks for the C4PO project using combined CBS microdata.

This implementation is based on the random forest model version 20230525_rfr_mtry.rds and the code book 202211MH codebook JGZ predictors.xlsx. Here’s a summary of the fitted model:

Ranger result

Call:
 ranger(overweight ~ ., data = overweight_train, num.trees = 1000,      importance = "impurity", probability = TRUE) 

Type:                             Probability estimation 
Number of trees:                  1000 
Sample size:                      13149 
Number of independent variables:  138 
Mtry:                             11 
Target node size:                 10 
Variable importance mode:         impurity 
Splitrule:                        gini 
OOB prediction error (Brier s.):  0.0761 

Predictor names and importance

The model contains 138 predictors. The names of the predictor variables, ordered in terms of their importance score, are

                                    importance
bmi_4mnd_znl                             103.1
gewicht_4mnd_znl                          84.0
bmi_3mnd_znl                              64.8
gewicht_3mnd_znl                          56.5
bmi_8wks_znl                              49.6
gewicht_8wks_znl                          43.7
bmi_4wks_znl                              41.7
PRNL_birthweight                          39.3
lengte_4mnd_znl                           37.6
gewicht_4wks_znl                          36.6
lengte_3mnd_znl                           34.8
lengte_8wks_znl                           34.1
ZVWK_sumtotal_mo                          33.1
income_fa                                 32.9
lengte_4wks_znl                           32.0
income_parents                            32.0
ZVWK_hospital_mo                          31.9
ZVWK_GP_other_fa                          31.2
income_mo                                 30.9
ZVWK_GP_other_mo                          30.9
ZVWK_GP_consult_mo                        29.8
ZVWK_pharmacy_mo                          29.1
ED_woz                                    29.0
ZVWK_sumtotal_fa                          28.7
age_at_birth_mo                           27.4
ZVWK_birth_obstetrician_mo                26.3
ZVWK_birth_maternitycare_mo               26.1
SPOLIS_wages_fa                           26.0
age_at_birth_fa                           25.9
ZVWK_GP_regist_mo                         25.1
hh_vermogen                               25.0
ZVWK_GP_regist_fa                         24.1
ZVWK_GP_consult_fa                        22.9
SPOLIS_wages_mo                           22.8
SPOLIS_paidhours_fa                       21.9
SPOLIS_paidhours_mo                       21.2
ZVWK_pharmacy_fa                          20.5
income_hh                                 18.8
educationlevel_mo                         18.3
PRNL_gestational_age_week_f               17.5
ZVWK_appliances_mo                        16.7
educationlevel_fa                         15.2
ZVWK_hospital_fa                          14.9
opl1                                      11.8
ZVWK_other_mo                             10.4
ZVWK_other_fa                             10.1
opl2                                       9.7
geslacht                                   9.2
STED                                       9.1
PRNL_parity                                8.8
LAND_ETNG_fa                               8.7
ZVWK_appliances_fa                         8.1
LAND_ACHTS_fa                              6.8
ZVWK_patient_transport_lie_mo              6.8
LAND_ETNG_mo                               6.7
LAND_ACHTS_mo                              6.7
SECM_mo                                    6.7
SECM_fa                                    6.3
ZVWK_physical_other_mo                     6.2
SPOLIS_contract_mo                         5.7
SPOLIS_contract_fa                         5.6
LAND_ETNG_gebl2                            5.4
LAND_ACHTS_gebl2                           5.4
LAND_ACHTS_gebl1                           5.3
ED_rentown                                 5.2
LAND_ETNG_gebl1                            5.0
GBA_generation_fa                          4.7
income_hh_source                           4.6
house_ownership                            4.4
ZVWK_dentalcare_mo                         4.0
ZVWK_physical_therapy_mo                   4.0
ZVWK_mentalhealth_bas_mo                   4.0
residence_same_for_parents                 3.9
GBA_generation_mo                          3.8
ZVWK_mentalhealth_spec_mo                  3.7
ZVWK_mentalh_spec_nostay_inst_mo           3.5
SECM_disability_mo                         3.1
GBA_generation_kid                         3.0
premature_birth                            2.9
ZVWK_mentalh_spec_nostay_ind_mo            2.9
l_income_hh_pov_binary                     2.8
SECM_selfemployed_fa                       2.8
ZVWK_GP_basic_mo                           2.7
SECM_student_fa                            2.6
l_income_hh_min_binary                     2.5
SECM_selfemployed_mo                       2.5
SECM_otherwork_mo                          2.5
SECM_socialassistance_mo                   2.5
ZVWK_mentalhealth_bas_fa                   2.5
SECM_employee_fa                           2.4
NA_dummy_bmi_4wks_znl                      2.4
NA_dummy_bmi_4mnd_znl                      2.4
l_income_hh_pov_4j_binary                  2.3
l_income_hh_min_4j_binary                  2.3
SECM_unemployed_mo                         2.3
ZVWK_dentalcare_fa                         2.3
ZVWK_abroad_fa                             2.3
ZVWK_GP_basic_fa                           2.2
ZVWK_patient_transport_lie_fa              2.2
NA_dummy_SPOLIS_wages_mo                   2.2
SECM_employee_mo                           2.1
SECM_otherwork_fa                          2.1
SECM_unemployed_fa                         2.1
SECM_disability_fa                         2.0
NA_dummy_SPOLIS_wages_fa                   2.0
PRNL_multiples                             1.9
ZVWK_abroad_mo                             1.9
NA_dummy_income_hh                         1.9
ZVWK_physical_therapy_fa                   1.8
SECM_student_mo                            1.7
ZVWK_mentalhealth_spec_fa                  1.7
SECM_otherassistance_fa                    1.6
NA_dummy_ZVWK_mentalh_spec_other_fa        1.6
SECM_director_fa                           1.5
ZVWK_physical_other_fa                     1.5
NA_dummy_ZVWK_GP_basic_fa                  1.5
SECM_familywork_fa                         1.4
ZVWK_mentalh_spec_nostay_inst_fa           1.4
SECM_otherassistance_mo                    1.3
SECM_socialassistance_fa                   1.2
NA_dummy_ZVWK_GP_basic_mo                  1.1
NA_dummy_ZVWK_mentalh_spec_other_mo        1.1
SECM_familywork_mo                         1.0
NA_dummy_income_mo                         1.0
SECM_retirement_mo                         0.9
SECM_retirement_fa                         0.9
ZVWK_mentalh_spec_nostay_ind_fa            0.9
ZVWK_patient_transport_sit_mo              0.5
ZVWK_birth_obstetrician_fa                 0.5
NA_dummy_income_fa                         0.5
ZVWK_birth_maternitycare_fa                0.4
NA_dummy_age_at_birth_fa                   0.4
SECM_director_mo                           0.3
ZVWK_mentalhealth_spec_stay_fa             0.3
ZVWK_mentalhealth_spec_stay_mo             0.0
ZVWK_mentalh_spec_other_mo                 0.0
ZVWK_patient_transport_sit_fa              0.0
ZVWK_mentalh_spec_other_fa                 0.0

Connection between predictors and Basisdataset

Not all predictors are defined in the Dutch Basisdataset and many variables have low importance scores.

The following table describes the relevant fields defined by the Dutch Basisdataset for a subset for most important predictors.

Variabele BDS JAMES Imp In Comments
BMI 235, 245, 724, 20 Y 103 Y check
weight 245, 724, 20 Y 84 Y check
PRNL_birthweight 110 Y 39 Y check
length/height 235, 724, 20 Y 38 Y check
ZVWK_sumtotal_mo N 33 N not available
income_fa N 33 N not available
income_parents N 32 N not available
income_mo N 31 N not available
ED_woz N 29 Y derive from PC4
ZVWK_sumtotal_fa N 29 N not available
age_at_birth_mo 63, 62, 20 Y 27 Y check
age_at_birth_fa 63, 62, 20 Y 26 Y check
SPOLIS_paidhours_fa N 22 N not available
SPOLIS_paidhours_mo N 21 N not available
PRNL_gestational_age 82 Y 18 Y check
educationlevel_mo N 18 N duplicate
educationlevel_fa N 15 N duplicate
opl1 66, 62 N 12 Y check
opl2 66, 62 N 10 Y check
sex 19 Y 9 Y check
PRNL_parity 741 N 9 Y check
STED 16 Y 9 Y derive from PC4
LAND_ETNG_gebl1 71, 62 Y 9 Y check
LAND_ACHTS_gebl1 71, 62 N 9 N duplicate
LAND_ETNG_gebl2 71, 62 Y 7 Y check
LAND_ACHTS_gebl2 71, 62 N 7 N duplicate
GBA_generation_fa N 5 N low imp
ED_rentown N 5 N low imp
GBA_generation_mo N 4 N low imp
residence_same_for_parents N 4 N low imp
GBA_generation_kid N 3 N low imp
premature_birth 82 N 3 N duplicate
SECM_{xxx} N 3 N low imp
PRNL_multiples 108 N 2 N low imp
birth date 20 Y - N not in model
measurement date 724 Y - N not in model
height mother 238 Y - N not in model
height father 240 Y - N not in model
weight mother N - N not in model
weight father N - N not in model
smoking pregnancy 91 Y - N not in model

The table maps the overlap between two data sources: BDS (and accessible through JAMES) and the CBS data used to create the model. Predictors are ordered in terms of the importance score. The column In specifies which predictors are selected to be included into a compressed model with 15 predictors (12 fixed, 3 time-varying). The most important variables (child BMI, weight and height) are available for personal risk prediction. Income variables are not available, but a ED_woz value for the PC4 postal code is a reasonable proxy to parental income. Age of parents at birth, gestational age and parental education are included, and well as some of the less important variables (sex, parity, parental birth country).

Note that importance scores vary depending on what other variables are in the model, so for a proper picture of variable importance, we need to refit the model using the subset of the 15 predictors.

Predictor codings

In other to be able to apply the random forest model in practice, it is vital that the variable codings used in the CBS-model and JAMES-model match.

Here is a proposal for variable coding in both models:

JAMES CBS name Code Description
bmi_z BMI_{time}_znl Body Mass Index, Z-score (AGD::nl4.bmi)
hgt_z lengte_{time}_znl Length/height, Z-score (AGD::nl4.hgt)
wgt_z gewicht_{time}_znl Weight, Z-score (AGD::nl4.wgt)
age Decimal age: round((date - dob) / 365.25, 4)
bw PRNL_birthweight Birth weight (grammes)
woz ED_woz WOZ value - immovable property (EURO)
agem age_at_birth_mo Mother age at child’s birth (years)
agef age_at_birth_fa Father age at child’s birth (years)
ga PRNL_gestational_age Gestational age (completed weeks)
eduf opl1 Level of education, biol father (number)
1 None
2 Basis
3 MLK/VMBO-LWOO/LBO/VBO/VMBO-BBL&KBL
4 MAVO/VMBO/GL&TL
5 MBO
6 HAVO/VWO
7 HBO/HTS/HEAO
8 WO
NA Other (98), unknown (00)
edum opl2 Level of education, biol mother (number)
See opl1 codes
sex geslacht Sex of child (number)
1 Male
0 Female
NA Unknown, undetermined
par PRNL_parity Number of births (GA >=16w), inc this (number)
urb STED Urbanisation grade (number)
1 >= 2500 addresses/km^2
2 1500-2500 addresses/km^2
3 1000-1500 addresses/km^2
4 500-1000 addresses/km^2
5 < 500 addresses/km^2
NA Unknown
ctrf LAND_ACHTS_gebl1 Country of birth, biological father (factor)
1 Netherlands
2 EU-15, other developed economies
3 New EU-countries, economies in transition
4 Northern Africa
5 East Asia
6 Other Africa, Asia, Latin America
7 Surinam and (former) Nederlands Antilles
8 Turkey
9 Unknown
ctrm LAND_ACHTS_gebl2 Country of birth, biological mother (factor)
See LAND_ACHTS_gebl1 codes

Strategy to deal with missing predictor values

We still have to devise a strategy here.