Source: R/calculate_matches2.R, calculate_matches2.Rd
Curve matching is a technique that aims to predict individual growth curves. The method finds persons similar to the target person and learns the possible future course of growth from the realized curves of the matched individuals.
A data.frame or tbl_df with donor data. One row is one potential donor.
A data.frame or tbl_df with data of children for which we seek matches. Every row corresponds to one child.
A character vector containing the names of the dependent variables in data.
A character vector containing the names of the predictive variables in data that will go into the linear part of the model.
A character vector containing the names of the variables for which the match should be exact.
A character vector containing the names of the treatment variables in data. The current function fits the model only to the first element of t_name.
Logical expression defining the set of rows taken from data. This subset is selected before any other calculations are made, so it can be used to reduce the size of the data in which matches are defined and sought.
Requested number of matches. The default is k = 10.
A logical that indicates whether to match with or without replacement. The default is FALSE.
A numeric value between 0 and 1 that indicates the blend between predictive mean matching with replacement (1) and Euclidean distance matching (0). The default is 1.
A logical indicating whether ties should be broken randomly. The default (TRUE) breaks ties randomly.
A numeric value that serves as the sensitivity parameter for the inverse distance weighting. Used when drawing with replacement. The default is 3.
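A minimal sketch of inverse distance weighting with a sensitivity parameter, assuming donor weights proportional to 1 / d^kappa, where d is each donor's distance to the target. The helpers idw_weights() and idw_draw() and the exact weighting formula are assumptions; larger kappa concentrates the draws on the closest donors, and kappa = 0 gives equal weights.

```r
# Hypothetical sketch: inverse distance weights with sensitivity `kappa`.
# Assumes weights proportional to 1 / d^kappa; not taken from the package.
idw_weights <- function(d, kappa = 3, eps = 1e-8) {
  w <- 1 / (d + eps)^kappa  # eps avoids division by zero for exact ties
  w / sum(w)                # normalise to drawing probabilities
}

# Draw k donor positions with replacement, favouring close donors
idw_draw <- function(d, k = 10, kappa = 3) {
  sample(seq_along(d), size = k, replace = TRUE, prob = idw_weights(d, kappa))
}

d <- c(0.5, 1, 2)
round(idw_weights(d, kappa = 3), 3)  # 0.877 0.110 0.014
round(idw_weights(d, kappa = 0), 3)  # 0.333 0.333 0.333
set.seed(1)
idw_draw(d, k = 5)
```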
A logical indicating whether diagnostic information should be printed.
Arguments passed down to match_pmm().
An object of class match_list, which can be post-processed by the extract_matches function to extract the row numbers in data of the matched children. The length of the list will always be equal to m if replace == TRUE, but may be shorter if replace == FALSE and the donors are exhausted. The length is zero if no matches can be found.
The function finds k matches for an individual in the same data set by means of stratified predictive mean matching or by nearest neighbour matching.
The procedure searches for matches in data for each row in newdata.
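The matching step can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the implementation of calculate_matches2: the helper pmm_sketch() and its argument exact are hypothetical. It treats the variables in exact as strata that must match exactly, fits an ordinary linear model with lm() on the remaining donors, and ranks donors by the absolute difference between their predicted value and the target's predicted value.

```r
# Hypothetical sketch of stratified predictive mean matching in base R.
# pmm_sketch() and its `exact` argument are illustrative, not package API.
pmm_sketch <- function(donors, target, y_name, x_name, exact = character(0),
                       k = 10, break_ties = TRUE) {
  target <- target[1, , drop = FALSE]  # assume a single target row
  # Strata: keep only donors equal to the target on the `exact` variables
  for (v in exact) {
    donors <- donors[which(donors[[v]] == target[[v]]), , drop = FALSE]
  }
  if (nrow(donors) == 0L) return(character(0))
  # Linear model for the outcome, fitted on the donors only
  fit <- lm(reformulate(x_name, response = y_name), data = donors)
  # Distance between each donor's predicted value and the target's
  d <- abs(predict(fit, newdata = donors) - predict(fit, newdata = target))
  # Sort by distance; a random second key breaks ties if requested
  o <- if (break_ties) order(d, runif(length(d))) else order(d)
  rownames(donors)[head(o, k)]  # row names of the k closest donors
}

data <- as.data.frame(datasets::ChickWeight)  # drop groupedData classes
set.seed(1)
m <- pmm_sketch(data[-1, ], data[1, ], y_name = "weight",
                x_name = c("Time", "Diet"), k = 5)
data[m, ]
```

For row 1 (Time 0, Diet 1), every donor with Time 0 and Diet 1 has exactly the target's predicted value, so the five matches are drawn at random from those tied donors.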
Note that if x_name contains one or more factors, the target case may have a factor level that does not occur among any of the potential donors. In that case the model can still be fitted, but prediction will fail, and hence no matches will be found.
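The failure mode in this note can be reproduced in base R with lm() and predict(). The toy data below are made up for illustration: the model fits on donors whose factor g has levels a and b, but predicting for a target with level c raises an error.

```r
# Made-up donor data: factor g only has levels "a" and "b"
donors <- data.frame(y = c(1, 2, 3, 4), g = factor(c("a", "a", "b", "b")))
# Target case with a level that no donor has
target <- data.frame(g = factor("c"))

fit <- lm(y ~ g, data = donors)  # fitting on the donors succeeds
# Prediction for the target fails because of the unseen factor level
res <- tryCatch(predict(fit, newdata = target),
                error = function(e) conditionMessage(e))
print(res)
```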
If break_ties is FALSE, the function returns the first nmatch matches in the order in which they appear in data. This method overuses the first part of the data if there are ties, and hence may underestimate variability. The default is to break ties randomly.
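The difference between the two tie-breaking rules can be sketched with base R's order(), which leaves unresolved ties in their original order. The helper pick_nearest() and the distance vector d are hypothetical; a random second sort key spreads the choice over all tied donors instead of always taking the earliest rows.

```r
# Hypothetical sketch: pick the k nearest donors, with or without random
# tie breaking. `d` holds distances in the row order of the data.
pick_nearest <- function(d, k, break_ties = TRUE) {
  # order() keeps ties in their original order; runif() shuffles them
  o <- if (break_ties) order(d, runif(length(d))) else order(d)
  head(o, k)
}

d <- c(0, 1, 0, 0, 2, 0)                 # four donors tied at distance 0
pick_nearest(d, 2, break_ties = FALSE)  # 1 3: always the earliest tied rows
set.seed(1)
pick_nearest(d, 2)                       # two of 1, 3, 4, 6, chosen at random
```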
van Buuren, S. (2014). Curve matching: A data-driven technique to improve individual prediction of childhood growth. Annals of Nutrition & Metabolism, 65(3), 227-233.
van Buuren, S. (2012). Flexible imputation of missing data. Boca Raton, FL: Chapman & Hall/CRC.
data <- datasets::ChickWeight
data[1, ]
#> weight Time Chick Diet
#> 1 42 0 1 1
# find matches for observation in row 1
m1 <- calculate_matches2(data, data[1, ],
  subset = !rownames(data) %in% "1",
  y_name = "weight", x_name = c("Time", "Diet")
)
# data of matched cases (may vary because of tie breaking)
data[extract_matches(m1), ]
#> weight Time Chick Diet
#> 13 40 0 2 1
#> 25 43 0 3 1
#> 61 41 0 6 1
#> 73 41 0 7 1
#> 85 42 0 8 1
#> 108 41 0 10 1
#> 132 41 0 12 1
#> 156 41 0 14 1
#> 176 41 0 16 1
#> 209 41 0 20 1
# without tie breaking, we pick the earlier rows (not recommended)
m2 <- calculate_matches2(data, data[1, ],
  subset = !rownames(data) %in% "1",
  y_name = "weight", x_name = c("Time", "Diet"), break_ties = FALSE
)
data[extract_matches(m2), ]
#> weight Time Chick Diet
#> 13 40 0 2 1
#> 25 43 0 3 1
#> 37 42 0 4 1
#> 49 41 0 5 1
#> 61 41 0 6 1
#> 73 41 0 7 1
#> 85 42 0 8 1
#> 96 42 0 9 1
#> 108 41 0 10 1
#> 120 43 0 11 1