R/calculate_matches.R
calculate_matches.Rd
Curve matching is a technique that aims to predict individual growth curves. The method finds persons similar to the target person, and learns the possible future course of growth from the realized curves of the matched individuals.
A data.frame or tbl_df.
Logical expression defining the set of rows in data for which matches will be sought. Missing values are taken as false. If omitted, all rows will be successively taken as targets. This can result in intensive computation if nrow(data) is large.
A character vector containing the names of the dependent variables in data.
A character vector containing the names of the predictive variables in data that will go into the linear part of the model.
A character vector containing the names of the variables for which the match should be exact.
A character vector containing the names of the treatment variables in data. The current function fits the model to only the first element of t_name.
Logical expression defining the set of rows taken from data. This subset is selected before any other calculations are made, and can be used to trim down the size of the data in which matches are defined and sought.
Requested number of matches. The default is k = 10.
A logical that indicates whether to match with or without replacement. The default is FALSE.
An integer value between 0 and 1 that indicates the blend between predictive mean matching with replacement (1) and Euclidean distance matching (0). The default is 1.
A logical indicating whether ties should be broken randomly. The default (TRUE) breaks ties randomly.
A logical that indicates whether the non-active target cases may be found as a match. The default is TRUE.
A logical that indicates whether the target case is included in the model. See details. The default is TRUE.
A numeric value that serves as the sensitivity parameter for the inverse distance weighting. Used when drawing with replacement. The default is 3.
A logical indicating whether diagnostic information should be printed.
Arguments passed down to match_pmm().
An object of class match_list, which can be post-processed by the extract_matches function to extract the row numbers in data of the matched children. The length of the list is always equal to m if replace == TRUE, but may be shorter if replace == FALSE and the donors are exhausted. The length is zero if no matches can be found.
The function finds k matches for an individual in the same data set by means of stratified predictive mean matching or nearest neighbour matching.
By default, if the outcome variable of the target case is observed, then it is used to fit the model, together with the candidate donors. The default behavior can be changed by setting include_target = FALSE.
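The idea of predictive mean matching, and the role of the target case in the model fit, can be illustrated with a minimal base R sketch. It is not the package's implementation: the linear model, the absolute-difference metric on predicted values, and the variable names target_row, fit_rows, pred and dist are assumptions made for illustration, and stratification and tie-breaking are left out.

```r
# Illustrative sketch only: plain predictive mean matching with lm().
data <- as.data.frame(datasets::ChickWeight)
target_row <- 543
k <- 10
include_target <- TRUE  # mirrors the documented default

# Rows used to fit the model: the donors, plus the target case if requested
all_rows <- seq_len(nrow(data))
fit_rows <- if (include_target) all_rows else setdiff(all_rows, target_row)
fit <- lm(weight ~ Time + Diet, data = data[fit_rows, ])

# Predicted mean for every row, and its distance to the target's prediction
pred <- predict(fit, newdata = data)
dist <- abs(pred - pred[target_row])
dist[target_row] <- Inf  # the target case cannot match itself

# Row numbers of the k donors with the closest predicted means
matches <- order(dist)[seq_len(k)]
data[matches, ]
```

Setting include_target to FALSE in the sketch refits the model on the donors only, which is the situation described for include_target = FALSE.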
Note that if x_name contains one or more factors, then it is possible that the factor level of the target case is unique among all potential donors. In that case, the model can still be fit, but prediction will fail, and hence no matches will be found.
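The failure mode with a unique factor level can be reproduced in base R. The sketch below uses made-up toy data (the donors and target data frames are hypothetical) and lm() as a stand-in for the matching model; it only shows why prediction breaks when the target's level is absent from the rows used to fit the model.

```r
# Illustrative sketch only: toy donors that cover Diet levels "1" and "2"
donors <- data.frame(
  weight = c(42, 51, 59, 40, 49, 58),
  Time   = c(0, 2, 4, 0, 2, 4),
  Diet   = factor(c("1", "1", "1", "2", "2", "2"))
)

# The target case has a Diet level that no donor has
target <- data.frame(Time = 0, Diet = factor("4"))

# Fitting on the donors works ...
fit <- lm(weight ~ Time + Diet, data = donors)

# ... but predicting for the unseen level raises an error, captured here
result <- tryCatch(
  predict(fit, newdata = target),
  error = function(e) conditionMessage(e)
)
result
```

A check such as `all(levels(target$Diet) %in% levels(donors$Diet))` before matching is one way to detect this situation in advance.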
If break_ties == FALSE, the function returns the first nmatch matches as they appear in the order of data. This method leads to an overuse of the first part of the data, and hence underestimates variability. The better option is to break ties randomly (the default).
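The effect of the two tie-breaking strategies can be seen in a small base R sketch. The distance vector d and the use of order() with a random secondary key are assumptions chosen for illustration, not the package's internal code.

```r
# Illustrative sketch only: six donors are tied at the smallest distance
set.seed(1)
d <- c(0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 2, 3)
k <- 3

# Without random tie-breaking, order() keeps tied values in data order,
# so the same first rows are selected every time
first_k <- order(d)[seq_len(k)]

# With random tie-breaking, a random secondary key shuffles the tied donors
random_k <- order(d, runif(length(d)))[seq_len(k)]

first_k
random_k
```

Repeating the random variant over many targets spreads the selections over all six tied donors, whereas the ordered variant keeps returning rows 1 to 3.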
van Buuren, S. (2014). Curve matching: A data-driven technique to improve individual prediction of childhood growth. Annals of Nutrition & Metabolism, 65(3), 227-233.
van Buuren, S. (2012). Flexible imputation of missing data. Boca Raton, FL: Chapman & Hall/CRC.
library("curvematching")
data <- datasets::ChickWeight
data[543, ]
#>     weight Time Chick Diet
#> 543     39    0    48    4
# find matches for observation in row 543 for outcome weight
m <- calculate_matches(data, Time == 0 & Chick == 48, y_name = "weight", x_name = c("Time", "Diet"))
# row numbers of matched cases
extract_matches(m)
#> [1] 294 461 473 485 497 507 519 531 555 567
# data of matched cases
data[extract_matches(m), ]
#>     weight Time Chick Diet
#> 294     46    2    27    2
#> 461     42    0    41    4
#> 473     42    0    42    4
#> 485     42    0    43    4
#> 497     42    0    44    4
#> 507     41    0    45    4
#> 519     40    0    46    4
#> 531     41    0    47    4
#> 555     40    0    49    4
#> 567     41    0    50    4