Curve matching is a technology that aims to predict individual growth curves. The method finds persons similar to the target person, and learns the possible future course of growth from the realized curves of the matched individuals.

calculate_matches(
  data,
  condition,
  y_name,
  x_name = character(),
  e_name = character(),
  t_name = character(),
  subset = NULL,
  k = 10,
  replace = FALSE,
  blend = 1,
  break_ties = TRUE,
  allow_matched_targets = TRUE,
  include_target = TRUE,
  kappa = 3,
  verbose = TRUE,
  ...
)

Arguments

data

A data.frame or tbl_df.

condition

Logical expression defining the set of rows in data for which matches will be sought. Missing values are taken as false. If omitted, all rows will be successively taken as targets. This can result in intensive computation if nrow(data) is large.
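
As a rough illustration of how a logical condition with missing values can be reduced to a set of target rows, the following base R sketch may help. It is not the package's internal code; the variable names cond and targets are hypothetical, and the missing value is introduced artificially.

```r
# Sketch only: base R analogue of "missing values are taken as false".
data <- as.data.frame(datasets::ChickWeight)
data$Time[1] <- NA                 # artificial missing value for illustration

# Evaluate the condition within the data; row 1 evaluates to NA
cond <- with(data, Time == 0)

# which() ignores NA, so rows with a missing condition are not targets
targets <- which(cond)
head(targets)
```

A narrower condition, such as Time == 0 & Chick == 48, selects a single target row and keeps the computation small.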

y_name

A character vector containing the names of the dependent variables in data.

x_name

A character vector containing the names of the predictive variables in data that will go into the linear part of the model.

e_name

A character vector containing the names of the variables for which the match should be exact.

t_name

A character vector containing the names of the treatment variables in data. The current function fits the model using only the first element of t_name.

subset

Logical expression defining the set of rows taken from data. This subset is selected before any other calculations are made, and this can be used to trim down the size of the data in which matches are defined and sought.

k

Requested number of matches. The default is k = 10.

replace

A logical that indicates whether to match with or without replacement. The default is FALSE.

blend

A numeric value between 0 and 1 that indicates the blend between predictive mean matching with replacement (1) and Euclidean distance matching (0). The default is 1.

break_ties

A logical indicating whether ties should be broken randomly. The default (TRUE) breaks ties randomly.

allow_matched_targets

A logical that indicates whether the non-active target cases may be found as a match. The default is TRUE.

include_target

A logical that indicates whether the target case is included in the model. See details. The default is TRUE.

kappa

A numeric value that serves as the sensitivity parameter for the inverse distance weighting. Used when drawing with replacement. The default is 3.

verbose

A logical indicating whether diagnostic information should be printed.

...

Arguments passed down to match_pmm().

Value

An object of class match_list, which can be post-processed by the extract_matches function to extract the row numbers in data of the matched children. The length of the list is always equal to the requested number of matches k if replace == TRUE, but may be shorter if replace == FALSE and the donors are exhausted. The length is zero if no matches can be found.

Details

The function finds k matches for an individual in the same data set by means of stratified predictive mean matching or nearest neighbour matching.
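
The idea of predictive mean matching can be sketched in a few lines of base R. This is a simplified illustration, not the package's implementation: it ignores stratification, exact matching on e_name, blending and random tie-breaking, so its result need not coincide with calculate_matches(). The names target_row, donors, d and matched_rows are hypothetical.

```r
# Sketch only: plain predictive mean matching with base R.
data <- as.data.frame(datasets::ChickWeight)

target_row <- 543                  # the target used in the Examples section
donors <- data[-target_row, ]      # all other rows are candidate donors

# Linear model for the outcome, fitted here on the candidate donors only
fit <- lm(weight ~ Time + Diet, data = donors)

# Predicted outcome for the target and for every candidate donor
pred_target <- predict(fit, newdata = data[target_row, ])
pred_donors <- predict(fit, newdata = donors)

# Distance on the predicted-mean scale; the k smallest define the matches
k <- 10
d <- abs(pred_donors - pred_target)
matched_rows <- as.integer(rownames(donors)[order(d)[seq_len(k)]])

data[matched_rows, ]
```

Donors with the same Time and Diet as the target have a distance of zero, so they are selected first; the remaining matches come from the nearest predicted values.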

By default, if the outcome variable of the target case is observed, then it is used to fit the model, together with the candidate donors. The default behavior can be changed by setting include_target = FALSE. Note that if x_name contains one or more factors, then it is possible that the factor level of the target case is unique among all potential donors. In that case, the model can still be fit, but prediction will fail, and hence no matches will be found.
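
The factor-level problem can be reproduced with base R alone. The sketch below is an assumption about the general failure mode, not a trace of the package's internals: a model fitted on donors that lack the target's factor level cannot produce a prediction for the target. The names donors, target and res are hypothetical.

```r
# Sketch only: predicting for a factor level that no donor has.
data <- as.data.frame(datasets::ChickWeight)

# Donors exclude Diet 4 entirely; droplevels() removes the unused level
donors <- droplevels(data[data$Diet != 4, ])
target <- data[543, ]              # a Diet 4 chick

# The model fits without problems on the donors
fit <- lm(weight ~ Time + Diet, data = donors)

# Prediction for the target fails; capture the error message instead of stopping
res <- tryCatch(
  predict(fit, newdata = target),
  error = function(e) conditionMessage(e)
)
res
```

When this situation arises, one option is to move the factor from x_name to a coarser grouping, or to leave it out of the model, so that the target's level is shared with at least some donors.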

If break_ties == FALSE, the function returns the first nmatch matches as they appear in the order of data. This method leads to an overuse of the first part of the data, and hence underestimates variability. The better option is to break ties randomly (the default).
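
The effect of tie-breaking can be illustrated with a small base R sketch. The distances are made up, and the approach shown (a random secondary sort key) is one common way to break ties, not necessarily the one used inside the package.

```r
# Sketch only: deterministic versus random tie-breaking among equal distances.
d <- c(0.5, 0, 0, 0, 0, 1.2)       # hypothetical distances for six donors
k <- 2

# Without tie-breaking, order() keeps tied donors in data order,
# so donors 2 and 3 are always selected
first_k <- order(d)[seq_len(k)]

# With a random secondary key, any two of the tied donors 2 to 5 may be selected
set.seed(1)
random_k <- order(d, runif(length(d)))[seq_len(k)]

first_k
random_k
```

Repeating the random version with different seeds spreads the selections over all tied donors, which is why random tie-breaking gives a more realistic picture of variability.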

References

van Buuren, S. (2014). Curve matching: A data-driven technique to improve individual prediction of childhood growth. Annals of Nutrition & Metabolism, 65(3), 227-233.

van Buuren, S. (2012). Flexible imputation of missing data. Boca Raton, FL: Chapman & Hall/CRC.

Author

Stef van Buuren 2017

Examples

library("curvematching")
data <- datasets::ChickWeight
data[543, ]
#>     weight Time Chick Diet
#> 543     39    0    48    4

# find matches for observation in row 543 for outcome weight
m <- calculate_matches(data, Time == 0 & Chick == 48, y_name = "weight", x_name = c("Time", "Diet"))

# row numbers of matched cases
extract_matches(m)
#>  [1] 294 461 473 485 497 507 519 531 555 567

# data of matched cases
data[extract_matches(m), ]
#>     weight Time Chick Diet
#> 294     46    2    27    2
#> 461     42    0    41    4
#> 473     42    0    42    4
#> 485     42    0    43    4
#> 497     42    0    44    4
#> 507     41    0    45    4
#> 519     40    0    46    4
#> 531     41    0    47    4
#> 555     40    0    49    4
#> 567     41    0    50    4