Calculates matches for one or more children by blended distance matching

Curve matching is a technology that aims to predict individual growth curves. The method finds persons similar to the target person, and learns the possible future course of growth from the realized curves of the matched individuals.

calculate_matches2(
  data,
  newdata,
  y_name = character(0L),
  x_name = character(0L),
  e_name = character(0L),
  t_name = character(0L),
  subset = TRUE,
  k = 10L,
  replace = FALSE,
  blend = 1,
  break_ties = TRUE,
  kappa = 3,
  verbose = TRUE,
  ...
)

Arguments

data: A data.frame or tbl_df with donor data. One row is one potential donor.
newdata: A data.frame or tbl_df with data of children for which we seek matches. Every row corresponds to one child.
y_name: A character vector containing the names of the dependent variables in data.
x_name: A character vector containing the names of the predictive variables in data that will go into the linear part of the model.
e_name: A character vector containing the names of the variables for which the match should be exact.
t_name: A character vector containing the names of the treatment variables in data. The current function fits the model to the first element of t_name only.
subset: Logical expression defining the set of rows taken from data. This subset is selected before any other calculations are made, and this can be used to trim down the size of the data in which matches are defined and sought.
k: Requested number of matches. The default is k = 10.
replace: A logical that indicates whether to match with or without replacement. The default is FALSE.
blend: A numeric value between 0 and 1 that indicates the blend between predictive mean matching with replacement (1) and Euclidean distance matching (0). The default is 1.
break_ties: A logical indicating whether ties should be broken randomly. The default (TRUE) breaks ties randomly.
kappa: A numeric value that serves as the sensitivity parameter for the inverse distance weighting. Used when drawing with replacement. The default is 3.
verbose: A logical indicating whether diagnostic information should be printed.
...: Arguments passed down to match_pmm().

Value

An object of class match_list, which can be post-processed by the extract_matches function to extract the row numbers in data of the matched children. The length of the list is always equal to m if replace == TRUE, but may be shorter if replace == FALSE and the donors are exhausted. The length is zero if no matches can be found.

Details

The function finds k matches for an individual in the same data set by means of stratified predictive mean matching or by nearest neighbour matching.
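
To make the idea of a blended distance concrete, here is a minimal base R sketch. The helper blended_distance(), the toy data, and the rescaling of both distances to [0, 1] are assumptions made for illustration only; they are not taken from the package and may differ from what calculate_matches2() does internally.

```r
# Illustrative sketch only: blended_distance() is a hypothetical helper,
# not part of the package.
blended_distance <- function(donors, target, y_name, x_name, blend = 1) {
  # Predictive mean distance: fit a linear model on the donors and compare
  # the predicted values of the donors with that of the target
  form <- stats::reformulate(x_name, response = y_name)
  fit <- stats::lm(form, data = donors)
  d_pmm <- abs(stats::predict(fit, newdata = donors) -
                 stats::predict(fit, newdata = target))

  # Euclidean distance on predictors standardized by the donor mean and sd
  x_donors <- scale(as.matrix(donors[, x_name, drop = FALSE]))
  x_target <- scale(as.matrix(target[, x_name, drop = FALSE]),
                    center = attr(x_donors, "scaled:center"),
                    scale = attr(x_donors, "scaled:scale"))
  d_euc <- sqrt(rowSums(sweep(x_donors, 2, x_target[1, ])^2))

  # Assumption: rescale both distances to [0, 1] before blending
  rescale <- function(d) if (max(d) > 0) d / max(d) else d
  blend * rescale(d_pmm) + (1 - blend) * rescale(d_euc)
}

# Toy donor data and one target case (made-up numbers)
donors <- data.frame(age = c(1, 2, 3, 4, 5, 6),
                     sex = c(0, 1, 0, 1, 0, 1),
                     hgt = c(75, 86, 96, 102, 110, 115))
target <- data.frame(age = 3.2, sex = 0)

d1  <- blended_distance(donors, target, "hgt", c("age", "sex"), blend = 1)
d0  <- blended_distance(donors, target, "hgt", c("age", "sex"), blend = 0)
d05 <- blended_distance(donors, target, "hgt", c("age", "sex"), blend = 0.5)

# Row numbers of the k nearest donors under the 50/50 blend
k <- 3
nearest <- order(d05)[seq_len(k)]
```

With blend = 1 only the predicted values matter, with blend = 0 only the position in predictor space matters, and intermediate values trade off the two.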

The procedure searches for matches in data for each row in newdata. Note that if x_name contains one or more factors, then it is possible that the factor level of the target case is unique among all potential donors. In that case, the model can still be fit, but prediction will fail, and hence no matches will be found.
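
The factor-level problem can be reproduced with a plain linear model in base R. The sketch below uses made-up toy data and stats::lm() as a stand-in for the matching model; the variable names (y, x, region) are hypothetical.

```r
# Toy donor data: the factor 'region' has levels "a" and "b" only
donors <- data.frame(
  y = c(10, 12, 11, 15, 14, 16),
  x = c(1, 2, 3, 4, 5, 6),
  region = factor(c("a", "a", "a", "b", "b", "b"))
)

# Target case with a level "c" that does not occur among the donors
target <- data.frame(x = 3.5, region = factor("c"))

# Fitting on the donors succeeds
fit <- stats::lm(y ~ x + region, data = donors)

# Prediction for the target fails because "c" is a new factor level;
# the error is caught here and turned into a missing value
pred <- tryCatch(
  stats::predict(fit, newdata = target),
  error = function(e) NA_real_
)

# A simple check that can be run before matching
unseen <- !as.character(target$region) %in% levels(droplevels(donors$region))
```

Checking for unseen levels up front, for example by merging sparse levels or dropping the factor from x_name, avoids an empty result for such targets.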

If break_ties is FALSE, the function returns the first k matches in the order in which they appear in data. This method overuses the first part of the data if there are ties, and hence may underestimate variability. The default option is to break ties randomly.
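
The effect of tie breaking can be illustrated with base R's order(). This is only a sketch of the general idea, using a made-up distance vector; how the package breaks ties internally is not shown here.

```r
# Made-up distances with many ties, as happens when several donors
# share the same predicted value
d <- c(0.2, 0, 0, 0, 0, 0.1, 0, 0)
k <- 3

# Without tie breaking: order() leaves ties in their original order,
# so the earliest rows are always selected (rows 2, 3 and 4 here)
first_k <- order(d)[seq_len(k)]

# With random tie breaking: a random secondary key shuffles tied rows,
# so every row at distance 0 has the same chance of being selected
set.seed(123)
random_k <- order(d, stats::runif(length(d)))[seq_len(k)]
```

Repeating the random variant over many targets spreads the matches over all tied donors instead of concentrating them at the top of data.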

References

van Buuren, S. (2014). Curve matching: A data-driven technique to improve individual prediction of childhood growth. Annals of Nutrition & Metabolism, 65(3), 227-233.
van Buuren, S. (2012). Flexible imputation of missing data. Boca Raton, FL: Chapman & Hall/CRC.

Author

Stef van Buuren 2021

Examples

data <- datasets::ChickWeight
data[1, ]
#>   weight Time Chick Diet
#> 1     42    0     1    1

# find matches for observation in row 1
m1 <- calculate_matches2(data, data[1, ],
  subset = !rownames(data) %in% "1",
  y_name = "weight", x_name = c("Time", "Diet")
)

# data of matched cases (may vary because of tie breaking)
data[extract_matches(m1), ]
#>     weight Time Chick Diet
#> 13      40    0     2    1
#> 25      43    0     3    1
#> 61      41    0     6    1
#> 73      41    0     7    1
#> 85      42    0     8    1
#> 108     41    0    10    1
#> 132     41    0    12    1
#> 156     41    0    14    1
#> 176     41    0    16    1
#> 209     41    0    20    1

# without tie breaking, we pick the earlier rows (not recommended)
m2 <- calculate_matches2(data, data[1, ],
  subset = !rownames(data) %in% "1",
  y_name = "weight", x_name = c("Time", "Diet"), break_ties = FALSE
)
data[extract_matches(m2), ]
#>     weight Time Chick Diet
#> 13      40    0     2    1
#> 25      43    0     3    1
#> 37      42    0     4    1
#> 49      41    0     5    1
#> 61      41    0     6    1
#> 73      41    0     7    1
#> 85      42    0     8    1
#> 96      42    0     9    1
#> 108     41    0    10    1
#> 120     43    0    11    1