The predictions from a broken stick model coincide with the
group-conditional means of the random effects. This function takes
an object of class brokenstick and returns predictions in one of
several formats. The user can calculate predictions for new persons,
i.e., for persons who are not part of the fitted model, through the
x and y arguments.
Arguments
- object
A brokenstick object.
- newdata
Optional. A data frame in which to look for variables with which to predict. The training data are used if omitted and if object$light is FALSE.
- ...
Not used, but required for extensibility.
- x
Optional. A numeric vector with values of the predictor. It can also be the special keyword x = "knots", which replaces x by the positions of the knots.
- y
Optional. A numeric vector with measurements.
- group
A vector with group identifications.
- strip_data
A logical indicating whether the rows with the observed data from newdata should be stripped from the return value. The default is TRUE. Set to FALSE to infer which data points are extracted from newdata. Works best for shape = "long".
- shape
A string: "long" (default), "wide" or "vector", specifying the shape of the return value. Note that use of "wide" with many unique values in x creates an unwieldy, large and sparse matrix.
- what
Which knots to predict when x = "knots"? See get_knots(). The default, NULL, calculates all knots.
Value
If shape == "long", a long data.frame of predictions. If x, y and group
are not specified, the number of rows in the data frame is guaranteed to
be the same as the number of rows in newdata.
If shape == "wide", a wide data.frame of predictions, one record per group.
Note that this format could be inefficient, depending on the data.
If shape == "vector", a vector of predicted values, of all x-values and groups.
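The remark that the wide format can be inefficient may be easier to see with a small base R sketch. It does not call predict(); the data frame, its column names (id, age, pred) and all values are made up for illustration, and reshape() merely stands in for whatever the package does internally.

```r
# Toy long-format table: ids, ages and predictions are invented for illustration.
long <- data.frame(
  id   = c(1, 1, 2, 2, 3),
  age  = c(0.5, 1.0, 0.5, 1.2, 0.7),
  pred = c(0.10, 0.20, -0.30, -0.10, 0.05)
)

# Wide layout: one row per group, one column per unique age.
# Combinations of id and age that do not occur in `long` become NA.
wide <- reshape(long, idvar = "id", timevar = "age", direction = "wide")

dim(wide)                 # rows = groups, columns = 1 + number of unique ages
mean(is.na(wide[, -1]))   # share of empty cells in the wide layout
```

With irregularly observed ages, every additional unique age adds a column that is filled for only a few groups, which is why the long shape tends to be the more economical choice.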
Details
By default, predict() calculates predictions for every row in newdata.
If the user specifies no newdata argument, then the function searches
object for the training data (which are only available if object$light
is FALSE).
It is possible to tailor the behaviour of predict() through the x, y
and group arguments. What exactly happens depends on which of these
arguments is specified:
1. If the user specifies x, but no y and group, the function returns, for every group in newdata, predictions at the specified x values. This method will use the data from newdata.
2. If the user specifies x and y but no group, the function forms a hypothetical new group with the x and y values. This method uses no information from newdata, and also works for a light brokenstick object.
3. If the user specifies group, but no x or y, the function searches for the relevant data in newdata and limits its predictions to those groups. This is useful if the user needs a prediction for only one or a few groups. This does not work for a light brokenstick object.
4. If the user specifies x and group, but no y, the function will create new values for x in each group, search for the relevant data in newdata and provide predictions at values of x in those groups.
5. If the user specifies x, y and group, the function assumes that these vectors contain additional data on top of what is already available in newdata. The lengths of x, y and group must match. For a light brokenstick object, case 5 effectively becomes case 6. See below.
6. As case 5, but now without newdata available. All data are specified through x, y and group and form a data frame. Matching to newdata is attempted, but as long as the group ids are different from the training sample, effectively new cases will be made.
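The list above can be summarised as a lookup on which of the three arguments are supplied. The helper below is a hypothetical sketch for readers, not a function of the brokenstick package; it only labels a call and performs no prediction. Cases 5 and 6 share one label because they differ in whether newdata is available, not in the arguments.

```r
# Hypothetical helper (not part of brokenstick): classify a call by the
# combination of x, y and group that is supplied.
predict_case <- function(x = NULL, y = NULL, group = NULL) {
  has <- c(x = !is.null(x), y = !is.null(y), group = !is.null(group))
  if (!any(has)) {
    return("default: one prediction per row of newdata")
  }
  key <- paste(names(has)[has], collapse = "+")
  switch(key,
    "x"         = "case 1: predict at x for every group in newdata",
    "x+y"       = "case 2: one hypothetical new group, newdata not used",
    "group"     = "case 3: observed x, restricted to the listed groups",
    "x+group"   = "case 4: predict at x, restricted to the listed groups",
    "x+y+group" = "case 5 or 6: x, y and group supply additional data",
    "combination not described in the list above"
  )
}

predict_case(x = "knots", y = c(1, 1, 0.5, 0))
predict_case(x = c(0.5, 1), group = 10001)
```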
Examples
if (FALSE) {
library("dplyr")
# -- Data
train <- smocc_200[1:1198, ]
test <- smocc_200[1199:1940, ]
# -- Fit model
fit <- brokenstick(hgt_z ~ age | id, data = train, knots = 0:3, seed = 1)
fit_light <- brokenstick(hgt_z ~ age | id,
  data = train, knots = 0:3,
  light = TRUE, seed = 1
)
# -- Predict, standard cases
# Use train data, return column with predictions
pred <- predict(fit)
identical(nrow(train), nrow(pred))
# Predict without newdata: not possible for a light object, so guard with try()
try(predict(fit_light))
# Use test data
pred <- predict(fit, newdata = test)
identical(nrow(test), nrow(pred))
# Predict, same but using newdata with the light object
pred_light <- predict(fit_light, newdata = test)
identical(pred, pred_light)
# -- Predict, special cases
# -- Case 1: x, -y, -group
# Case 1: x as "knots", standard estimates, train sample (n = 124)
z <- predict(fit, x = "knots", shape = "wide")
head(z, 3)
# Case 1: x as values, linearly interpolated, train sample (n = 124)
z <- predict(fit, x = c(0.5, 1, 1.5), shape = "wide")
head(z, 3)
# Case 1: x as values, linearly interpolated, test sample (n = 76)
z <- predict(fit, test, x = c(0.5, 1, 1.5), shape = "wide")
head(z, 3)
# -- Case 2: x, y, -group
# Case 2: form one new group with id = 0
predict(fit, x = "knots", y = c(1, 1, 0.5, 0), shape = "wide")
# Case 2: works also for a light object
predict(fit_light, x = "knots", y = c(1, 1, 0.5, 0), shape = "wide")
# -- Case 3: -x, -y, group
# Case 3: Predict at observed age for subset of groups, training sample
pred <- predict(fit, group = c(10001, 10005, 10022))
head(pred, 3)
# Case 3: Of course, we cannot do this for light objects, so guard with try()
pred_light <- try(predict(fit_light, group = c(10001, 10005, 10022)))
# Case 3: We can use another sample. Note there is no child 999
pred <- predict(fit, test, group = c(11045, 11120, 999))
tail(pred, 3)
# Case 3: Works also for a light object
pred_light <- predict(fit_light, test, group = c(11045, 11120, 999))
identical(pred, pred_light)
# -- Case 4: x, -y, group
# Case 4: Predict at specified x, only in selected groups, train sample
pred <- predict(fit, x = c(0.5, 1, 1.25), group = c(10001, 10005, 10022))
pred
# Case 4: strip_data = FALSE provides access to the observed data
pred_all <- predict(fit,
  x = c(0.5, 1, 1.25), group = c(10001, 10005, 10022),
  strip_data = FALSE
)
pred_all %>%
  dplyr::filter(id == 10001) %>%
  dplyr::arrange(age)
# Case 4: Applies also to test sample
pred <- predict(fit, test, x = c(0.5, 1, 1.25), group = c(11045, 11120, 999))
pred
# Case 4: Works also with light object
pred_light <- predict(fit_light, test,
  x = c(0.5, 1, 1.25),
  group = c(11045, 11120, 999)
)
identical(pred_light, pred)
# -- Case 5: x, y, group
# Case 5: Add new data to the training sample and refresh the broken stick
# estimates at age x.
# Note that novel child (not in train) 999 has one data point
predict(fit,
  x = c(0.9, 0.9, 0.9), y = c(1, 1, 1),
  group = c(10001, 10005, 999)
)
# Case 5: Same, but now for test sample. Novel child 899 has two data points
predict(fit, test,
  x = c(0.5, 0.9, 0.6, 0.9),
  y = c(0, 0.5, 0.5, 0.6), group = c(11045, 11120, 899, 899)
)
# Case 5: Also works for light object
predict(fit_light, test,
  x = c(0.5, 0.9, 0.6, 0.9),
  y = c(0, 0.5, 0.5, 0.6), group = c(11045, 11120, 899, 899)
)
# -- Case 6: As Case 5, but without previous data
# Case 6: Same call as last, but now without newdata = test
# All children are de facto novel as they do not occur in the training sample.
# Note: Predictions for 11045 and 11120 differ from prediction in Case 5.
predict(fit,
  x = c(0.5, 0.9, 0.6, 0.9),
  y = c(0, 0.5, 0.5, 0.6), group = c(11045, 11120, 899, 899)
)
# Case 6: This also works for the light brokenstick object
predict(fit_light,
  x = c(0.5, 0.9, 0.6, 0.9),
  y = c(0, 0.5, 0.5, 0.6), group = c(11045, 11120, 899, 899)
)
}
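The Case 1 comments in the examples describe predictions at arbitrary x values as linearly interpolated. A minimal base R sketch of that idea follows, assuming the estimates at the knots for a single group are already known. The values in at_knots are made up, and approx() is used here only as a stand-in for the interpolation step, not as the package's actual code path; positions outside the knot range are not covered.

```r
# Knot positions as in the fitted models above (knots = 0:3).
knots <- 0:3

# Made-up broken stick estimates at the knots for one hypothetical group.
at_knots <- c(1, 1, 0.5, 0)

# Linear interpolation between neighbouring knot estimates.
interp <- approx(knots, at_knots, xout = c(0.5, 1, 1.5))$y
interp   # 1, 1 and 0.75
```

A value at a knot is returned unchanged, and a value halfway between two knots is the average of the two neighbouring estimates.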