help prsnperd
--------------------------------------------------------------------------------------------------------------------------------
Title
prsnperd -- A utility for creating person-period datasets for discrete time longitudinal analyses
Syntax
prsnperd id length-to-event [censor] [, truncate(#) pretrunc(#) cswitch tvp(names) fev(name) copyleft]
options            description
--------------------------------------------------------------------------------------------------------------------------
Miscellaneous
  truncate(#)      truncate the maximum time of length-to-event
  pretrunc(#)      ignore some initial time periods in the model
  cswitch          invert censor coding
  tvp(names)       provide root names of flat-encoded time-varying predictors
  fev(name)        provide root name of flat-encoded time-varying event occurrence
  copyleft         display license information
--------------------------------------------------------------------------------------------------------------------------
Description
prsnperd transforms a person-time dataset into a person-period dataset for discrete-time longitudinal analyses (for
example, using dthaz). The input variables are id, the unique id number of each observed individual in the person-time
dataset; length-to-event, the duration to event occurrence (in number of discrete time intervals since the study's
Beginning of Time); and censor, which indicates the censoring status of the observed individual (where 0 = not censored and
1 = censored, unless the cswitch option is used). Given an input dataset of this form, an output dataset is created with
expanded observations and several new variables.
NOTE: individuals who were never observed to have experienced an event should be coded as having a length-to-event equal
to their total time in the study, and should be censored.
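As a minimal sketch of the input layout described above, the following builds a small hypothetical person-time dataset
and expands it. The variable names (id, length, censored) and all values are illustrative assumptions, not data from any
study. Individual 3 is assumed never to have experienced the event during 4 periods in the study, and so is entered with
length = 4 and censored = 1.
. clear
. input id length censored
1 3 0
2 2 1
3 4 1
end
. prsnperd id length censored
Because each individual contributes one observation per period up to length-to-event, individual 1 would be expected to
contribute three person-period observations, individual 2 two, and individual 3 four.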
Each individual observation within the person-time dataset is replaced with a number of new observations equal to
length-to-event for that id. If there is no event occurrence for a given time period, the user is so notified. Within
these new observations, either one or several new variables are created, depending on whether the survival analysis or
growth-modeling syntax is used. If the application is growth modeling, only the _period variable is created;
otherwise, all of the following variables are produced.
_period Specific time interval of this observation. Each id will have at least one observation with _period = 1. The
maximum value for _period is equal to the maximum length-to-event of the person-time dataset (or to truncate if
specified).
_d1-_dX Indicator variables (i.e. "dummy variables") for the current period, where X is the maximum value of
_period.
_Y _Y indicates event occurrence for the given period (where 0 = event did not happen and 1 = event happened). _Y is
the natural outcome variable in event history models. For example, a simple logit hazard model:
. logit _Y _d1-_d8, nocons
produces an estimate of the baseline hazard that corresponds exactly to the sample hazard, where ^H(t_j) =
1/(1 + e^(-B_j)) and B_j is the coefficient on the indicator for period j. The estimate becomes more interesting when
additional predictors are added:
. logit _Y _d1-_d8 age, nocons or
Estimated differences in ^H(t_j) can therefore be explored using standard nested models with multiple predictors. The
or option reports coefficients as odds ratios: for each predictor, the estimated multiplicative change in the odds of
event occurrence in a period associated with a one-unit increase in that predictor.
_status A categorical status variable for producing life-tables (where 1 = event occurred; 2 = event did not occur; and 3
= censored). Life tables with sample hazard can be created by using the following:
. tabulate _period _status, row
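To connect the logit coefficients to the hazard formula given for _Y, one possible sketch is shown below. It assumes a
person-period dataset with eight periods and indicators named _d1-_d8 as listed above; invlogit() is Stata's built-in
inverse logit function, 1/(1 + e^(-x)).
. logit _Y _d1-_d8, nocons
. display invlogit(_b[_d1])
. display invlogit(_b[_d2])
Each displayed value is the fitted hazard for that period. In this indicator-only model it should match the sample hazard
for the same period.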
Options
+---------------+
----+ Miscellaneous +-----------------------------------------------------------------------------------------------------
truncate(#) restricts the maximum value for length-to-event, censoring those observations with integer values greater than
truncate.
NOTE: Specifying values of truncate greater than the maximum value of length-to-event (or specifying negative values)
produces the same dataset as one with no value of truncate specified.
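The effect of truncate can be sketched by hand on the person-time data before expansion. This is only an illustration of
the behavior described above, assuming variables named length and censored, the default coding (1 = censored), and a
truncation point of 8; it is not necessarily how prsnperd implements the option.
. replace censored = 1 if length > 8 & !missing(length)
. replace length = 8 if length > 8 & !missing(length)
. prsnperd id length censored
The censoring flag is updated first because the second command overwrites the lengths that identify the affected
observations. The !missing() guard is there because Stata treats missing values as larger than any number.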
pretrunc(#) discards early time periods from the new dataset. For example, when pre-truncating with a value of 2, the
period that would be indicated by _d3 becomes _d1 instead, and the value of _period would be decreased by 2.
NOTE: Specifying values of pretrunc greater than the maximum value of length-to-event minus one (or specifying
negative values) produces the same dataset as one with no value of pretrunc specified. Also, truncate and pretrunc cannot be
combined when their values would result in fewer than two periods.
Discrete-time survival analyses conducted upon pre-truncated datasets are, in effect, analyses conducted upon separate
populations from the not pre-truncated datasets if the conditional hazard during the pre-truncated periods is greater than
zero. The author suggests that an analyst may wish to perform a pre-truncated analysis either because there are no events
during the initial periods, or because she is interested in analyzing a surviving sub-population at a later starting
period. However, in cases where events occurred during the pre-truncated periods, a survival analysis cannot be said to
generalize to the population of the not pre-truncated dataset. In cases where events occur in the initial periods, but too
rarely to provide reliable estimates for these periods, the analyst should both employ a sensitivity analysis to describe
differences between models on pre-truncated and not pre-truncated datasets, and examine the characteristics of the
anomalous individuals. Qualitative data may particularly help illuminate how these persons differ from the majority of
individuals who remain in the pre-truncated dataset.
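The sensitivity analysis suggested above might be sketched as follows: fit the same model on the full and the
pre-truncated expansions, then compare the stored estimates. The variable names (id, length, censored, age), the
assumption of eight periods (hence six after pretrunc(2)), and the stored-estimate names full and pretrunc are all
hypothetical.
. preserve
. prsnperd id length censored
. logit _Y _d1-_d8 age, nocons
. estimates store full
. restore
. preserve
. prsnperd id length censored, pretrunc(2)
. logit _Y _d1-_d6 age, nocons
. estimates store pretrunc
. restore
. estimates table full pretrunc
When reading the comparison, recall that _d1 in the pre-truncated model refers to the period labeled _d3 in the full model.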
cswitch tells prsnperd to expect that censored data are coded with 0 = censored, and 1 = event/failure.
tvp(names) generates variable(s) with the supplied name(s), provided that the names correspond precisely to the prefixes
of flat-encoded time-varying predictors. Person-time datasets are often constructed with time-varying predictors encoded
in such a format (for example, predictor1, predictor2, predictor3, predictor4, where the numeric suffix indicates the
time period in which the observation was made). Missing values will not be imputed. The time designation in the suffix
must be ordered in the same manner as the periods of observation.
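A brief sketch of tvp in use, assuming hypothetical flat-encoded predictors named predictor1 through predictor4 alongside
id, length, and censored, and a maximum length-to-event of 4:
. prsnperd id length censored, tvp(predictor)
. logit _Y _d1-_d4 predictor, nocons
After expansion, predictor would be expected to hold, in each person-period observation, the value from the flat-encoded
variable whose suffix matches that observation's period.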
fev(name) constructs variables named length_to_event and censored with appropriate values when event data are in a flat
indicator format (for example, event1 event2 event3 event4) rather than in a single length-to-event variable. Specify
the common portion of the event variables' names (for example, "event"). This option assumes that all event variables
share a common prefix (name), that each takes the values 0 (no event), 1 (event, or first event), or . (censored),
and that there is no left-censoring of observations. The created variables override any supplied length-to-event and
censor variables. prsnperd exits with an error if it encounters left-censored data with the fev option. fev also
expects that no data are middle-censored (i.e. all time periods have been observed for each individual between the
study's beginning of time and either the first occurrence of the event or right-censoring).
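The flat indicator format that fev expects can be illustrated with a small hypothetical dataset. The variable names and
values below are assumptions made for illustration only.
. clear
. input id event1 event2 event3
1 0 0 1
2 0 0 0
3 0 . .
end
. prsnperd id, fev(event)
Under the coding described above, individual 1 experiences the event in period 3, individual 2 is observed for all three
periods without an event, and individual 3 is right-censored after period 1. One would therefore expect individuals 2
and 3 to be flagged as censored in the constructed variables.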
copyleft prsnperd is free software, licensed under the GPL. The copyleft option displays the copying permission statement
for prsnperd, which is a part of the dthaz package. The full license can be obtained by typing:
. net describe dthaz, from (http://www.alexisdinno.com/stata)
and clicking on the "click here to get" link for the ancillary file.
Examples
. prsnperd id length censored
. prsnperd id length censored, truncate(8)
. prsnperd id, tvp(predictor) fev(event)
Author
Alexis Dinno
Portland State University
alexis dot dinno at pdx dot edu
Please contact me with any questions, bug reports or suggestions for improvement.
My thanks to Dr. Suzanne Graham, Dr. Jim Stiles, and Dr. Anna Song.
References
Singer JD and Willett JB. 2003. Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence. Oxford, UK: Oxford
University Press. 672 pages.
Willett JB and Singer JD. 1991. "From Whether to When: New Methods for Studying Student Dropout and Teacher Attrition." Review of
Educational Research. 61: 407-450.
Singer JD and Willett JB. 1991. "Modeling the Days of Our Lives: Using Survival Analysis When Designing and Analyzing
Longitudinal Studies of Duration and Timing of Events." Psychological Bulletin. 110: 268-290.
Also See
Help: dthaz, msdthaz