Sharon Minnick is working with survival data from a complex survey and is
having trouble with -svy: stcox-:

> In Stata10, I have been attempting to analyze my survival data with cox
> regression while accounting for the sampling design and I always get
> this error:
>
> the stset ID variable is not nested within the final stage sampling unit
>
> Originally I thought this might be because some of our subjects moved
> during the study and so they change strata and cluster from where they
> were originally sampled. But I just tried creating a temporary variable
> that holds the strata constant for each subject and dropped the cluster
> variables and I still get the same error.
>
> So my srvyset code is:
>
> svyset _n, strata(staticzone) vce(linearized) singleunit(missing)
>
> and the cox regression code is:
>
> svy: stcox agegrp male
>
> If I change the svyset code to use my subject ID variable instead of _n,
> then I get this error:
>
> no observations;
> stset and subpop() option identify disjoint subsets of the data
>
> which I don't understand since I am not using a subpop option.

The -svy- prefix, when used with -stcox- or -streg-, requires that subjects
with multiple records be contained within the final stage clusters. Thus
subjects are not allowed to belong to more than one cluster.

Although Sharon did not show us the -stset- command she used, it appears that
it looked something like

    . stset time, id(subject) failure(failed) ...

where 'time', 'subject', and 'failed' represent the names of Stata variables
Sharon used to -stset- her data.

Given the above -stset-, Sharon should use the 'subject' variable instead of
'_n' to identify the final stage units. If Sharon has data from a
single-stage survey design, this means that the -svyset- command should look
like

    . svyset subject, strata(staticzone) vce(linearized) singleunit(missing)

Using '_n' implies that the individual records were the units sampled in the
first stage, which cannot be true given the above -stset-, where the id()
option declares that a subject may contribute more than one record.
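
Since Sharon mentioned building 'staticzone' to hold the stratum constant for
each subject, it may be worth confirming that it really is constant within
'subject'. The following is only a sketch using the variable names assumed
above; 'moved' is a made-up name for a new indicator variable.

    . * sort records within subject by stratum, then compare first and last
    . bysort subject (staticzone): generate byte moved = staticzone[1] != staticzone[_N]
    . tabulate moved

Any records with moved equal to 1 belong to subjects who appear in more than
one stratum. The same pattern should work for checking a cluster variable, if
one is added back to the design.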

After changing her -svyset- to use the 'subject' variable, Sharon was
presented with the following error message:

    no observations;
    stset and subpop() option identify disjoint subsets of the data

This indicates that -svy- was left with 'no observations' after removing
observations with missing values and checking for subpopulation
specifications.

In addition to the -subpop()- option, -svy- uses the following options of
-stset- to identify the subpopulation:

    if(exp)
    ever(exp)
    never(exp)
    after(exp)
    before(exp)

Without Sharon's dataset, we can't say definitively what is going on.
However, comparing the results from the following two commands

    . stset
    . svydes agegrp male

should indicate how many observations are being dropped because of missing
values. The 'no observations' error message will result if the subpopulation
specification identifies only observations that contain missing values in the
variables of interest (that is: time, subject, failed, staticzone, agegrp, and
male).
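
As a rough cross-check, one could also count the records that -stset- marks
for use (the _st variable it creates equals 1) and then count how many of
those have complete data on the model and design variables. This is only a
sketch; it assumes the variable names from Sharon's post and does not
reproduce every check that -svy- performs.

    . * records kept by -stset-
    . count if _st == 1
    . * of those, records with no missing values in the covariates or strata
    . count if _st == 1 & !missing(agegrp, male, staticzone)

If the second count is zero, the records kept by -stset- all have a missing
value in at least one of those variables, which would be consistent with the
'no observations' message.
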
--Jeff