If Andrew could send me (privately) a dataset and a short do-file to
replicate the problem, I would be happy to look into it.
-- Isabel
-- icanette(at)stata(dot)com
Andrew Hall wrote:
> Hello
>
> I am having problems using stset and sttocc to create a case-control
> data set from a simple cross-sectional data set. Sttocc is selecting
> cases as controls. I am using Intercooled Stata 9.2 on Windows XP.
> This is what is happening.
>
> I have cross-sectional survey data on 7,572 Ethiopian schoolchildren of
> whom 1,283 are orphans and 6,289 are non-orphans. The variable orphan
> is coded as 1 = orphan, 0 = non-orphan.
>
> The data were collected between 28/11/2006 and 8/2/2007.
>
> I want to randomly select a non-orphan for each orphan, matched at least
> on sex and age, to create a case-control data set.
>
> I have stset the dataset using either the date of visit (dov1) as the time
> variable or a new time variable fixed on one arbitrary date, e.g.
> 01/01/2007 (dovfixed):
>
> stset dov1, failure(orphan=1)
> or stset dovfixed, failure(orphan=1)
>
> This creates new variables including _d, which cross-tabs
> perfectly with orphan (_d = 1 and orphan = 1, n=1283; _d = 0 and orphan =
> 0, n=6289). It seems that the dataset has been properly stset (or has
> it?).
>
> I then use sttocc to match each case to one control on the variables sex
> (1=male; 2=female) and ageyrs (in years) as follows:
>
> sttocc, match (sex ageyrs) number(1)
>
> This runs, and fails to find a control for only 2 cases.
>
> But when I do a cross-tab of orphan by _case I find that 278 controls
> who should be non-orphans have been selected from the cases (orphans,
> failure=1). All controls should be selected from the non-orphans.
>
> Snapspan has no effect on the dataset as all id numbers in the data set
> are unique anyway.
>
> Why are orphans (failure) being selected as controls for orphans
> (failure) when I have specified non-orphans? Am I just being dim? Is
> there something wrong with the way I'm using stset? I have tried setting
> origin, enter and exit, but they are not really relevant as all subjects
> were in effect studied on the same day, so it is not time series data.
> I am something of a novice and can't find any similar issues discussed
> on the listserv archives, hence my request.
>
> Thanks for reading this; any suggestions would be gratefully received.
>
>
> Andrew Hall MSc PhD RPHNutr
>
> Reader in Public Health Nutrition
> Centre for Public Health Nutrition
> Westminster University
> 115 New Cavendish Street
> London W1W 6UW
>
> Tel: + 44 (0)207 911 5000 Ext 3910
>
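
A possible explanation, offered as a hedged note rather than a confirmed
diagnosis: sttocc builds a nested case-control data set by sampling controls
from the risk set at each case's failure time, so any subject still at risk
at that time, including one who is itself a case, may be eligible as a
control. For purely cross-sectional data, matching can instead be done
without stset. The sketch below assumes the variable names from the post
(orphan, sex, ageyrs) and that none of them is missing; the names u, pairno,
ncases, ncontrols, npairs and setid, and the seed value, are made up for
illustration. It uses uniform() because the post mentions Stata 9.2.

```stata
* Sketch: 1:1 random matching of non-orphans to orphans within sex/age strata.
* Assumes orphan (0/1), sex and ageyrs are all nonmissing.
set seed 20070208
gen double u = uniform()

* Random order within each sex/age/orphan cell
bysort sex ageyrs orphan (u): gen long pairno = _n

* Number of orphans and non-orphans in each sex/age stratum
bysort sex ageyrs: egen ncases = total(orphan == 1)
bysort sex ageyrs: egen ncontrols = total(orphan == 0)
gen npairs = min(ncases, ncontrols)

* Keep the first npairs orphans and the first npairs non-orphans per stratum
keep if pairno <= npairs

* Identifier shared by each matched orphan/non-orphan pair
egen setid = group(sex ageyrs pairno)

* Every retained control is a non-orphan; the two counts should be equal
tabulate orphan
```

In this sketch, controls are drawn without replacement, and orphans in strata
with too few non-orphans are dropped rather than left unmatched. Each value
of setid then identifies one orphan and one non-orphan of the same sex and
age, which could be used as the grouping variable in a matched analysis.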