Dear Statalist,
The following table shows data from schoolchildren on the proportion of their friends who are smokers and whether or not they themselves have become regular smokers.
                                    Regular smoker (dependent variable)
Proportion of friends who smoke     No (0)    Yes (1)
1 None                                 301          0
2 Some/a few                           373         14
3 All                                  837         61
The proportion of friends who smoke was entered as a 3-category independent variable in a logistic regression (xi: logistic) to predict the likelihood of a schoolchild becoming a regular smoker. With category 1 as the base, Stata returned extremely high odds ratios for categories 2 and 3 and noted "301 failures and 0 successes completely determined". With 2 or 3 as the base, category 1 is dropped from the model with the note "!=0 predicts failure perfectly".
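As a rough check on where these messages come from, the crude odds of being a regular smoker can be worked out directly from the table. The following is a minimal Python sketch. Python and the helper names `odds` and `odds_ratio` are illustrative choices and are not part of the original Stata analysis. Category 1 has odds of exactly zero. An odds ratio with category 1 as the base therefore has a zero denominator and is undefined. With 2 or 3 as the base, the odds ratio for category 1 is exactly zero, which corresponds to a log-odds coefficient of minus infinity. This appears to be the complete separation that Stata's notes are reporting.

```python
from fractions import Fraction

# Counts from the table above: category -> (regular smoker No, regular smoker Yes)
counts = {
    "1 None": (301, 0),
    "2 Some/a few": (373, 14),
    "3 All": (837, 61),
}

def odds(no, yes):
    """Odds of being a regular smoker (yes / no) within one category."""
    return Fraction(yes, no)

def odds_ratio(cat, base):
    """Crude odds ratio of `cat` relative to `base`; None if the base odds are zero."""
    base_odds = odds(*counts[base])
    if base_odds == 0:
        return None  # undefined: the base category contains no regular smokers
    return odds(*counts[cat]) / base_odds

for cat in counts:
    print(cat, "odds:", float(odds(*counts[cat])))

# Base = 1: no finite estimate exists
print("2 vs 1:", odds_ratio("2 Some/a few", "1 None"))
print("3 vs 1:", odds_ratio("3 All", "1 None"))

# Base = 2: category 1 gets an odds ratio of exactly 0
print("1 vs 2:", odds_ratio("1 None", "2 Some/a few"))
print("3 vs 2:", float(odds_ratio("3 All", "2 Some/a few")))
```

The comparison of categories 3 and 2 is finite (about 1.94). Only the comparisons involving category 1 break down.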
It is important to compare pupils who say all their friends smoke (3), and those who say some or a few of their friends smoke (2), with those who say none of their friends smoke (1). Comparing only the risks in categories 2 and 3 is not enough, because the variable will later be used for adjustment in a multivariable model that uses all the observations.
Is it acceptable to choose one observation from category 1 at random and recode its outcome to 1, in order to obtain odds ratio estimates for categories 2 and 3?
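For reference, the arithmetic of the proposed recode can be sketched in the same way. This minimal sketch assumes that the randomly chosen pupil is moved from No to Yes, so category 1 becomes 300 No and 1 Yes. The table name `recoded` and the function `crude_or` are hypothetical. When the 3-category variable is the only predictor, the logistic model is saturated, so its odds ratios should equal these crude ones.

```python
from fractions import Fraction

# Hypothetical recode from the question: one "None" pupil switched from No (0) to Yes (1)
recoded = {
    "1 None": (300, 1),
    "2 Some/a few": (373, 14),
    "3 All": (837, 61),
}

def crude_or(table, cat, base):
    """Crude odds ratio (yes/no odds of `cat` divided by yes/no odds of `base`)."""
    no_c, yes_c = table[cat]
    no_b, yes_b = table[base]
    return Fraction(yes_c, no_c) / Fraction(yes_b, no_b)

or_2 = crude_or(recoded, "2 Some/a few", "1 None")
or_3 = crude_or(recoded, "3 All", "1 None")
print(round(float(or_2), 2), round(float(or_3), 2))  # 11.26 21.86
```

After the recode the base odds are 1/300. Each odds ratio is therefore 300 times the odds in the comparison category, so both estimates rest entirely on the single recoded observation.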
Or does anyone know of another trick that would give reasonable estimates and allow a multivariable model to be fitted using all the observations?
Many thanks in advance
Heather
Heather Rothwell
Research Associate
Cardiff Institute of Society, Health and Ethics
Cardiff University School of Social Sciences
53 Park Place
Cardiff
CF10 3AT.
Tel: +44 (0) 29 2087 0192
Fax: +44 (0) 29 2087 9054
Email: [email protected]
web: http://www.cf.ac.uk/socsi/cishe/index.html