
Re: st: help needed on discrete-time hazard model

From	"Lili Yan" <[email protected]>
To	[email protected]
Subject	Re: st: help needed on discrete-time hazard model
Date	Thu, 18 Oct 2007 20:40:50 -0400

Thanks a lot for your prompt response, Tom!



On 10/18/07, Steichen, Thomas J. <[email protected]> wrote:
> I see nothing wrong with the data generation steps you performed,
> so the question is whether this model makes sense.
>
> First, I will speculate that you have brand-specific prices at
> the time of each wave. Since cigarette prices tend to rise
> fairly uniformly across brands over time, whether from
> manufacturer increases driven by inflation or from government tax
> increases, there is almost certainly a meaningful correlation
> between wave and price. Thus, having both a "price" variable and
> one or more "wave" variables will make the individual coefficients
> hard to interpret.
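
One way to see how strongly price and wave overlap is sketched below. It assumes the person-period data from the posted do-file are in memory, that rPSPPPi, wave, wave2 and wave3 exist as created there, and that the default line delimiter is in effect (not "#delimit ;"). It is an unweighted descriptive check, not part of the original analysis.

```stata
* Sketch only: variable names are taken from the posted do-file.

* Mean and spread of the price variable within each wave
tabstat rPSPPPi, by(wave) statistics(n mean sd min max)

* Pairwise correlations between price and the wave indicators
correlate rPSPPPi wave2 wave3

* R-squared here is the share of price variation explained by wave alone
regress rPSPPPi wave2 wave3
```

If the R-squared from the last command is close to 1, price varies very little within a wave, which would be consistent with the concern raised above.
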
>
> In this model, the "wave2" variable can be thought of as estimating
> the average quit rate differential from the omitted wave (wave 1)...
> and this includes an average price differential effect. Likewise,
> "wave3" estimates the average quit rate differential of wave 3 from
> wave 1.
>
> So what does "price" itself estimate in this model? I'd speculate
> it really only estimates how specific brands affect quitting.
> In your logit model, I'd guess that it indicates that subjects
> who smoke higher-than-average-priced brands quit at a lower rate.
> Said differently, those who smoke low-priced brands are more likely
> to quit due to a price increase. However, without knowing exactly
> what your variables represent, I can't go beyond speculation.
>
> I'm less clear why it remains negative when you take the wave
> variables out. If real, it implies that the price differential (if
> it truly has a positive effect on quitting) wasn't great enough to
> overcome other competing but correlated factors (not explained by
> any other variable in the model) that caused smokers to continue
> smoking during this time period. If so, price stands in for the
> increase in ALL of these factors, and the ones favoring continued
> smoking dominated the result.
>
> On a different issue, using or not using the svy: prefix should
> change the estimated coefficients, so no particular importance
> should be placed on the fact that a coefficient changed signs
> between these two. Without the prefix, you are estimating what
> happened for the specific group of subjects surveyed in this study.
> When you add the weighting via the svy: prefix, you change the
> importance of those individual subjects based on their sampling
> weights.
>
> For example, you may have surveyed specific subjects who quit
> but represent only a very, very small part of the overall population.
> If you don't use the survey weights, their behavior may have
> a large effect on the sample results but little effect on the
> population results, even to the point of sign reversal.
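
A minimal sketch of putting the two fits side by side, so that sign changes are easy to spot. It assumes the survey design has already been declared with svyset (svy: will not run otherwise), that the default line delimiter is in effect, and that the block is run as one do-file so the local macro stays defined. The stored-estimate names "unweighted" and "weighted" are arbitrary labels chosen here.

```stata
* Sketch only: covariate list copied from the posted svy: logit command.
local xvars male age married white mdrt_educ high_educ incm_mdrt incm_high canada rPSPPPi wave2 wave3

* Unweighted fit: describes the sampled subjects only
logit qtsmok `xvars', noconstant
estimates store unweighted

* Design-based fit: subjects count according to their sampling weights
svy: logit qtsmok `xvars', noconstant
estimates store weighted

* Coefficients and standard errors from both fits in one table
estimates table unweighted weighted, b(%9.4f) se(%9.4f)
```
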
>
> On yet another issue, marking pattern SQS as a successful "quit"
> seems possibly misleading. Clearly, if price continued to rise
> over the time period between waves (which seems likely to me),
> prices were higher in wave 3 than wave 2, yet these individuals
> started smoking again. This seems to suggest that price was not
> the most important motivating factor for quitting in wave 2 (or
> restarting in wave 3). One can argue that you should code these
> subjects as "at risk" for all three waves and as failing to quit.
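
The alternative coding suggested above might look like the following sketch. It starts from the one-row-per-subject data (before expand), uses the default line delimiter, and takes uniqid and smok_stat from the posted do-file. The post does not show how wave-specific covariates such as price are attached to each record, so that step is left out here.

```stata
* Sketch only. smok_stat: 1 = SSS, 2 = SSQ, 3 = SQS, 4 = SQQ (as defined in the post).

* SQS (3) now stays at risk through wave 3; only SQQ (4) leaves after wave 2
gen smk_time = 3 if inlist(smok_stat, 1, 2, 3)
replace smk_time = 2 if smok_stat == 4

* One record per subject per wave at risk
expand smk_time
bysort uniqid: gen wave = _n

* A quit is recorded only for SSQ and SQQ, on the last record of the subject
bysort uniqid: gen qtsmok = (smok_stat == 2 | smok_stat == 4) & _n == _N

gen wave2 = wave == 2
gen wave3 = wave == 3

* Check: SQS subjects should have three records and no quit events
tabulate smok_stat qtsmok
```

Refitting the model on data coded this way and comparing it with the original coding would show how sensitive the price coefficient is to the treatment of relapsers.
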
>
> Tom
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Lili Yan
> Sent: Thursday, October 18, 2007 2:25 PM
> To: [email protected]
> Subject: Re: st: help needed on discrete-time hazard model
>
> Hi Thomas,
>
> Thank you very much for helping out!
> I know little about this model, so I thought the two zeros indicated
> something wrong in the data. I am sure that e(N) is correct.
>
> Here is the code for setting up the data. I should explain first that
> smok_stat = 1 for SSS, 2 for SSQ, 3 for SQS and 4 for SQQ.
> ................code starts here................
>
> #delimit ;
>
> * Number of waves each subject stays at risk (person-period records);
> gen smk_time=3 if smok_stat==1 | smok_stat==2;
> replace smk_time=2 if smok_stat==3 | smok_stat==4;
>
> * Year of the last record for each subject;
> gen cessyear=2004 if smok_stat==1;
> replace cessyear=2004 if smok_stat==2;
> replace cessyear=2003 if (smok_stat==3 | smok_stat==4);
>
> * One record per wave at risk, with the quit flagged on the last record;
> expand smk_time;
> bysort uniqid: gen seqvar=_n;
> bysort uniqid: gen qtsmok=smok_stat>1 & _n==_N;
>
> * Calendar year of each record;
> bysort uniqid: gen evntyear=cessyear;
> replace evntyear=2002 if seqvar==1;
> replace evntyear=2003 if seqvar==2;
> drop cessyear;
> rename evntyear cessyear;
>
> * Wave number and wave indicators;
> gen wave=1 if cessyear==2002;
> replace wave=2 if cessyear==2003;
> replace wave=3 if cessyear==2004;
>
> gen wave1=wave==1;
> gen wave2=wave==2;
> gen wave3=wave==3;
>
> svy: logit qtsmok male age married white mdrt_educ high_educ incm_mdrt
> incm_high canada rPSPPPi wave2 wave3, noconstant;
>
> #delimit cr
> ...............code ends here..........
>
> Here is the output:
>
> ..............output starts here................
> Survey: Logistic regression
>
> Number of strata   =        26                  Number of obs      =      5642
> Number of PSUs     =      5642                  Population size    = 5773.9291
>                                                Design df          =      5616
>                                                F(  12,   5605)    =    166.35
>                                                Prob > F           =    0.0000
>
> ------------------------------------------------------------------------------
>             |             Linearized
>      qtsmok |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>        male |  -.1715913   .1273081    -1.35   0.178    -.4211643    .0779817
>         age |  -.0326805   .0053098    -6.15   0.000    -.0430898   -.0222713
>     married |   .0156776   .1427494     0.11   0.913    -.2641663    .2955215
>       white |  -.5607068   .1443603    -3.88   0.000    -.8437088   -.2777048
>   mdrt_educ |  -.0291425   .1441877    -0.20   0.840    -.3118061    .2535212
>   high_educ |   .5113156   .1800797     2.84   0.005     .1582899    .8643414
>   incm_mdrt |  -.0339146   .1557743    -0.22   0.828    -.3392925    .2714632
>   incm_high |   .1405313   .1766122     0.80   0.426    -.2056968    .4867595
>      canada |   1.802811   .2552666     7.06   0.000      1.30239    2.303233
>     rPSPPPi |  -.0083975    .000842    -9.97   0.000    -.0100481   -.0067468
>       wave2 |   2.111112   .1326945    15.91   0.000     1.850979    2.371244
>       wave3 |   2.411039   .1389374    17.35   0.000     2.138668     2.68341
> ------------------------------------------------------------------------------
> ....................output ends here..............
>
> rPSPPPi is our price variable. We have more price variables, but
> the logit results with them are similar to what is reported here.
>
> Thank you very much!
>
> Lili
>
> On 10/18/07, Steichen, Thomas J. <[email protected]> wrote:
> > Why do you consider this an indication of something wrong?
> >
> > Having zero completely determined successes e(N_cds) and failures
> > e(N_cdf) is what you prefer.
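
For reference, these stored results can be inspected directly. A minimal sketch, assuming it is run immediately after the logit command in question and with the default line delimiter:

```stata
* Sketch only: list everything the last estimation command stored
ereturn list

* e(N_cds) and e(N_cdf) count completely determined successes and failures.
* Zeros mean that no observation was perfectly predicted.
display "obs = " e(N) "  determined successes = " e(N_cds) "  determined failures = " e(N_cdf)
```
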
> >
> > Is your overall number of records, e(N), wrong?
> >
> > Show us some sample commands and output so we can see what you are doing.
> >
> >
> > -----Original Message-----
> >
> > I checked the data just now. After running logit model with our
> > dependent variable, the stored results show:
> >
> > e(N) = 5463
> > e(N_cds) = 0
> > e(N_cdf) = 0
> >
> > So it seems there is something wrong in the data setup. Could anyone
> > please give me some help?
> >
> >
> >
> > *
> > *   For searches and help try:
> > *   http://www.stata.com/support/faqs/res/findit.html
> > *   http://www.stata.com/support/statalist/faq
> > *   http://www.ats.ucla.edu/stat/stata/
> >
