I imagine that experts will be able to comment
helpfully on the main question. I just want to pick up
on one incidental detail, as the point
is of wider interest. Your code includes
gen year=1 if family_id<=20
replace year=2 if family_id>20 & family_id<=40
replace year=3 if family_id>40 & family_id<=60
replace year=4 if family_id>60 & family_id<=80
replace year=5 if family_id>80 & family_id<=100
replace year=6 if family_id>100 & family_id<=120
replace year=7 if family_id>120 & family_id<=140
replace year=8 if family_id>140 & family_id<=160
replace year=9 if family_id>160 & family_id<=180
replace year=10 if family_id>180 & family_id<=200
replace year=11 if family_id>200 & family_id<=220
replace year=12 if family_id>220 & family_id<=240
replace year=13 if family_id>240 & family_id<=260
replace year=14 if family_id>260 & family_id<=280
replace year=15 if family_id>280 & family_id<=300
This could boil down to
gen year = ceil(family_id/20)
The tiny but useful trick here is that -ceil()-, short
for ceiling, always rounds up to the next integer, leaving
a value that is already an integer unchanged.
-ceil()- has a sibling, -floor()-, which always rounds
down.
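As a quick check, here is a minimal sketch on made-up data. The 300 sequential identifiers are an assumption for illustration (they mirror the 1 to 300 range implied by the code above and are not the restaurant data themselves). It confirms that the one-liner reproduces the blocks of 20 and shows what -ceil()- and -floor()- do with integers and non-integers.

```stata
* Hypothetical data: identifiers 1 to 300, standing in for family_id
clear
set obs 300
gen family_id = _n

* One-line version: blocks of 20 consecutive ids map to years 1 to 15
gen year = ceil(family_id/20)

* Spot checks at the block boundaries
assert year == 1 if family_id <= 20
assert year == 2 if family_id > 20 & family_id <= 40
assert year == 15 if family_id > 280 & family_id <= 300

* floor() is the sibling that rounds down; both leave integers alone
display ceil(2.1) " " floor(2.9) " " ceil(3) " " floor(3)
```

The same pattern should work for any block width: replace 20 by the number of identifiers per block.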
There is a long-winded excursus on this one point in
Stata tip 2: Building with floors and ceilings,
Stata Journal 3(4): 446--447 (Q4/03; SJ-3-4 dm0002)
but the simple definition and memorable terminology (due to Kenneth E.
Iverson) are sufficient to give this an edge over, say,
solutions with -int()-.
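
For comparison, here is a sketch of what a solution with -int()- might look like, again on made-up identifiers 1 to 300 (an assumption for illustration only). Because -int()- truncates toward zero, it needs an offset on both sides to land on the same blocks, and that version relies on the identifiers being integers. The variable names year_ceil, year_int and year_naive are hypothetical.

```stata
* Hypothetical data: identifiers 1 to 300, standing in for family_id
clear
set obs 300
gen family_id = _n

gen year_ceil = ceil(family_id/20)

* An -int()- version must subtract 1 inside and add 1 outside
* (valid for integer identifiers only)
gen year_int = 1 + int((family_id - 1)/20)
assert year_int == year_ceil

* The naive version goes wrong at every multiple of 20
gen year_naive = 1 + int(family_id/20)
count if year_naive != year_ceil

* -int()- truncates toward zero, so it differs from -floor()- for negatives
display int(-2.5) " " floor(-2.5) " " ceil(-2.5)
```

The offsets are easy to get wrong, which is the practical argument for reaching for -ceil()- or -floor()- first.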
Nick
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]On Behalf Of Peter Wright
> Sent: 09 November 2005 10:07
> To: [email protected]
> Subject: st: Nested logit with shares/grouped data
>
>
> In response to Nick's comment I have added a bit more
> explanation of what I have attempted below. To remind you of
> my problem, the question is how to estimate a nested
> logit model in STATA when the left-hand-side variable takes
> the form of a count (or a market share), i.e. the dataset
> records how many sales of each product are made in each time period.
>
> The stata web site offers advice for a multinomial logit model:
>
> http://www.stata.com/support/faqs/stat/grouped.html
>
> This advice suggests first putting your data in "long" form
> and then using frequency weights (fweights) with the mlogit
> command. The question is, is such a procedure suitable in the
> case of a nested logit model?
>
> To implement the model, I ran the following code:
>
> *****************************************************************
> * As an example I use the STATA restaurant data.
> * However, I collapse it to make it look like a dataset of
> * shares/grouped data
> * The collapsed dataset has information on the choices made
> * by 20 households regarding 7 restaurants.
> * 15 yearly samples are taken
> *****************************************************************
>
> clear
> use restaurant.dta
>
> gen year=1 if family_id<=20
> replace year=2 if family_id>20 & family_id<=40
> replace year=3 if family_id>40 & family_id<=60
> replace year=4 if family_id>60 & family_id<=80
> replace year=5 if family_id>80 & family_id<=100
> replace year=6 if family_id>100 & family_id<=120
> replace year=7 if family_id>120 & family_id<=140
> replace year=8 if family_id>140 & family_id<=160
> replace year=9 if family_id>160 & family_id<=180
> replace year=10 if family_id>180 & family_id<=200
> replace year=11 if family_id>200 & family_id<=220
> replace year=12 if family_id>220 & family_id<=240
> replace year=13 if family_id>240 & family_id<=260
> replace year=14 if family_id>260 & family_id<=280
> replace year=15 if family_id>280 & family_id<=300
>
> collapse (sum) chosen (mean) income kids cost rating distance, by(restaurant year)
> rename chosen sales
>
> sort year
> by year: egen total_sales=sum(sales)
> gen market_share=sales/total_sales
>
> **********************************************************************
> * the dataset has information on sales (and sale-shares) as well
> * as some explanatory variables
> * There are 20 households choosing between 7 restaurants.
> * The sample is repeated for 15 years.
> * This is the kind of dataset that I had in mind.
> * How would you run a nested logit model using such
> * shares/grouped data?
> **********************************************************************
> * If we follow a similar methodology to that suggested for
> * the multinomial model,
> * we need to expand the data so that it has 7*7 rows for each year
> expand 7
> sort year restaurant
>
> * number the choices 1 to 7
> egen alt_id=fill(1 2 3 4 5 6 7 1 2 3 4 5 6 7)
>
> * create an artificial chosen variable which is one for each
> * restaurant in turn (zero for the others)
> sort year alt_id restaurant
> gen chosen=0
> by year alt_id: replace chosen=1 if _n==alt_id
>
> * you also need a weighting variable to tell stata how many
> * times each restaurant was chosen (from the group of 7)
> replace sales=. if chosen==0
> by year alt_id: egen sales2=mean(sales)
> gen alt_id2=10*year+alt_id
>
> * You can see that the dataset now looks very much like one
> * based on individual data.
> * The only difference is that the sample will be weighted by sales2
>
> gen type=0
> replace type=1 if restaurant==1| restaurant==2
> replace type=2 if restaurant==3| restaurant==4| restaurant==5
> replace type=3 if restaurant==6| restaurant==7
>
> * Now specify your nested logit model and run
> gen incFast=(type==1)*income
> gen incFancy=(type==3)*income
> gen kidFast=(type==1)*kids
> gen kidFancy=(type==3)*kids
>
> nlogit chosen (restaurant = cost rating distance) ///
>     (type=incFast incFancy kidFast kidFancy) [fweight=sales2], ///
>     group(alt_id2)
>
> *******************************************************************
> This procedure yields the following results:
>
> Nested logit regression
> Levels             =          2                 Number of obs      =      2100
> Dependent variable =     chosen                 LR chi2(10)        =  -676.381
> Log likelihood     = -513.32241                 Prob > chi2        =    1.0000
>
> ------------------------------------------------------------------------------
>              |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> restaurant   |
>         cost |  -.2347816   .1384955    -1.70   0.090    -.5062277    .0366645
>       rating |   .3833214   .2482818     1.54   0.123    -.1033021    .8699449
>     distance |  -.3779229   .2466483    -1.53   0.125    -.8613448    .1054989
> -------------+----------------------------------------------------------------
> type         |
>      incFast |   .0054128    .069671     0.08   0.938    -.1311398    .1419654
>     incFancy |   .0715661   .0505795     1.41   0.157    -.0275679       .1707
>      kidFast |  -.5918203   .6533741    -0.91   0.365     -1.87241    .6887694
>     kidFancy |  -.6183388   .5423909    -1.14   0.254    -1.681405    .4447279
> -------------+----------------------------------------------------------------
> (incl. value |
>  parameters) |
> type         |
>       /type1 |    3.94913   3.423767     1.15   0.249    -2.761329    10.65959
>       /type2 |   2.633478   2.804631     0.94   0.348    -2.863497    8.130453
>       /type3 |   1.281784   .7357307     1.74   0.081    -.1602222    2.723789
> ------------------------------------------------------------------------------
> LR test of homoskedasticity (iv = 1): chi2(3)= -680.30    Prob > chi2 = 1.0000
> ------------------------------------------------------------------------------
>
> In an attempt to check these results, I compared them against
> LIMDEP NLOGIT (which claims to be able to cope with
> shares/grouped data) and got different results.
>
> Normal exit from iterations. Exit status=0.
> +---------------------------------------------+
> | FIML: Nested Multinomial Logit Model |
> | Maximum Likelihood Estimates |
> | Dependent variable SALES |
> | Weighting variable ONE |
> | Number of observations 105 |
> | Iterations completed 5 |
> | Log likelihood function -524.2610 |
> | Restricted log likelihood -592.6711 |
> | Chi-squared 136.8201 |
> | Degrees of freedom 10 |
> | Significance level .0000000 |
> | R2=1-LogL/LogL* Log-L fncn R-sqrd RsqAdj |
> | No coefficients -592.6711 .11543 .00486 |
> | Constants only. Must be computed directly. |
> | Use NLOGIT ;...; RHS=ONE $ |
> | At start values -527.7727 .00665 -.11751 |
> | Response data are given as frequencies. |
> +---------------------------------------------+
>
> +---------------------------------------------+
> | FIML: Nested Multinomial Logit Model |
> | The model has 2 levels. |
> | Coefs. for branch level begin with B5 |
> | Number of obs.= 15, skipped 0 bad obs. |
> +---------------------------------------------+
> +---------+--------------+----------------+--------+---------+----------+
> |Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
> +---------+--------------+----------------+--------+---------+----------+
> Attributes in the Utility Functions
> B2 -.1827804310 .22258969E-01 -8.212 .0000
> B3 .5317320087 .13437809 3.957 .0001
> B4 .5306557987 .13130855 4.041 .0001
> Attributes of Branch Choice Equations
> B5 -.5031389578E-01 .94708096E-01 -.531 .5952
> B6 .6830782466 1.5458378 .442 .6586
> B7 .2425946290E-01 .44919700E-01 .540 .5892
> B8 -.4414619828 .66479462 -.664 .5067
> Inclusive Value Parameters
> TYPE1 .9859433268 .25540989 3.860 .0001
> TYPE2 1.054564664 .25223344 4.181 .0000
> TYPE3 .9397231335 .27169252 3.459 .0005
>
> Is this because you cannot proceed as I suggest above (or
> because LIMDEP is wrong)? (Incidentally I think the stata
> results are more likely to be correct as the t-ratios appear
> too high in LIMDEP).
>
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>