In response to Nick's comment, I have added below a bit more explanation of what I have attempted. To recap my problem: how do you estimate a nested logit model in Stata when the left-hand-side variable is a count (or a market share)? That is, the dataset records how many units of each product are sold in each time period.
The Stata website offers advice for the multinomial logit model:
http://www.stata.com/support/faqs/stat/grouped.html
This advice suggests first putting the data in "long" form and then using frequency weights (fweights) with the mlogit command. The question is whether this procedure is also suitable for a nested logit model.
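The reason the fweight device works for mlogit can be written out in one line, and the argument does not rely on the logit form. Let n_{tj} be the sales of alternative j in period t, N_t = sum over j of n_{tj}, and P_{tj}(θ) the model's probability of choosing j in period t, assuming (as in the example below) that the explanatory variables vary only by period and alternative:

```latex
\ln L_{\text{grouped}}(\theta)
  = \sum_t \Big[ \ln \frac{N_t!}{\prod_j n_{tj}!} + \sum_j n_{tj} \ln P_{tj}(\theta) \Big],
\qquad
\ln L_{\text{fweight}}(\theta) = \sum_t \sum_j n_{tj} \ln P_{tj}(\theta).
```

The two differ only by the multinomial coefficient, which does not depend on θ, so they have the same maximizer and the same curvature; programs that do or do not include that constant will, however, report different log-likelihood values. Since nothing here uses the logit form of P_{tj}, the same equivalence should hold when P_{tj} is a nested logit probability, provided each pseudo-individual carries the count of the alternative it is recorded as choosing.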
To implement the model, I ran the following code:
*****************************************************************
* As an example I use the Stata restaurant data,
* collapsed so that it looks like a dataset of shares/grouped data.
* The collapsed dataset records the choices of 20 households
* among 7 restaurants in each of 15 yearly samples.
*****************************************************************
clear
use restaurant.dta
* households 1-20 form year 1, 21-40 year 2, ..., 281-300 year 15
gen year=ceil(family_id/20)
collapse (sum) chosen (mean) income kids cost rating distance, by(restaurant year)
rename chosen sales
sort year
by year: egen total_sales=sum(sales)
gen market_share=sales/total_sales
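For readers who want to check the grouping outside Stata, the year assignment and collapse above amount to the following Python sketch. It assumes, as the code above does, that family_id runs from 1 in blocks of 20 households per year; the helper names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def year_of(family_id, households_per_year=20):
    """Families 1-20 -> year 1, 21-40 -> year 2, ... (same as ceil(family_id/20))."""
    return math.ceil(family_id / households_per_year)


def grouped_shares(choices, n_restaurants=7):
    """Collapse individual choices into sales and market shares.

    choices: iterable of (family_id, restaurant chosen), restaurants numbered 1..n.
    Returns {year: {restaurant: (sales, market_share)}}; a restaurant nobody
    chose in a year gets sales 0, as it does after Stata's collapse.
    """
    sales = defaultdict(Counter)
    for family_id, restaurant in choices:
        sales[year_of(family_id)][restaurant] += 1
    result = {}
    for year, counts in sorted(sales.items()):
        total = sum(counts.values())  # total_sales for the year
        result[year] = {r: (counts[r], counts[r] / total)
                        for r in range(1, n_restaurants + 1)}
    return result


# hypothetical toy data: three households in year 1, one in year 2
print(grouped_shares([(1, 1), (2, 1), (3, 6), (21, 2)]))
```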
**************************************************************************************
* The dataset has information on sales (and sales shares) as well
* as some explanatory variables.
* In each year, 20 households choose between 7 restaurants,
* and the sample is repeated for 15 years.
* This is the kind of dataset that I had in mind.
* How would you run a nested logit model using such shares/grouped data?
**************************************************************************************
* Following the methodology Stata suggests for the multinomial logit model,
* we expand the data so that it has 7*7 rows for each year
expand 7
* number the 7 copies of each year-restaurant row 1 to 7
bysort year restaurant: gen alt_id=_n
* create an artificial chosen variable: in choice set alt_id of each year,
* restaurant number alt_id is marked as chosen (1) and the other six as 0
sort year alt_id restaurant
gen chosen=0
by year alt_id: replace chosen=1 if _n==alt_id
* we also need a weighting variable telling Stata how many times each choice set
* occurs, i.e. the sales of the restaurant marked as chosen in that set
replace sales=. if chosen==0
by year alt_id: egen sales2=mean(sales)
* identifier for each pseudo choice set (year, alt_id); unique because alt_id<10
gen alt_id2=10*year+alt_id
* You can see that the dataset now looks very much like one based on individual data.
* The only difference is that the sample will be weighted by sales2
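A Python sketch of the expansion step for a single year may make the construction easier to check. The function names and the sales figures are hypothetical; each of the 7 pseudo choice sets has 7 rows, the restaurant numbered alt_id is flagged as chosen, and every row of that set carries the weight sales2 (that restaurant's sales). The property that matters is that the weighted number of times each restaurant is chosen equals its sales.

```python
def expand_to_choice_sets(sales_by_restaurant):
    """Expand one year's grouped sales into weighted pseudo choice sets.

    sales_by_restaurant: {restaurant: sales} for one year.
    Returns rows (alt_id, restaurant, chosen, weight): one choice set per
    alt_id containing every restaurant; restaurant == alt_id is chosen, and
    all rows of the set carry that restaurant's sales (Stata's sales2).
    """
    restaurants = sorted(sales_by_restaurant)
    rows = []
    for alt_id in restaurants:
        weight = sales_by_restaurant[alt_id]
        for r in restaurants:
            rows.append((alt_id, r, int(r == alt_id), weight))
    return rows


def weighted_choice_counts(rows):
    """Frequency-weighted number of times each restaurant is chosen."""
    counts = {}
    for alt_id, r, chosen, weight in rows:
        if chosen:
            counts[r] = counts.get(r, 0) + weight
    return counts


# hypothetical sales for one year (20 households, 7 restaurants)
year_sales = {1: 3, 2: 0, 3: 5, 4: 1, 5: 4, 6: 2, 7: 5}
rows = expand_to_choice_sets(year_sales)
print(len(rows), weighted_choice_counts(rows) == year_sales)
```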
* nests (type): 1 = restaurants 1-2 (fast), 2 = restaurants 3-5, 3 = restaurants 6-7 (fancy)
gen type=0
replace type=1 if restaurant==1| restaurant==2
replace type=2 if restaurant==3| restaurant==4| restaurant==5
replace type=3 if restaurant==6| restaurant==7
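* Now specify the nested logit model and run it.
* income and kids (year averages after the collapse) enter the type-level equation,
* interacted with indicators for the fast (type 1) and fancy (type 3) nests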
gen incFast=(type==1)*income
gen incFancy=(type==3)*income
gen kidFast=(type==1)*kids
gen kidFancy=(type==3)*kids
nlogit chosen (restaurant = cost rating distance) (type=incFast incFancy kidFast kidFancy) [fweight=sales2], group(alt_id2)
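To make explicit what the model above estimates, here is a Python sketch of two-level nested logit choice probabilities: alternative-level utilities (from cost, rating, distance) act within a nest, nest-level utilities (from incFast and so on) act on the choice of nest, and each nest has an inclusive-value (dissimilarity) parameter tau. It assumes the RUM-consistent normalization, under which tau = 1 for every nest reduces the model to a multinomial logit and values in (0, 1] are the usual range for consistency with utility maximization. The sketch does not establish which normalization Stata's nlogit or LIMDEP's NLOGIT uses by default, so parameter values need not be comparable across the two outputs; the function and argument names are hypothetical.

```python
import math


def nested_logit_probs(v, w, nests, tau):
    """Two-level nested logit probabilities (RUM-consistent normalization).

    v:     {alternative: utility from alternative attributes}
    w:     {nest: utility from nest-level variables}
    nests: {nest: list of alternatives in that nest}
    tau:   {nest: inclusive-value (dissimilarity) parameter}
    """
    # inclusive value of each nest: log-sum of the scaled alternative utilities
    iv = {k: math.log(sum(math.exp(v[j] / tau[k]) for j in alts))
          for k, alts in nests.items()}
    # choice of nest: logit over w_k + tau_k * IV_k
    nest_score = {k: math.exp(w[k] + tau[k] * iv[k]) for k in nests}
    denom = sum(nest_score.values())
    probs = {}
    for k, alts in nests.items():
        p_nest = nest_score[k] / denom
        for j in alts:
            # P(j) = P(nest k) * P(j | nest k)
            probs[j] = p_nest * math.exp(v[j] / tau[k] - iv[k])
    return probs


# restaurant nests as defined by the type variable above
RESTAURANT_NESTS = {1: [1, 2], 2: [3, 4, 5], 3: [6, 7]}
```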
*******************************************************************
This procedure yields the following results:
Nested logit regression
Levels             =          2                 Number of obs      =      2100
Dependent variable =     chosen                 LR chi2(10)        =  -676.381
Log likelihood     = -513.32241                 Prob > chi2        =    1.0000
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
restaurant   |
        cost |  -.2347816   .1384955    -1.70   0.090    -.5062277    .0366645
      rating |   .3833214   .2482818     1.54   0.123    -.1033021    .8699449
    distance |  -.3779229   .2466483    -1.53   0.125    -.8613448    .1054989
-------------+----------------------------------------------------------------
type         |
     incFast |   .0054128    .069671     0.08   0.938    -.1311398    .1419654
    incFancy |   .0715661   .0505795     1.41   0.157    -.0275679       .1707
     kidFast |  -.5918203   .6533741    -0.91   0.365     -1.87241    .6887694
    kidFancy |  -.6183388   .5423909    -1.14   0.254    -1.681405    .4447279
-------------+----------------------------------------------------------------
(incl. value |
 parameters) |
type         |
      /type1 |    3.94913   3.423767     1.15   0.249    -2.761329    10.65959
      /type2 |   2.633478   2.804631     0.94   0.348    -2.863497    8.130453
      /type3 |   1.281784   .7357307     1.74   0.081    -.1602222    2.723789
------------------------------------------------------------------------------
LR test of homoskedasticity (iv = 1):   chi2(3)=  -680.30   Prob > chi2 = 1.0000
------------------------------------------------------------------------------
To check these results, I re-estimated the model with LIMDEP's NLOGIT (which claims to handle shares/grouped data), and I get different results:
Normal exit from iterations. Exit status=0.
+---------------------------------------------+
| FIML: Nested Multinomial Logit Model        |
| Maximum Likelihood Estimates                |
| Dependent variable                    SALES |
| Weighting variable                      ONE |
| Number of observations                  105 |
| Iterations completed                      5 |
| Log likelihood function           -524.2610 |
| Restricted log likelihood         -592.6711 |
| Chi-squared                        136.8201 |
| Degrees of freedom                       10 |
| Significance level                 .0000000 |
| R2=1-LogL/LogL*  Log-L fncn  R-sqrd  RsqAdj |
| No coefficients   -592.6711  .11543  .00486 |
| Constants only. Must be computed directly.  |
|                 Use NLOGIT ;...; RHS=ONE $  |
| At start values   -527.7727  .00665 -.11751 |
| Response data are given as frequencies.     |
+---------------------------------------------+
+---------------------------------------------+
| FIML: Nested Multinomial Logit Model        |
| The model has 2 levels.                     |
| Coefs. for branch level begin with B5       |
| Number of obs.= 15, skipped 0 bad obs.      |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Attributes in the Utility Functions
 B2        -.1827804310    .22258969E-01   -8.212    .0000
 B3         .5317320087        .13437809    3.957    .0001
 B4         .5306557987        .13130855    4.041    .0001
Attributes of Branch Choice Equations
 B5    -.5031389578E-01    .94708096E-01    -.531    .5952
 B6         .6830782466        1.5458378     .442    .6586
 B7     .2425946290E-01    .44919700E-01     .540    .5892
 B8        -.4414619828        .66479462    -.664    .5067
Inclusive Value Parameters
 TYPE1      .9859433268        .25540989    3.860    .0001
 TYPE2      1.054564664        .25223344    4.181    .0000
 TYPE3      .9397231335        .27169252    3.459    .0005
Is this because the procedure I suggest above is not valid for a nested logit model, or because LIMDEP is wrong? (Incidentally, I think the Stata results are more likely to be correct, as the t-ratios in LIMDEP appear too high.)
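The chi-squared figures in both headers are likelihood-ratio statistics, 2(LL - LL0). The Python sketch below redoes that arithmetic with the numbers copied from the two outputs: it reproduces LIMDEP's 136.82 up to rounding of the reported log likelihoods, whereas Stata's LR chi2(10) of -676.381 implies a restricted log likelihood of about -175.13, above the full model's -513.32. When the restricted model is nested in the full one and both are fit to the same weighted data, that cannot happen, which suggests the two Stata log likelihoods are not computed on the same basis; the sketch does not show how nlogit obtains its comparison model.

```python
def lr_stat(ll_full, ll_restricted):
    """Likelihood-ratio statistic: 2 * (LL_full - LL_restricted)."""
    return 2.0 * (ll_full - ll_restricted)


def implied_restricted_ll(ll_full, lr):
    """Restricted log likelihood implied by a reported LR statistic."""
    return ll_full - lr / 2.0


# figures copied from the LIMDEP and Stata outputs above
limdep_lr = lr_stat(-524.2610, -592.6711)                # LIMDEP reports 136.8201
stata_ll0 = implied_restricted_ll(-513.32241, -676.381)  # from Stata's LR chi2(10)
print(f"LIMDEP LR statistic:         {limdep_lr:.4f}")
print(f"Stata implied restricted LL: {stata_ll0:.5f}")
```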