Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Arka Roy Chaudhuri <gabuisi@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE |
Date | Mon, 27 Sep 2010 13:16:46 -0700 |
Dear Steve,

Now I am having some problems estimating an IV regression. It would be great if you could help me with this. I have the following variables in my data set: districtid, average residual gender wage gap in a district (avggap), scaled district tariff (district_tariff_scaled), unscaled district tariff (district_tariff_unscaled), a set of district dummies (_Idistricti*), a time dummy since I have two time periods (time), and district population (district_popn).

I am interested in the effect of scaled district tariffs on the average residual gender wage gap, using the unscaled district tariffs as instruments for the scaled district tariffs. I run the following three regressions (I use district population as weights and cluster over districts to correct the standard errors):

1) regress avggap district_tariff_scaled time _Idistricti* [aweight=district_popn],cluster(districtid)

In this regression I look at the structural equation, i.e. the effect of scaled district tariffs on the average gender wage gap. I do not get any error in this case.

2) regress avggap district_tariff_unscaled time _Idistricti* [aweight=district_popn],cluster(districtid)

In this regression I look at the reduced-form relationship between unscaled tariffs and the average gender wage gap. I do not get any error in this case.

3) ivregress 2sls avggap (district_tariff_scaled =district_tariff_unscaled) time _Idistricti* [aweight=district_popn],cluster(districtid)

This is the equation that I have a problem estimating. I use the unscaled tariffs as instruments for the scaled tariffs. However, Stata gives me the following error:

ivregress 2sls avggap (district_tariff_scaled =district_tariff_unscaled) time _Idistricti* [aweight=district_popn],cluster(districtid)
(sum of wgt is 0.0000e+00)
no observations
r(2000);

Surprisingly, if I estimate the third equation without clustering over the districts, Stata gives me results without any error. I tried using the vce option instead of the cluster option but I get the same error. I do not understand why clustering over districts creates no problem in the first two equations yet returns an error in the third. Since I am using a difference-in-differences approach, it is essential that I cluster over districts. I am using Stata 11. I would be really grateful if you could help me out with this problem. Thanks.

Regards,
Arka

On Mon, Sep 27, 2010 at 6:18 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
> You are welcome, Arka. The 50% RSE criterion I've seen is a worst
> case; 30% would be more believable.
>
> Steve
>
> On Mon, Sep 27, 2010 at 2:14 AM, Arka Roy Chaudhuri <gabuisi@gmail.com> wrote:
>> Dear Steve,
>>
>> Thanks for all your suggestions. I have already ensured that I
>> have an adequate number of observations in each district-industry cell. I
>> will also look at the relative standard error criterion. Once again,
>> thanks a lot for your help.
>>
>> Regards,
>> Arka
>>
>> On Fri, Sep 24, 2010 at 2:27 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>> Well, there will be numbers for up to 196,000 cells.
>>> Many will be empty because of missing data; I would hesitate to call
>>> the remainder "estimates" unless the standard errors are reasonable and
>>> they were based on >10-20 observations in the category.
>>>
>>> I have seen designs in which sum-of-weights estimates were worthless
>>> for estimating population totals, even with large sample sizes. PPS
>>> designs are less vulnerable to this kind of problem.
>>>
>>> Survey organizations generally have policies for suppressing
>>> estimates based on small sample sizes. Perhaps there is a standard
>>> practice in your field. I suggest that, in each district, you screen
>>> the industries present in the sample for a minimum number of
>>> individuals, say 10-20, and report proper survey estimates, with
>>> standard errors, and sample n's only for those. You can group smaller
>>> industry groups to meet these criteria. The relative standard
>>> error (SE/estimate) x 100% is another criterion people use for
>>> suppressing estimates, and I've seen RSEs of 50% used as a maximum.
>>>
>>> Good luck!
>>>
>>> Steve
>>>
>>> Steven J. Samuels
>>> sjsamuels@gmail.com
>>> 18 Cantine's Island
>>> Saugerties NY 12477
>>> USA
>>> Voice: 845-246-0774
>>> Fax: 206-202-4783
>>>
>>> ---------- Forwarded message ----------
>>> From: Arka Roy Chaudhuri <gabuisi@gmail.com>
>>> Date: Fri, Sep 24, 2010 at 4:03 PM
>>> Subject: Re: st: R: Estimating the number of workers in each industry
>>> in each district - flag: Stata 9/2 SE
>>> To: statalist@hsphsun2.harvard.edu
>>>
>>> Dear Steve,
>>>
>>> Thanks a lot for all your advice. The problem is that in my dataset
>>> I have about 490 industries and 400 districts. Both industries and
>>> districts come with a code identifying them. I used the following
>>> commands to estimate the number of workers in each industry in a
>>> district:
>>>
>>> bysort districtid industryid: egen workers=total(weight) /*here weight
>>> represents the inverse of probability of the household being sampled*/
>>> duplicates drop districtid industryid,force
>>> keep districtid industryid workers
>>> save "T:\arka\industry_district.dta"
>>>
>>> Is the above estimation strategy correct, leaving aside the issue of
>>> -svyset-ting my data? Please advise.
>>>
>>> Arka
>>>
>>> On Fri, Sep 24, 2010 at 8:55 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>> My advice about handling household counts of workers was wrong. Do not expand.
>>>>
>>>> Say you have counts for the number of workers in the hh in three industries
>>>>
>>>> n_agriculture
>>>> n_service
>>>> n_sales
>>>>
>>>> Then you would use a separate command for each industry, for example:
>>>> *********************************************
>>>> levelsof district, local(districts)
>>>> foreach x of local districts{
>>>> svy: total n_agriculture if district==`x'
>>>> }
>>>> ***********************************************
>>>> You would use this form rather than an -over()- or -subpop()- option,
>>>> because districts are sampling strata.
>>>>
>>>> -Steve
>>>>
>>>> On Fri, Sep 24, 2010 at 9:44 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>>> Arka-
>>>>>
>>>>> Based on your description, you would -svyset- your data as follows:
>>>>>
>>>>> Define a variable (call it "psu" for "primary sampling unit") which is
>>>>> the village number (rural sector) or urban block (urban sector)
>>>>>
>>>>> then
>>>>> ********************************************************
>>>>> svyset psu [pw = your weight], strata(district)
>>>>> ***********************************************************
>>>>>
>>>>> If your data has one line per person, with "industry" categorized
>>>>> then the command for totals might be
>>>>>
>>>>> *****************************************************
>>>>> svy: tab district industry, count se format(%10.0fc)
>>>>> *****************************************************
>>>>>
>>>>> If your data has only counts of workers in each industry in each HH,
>>>>> then you should -expand- the data first so that it has one line for
>>>>> each worker in the HH, e.g.
>>>>>
>>>>> *************
>>>>> expand hhsize
>>>>> *************
>>>>>
>>>>> (but that might include children, so you will have to take some care)
>>>>>
>>>>> Now a word of advice. It is easy to go wrong in a survey analysis. As
>>>>> you are a student, I suggest that you seek guidance from a faculty
>>>>> member who is experienced in surveys, if not in Stata. (I know that
>>>>> the Department of Statistics at UBC has a survey sampling course). I
>>>>> also suggest that you obtain a text to learn about sampling, such as
>>>>> Sharon Lohr's "Sampling: Design and Analysis" (2009). I also
>>>>> recommend "Applied Survey Data Analysis" by Heeringa, West, and
>>>>> Berglund (2010); it uses Stata almost exclusively for its examples.
>>>>>
>>>>> Best wishes,
>>>>>
>>>>> Steve
>>>>>
>>>>> On Thu, Sep 23, 2010 at 8:24 PM, Arka Roy Chaudhuri <gabuisi@gmail.com> wrote:
>>>>>> Hi,
>>>>>> Thanks for the help. In my dataset all the districts in the target
>>>>>> population are included. The sampling design is a stratified multi-stage
>>>>>> design with the first stage units being villages in the rural sector
>>>>>> and urban blocks in the urban sector. The ultimate stage units (USU)
>>>>>> are households in both the sectors.
>>>>>>
>>>>>> I only have one set of weights that comes with the data. The
>>>>>> documentation states that the weights represent the probability that
>>>>>> the particular household was included in the sample. Please let me
>>>>>> know if I should include any other information. I am really thankful
>>>>>> for all the help.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Arka
>>>>>>
>>>>>> On Wed, Sep 15, 2010 at 7:16 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>>>>>
>>>>>>> Arka-
>>>>>>>
>>>>>>> I can't answer without more information about the sampling design.
>>>>>>> Please describe the design in detail, including answers to the
>>>>>>> following questions.
>>>>>>>
>>>>>>> 1. Were all districts in the target population included in the sample?
>>>>>>> Or, were districts sampled?
>>>>>>>
>>>>>>> 2. Are the final sampling weights the probability sampling weights? Or
>>>>>>> was there adjustment to the probability weights (post-stratification,
>>>>>>> "raking") so that the sample results will better reflect population
>>>>>>> census proportions? If the weights are so adjusted, are the original
>>>>>>> sampling weights available to you?
>>>>>>>
>>>>>>> Steve
>>>>>>>
>>>>>>> On Wed, Sep 15, 2010 at 4:07 AM, Carlo Lazzaro <carlo.lazzaro@tin.it> wrote:
>>>>>>> > Arka wrote:
>>>>>>> > "Now I want to estimate the number of workers
>>>>>>> > belonging to each industry in a particular district"
>>>>>>> >
>>>>>>> > A quite trivial example about Arka's issue may be the following one (set
>>>>>>> > aside survey technicalities):
>>>>>>> >
>>>>>>> > ---------------------code begins------------------------------------
>>>>>>> > drop _all
>>>>>>> > set obs 100
>>>>>>> > g Workers=_n
>>>>>>> > g District="East" in 1/50
>>>>>>> > replace District="West" in 51/100
>>>>>>> > g Industry="Concrete" in 1/30
>>>>>>> > replace Industry="Steel" in 31/100
>>>>>>> > g A= 1 if District=="East" & Industry=="Steel"
>>>>>>> > g B= 1 if District=="West" & Industry=="Steel"
>>>>>>> > g C= 1 if District=="East" & Industry=="Concrete"
>>>>>>> > ---------------------code ends------------------------------------
>>>>>>> >
>>>>>>> > HTH and Kind Regards,
>>>>>>> > Carlo
>>>>>>> > -----Original Message-----
>>>>>>> > From: owner-statalist@hsphsun2.harvard.edu
>>>>>>> > [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Arka Roy
>>>>>>> > Chaudhuri
>>>>>>> > Sent: Wednesday, 15 September 2010 9:24
>>>>>>> > To: statalist@hsphsun2.harvard.edu
>>>>>>> > Subject: st: Estimating the number of workers in each industry in each
>>>>>>> > district
>>>>>>> >
>>>>>>> > Dear All,
>>>>>>> > I have a data set which has information at the individual
>>>>>>> > level. I have variables which record the district of residence of the
>>>>>>> > individual, the industry of employment of the individual and other
>>>>>>> > demographic characteristics. The data set also comes with weights which
>>>>>>> > represent the probability that a particular household is included in
>>>>>>> > the sample. Thus all individuals belonging to a particular household
>>>>>>> > get the same weight. Now I want to estimate the number of workers
>>>>>>> > belonging to each industry in a particular district. Could anyone
>>>>>>> > please advise on the correct Stata code that I should write to get my
>>>>>>> > desired estimates? Also I would be grateful if somebody could advise me
>>>>>>> > on the possible biases that might affect my estimates at the
>>>>>>> > industry-district level. I would really appreciate any help in this
>>>>>>> > regard. Thanks
>>>>>>> >
>>>>>>> > Regards,
>>>>>>> > Arka
>>>>>>> > --
>>>>>>> > Arka Roy Chaudhuri
>>>>>>> > PhD Student
>>>>>>> > University of British Columbia
>>>>>>> > 997-1873 East Mall
>>>>>>> > Vancouver
>>>>>>> > Canada
>>>>>>> > Ph: +1 (604) 349-8283
>>>>>>> > Email: gabuisi@gmail.com
>>>>>>> >
>>>>>>> > *
>>>>>>> > * For searches and help try:
>>>>>>> > * http://www.stata.com/help.cgi?search
>>>>>>> > * http://www.stata.com/support/statalist/faq
>>>>>>> > * http://www.ats.ucla.edu/stat/stata/
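
For the "no observations" r(2000) error reported at the top of this message, the following is a minimal diagnostic sketch, not a confirmed fix. It assumes the variable names given in the post, that the data are already in memory, and that the commands are run from a do-file (so the /// line continuation works). The thread does not establish the cause of the error, so the steps only check the usable sample and the weights, and then cross-check the IV point estimate by hand. The cross-check relies on a standard result: with one endogenous regressor and one instrument, the 2SLS coefficient equals the reduced-form coefficient divided by the first-stage coefficient, provided both regressions use the same sample, controls, and weights.

```stata
* Diagnostic sketch only; variable names are taken from the post and
* nothing here is a confirmed fix for the r(2000) error.

* 1. Count observations with no missing model variable and a positive weight.
count if !missing(avggap, district_tariff_scaled, district_tariff_unscaled, ///
    time, district_popn, districtid) & district_popn > 0

* 2. Inspect the weight variable for zeros, negatives, or extreme values.
summarize district_popn, detail

* 3. Reduced form: outcome on the instrument (regression 2 in the post).
regress avggap district_tariff_unscaled time _Idistricti* ///
    [aweight = district_popn], vce(cluster districtid)
scalar rf = _b[district_tariff_unscaled]

* 4. First stage: endogenous regressor on the instrument.
regress district_tariff_scaled district_tariff_unscaled time _Idistricti* ///
    [aweight = district_popn], vce(cluster districtid)
scalar fs = _b[district_tariff_unscaled]

* 5. Just-identified case: the IV point estimate is the ratio rf/fs.
display "Implied IV coefficient on district_tariff_scaled: " rf/fs

* 6. Retry the IV model with the documented vce(cluster ...) syntax.
ivregress 2sls avggap time _Idistricti* ///
    (district_tariff_scaled = district_tariff_unscaled) ///
    [aweight = district_popn], vce(cluster districtid)
```

If step 5 produces a number while step 6 still stops with r(2000), that would suggest the data themselves are usable and the problem lies in the -ivregress- call. Note that the ratio gives only the point estimate, not a clustered standard error.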