Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE
From
Steve Samuels <[email protected]>
To
[email protected]
Subject
Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE
Date
Mon, 27 Sep 2010 17:01:27 -0400
Please resend with a new subject heading. I have no expertise in this
area and those who do will not necessarily see your post.
On Mon, Sep 27, 2010 at 4:16 PM, Arka Roy Chaudhuri <[email protected]> wrote:
> Dear Steve,
> Now I am having some problems estimating an IV regression. It would be
> great if you could please help me with my problem.
>
> I have the following variables in my data set: districtid, average
> residual gender wage gap in a district (avggap), scaled district
> tariff (district_tariff_scaled), unscaled district
> tariff (district_tariff_unscaled), a set of district
> dummies (_Idistricti*), a time dummy since I have two time
> periods (time), and district population (district_popn). I am interested in
> looking at the effect of scaled district tariffs on the average
> residual gender wage gap, using the unscaled district tariffs as
> instruments for the scaled district tariffs. I run the following three
> regressions (I use the district population as weights and cluster on
> districts to correct the standard errors):
>
> 1) regress avggap district_tariff_scaled time _Idistricti*
> [aweight=district_popn], cluster(districtid)
> In this regression I look at the structural equation, i.e., the effect of
> scaled district tariffs on the average gender wage gap. I do not get any
> error in this case.
>
> 2) regress avggap district_tariff_unscaled time _Idistricti*
> [aweight=district_popn], cluster(districtid)
> In this regression I look at the reduced-form relationship between
> unscaled tariffs and the average gender wage gap. I do not get any
> error in this case.
>
> 3) ivregress 2sls avggap (district_tariff_scaled
> = district_tariff_unscaled) time _Idistricti*
> [aweight=district_popn], cluster(districtid)
> This is the equation that I have a problem estimating. I use the unscaled
> tariffs as instruments for the scaled tariffs. However, Stata gives me
> the following error:
>
> ivregress 2sls avggap (district_tariff_scaled
> =district_tariff_unscaled) time _Idistricti*
> [aweight=district_popn],cluster(districtid)
> (sum of wgt is 0.0000e+00)
> no observations
> r(2000);
>
> Surprisingly, if I estimate the third equation without clustering on
> the districts, Stata gives me results without any error. I tried using
> the vce() option instead of the cluster() option, but I get the same error.
> I do not understand why clustering on districts does not create any
> problem in the estimation of the first two equations while it returns
> an error when I estimate the third equation. Since I am using a
> difference-in-differences approach, it is essential that I cluster on
> districts. I am using Stata 11.
>
> I would be really grateful if you could help me out with this problem. Thanks.
>
> Regards,
> Arka
>
> On Mon, Sep 27, 2010 at 6:18 AM, Steve Samuels <[email protected]> wrote:
>> You are welcome, Arka. The 50% RSE criterion I've seen is a worst
>> case; 30% would be more believable.
>>
>> Steve
>>
>> On Mon, Sep 27, 2010 at 2:14 AM, Arka Roy Chaudhuri <[email protected]> wrote:
>>> Dear Steve,
>>>
>>> Thanks for all your suggestions. I have already ensured that I
>>> have an adequate number of observations in each district-industry cell. I
>>> will also look at the relative standard error criterion. Once again,
>>> thanks a lot for your help.
>>>
>>> Regards,
>>> Arka
>>>
>>>
>>>
>>>
>>> On Fri, Sep 24, 2010 at 2:27 PM, Steve Samuels <[email protected]> wrote:
>>>> Well, there will be numbers for up to 196,000 cells. Many will be
>>>> empty because of missing data; I would hesitate to call the remainder
>>>> "estimates" unless the standard errors are reasonable and they are
>>>> based on more than 10-20 observations in the category.
>>>>
>>>> I have seen designs in which sum-of-weights estimates were worthless
>>>> for estimating population totals, even with large sample sizes. PPS
>>>> designs are less vulnerable to this kind of problem.
>>>>
>>>> Survey organizations generally have policies for suppressing
>>>> estimates based on small sample sizes. Perhaps there is a standard
>>>> practice in your field. I suggest that, in each district, you screen
>>>> the industries present in the sample for a minimum number of
>>>> individuals, say 10-20, and report proper survey estimates, with
>>>> standard errors, and sample n's only for those. You can group smaller
>>>> industries into groups to meet these criteria. The relative standard
>>>> error, (SE/estimate) x 100%, is another criterion people use for
>>>> suppressing estimates, and I've seen RSEs of 50% used as a maximum.
>>>>
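The two screening rules above (a minimum cell size and a maximum relative standard error) could be checked for a single district-industry cell roughly as follows. This is only a sketch: it assumes one line per worker, data already -svyset- with districts as strata, and numeric districtid and industryid. The codes 1 and 5, the cutoffs 10 and 30, and the variable in_cell are placeholders introduced for illustration, not values from the data.

```stata
* Sketch only: district 1 and industry 5 are placeholder codes.

* Unweighted number of sampled workers in the cell
count if districtid == 1 & industryid == 5
local n = r(N)

* 0/1 indicator for the industry; its weighted total within the district
* estimates the number of workers in that district-industry cell
generate byte in_cell = (industryid == 5)
quietly svy: total in_cell if districtid == 1

* Relative standard error = (SE / estimate) x 100%
local rse = 100 * _se[in_cell] / _b[in_cell]
display "n = `n'  total = " %12.0fc _b[in_cell] "  RSE = " %5.1f `rse' "%"

* Hypothetical cutoffs: fewer than 10 observations or RSE above 30%
* (a missing RSE, e.g. from a zero total, also counts as failing)
if `n' < 10 | `rse' > 30 {
    display "cell fails the screen: suppress it or pool it with other industries"
}
```

Restricting with -if- to one district relies on the districts being sampling strata, as discussed elsewhere in this thread; the industry is handled through the 0/1 indicator rather than by dropping observations.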
>>>> Good luck!
>>>>
>>>> Steve
>>>>
>>>> Steven J. Samuels
>>>> [email protected]
>>>> 18 Cantine's Island
>>>> Saugerties NY 12477
>>>> USA
>>>> Voice: 845-246-0774
>>>> Fax:   206-202-4783
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: Arka Roy Chaudhuri <[email protected]>
>>>> Date: Fri, Sep 24, 2010 at 4:03 PM
>>>> Subject: Re: st: R: Estimating the number of workers in each industry
>>>> in each district - flag: Stata 9/2 SE
>>>> To: [email protected]
>>>>
>>>>
>>>> Dear Steve,
>>>>
>>>> Thanks a lot for all your advice. The problem is that in my dataset
>>>> I have about 490 industries and 400 districts. Both industries and
>>>> districts come with a code identifying them. I used the following
>>>> commands to estimate the number of workers in each industry in a
>>>> district:
>>>>
>>>> bysort districtid industryid: egen workers=total(weight) /*here weight
>>>> represents the inverse of probability of the household being sampled*/
>>>> duplicates drop districtid industryid, force
>>>> keep districtid industryid workers
>>>> save "T:\arka\industry_district.dta",
>>>>
>>>>
>>>> Is the above estimation strategy correct, leaving aside the issue of
>>>> -svyset-ting my data? Please advise.
>>>>
>>>> Arka
>>>>
>>>> On Fri, Sep 24, 2010 at 8:55 AM, Steve Samuels <[email protected]> wrote:
>>>>> My advice about handling household counts of workers was wrong. Do not expand.
>>>>>
>>>>> Say you have counts for the number of workers in the hh in three industries:
>>>>>
>>>>> n_agriculture
>>>>> n_service
>>>>> n_sales
>>>>>
>>>>> Then you would run a separate command for each industry, for example:
>>>>> *********************************************
>>>>> levelsof district, local(districts)
>>>>> foreach x of local districts {
>>>>>     svy: total n_agriculture if district==`x'
>>>>> }
>>>>> ***********************************************
>>>>> You would use this form rather than an -over()- or -subpop()- option,
>>>>> because districts are sampling strata.
>>>>>
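The loop above handles one industry. A minimal sketch of extending it to all three count variables, printing each total with its standard error and the number of observations used, might look like this. It assumes the data are already -svyset-, that district is numeric, and that the code is run from a do-file (the /// line continuation does not work when typed interactively).

```stata
* Sketch only: variable names follow the thread; nothing here is tuned
* to the actual data.
levelsof district, local(districts)
foreach v of varlist n_agriculture n_service n_sales {
    foreach x of local districts {
        * One estimation per industry and district, as in the loop above
        quietly svy: total `v' if district == `x'
        * _b[] and _se[] hold the estimated total and its standard error
        display "`v', district `x': total = " %12.0fc _b[`v'] ///
            "  SE = " %12.0fc _se[`v'] "  n = " e(N)
    }
}
```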
>>>>> -Steve
>>>>>
>>>>> On Fri, Sep 24, 2010 at 9:44 AM, Steve Samuels <[email protected]> wrote:
>>>>>> Arka-
>>>>>>
>>>>>> Based on your description, you would -svyset- your data as follows:
>>>>>>
>>>>>> Define a variable (call it "psu" for "primary sampling unit") which is
>>>>>> the village number (rural sector) or urban block (urban sector),
>>>>>>
>>>>>>
>>>>>> then
>>>>>> ********************************************************
>>>>>> svyset psu [pw = your weight], strata(district)
>>>>>> ***********************************************************
>>>>>>
>>>>>> If your data has one line per person, with "industry" categorized
>>>>>>
>>>>>> then the command for totals might be
>>>>>>
>>>>>> *****************************************************
>>>>>> svy: tab district industry, count se format(%10.0fc)
>>>>>> *****************************************************
>>>>>>
>>>>>> If your data has only counts of workers in each industry in each HH,
>>>>>> then you should -expand- the data first so that it has one line for
>>>>>> each worker in the HH, e.g.
>>>>>>
>>>>>> *************
>>>>>> expand hhsize
>>>>>> *************
>>>>>>
>>>>>> (but that might include children, so you will have to take some care)
>>>>>>
>>>>>> Now a word of advice. It is easy to go wrong in a survey analysis. As
>>>>>> you are a student, I suggest that you seek guidance from a faculty
>>>>>> member who is experienced in surveys, if not in Stata. (I know that
>>>>>> the Department of Statistics at UBC has a survey sampling course.) I
>>>>>> also suggest that you obtain a text to learn about sampling, such as
>>>>>> Sharon Lohr's "Sampling: Design and Analysis" (2009). I also
>>>>>> recommend "Applied Survey Data Analysis" by Heeringa, West, and
>>>>>> Berglund (2010); it uses Stata almost exclusively for its examples.
>>>>>>
>>>>>> Best wishes,
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>>
>>>>>> On Thu, Sep 23, 2010 at 8:24 PM, Arka Roy Chaudhuri <[email protected]> wrote:
>>>>>>> Hi,
>>>>>>> Thanks for the help. In my dataset all the districts in the target
>>>>>>> population are included. The sampling design is a stratified multi-stage
>>>>>>> design, with the first-stage units being villages in the rural sector
>>>>>>> and urban blocks in the urban sector. The ultimate stage units (USU)
>>>>>>> are households in both sectors.
>>>>>>>
>>>>>>> I only have one set of weights that comes with the data. The
>>>>>>> documentation states that the weights represent the probability that
>>>>>>> the particular household was included in the sample. Please let me
>>>>>>> know if I should include any other information. I am really thankful
>>>>>>> for all the help.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Arka
>>>>>>>
>>>>>>> On Wed, Sep 15, 2010 at 7:16 AM, Steve Samuels <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Arka-
>>>>>>>>
>>>>>>>> I can't answer without more information about the sampling design.
>>>>>>>> Please describe the design in detail, including answers to the
>>>>>>>> following questions.
>>>>>>>>
>>>>>>>> 1. Were all districts in the target population included in the sample?
>>>>>>>> Or, were districts sampled?
>>>>>>>>
>>>>>>>> 2. Are the final sampling weights the probability sampling weights? Or
>>>>>>>> was there adjustment to the probability weights (post-stratification,
>>>>>>>> "raking") so that the sample results will better reflect population
>>>>>>>> census proportions? If the weights are so adjusted, are the original
>>>>>>>> sampling weights available to you?
>>>>>>>>
>>>>>>>>
>>>>>>>> Steve
>>>>>>>>
>>>>>>>> On Wed, Sep 15, 2010 at 4:07 AM, Carlo Lazzaro <[email protected]> wrote:
>>>>>>>> > Arka wrote:
>>>>>>>> > "Now I want to estimate the number of workers
>>>>>>>> > belonging to each industry in a particular district"
>>>>>>>> >
>>>>>>>> > A quite trivial example about Arka's issue may be the following one (set
>>>>>>>> > aside survey technicalities):
>>>>>>>> >
>>>>>>>> > ---------------------code begins------------------------------------
>>>>>>>> > drop _all
>>>>>>>> > set obs 100
>>>>>>>> > g Workers=_n
>>>>>>>> > g District="East" in 1/50
>>>>>>>> > replace District="West" in 51/100
>>>>>>>> > g Industry="Concrete" in 1/30
>>>>>>>> > replace Industry="Steel" in 31/100
>>>>>>>> > g A= 1 if District=="East" & Industry=="Steel"
>>>>>>>> > g B= 1 if District=="West" & Industry=="Steel"
>>>>>>>> > g C= 1 if District=="East" & Industry=="Concrete"
>>>>>>>> > ---------------------code ends------------------------------------
>>>>>>>> >
>>>>>>>> > HTH and Kind Regards,
>>>>>>>> > Carlo
>>>>>>>> > -----Original Message-----
>>>>>>>> > From: [email protected]
>>>>>>>> > [mailto:[email protected]] On Behalf Of Arka Roy
>>>>>>>> > Chaudhuri
>>>>>>>> > Sent: Wednesday, 15 September 2010 9:24
>>>>>>>> > To: [email protected]
>>>>>>>> > Subject: st: Estimating the number of workers in each industry in each
>>>>>>>> > district
>>>>>>>> >
>>>>>>>> > Dear All,
>>>>>>>> > I have a data set which has information at the individual
>>>>>>>> > level. I have variables which record the district of residence of the
>>>>>>>> > individual, the industry of employment of the individual, and other
>>>>>>>> > demographic characteristics. The data set also comes with weights which
>>>>>>>> > represent the probability that a particular household is included in
>>>>>>>> > the sample. Thus all individuals belonging to a particular household
>>>>>>>> > get the same weight. Now I want to estimate the number of workers
>>>>>>>> > belonging to each industry in a particular district. Could anyone
>>>>>>>> > please advise on the correct Stata code that I should write to get my
>>>>>>>> > desired estimates? Also, I would be grateful if somebody could advise me
>>>>>>>> > on the possible biases that might affect my estimates at the
>>>>>>>> > industry-district level. I would really appreciate any help in this
>>>>>>>> > regard. Thanks.
>>>>>>>> >
>>>>>>>> > Regards,
>>>>>>>> > Arka
>>>>>>>> > --
>>>>>>>> > Arka Roy Chaudhuri
>>>>>>>> > PhD Student
>>>>>>>> > University of British Columbia
>>>>>>>> > 997-1873 East Mall
>>>>>>>> > Vancouver
>>>>>>>> > Canada
>>>>>>>> > Ph: +1 (604) 349-8283
>>>>>>>> > Email: [email protected]
>>>>>>>> >
>>>>>>>> > *
>>>>>>>> > *   For searches and help try:
>>>>>>>> > *   http://www.stata.com/help.cgi?search
>>>>>>>> > *   http://www.stata.com/support/statalist/faq
>>>>>>>> > *   http://www.ats.ucla.edu/stat/stata/
>>>>>>>> >