Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE
From: Steve Samuels <[email protected]>
To: [email protected]
Subject: Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE
Date: Mon, 27 Sep 2010 09:18:22 -0400
You are welcome, Arka. The 50% RSE criterion I've seen is a worst
case; 30% would be more believable.
Steve
On Mon, Sep 27, 2010 at 2:14 AM, Arka Roy Chaudhuri <[email protected]> wrote:
> Dear Steve,
>
> Thanks for all your suggestions. I have already ensured that I
> have an adequate number of observations in each district-industry cell. I
> will also look at the relative standard error criterion. Once again,
> thanks a lot for your help.
>
> Regards,
> Arka
>
> On Fri, Sep 24, 2010 at 2:27 PM, Steve Samuels <[email protected]> wrote:
>> Well, there will be numbers for up to 196,000 cells (490 industries x
>> 400 districts). Many will be empty because of missing data; I would
>> hesitate to call the remainder "estimates" unless the standard errors
>> are reasonable and they were based on at least 10-20 observations in
>> the category.
>>
>> I have seen designs in which sum-of-weights estimates were worthless
>> for estimating population totals, even with large sample sizes. PPS
>> (probability proportional to size) designs are less vulnerable to this
>> kind of problem.
>>
>> Survey organizations generally have policies for suppressing
>> estimates based on small sample sizes. Perhaps there is a standard
>> practice in your field. I suggest that, in each district, you screen
>> the industries present in the sample for a minimum number of
>> individuals, say 10-20, and report proper survey estimates, with
>> standard errors and sample n's, only for those. You can combine smaller
>> industries into groups to meet these criteria. The relative standard
>> error, RSE = (SE/estimate) x 100%, is another criterion people use for
>> suppressing estimates, and I've seen an RSE of 50% used as a maximum.
>>
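The screening rule described above (a minimum sample n per cell plus a maximum RSE) can be sketched in Python, which is not Stata but keeps the arithmetic explicit. The function name `screen_cell`, its defaults (n >= 10, RSE <= 50%), and the example cells are hypothetical; in practice the estimate and its SE would come from the -svy- commands discussed in this thread.

```python
def screen_cell(n, estimate, se, min_n=10, max_rse=50.0):
    """Decide whether to publish one district-industry estimate.

    Hypothetical rule: publish only if the cell has at least `min_n`
    sampled individuals and RSE = SE / estimate x 100% is at most `max_rse`.
    Returns (publish, rse_percent).
    """
    if estimate <= 0:
        # An empty (zero) cell has no meaningful RSE; never publish it.
        return False, float("inf")
    rse = se / estimate * 100.0
    return (n >= min_n and rse <= max_rse), rse


# Made-up cells: (district, industry, sample n, estimated total, SE)
cells = [
    ("D01", "I001", 25, 12000.0, 3000.0),  # RSE 25%, n >= 10
    ("D01", "I002", 8, 5000.0, 1000.0),    # too few observations
    ("D02", "I001", 40, 9000.0, 5400.0),   # RSE 60%, too imprecise
]
for district, industry, n, est, se in cells:
    publish, rse = screen_cell(n, est, se)
    print(district, industry, f"RSE={rse:.0f}%", "publish" if publish else "suppress")
```

Passing `max_rse=30.0` applies the stricter 30% cutoff from Steve's follow-up message.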
>> Good luck!
>>
>> Steve
>>
>> Steven J. Samuels
>> [email protected]
>> 18 Cantine's Island
>> Saugerties NY 12477
>> USA
>> Voice: 845-246-0774
>> Fax: 206-202-4783
>>
>> ---------- Forwarded message ----------
>> From: Arka Roy Chaudhuri <[email protected]>
>> Date: Fri, Sep 24, 2010 at 4:03 PM
>> Subject: Re: st: R: Estimating the number of workers in each industry
>> in each district - flag: Stata 9/2 SE
>> To: [email protected]
>>
>>
>> Dear Steve,
>>
>> Thanks a lot for all your advice. The problem is that in my dataset
>> I have about 490 industries and 400 districts. Both industries and
>> districts come with a code identifying them. I used the following
>> commands to estimate the number of workers in each industry in each
>> district:
>>
>> * here weight represents the inverse of the probability of the household being sampled
>> bysort districtid industryid: egen workers = total(weight)
>> duplicates drop districtid industryid, force
>> keep districtid industryid workers
>> save "T:\arka\industry_district.dta"
>>
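Leaving standard errors aside, the point estimate produced by the -bysort ... egen total()- and -duplicates drop- steps above is the sum of the weights of the sampled workers in each cell. A hypothetical Python equivalent is sketched below; it also keeps the unweighted sample n per cell, which the screening advice in this thread needs. It assumes one record per sampled worker and that `weight` is the inverse of the household's selection probability; `cell_totals` and the toy records are illustrative only.

```python
from collections import defaultdict


def cell_totals(records):
    """Sum-of-weights estimate of the number of workers in each cell.

    records: one (districtid, industryid, weight) tuple per sampled worker,
    where weight is assumed to be 1 / P(household sampled).
    Returns {(districtid, industryid): (sample n, estimated workers)}.
    """
    n = defaultdict(int)
    total = defaultdict(float)
    for district, industry, weight in records:
        cell = (district, industry)
        n[cell] += 1            # unweighted sample size in the cell
        total[cell] += weight   # each sampled worker represents `weight` workers
    return {cell: (n[cell], total[cell]) for cell in total}


# Toy data: two workers from the same household share its weight.
workers = [(1, 101, 250.0), (1, 101, 250.0), (1, 102, 400.0), (2, 101, 120.0)]
print(cell_totals(workers))
# {(1, 101): (2, 500.0), (1, 102): (1, 400.0), (2, 101): (1, 120.0)}
```

Cells with no sampled workers simply do not appear, and the totals alone say nothing about precision; that requires the design information (strata and PSUs).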
>>
>> Leaving aside the issue of -svyset-ting my data, is the above
>> estimation strategy correct? Please advise.
>>
>> Arka
>>
>> On Fri, Sep 24, 2010 at 8:55 AM, Steve Samuels <[email protected]> wrote:
>>> My advice about handling household counts of workers was wrong. Do not expand.
>>>
>>> Say you have, for each household (HH), counts of the number of workers in three industries:
>>>
>>> n_agriculture
>>> n_service
>>> n_sales
>>>
>>> Then you would run a separate command for each industry, for example:
>>> *********************************************
>>> levelsof district, local(districts)
>>> foreach x of local districts {
>>>     svy: total n_agriculture if district == `x'
>>> }
>>> ***********************************************
>>> You would use this form rather than an -over()- or -subpop()- option,
>>> because districts are sampling strata.
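What -svy: total- computes within one district can be sketched as follows, assuming the -svyset- design given in this thread (villages or urban blocks as PSUs, districts as strata) and Stata's default with-replacement variance approximation with no finite population correction. Household rows are summed into PSU-level weighted totals, and the SE comes from the spread of those PSU totals. The function name `stratum_total` and the toy rows are hypothetical; this illustrates the linearized variance formula and is not a substitute for -svy-.

```python
import math
from collections import defaultdict


def stratum_total(rows):
    """Weighted total and linearized SE within one stratum (district).

    rows: one (psu, weight, y) tuple per sampled household, e.g.
    y = n_agriculture. PSUs are treated as sampled with replacement
    and no finite population correction is applied (an assumption).
    Returns (estimated total, standard error).
    """
    psu_totals = defaultdict(float)
    for psu, weight, y in rows:
        psu_totals[psu] += weight * y      # weighted total for each PSU
    z = list(psu_totals.values())
    n_h = len(z)
    total = sum(z)
    if n_h < 2:
        # One PSU in the stratum: the variance cannot be estimated.
        return total, float("nan")
    z_bar = total / n_h
    var = n_h / (n_h - 1) * sum((z_i - z_bar) ** 2 for z_i in z)
    return total, math.sqrt(var)


# Toy district with three villages (PSUs); y = agricultural workers in the HH.
rows = [("v1", 100.0, 2), ("v1", 100.0, 0), ("v2", 150.0, 1), ("v3", 200.0, 3)]
total, se = stratum_total(rows)
print(total, round(se, 1), f"RSE={100 * se / total:.0f}%")
```

Households with y = 0 must stay in `rows`: a PSU with no workers in the industry still counts toward the number of PSUs and contributes a zero total to the variance.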
>>>
>>> -Steve
>>>
>>> On Fri, Sep 24, 2010 at 9:44 AM, Steve Samuels <[email protected]> wrote:
>>>> Arka-
>>>>
>>>> Based on your description, you would -svyset- your data as follows:
>>>>
>>>> Define a variable (call it "psu" for "primary sampling unit") that is
>>>> the village number (rural sector) or urban block (urban sector).
>>>>
>>>>
>>>> then (where "weight" stands for your sampling-weight variable)
>>>> ********************************************************
>>>> svyset psu [pweight = weight], strata(district)
>>>> ***********************************************************
>>>>
>>>> If your data has one line per person, with "industry" as a categorical
>>>> variable, then the command for totals might be
>>>>
>>>> *****************************************************
>>>> svy: tab district industry, count se format(%10.0fc)
>>>> *****************************************************
>>>>
>>>> If your data has only counts of workers in each industry in each HH,
>>>> then you should -expand- the data first so that it has one line for
>>>> each worker in the HH, e.g.
>>>>
>>>> *************
>>>> expand hhsize
>>>> *************
>>>>
>>>> (but that might include children, so you will have to take some care)
>>>>
>>>> Now a word of advice. It is easy to go wrong in a survey analysis. As
>>>> you are a student, I suggest that you seek guidance from a faculty
>>>> member who is experienced in surveys, if not in Stata. (I know that
>>>> the Department of Statistics at UBC has a survey sampling course.) I
>>>> also suggest that you obtain a text to learn about sampling, such as
>>>> Sharon Lohr's "Sampling: Design and Analysis" (2009). I also
>>>> recommend "Applied Survey Data Analysis" by Heeringa, West, and
>>>> Berglund (2010); it uses Stata almost exclusively for its examples.
>>>>
>>>> Best wishes,
>>>>
>>>> Steve
>>>>
>>>>
>>>> On Thu, Sep 23, 2010 at 8:24 PM, Arka Roy Chaudhuri <[email protected]> wrote:
>>>>> Hi,
>>>>> Thanks for the help. In my dataset, all the districts in the target
>>>>> population are included. The sampling design is a stratified multi-stage
>>>>> design, with the first-stage units being villages in the rural sector
>>>>> and urban blocks in the urban sector. The ultimate stage units (USUs)
>>>>> are households in both sectors.
>>>>>
>>>>> I only have one set of weights that comes with the data. The
>>>>> documentation states that the weights represent the probability that
>>>>> the particular household was included in the sample. Please let me
>>>>> know if I should include any other information. I am really thankful
>>>>> for all the help.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Arka
>>>>>
>>>>> On Wed, Sep 15, 2010 at 7:16 AM, Steve Samuels <[email protected]> wrote:
>>>>>>
>>>>>> Arka-
>>>>>>
>>>>>> I can't answer without more information about the sampling design.
>>>>>> Please describe the design in detail, including answers to the
>>>>>> following questions:
>>>>>>
>>>>>> 1. Were all districts in the target population included in the sample?
>>>>>> Or, were districts sampled?
>>>>>>
>>>>>> 2. Are the final sampling weights the probability sampling weights? Or
>>>>>> was there adjustment to the probability weights (post-stratification,
>>>>>> "raking") so that the sample results will better reflect population
>>>>>> census proportions? If the weights are so adjusted, are the original
>>>>>> sampling weights available to you?
>>>>>>
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>>
>>>>>> On Wed, Sep 15, 2010 at 4:07 AM, Carlo Lazzaro <[email protected]> wrote:
>>>>>> > Arka wrote:
>>>>>> > "Now I want to estimate the number of workers
>>>>>> > belonging to each industry in a particular district"
>>>>>> >
>>>>>> > A fairly trivial example of Arka's issue might be the following (setting
>>>>>> > aside survey technicalities):
>>>>>> >
>>>>>> > ---------------------code begins------------------------------------
>>>>>> > drop _all
>>>>>> > set obs 100
>>>>>> > g Workers=_n
>>>>>> > g District="East" in 1/50
>>>>>> > replace District="West" in 51/100
>>>>>> > g Industry="Concrete" in 1/30
>>>>>> > replace Industry="Steel" in 31/100
>>>>>> > g A= 1 if District=="East" & Industry=="Steel"
>>>>>> > g B= 1 if District=="West" & Industry=="Steel"
>>>>>> > g C= 1 if District=="East" & Industry=="Concrete"
>>>>>> > ---------------------code ends------------------------------------
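For comparison, the cells that Carlo's indicators A, B, and C mark can be tallied directly. The Python sketch below rebuilds the same toy data (workers 1-100, East/West split at 50, Concrete/Steel split at 30) and counts workers per District-Industry cell. It is unweighted; with survey data, each sampled worker would contribute their weight instead of 1.

```python
from collections import Counter

# Rebuild Carlo's toy data: workers numbered 1..100.
workers = range(1, 101)
district = {w: "East" if w <= 50 else "West" for w in workers}
industry = {w: "Concrete" if w <= 30 else "Steel" for w in workers}

# Unweighted number of workers in each (District, Industry) cell.
counts = Counter((district[w], industry[w]) for w in workers)
for cell in sorted(counts):
    print(cell, counts[cell])
# ('East', 'Concrete') 30
# ('East', 'Steel') 20
# ('West', 'Steel') 50
```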
>>>>>> >
>>>>>> > HTH and Kind Regards,
>>>>>> > Carlo
>>>>>> > -----Original Message-----
>>>>>> > From: [email protected]
>>>>>> > [mailto:[email protected]] On behalf of Arka Roy
>>>>>> > Chaudhuri
>>>>>> > Sent: Wednesday, 15 September 2010 9:24
>>>>>> > To: [email protected]
>>>>>> > Subject: st: Estimating the number of workers in each industry in each
>>>>>> > district
>>>>>> >
>>>>>> > Dear All,
>>>>>> > I have a data set which has information at the individual
>>>>>> > level. I have variables which record the district of residence of the
>>>>>> > individual, the industry of employment of the individual and other
>>>>>> > demographic characteristics. The data set also comes with weights which
>>>>>> > represent the probability that a particular household is included in
>>>>>> > the sample. Thus all individuals belonging to a particular household
>>>>>> > get the same weight. Now I want to estimate the number of workers
>>>>>> > belonging to each industry in a particular district. Could anyone
>>>>>> > please advise on the correct Stata code that I should write to get my
>>>>>> > desired estimates? Also, I would be grateful if somebody could advise me
>>>>>> > on the possible biases that might affect my estimates at the
>>>>>> > industry-district level. I would really appreciate any help in this
>>>>>> > regard. Thanks
>>>>>> >
>>>>>> > Regards,
>>>>>> > Arka
>>>>>> > --
>>>>>> > Arka Roy Chaudhuri
>>>>>> > PhD Student
>>>>>> > University of British Columbia
>>>>>> > 997-1873 East Mall
>>>>>> > Vancouver
>>>>>> > Canada
>>>>>> > Ph: +1 (604) 349-8283
>>>>>> > Email: [email protected]
>>>>>> >
>>>>>> > *
>>>>>> > * For searches and help try:
>>>>>> > * http://www.stata.com/help.cgi?search
>>>>>> > * http://www.stata.com/support/statalist/faq
>>>>>> > * http://www.ats.ucla.edu/stat/stata/