
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE

From   Arka Roy Chaudhuri <[email protected]>
To   [email protected]
Subject   Re: st: R: Estimating the number of workers in each industry in each district - flag: Stata 9/2 SE
Date   Fri, 24 Sep 2010 13:03:04 -0700

Dear Steve,

   Thanks a lot for all your advice. The problem is that in my dataset
I have about 490 industries and 400 districts. Both industries and
districts come with a code identifying them. I used the following
commands to estimate the number of workers in each industry in a
district:

* weight is the inverse of the probability of the household being sampled
bysort districtid industryid: egen workers = total(weight)
duplicates drop districtid industryid, force
keep districtid industryid workers

Is the above estimation strategy valid, leaving aside the issue of
-svyset-ting my data? Please advise.
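
For comparison, here is a minimal sketch on made-up data (every variable name and value below is hypothetical, and the data have one row per worker). It assumes the design described in the quoted messages below: districts as strata, villages or urban blocks as PSUs. The weighted sum per district-industry cell, written here with -collapse- instead of -egen- plus -duplicates drop-, should give the same point estimate as -svy: total-; the survey command also reports a standard error, which the weighted sum alone does not.

```stata
* Toy data: 2 districts (strata), 2 PSUs per district, 2 workers per PSU.
* All names and numbers are invented for illustration only.
clear
input districtid psu hhid industryid weight
1 1 1 10 50
1 1 2 20 50
1 2 3 10 60
1 2 4 10 60
2 3 5 20 40
2 3 6 20 40
2 4 7 10 30
2 4 8 20 30
end

* Weighted count of workers in each district-industry cell.
* District 1, industry 10: 50 + 60 + 60 = 170.
preserve
collapse (sum) workers = weight, by(districtid industryid)
list, sepby(districtid)
restore

* Same point estimate from the survey estimator, now with a standard error.
svyset psu [pw = weight], strata(districtid)
generate byte in_ind10 = (industryid == 10)
svy: total in_ind10 if districtid == 1
```

With 490 industries and 400 districts, many cells will rest on very few sampled households, so the standard errors from the second step are worth inspecting before the cell totals are used.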


On Fri, Sep 24, 2010 at 8:55 AM, Steve Samuels <[email protected]> wrote:
> My advice about handling household counts of workers was wrong. Do not expand.
> Say you have counts for the number of workers in the hh in three industries:
> n_agriculture
> n_service
> n_sales
> Then you would run a separate command for each industry, for example:
> *********************************************
> levelsof district, local(districts)
> foreach x of  local districts{
> svy: total n_agriculture if district==`x'
> }
> ***********************************************
> You would use this form rather than an -over()-  or -subpop()- option,
> because districts are sampling strata.
> -Steve
> On Fri, Sep 24, 2010 at 9:44 AM, Steve Samuels <[email protected]> wrote:
>> Arka-
>> Based on your description, you would -svyset- your data as follows:
>> Define a variable (call it "psu" for "primary sampling unit") which is
>> the village number (rural sector) or urban block (urban sector),
>> then
>> ********************************************************
>> svyset psu [pw = your weight], strata(district)
>> ***********************************************************
>> If your data has one line per person, with "industry" categorized
>> then the command for totals might be
>> *****************************************************
>> svy: tab district industry, count se format(%10.0fc)
>> *****************************************************
>> If your data has only counts of workers in each industry in each HH,
>> then you should -expand- the data first so that it has one line for
>> each worker in the HH, e.g.
>> *************
>> expand hhsize
>> *************
>> (but that might include children, so you will have to take some care)
>> Now a word of advice. It is easy to go wrong in a survey analysis. As
>> you are a student, I suggest that you seek guidance from a faculty
>> member who is experienced in surveys, if not in Stata. (I know that
>> the Department of Statistics at UBC has a survey sampling course). I
>> also suggest that you obtain a text to learn about sampling, such as
>> Sharon Lohr's "Sampling: Design and Analysis" (2009).  I also
>> recommend "Applied Survey Data Analysis" by Heeringa, West, and
>> Berglund (2010); it uses Stata almost exclusively for its examples.
>> Best wishes,
>> Steve
>> Steven J. Samuels
>> [email protected]
>> 18 Cantine's Island
>> Saugerties NY 12477
>> USA
>> Voice: 845-246-0774
>> Fax:    206-202-4783
>> On Thu, Sep 23, 2010 at 8:24 PM, Arka Roy Chaudhuri <[email protected]> wrote:
>>> Hi,
>>>  Thanks for the help. In my dataset all the districts in the target
>>> population are included. The sampling design is a stratified multi-stage
>>> design, with the first-stage units being villages in the rural sector
>>> and urban blocks in the urban sector. The ultimate stage units (USU)
>>> are households in both sectors.
>>>   I only have one set of weights that comes with the data. The
>>> documentation states that the weights represent the probability that
>>> the particular household was included in the sample.  Please let me
>>> know if I should include any other information. I am really thankful
>>> for all the help.
>>> Regards,
>>> Arka
>>> On Wed, Sep 15, 2010 at 7:16 AM, Steve Samuels <[email protected]> wrote:
>>>> Arka-
>>>> I can't answer without more information about the sampling design.
>>>> Please describe the design in detail, including answers to the
>>>> following questions:
>>>> 1. Were all districts in the target population included in the sample?
>>>> Or, were districts sampled?
>>>> 2. Are the final sampling weights the probability sampling weights? Or
>>>> was there adjustment to the probability weights (post-stratification,
>>>> "raking") so that the sample results will better reflect population
>>>> census proportions? If the weights are so adjusted, are the original
>>>> sampling weights available to you?
>>>> Steve
>>>> Steven J. Samuels
>>>> [email protected]
>>>> 18 Cantine's Island
>>>> Saugerties NY 12477
>>>> USA
>>>> Voice: 845-246-0774
>>>> Fax:    206-202-4783
>>>> On Wed, Sep 15, 2010 at 4:07 AM, Carlo Lazzaro <[email protected]> wrote:
>>>> > Arka wrote:
>>>> > "Now I want to estimate the number of workers
>>>> > belonging to each industry in a particular district"
>>>> >
>>>> > A quite trivial example of Arka's issue may be the following one (setting
>>>> > aside survey technicalities):
>>>> >
>>>> > ---------------------code begins------------------------------------
>>>> > drop _all
>>>> > set obs 100
>>>> > g Workers=_n
>>>> > g District="East" in 1/50
>>>> > replace District="West" in 51/100
>>>> > g Industry="Concrete" in 1/30
>>>> > replace  Industry="Steel" in 31/100
>>>> > g A= 1 if  District=="East" &  Industry=="Steel"
>>>> > g B= 1 if  District=="West" &  Industry=="Steel"
>>>> > g C= 1 if  District=="East" &  Industry=="Concrete"
>>>> > ---------------------code ends------------------------------------
>>>> >
>>>> > HTH and Kind Regards,
>>>> > Carlo
>>>> > -----Original Message-----
>>>> > From: [email protected]
>>>> > [mailto:[email protected]] On Behalf Of Arka Roy
>>>> > Chaudhuri
>>>> > Sent: Wednesday, 15 September 2010 9:24
>>>> > To: [email protected]
>>>> > Subject: st: Estimating the number of workers in each industry in each
>>>> > district
>>>> >
>>>> > Dear All,
>>>> >        I have a data set which has information at the individual
>>>> > level. I have variables which record the district of residence of the
>>>> > individual, the industry of employment of the individual and other
>>>> > demographic characteristics. The data set also comes with weights which
>>>> > represent the probability that a particular household is included in
>>>> > the sample. Thus all individuals belonging to a particular household
>>>> > get the same weight. Now I want to estimate the number of workers
>>>> > belonging to each industry in a particular district. Could anyone
>>>> > please advise on the correct Stata code that I should write to get my
>>>> > desired estimates? Also I would be grateful if somebody could advise me
>>>> > on the possible biases that might affect my estimates at the
>>>> > industry-district level. I would really appreciate any help in this
>>>> > regard. Thanks
>>>> >
>>>> > Regards,
>>>> > Arka
>>>> > --
>>>> > Arka Roy Chaudhuri
>>>> > PhD Student
>>>> > University of British Columbia
>>>> > 997-1873 East Mall
>>>> > Vancouver
>>>> > Canada
>>>> > Ph: +1 (604) 349-8283
>>>> > Email: [email protected]
>>>> >
>>>> > *
>>>> > *   For searches and help try:
>>>> > *
>>>> > *
>>>> > *
