That sounds like a good idea; I will have to look into it. Thanks for
the reference. There is a variable INTNUM for interviewer number.
I also just discovered this, via closer reading of the documentation
accompanying the data. It may be useful in solving my problem:
The PDF file from the ESDS Government website that lists the names of
all the variables includes this entry:
POINT3 Sample point(3)
The (3) is a footnote that reads: "3 Variable renamed AREA in archived
dataset."
-codebook- shows that there are 720 unique values of AREA in the dataset.
Here is partial output of a tabulation:
tab area
     sample |
      point |      Freq.     Percent        Cum.
------------+-----------------------------------
        101 |         18        0.10        0.10
        102 |         28        0.15        0.25
        103 |         10        0.05        0.30
        104 |         15        0.08        0.39
        105 |         30        0.16        0.55
        106 |         28        0.15        0.70
        107 |         32        0.17        0.88
        108 |         18        0.10        0.97
        109 |         15        0.08        1.05
        110 |         23        0.13        1.18
        111 |         37        0.20        1.38
        112 |         14        0.08        1.46
I then found this on the Web at
http://www.ccsr.ac.uk/esds/events/hse/doyle.ppt
"In HSE, a multistage stratified probability design is used. Postcode
sectors are sorted by Health Authority and, within each HA by the % of
HHs where HoH is in a non-manual occupation. 720 postcode sectors are
selected with probability of selection proportional to number of
delivery points (or addresses) in each sector."
So is the AREA variable in the dataset perhaps the postal code? That
would be wonderful.
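
If AREA does turn out to be the sample point, something along these
lines might be a starting point. This is an untested sketch: it assumes
the variables are stored in lower case as area and intnum, and wt_int
and bmi are placeholder names for a weight and an outcome that I have
not checked against the dataset. No strata are declared, since none are
available in the archived data.

```stata
* Sketch only: assumes AREA really is the sample point (PSU).
* wt_int and bmi are placeholder names, not verified in the HSE file.

* How many distinct interviewers worked in each area?
egen byte pairtag = tag(area intnum)
bysort area: egen n_interviewers = total(pairtag)
egen byte areatag = tag(area)
tabulate n_interviewers if areatag

* Declare AREA as the PSU and run a design-based estimate.
svyset area [pweight = wt_int]
svy: mean bmi
```

The first part would show whether interviewer number and sample point
coincide (one interviewer per area) or whether they are distinct
sources of clustering.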
--
Christopher W. Ryan, MD
SUNY Upstate Medical University Clinical Campus at Binghamton
and Wilson Family Practice Residency, Johnson City, NY
cryanatbinghamtondotedu
Ulrich Kohler wrote:
> Christopher W. Ryan wrote:
>> "Thanks for your query. Yes you have understand correctly that postal
>> code is used as the PSU. Unfortunately you won't find this or strata in
>> the HSE datasets because of concerns over confidentiality. This is
>> something that we are going to raise with ONS and other data providers
>> as it is definitely one of the shortfalls with the datasets so thank you
>> for raising the issue. I'm sorry I can't bring you any better news."
>>
>> So knowing that the data are from a complex multistage sampling design,
>> but having no access to the psu information, what would be the best way
>> to proceed with analysis?
>
> You might consider using the interviewer number instead of the PSU identifier.
> Fieldwork institutes sometimes use just one interviewer per PSU, so that the
> interviewer number also identifies the PSU.
>
> A related point is that the interviewers are themselves a source of
> clustering. A recent publication by Schnell and Kreuter shows that
> interviewer effects can be stronger than PSU effects. Hence, there are
> reasons to look at the design-effects from the interviewers, anyway.
>
> @ARTICLE{schnell05b,
> author = {Schnell, Rainer and Kreuter, Frauke},
> year = {2005},
> title = { {S}eparating {I}nterviewer and {S}ampling-{P}oint {E}ffects },
> journal = { {J}ournal of {O}fficial {S}tatistics },
> volume = {21},
> number = {3},
> pages = {389--410},
> }
>
> Uli
>