Re: st: SE and CI by mrtab

From	Nick Cox <[email protected]>
To	[email protected]
Subject	Re: st: SE and CI by mrtab
Date	Tue, 15 May 2012 10:17:18 +0100

That sounds a plausible first approximation, but the data generation
process varies. To make matters concrete, let's imagine a question
asked of n persons:

Which statistical software do you use routinely?

However, the protocol can vary:

1. People can name as many distinct programs as they like.

2. People should name precisely k distinct programs. (Perhaps a bit
unlikely with this particular question, but bear with me.)

3. People may name up to k distinct programs.

1', 2', 3'. As above, but the order of mention is important.

In practice these protocols all lead to datasets in which responses
are stored as several variables for analysis. (The exception in which
(e.g.) "Stata R SAS" is packed into a single string variable is not
much of an exception, as the contents need to be unpacked to do much
with them.)
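
A minimal sketch of that unpacking, assuming a hypothetical string variable software that holds space-separated program names. The variable names and the toy responses are illustrative only; they are not from any dataset in this thread.

```stata
* Toy data (hypothetical): one packed string of program names per person
clear
input str20 software
"Stata R SAS"
"R"
"Stata SPSS"
end

* One 0/1 indicator per program. Padding both strings with spaces
* makes the match whole-word, so "R" does not match inside "SPSS".
foreach p in Stata R SAS SPSS {
    generate byte uses_`p' = strpos(" " + software + " ", " `p' ") > 0
}

list
```

Once the responses are indicators like these, they have the same layout as the several-variables case.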

Now, it seems key to me that the number of persons is an upper limit
on the number of mentions of a particular program, so the percent of
mentions of a particular answer is bounded not by 100 but by some
lower limit (under protocol 2, for example, by 100/k). (What to do
about the enthusiast who says "Stata Stata Stata" is a practical
point, just as what to do about the enthusiast who says
"Manchester City".)

Also, although this may well be a much smaller point, the
interpretation of missing values differs between these protocols.

Nick

On Mon, May 14, 2012 at 10:43 PM, Steve Samuels <[email protected]> wrote:
> Each "percentage" has the form P = (mentions of category X)/(number of
> mentions). Numerator and denominator are random for each person, so the
> percentages are actually ratios:
>
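> ******************************
>  * Load the example data and reproduce the -mrtab- table
>  use http://fmwww.bc.edu/RePEc/bocode/d/drugs.dta, clear
>  mrtab inco1-inco7, include title(Sources of income) width(24)
>  * Number of mentions per person (the random denominator)
>  egen sumi = rowtotal(inco*)
>  * Ratio of inco1 mentions to all mentions, with SE and CI
>  ratio inco1/sumi
> *****************************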
>
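
To see what such a ratio estimate amounts to, here is a minimal sketch on toy data. The five observations and the name share1 are hypothetical. The standard error that -ratio- reports treats persons as the sampling units; whether that suits a given response protocol is a separate question.

```stata
* Toy data (hypothetical): 5 persons, 3 possible sources, 1 = mentioned
clear
input inco1 inco2 inco3
1 0 0
1 1 0
0 1 1
0 1 0
1 1 1
end

* Number of mentions per person: the denominator varies across persons
egen sumi = rowtotal(inco1-inco3)

* Ratio estimate of (mentions of inco1)/(all mentions), with SE and CI
ratio share1: inco1/sumi

* The same point estimate by hand: 3 mentions of inco1 out of 9 mentions
quietly summarize inco1
scalar num = r(sum)
quietly summarize sumi
display "by hand: " %6.4f num / r(sum)
```

The hand calculation gives 0.3333, the total of the numerator variable divided by the total of the denominator variable.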
> Since Abu is knowledgeable about SPSS, I'd appreciate a reference to the confidence interval formulas that SPSS uses when percentages add to more than 100%.  (I couldn't find one in the SPSS 16 algorithms manual.)  I'd appreciate it also if he would compare the calculation above to the one that SPSS reports.
>
> Steve
> [email protected]
>
>
> On May 14, 2012, at 8:40 AM, Nick Cox wrote:
>
> If a program is counting mentions, they are not people. Either way, I stand by what I said. I don't think even "sample size" is well defined for such data, so I don't see how inference is well defined.
>
> I can't comment on what SPSS does, but I repeat my request. I would be grateful for literature references showing that SPSS, or anybody else, really has a solution for this problem. Just counting mentions regardless of where they come from sounds somewhere between dubious and fallacious to me.
>
> The author of -mrtab- is Ben Jann, who is not a member of Statalist. If you want his answers, you need to write to him directly.
>
> Nick
> [email protected]
>
> Abu Camara
>
> Thanks Nick. Consider the table below, and suppose you want the "se"
> and "ci" for the responses, which are percentages. I was able to do
> this for other survey questions that are not multiple responses.
> Perhaps the author might consider adding standard errors and
> confidence intervals to his program. I will have to turn to SPSS,
> which has the facility.
>
> --------------------------------------------------------------------------------------------------------.
>
> mrtab inco1-inco7, include title(Sources of income) width(24)
>
>                                |                Pct. of     Pct. of
>              Sources of income |      Freq.   responses       cases
> -------------------------------+-----------------------------------
>  inco1         private support |        226       12.83       23.25
>              (partner, family, |
>                       friends) |
>  inco2          public support |        607       34.47       62.45
>       (unemployment insurance, |
>               social benefits) |
>  inco3            drug dealing |        293       16.64       30.14
>  inco4   housebreaking, theft, |         50        2.84        5.14
>                        robbery |
>  inco5            prostitution |         82        4.66        8.44
>  inco6      "mischeln"/begging |        151        8.57       15.53
>  inco7        legal occupation |        352       19.99       36.21
> -------------------------------+-----------------------------------
>                          Total |       1761      100.00      181.17
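
The two percentage columns differ only in their denominators, which can be checked from the figures in the table. The number of cases is not printed in the table; the value used below is an assumption, inferred from 1761 mentions at 1.8117 mentions per case.

```stata
* Figures taken from the inco1 row and the Total row of the table
local freq  = 226
local total = 1761

* ASSUMPTION: number of cases inferred as 1761/1.8117, rounded to 972
local cases = round(`total' / 1.8117)

* Denominator = all mentions
display "Pct. of responses = " %5.2f 100 * `freq' / `total'
* Denominator = persons
display "Pct. of cases     = " %5.2f 100 * `freq' / `cases'
```

This reproduces 12.83 and 23.25 for inco1. The first column has mentions as its denominator and the second has persons, which is why only the first sums to 100.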
>
>
> On 14 May 2012 14:35, Nick Cox <[email protected]> wrote:
>> I don't really have further comments. I was half-assuming that you know exactly what you seek, but if so you are not spelling it out.
>>
>> As I see it, you would need to specify what data generation process you expect to apply and e.g. how confidence intervals are to be defined and calculated.
>>
>> For example, if the question is mode of transport to work and the answers look like
>>
>> Car
>> Car, train, walk
>> Walk
>> Yak
>> Horse
>> Camel
>> Personal helicopter
>> ...
>>
>> it is not clear to me what meaning there could be to a standard error around the percent of people who say "walk". If the principle is that people can specify a variety of answers, the associated data generation process seems elusive to me. You can always count "mentions" rather than "people" but the inference for that I don't think is obvious.
>>
>> So, I don't think you can blame Stata for neglecting this area unless you can point to literature in which the logic is explained.
>>
>> Nick
>> [email protected]
>>
>> Abu Camara
>>
>> Hi Nick,
>>
>> Thanks for the reply.
>> I have no idea how to write my own program for "mrtab" to compute
>> "se" & "ci". Further help or suggestions would be appreciated.
>> Official Stata appears to be weak in complex tabulation.
>> Abu.
>>
>> On 14 May 2012 12:18, Nick Cox <[email protected]> wrote:
>>> SJ-5-1  st0082  . . . . . . . . . . . . . . . Tabulation of multiple responses
>>>        (help _mrsvmat, mrgraph, mrtab if installed)  . . . . . . . .  B. Jann
>>>        Q1/05   SJ 5(1):92--122
>>>        introduces new commands for the computation of one- and
>>>        two-way tables of multiple responses
>>>
>>> You are correct, I think. -mrtab- doesn't provide these, so you may
>>> need to write your own program.
>>>
>>> Nick
>>>
>>> On Mon, May 14, 2012 at 10:09 AM, Abu Camara <[email protected]> wrote:
>>>
>>>> I am running one- and two-way tables of multiple responses using the
>>>> user-written command "mrtab" (Stata 11.2). I tried to generate both
>>>> standard errors and confidence intervals for tables of percentages,
>>>> but I could not find this as an option.
