Re: st: Quintiles
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Quintiles
Date: Thu, 9 Aug 2012 11:38:35 +0100
If I read this correctly, Leonardo agrees that exactly equal
frequencies may be impossible with -xtile- but wants to appear to do
it exactly by subterfuge, using weights.
This can be done:
. sysuse auto
. xtile qmpg = mpg, n(5)
. tab qmpg
5 quantiles |
     of mpg |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         18       24.32       24.32
          2 |         17       22.97       47.30
          3 |         13       17.57       64.86
          4 |         12       16.22       81.08
          5 |         14       18.92      100.00
------------+-----------------------------------
      Total |         74      100.00
. bysort qmpg : gen w = 1/_N
. tabstat w , by(qmpg) s(n sum)
Summary for variables: w
     by categories of: qmpg (5 quantiles of mpg)

    qmpg |         N       sum
---------+--------------------
       1 |        18         1
       2 |        17         1
       3 |        13         1
       4 |        12         1
       5 |        14         1
---------+--------------------
   Total |        74         5
------------------------------
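As a minimal sketch of how such weights might then be used, compare the unweighted and weighted tabulations below. Treating w as an analytic weight (aweight) is my assumption for illustration only; the original question did not say which kind of weight was intended.

```stata
* Sketch only: rebuild the 1/_N weights from the example above
sysuse auto, clear
xtile qmpg = mpg, n(5)
bysort qmpg : gen w = 1/_N

* Unweighted: group sizes differ across the five groups
tabulate qmpg

* Weighted: each group carries the same total weight,
* so the reported percentages are equal across groups
tabulate qmpg [aweight = w]

* A weighted summary in which each group has equal total influence
summarize price [aweight = w]
```

The weights change only how much each observation counts; the observations still belong to the same unequal-sized groups, so the equality is cosmetic.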
However, why is exact equality such a big deal here? Why coarsen when
you have quantitative information to hand?
See also the thread gathered in
http://www.stata.com/statalist/archive/2012-06/msg01193.html on how
-xtile- on a negated version of a variable may (or may not) work
better.
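A rough sketch of that negation idea on the auto data follows. The variable names negmpg and qneg are made up for illustration, and whether the resulting groups are any closer to equal in size depends entirely on where the ties fall.

```stata
* Sketch only: compare -xtile- on mpg with -xtile- on negated mpg
sysuse auto, clear
xtile qmpg = mpg, n(5)

* Bin the negated variable, so ties at the cut points fall the other way
generate negmpg = -mpg
xtile qneg = negmpg, n(5)

* Reverse the codes so that 1 is again the lowest-mpg group
replace qneg = 6 - qneg

* Cross-tabulate to see which observations change group
tabulate qmpg qneg
```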
Nick
On Thu, Aug 9, 2012 at 9:16 AM, Maarten Buis <[email protected]> wrote:
> On Wed, Aug 8, 2012 at 9:44 PM, Leonardo Jaime Gonzalez Allende wrote:
>> I was not planning to cut a person or household into several parts. The question was about a possible adjustment to the weight factor when a sample observation falls at the cut point of a quintile.
>>
>> If I sort the households of a sample by their incomes, a household "x" could represent 300 households, while the cumulative frequency of the population is, e.g., 20.02%.
>>
>> My question was whether there is an efficient way (a command) to duplicate the observation and adjust the weight factor as follows:
>>
>> the same household "xa" now represents 280 households, and the cumulative frequency of the population is now exactly 20%, e.g. (leaving it in the first quintile).
>
> What kind of weight did you have in mind: aweights, pweights,
> iweights, fweights? Weighting can be a remarkably tricky issue. There
> are many ways such a procedure could go wrong, and I don't know if
> there is a way to get it right. Anyhow, I cannot imagine a situation
> where such an effort would be worth the cost (but that may just as
> well say something about a lack of imagination on my part). I would
> just live with the fact that the discrete nature of the number of
> observations leads to slight variations in group size.
>
> Did you look at the possibility that ties (different people reporting
> exactly the same income) are the source of differences in group size?
> In theory, such ties should be pretty rare for a (semi-)continuous
> variable like income. However, in practice respondents tend to round
> their answers, making such ties a lot more common.
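
One way to check for ties of the kind Maarten describes, sketched on the auto data with mpg standing in for income (the variable name nties is made up for illustration):

```stata
* Sketch only: how common are tied values of the variable being binned?
sysuse auto, clear

* For each observation, count how many observations share its mpg value
bysort mpg : generate nties = _N

* Values of nties above 1 indicate tied observations
tabulate nties

* -duplicates report- gives a similar summary in one command
duplicates report mpg
```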