Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Subject: RE: st: RE : RE: merging two kernel density graphs into one
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Date: Thu, 27 Oct 2011 12:22:26 +0100
A key point about densities is that they integrate to 1 over the support of the function. Once you add densities you get something that no longer integrates to 1, but as long as you weight properly (with weights that sum to 1) you should get sensible results.
That's a matter of principle but things get messier when you do this as you must for a finite set of points. Everything then also depends on nitty-gritty such as where you evaluate the density and whether some of the density gets lost in the far tails. Also, with income I would expect difficulties of the kind associated with any very skewed distribution, even with the same bandwidth for all estimations.
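A minimal numerical sketch of both points, written in Python rather than Stata purely to illustrate the arithmetic. The income values, the equal population shares, the bandwidth, and all function names are made up for illustration and are not taken from the thread. It assumes a Gaussian kernel with the same bandwidth for both countries and checks the area under the curve with the trapezoidal rule on two evaluation grids.

```python
import math

def gauss_kde(data, h, x):
    """Gaussian kernel density estimate at point x with bandwidth h."""
    c = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return c * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

def trapezoid(ys, step):
    """Trapezoidal rule on an equally spaced grid."""
    return step * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

def grid(lo, hi, n):
    """n equally spaced points from lo to hi, plus the spacing."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)], step

# Made-up incomes for two hypothetical countries (illustrative only).
a = [1.0, 2.0, 2.5, 3.0, 4.0]
b = [2.0, 5.0, 9.0, 20.0, 60.0]   # strongly right-skewed
w_a, w_b = 0.5, 0.5               # assumed population shares, summing to 1
h = 1.0                           # same bandwidth for both estimates

def mixture(x):
    """Population-weighted combination of the two country densities."""
    return w_a * gauss_kde(a, h, x) + w_b * gauss_kde(b, h, x)

def unweighted_sum(x):
    """Plain sum of the two densities, which is not itself a density."""
    return gauss_kde(a, h, x) + gauss_kde(b, h, x)

xs_wide, step_wide = grid(-10.0, 80.0, 1801)    # covers both tails
xs_narrow, step_narrow = grid(0.0, 30.0, 601)   # cuts off the upper tail

area_mixture_wide = trapezoid([mixture(x) for x in xs_wide], step_wide)
area_sum_wide = trapezoid([unweighted_sum(x) for x in xs_wide], step_wide)
area_mixture_narrow = trapezoid([mixture(x) for x in xs_narrow], step_narrow)

print(area_mixture_wide, area_sum_wide, area_mixture_narrow)
```

On the wide grid the weighted combination integrates to very nearly 1 and the unweighted sum to very nearly 2. On the narrow grid the weighted combination falls noticeably short of 1, because the mass around the largest income and the mass below zero lie outside the evaluation range.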
On a different level: What you are doing is sketched only in very general terms, so I think it's difficult for anyone to make precise comments.
Nick
[email protected]
Wies Kestens
This is indeed a methodological exercise and is not intended to be a strict or good approximation of reality.
The countries will be weighted by their population but as this is the easier part of the problem, in my opinion, I won't be needing your assistance and didn't want to bother you with it.
When I pooled all the income data, I got an income density that indeed still varied with the bandwidth chosen. However, I don't think this method yields the same income density as the one I would get if I could add up all the individual countries' income densities, though I'm not really sure about that. If the effect of different bandwidths were indeed the same, then my problem would be solved. Can anybody enlighten me on that?
Nick Cox [[email protected]]
What you want is clearer, but I wouldn't approach it your way at all.
If you pool income data for all countries, presumably weighting by
population (which you don't mention, but the problem seems meaningless
otherwise), you can then smooth that once to get a single density
curve. Adding the densities of (again presumably hundreds of)
countries, even with weighting, seems unnecessarily complicated by
comparison.
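On the question of whether pooling gives the same curve as adding country densities, a small sketch may help. It is in Python, not Stata, and everything in it (the two countries, their incomes, the population shares, the bandwidths, the function names) is hypothetical. It assumes a Gaussian kernel and that each country's population share is split equally over that country's observations. Under those assumptions the pooled, weighted estimate and the population-weighted sum of per-country estimates are the same function whenever one common bandwidth is used, and they differ once each country gets its own bandwidth.

```python
import math

def kernel(u):
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def weighted_kde(data, weights, h, x):
    """Weighted Gaussian KDE at x; weights are assumed to sum to 1."""
    return sum(w * kernel((x - xi) / h) for xi, w in zip(data, weights)) / h

# Hypothetical countries with made-up incomes and population shares.
countries = {
    "A": {"pop_share": 0.7, "income": [1.0, 2.0, 2.5, 3.0, 4.0]},
    "B": {"pop_share": 0.3, "income": [2.0, 5.0, 9.0, 20.0]},
}

def sum_of_country_densities(x, bandwidths):
    """Estimate each country separately, then add with population weights."""
    total = 0.0
    for name, c in countries.items():
        n = len(c["income"])
        equal = [1.0 / n] * n
        total += c["pop_share"] * weighted_kde(c["income"], equal, bandwidths[name], x)
    return total

# Pool all observations; each one carries pop_share / n of its country.
pooled_data, pooled_weights = [], []
for c in countries.values():
    n = len(c["income"])
    pooled_data += c["income"]
    pooled_weights += [c["pop_share"] / n] * n

def pooled_density(x, h):
    """Smooth the pooled, weighted data once."""
    return weighted_kde(pooled_data, pooled_weights, h, x)

xs = [0.5 * i for i in range(61)]   # evaluation points from 0 to 30
h = 1.5

# Largest pointwise gap between the two approaches.
gap_common = max(
    abs(pooled_density(x, h) - sum_of_country_densities(x, {"A": h, "B": h}))
    for x in xs
)
gap_mixed = max(
    abs(pooled_density(x, h) - sum_of_country_densities(x, {"A": 0.5, "B": 4.0}))
    for x in xs
)

print(gap_common, gap_mixed)
```

With a common bandwidth the gap is zero up to floating-point rounding, so comparing bandwidths on the pooled data loses nothing relative to adding the curves. The two approaches only part ways when bandwidths are chosen country by country.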
Your statement that -generate()- is limited to producing 10 data
points I think only makes sense in one context, in which that is the
default because you have just 10 data points. But if that is so, I
wouldn't apply density estimation at all. Also, if you are starting
with deciles alone you have already lost much of the detail and it is
optimistic to suppose that kernel density estimation can put it back.
However, as you hint, the exercise could become an essay on limitations
of method.
There are people on this list who are top experts in income
distributions who may want to add to this (or subtract from it).
Nick
On Tue, Oct 25, 2011 at 10:49 AM, Wies Kestens
> Using the -kdensity- command, I obtain individual countries' income densities. I'm trying to estimate the world income density by adding up all these individual countries' income densities. It's important that I work with the estimated income densities and not just with the decile shares for each country because I want to compare the effect of different choices of bandwidth and such on the global income density.
>
> However, I can't figure out how to add up these different income densities.
>
> The -generate()- option is another approach to the problem. If I could extract 1000 or more points from the income density, that would be the solution, as that would be a fairly good approximation of the estimated income density. But the -generate()- option only extracts 10 points from each density and therefore doesn't help me.
Nick Cox [[email protected]]
> -save graph- is not legal Stata syntax.
>
> I don't understand the request. Probability [not population] density curves for quite different data can be superimposed, but usually they can not be combined otherwise. Much depends on what is meant by "combine".
>
> In addition, you appear to be confusing density with cumulative probability.
>
> -kdensity- has -generate()- options which can be used to keep the results. Then you can superimpose the curves on one graph.
Wies Kestens
> My problem concerns merging different graphs in Stata.
> Each graph is made using the -kdensity- command and then saved.
> For example:
> kdensity algeria
> save graph algeria
> kdensity albania
> save graph albania
>
> The y-axis of these graphs shows the population density and the x-axis shows the income.
> I would like to merge those two graphs into one graph which would then
> give the combined population density on the Y-axis and the income on the
> x-axis.
> For example:
> when 50% of the people in Albania and 100% of the people in Algeria would
> earn 100$, the combined graph would state that 75% of the people earn
> 100$, given they both have the same population.
>
> It seems that no command I know of, or could find information about, is able to do this.
>
> Can someone point me in the right direction?