Wolney Conde wrote (minus HTML and mailjunk):
> I want to perform a graphical analysis called "worm plot";
> it is a detrended plot, where one plots the difference between
> the empirical and theoretical distributions (y axis) against the
> theoretical distribution (x axis).
> I've obtained the empirical Z distribution and I need to generate
> the theoretical distribution within age ranges. I have tried the
> following:
> egen ndist_t = rank(emp_z),by(agerange)
> egen Ndist_t = count(emp_z),by(agerange)
> gen zdist_t = n/(N+1)
> replace zdist_t = invnorm(zdist_t)
> At the end I imagine to have obtained the theoretical distribution
> to my (Z) sample.
> Am I right? If so, is there some way more appropriated to do it?
Here by "theoretical" Wolney evidently means "expected if the data
followed a Gaussian (normal) distribution".
Part of this problem is obtaining the plotting positions
for the graphs. This is discussed in an FAQ:
How can I calculate percentile ranks?
How can I calculate plotting positions?
http://www.stata.com/support/faqs/stat/pcrank.html
That FAQ uses a very similar method employing -egen-.
A refinement is that (in Wolney's notation) n / (N + 1)
is not an especially good choice for this problem,
although a sufficiently large sample size would
make all choices practically equivalent.
(n - 0.375) / (N + 0.25) is another possibility, and one
that approximates the expected values of Gaussian order
statistics more closely. For more, see
the FAQ above and its references.
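To make this concrete, here is a minimal sketch of the whole calculation, not a definitive recipe. It assumes that emp_z and agerange are the variables in Wolney's data, as in the quoted code; every other variable name is made up for illustration. Note that the quoted -gen- statement refers to n and N, whereas the -egen- calls created ndist_t and Ndist_t, so the names need to agree. The function invnormal() is the name in current Stata; older releases call it invnorm().

```stata
* Sketch only: emp_z and agerange come from the original question;
* rank_z, n_z, pp, z_expected and z_diff are hypothetical names.

* rank and count of non-missing values within each age range
egen rank_z = rank(emp_z), by(agerange)
egen n_z = count(emp_z), by(agerange)

* plotting positions (rank - 0.375) / (count + 0.25),
* then the corresponding Gaussian quantiles
gen pp = (rank_z - 0.375) / (n_z + 0.25)
gen z_expected = invnormal(pp)

* detrended ("worm") plot: observed minus expected, against expected
gen z_diff = emp_z - z_expected
scatter z_diff z_expected, by(agerange) yline(0)
```

By default rank() gives tied values the mean of their ranks; its unique option breaks ties arbitrarily instead, if distinct plotting positions are wanted.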
Nick