Re: Re: st: question on distribution of values
By "unique" here, understand "distinct".
This may be one of those mid-Atlantic linguistic problems
differentiating English and American.
After all, the Unix utility uniq, invented in New Jersey, removes
adjacent duplicate lines, and so leaves just one copy of each of the
distinct lines in a sorted file. By default it does not identify lines
that occur just once in the file.
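The two readings can be told apart in Stata. What follows is only a minimal sketch on made-up data; the variable names -isfirst- and -once- are illustrative and not from the original question.

```stata
clear
input str5 zip
"00011"
"00011"
"00012"
end

* "distinct" reading: flag exactly one observation per distinct value of zip
egen isfirst = tag(zip)

* "occurs just once" reading: flag values held by exactly one observation
bysort zip : gen byte once = _N == 1

count if isfirst
count if once
```

Here there are two distinct values of -zip-, but only one value that occurs just once.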
Tim's suggestion is illegal in Stata, as only one -egen- function is
allowed on the RHS of an -egen- command.
It would not be correct if it were legal, as -egen, count()- does not
count distinct values. There is a function in -egenmore- from SSC that
does, but official Stata suffices here.
First, tag each distinct co-occurrence of -order- and -zip-:
egen tag = tag(order zip)
Now sum within -order-:
egen distinct = sum(tag), by(order)
OR, using the current name of the same function,
egen distinct = total(tag), by(order)
Now you are home and dry:
gen average_pkg_per_zip = qt / distinct
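For concreteness, here is a sketch of those steps run on the example data from the original question. It assumes the variables are named -order-, -qt- and -zip- in lower case, and that -zip- is held as a string so that leading zeros survive; neither detail is stated in the question.

```stata
clear
input order qt str5 zip
1 5 "00011"
1 5 "00012"
1 5 "00013"
1 5 "00014"
2 3 "00021"
2 3 "00023"
3 8 "00031"
3 8 "00035"
3 8 "00036"
end

* tag is 1 for one observation of each distinct (order, zip) pair, 0 otherwise
egen tag = tag(order zip)

* summing the tags within order gives the number of distinct zips per order
egen distinct = total(tag), by(order)

gen average_pkg_per_zip = qt / distinct
list order qt zip distinct average_pkg_per_zip, sepby(order)
```

Order 1 has four distinct zip codes, so its 5 packages come out at 1.25 per zip code, the figure given in the question.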
It took me several years to realise that the -nvals()-
function in -egenmore- was pretty much redundant
given the -tag()- function of -egen- that I introduced
earlier (although did not really invent).
Without -egen- this is
bysort order zip : gen byte tag = _n == 1
by order : gen distinct = sum(tag)
by order : replace distinct = distinct[_N]
gen average_pkg_per_zip = qt / distinct
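A by:-based route is easy to check against -egen- on data in which a zip code is repeated within an order. The sketch below uses made-up data and the illustrative names -isfirst- and -distinct2-. The key point is that the flag is 1 only on the first observation of each order-zip pair, so that repeats are not counted.

```stata
clear
input order qt str5 zip
1 5 "00011"
1 5 "00011"
1 5 "00012"
2 3 "00021"
end

* by: route, flagging only the first observation of each (order, zip) pair
bysort order zip : gen byte isfirst = _n == 1
by order : gen distinct = sum(isfirst)
by order : replace distinct = distinct[_N]

* egen route, for comparison
egen tag = tag(order zip)
egen distinct2 = total(tag), by(order)

* the two routes should agree in every observation
assert distinct == distinct2
list, sepby(order)
```

Order 1 has three observations but only two distinct zip codes, so both routes should report 2 for it.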
Nick
Timothy Mak
What about:
bysort order: egen qt2 = mean(Qt) / count(Qt)
Andrea King
Here's an example of the data I'm working with:
Order#  Qt  Zip
1       5   00011
1       5   00012
1       5   00013
1       5   00014
2       3   00021
2       3   00023
3       8   00031
3       8   00035
3       8   00036
Here are my problems:
1. The quantity of packages (Qt) listed does not correspond directly to
the zip code. For example, Order #1 requested 5 packages, to be
distributed among four zip codes, or 1.25 packages per unique
zip code, not 5 packages per zip code.
2. I have yet to find the correct syntax that would allow me to create a
variable that would show the distribution of Qt among the zip codes.
I've played with egen, but can't get it to work.
So my question is:
How can I take one value of Qt (or if needed, an average of Qt) within
each unique Order# and divide it by the number of unique zip codes in
that order? Also, if it helps, the order number is listed each
time the zip code changes, so a count of Order# would probably work,
too, but I'd prefer to do it by Zip.