Statalist The Stata Listserver



Re: st: specify groups within a group


From   "Neil Shephard" <[email protected]>
To   [email protected]
Subject   Re: st: specify groups within a group
Date   Sat, 9 Dec 2006 12:50:19 +0900

On 12/9/06, Lucy Shum <[email protected]> wrote:
> Hi, Thanks for the help. Could somebody expand a little on the egen ... =
> cut(age), group(4) command? I'm not sure how to interpret this in "English".

-man egen- explains it in English.
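As a rough illustration of what the help file describes: with the group(4) option, -egen ..., cut()- splits the variable into 4 groups holding roughly the same number of observations each, and codes them 0, 1, 2, 3. The sketch below uses made-up ages, not the Catheter.dta data, and the variable name agegrp is only an assumption for the example. (-runiform()- needs a reasonably recent Stata; older versions call it -uniform()-.)

```stata
* Hypothetical data: 200 made-up ages (not the poster's Catheter.dta)
clear
set seed 2006
set obs 200
generate age = floor(20 + 60*runiform())

* Cut age into 4 groups of roughly equal size, coded 0, 1, 2, 3
egen agegrp = cut(age), group(4)

* Check the group sizes and the age range that each group covers
tabulate agegrp
tabstat age, by(agegrp) statistics(min max n)
```

The -tabstat- output shows where the cut points ended up, which is usually the easiest way to see what the command did.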

> Further, I am also stumped about the first line where it says:
>
> .. keep if _n==_N (so it's saying to keep the current patient observation (2
> entries per patient in the Catheter.dta file) as long as it is the same as
> the total number of observations in the dataset?) - doesn't make sense to me.

Taken on its own, that wouldn't make sense at all, since you would end up
with only one observation (the last one in the dataset).
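A quick way to convince yourself of this, as a sketch on a made-up dataset of 10 observations (the variable id is hypothetical): without a by-prefix, _n is the row number in the whole dataset and _N is the total number of rows, so the condition is true for the final row only.

```stata
* Hypothetical data: 10 observations numbered 1 to 10
clear
set obs 10
generate id = _n

* No by-prefix: _n == _N is true only for the very last row
keep if _n == _N

* Only the observation with id == 10 is left
list
```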

However in this example

. bysort patient (time): keep if _n == _N

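the -keep- is prefixed by -bysort-, so it is applied separately to each
group formed by the variable patient. Within each group, _n is the
observation's position and _N is the number of observations in that group.
If there are five observations on patient one and four on patient two,
then only the last time each was seen is retained by the -keep- command.
The (time) in the above statement sorts the observations within each
patient group in ascending order of time, without using time to define
the groups.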
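The role of the parentheses is easiest to see by leaving them out. The sketch below uses a made-up mini dataset (two patients, five visits) purely for illustration; -preserve- and -restore- are only there so both variants can be run on the same data.

```stata
* Hypothetical mini dataset: two patients, visit times not in order
clear
input patient time
1 3
1 1
1 2
2 2
2 1
end

* (time) in parentheses: groups are patients, time only sets the order,
* so the latest visit of each patient is kept
preserve
bysort patient (time): keep if _n == _N
list
restore

* No parentheses: every patient-time pair is its own group of size 1,
* so _n == _N is true everywhere and nothing is dropped
bysort patient time: keep if _n == _N
list
```

The first -list- shows one row per patient; the second still shows all five rows.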

To demonstrate how this works consider the data set below

. list

    +----------------+
    | patient   time |
    |----------------|
 1. |       1      3 |
 2. |       1      4 |
 3. |       1      1 |
 4. |       1      2 |
 5. |       1      5 |
    |----------------|
 6. |       2      4 |
 7. |       2      2 |
 8. |       2      3 |
 9. |       2      1 |
10. |       3      6 |
    |----------------|
11. |       3      1 |
12. |       3      5 |
13. |       3      2 |
14. |       3      4 |
15. |       3      3 |
    |----------------|
16. |       3      7 |
    +----------------+
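To follow along, the listing above can be typed in with -input-; this is just the same 16 rows in the same order. The lines in the listing fall every five rows because that is -list-'s default separator, not because of the patient grouping; the -sepby()- option draws them between patients instead.

```stata
* Recreate the example data exactly as listed above
clear
input patient time
1 3
1 4
1 1
1 2
1 5
2 4
2 2
2 3
2 1
3 6
3 1
3 5
3 2
3 4
3 3
3 7
end

* Default listing (separator every 5 rows), then one separated by patient
list
list, sepby(patient)
```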

* Sorting the data does this...

. sort patient time

. list

    +----------------+
    | patient   time |
    |----------------|
 1. |       1      1 |
 2. |       1      2 |
 3. |       1      3 |
 4. |       1      4 |
 5. |       1      5 |
    |----------------|
 6. |       2      1 |
 7. |       2      2 |
 8. |       2      3 |
 9. |       2      4 |
10. |       3      1 |
    |----------------|
11. |       3      2 |
12. |       3      3 |
13. |       3      4 |
14. |       3      5 |
15. |       3      6 |
    |----------------|
16. |       3      7 |
    +----------------+

* Generate an indicator of which observation comes in which order
based on patient and time

. bysort patient (time) : gen _ = _n

. list

    +--------------------+
    | patient   time   _ |
    |--------------------|
 1. |       1      1   1 |
 2. |       1      2   2 |
 3. |       1      3   3 |
 4. |       1      4   4 |
 5. |       1      5   5 |
    |--------------------|
 6. |       2      1   1 |
 7. |       2      2   2 |
 8. |       2      3   3 |
 9. |       2      4   4 |
10. |       3      1   1 |
    |--------------------|
11. |       3      2   2 |
12. |       3      3   3 |
13. |       3      4   4 |
14. |       3      5   5 |
15. |       3      6   6 |
    |--------------------|
16. |       3      7   7 |
    +--------------------+

* Generate an indicator of how many observations there are on each patient

. bysort patient (time) : gen __ = _N

. list

    +-------------------------+
    | patient   time   _   __ |
    |-------------------------|
 1. |       1      1   1    5 |
 2. |       1      2   2    5 |
 3. |       1      3   3    5 |
 4. |       1      4   4    5 |
 5. |       1      5   5    5 |
    |-------------------------|
 6. |       2      1   1    4 |
 7. |       2      2   2    4 |
 8. |       2      3   3    4 |
 9. |       2      4   4    4 |
10. |       3      1   1    7 |
    |-------------------------|
11. |       3      2   2    7 |
12. |       3      3   3    7 |
13. |       3      4   4    7 |
14. |       3      5   5    7 |
15. |       3      6   6    7 |
    |-------------------------|
16. |       3      7   7    7 |
    +-------------------------+

* Retain the last observation on each patient based on time (which
will be where the value of _ is equal to __)

. keep if _ == __
(13 observations deleted)

. list

    +-------------------------+
    | patient   time   _   __ |
    |-------------------------|
 1. |       1      5   5    5 |
 2. |       2      4   4    4 |
 3. |       3      7   7    7 |
    +-------------------------+

The original one-line command does all of this on the fly, without
creating the two helper variables.
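If you would rather mark the last visit than throw the other observations away, the same _n == _N test can be stored in a variable. This is only a sketch on a made-up mini dataset; the names last and maxtime are assumptions for the example. The -egen- line is a cross-check that relies on times being unique within patient.

```stata
* Hypothetical mini dataset: two patients, visit times not in order
clear
input patient time
1 3
1 1
1 2
2 2
2 1
end

* Flag (1/0) the last visit of each patient instead of dropping the rest
bysort patient (time): generate byte last = (_n == _N)

* Cross-check: the flagged rows are those at the patient's maximum time
egen maxtime = max(time), by(patient)
assert last == (time == maxtime)

list, sepby(patient)
```

Keeping the flag leaves the full history available, and -keep if last- can still be run later.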

HTH

Neil
--
"Doing science for the money is like having sex for the exercise." - Matt

Email - [email protected] / [email protected]
Website - http://slack.ser.man.ac.uk/
Photos - http://www.flickr.com/photos/slackline/


