Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Need help for calculation across observations within variable
From
Nick Cox <[email protected]>
To
"[email protected]" <[email protected]>
Subject
Re: st: Need help for calculation across observations within variable
Date
Tue, 21 May 2013 14:40:14 +0100
What's efficiency here?
If it's machine time, in principle you should not use -egen-. In
practice, it would take a big dataset or many repetitions to notice
the slow-down, likely to be less than the time taken to write
alternative code.
On (1), whether there is a difference:
If it's not machine time, but conciseness or simplicity of code, consider
bysort pt_name (year) : gen different = year[_N] != year[1]
except that a large group of Stata users might not agree on how
transparent that is.
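As a rough check, the one-liner can be tried on the example data from the question. This is only a sketch: it assumes pt_name and year are both held as strings (the question says so only for year), and the storage types str3 and str4 are my choice.

```stata
* Example data copied from the original question.
* Assumption: both variables are strings; only year is stated to be one.
clear
input str3 pt_name str4 year
"111" "2009"
"111" "2009"
"111" "2009"
"111" "2011"
"222" "2009"
"222" "2009"
"222" "2010"
end

* Sort on year within patient. After sorting, the first and last values
* differ if and only if the patient has more than one distinct year.
bysort pt_name (year) : gen byte different = year[_N] != year[1]

list, sepby(pt_name)
```

Both patients in the example have more than one year, so different is 1 in every observation. The comparison works the same way if year is numeric.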
This particular question is also an FAQ:
http://www.stata.com/support/faqs/data-management/listing-observations-in-group/
On (2), the number of distinct values, there is a detailed discussion in
SJ-8-4  dm0042  Speaking Stata: Distinct observations
        (help distinct if installed)
        N. J. Cox and G. M. Longton
        Q4/08   SJ 8(4):557--568
        shows how to answer questions about distinct observations
        from first principles; provides a convenience command
Your solution is a good one.
Here is another:
egen tag = tag(pt_name year)
egen max = total(tag), by(pt_name)
Am I being consistent about -egen-? This is how I resolve it:
1. Interactively, I will often use -egen- if an -egen- solution springs to mind.
2. In a program, I know I should rewrite -egen- calls to the extent
that a program is needed for serious or repeated use.
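For point 2, one way to count distinct years per patient without -egen- is to tag the first observation of each patient-year pair and then sum the tags within patient. This is a sketch along first-principles lines, not necessarily the code the author had in mind; the variable names first and ndistinct are illustrative, and the data are the example from the question with both variables assumed to be strings.

```stata
* Example data copied from the original question.
* Assumption: both variables are strings; only year is stated to be one.
clear
input str3 pt_name str4 year
"111" "2009"
"111" "2009"
"111" "2009"
"111" "2011"
"222" "2009"
"222" "2009"
"222" "2010"
end

* Tag exactly one observation for each distinct (pt_name, year) pair.
bysort pt_name year : gen byte first = _n == 1

* Running sum of tags within patient; the data are already sorted
* by pt_name year, so -by pt_name:- needs no further sort.
by pt_name : gen ndistinct = sum(first)

* Spread the final running total to every observation of the patient.
by pt_name : replace ndistinct = ndistinct[_N]

list, sepby(pt_name)
```

On the example data ndistinct is 2 for both patients. The tagging step does the job of -egen, tag()- and the last two steps do the job of -egen, total()-.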
Nick
[email protected]
On 21 May 2013 14:17, Michael Stewart <[email protected]> wrote:
> Hi,
>
> I am looking to see if anyone could suggest more efficient code than what I have
> been using for a particular issue that I am dealing with.
>
> My Need
>
> 1) Create a variable which shows if the "year" is the same or different by pt_name
> 2) Create a variable which shows the number of distinct years per patient
>
> My dataset structure is as follows
>
> pt_name year(string variable)
> 111 2009
> 111 2009
> 111 2009
> 111 2011
> 222 2009
> 222 2009
> 222 2010
>
> My code takes two steps:
> Step-1: bysort pt_name year: gen flag = _n == _N
> Step-2: egen max = total(flag), by(pt_name)
>
> Please let me know if there is a more efficient one-step solution.
>
>
> --
> Thank you ,
> Yours Sincerely,
> Mike.
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/