Statalist The Stata Listserver



Re: st: programming assist, too many unique values for levels


From   "Andrew O'Connor DO" <[email protected]>
To   <[email protected]>
Subject   Re: st: programming assist, too many unique values for levels
Date   Wed, 02 May 2007 17:34:12 -0400

Thanks for the tips; they help the program run more smoothly.

Andrew O'Connor, DO, MPH
Division of Nephrology
Center for Healthcare Research and Policy
MetroHealth Medical Center/Case Western Reserve University
(216)778-8484
>>> [email protected] 05/02/07 12:02 PM >>>
Michael Blasnik already explained how you can cut the code
down and avoid -levels- (which in Stata 9 is called -levelsof-)
by using -egen, total()- and -egen, mean()-.

However, your code illustrates various points that arise
elsewhere, and so are worth brief comment.

How can code like this fragment be improved? (I have
indented your code, following standard precepts on
programming style.)

--------------------------------------- #0
gen time=.
levels pt, local(levels)
quietly foreach l of local levels {
	sum obstime if pt==`l'
	local total=r(sum)
	replace time=`total' if pt==`l'
}
---------------------------------------

1. Cut out the middle macro. The macro
used as message-bearer in

local total = r(sum)
replace ... = `total'

does no harm, but it is unnecessary.

--------------------------------------- #1
gen time=.
levels pt, local(levels)
quietly foreach l of local levels {
	sum obstime if pt==`l'
	replace time = r(sum) if pt==`l'
}
---------------------------------------

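2. As you only want the sum, use -summarize, meanonly-.
With many variables, or many passes through a loop as here,
this is always worth doing. -meanonly- is a dopey name, because
it doesn't mean what it says (it also leaves r(N), r(sum), r(min)
and r(max) behind, and skips only the extra work, such as the
variance), but that's a side issue.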

-------------------------------------- #2
gen time=.
levels pt, local(levels)
quietly foreach l of local levels {
	sum obstime if pt==`l', meanonly
	replace time = r(sum) if pt==`l'
}
--------------------------------------
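
To see what -meanonly- leaves behind, a quick check along these
lines may help. It is only a sketch: it assumes the auto dataset
shipped with Stata, which is not part of this thread.

--------------------------------------
sysuse auto, clear
summarize price, meanonly
* -return list- shows the saved results, including r(sum)
return list
display r(sum)
--------------------------------------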

3. -pt- comes out of an -encode-, which yields integers
1 up. You might as well exploit that. That circumvents
problems with the limits on -levels- and the relative
inefficiency of -foreach-. The r-class result r(max)
should be treated like a macro, so that -forvalues-
sees its value, not the name.

-------------------------------------- #3
gen time=.
su pt, meanonly
quietly forval l = 1/`r(max)' {
	sum obstime if pt==`l', meanonly
	replace time = r(sum) if pt==`l'
}
--------------------------------------
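
A small sketch of the point about r(max), again assuming the auto
dataset shipped with Stata rather than the data in this thread:
with the macro-style quotes, -forvalues- receives the number itself.

--------------------------------------
sysuse auto, clear
summarize rep78, meanonly
* `r(max)' is replaced by its value before -forvalues- parses the range,
* so r-class commands inside the loop do not disturb the loop limits
forvalues i = 1/`r(max)' {
	count if rep78 == `i'
}
--------------------------------------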

4. In fact, you can do it directly without loops.
Michael might do it something like this.
(Note that Andrew already sorted by -pt-, which -by- requires.)

------------------------------------- #4
by pt : egen time = total(obstime)
--------------------------------------

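5. However, this is more efficient, as
it avoids the interpretive overhead of -egen-
(-viewsource egen.ado- and -viewsource _gtotal.ado-
to see what I mean). -sum()- returns a running sum
within each group, so the last value in each group
is that group's total.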

------------------------------------ #5
by pt : gen time = sum(obstime)
by pt : replace time = time[_N]
------------------------------------
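
The same one-pass idea might be carried over to the other two loops
in the original code, roughly as follows. This is a sketch only, not
something posted in the thread: it assumes the data are still sorted
by -pt-, that -time- has been created as in #4 or #5, and that missing
-bp_systolic- values have already been dropped, as in the original code.

------------------------------------
* total time out of range per patient
by pt : egen time_out = total(out_range)
gen time_o_r = time_out/time

* share of readings at or above 140: the mean of a 0/1 indicator
by pt : egen proportion = mean(bp_systolic >= 140)
------------------------------------

The last line relies on a true-or-false expression evaluating to 1 or 0,
so its mean within each patient is the proportion of readings at or
above the threshold.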

Nick
[email protected]

Andrew O'Connor

I'm hoping someone can offer some help; I've been working on this for
some time now.
I'm running Stata 8.2 SE and have a large dataset (>90,000 rows of data
with about 12,000 unique record numbers, multiple observations for the
same individual).
I'm trying to calculate a "time out of range" for each patient (i.e. the
proportion of each patient's observation time that is predicted to be
greater than 140, assuming a linearly interpolated slope of actually
measured blood pressures, not simply the proportion of blood pressure
readings that are above my threshold).  I have 3 variables: MRN (medical
record number), Visit_date, bp_systolic

I've run into a problem due to the size of my data set, specifically
that I have too many levels.  Here is my code
encode mrn, gen(pt)
sort pt visit_date
drop if bp_systolic==.
by pt: gen obstime = visit_date[_n+1] - visit_date
by pt: gen sys_diff = bp_systolic[_n+1] - bp_systolic
by pt: gen slope = sys_diff/obstime
by pt: gen predict = (140-bp_systolic)/slope if bp_systolic<140 & bp_systolic[_n+1]>=140
by pt: gen date140 = visit_date + predict
by pt: gen predict2 = floor((140-bp_systolic)/slope) if bp_systolic>=140 & bp_systolic[_n+1]<140
by pt: gen date140down = visit_date[_n-1] - predict2
by pt: gen out_range = obstime if bp_systolic>=140 & bp_systolic[_n+1]>=140
by pt: replace out_range = visit_date[_n+1] - date140 if bp_systolic<140 & bp_systolic[_n+1]>=140
by pt: replace out_range = obstime - predict2 if bp_systolic>=140 & bp_systolic[_n+1]<140
gen time=.

levels pt, local(levels)
quietly foreach l of local levels {
	sum obstime if pt==`l'
	local total=r(sum)
	replace time=`total' if pt==`l'
}
gen time_out=.
quietly foreach l of local levels {
	sum out_range if pt==`l'
	local total=r(sum)
	replace time_out=`total' if pt==`l'
}
gen time_o_r=(time_out/time)
local threshold = 140
gen proportion=.
levels pt, local(levels)
quietly foreach l of local levels {
	count if pt == `l' & bp_systolic !=.
	local total=r(N)
	count if bp_systolic >= `threshold' & bp_systolic !=. & pt == `l'
	replace proportion = r(N)/`total' if pt == `l'
}
Any suggestions for using a different set of programming statements???


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

