Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: egen with user-defined function
From: Kieran McCaul <[email protected]>
To: "[email protected]" <[email protected]>
Subject: RE: st: egen with user-defined function
Date: Tue, 20 Aug 2013 13:12:58 +0800
...
Hi Jan,
This is the way I've dealt with these data. The reshape at the end puts all the cause-of-death codes in one variable, which makes it easier to flag the ones that you're interested in.
BTW, are you using the ENTITY_AXIS_DATA data?
/* With the death data open, write the record-axis string to text files, one per year group */
outfile sequence r_year record_axis_data using "$deaths\record axis0.txt" if yeargrp==0, noquote wide replace
outfile sequence r_year record_axis_data using "$deaths\record axis1.txt" if yeargrp==1, noquote wide replace
outfile sequence r_year record_axis_data using "$deaths\record axis2.txt" if yeargrp==2, noquote wide replace
/* sequence is our ID variable
r_year is the registration year
*/
clear *
infile using "$deaths\record_axis0.dct", using("$deaths\record axis0.txt") clear
tempfile record1
save `record1', replace
infile using "$deaths\record_axis1.dct", using("$deaths\record axis1.txt") clear
append using `record1'
save `record1', replace
infile using "$deaths\record_axis2.dct", using("$deaths\record axis2.txt") clear
append using `record1'
save "$deaths\record_axis.dta", replace
reshape long ra_cod, i(sequence) j(order)
drop if ra_cod==""
drop order
save "$deaths\record_axis (long).dta", replace
* The record_axis.dct file looks like this. The others are variations on this basically.
infile dictionary {
str5 sequence %5s
_skip(6)
str6 year %4s
_skip(7)
str4 cod01 %5s
str4 cod02 %5s
str4 cod03 %5s
str4 cod04 %5s
str4 cod05 %5s
str4 cod06 %5s
str4 cod07 %5s
str4 cod08 %5s
str4 cod09 %5s
str4 cod10 %5s
str4 cod11 %5s
str4 cod12 %5s
str4 cod13 %5s
}
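Once the data are in long form, flagging the deaths of interest takes only a few lines. What follows is a minimal sketch, not the code I actually run: it assumes one code per row in ra_cod, takes "sepsis" to mean any code starting with A41 (the A410-A419 range discussed in this thread), and uses made-up toy records in place of the real long file. The names sepsis_code and sepsis are illustrative.

```stata
* Toy long-format data standing in for the real long file (hypothetical values)
clear
input str5 sequence str4 ra_cod
"00001" "A419"
"00001" "I219"
"00002" "C349"
"00003" "J189"
"00003" "A410"
end

* 1 if this row holds a code starting with A41, 0 otherwise
gen byte sepsis_code = substr(ra_cod, 1, 3) == "A41"

* 1 on every row of a death that mentions at least one such code
egen byte sepsis = max(sepsis_code), by(sequence)

* Keep one row per death
bysort sequence: keep if _n == 1
keep sequence sepsis
list, noobs
```

Here egen's built-in max() with by() does the per-death aggregation, so no user-written egen function is needed.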
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jan Barendregt
Sent: Tuesday, 20 August 2013 9:47 AM
To: [email protected]
Subject: RE: st: egen with user-defined function
Hi Joe,
Thanks for your reply.
1) You're right, these are ICD-10 codes. They are coded in a single string (see also my answer to Nick's reply), but unfortunately the way they are coded differs between years, and I also have a year with ICD-9 codes. So I'm trying to write something that, with minor tweaks, can handle all these variations. And yes, at this point I'm only interested in whether a sepsis code is mentioned, not how many or which ones.
2) Using separate variables seems a pain, because there can be up to 20 different codes in the string.
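To make the "minor tweaks" idea concrete, here is a rough sketch of one way to stay with the single string: keep the slot width and the code range in locals and test each slot in turn. It assumes the codes are packed back to back at a fixed width in RECORD_AXIS_DATA, with at most 20 per record; the locals width, maxcodes, lo and hi and the toy records are illustrative only, not taken from the real files. Note the -if- qualifier on -replace-, which applies the test to every observation.

```stata
* Toy records standing in for the real data (hypothetical values)
clear
input str80 RECORD_AXIS_DATA
"I219A419J189"
"C349I10 "
"A410"
"J441B410"
end

* Settings to adjust per year; all values here are assumptions
local width    4
local maxcodes 20
local lo       "A410"
local hi       "A419"

gen byte sepsis = 0
forvalues k = 1/`maxcodes' {
    * First character of slot k
    local start = (`k' - 1) * `width' + 1
    * Flag the record if the code in slot k lies between lo and hi
    quietly replace sepsis = 1 if substr(RECORD_AXIS_DATA, `start', `width') >= "`lo'" & substr(RECORD_AXIS_DATA, `start', `width') <= "`hi'"
}
list, noobs
```

For a year with a different layout or with ICD-9 codes, only the four locals would need to change, provided the fixed-width assumption still holds.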
Thanks,
Jan
------------------------------------------
Jan J Barendregt, MA, PhD
Assoc Prof of Epidemiological Modelling
School of Population Health, University of Queensland
Email: [email protected]
Skype: janbarendregt
Phone: +61 7 3102 3093
Visit www.epigear.com: home of Ersatz, MetaXL, DisMod II, Prevent, and more!
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Joe Canner
> Sent: Tuesday, 20 August 2013 11:32 AM
> To: [email protected]
> Subject: RE: st: egen with user-defined function
>
> A couple of minor comments:
>
> 1. Based on the context and the fact that you named your
> indicator variable "bool", I suspect you are only interested
> in knowing whether codes in the range A410-A419 appear at
> all, not how many there are. In that case, Nick's code can
> be simplified using the -strpos()- function to look for the
> first occurrence of "A41x". If this (as I suspect) is a
> well-behaved list of ICD-10 codes, you could even just look
> for "A41" and probably be OK.
>
> 2. While Nick's code is, as usual, quite clever and quite
> correct, I am wondering why you don't parse the 4-digit
> (ICD-10?) codes into separate variables. This, I believe, is
> probably how they were meant to be processed and so doing
> reduces the need for clever coding tricks.
>
> Just my $0.02.
>
> Joe Canner
> ________________________________________
> From: [email protected]
> [[email protected]] on behalf of Nick Cox
> [[email protected]]
> Sent: Monday, August 19, 2013 8:40 PM
> To: [email protected]
> Subject: Re: st: egen with user-defined function
>
> I don't see quite what -egen- is objecting to here but you have more
> fundamental problems.
>
> Wiring in particular names to a program is not illegal but it usually
> implies that you are pitching your problem at the wrong level.
>
> (*) I guess that you have some variable -RECORD_AXIS_DATA- and you are
> looking for occurrences of the strings "A410" ... "A419".
>
> Your code would at most look in the first observation as you are using
> the -if- command
> instead of an -if- qualifier.
>
> It would probably be best if you described the concrete problem, as
> my strong guess is that no original program is needed here.
>
> But I'll guess that (*) is the nub of the problem.
>
> clonevar copy = RECORD_AXIS_DATA
>
> qui forval i = 0/9 {
> replace copy = subinstr(copy, "A41`i'", "", .)
> }
>
> gen count = (length(RECORD_AXIS_DATA) - length(copy)) / 4
>
> So in a copy of the variable we replace each occurrence of "A410"
> "A411" ... "A419" with an empty string. Each such deletion reduces the
> number of characters by 4.
>
> This was written up in
> http://www.stata-journal.com/article.html?article=dm0056
>
>
> Nick
> [email protected]
>
>
> On 20 August 2013 01:03, Jan Barendregt
> <[email protected]> wrote:
>
> > I'm trying to get egen to use a user-defined function. Here is what I'm doing:
> >
> > do "C:\Ddrive\MattCooper\Data\test.do"
> >
> > . capture program drop _grecordparse
> >
> > . program define _grecordparse
> > 1. local bool=0
> > 2. local li=1
> > 3. local i=0
> > 4. while `i' < RECORD_AXIS_COUNT {
> > 5. if (substr(RECORD_AXIS_DATA,`li',4)>="A410") & (substr(RECORD_AXIS_DATA,`li',4)<="A419") local bool=`bool'+1
> > 6. local li=`li'+4
> > 7. local i=`i'+1
> > 8. }
> > 9. gen `typlist' `varlist' = `bool'
> > 10. end
> >
> > .
> > end of do-file
> >
> > . egen sepsis=recordparse
> > unknown egen function recordparse()
> > r(133);
> >
> >
> > So I can't get egen to recognise my recordparse function. Any ideas?
> >
> > I'm using Stata 11.2 on Windows
> >
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/