Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Daniel Henriksen <henriksen.dp@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: extracting portions of a string variable using observations from another variable
Date: Wed, 26 Jan 2011 21:52:30 +0100
I had some technical problems with my mail, sorry.

Thank you very much, Eric, for your new solution. I need to sit down and read it through carefully. My hope is that I can use this example in general (and it looks like I can), because I'd like to match the drug names as well (cefuroxim, metronidazol, etc.). I have about 3000 combinations of drug names (some consist of one word, others of two or three words) and ways of administering them.

Again, thank you very much! There are a lot of nice people around here!

Cheers
Daniel

2011/1/26 Eric Booth <ebooth@ppri.tamu.edu>:
> Daniel asked about matching more than one word in the first example, which uses -merge- to match the data.
> One way would be to create a dataset for each word of the split 'Dispenseringsform' and merge each one in during the loop. Below, I've modified the first example I provided to do what he asks (edits are marked with *comments*):
>
> **********************************! Begin Example
> ** Note: Watch for Wrapping **
>
> //DATASET OF WORDS TO BE EXTRACTED FROM RECORDS DATA-->
> clear
> inp str30 Dispenseringsform
> "filmovertrukne tabl."
> "oral opløsning"
> "pulv.t.konc.t.inf.v."
> "inj.-/inf.væske"
> "enterotabletter"
> "tabletter"
> "pulv.t.inj.væske,opl"
> "inf.væske, opløsning"
> "pul.t.inj.+inf.,opl."
> end
> levelsof Dispenseringsform, loc(alt)
> di `"`alt'"'
> split Dispenseringsform
> l Dispenseringsform1
>
> **new**
> **new**
> preserve
> keep Dispenseringsform1
> duplicates drop //--so that you can m:1 merge later
> sa dispense_Dispenseringsform1.dta, replace
> restore
> preserve
> keep Dispenseringsform2
> duplicates drop
> sa dispense_Dispenseringsform2.dta, replace
> restore
>
>
> //1. EXTRACT WORDS IN DISPENSE.DTA USING MERGE -->
> clear
> inp str244 record
> "Cefuroxim Stragen pulv.t.inj.væske,opl 7,5 mg/ml intravenøst 100 ml kl 00:00 + 100 ml kl 08:00 + 100 ml kl 16:00 ;(xxx yyy (Overlæge) aaa12 09-09-2010 00:35)"
> "Metronidazol Actavis filmovertrukne tabl. 500 mg peroralt 1 tablet 3 gang(e) Daglig ;(xxx yyy-zzz (Stud. med.) aaa1bb 19-08-2010 01:20)"
> "Metronidazol B. Braun inf.væske, opløsning 5 mg/ml intraveøst 100 ml 3 gang(e) Daglig ;(xxx yyy (Reservelæge) aaa2bb 29-09-2010 01:21)"
> "Nexium pul.t.inj.+inf.,opl. 0,4 mg/ml intravenøst 100 ml 1 gang(e) Daglig ;(xxx yyy (Overlæge) aaa12 27-10-2010 01:37)"
> end
> sa records.dta, replace
>
> split record
> l record2-record6
>
> /*
> extract words in 'record' that
> match dispense.dta list: (
> pulv.t.inj.væske,opl,
> filmovertrukne tabl. inf.væske,
> opløsning and pul.t.inj.+inf.,opl.)
> */
>
> g str30 newvar = ""
>
>
> **updated**
> **updated**
> forval n = 2/5 {
>     **Adding the -foreach- below allows you to merge over more
>     ****than one word split from 'Dispenseringsform' in the master data
>     foreach new in Dispenseringsform1 Dispenseringsform2 {
>
>         rename record`n' `new'
>         merge m:1 `new' using "dispense_`new'.dta"
>         drop if _m==2 //--keep matched and master records only
>         replace newvar = `new' if _m==3 & mi(newvar)
>         rename `new' record`n' // --*I reordered the drop/rename lines
>         drop _merge
>         cap drop Dispenseringsfor*
>     }
> }
> order newvar
> drop record? record??
> l
> **********************************! End Example
>
> Also, keep in mind that you can match on all the words using the second example I provided.
>
> - Eric
> __
> Eric A. Booth
> Public Policy Research Institute
> Texas A&M University
> ebooth@ppri.tamu.edu
>
> On Jan 26, 2011, at 9:27 AM, Steven Samuels wrote:
>
>> Daniel, for the edification of all users (including Eric) who might not remember your original question and his response, please include edited versions in follow-up questions. (FAQ 3.4 "Edit Previous Posting").
>>
>> Steve
>> sjsamuels@gmail.com
>>
>> On Jan 26, 2011, at 9:33 AM, Daniel Henriksen wrote:
>>
>> Dear Eric
>>
>> Thank you so much for your suggestions! I will dig further into them asap.
>>
>> Regarding your first suggestion, is it possible to match two or three
>> words and not just the one parsed? Excuse my ignorance. I am still a
>> beginner when it comes to Stata.
>>
>> cheers
>> daniel
>
>> Eric A. Booth wrote:
>>
>> Here are 2 approaches:
>>
>> The first one is less reliable (i.e., it might require careful examination and tweaking) but might be more useful if you are bringing over more variables from the 'Dispenseringsform'/using dataset to the 'records'/master dataset. Keep in mind that it matches on the first word (parsed by a space character) in 'Dispenseringsform' -- so it matches "filmovertrukne tabl." by the "filmovertrukne" part.
>>
>> The second approach is more straightforward if you are working with a list of 'Dispenseringsform' values that is short enough to fit into a macro (see help limits) and you don't need to bring in any extra variables from the 'Dispenseringsform' dataset. It simply collects all the 'Dispenseringsform' values into a local macro (`alt') and then uses a string position function (see help string_functions) to find matches.
>>
>> The results of both approaches are stored in the variable 'newvar':
>>
>> <snip>
>
>> On Jan 24, 2011, at 3:21 PM, Daniel Henriksen wrote:
>>
>>> Hello statalist
>>>
>>> Hope you can help me. Is it possible for Stata to extract specific
>>> words within a string using observations from another variable?
>>> I have a dataset with a list of different ways of dispensing the drug
>>> (which form it is). Here's an example:
>>>
>>> Dispenseringsform
>>> filmovertrukne tabl.
>>> oral opløsning
>>> pulv.t.konc.t.inf.v.
>>> inj.-/inf.væske
>>> enterotabletter
>>> tabletter
>>> pulv.t.inj.væske,opl
>>> inf.væske, opløsning
>>> pul.t.inj.+inf.,opl.
>>> (I have 270 rows of these (different forms and different ways of spelling them))
>>>
>>> Then I have another dataset (only one variable but many observations)
>>> containing information on which drug, way of dispensing, dose and time
>>> the drug is to be administered to the patient:
>>>
>>> Cefuroxim Stragen pulv.t.inj.væske,opl 7,5 mg/ml intravenøst 100 ml kl 00:00 + 100 ml kl 08:00 + 100 ml kl 16:00 ;(xxx yyy (Overlæge) aaa12 09-09-2010 00:35)
>>> Metronidazol Actavis filmovertrukne tabl. 500 mg peroralt 1 tablet 3 gang(e) Daglig ;(xxx yyy-zzz (Stud. med.) aaa1bb 19-08-2010 01:20)
>>> Metronidazol B. Braun inf.væske, opløsning 5 mg/ml intraveøst 100 ml 3 gang(e) Daglig ;(xxx yyy (Reservelæge) aaa2bb 29-09-2010 01:21)
>>> Nexium pul.t.inj.+inf.,opl. 0,4 mg/ml intravenøst 100 ml 1 gang(e) Daglig ;(xxx yyy (Overlæge) aaa12 27-10-2010 01:37)
>>>
>>> So I would like to extract "pulv.t.inj.væske,opl", "filmovertrukne
>>> tabl.", "inf.væske, opløsning" and "pul.t.inj.+inf.,opl." from these four
>>> observations and place them in a new variable without having to go
>>> through all of the information manually.
>>> I hope my question is clear.
>>>
>>> Thank you for your time
>>> Daniel
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/

--
Daniel Henriksen
PhD student, physician
Department of Infectious Diseases Q / Emergency Department
Odense Universitetshospital
Bygning 2, 1. sal
Sdr. Boulevard 29
5000 Odense C
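
A note on the second approach mentioned in the thread: its code was snipped from the quoted text, so what follows is only a minimal sketch of the idea as Eric describes it (collect the forms in a local macro, then search each record with a string position function). It is not Eric's original code. It assumes the list of forms is short enough to fit in a macro, uses strpos() as the string position function, and abbreviates the example records; the variable names follow the thread.

```stata
* Sketch only: macro + strpos() matching, under the assumptions stated above
clear
input str30 Dispenseringsform
"filmovertrukne tabl."
"inf.væske, opløsning"
"pulv.t.inj.væske,opl"
"pul.t.inj.+inf.,opl."
end

* Collect every distinct form in the local macro `alt'
levelsof Dispenseringsform, local(alt)

* Abbreviated versions of the records from the original question
clear
input str244 record
"Cefuroxim Stragen pulv.t.inj.væske,opl 7,5 mg/ml intravenøst 100 ml"
"Metronidazol Actavis filmovertrukne tabl. 500 mg peroralt 1 tablet"
"Metronidazol B. Braun inf.væske, opløsning 5 mg/ml intraveøst 100 ml"
"Nexium pul.t.inj.+inf.,opl. 0,4 mg/ml intravenøst 100 ml"
end

generate str30 newvar = ""

* For each form, fill newvar where the form occurs anywhere in the record
* and no earlier form has already matched
foreach form of local alt {
    replace newvar = `"`form'"' if strpos(record, `"`form'"') > 0 & missing(newvar)
}

list newvar record
```

Because the whole form is searched for as a substring, multi-word forms such as "inf.væske, opløsning" need no splitting. One caveat: when one form is contained in another (for example "tabletter" inside "enterotabletter"), the first form in the macro that matches wins, so the order of the list may need attention.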