Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: extracting portions of a string variable using observations from another variable

From	Eric Booth <[email protected]>
To	"<[email protected]>" <[email protected]>
Subject	Re: st: extracting portions of a string variable using observations from another variable
Date	Wed, 26 Jan 2011 15:49:17 +0000

<>


Daniel asked whether the first example, which uses -merge- to match the data, can match on more than one word.
One way would be to create a dataset for each word of the split 'Dispenseringsform' variable and merge each one in during the loop. Below, I've modified the first example I provided to do what he asks (edits are marked with *comments*):

**********************************!  Begin Example
** Note: Watch for Wrapping **

//DATASET OF WORDS TO BE EXTRACTED FROM RECORDS DATA-->
clear
inp str30 Dispenseringsform
"filmovertrukne tabl."
"oral opløsning"
"pulv.t.konc.t.inf.v."
"inj.-/inf.væske"
"enterotabletter"
"tabletter"
"pulv.t.inj.væske,opl"
"inf.væske, opløsning"
"pul.t.inj.+inf.,opl."
end
levelsof Dispenseringsform, loc(alt)
di `"`alt'"'
split Dispenseringsform
l Dispenseringsform1
	
	**new**
	**new**
	preserve
	keep Dispenseringsform1
	duplicates drop   //--so that you can m:1 merge later
	sa dispense_Dispenseringsform1.dta, replace
	restore
	preserve 
	keep Dispenseringsform2
	duplicates drop
	sa dispense_Dispenseringsform2.dta, replace
	restore


//1. EXTRACT WORDS IN THE DISPENSE_*.DTA FILES USING MERGE -->
clear
inp str244 record
"Cefuroxim Stragen pulv.t.inj.væske,opl 7,5 mg/ml intravenøst 100 ml kl 00:00 +  100 ml kl 08:00 +  100 ml kl 16:00 ;(xxx yyy (Overl‘ge) aaa12 09-09-2010 00:35)"
"Metronidazol Actavis filmovertrukne tabl. 500  mg peroralt 1 tablet 3 gang(e) Daglig ;(xxx yyy-zzz (Stud. med.) aaa1bb 19-08-2010 01:20)"
"Metronidazol B. Braun inf.væske, opløsning 5  mg/ml intraveøst 100 ml 3 gang(e) Daglig ;(xxx yyy (Reservel‘ge) aaa2bb 29-09-2010 01:21)"
"Nexium pul.t.inj.+inf.,opl. 0,4 mg/ml intravenøst 100 ml 1 gang(e) Daglig ;(xxx yyy (Overl‘ge) aaa12 27-10-2010 01:37)"
end
sa records.dta, replace

split record
l record2-record6

/*
extract the words in 'record' that match the
dispense_Dispenseringsform*.dta word lists, i.e. the forms
"pulv.t.inj.væske,opl", "filmovertrukne tabl.",
"inf.væske, opløsning" and "pul.t.inj.+inf.,opl."
*/

g str30 newvar = ""


	**updated**
	**updated**
forval n = 2/5 {
	**Adding the -foreach- below allows you to merge over more
	**than one word split from 'Dispenseringsform' in the using data
	foreach new in Dispenseringsform1 Dispenseringsform2 {
		rename record`n' `new'
		merge m:1 `new' using "dispense_`new'.dta"
		drop if _m==2 //--keep matched and master records only
		replace newvar = `new' if _m==3 & mi(newvar)
		rename `new'  record`n'  // --*I reordered the drop/rename lines
		drop _merge
		cap drop Dispenseringsfor*
	}
}
order newvar
drop record? record??
l
**********************************!  End Example
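In the example above, 'newvar' keeps only the first word that matches (that is what the -& mi(newvar)- condition does), so the Metronidazol Actavis record ends up with "filmovertrukne" rather than "filmovertrukne tabl.". If you want the whole phrase, here is a minimal sketch of one variation, untested against the full 270-row list: it appends every matched word instead of keeping only the first, and it reads the number of word variables from -split-'s r(nvars) instead of hard-coding two, so forms of three or more words are covered too. It keeps the example's assumption that the form sits somewhere in words 2-5 of 'record', and it uses temporary files (the locals w1, w2, ... are just illustrative names) instead of saving dispense_*.dta files.

**********************************!  Begin Example
clear
input str30 Dispenseringsform
"filmovertrukne tabl."
"oral opløsning"
"pulv.t.konc.t.inf.v."
"inj.-/inf.væske"
"enterotabletter"
"tabletter"
"pulv.t.inj.væske,opl"
"inf.væske, opløsning"
"pul.t.inj.+inf.,opl."
end
split Dispenseringsform
local K = r(nvars)   // number of word variables -split- created

* save one deduplicated word list per word position (w1, w2, ... are tempfiles)
forval k = 1/`K' {
	tempfile w`k'
	preserve
	keep Dispenseringsform`k'
	drop if mi(Dispenseringsform`k')   // one-word forms leave later words blank
	duplicates drop                     // -merge m:1- needs unique keys
	save "`w`k''"
	restore
}

clear
input str244 record
"Cefuroxim Stragen pulv.t.inj.væske,opl 7,5 mg/ml intravenøst 100 ml kl 00:00 +  100 ml kl 08:00 +  100 ml kl 16:00 ;(xxx yyy (Overl‘ge) aaa12 09-09-2010 00:35)"
"Metronidazol Actavis filmovertrukne tabl. 500  mg peroralt 1 tablet 3 gang(e) Daglig ;(xxx yyy-zzz (Stud. med.) aaa1bb 19-08-2010 01:20)"
"Metronidazol B. Braun inf.væske, opløsning 5  mg/ml intraveøst 100 ml 3 gang(e) Daglig ;(xxx yyy (Reservel‘ge) aaa2bb 29-09-2010 01:21)"
"Nexium pul.t.inj.+inf.,opl. 0,4 mg/ml intravenøst 100 ml 1 gang(e) Daglig ;(xxx yyy (Overl‘ge) aaa12 27-10-2010 01:37)"
end
split record
gen str30 newvar = ""

forval n = 2/5 {
	forval k = 1/`K' {
		rename record`n' Dispenseringsform`k'
		merge m:1 Dispenseringsform`k' using "`w`k''"
		drop if _merge == 2   // keep matched and master records only
		* append every matched word instead of keeping only the first one
		replace newvar = trim(newvar + " " + Dispenseringsform`k') if _merge == 3
		rename Dispenseringsform`k' record`n'
		drop _merge
	}
}
keep record newvar
list newvar, noobs
**********************************!  End Example

The trade-off is that each word is still matched on its own, so a word like "opløsning" that turns up among words 2-5 is appended even when the first word of its form did not match; it is worth checking the result with -tab newvar- before relying on it.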
Also, keep in mind that you can match on all the words using the second example I provided.
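The code for that second example is snipped from the quoted message below. What follows is a minimal sketch reconstructed from its description, so not necessarily the original code: -levelsof- puts every form into the local `alt', and each form is searched for in 'record' with strpos(). Keeping the longest form found, rather than the first, is an extra choice made here so that a short form such as "tabletter" cannot win over "enterotabletter". The full list still has to fit in a macro (see help limits).

**********************************!  Begin Example
clear
input str30 Dispenseringsform
"filmovertrukne tabl."
"oral opløsning"
"pulv.t.konc.t.inf.v."
"inj.-/inf.væske"
"enterotabletter"
"tabletter"
"pulv.t.inj.væske,opl"
"inf.væske, opløsning"
"pul.t.inj.+inf.,opl."
end
* each form is stored in `alt' wrapped in compound double quotes
levelsof Dispenseringsform, local(alt)

clear
input str244 record
"Cefuroxim Stragen pulv.t.inj.væske,opl 7,5 mg/ml intravenøst 100 ml kl 00:00 +  100 ml kl 08:00 +  100 ml kl 16:00 ;(xxx yyy (Overl‘ge) aaa12 09-09-2010 00:35)"
"Metronidazol Actavis filmovertrukne tabl. 500  mg peroralt 1 tablet 3 gang(e) Daglig ;(xxx yyy-zzz (Stud. med.) aaa1bb 19-08-2010 01:20)"
"Metronidazol B. Braun inf.væske, opløsning 5  mg/ml intraveøst 100 ml 3 gang(e) Daglig ;(xxx yyy (Reservel‘ge) aaa2bb 29-09-2010 01:21)"
"Nexium pul.t.inj.+inf.,opl. 0,4 mg/ml intravenøst 100 ml 1 gang(e) Daglig ;(xxx yyy (Overl‘ge) aaa12 27-10-2010 01:37)"
end
gen str30 newvar = ""

* a form replaces newvar only if it occurs in 'record' and is longer
* than whatever was found so far (longest match wins)
foreach form of local alt {
	replace newvar = `"`form'"' if strpos(record, `"`form'"') > 0 ///
		& length(`"`form'"') > length(newvar)
}
list newvar, noobs
**********************************!  End Example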

- Eric
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
[email protected]

On Jan 26, 2011, at 9:27 AM, Steven Samuels wrote:

> Daniel, for the edification of all users (including Eric) who might not remember your original question and his response, please include edited versions of the earlier posts in follow-up questions (FAQ 3.4 "Edit Previous Posting").
> 
> Steve
> [email protected]
> 
> 
> 
> On Jan 26, 2011, at 9:33 AM, Daniel Henriksen wrote:
> 
> Dear Eric
> 
> Thank you so much for your suggestions! I will dig further into them asap.
> 
> Regarding your first suggestion: is it possible to match two or three
> words and not just the one parsed? Excuse my ignorance; I'm still a
> beginner when it comes to Stata.
> 
> cheers
> daniel



> Eric A. Booth wrote:
> 
> <>
> 
> 
> Here are 2 approaches:
> 
> The first one is less reliable (i.e., it might require careful examination and tweaking) but might be more useful if you are bringing over more variables from the 'Dispenseringsform'/using dataset to the 'records'/master dataset. Keep in mind that it matches on the first word (parsed by a space character) in 'Dispenseringsform', so it matches "filmovertrukne tabl." by the "filmovertrukne" part.
> 
> The second approach is more straightforward if you are working with a list of 'Dispenseringsform' values that is short enough to fit into a macro (see help limits) and you don't need to bring in any extra variables from the 'Dispenseringsform' dataset. It collects all the Dispenseringsform values into a local macro (`alt') and then uses a string position function (see help string_functions) to find matches.
> 
> The results of both approaches are stored in the variable 'newvar':
> 
> <snip>


> On Jan 24, 2011, at 3:21 PM, Daniel Henriksen wrote:
> 
>> Hello statalist
>> 
>> Hope you can help me. Is it possible for Stata to extract specific
>> words within a string using observations from another variable?
>> I have a dataset with a list of different ways of dispensing the drug
>> (which form it is). Here's an example:
>> 
>> Dispenseringsform
>> filmovertrukne tabl.
>> oral opløsning
>> pulv.t.konc.t.inf.v.
>> inj.-/inf.væske
>> enterotabletter
>> tabletter
>> pulv.t.inj.væske,opl
>> inf.væske, opløsning
>> pul.t.inj.+inf.,opl.
>> (I have 270 rows of these: different forms and different ways of spelling them.)
>> 
>> Then I have another dataset (only one variable but many observations)
>> containing information on which drug, the way it is dispensed, the dose and the
>> time the drug is to be administered to the patient:
>> 
>> Cefuroxim Stragen pulv.t.inj.væske,opl 7,5 mg/ml intravenøst 100 ml kl 00:00 +  100 ml kl 08:00 +  100 ml kl 16:00 ;(xxx yyy (Overl‘ge) aaa12 09-09-2010 00:35)
>> Metronidazol Actavis filmovertrukne tabl. 500  mg peroralt 1 tablet 3 gang(e) Daglig ;(xxx yyy-zzz (Stud. med.) aaa1bb 19-08-2010 01:20)
>> Metronidazol B. Braun inf.væske, opløsning 5  mg/ml intraveøst 100 ml 3 gang(e) Daglig ;(xxx yyy (Reservel‘ge) aaa2bb 29-09-2010 01:21)
>> Nexium pul.t.inj.+inf.,opl. 0,4 mg/ml intravenøst 100 ml 1 gang(e) Daglig ;(xxx yyy (Overl‘ge) aaa12 27-10-2010 01:37)
>> 
>> So I would like to extract "pulv.t.inj.væske,opl", "filmovertrukne tabl.",
>> "inf.væske, opløsning" and "pul.t.inj.+inf.,opl." from these four
>> observations and place them in a new variable without having to go
>> through all of the information manually.
>> I hope my question is clear.
>> 
>> Thank you for your time
>> Daniel
>> 



*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Follow-Ups:
- Re: st: extracting portions of a string variable using observations from another variable
  - From: Daniel Henriksen <[email protected]>

References:
- st: extracting portions of a string variable using observations from another variable
  - From: Daniel Henriksen <[email protected]>
- Re: st: extracting portions of a string variable using observations from another variable
  - From: Eric Booth <[email protected]>
- Re: st: extracting portions of a string variable using observations from another variable
  - From: Daniel Henriksen <[email protected]>
- Re: st: extracting portions of a string variable using observations from another variable
  - From: Steven Samuels <[email protected]>
