st: RE: Extracting different portions of string values
From
Nick Cox <[email protected]>
To
"'[email protected]'" <[email protected]>
Subject
st: RE: Extracting different portions of string values
Date
Fri, 1 Oct 2010 11:53:10 +0100
I suspect you might need a combination of -strpos()- and -substr()-, but I don't understand your criteria well enough to suggest exact code. How does one discriminate a "citation number" from anything else? That may be a matter of a regular expression.
Alternatively, check out -split-.
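If the rule turns out to be simply "keep the first word after "--" when "--" is present, and the first word of the string otherwise", then something like the following sketch might do. It is untested against the real data and rests on assumptions not confirmed in the question: that "--" never occurs inside a citation number, that a citation number never contains a space, and that no value holds more than one "--". The -input- block only recreates three rows of the posted example so that the sketch is self-contained.

```stata
* Recreate a few rows of the posted example (illustration only)
clear
input id str60 cit_1
1 "EP696218-A -- WO9215370-A SUND _SUND-Individual_"
2 "WO9425112-A -- GB298635-A"
6 "US4434855-A SEC OF NAVY _USNA_"
end

* Loop over every variable whose name starts with cit_
foreach v of varlist cit_* {
    * If "--" is present, keep only what follows it
    replace `v' = substr(`v', strpos(`v', "--") + 2, .) if strpos(`v', "--")
    * Then keep the first space-delimited word, dropping assignee text
    replace `v' = word(`v', 1)
}

list
```

With about 500 variables and 100,000 observations the loop makes two passes over each variable. If a value could hold more than one "--", the first -replace- would need rethinking, since -strpos()- finds only the first occurrence.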
Nick
[email protected]
Florian Seliger
We are looking for commands to extract different portions of string values.
Our data with patent citations looks like this:
id cit_1
1 EP696218-A -- WO9215370-A SUND _SUND-Individual_
2 WO9425112-A -- GB298635-A
3 EP578126-A -- CH180906-A AGE_OK
4 EP562128-A -- DE1684639-A
5 WO9318277-A -- DK137935-B
6 US4434855-A SEC OF NAVY _USNA_
.
.
.
.
with 100,000 IDs and about 500 affected variables (cit_1, cit_2, cit_3...).
In this example, we want to keep only the second portion (the citation number after "--") for IDs 1-5, but the first portion for ID 6. That is, we want to extract the first portion whenever there is only one citation number.
The data should thus look like this:
id cit_1
1 WO9215370-A
2 GB298635-A
3 CH180906-A
4 DE1684639-A
5 DK137935-B
6 US4434855-A
.
.
.