Subject: RE: st: RE: RE: AW: Search for string values in dataset??
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Date: Mon, 21 Feb 2011 17:42:09 +0000
These answers focused on looking in a single named variable. An extension of the problem is to find which string variables contain any such values.
Using -findname- from the Stata Journal (SJ):
. findname, any(strpos(@, "GmbH"))
which you could extend using -lower()- if desired.
-strpos()- is the modern name for -index()-.
Note that -strmatch()- and -regexm()- are functions, and not commands.
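For readers without -findname- installed, the same search across all string variables can be sketched with built-in commands only. This is a minimal sketch, not part of the original posts: the local macro name strvars and the lowercase search text "gmbh" are illustrative choices, and -lower()- is applied so that the match ignores case.

```stata
* Sketch: report every string variable that contains "gmbh" in any case.
* Assumes a dataset is already in memory; strvars is an illustrative name.
ds, has(type string)
local strvars `r(varlist)'

foreach v of local strvars {
    * strpos() returns 0 when the text is absent, so nonzero means a match
    quietly count if strpos(lower(`v'), "gmbh")
    if r(N) > 0 {
        display "`v': " r(N) " matching observation(s)"
    }
}
```

Restricting the loop to string variables first avoids applying -lower()- and -strpos()- to numeric variables, which would stop with a type mismatch.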
Nick
[email protected]
Eric Booth
One way to match case-insensitively is to create a lowercase (optionally temporary) copy of the string variable and match on that:
*********************!
clear
inp str10(var1)
"GmbH7UuIZ"
"GMbH7UuIZ"
"gmbh7Uuiz"
end
//1. from Markus and Junlin//
g one = 1 if strmatch(var1, "*GmbH*")
g byte two = regexm(var1, "GmbH")
list var1 if regexm(var1, "GmbH")
//2. Another option: index //
g str10 three = var1 if index(var1, "GmbH")
g str10 four = var1 if index(var1, "gmbh")
list var1 if index(var1, "GmbH")
//3. case insensitive - lower case matches //
tempvar var1_lower
g `var1_lower' = lower(var1)
g str10 five = var1 if index(`var1_lower', "gmbh")
/* could evaluate to 1 if matched (instead of var1 contents) */
l
*********************!
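The temporary variable is not strictly needed, because -lower()- can be applied inside the match itself. The following is a minimal sketch under that assumption; the variable names six and seven and the fourth, non-matching data row are illustrative additions, not from the original posts.

```stata
clear
input str10 var1
"GmbH7UuIZ"
"GMbH7UuIZ"
"gmbh7Uuiz"
"AG7UuIZ"
end

* 1 if "gmbh" occurs in any case, 0 otherwise (no missing values)
generate byte six = strpos(lower(var1), "gmbh") > 0

* the same test written with a regular expression on the lowercased value
generate byte seven = regexm(lower(var1), "gmbh")

list var1 six seven
```

Both indicators should be 1 for the first three rows and 0 for the last. Comparing with `> 0` yields a 0/1 variable rather than 1/missing, which is usually easier to tabulate or summarize later.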
Liao, Junlin
> Continuing on this topic: does anyone have a good way to make those two commands case-insensitive? Thanks,
Liao, Junlin
> Another command is -regexm-
>
> gen byte GmbH_Match = regexm(variable, "GmbH")
>
> If you simply want to list the entries:
>
> list variable if regexm(variable, "GmbH")
Wiemann, Markus
> try the -strmatch- command.
> For example:
> gen GmbH = 1 if strmatch(variablename, "*GmbH*")
miyu Lee
> Is there ANY way to search for specific string values in a dataset with string variables? For example: I am searching for all entries showing the part "GmbH" in a vector with firm names. I have a bad feeling about this!