Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: Turning text pages into indicators
From: Jen Zhen <[email protected]>
To: [email protected]
Subject: st: Turning text pages into indicators
Date: Wed, 8 Aug 2012 15:01:18 +0200
Dear Statalisters,
(1) I'd like to create indicators for whether a string variable
contains at least one of several words.
I know I can check whether it contains one specific word with
- gen indicator=regexm(string,"word1") -
but can I also cover several words in a single command?
I tried
- gen indicator=regexm(string,"word1" "word2") -
and
- gen indicator=regexm(string,"word1" | "word2") -
and neither worked, but maybe there's another way to do this?
I know I could also generate a separate indicator for each word and
then sum them, but since I have many words and many strings to
cover, that would be inefficient.
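For what it's worth, a minimal sketch of one possibility for (1): as far as I understand Stata's regular expression syntax, the alternation operator | belongs inside the quoted pattern, not between two separately quoted strings. The variable name text, the toy data and the words are made up for illustration.

```stata
* Toy data; the variable name and the words are hypothetical
clear
input str40 text
"contains word1 here"
"has word2 instead"
"neither of them"
end

* One pattern, with the alternatives separated by | inside the quotes
gen byte indicator = regexm(text, "word1|word2")
list
```

Note that regexm() is case-sensitive and matches substrings, so "word1" would also match "word10"; wrapping the variable in lower() handles the first issue if case does not matter.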
(2) I'm starting with long texts, about half a page to a full page
each, so I presumably can't read an entire page into a single string
variable on which to perform (1) above.
Do I need to split the text beforehand, say in Excel, or is there a way
to read all the text into Stata and then split it into as many
variables as necessary (but no more)?
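One possible route for (2), sketched under assumptions and untested on real pages: read the file so that each line of text becomes one observation instead of one variable, flag the lines, and then take the maximum over lines. The file name page.txt is hypothetical, and the 244-character width assumes the string limit of Stata 12 and earlier.

```stata
* Hypothetical file page.txt holding one page of plain text
* Each line of the file becomes one observation; columns beyond 244 are dropped
infix str line 1-244 using "page.txt", clear

* Flag the lines that contain any of the words
gen byte hit = regexm(line, "word1|word2")

* Page-level indicator: 1 if at least one line matched
egen byte indicator = max(hit)
```

This would miss a phrase that is broken across two lines of the file, and it truncates lines longer than 244 characters, so it is only a starting point.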
Thanks so much and best regards,
JZ