Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: strgroup
From
"Liao, Junlin" <[email protected]>
To
"[email protected]" <[email protected]>
Subject
RE: st: strgroup
Date
Mon, 21 Feb 2011 21:59:25 +0000
If you really do not like Excel, the Excel approach can be simulated in Stata. Here's my approach. Suppose you have a variable s that holds string data of this kind.
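* Keep the first observation within each distinct value of s.
by s, sort: gen s_unique=s if _n==1
* s_unique is a string variable, so its missing value is "" rather than .
drop if s_unique==""
keep s_unique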
Then you get a dataset with all the unique values of s. You can copy s_unique into an s_correct variable to save typing and correct everything in this dataset. Alternatively, you can code the values with numbers (this saves a lot of disk space).
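For concreteness, here is a minimal sketch of building such a definition dataset and then merging the corrections back onto the original data. The toy values, the tempfile names and the hand-made correction are all made up for illustration; -merge m:1- assumes Stata 11 or later.

```stata
* Toy data; the values and the tempfile names are hypothetical.
clear
input str12 s
"Iowa"
"iowa"
"Iowa"
"IOWA"
"Ohio"
end
tempfile original definitions
save `original'

* Definition dataset: one row per distinct value of s.
by s, sort: keep if _n == 1
gen s_correct = s
* Corrections are made by hand here (data editor or -replace-), for example:
replace s_correct = "Iowa" if inlist(s, "iowa", "IOWA")
save `definitions'

* Merge the corrections back onto the original data, keyed on the raw string.
use `original', clear
merge m:1 s using `definitions'
assert _merge == 3
drop _merge
list s s_correct
```

Keying the merge on the raw string means every spelling that appears in the original data picks up whatever correction was typed for it, and the do-file itself documents what was changed.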
Once you have this definition dataset, you can merge it back to correct your original dataset, or recode your original dataset with value labels that show the correct strings. The labeling commands can be mass-produced in Stata as well. It's not going to be any more complex than in Excel:
Suppose you end up with a final dataset containing s_correction_unique. This code generates the value label commands:
gen la_sta="la def s_lable " +string( _n) + `" ""' + s_correction_unique + `"", add"'
Then simply listing la_sta will give you all the labeling commands. [If your codes differ from _n, supplying a coding variable in place of _n will do the trick.] Paste them into a do-file and everything gets fixed instantly.
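A small worked example of that step, using three made-up values for s_correction_unique; the -list- options are only there to make the output easier to paste.

```stata
* Toy definition dataset; the three values are hypothetical.
clear
input str10 s_correction_unique
"Iowa"
"Ohio"
"Utah"
end

* One -label define- command per row, using _n as the numeric code.
gen la_sta = "la def s_lable " + string(_n) + `" ""' + s_correction_unique + `"", add"'

* Print the commands without header or observation numbers, ready to paste.
list la_sta, clean noobs noheader
```

The first row comes out as: la def s_lable 1 "Iowa", add. If the numeric codes do not need to be under your control, -encode- with its label() option is a built-in alternative that creates the numeric variable and the value label in one step, assigning codes in alphabetical order.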
Finding out the logic to fix such a problem might be much more challenging and less productive.
Junlin
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: Monday, February 21, 2011 3:13 PM
To: '[email protected]'
Subject: RE: st: strgroup
Joking apart, it remains a factual point that many Stata users do not use Excel -- and may not even use machines on which Excel is installed and/or available.
At some point we all tend to use what we find familiar, naturally. But what horrifies me about Excel includes the apparent lack of an audit trail facility. The point of this thread is to make lots of irreversible changes to a dataset!
Nick
[email protected]
Liao, Junlin
I always admire people who can figure out programs to solve such messes. But sometimes there are typos and the like for which it is just difficult to find a general rule. I get most raw data in Excel anyway. Sometimes I get files where the data are labeled in Excel. It's just natural for me to use Excel to generate code to quickly update labels in Stata or deal with situations like this one. Using Excel to write code is a fairly easy trick to learn.
Junlin
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: Monday, February 21, 2011 2:39 PM
To: '[email protected]'
Subject: RE: st: strgroup
This answer presupposes that the original poster has a liking for using MS Excel, which seems unnecessarily pessimistic. My suggestion is that instead of testing for equality you use -strpos()- and/or -subinstr()-.
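A minimal sketch of what that could look like, on made-up strings; the variable names has_iowa and s_clean are hypothetical.

```stata
* Toy data; the values are hypothetical.
clear
input str30 s
"University of Iowa"
"Univ. of Iowa"
"Iowa State University"
"Ohio State University"
end

* strpos() returns the position of the substring, or 0 if it is absent,
* so a nonzero result flags every value that mentions "Iowa".
gen byte has_iowa = strpos(s, "Iowa") > 0

* subinstr() replaces a substring; "." as the last argument means
* replace all occurrences.
gen s_clean = subinstr(s, "Univ.", "University", .)
list
```

Matching on a substring catches variants that an exact equality test would miss, and because the changes live in a do-file they can be reviewed and rerun.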
Nick
[email protected]
Liao, Junlin
I have had the same experience before. One thing to look for is spaces at the beginning or end of the string. For this kind of situation, I would use Excel to run a pivot table to find all the necessary changes and use Excel to generate the code. Then paste the code into a do-file. Everything gets fixed instantly.
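The stray-space check can also be done directly in Stata. This is only a sketch on made-up data: trim() strips leading and trailing blanks (newer Stata versions also call it strtrim()).

```stata
* Toy data; one value is given a leading blank on purpose.
clear
input str12 s
"Iowa"
"Iowa"
"Ohio"
end
replace s = " " + s in 1

* Before cleaning, " Iowa" and "Iowa" count as different values.
by s, sort: gen byte first = _n == 1
count if first

* Strip leading and trailing blanks, then count distinct values again.
replace s = trim(s)
drop first
by s, sort: gen byte first = _n == 1
count if first
```

If the second count is smaller than the first, some of the apparent mismatches were nothing more than stray blanks.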
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/