Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Doug Hess <douglasrhess@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | st: removing characters from string-formatted variables mixed in with numeric-formatted variables |
Date | Fri, 22 Jun 2012 11:03:17 -0400 |
Hello,

I imported a data set of survey responses for a large national survey into Stata from text files. Many of the variables have single quotes around numeric values. For instance, a variable may include the values '-9', '1', '2' instead of simply -9, 1, 2. However, not every variable includes these characters for numeric values. (Not sure why!) Thus, Stata formats some variables as string and some as numeric during the import (using the import "text data from a spreadsheet" menu). The variables are not ordered with strings first and numerics second, though. It's all a hodgepodge.

I want to remove all the stray single quote marks. So, after poking around on Statalist, I tried using the -replace- command, the -subinstr()- function, and a loop:

    local abc = "control bedrms region smsa metro3 lmed lmeda lmedb fmr"
    /* Note I truncated this list, there are dozens of variables in the dataset I wish to clean up. */
    foreach varname of local abc {
        replace `varname'=subinstr(`varname',"'","",.)
        destring `varname', replace
    }

However, this loop stops when it runs into a variable formatted as numeric. Given that there are dozens of these variables, I don't want to use the -order- command one by one to put the string variables first (or last).

Is there a way to use the format of the variables with -if- to limit the -order- command or -replace- command? Or other ideas?

Thank you.

(Note: I subscribe to the list's digest mode, so cc'ing me on any responses would be helpful.)

Doug
douglasrhess@gmail.com
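
For illustration, here is a minimal sketch of one way the loop in the message could be limited to string variables, using -confirm string variable- to test each variable's storage type before calling -replace-. The toy data and the two-variable list are assumptions made up for this example; they are not the real survey data. This is one possible approach, not necessarily the one the list would recommend.

```stata
* Toy data (hypothetical): one string variable with stray quotes,
* one variable that was already imported as numeric
clear
input str4 control byte bedrms
"'-9'" 1
"'1'"  2
"'2'"  3
end

local abc control bedrms

foreach varname of local abc {
    * -confirm- returns a nonzero _rc when the variable is not a string,
    * and -capture- keeps that from stopping the loop
    capture confirm string variable `varname'
    if _rc == 0 {
        replace `varname' = subinstr(`varname', "'", "", .)
        destring `varname', replace
    }
}
```

Numeric variables are simply skipped, so the variables can stay in their original order. It may also be worth checking whether -destring-'s ignore() option, which drops specified characters during conversion, covers this case without a loop.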