Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: working with a 24-character string variable consisting of 0s and 1s
Date: Tue, 11 Feb 2014 10:12:34 +0000
Regular expressions are great, just too often considered when there
are more direct methods of getting what you want.

Consider

gen firstyear = 12 - length(subinstr(substr(myvar,1,12), "1", "", .))

Let's split the recipe into steps:

substr(myvar, 1, 12) is the first 12 characters.

subinstr(substr(myvar, 1, 12), "1", "", .) blanks out each "1",
replacing it with "", an empty string.

length() gives you the length of what's left.

12 minus that is the length of what we removed, and so the number of
1s in the substring.

The second year is then

gen secondyear = 12 - length(subinstr(substr(myvar,13,12), "1", "", .))

Once understood, the flavour is "Yes, of course", but it was spelled
out within

http://www.stata-journal.com/article.html?article=dm0056

Nick
njcoxstata@gmail.com

On 11 February 2014 02:46, Lisa Cook <hlthsrvcsphd@gmail.com> wrote:
> Hi,
>
> I need help working with a cumbersome string variable. I'm using Stata/MP 13.0.
>
> I've inherited a dataset that includes several variables indicating
> the number of months each person had specific kinds of health
> insurance (Medicaid, Medicare, private, etc.).
>
> The variables are 24 characters long in string format. Each character
> is either a 0 or 1, and represents whether the person had coverage in
> that month. So, if one of these variables equals
> "000000000000000000000000", the person had no coverage in any month of
> that type, while if it equals "111111111111111111111111", they were
> covered in every month by that kind of insurance. If the variable
> equals, say, "101111111111111111111111", the person had 23 months of
> coverage, but no coverage in the 2nd month.
>
> I would like to use these variables to generate, for each kind of
> insurance, the total in year 1, the total in year 2, and the total
> number of months of coverage in both years.
>
> I've used regexm before, but I can't figure out how to apply that code
> to my situation. I'd be very grateful if anyone could suggest some
> options.
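
For anyone wanting to try the recipe in the reply on known input, here is a minimal sketch that runs it on the three example strings from the question. The toy dataset and the variable name bothyears are assumptions made for illustration and are not from the thread. The two-year total is not spelled out in the reply; it is taken here to be the same trick applied to the whole 24-character string.

```stata
* Toy data: the three example strings quoted in the question
clear
input str24 myvar
"000000000000000000000000"
"111111111111111111111111"
"101111111111111111111111"
end

* Months covered in year 1: count the 1s in characters 1-12
gen firstyear = 12 - length(subinstr(substr(myvar, 1, 12), "1", "", .))

* Months covered in year 2: count the 1s in characters 13-24
gen secondyear = 12 - length(subinstr(substr(myvar, 13, 12), "1", "", .))

* Months covered in both years (bothyears is a name assumed here):
* count the 1s in the full 24-character string
gen bothyears = 24 - length(subinstr(myvar, "1", "", .))

* The two ways of getting the total should agree
assert bothyears == firstyear + secondyear

list
```

The third observation should come out as 11, 12 and 23, matching the "23 months of coverage" described in the question. For several insurance variables, the same three gen lines could be wrapped in a foreach loop over the variable names.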