Re: st: working with a 24-character string variable consisting of 0s and 1s
From: Nick Cox <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: working with a 24-character string variable consisting of 0s and 1s
Date: Tue, 11 Feb 2014 10:12:34 +0000
Regular expressions are great, but they are too often reached for when
there are more direct methods of getting what you want.
Consider
gen firstyear = 12 - length(subinstr(substr(myvar,1,12), "1", "", .))
Let's split the recipe into steps:
substr(myvar, 1, 12) is the first 12 characters.
subinstr(substr(myvar, 1, 12), "1", "", .)
blanks out each "1", replacing it with "", an empty string.
length() gives you the length of what's left. 12 minus that is the
length of what we removed, and so the number of 1s in the substring.
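The steps above can be checked on a toy dataset. This is only a sketch: the three values are the examples from the original question, and first12 and no1s are illustrative helper variables that are not part of the recipe. It assumes every value of myvar is exactly 24 characters long; a shorter or empty string would be miscounted, because 12 minus the length of what is left would then overstate the number of 1s.

```stata
clear
input str24 myvar
"000000000000000000000000"
"111111111111111111111111"
"101111111111111111111111"
end

* the recipe assumes a full 24-character string in every observation
assert length(myvar) == 24

* step 1: the first 12 characters
gen str12 first12 = substr(myvar, 1, 12)

* step 2: remove every "1" (the final . means all occurrences)
gen str12 no1s = subinstr(first12, "1", "", .)

* step 3: 12 minus the length of what is left is the number of 1s
gen firstyear = 12 - length(no1s)

list first12 no1s firstyear

* checks worked out by hand from the three example strings
assert firstyear == 0  in 1
assert firstyear == 12 in 2
assert firstyear == 11 in 3
```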
The second year is then
gen secondyear = 12 - length(subinstr(substr(myvar,13,12), "1", "", .))
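The question also asked for the total over both years and for the same counts for each kind of insurance. One way to sketch that, under assumptions: the variable names medicaid and medicare and the suffixes _y1, _y2 and _tot are hypothetical, the data are made up, and each string is again assumed to be exactly 24 characters long.

```stata
clear
input str24 medicaid str24 medicare
"111111111111000000000000" "000000000000000000000000"
"101111111111111111111111" "000000000000111111000000"
end

foreach v of varlist medicaid medicare {
    * months covered in year 1 and in year 2
    gen `v'_y1  = 12 - length(subinstr(substr(`v', 1, 12),  "1", "", .))
    gen `v'_y2  = 12 - length(subinstr(substr(`v', 13, 12), "1", "", .))

    * months covered over all 24 months, counted directly
    gen `v'_tot = 24 - length(subinstr(`v', "1", "", .))

    * the direct count should equal the sum of the two yearly counts
    assert `v'_tot == `v'_y1 + `v'_y2
}

list
```

Counting the total directly from the full string and then asserting that it matches the sum of the yearly counts is a cheap consistency check; simply adding the two yearly variables would serve equally well.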
Once understood, the flavour is "Yes, of course", but the technique is spelled out in
http://www.stata-journal.com/article.html?article=dm0056
Nick
[email protected]
On 11 February 2014 02:46, Lisa Cook <[email protected]> wrote:
> Hi,
>
> I need help working with a cumbersome string variable. I'm using Stata/MP 13.0.
>
> I've inherited a dataset that includes several variables indicating
> the number of months each person had specific kinds of health
> insurance (Medicaid, Medicare, private, etc.).
>
> The variables are 24 characters long in string format. Each character
> is either a 0 or 1, and represents whether the person had coverage in
> that month. So, if one of these variables equals
> "000000000000000000000000", the person had no coverage in any month of
> that type, while if it equals "111111111111111111111111", they were
> covered in every month by that kind of insurance. If the variable
> equals, say, "101111111111111111111111", the person had 23 months of
> coverage, but no coverage in the 2nd month.
>
> I would like to use these variables to generate, for each kind of
> insurance, the total in year 1, the total in year 2, and the total
> number of months of coverage in both years.
>
> I've used regexm before, but I can't figure out how to apply that code
> to my situation. I'd be very grateful if anyone could suggest some
> options.