
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: working with a 24-character string variable consisting of 0s and 1s


From   Lisa Cook <[email protected]>
To   [email protected]
Subject   Re: st: working with a 24-character string variable consisting of 0s and 1s
Date   Sat, 15 Feb 2014 14:27:11 -0500

Apologies for the delayed reply. Thanks very much to Nick and Eduardo
for the assist!

On Tue, Feb 11, 2014 at 8:31 AM, Nick Cox <[email protected]> wrote:
> There is a better way in this case, as removing "0"s is the complement
> of removing "1"s:
>
> gen firstyear = length(subinstr(substr(myvar,1,12), "0", "", .))
>
> The more general trick remains to count what you want by removing
> instances and seeing what difference that makes to the length. As
> here, don't remove it in the original variable, but just get Stata to
> do the same calculation.
>
> Nick
> [email protected]
>
>
> On 11 February 2014 10:12, Nick Cox <[email protected]> wrote:
>> Regular expressions are great, just too often considered when there
>> are more direct methods of getting what you want.
>>
>> Consider
>>
>> gen firstyear = 12 - length(subinstr(substr(myvar, 1, 12), "1", "", .))
>>
>> Let's split the recipe into steps:
>>
>> substr(myvar, 1, 12) is the first 12 characters.
>>
>> subinstr(substr(myvar, 1, 12), "1", "", .)
>>
>> blanks out each "1", replacing it with "", an empty string.
>>
>> length() gives you the length of what's left. 12 minus that is the
>> length of what we removed, and so the number of 1s in the substring.
>>
>> The second year is then
>>
>> gen secondyear = 12 - length(subinstr(substr(myvar, 13, 12), "1", "", .))
>>
>> Once understood, the flavour is "Yes, of course", but it was spelled out within
>>
>> http://www.stata-journal.com/article.html?article=dm0056
>>
>> Nick
>> [email protected]
>>
>>
>> On 11 February 2014 02:46, Lisa Cook <[email protected]> wrote:
>>> Hi,
>>>
>>> I need help working with a cumbersome string variable. I'm using Stata/MP 13.0.
>>>
>>> I've inherited a dataset that includes several variables indicating
>>> the number of months each person had specific kinds of health
>>> insurance (Medicaid, Medicare, private, etc.).
>>>
>>> The variables are 24 characters long in string format. Each character
>>> is either a 0 or 1, and represents whether the person had coverage in
>>> that month. So, if one of these variables equals
>>> "000000000000000000000000", the person had no coverage in any month of
>>> that type, while if it equals "111111111111111111111111", they were
>>> covered in every month by that kind of insurance. If the variable
>>> equals, say, "101111111111111111111111", the person had 23 months of
>>> coverage, but no coverage in the 2nd month.
>>>
>>> I would like to use these variables to generate, for each kind of
>>> insurance, the total in year 1, the total in year 2, and the total
>>> number of months of coverage in both years.
>>>
>>> I've used regexm before, but I can't figure out how to apply that code
>>> to my situation. I'd be very grateful if anyone could suggest some
>>> options.
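
Putting the replies in this thread together, the three totals asked for (year 1, year 2, both years) could be computed for several insurance variables at once with a loop. This is only a sketch: the variable names medicaid and medicare, the generated names with the _y1, _y2 and _tot suffixes, and the four example strings are hypothetical, made up for illustration. It uses the second form suggested in the thread, removing the "0"s and taking the length of what is left.

```stata
* Hypothetical example data: two 24-character 0/1 string variables
clear
input str24 medicaid str24 medicare
"000000000000000000000000" "111111111111111111111111"
"101111111111111111111111" "000000000000111111111111"
end

foreach v of varlist medicaid medicare {
    * months 1-12: strip the "0"s, the remaining length is the count of "1"s
    gen `v'_y1  = length(subinstr(substr(`v', 1, 12), "0", "", .))
    * months 13-24: same calculation on the second block of 12 characters
    gen `v'_y2  = length(subinstr(substr(`v', 13, 12), "0", "", .))
    * both years together
    gen `v'_tot = `v'_y1 + `v'_y2
}

list
```

The original string variables are left unchanged; the removal happens only inside the expression, as recommended in the thread. The two-year total could equally be computed in one step as length(subinstr(`v', "0", "", .)), assuming every value really is 24 characters of 0s and 1s.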
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

