Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: identifying highest number of consecutive variables where answer is consistent across observation
Date: Thu, 20 Feb 2014 18:34:26 +0000

Joe Canner has developed a good strategy for looking at this. Here is another.

Suppose we -reshape long-, something like

gen id = _n
reshape long var, i(id) j(question)
tsset id question

Then we can treat the blocks of observations as panel data. With

ssc inst tsspell
tsspell var

With this syntax for -tsspell- a "spell" is automatically a sequence of identical values. The existence of spells 15 or longer will be summarized by

egen fifteen_or_more = total((_seq >= 15) / _end), by(id)

where division by the indicator variable -_end- (1 on end of spell, 0 otherwise) ensures that we look only at the ends of spells.

If needed, we can then -reshape- back. On the other hand, it is quite likely that some questions of similar kind are more easily answered with this data structure.

Nick
njcoxstata@gmail.com

On 20 February 2014 17:04, Alison El Ayadi <aelayadi@gmail.com> wrote:
> I am doing some data cleaning on survey data and am looking to
> identify observations where there are 15 or more of the same answers
> in a row (across the variables in current order). All of the
> variables are string. Does anyone have an easy automated way to do
> this? I'm thinking that it could be done by generating a variable
> that provided the maximum number of same responses in a row, but have
> no idea how to code this. Variables are q1 - q94, and all string.
>
> Any suggestions on efficiently writing this code would be greatly appreciated.
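
For reference, the -reshape long- and -tsspell- steps in the reply can be put together for the variables named in the question (q1 - q94, all string). This is only a sketch under some assumptions, not a tested solution for the original data. Time-series operators are, as far as I know, not available for string variables, so the answers are first mapped to a numeric variable with -egen, group()-. The names -answer- and -maxrun- are invented for this example, and -maxrun- is one possible way to get the "maximum number of same responses in a row" that the question mentions.

```stata
* Sketch only: assumes string variables q1-q94 and one respondent per row.
* The names answer and maxrun are made up for this example.

* install -tsspell- from SSC (needed once)
ssc install tsspell

gen long id = _n

* one row per respondent and question
reshape long q, i(id) j(question)

* numeric copy of the answers; -missing- gives blank answers their own code
egen answer = group(q), missing

tsset id question

* creates _spell, _seq and _end; a new spell starts when answer changes
tsspell answer

* number of runs of 15 or more identical answers, as in the reply
egen fifteen_or_more = total((_seq >= 15) / _end), by(id)

* length of the longest run of identical answers for each respondent
egen maxrun = max(_seq), by(id)

* back to one row per respondent: drop whatever varies within id first
tsset, clear
drop answer _spell _seq _end
reshape wide
```

After this, something like -list id maxrun if fifteen_or_more > 0- should show the respondents to look at. Note that runs of blank answers count as runs here; whether that is wanted depends on the survey.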