Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: identifying highest number of consecutive variables where answer is consistent across observation
From: Nick Cox <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: identifying highest number of consecutive variables where answer is consistent across observation
Date: Thu, 20 Feb 2014 18:51:16 +0000
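Not quite. The answers are string, so you'd need to -encode- first. Here
is a revised sketch, with another simplification: counting observations
with -_seq- equal to 15 replaces the division by -_end-, because every
spell of length 15 or more passes through its 15th value exactly once.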
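gen id = _n
* the questions are q1-q94, so the -reshape- stub is q
reshape long q, i(id) j(question)
tsset id question
ssc install tsspell
* -encode- the string answers to numeric codes before -tsspell-
encode q, gen(nq)
tsspell nq
* number of spells of 15 or more identical answers; > 0 flags a respondent
egen fifteen_or_more = total(_seq == 15), by(id)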
Nick
[email protected]
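For comparison, here is a minimal sketch of the same long-layout idea that does not need SSC. It computes spell lengths with a -by:- and -replace- idiom instead of -tsspell-; that substitution is an assumption of this sketch, not the method given above. The toy data (three respondents, five questions), the threshold of 3 instead of 15, and the variable names run, longest and three_or_more are all hypothetical.

```stata
* Hypothetical toy data: 3 respondents, 5 string answers each
clear
input str1 q1 str1 q2 str1 q3 str1 q4 str1 q5
a a a b b
c d c d c
e e e e e
end

gen id = _n
reshape long q, i(id) j(question)

* run = length of the current spell of identical answers within id
* (a hand-rolled stand-in for the _seq variable that -tsspell- creates);
* -replace- works through observations in order, so run[_n-1] is
* already updated when observation _n is processed
bysort id (question): gen run = 1
by id: replace run = run[_n-1] + 1 if _n > 1 & q == q[_n-1]

* longest spell per respondent, and a 0/1 flag for spells of 3 or more
egen longest = max(run), by(id)
egen three_or_more = max(run >= 3), by(id)
list id question q run longest three_or_more, sepby(id)
```

With the real data, the -reshape- picks up q1-q94 unchanged and the threshold becomes 15. String equality needs no -encode- here, but note that empty strings compare equal, so runs of unanswered questions also count as spells.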
On 20 February 2014 18:34, Nick Cox <[email protected]> wrote:
> Joe Canner has developed a good strategy for looking at this. Here is another.
>
> Suppose we -reshape long-, something like
>
> gen id = _n
> reshape long var, i(id) j(question)
> tsset id question
>
> Then we can treat each respondent's block of observations as panel
> data and find spells with -tsspell- (from SSC):
>
> ssc inst tsspell
> tsspell var
>
> With this syntax for -tsspell-, a "spell" is automatically a sequence
> of identical values. The existence of spells of length 15 or longer
> can then be summarized by
>
> egen fifteen_or_more = total((_seq >= 15) / _end), by(id)
>
> where division by the indicator variable -_end- (1 at the end of a
> spell, 0 otherwise) ensures that we look only at the ends of spells:
> dividing by 0 yields missing, which -total()- treats as 0. If needed,
> we can then -reshape- back.
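A toy check of why dividing by -_end- counts each spell once, and why the later -_seq == 15- version gives the same count. The values of -_seq- and -_end- are typed in by hand to mimic what -tsspell- produces; the data, the threshold of 3 instead of 15, and the names spells_naive, spells_end and spells_eq are hypothetical.

```stata
* Hypothetical spell counters: id 1 has spells of length 4 and 1,
* id 2 has a single spell of length 2
clear
input id _seq _end
1 1 0
1 2 0
1 3 0
1 4 1
1 1 1
2 1 0
2 2 1
end

* Without the division, the spell of length 4 is counted twice (_seq 3 and 4)
egen spells_naive = total(_seq >= 3), by(id)
* x/1 = x and x/0 = . (missing); -total()- treats missing as 0
egen spells_end = total((_seq >= 3) / _end), by(id)
* A spell of length 3 or more passes _seq == 3 exactly once
egen spells_eq = total(_seq == 3), by(id)
list, sepby(id)
```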
>
> That said, many related questions are likely to be easier to answer
> with this long data structure, so reshaping back may not be necessary.
>
> Nick
> [email protected]
>
>
> On 20 February 2014 17:04, Alison El Ayadi <[email protected]> wrote:
>
>> I am doing some data cleaning on survey data and am looking to
>> identify observations where there are 15 or more of the same answers
>> in a row (across the variables in current order). All of the
>> variables are string. Does anyone have an easy automated way to do
>> this? I'm thinking that it could be done by generating a variable
>> that gives the maximum number of same responses in a row, but I have
>> no idea how to code this. Variables are q1-q94, and all string.
>>
>> Any suggestions on efficiently writing this code would be greatly appreciated.
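The idea of a variable holding the maximum number of identical answers in a row can also be computed without reshaping, by looping over the questions in their current order. A minimal sketch, assuming the answers sit in q1, q2, ... in order; the toy data, the locals K (number of questions) and T (threshold), and the variable names run, longest and long_run are hypothetical.

```stata
* Hypothetical toy data: 3 respondents, 5 string answers each
clear
input str1 q1 str1 q2 str1 q3 str1 q4 str1 q5
a a a b b
c d c d c
e e e e e
end

* K = number of questions (94 for q1-q94); T = threshold (15 in practice)
local K = 5
local T = 3

* run     = length of the run of identical answers ending at question j
* longest = longest run seen so far in the same observation
gen run = 1
gen longest = 1
forvalues j = 2/`K' {
    local i = `j' - 1
    replace run = cond(q`j' == q`i', run + 1, 1)
    replace longest = max(longest, run)
}
gen long_run = longest >= `T'
list q1-q5 longest long_run
```

This keeps one observation per respondent, at the cost of a loop over the questions. As with any string comparison, empty answers compare equal, so runs of unanswered questions count as runs too.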