Notice: On April 23, 2014, Statalist moved from an email list to a forum.
Re: st: identifying highest number of consecutive variables where answer is consistent across observation
Nick Cox <[email protected]>
"[email protected]" <[email protected]>
Thu, 20 Feb 2014 18:34:26 +0000
Joe Canner has developed a good strategy for looking at this. Here is another.
Suppose we -reshape long-, something like
gen id = _n
reshape long q, i(id) j(question)
tsset id question
Then we can treat the blocks of observations as panel data. Install
-tsspell- from SSC and run it:
ssc inst tsspell
tsspell q
With this syntax for -tsspell- a "spell" is automatically a sequence
of identical values. The number of spells of length 15 or more for
each respondent is given by
egen fifteen_or_more = total((_seq >= 15) / _end), by(id)
where division by the indicator variable -_end- (1 at the end of a
spell, 0 otherwise) yields missing everywhere except at the ends of
spells, so that each spell is counted once, at its final observation.
If needed, we can then -reshape- back.
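The original question also asked for the maximum number of identical
answers in a row. A minimal sketch of that, assuming -tsspell- has
already been run with its default variable names (-_spell-, -_seq-,
-_end-) and that the reshaped stub is -q- as in the question's
q1 - q94; the names -longest- and -flag15- are made up for illustration:

```stata
* Assumes the data are long, -tsset id question- has been issued,
* and -tsspell- has created _spell, _seq and _end.

* _seq counts position within each spell, so its maximum within id
* is the length of the longest run of identical answers.
egen longest = max(_seq), by(id)

* Hypothetical indicator for 15 or more identical answers in a row.
gen byte flag15 = longest >= 15

* The spell variables vary within id, so drop them before going wide.
drop _spell _seq _end
reshape wide q, i(id) j(question)
```

Variables that are constant within -id-, such as -longest-, are carried
back to the wide layout unchanged.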
On the other hand, it is quite likely that some questions of similar
kind are more easily answered with this data structure.
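As one illustration of working in the long structure, here is a sketch
that uses only official Stata commands on made-up toy data (3
respondents, 6 string answers each). It is not the -tsspell- approach
above, just the same run-length idea done by hand; the variable names
-run- and -longest- are assumptions for the example.

```stata
* Toy data: hypothetical answers, one row per respondent
clear
input str1 q1 str1 q2 str1 q3 str1 q4 str1 q5 str1 q6
"a" "a" "a" "b" "b" "a"
"b" "b" "b" "b" "b" "b"
"a" "b" "a" "b" "a" "b"
end

gen id = _n
reshape long q, i(id) j(question)

* Start every observation at run length 1, then extend the count
* whenever the answer repeats the previous one. -replace- works
* through the observations in order, so run[_n-1] is already updated.
bysort id (question): gen run = 1
by id: replace run = run[_n-1] + 1 if _n > 1 & q == q[_n-1]

* Longest run of identical answers for each respondent
egen longest = max(run), by(id)

drop run
reshape wide q, i(id) j(question)
list id longest
* longest is 3, 6 and 1 for the three toy respondents
```

Note that consecutive empty strings would count as a run here too, so
unanswered questions may need separate handling.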
[email protected]
On 20 February 2014 17:04, Alison El Ayadi <[email protected]> wrote:
> I am doing some data cleaning on survey data and am looking to
> identify observations where there are 15 or more of the same answers
> in a row (across the variables in current order). All of the
> variables are string. Does anyone have an easy automated way to do
> this? I'm thinking that it could be done by generating a variable
> that provided the maximum number of same responses in a row, but have
> no idea how to code this. Variables are q1 - q94, and all string.
> Any suggestions on efficiently writing this code would be greatly appreciated.