Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: RE: cases to variables with automated timestamp-based unitizing
From: A Loumiotis <[email protected]>
To: [email protected]
Subject: Re: st: RE: cases to variables with automated timestamp-based unitizing
Date: Tue, 8 Jun 2010 11:02:17 +0300
I came up with some code that I think does what you asked:
clear*
inp byte case str2 Rater Start End byte(Var1 Var2)
1 R1 17.54 123.29 4 2
2 R2 18.02 123.76 4 3
3 R1 128.43 171.53 2 1
4 R2 130.13 148.21 2 1
end
list, noo
reshape wide Var?, i(case) j(Rater) string
list, noo
sort Start End
gen start=.
gen end=.
forvalues j=1/`=_N-1' {
	forvalues i=0/`=_N-`j'-1' {
		* observation k runs from _N down to j+1
		local k=_N-`i'
		* segments k and j form one unit if both start and end times differ by less than 5 seconds
		local near abs(Start[`k']-Start[`j'])<5 & abs(End[`k']-End[`j'])<5
		replace start=Start[`j'] in `k' if `near'
		replace end=End[`j'] in `k' if `near'
	}
}
replace start=Start if start==.
replace end=End if end==.
* number the distinct (start, end) pairs; start+end could coincide for different segments
egen uniqid=group(start end)
foreach var of varlist Var* {
* with -missing-, a rater who did not code the unit gets . rather than 0
bysort uniqid: egen byte n_`var'=total(`var'), missing
}
drop Var*
list, noo
duplicates drop uniqid, force
list, noo
Antonis Loumiotis
On Mon, Jun 7, 2010 at 2:13 AM, Martin Weiss <[email protected]> wrote:
>
> <>
>
> The "simple 'cases to variables' procedure" is a -reshape wide-, in Stata parlance:
>
>
>
> ***********
> clear*
>
> inp byte case str2 Rater Start End byte(Var1 Var2)
> 1 R1 17.54 123.29 4 2
> 2 R2 18.02 123.76 4 3
> 3 R1 128.43 171.53 2 1
> 4 R2 130.13 148.21 2 1
> end
>
> list, noo
>
> reshape wide Var?, i(case) j(Rater) string
>
> list, noo
> ***********
>
>
> HTH
> Martin
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
> Sent: Monday, 7 June 2010 00:51
> To: [email protected]
> Subject: st: cases to variables with automated timestamp-based unitizing
>
> Dear all,
>
> I have a data transformation problem and would greatly appreciate any suggestions on how to solve it.
>
> I am analyzing data from a rating task with multiple raters. The ratings concerned audiovisual material, i.e. continuous data that the raters had to segment (unitize) themselves. Each segment coded by a rater carries timestamps in the "start" and "end" variables, which record the segment's start and end times (in seconds).
>
> The data look like this (this is a simplified version with only two raters when in actuality there are nine):
>
>         Rater  Start   End     Var1  Var2  ...
> case1   R1     17.54   123.29  4     2
> case2   R2     18.02   123.76  4     3
> case3   R1     128.43  171.53  2     1
> case4   R2     130.13  148.21  2     1
> ...
>
> I now intend to run analyses for which the data need to be arranged differently. The ratings of the individual judges for all variables should be held in separate columns, while each row should correspond to a single observed unit. This calls for a relatively simple "cases to variables" procedure. However, the problem is complicated by the need to identify the units, and match cases to them, beforehand. I do not require the raters' start and end times to agree exactly for segments to count as one unit; instead, raters should agree within a tolerance of, say, 5 seconds on both the start and the end time of a segment. In the example above, cases 1 and 2 should be counted as one unit, with both ratings placed in its single row, whereas for cases 3 and 4 the raters disagree too much on the end time of the segment (148.21 vs. 171.53 sec). Cases 3 and 4 should therefore be kept as separate units in the dataset.
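The 5-second rule can be checked directly on the four example segments. The following is a minimal sketch, assuming the example data and the 5-second tolerance from the question; it uses an all-pairs comparison with -cross-, which is not the loop used in the reply above, and the variable names dStart, dEnd and same_unit are illustrative only.

```stata
clear
input byte case str2 Rater double(Start End)
1 R1 17.54 123.29
2 R2 18.02 123.76
3 R1 128.43 171.53
4 R2 130.13 148.21
end

* save a copy of the segments, then rename the copy in memory so the two can be paired
tempfile segs
save `segs'
foreach v in case Rater Start End {
    rename `v' `v'2
}

* every pairwise combination of segments; keep each unordered pair once
cross using `segs'
keep if case < case2

* same unit only if both start and end times differ by less than 5 seconds
gen double dStart = abs(Start - Start2)
gen double dEnd = abs(End - End2)
gen byte same_unit = dStart < 5 & dEnd < 5
list case case2 dStart dEnd same_unit, noobs
```

Only the pair (1, 2) passes: both differences are under half a second, while cases 3 and 4 differ by 23.32 seconds at the end. Note that this only flags which pairs satisfy the rule; turning pairs into units still needs a decision when matches chain (A close to B, B close to C, but A not close to C), which the loop in the reply settles by the order in which it processes the sorted segments.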
>
> Consequently, this is what the data should look like in the end:
>
>         Start   End     Var1_R1  Var1_R2  Var2_R1  Var2_R2  ...
> case1   17.54   123.29  4        4        2        3
> case2   128.43  171.53  2        .        1        .
> case3   130.13  148.21  .        2        .        1
> ...
>
> For the intended analyses it does not matter much whether the start and end times of the cases (now "units") equal those set by the first rater (as in the example data matrix above) or, more elegantly, the mean over all ratings subsumed under the case/unit.
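The "more elegant" variant, with unit times taken as the mean over all segments in a unit, could look like the sketch below. It assumes each segment has already been assigned to a unit; here the variable unit is a hypothetical one typed in by hand to match the grouping described in the question (cases 1 and 2 together, cases 3 and 4 apart), and mStart and mEnd are illustrative names. Renaming Var1 and Var2 to the stubs Var1_ and Var2_ before -reshape wide- produces the Var1_R1-style names requested above.

```stata
clear
input byte case str2 Rater double(Start End) byte(Var1 Var2)
1 R1 17.54 123.29 4 2
2 R2 18.02 123.76 4 3
3 R1 128.43 171.53 2 1
4 R2 130.13 148.21 2 1
end

* ASSUMPTION: unit identifier assigned by hand (cases 1+2 -> 1, case 3 -> 2, case 4 -> 3)
gen byte unit = cond(case <= 2, 1, case - 1)

* unit start and end times as the mean over all segments in the unit
bysort unit: egen double mStart = mean(Start)
bysort unit: egen double mEnd = mean(End)

* one row per unit, one column per variable and rater (Var1_R1, Var1_R2, ...)
drop case Start End
rename Var1 Var1_
rename Var2 Var2_
reshape wide Var1_ Var2_, i(unit) j(Rater) string
list, noobs
```

A rater who did not code a unit simply gets a missing value in that unit's row. If a grouping ever put two segments from the same rater into one unit, -reshape wide- would stop with an error instead of silently combining them.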
>
> I am unsure how to go about solving this transformation task in an automated fashion in Stata - hence any help is much appreciated.
>
> Thanks in advance,
> Eike
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/