Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: RE: cases to variables with automated timestamp-based unitizing
From
"Martin Weiss" <[email protected]>
To
<[email protected]>
Subject
st: RE: cases to variables with automated timestamp-based unitizing
Date
Mon, 7 Jun 2010 01:13:25 +0200
The "relatively simple 'cases to variables' procedure" is a -reshape wide-, in Stata parlance:
***********
clear*
* example data from the question
input byte case str2 Rater Start End byte(Var1 Var2)
1 R1 17.54 123.29 4 2
2 R2 18.02 123.76 4 3
3 R1 128.43 171.53 2 1
4 R2 130.13 148.21 2 1
end
list, noobs
* one row per case; Var1 and Var2 become Var1R1, Var1R2, Var2R1, Var2R2
reshape wide Var1 Var2, i(case) j(Rater) string
list, noobs
***********
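The -reshape- above treats every case as its own unit. For the matching step described in the question (raters agreeing within 5 sec on both start and end time), the following is a minimal sketch, not a tested general solution. It assumes that segments can be sorted by start time and compared only with the first segment of the most recent unit, and that no rater contributes two segments to the same unit (otherwise -reshape- stops with an error). The names tol, unit, uStart and uEnd are made up for illustration, and the /// line continuations require running this from a do-file.
***********
clear
input byte case str2 Rater float(Start End) byte(Var1 Var2)
1 R1  17.54 123.29 4 2
2 R2  18.02 123.76 4 3
3 R1 128.43 171.53 2 1
4 R2 130.13 148.21 2 1
end

* ASSUMPTION: tolerance of 5 sec, the figure suggested in the question
local tol 5

* order segments in time; to begin with, every segment is its own unit
sort Start End
gen long unit = _n

* -replace- works through the observations in order, so unit[_n-1] is
* already final when observation _n is evaluated. Because unit started
* out as _n, a unit's number is the observation number of its first
* segment, which serves as the reference for Start and End.
replace unit = unit[_n-1] if _n > 1              ///
    & abs(Start - Start[unit[_n-1]]) <= `tol'    ///
    & abs(End   - End[unit[_n-1]])   <= `tol'

* unit-level times: mean over all segments assigned to the unit
bysort unit: egen double uStart = mean(Start)
bysort unit: egen double uEnd   = mean(End)

* one row per unit, one column per variable and rater (Var1_R1, ...)
drop case Start End
rename Var1 Var1_
rename Var2 Var2_
reshape wide Var1_ Var2_, i(unit) j(Rater) string

* the four example cases should collapse to three units
assert _N == 3
list, noobs
***********
With nine raters, a segment that agrees with an earlier unit but not with the most recent one stays unmatched under this greedy rule, so the unit assignment should be checked before the -reshape-.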
HTH
Martin
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
Sent: Monday, 7 June 2010 00:51
To: [email protected]
Subject: st: cases to variables with automated timestamp-based unitizing
Dear all,
I have a data transformation problem and would greatly appreciate any suggestions on how to solve it.
I am analyzing data from a rating task with multiple raters. The rated material is audiovisual, i.e. continuous data that the raters had to segment (unitize) themselves. Each coded segment carries a timestamp: the "Start" and "End" variables record its start and end times (in sec).
The data look like this (this is a simplified version with only two raters when in actuality there are nine):
       Rater   Start     End  Var1  Var2  ...
case1  R1       17.54  123.29     4     2
case2  R2       18.02  123.76     4     3
case3  R1      128.43  171.53     2     1
case4  R2      130.13  148.21     2     1
.
.
.
I now intend to run analyses that need the data set up differently: each judge's ratings on every variable should sit in separate columns, and each row should correspond to a single observed unit. This means that a relatively simple "cases to variables" procedure is in order. However, the issue is complicated by the need to identify the units and match cases accordingly beforehand. Start and end times need not agree exactly for segments to count as one unit; instead, raters should agree within a range of, say, 5 sec on both the start and the end time of the segment. In the data example above, cases 1 and 2 should therefore be counted as one unit and both ratings put into a single row, while for cases 3 and 4 the raters disagree too much on the end time of the segment. Cases 3 and 4 should thus be kept as separate units within the dataset.
Consequently, this is what the data should look like in the end:
        Start     End  Var1_R1  Var1_R2  Var2_R1  Var2_R2  ...
case1   17.54  123.29        4        4        2        3
case2  128.43  171.53        2        .        1        .
case3  130.13  148.21        .        2        .        1
.
.
.
For the analyses intended it does not matter much whether start and end times of the cases (now "units") equal those set by the first rater (as is the case in the example data matrix) or (more elegant) the mean of all ratings then subsumed under the case/unit.
I am unsure how to go about solving this transformation task in an automated fashion in Stata - hence any help is much appreciated.
Thanks in advance,
Eike
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/