I'd call this a case of customised input structures, but there's no
accounting for terminology or taste.
I don't understand what t and d are here. I guess they are secondary. If
not, you'll have to say more.
An old Stata maxim is that just because a problem looks like a loop
doesn't mean that looping is the best way to tackle it. Or, you can
often execute a loop just by using basic Stata commands in the right
way.
Another old maxim is that while fancy footwork with files is appealing
to those so inclined, you can often do everything in place.
I made Daniel's example a smidgen more exciting and put everything in a
single string variable.
. l
     +------+
     | var1 |
     |------|
  1. | frog |
  2. |  1,2 |
  3. |  3,4 |
  4. |  5,6 |
  5. |  END |
     |------|
  6. | toad |
  7. |  7,8 |
  8. | 9,10 |
  9. |  END |
     +------+
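To reproduce this toy data set, something like the following should work. It is only a sketch: the storage type str4 is my assumption, chosen to fit the longest value shown.

```stata
* sketch: rebuild the example data in a single string variable
* (str4 is an assumed width; widen it for longer identifiers)
clear
input str4 var1
"frog"
"1,2"
"3,4"
"5,6"
"END"
"toad"
"7,8"
"9,10"
"END"
end
list
```

The values are quoted because -input- would otherwise treat the comma as a separator between values.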
We have to work first on the identifiers. An identifier occurs just
after any "END", or right at the beginning.
. gen byte holds_id = (_n == 1) | (var1[_n - 1] == "END")
. l
     +-----------------+
     | var1   holds_id |
     |-----------------|
  1. | frog          1 |
  2. |  1,2          0 |
  3. |  3,4          0 |
  4. |  5,6          0 |
  5. |  END          0 |
     |-----------------|
  6. | toad          1 |
  7. |  7,8          0 |
  8. | 9,10          0 |
  9. |  END          0 |
     +-----------------+
We don't need the "END"s any more.
. drop if var1 == "END"
(2 observations deleted)
Copy across the identifiers to a new variable:
. gen id = var1 if holds_id
(5 missing values generated)
. l
     +------------------------+
     | var1   holds_id     id |
     |------------------------|
  1. | frog          1   frog |
  2. |  1,2          0        |
  3. |  3,4          0        |
  4. |  5,6          0        |
  5. | toad          1   toad |
     |------------------------|
  6. |  7,8          0        |
  7. | 9,10          0        |
     +------------------------+
And then fill in the gaps by a cascaded -replace-:
. replace id = id[_n-1] if missing(id)
(5 real changes made)
. l
     +------------------------+
     | var1   holds_id     id |
     |------------------------|
  1. | frog          1   frog |
  2. |  1,2          0   frog |
  3. |  3,4          0   frog |
  4. |  5,6          0   frog |
  5. | toad          1   toad |
     |------------------------|
  6. |  7,8          0   toad |
  7. | 9,10          0   toad |
     +------------------------+
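The cascade works because -replace- goes through the observations in order, so each observation sees the value just written to the one before it. If you want reassurance, here is a sketch of a cross-check that gets the same answer another way. It assumes the data are exactly as in the listing just above; -obsno-, -block- and -id2- are hypothetical names of my own.

```stata
* sketch of a cross-check, assuming var1, holds_id and id as listed above
* obsno, block and id2 are hypothetical helper names
gen long obsno = _n                       // remember the current order
gen long block = sum(holds_id)            // running count: 1 for frog's block, 2 for toad's
bysort block (obsno): gen id2 = var1[1]   // first value in each block is the identifier
assert id2 == id                          // stops with an error if the two disagree
drop obsno block id2
```

Sorting on -obsno- within -block- is what guarantees that the identifier is the first observation of each block.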
Now we can clean up a bit:
. drop if holds_id
(2 observations deleted)
. drop holds_id
. l
     +-------------+
     | var1     id |
     |-------------|
  1. |  1,2   frog |
  2. |  3,4   frog |
  3. |  5,6   frog |
  4. |  7,8   toad |
  5. | 9,10   toad |
     +-------------+
Now you can work on -var1- using -split-.
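That last step might look something like this. It is a sketch under assumptions: -coord- is a hypothetical stub name, and I am guessing that the two pieces correspond to the x and y of Daniel's post and are meant to be numeric.

```stata
* sketch: split var1 at the comma into two numeric variables
* coord is a hypothetical stub; x and y follow the names in Daniel's post
split var1, parse(,) generate(coord) destring
rename coord1 x
rename coord2 y
drop var1
list id x y
```

The -destring- option converts the pieces to numeric where it can; leave it off if you want to inspect the strings first.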
Here's the code in one place, assuming a single string variable -var1- as the starting point.
gen byte holds_id = (_n == 1) | (var1[_n - 1] == "END")
drop if var1 == "END"
gen id = var1 if holds_id
replace id = id[_n-1] if missing(id)
drop if holds_id
drop holds_id
Nick
Daniel Exeter
I have the variables Pid, x, y, t, d for 1000 participants, and there
are many observations for each participant.
I'm using an archaic program that uses data in the format:
Pid
X,y
X,y
X,y
END
Pid
X,y
X,y
END
And so on.
I would think that I need to:
loop through for each pid
    identify the value of pid
    write this value to an outfile
    loop through all observations of current pid
        extract values for x and y
        write x and y values to outfile
    write "end" into outfile
I tried this logic, but I can't seem to append info to the outfile...
any ideas?