Re: st: Collapse

From   Philip Ryan <[email protected]>
To   [email protected]
Subject   Re: st: Collapse
Date   Thu, 12 Feb 2004 11:21:26 +1030


I am not sure you want the -collapse- command at all.

The command:

by Personid (Referraldate), sort: keep if _n == 1

will -keep- the first observation of each Personid. The parentheses tell -by- to sort on Referraldate within Personid without treating it as a grouping variable, so that first observation will be the one with the earliest date. _n is Stata's built-in counter for the current observation, and under -by- it restarts at 1 within each group (I used SPSS a very long, long time ago and I think it had a "seqnum" construct?).

I am assuming that your Referraldate variable is actually in Stata's elapsed date form, so it can be treated as any other continuous numeric variable. If not, check -help date- and especially -help datefcn-. Dates are almost always best stored in Stata's elapsed form.
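
As a minimal sketch of this approach, here it is run on the example data from the original question. The string variable datestr and the top-year argument 2050 are assumptions made for illustration, and the three-argument date() call assumes a recent Stata release (older releases used a different mask syntax).

```stata
clear
input Personid Referralid str8 datestr Referringagency
25  1 "21.12.03" 291
25 24 "31.01.04" 290
25 75 "25.02.04" 292
78  4 "10.11.03" 290
78 22 "12.12.03" 275
end

* Convert the d.m.y strings to elapsed dates; two-digit years need a top year
gen Referraldate = date(datestr, "DMY", 2050)
format Referraldate %td

* Sort on Referraldate within Personid, then keep the first row per person
bysort Personid (Referraldate): keep if _n == 1
list Personid Referralid Referraldate Referringagency
```

If the row with the minimum Referralid is wanted instead of the earliest date, putting Referralid in the parentheses in place of Referraldate should do it.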


At 11:18 AM 12/02/2004 +1100, you wrote:

Dear all,

I need help regarding the collapse command.

I want to collapse by personid, and keep the values for all variables corresponding to the earliest (minimum) referralid. I want a final file that has the first referral for each person, including the date and agency of that referral. The problem I am having is that the collapse is defaulting to give the mean of the date and agency (or the minimum).

For instance

Personid  Referralid  Referraldate(d.m.y)  Referringagency
25        1           21.12.03             291
25        24          31.01.04             290
25        75          25.02.04             292
78        4           10.11.03             290
78        22          12.12.03             275

I want to collapse by the minimum value of referralid, but I don't want the minimum or mean value for referraldate or referringagency. I want the values that correspond to the minimum referralid.

The result I want is:
Personid  Referralid  Referraldate(d.m.y)  Referringagency
25        1           21.12.03             291
78        4           10.11.03             290

This is probably very easy, and I apologise for the trouble, but having worked with SPSS for many years now, I am trying to find my way through STATA logic as best as possible.


Thank you for your help,

Is there a way to list multiple cases that don't fit a standard expression in the replace command? For example:

replace var1=1 if caseid==1 or 3

I want cases 1 and 3 to have value 1 for var1, but not case 2, and I would like something shorter than writing the whole syntax twice:

replace var1=1 if caseid==1
replace var1=1 if caseid==3

Any clues?
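
One way to shorten this, as a sketch: Stata's inlist() function tests whether its first argument equals any of the later ones, and the | operator means "or". The three-case dataset below is made up for illustration.

```stata
clear
input caseid
1
2
3
end
gen var1 = 0

* Set var1 to 1 for cases 1 and 3 only
replace var1 = 1 if inlist(caseid, 1, 3)

* Equivalent, spelled out with the "or" operator:
* replace var1 = 1 if caseid == 1 | caseid == 3
list
```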


At 11:43 AM 2/11/2004 +1100, Jason Payne wrote:
>Dear STATAlistservers
>I am trying to transform data in variables. What is the equivalent STATA
>syntax for this SPSS command
>if (var1=2) var2=1.

Stata does things kind of backwards from SPSS:

gen var2 = 1 if var1==2

Note the ==

If var2 already exists, instead say

replace var2 = 1 if var1==2

But what do you want when var1 does not equal 2? Missing? A 0-1
dichotomy? There may be some other ways to approach this, depending on what
your ultimate goal is.
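
To make the two outcomes concrete, here is a small sketch with made-up data; the names var2a and var2b are hypothetical. With -if-, observations that fail the condition are left missing, whereas a logical expression gives a 0-1 dichotomy.

```stata
clear
input var1
1
2
3
.
end

* Equals 1 where var1 is 2, missing everywhere else
gen var2a = 1 if var1 == 2

* 0-1 indicator, left missing where var1 itself is missing
gen var2b = (var1 == 2) if !missing(var1)
list
```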

>This syntax only works if you want to create a new variable, however in
>some cases I might wish to modify an existing variable for one case
>only. This is particularly useful in cleaning string variables where
>there are spelling mistakes. In SPSS I would just type:
>if (caseid=1) stringvar='SPSS'.
>if (caseid=2) stringvar='STATA'.

replace stringvar="SPSS" in 1
replace stringvar="STATA" in 2

I'm assuming these are the first and second cases in your data. If not
necessarily so, and caseid is a variable in your data set, then try

replace stringvar="SPSS" if caseid==1
replace stringvar="STATA" if caseid==2

Or, just go in to the data editor and do it! Lot easier sometimes really.
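
The difference between -in- and -if- matters once the data are not in caseid order. A sketch with made-up data, where the first observation is caseid 2 and the misspelled values are invented for illustration:

```stata
clear
input caseid str5 stringvar
2 "STTA"
1 "SPS"
end

* -if- selects on the value of caseid, wherever that row happens to sit
replace stringvar = "SPSS"  if caseid == 1
replace stringvar = "STATA" if caseid == 2

* By contrast, -in 1- would have changed the first row, which is caseid 2 here
list
```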

