Dear Statalist Users,
I have panel data from which I would like to extract certain information
for a survival analysis. Conceptually, I have data on supplier, buyer and
quantity sold, and I want to ask: what is the first date on which a buyer
buys at least some amount x?
My data is of the following form:
year sid pid bid quantity
1990 111 1 555 0
1990 111 1 777 10
1990 111 2 555 100
1990 111 2 777 0
1991 111 1 555 3
1991 111 1 777 25
1991 111 2 555 5
1991 111 2 777 5
sid: seller id
pid: product id
bid: buyer id
So this tells me, for each seller and year, how much of each product they
sold to each buyer. Now I want to ask two questions:
(1) For each seller, product and buyer, what is the first year in which
the quantity sold is positive?
So the output I would want would be:
sid pid bid answer
111 1 555 1991
111 1 777 1990
111 2 555 1990
111 2 777 1991
The complication is that in the real data, quantity sold may become
positive, zero, then again positive. How can I only pick up the first time
it becomes positive?
With this simple data, it's relatively easy. I can drop the zero-quantity
observations, sort by sid pid bid year, and pick the first year that comes
up within each group, but I was wondering if there's a more elegant way
that's also quicker.
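To pin down the logic I have in mind for (1), here is a minimal sketch in plain Python rather than Stata, using the toy rows above. The function name first_positive_year and the tuple layout are just illustrative choices of mine. It is only meant to show the intended result, including the case where quantity goes positive, zero, then positive again.

```python
# Toy rows copied from the example data: (year, sid, pid, bid, quantity)
rows = [
    (1990, 111, 1, 555, 0),
    (1990, 111, 1, 777, 10),
    (1990, 111, 2, 555, 100),
    (1990, 111, 2, 777, 0),
    (1991, 111, 1, 555, 3),
    (1991, 111, 1, 777, 25),
    (1991, 111, 2, 555, 5),
    (1991, 111, 2, 777, 5),
]


def first_positive_year(rows):
    """Map (sid, pid, bid) -> earliest year with quantity > 0, or None."""
    first = {}
    for year, sid, pid, bid, quantity in rows:
        key = (sid, pid, bid)
        # Register the group even if its quantity never turns positive.
        first.setdefault(key, None)
        # Taking the minimum year means later zero/positive spells are ignored.
        if quantity > 0 and (first[key] is None or year < first[key]):
            first[key] = year
    return first


answer = first_positive_year(rows)
for key in sorted(answer):
    print(*key, answer[key])
```

Because it takes the minimum year over the positive observations, the order of the rows does not matter and no sorting is needed.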
(2) This is trickier. For each seller and product, I want the first year in
which quantity sold is at least x for all buyers. For example, say x=50.
Then my answer would be
sid pid answer
111 1 .
111 2 .
since in neither year does the seller sell at least 50 to ALL buyers.
However, if x=5, then the output would be
sid pid answer
111 1 .
111 2 1991
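Again only as a sketch of the intended logic for (2), in plain Python rather than Stata: take the smallest quantity across buyers within each seller, product and year, then keep the earliest year in which that minimum reaches x. The function name first_year_all_buyers is hypothetical, and the sketch assumes that "all buyers" means every buyer with a row for that seller, product and year (in the toy data every buyer appears in every year).

```python
# Toy rows copied from the example data: (year, sid, pid, bid, quantity)
rows = [
    (1990, 111, 1, 555, 0),
    (1990, 111, 1, 777, 10),
    (1990, 111, 2, 555, 100),
    (1990, 111, 2, 777, 0),
    (1991, 111, 1, 555, 3),
    (1991, 111, 1, 777, 25),
    (1991, 111, 2, 555, 5),
    (1991, 111, 2, 777, 5),
]


def first_year_all_buyers(rows, x):
    """Map (sid, pid) -> earliest year in which every buyer's quantity >= x, or None."""
    # Step 1: smallest quantity over buyers within each (sid, pid, year).
    # Assumption: a buyer with no row in a given year is simply not counted.
    min_qty = {}
    for year, sid, pid, bid, quantity in rows:
        key = (sid, pid, year)
        min_qty[key] = min(quantity, min_qty.get(key, quantity))

    # Step 2: earliest year whose minimum reaches the threshold x.
    first = {}
    for (sid, pid, year), smallest in min_qty.items():
        group = (sid, pid)
        first.setdefault(group, None)
        if smallest >= x and (first[group] is None or year < first[group]):
            first[group] = year
    return first


for x in (50, 5):
    result = first_year_all_buyers(rows, x)
    print("x =", x)
    for group in sorted(result):
        # Print "." for groups with no qualifying year, as in the tables above.
        print(*group, "." if result[group] is None else result[group])
```

If buyers can drop out of the data in some years, the "all buyers" condition would need a fixed list of buyers per seller and product instead of the per-year minimum used here.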
I would appreciate any hints. Any references or tricks for getting this
type of panel data ready for survival-type analysis would be helpful as
well. Thank you very much for your help.
Jason Hwang