Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: data manipulation prob.
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: RE: st: data manipulation prob.
Date: Thu, 7 Jun 2012 18:41:45 +0100
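I am going to guess that there is a panel structure too, hidden from this example. Consider
bysort id (date) : gen sumhits = sum(hits)
by id : gen byte halfway = sumhits >= (sumhits[_N] / 2)
by id : egen double when_halfway = min(date / halfway)
bysort id (date) : gen double time_halfway = when_halfway - date[1]
The indicator is computed with -generate- because explicit subscripts such as [_N] should not be used inside -egen-, and the new variables are stored as double because datetimes need that precision. Dividing by the indicator returns the date where the condition is true and missing where it is false, and min() ignores missings. For more on the trick in the third line, see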
SJ-11-2 dm0055 . . . . . . . . . . . . . . Speaking Stata: Compared with ...
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
Q2/11 SJ 11(2):305--314 (no commands)
reviews techniques for relating values to values in other
observations
With no panel structure, this could be
sort date
gen sumhits = sum(hits)
su date if sumhits >= (sumhits[_N] / 2)
di r(min) - date[1]
The underlying principle is tautological: the first date on which something is true is just the minimum date satisfying that condition.
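As an illustration only, here is a minimal sketch of the no-panel approach applied to the ten-observation example quoted further down in this thread. The string variable sdate and the conversion with clock() are assumptions about how the datetimes are held; the real variable may already be a double with a %tc format.

```stata
clear
input str18 sdate hits
"10mar2011 01:07:18"  2
"10mar2011 01:09:48"  3
"10mar2011 01:54:00"  1
"10mar2011 02:03:37"  8
"10mar2011 02:11:00"  9
"10mar2011 02:26:00"  5
"10mar2011 02:46:00" 12
"10mar2011 02:47:00" 34
"10mar2011 02:51:09" 14
"10mar2011 02:51:24" 80
end

* assumption: datetimes arrive as strings; %tc values are milliseconds and need double
gen double date = clock(sdate, "DMYhms")
format date %tc

sort date
gen sumhits = sum(hits)

* the first date with at least half of all hits is the minimum date meeting the condition
su date if sumhits >= (sumhits[_N] / 2), meanonly
di %tc r(min)
di (r(min) - date[1]) / 60000 " minutes"
```

With these data the total is 168, so the condition is first met where the running sum reaches 88, at 10mar2011 02:51:09. The elapsed time from the first observation is 6,231 seconds, or 103.85 minutes.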
Nick
[email protected]
tashi lama
You guessed right. I could have made my example dataset a little more random. Yes, my real dataset could be really random. I have an idea, though; I just don't know enough Stata to carry it out.
     +---------------------------+
     |               date   hits |
     |---------------------------|
  1. | 10mar2011 01:07:18      2 |
  2. | 10mar2011 01:09:48      3 |
  3. | 10mar2011 01:54:00      1 |
  4. | 10mar2011 02:03:37      8 |
  5. | 10mar2011 02:11:00      9 |
     |---------------------------|
  6. | 10mar2011 02:26:00      5 |
  7. | 10mar2011 02:46:00     12 |
  8. | 10mar2011 02:47:00     34 |
  9. | 10mar2011 02:51:09     14 |
 10. | 10mar2011 02:51:24     80 |
     +---------------------------+
gen runhits=sum(hits)
list
     +-------------------------------------+
     |               date   hits   runhits |
     |-------------------------------------|
  1. | 10mar2011 01:07:18      2         2 |
  2. | 10mar2011 01:09:48      3         5 |
  3. | 10mar2011 01:54:00      1         6 |
  4. | 10mar2011 02:03:37      8        14 |
  5. | 10mar2011 02:11:00      9        23 |
     |-------------------------------------|
  6. | 10mar2011 02:26:00      5        28 |
  7. | 10mar2011 02:46:00     12        40 |
  8. | 10mar2011 02:47:00     34        74 |
  9. | 10mar2011 02:51:09     14        88 |
 10. | 10mar2011 02:51:24     80       168 |
     +-------------------------------------+
gen x=(runhits>ceil(runhits[_N]/2))
list
     +-----------------------------------------+
     |               date   hits   runhits   x |
     |-----------------------------------------|
  1. | 10mar2011 01:07:18      2         2   0 |
  2. | 10mar2011 01:09:48      3         5   0 |
  3. | 10mar2011 01:54:00      1         6   0 |
  4. | 10mar2011 02:03:37      8        14   0 |
  5. | 10mar2011 02:11:00      9        23   0 |
     |-----------------------------------------|
  6. | 10mar2011 02:26:00      5        28   0 |
  7. | 10mar2011 02:46:00     12        40   0 |
  8. | 10mar2011 02:47:00     34        74   0 |
  9. | 10mar2011 02:51:09     14        88   1 |
 10. | 10mar2011 02:51:24     80       168   1 |
     +-----------------------------------------+
Now, I could do something like
di date[n]-date[1]
where n is the observation number at which x is 1 for the first time. Alternatively, we could generate another variable "indicator" that contains only a single 1. In any case, I need a mechanism to get the observation number at which x first equals 1. Hope this helps...
Nick Cox
> On the last question first: the usual Stata way is to add observations
> at the end and then -sort-, although you could also -append- to a
> one-observation dataset.
>
> If -hits- is always 1, then
>
> sort date
> gen obs = _n
> su obs, meanonly
> di date[ceil(r(mean))] - date[1]
>
> I guess you will now tell us that the real data are more complicated.
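A minimal sketch of the first suggestion above (add an observation at the end, then -sort-), assuming date is a double variable with a %tc format. The two-observation dataset and the string variable sdate are only for illustration.

```stata
clear
input str18 sdate hits
"10mar2011 01:07:18" 1
"10mar2011 01:09:48" 1
end
gen double date = clock(sdate, "DMYhms")
format date %tc
drop sdate

* add one empty observation at the end of the dataset
set obs `=_N + 1'

* fill in the new datetime in the last observation; hits stays missing there
replace date = clock("07mar2011 03:00:00", "DMYhms") in L

* sorting on date moves the earlier datetime to the top
sort date
list
```

After the sort the new datetime is observation 1, with hits missing unless a value is assigned.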
On Wed, Jun 6, 2012 at 10:24 PM, tashi lama <[email protected]> wrote:
> >      +---------------------------+
> >      |               date   hits |
> >      |---------------------------|
> >   1. | 10mar2011 01:07:18      1 |
> >   2. | 10mar2011 01:09:48      1 |
> >   3. | 10mar2011 01:54:00      1 |
> >   4. | 10mar2011 02:03:37      1 |
> >   5. | 10mar2011 02:11:00      1 |
> >      |---------------------------|
> >   6. | 10mar2011 02:26:00      1 |
> >   7. | 10mar2011 02:46:00      1 |
> >   8. | 10mar2011 02:47:00      1 |
> >   9. | 10mar2011 02:51:09      1 |
> >  10. | 10mar2011 02:51:24      1 |
> >      +---------------------------+
> >
> > I need to find the time taken to get half of the total hits
> >
> > summ hits
> >
> > gen x=sum(hits)
> >
> >      +--------------------------------+
> >      |               date   hits    x |
> >      |--------------------------------|
> >   1. | 10mar2011 01:07:18      1    1 |
> >   2. | 10mar2011 01:09:48      1    2 |
> >   3. | 10mar2011 01:54:00      1    3 |
> >   4. | 10mar2011 02:03:37      1    4 |
> >   5. | 10mar2011 02:11:00      1    5 |
> >      |--------------------------------|
> >   6. | 10mar2011 02:26:00      1    6 |
> >   7. | 10mar2011 02:46:00      1    7 |
> >   8. | 10mar2011 02:47:00      1    8 |
> >   9. | 10mar2011 02:51:09      1    9 |
> >  10. | 10mar2011 02:51:24      1   10 |
> >      +--------------------------------+
> >
> > Now, the problem I am having is that I will be comparing r(sum)/2 with the values of var "x", but the calculation I need is on var "date". So, if r(sum)/2 is 5, then I know to compute date[5]-date[1]. Any ideas?
> >
> > Also, is it possible to add one observation at the top of the date column programmatically? I need to add 07mar2011 03:00:00 to the date column, and because this date is earlier than every other observation in the dataset, it needs to become my first observation.