Re: st: xt: unit-specific trends
From
"William Gould, StataCorp LP" <[email protected]>
To
[email protected]
Subject
Re: st: xt: unit-specific trends
Date
Thu, 19 Apr 2012 13:36:59 -0500
Laszlo <[email protected]> wrote,
> I used "if `touse'" because that is the official way to make a program
> byable (http://www.stata.com/help.cgi?byable). If there is any case
> where the -if- condition need not be checked for the entire dataset, a
> -by: - run is that, isn't it?
Laszlo is wrong in assuming that the data are necessarily sorted when
-by- calls the program. They may not be, and that is why -if `touse'-
is the official way to program this case.
The problem for -by- is that it is turning control over to a
user-written program, and it is not uncommon for user-written programs
to re-sort the data and then not put them back into the original
order. So -by- was written to accommodate that.
If you as a programmer know that the data will still be sorted
you can convert the -if `touse'- into an -in- range by coding,
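    tempvar x
    * observation number where `touse' is 1, missing elsewhere
    * (`touse'*_n would be 0 for excluded observations, making r(min) 0)
    quietly gen long `x' = _n if `touse'
    * -summarize- ignores missing values, so r(min) and r(max) are
    * the first and last selected observations
    quietly sum `x', meanonly
    local first = r(min)
    local last = r(max)
    drop `x'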
In the rest of your code you can then code -in `first'/`last'- instead
of -if `touse'-.
There may be a quicker way to convert an -if `touse'- into an -in- range.
This is just the first way that occurred to me.
I would still be hesitant to use -in- range instead of -if `touse'-
because I would need to be certain that every command I used in my
ado-file did not change the sort order.
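One way to hedge against that risk is to check, at the point where the
range is computed, that the selected observations really do form one
contiguous block, and to fall back to -if `touse'- when they do not.
The following is only a minimal sketch under stated assumptions: the
program name -tryin- is hypothetical, -rep78- is hardcoded to match the
auto data, and the check says nothing about commands run later in the
program that might re-sort the data.

```stata
* Hypothetical example program; not part of official Stata.
program tryin, byable(recall)
    syntax
    marksample touse

    * Observation number where `touse' is 1, missing elsewhere
    tempvar x
    quietly generate long `x' = _n if `touse'
    quietly summarize `x', meanonly
    local n     = r(N)
    local first = r(min)
    local last  = r(max)
    drop `x'

    if `n' == 0 {
        exit            // no observations selected
    }
    if `last' - `first' + 1 == `n' {
        * Selected observations are contiguous, so an -in- range is safe here
        list rep78 in `first'/`last'
    }
    else {
        * Not contiguous: use the form that is correct in any sort order
        list rep78 if `touse'
    }
end
```

After -sysuse auto, clear- and -sort rep78-, it could be tried with
-by rep78: tryin-. The range is taken only when the count of selected
observations equals the width of the range.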
Here's a demonstration of a by-able program that re-sorts the data
and yet still produces the expected results because it is coded using
-if `touse'-:
    . program tryit, byable(recall)
      1.         di "hi"
      2.         syntax
      3.         marksample touse
      4.         list rep78 if `touse'
      5.         sort mpg
      6. end

    . sysuse auto, clear
    (1978 Automobile Data)

    . sort rep78

    . by rep78: tryit

    --------------------------------------
    -> rep78 = 1
    hi

         +-------+
         | rep78 |
         |-------|
      1. |     1 |
      2. |     1 |
         +-------+

    --------------------------------------
    -> rep78 = 2
    hi

         +-------+
         | rep78 |
         |-------|
      3. |     2 |
     14. |     2 |
     15. |     2 |
     22. |     2 |
     24. |     2 |
         |-------|
     45. |     2 |
     52. |     2 |
     53. |     2 |
         +-------+

    <remaining output omitted>

    . _
When -tryit- was called the first time to process rep78==1, the data
were in order, and we see that, as expected, the observations for
which rep78 is 1 are at the top of the dataset, namely in observations
1 and 2. Now look at the -tryit- code. -tryit-, just before exiting,
re-sorts the data!
So, the second time -tryit- is called, when -tryit- is called to
process the rep78 = 2 data, the observations will not be in order.
And we can see that in the listing. The listing was produced by
coding -list rep78 if `touse'- and, just as one would hope, all the
observations for which `touse' contains 1 are rep78==2 observations.
This time, however, the data are no longer in order. The observations
for which `touse' is 1 are observations 3, 14, 15, 22, 24, 45, 52, and
53. It didn't matter, however, because we coded -if `touse'-.
-by- plus -tryit- still produced correct results.
Our thinking when we coded -by- and made the recommendation of using
-if `touse'- was that sometimes it is better to produce correct
results than to produce incorrect results more quickly.
-- Bill
[email protected]