Devra Golbe
>
> The following code fragment suggests that egen ends does not operate as
> the documentation (pasted at the bottom of my message) suggests.
> Specifically, with the tail option, if no punct(pchars) appears, the
> tail should be empty. However, my output suggests that under these
> circumstances (no appearance of pchars) the tail contains the entire
> "word".
>
> Thanks in advance for your insights.
>
> Devra
>
> *******
> . which egen
> C:\STATA7\ado\updates\e\egen.ado
> *! version 3.3.2 07may2001
>
>
> . egen consid1=ends(consid), punct(*) head;
>
> . egen considt=ends(consid), punct(*) tail;
>
> . egen consid2=ends(considt), punct(*) head;
>
> . egen considt2=ends(considt), punct(*) tail;
>
> . list consid consid1 consid2 considt2 in 1/10;
>
>             consid   consid1   consid2   considt2
>  1.  CASH*NOTE*LIA      CASH      NOTE        LIA
>  2.           CASH      CASH      CASH       CASH
>  3.           CASH      CASH      CASH       CASH
>
> *******
>
>
>
> ends(strvar) [, punct(pchars) trim [head|tail|last]] may not be
> combined with by. It gives the first "word" or head (with the head
> option), last "word" (with the last option), or the remainder or tail
> (with the tail option) from string variable strvar.
>
> head, last and tail are determined by the occurrence of pchars, which
> is by default a single space " ".
>
> [snip]
>
> The remainder or tail is whatever follows the first occurrence of
> pchars, which will be the empty string "" if it does not occur. The
> tail of "frog toad newt" is "toad newt" and of "frog" is "".
> With punct(,), the tail of "frog,toad" is "toad".

I think Devra is right.

The function -egen, ends() tail- has a history.
There was (and is) a user-written function,
-egen, tail()-, published in STB-50 in 1999, and indeed
earlier as a function within the -egenodd- package
from SSC. I was the author.

In Stata 7, -tail()- was implemented as a part
of a new -egen- function, -ends()-, which Devra
is using.

However, the behaviour was changed. I don't recall
any discussion of this. In any case, the documentation
of this option for the function was copied
essentially unchanged from the STB.

I agree with Devra that the documentation does not
match the behaviour.

In a nutshell, here is the different behaviour:

        mystr          tail7     tailSTB
  1.    "frog"         "frog"    ""
  2.    "frog toad"    "toad"    "toad"

The code implementation of "tail" in Stata 7
is

    substr(<strvar>,`index'+`plen',.)

where `index' gives the position
of the first occurrence of pchars,
by default a space, and `plen' is
the length of pchars, which will be 1
for a single space. Thus whenever
pchars does not occur within the
string variable <strvar>, `index' is 0
and, for a one-character pchars, this
evaluates as

    substr(<strvar>,1,.)

that is, the whole string, as Devra observed.
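
The arithmetic above can be modelled outside Stata. What follows
is a minimal Python sketch, not the -egen- code itself. It assumes
that Stata's index() returns 0 when the substring is absent and
that substr(s, n, .) returns everything from 1-based position n
onwards. The helper names (stata_index, stata_substr_from,
tail_stata7, tail_documented) are invented for illustration.

```python
def stata_index(s: str, sub: str) -> int:
    """Assumed model of Stata's index(): 1-based position of sub in s, 0 if absent."""
    return s.find(sub) + 1  # str.find gives -1 when absent, so this yields 0


def stata_substr_from(s: str, start: int) -> str:
    """Assumed model of substr(s, start, .) for a 1-based start >= 1."""
    return s[start - 1:]


def tail_stata7(s: str, pchars: str = " ") -> str:
    """Tail as in the expression quoted above: no check for a missing pchars."""
    return stata_substr_from(s, stata_index(s, pchars) + len(pchars))


def tail_documented(s: str, pchars: str = " ") -> str:
    """Tail as the documentation describes it: empty when pchars is absent."""
    pos = stata_index(s, pchars)
    if pos == 0:
        return ""
    return stata_substr_from(s, pos + len(pchars))


for s in ("frog", "frog toad"):
    print(repr(s), repr(tail_stata7(s)), repr(tail_documented(s)))
# 'frog' 'frog' ''
# 'frog toad' 'toad' 'toad'
```

The two definitions differ only in the branch taken when pchars
is absent, so a single guard on a zero position is enough to
reconcile the expression with the documented result.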

I regard

    substr(<strvar>,`index'+`plen',.)

as a nice definition of a tail, except
that

1. it is not matched by the manual
documentation

2. it does not match the original
intent that "head" and "tail" are
disjoint, and that head and tail
put together make up the string,
apart from the punctuation, which
acts as a kind of neck.

Point 2 is just a matter of history,
except that the terminology I used, which
was adopted within official Stata,
was meant to be vivid and helpful
in indicating what the functions
do.
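
The head, neck and tail picture can be checked mechanically.
This is a small Python sketch under stated assumptions, not
Stata code: head is taken to return the whole string when
pchars is absent (as in Devra's output, where the head of
"CASH" is "CASH"), pchars is assumed non-empty, and the
function names split_ends and reassemble are hypothetical.

```python
def split_ends(s: str, pchars: str = " ", stata7: bool = False):
    """Return (head, neck, tail) of s around the first occurrence of pchars.

    Assumes pchars is a non-empty string.
    """
    pos = s.find(pchars)  # 0-based position, or -1 if pchars is absent
    if pos >= 0:
        # Both definitions of tail agree whenever pchars occurs.
        return s[:pos], pchars, s[pos + len(pchars):]
    # pchars absent: head is the whole string and there is no neck.
    # The documented tail is "". The Stata 7 expression starts at
    # 1-based position 0 + plen, which is Python index plen - 1.
    tail = s[len(pchars) - 1:] if stata7 else ""
    return s, "", tail


def reassemble(s: str, pchars: str = " ", stata7: bool = False) -> str:
    """Put head, neck and tail back together."""
    head, neck, tail = split_ends(s, pchars, stata7)
    return head + neck + tail


print(reassemble("frog"))               # frog
print(reassemble("frog", stata7=True))  # frogfrog
```

Under the documented definition the three pieces always rebuild
the original string. Under the Stata 7 expression they do not
when pchars is absent, because head and tail then overlap.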
Nick