Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: How to get rid of leading and trailing letters and symbols?
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: RE: st: How to get rid of leading and trailing letters and symbols?
Date: Wed, 26 Oct 2011 13:38:44 +0100
I agree with Uli in recommending regular expression machinery. Given these data,
. l
+-------------------------------------+
| example |
|-------------------------------------|
1. | /profile/?id=9596986 |
2. | /profile/?id=9591886&reftype=detail |
+-------------------------------------+
-moss- (SSC) is, as mentioned very recently on this list, a wrapper for Stata's regex functions. It can give you more output than you need, but you just discard what you don't want. This finds numbers, that is, runs of one or more of the digits 0-9:
. moss example, match(([0-9]+)) regex
. l
+----------------------------------------------------------------+
| example _count _match1 _pos1 |
|----------------------------------------------------------------|
1. | /profile/?id=9596986 1 9596986 14 |
2. | /profile/?id=9591886&reftype=detail 1 9591886 14 |
+----------------------------------------------------------------+
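Since -moss- is a wrapper for Stata's regex functions, the same extraction can also be done directly with the built-in regexm() and regexs(). A minimal sketch, assuming the string variable is called example as above; the new names number and id are hypothetical:

```stata
* Recreate the two example observations
clear
input str40 example
"/profile/?id=9596986"
"/profile/?id=9591886&reftype=detail"
end

* regexm() returns 1 if the pattern matches; regexs(1) then returns
* the text captured by the first parenthesized group, here the first
* run of one or more digits
generate number = regexs(1) if regexm(example, "([0-9]+)")

* Convert the extracted text to numeric; double keeps long ids exact
generate double id = real(number)
list
```

The -if- qualifier ensures that regexs() is only evaluated after a successful regexm() on the same observation; observations with no digits would simply get an empty string.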
There are all sorts of ways of subdividing according to position, with or without regular expressions. One criterion for a number at the end is that the last character of the string is numeric, which is
. gen atend = !missing(real(substr(example,-1,1)))
. l
+-----------------------------------------------------------------------------------+
| example number~d _count _match1 _pos1 atend |
|-----------------------------------------------------------------------------------|
1. | /profile/?id=9596986 9596986 1 9596986 14 1 |
2. | /profile/?id=9591886&reftype=detail 1 9591886 14 0 |
+-----------------------------------------------------------------------------------+
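Both the trailing-number check and the extraction of a trailing number can also be written with regexm(), using $ to anchor the pattern to the end of the string. A sketch under the same assumptions as before; atend_re and numberatend are hypothetical names:

```stata
* Recreate the two example observations
clear
input str40 example
"/profile/?id=9596986"
"/profile/?id=9591886&reftype=detail"
end

* The criterion above: the last character converts to a number
generate byte atend = !missing(real(substr(example, -1, 1)))

* The same idea as a regular expression: "$" anchors the match to the
* end of the string, so this is 1 only if the string ends in digits
generate byte atend_re = regexm(example, "[0-9]+$")

* Keep the number only where it is trailing, empty otherwise
generate numberatend = regexs(1) if regexm(example, "([0-9]+)$")
list
```

The two flags should agree on these data; the regex version has the small advantage that the same pattern, with a group added, also returns the trailing number itself.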
Nick
[email protected]
Ulrich Kohler
You should get that using regular expressions (see help regexp). I don't
use regular expressions very often in Stata, but in my favourite editor,
Emacs, the regular expression to find a number of arbitrary length
would be
\([0-9]+\)
which would store the number in \1. The Stata regular expression should
work very similarly.
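In Stata's regex syntax the parentheses that define a group are not escaped, so the Emacs pattern \([0-9]+\) becomes ([0-9]+), and regexs(1) returns what Emacs would store in \1. Since the question is about getting rid of leading and trailing characters, the sketch below also uses regexr() to strip them directly; it assumes, as in both examples, that the id is the first run of digits. The names digits and stripped are hypothetical:

```stata
* Recreate the two example observations
clear
input str40 example
"/profile/?id=9596986"
"/profile/?id=9591886&reftype=detail"
end

* Emacs \([0-9]+\) becomes ([0-9]+) in Stata: plain parentheses
* define the group, and regexs(1) returns what Emacs stores in \1
generate digits = regexs(1) if regexm(example, "([0-9]+)")

* Alternatively, strip the leading and trailing non-digits directly.
* regexr() replaces the first substring matching the pattern:
* "^[^0-9]*" is everything before the first digit, and
* "[^0-9].*$" is everything from the next non-digit to the end
generate stripped = regexr(example, "^[^0-9]*", "")
replace stripped = regexr(stripped, "[^0-9].*$", "")
list
```

On these data both routes give the same string; the regexr() route mirrors the "leading and trailing" framing of the question, while regexm()/regexs() states directly what is to be kept.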
On Wednesday, 26.10.2011, at 10:37 +0100, Ekaterina Hertog wrote:
> I have got a dataset where the id variable is part of a web link. It
> can contain letters followed by the id number (e.g.
> /profile/?id=9596986), or it can contain the id number in the middle
> (e.g. /profile/?id=9591886&reftype=detail). I need to create a variable
> that contains only the number that is part of the id variable. I
> also need to be able to distinguish between cases where the number
> is trailing and cases where it is in the middle. I looked at the advice
> available on removing leading or trailing 0s in Stata 11
> (http://www.stata.com/support/faqs/data/leadingzeros.html), but in my
> case I cannot specify in advance the letters and symbols that lead or
> trail, so I am stuck. I use Stata 11.