Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: Dropping observations with similar names (same prefix)
From: "Ben Ammar" <[email protected]>
To: [email protected]
Subject: st: Dropping observations with similar names (same prefix)
Date: Mon, 07 Mar 2011 16:58:16 +0100
Hi everybody,
Sorry to bother you, but I was wondering how to drop observations whose (string) names differ only in the last letter.
For example:
City         Population   #household
London A     400          34
London B     300          12
London F     600          66
Hamburg B    200          54
Hamburg G    400          59
... ...
... ...
How can I drop the rows in which the prefix (London, Hamburg) is the same,
so that I keep only the first one listed for each prefix (London A, Hamburg B)?
I currently have 30,000 observations, so doing this by hand is impractical.
First I tried
.drop if name==substr(name,-2,-10) & name[_n-1]==substr(name,-2,-10)
However, 0 observations are deleted, so I think the "&" sign is the problem. In addition, the length of the string differs from observation to observation, which is probably causing problems too. Therefore, I tried:
.drop if name==substr(name,-2,-10) == name[_n-1]==substr(name,-2,-10)
but that resulted in a 'type mismatch'.
I also tried an approach like the one in the FAQs, creating an index for
each suffix (" A"=1, " B"=2, "C", etc.). However, I'm not sure whether this necessarily covers every way in which those observations could occur.
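For reference, here is a minimal sketch of the result I am after, written against the example rows above. It is only an illustration under assumptions: the string variable is called name (as in my attempts), the suffix is always the last space-delimited token, and the variable names households, obsno, and prefix are made up for this example.

```stata
* Rebuild the example rows (households stands in for the "#household" column).
clear
input str20 name population households
"London A"  400 34
"London B"  300 12
"London F"  600 66
"Hamburg B" 200 54
"Hamburg G" 400 59
end

* Remember the original order so that "first mentioned" is well defined.
gen long obsno = _n

* Prefix = everything before the last space. reverse() plus strpos() finds
* the position of the last space counted from the end of the string.
* If there is no space, strpos() returns 0 and the whole name is kept.
gen prefix = substr(trim(name), 1, length(trim(name)) - strpos(reverse(trim(name)), " "))

* Within each prefix, keep only the observation that appeared first.
bysort prefix (obsno): keep if _n == 1

* Restore the original ordering of the remaining rows.
sort obsno
list name population households
```

This groups on the prefix rather than comparing each row with the previous one, so it does not depend on the length of the name or on which suffix letters occur. A name that has no suffix but contains a space (a hypothetical "New York") would lose its last word under this rule, so that case would need separate handling. As a side note, the "type mismatch" in my second attempt presumably arises because the chained == compares the numeric result of the first comparison with a string.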
Thanks for your help!
Ben Ammar
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/