Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: Extract a letter between numbers
From
Nick Cox <[email protected]>
To
"'[email protected]'" <[email protected]>
Subject
RE: st: Extract a letter between numbers
Date
Mon, 22 Nov 2010 17:59:23 +0000
This complements my reply insofar as I hinted that there might be a regex solution. But why assume that typos in the number field are limited to a-zA-Z? They might just as well be almost anything!
Nick
[email protected]
Eric Booth
You probably need to look at regular expression matching.
These links should help:
http://www.stata.com/support/faqs/data/regex.html
http://www.stata.com/meeting/wcsug07/medeiros_reg_ex.pdf
Here's a start:
********!
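clear
* Example addresses with stray letters inside the house number
input str40(address)
"12e3 Main St"
"1144Re5 Oak St 77844"
"1a Broadway Ave., College Station, TX."
"11 Test St."
end
* Grab the leading run of digits and letters (the house number)
gen address2 = regexs(0) if /*
*/ regexm(address, "^[0-9a-zA-Z]*")
* Strip lower- and upper-case letters, then convert to numeric;
* ignoring "e" also stops "12e3" being read as 12000
destring address2, replace force ignore("`c(alpha)'`c(ALPHA)'")
list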
********!
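Along the lines of Nick's point that the stray characters need not be letters, here is a minimal sketch that deletes every non-digit character in the house number instead of listing the letters to ignore. It assumes Stata 14 or later (for ustrregexra()) and that the house number is the first space-delimited token of address; the variable names housenum_raw and housenum and the fifth example row are hypothetical.

```stata
* Sketch only: assumes Stata 14+ (ustrregexra()) and that the house
* number is the first space-delimited token of -address-
clear
input str40(address)
"12e3 Main St"
"1144Re5 Oak St 77844"
"1a Broadway Ave., College Station, TX."
"11 Test St."
"12#3 Elm St"
end

* Pull out the first token (everything before the first space)
gen housenum_raw = regexs(0) if regexm(address, "^[^ ]+")

* Delete every character that is not a digit, whatever it is,
* then convert what is left to a number (empty string -> missing)
gen housenum = real(ustrregexra(housenum_raw, "[^0-9]", ""))
list address housenum
```

Because the whole first token is treated as the house number, a legitimate unit letter such as "12B" loses its letter too, so it is worth checking the results by eye.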
On Nov 22, 2010, at 11:07 AM, Patrick McNamara wrote:
> I'm new to Stata coding (I've been using drop-down menus for a few years),
> and I'm working on an address parser to pull apart people's addresses and
> put them back together from the mess they enter online :)
> Right now I'm trying to figure out a way to remove any letters
> between two numbers that people have accidentally typed into their
> house address field (e.g., for 123 Main St, they typed 12e3 Main St).
> The letters are not always in the same position, and there may be more
> than one. I've tried strpos(), but it won't let me use a range such as
> [A-Z] or [0-9].
> Any help would be greatly appreciated!
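For the question as literally posed, removing letters only when a digit sits on each side of them, a lookaround pattern can fix the address in place. This is a sketch assuming Stata 14 or later, whose ustrregexra() uses ICU regular expressions with lookbehind and lookahead support; address_fixed is a hypothetical variable name.

```stata
* Sketch only: assumes Stata 14+ (ustrregexra() uses ICU regular
* expressions, which support lookbehind and lookahead)
clear
input str40(address)
"12e3 Main St"
"1144Re5 Oak St 77844"
"1a Broadway Ave., College Station, TX."
"11 Test St."
end

* Remove a run of letters only when a digit sits on both sides of it;
* (?<=[0-9]) and (?=[0-9]) check the neighbours without consuming them
gen address_fixed = ustrregexra(address, "(?<=[0-9])[a-zA-Z]+(?=[0-9])", "")
list address address_fixed
```

Unlike stripping every letter from the leading token, this keeps the street part of each address and leaves "1a Broadway Ave., ..." untouched, because the "a" is not followed by a digit.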
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/