Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Looping through entries in csv file
From
David Elliott <[email protected]>
To
[email protected]
Subject
Re: st: Looping through entries in csv file
Date
Mon, 13 Sep 2010 13:37:21 -0300
Another method would be to use a regular expression to extract the
postal code information from across all the address fields:
*==========================
// Extract postal code from address lines
gen str8 postcode = ""
gen str4 postdistrict = ""
quietly forvalues i = 1/`c(N)' {
if regexm(add1[`i']+add2[`i']+add3[`i']+add4[`i']+add5[`i']+add6[`i'],
!! ".*([a-zA-Z][a-zA-Z][0-9][0-9])[ ]?([0-9][a-zA-Z][a-zA-Z]).*") != 0 {
replace postcode = upper(regexs(1))+" "+upper(regexs(2)) in `i'
replace postdistrict = upper(regexs(1)) in `i'
}
}
*===========================
(Note: long line wrapped at !! )
My experience with postal code information is that codes can be
embedded in address lines and may variably be run together ( SK179JN )
irregularly capitalized ( sk17 9jn ) or subjected to some other abuse.
Regular expressions, once you have them tuned properly, are very good
at extracting patterned text and turning it into consistent data.
As an aside
It would be nice if one could -generate- directly using a -cond()-
function with the -regexm()- instead of running through the file
record by record but I have never found a way to make it work with the
regular expressions functions.
DC Elliott
On 13 September 2010 11:44, Neil Shephard <[email protected]> wrote:
> On Mon, Sep 13, 2010 at 2:37 PM, Ada Ma <[email protected]> wrote:
>> Open up the postcode-county file using stata or excel.
>>
>> I suppose the postcode sectors are all in a column. Select and copy it.
>>
>> Go into excel if you aren't already in excel. Right click, choose
>> "paste special". Transpose the postcode. Now the postcodes all in
>> one row. Copy it and paste into a do file. Edit them so that they
>> looks nice (this step is optional).
>
> A more direct method would be to do this in Stata. Something along
> the lines of....
>
> File with full Postcodes in is called postcodes.dta with the full
> postcode held in the variable 'postcode'
>
> use postcodes, clear
> /* Get the first 3 characters of the postcode into a new variable */
> gen sub_postcode = substr(postcode, 1, 3)
> /* Store all of these unique values in a local macro using -levelsof- */
> levelsof sub_postcode, local(postcode_districts)
> di "`postcode_districts'"
> /* Store all of these unique values in a global macro using -levelsof- */
> levelsof sub_postcode, global(postcode_districts)
> di "$postcode_districts"
>
> (Use local or global macros depending on what you want to do with the
> macrolist and how you're structuring/organising your do-files).
>
> Neil
>
> --
> "One should as a rule, respect public opinion in so far as is
> necessary to avoid starvation and to keep out of prison, but anything
> that goes beyond this is voluntary submission to an unnecessary
> tyranny, and is likely to interfere with happiness in all kinds of
> ways..." - Bertrand Russell
>
> Email - [email protected]
> Website - http://slack.ser.man.ac.uk/
> Photos - http://www.flickr.com/photos/slackline/
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
--
David Elliott
Everything is theoretically impossible, until it is done.
Progress is made by lazy men looking for easier ways to do things.
-- Robert A. Heinlein (American science-fiction Writer, 1907-1988)
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/