Re: st: Looping through entries in csv file
From: David Elliott <[email protected]>
To: [email protected]
Subject: Re: st: Looping through entries in csv file
Date: Tue, 14 Sep 2010 09:26:41 -0300
Gillian, I'm glad the code helped. The ".*" matches zero or more
characters (the '.' matches any character), and it is greedy. If you
had to remove it, the ".*" was probably swallowing the first letter of
a two-letter district: the remaining letter and digits still satisfy
one of the alternatives, so the expression cannot pin down where the
postcode starts. I gave this some more thought overnight, and I
believe that starting the regexm expression with ".*[ ] ..." would
probably work if you put a space in front of each of the add# lines,
as in `"" "+add1+" "+add2+..." "+add6"' (remember to use compound
double quotes), so that there is always a space before the postal
code.
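The effect of the leading space can be sketched outside Stata. The snippet below uses Python's `re` module purely as a stand-in, so details may differ from Stata's regexm. The address lines, the simplified district pattern, and the helper name `extract_district` are all assumptions made for illustration; the pattern is not the full expression from this thread.

```python
import re

# Simplified, hypothetical pattern for a UK postcode district
# (one or two letters, a digit, then an optional digit or letter).
DISTRICT = r"([a-zA-Z][a-zA-Z]?[0-9][0-9a-zA-Z]?)"
# Inward part of the postcode: space, digit, two letters.
UNIT = r"[ ][0-9][a-zA-Z][a-zA-Z]"


def join_lines(lines):
    """Put a space in front of every address line, as suggested above."""
    return "".join(" " + line for line in lines)


def extract_district(lines, anchored=True):
    """Return the district, or None if no postcode is found.

    anchored=True  uses ".*[ ]" so the district must follow a space.
    anchored=False uses a bare ".*", which is greedy and may eat a letter.
    """
    prefix = r".*[ ]" if anchored else r".*"
    match = re.match(prefix + DISTRICT + UNIT, join_lines(lines))
    return match.group(1) if match else None


# Hypothetical address lines standing in for the add# variables.
example = ["12 High Street", "Buxton", "SK17 9XX"]

greedy_result = extract_district(example, anchored=False)
anchored_result = extract_district(example, anchored=True)
print(greedy_result, anchored_result)
```

With the bare ".*", the greedy prefix consumes the "S" and the capture is only "K17"; requiring a space just before the district forces the capture to start at "SK17". Because every line gets a leading space, this also holds when the postcode is the very first thing in the first address line.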
Your postal code situation is the same as ours in Canada, then. We
have a Postal Code Conversion File
[ http://www.statcan.gc.ca/bsolc/olc-cel/olc-cel?catno=82F0086X&lang=eng ]
that deals with exactly this: for postal codes that straddle
boundaries, it randomly assigns persons to one county or the other
according to the probability of being in each.
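As a rough illustration of the two approaches discussed in this thread, the sketch below contrasts probability-weighted random assignment with the majority rule described in the quoted message. It is written in Python with made-up data: the postal code "X1Y", the county names, the shares, and both function names are hypothetical, and this is not how the Postal Code Conversion File itself is implemented.

```python
import random

# Hypothetical share of each postal code's households in each county.
shares = {
    "X1Y": {"County A": 0.74, "County B": 0.26},
}


def assign_county(postal_code, shares, rng):
    """Draw one county at random, weighted by its share of households."""
    counties = list(shares[postal_code])
    weights = [shares[postal_code][county] for county in counties]
    return rng.choices(counties, weights=weights, k=1)[0]


def majority_county(postal_code, shares):
    """Always pick the county holding the largest share."""
    by_county = shares[postal_code]
    return max(by_county, key=by_county.get)


rng = random.Random(2010)  # fixed seed so the draws are reproducible
draws = [assign_county("X1Y", shares, rng) for _ in range(10000)]
share_a = draws.count("County A") / len(draws)

print(majority_county("X1Y", shares))
print(round(share_a, 2))
```

Under the majority rule every person in the straddling code ends up in the larger county, whereas the random draw reproduces the split on average at the cost of assigning some individuals to the wrong side.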
Good luck!
DC Elliott
On 14 September 2010 05:37, <[email protected]> wrote:
> All,
>
> Thank you so much for your help!
>
> David, the below code works like a dream. I had to remove the ".*" from
> the start of the expression
> ".*([a-zA-Z]([0-9]|[0-9][0-9]|[0-9][a-zA-Z]|[a-zA-Z][0-9]|[a-zA-Z][0-9][0-9]|[a-zA-Z][0-9][a-zA-Z]))
> ...", for some reason it wasn't picking up the first letter of the
> postcode district, but it works with this removed. I was not aware of
> these regular expressions, and can see how useful they could be for future
> problems.
>
> Ada was correct in that I needed to create a list of postcode districts to
> identify and extract the postcode from the address variables, but your
> method means that this is no longer necessary and I can now just merge
> with the postcode data file.
>
> David, in answer to your question, UK postcodes are not fully within
> county boundaries. What we have done is taken the number of households in
> each postcode district and looked at the proportion that is in each
> county. We have then assigned the postcode district to the county where
> the greatest proportion of households appear. For example, 74% of
> households in the postcode district SK17 are in the East Midlands region,
> so this is the county we have assigned to SK17. There is scope for some
> misclassification using this method. However, for the actual analysis, we