Re: st: Looping through entries in csv file

From	David Elliott <[email protected]>
To	[email protected]
Subject	Re: st: Looping through entries in csv file
Date	Tue, 14 Sep 2010 09:26:41 -0300

Gillian, so glad that the code could help.  If you had to remove the
.* (the '.' matches any character and the '*' means zero or more of
them), it is probably because the leading .* is greedy: it takes as
many characters as it can, so the expression cannot reliably find the
start of the postcode and can lose its first letter.  I gave this some
more thought overnight, and I believe that starting the regexm
expression with ".*[ ] ... would probably work if you add a space in
front of each of the add# lines, as in `"" "+add1+" "+add2+..." "+add6"'
(remember to use compound double quotes), to ensure that the expression
always has a space before the postal code.
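
A minimal sketch of the difference, for illustration only.  The two
addresses, the variable names add1, add2 and fulladd, and the shortened
pattern (one or two letters, one or two digits, a space, then a
digit-letter-letter inward code) are assumptions made up for this
example; it is not the full expression from earlier in the thread.

```stata
* Sketch only: hypothetical addresses and a simplified postcode pattern
clear
input str30 add1 str30 add2
"12 High Street"   "SK17 6XX"
"Flat 2 Mill Lane" "BS1 4DJ"
end

* Put a space in front of every address line, so a postcode that
* starts a line is still preceded by a space after joining
gen str62 fulladd = " " + add1 + " " + add2

* Leading ".*" is greedy: it consumes as much as it can, and the
* optional second letter lets the match succeed without the first one
gen greedy = regexs(1) if regexm(fulladd, ".*([a-zA-Z][a-zA-Z]?[0-9][0-9]?) [0-9][a-zA-Z][a-zA-Z]")

* ".*[ ]" forces a space immediately before the captured district
gen anchored = regexs(1) if regexm(fulladd, ".*[ ]([a-zA-Z][a-zA-Z]?[0-9][0-9]?) [0-9][a-zA-Z][a-zA-Z]")

list fulladd greedy anchored
```

With these two rows, greedy should come out as K17 and S1 (first
letter lost), while anchored should come out as SK17 and BS1.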

Your postal code situation is the same as in Canada, then.  We have a
Postal Code Conversion File
[ http://www.statcan.gc.ca/bsolc/olc-cel/olc-cel?catno=82F0086X&lang=eng ]
that does exactly that - it randomly assigns persons according to the
probability of being in one county or the other for postal codes that
straddle boundaries.
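
To make the contrast with a majority rule concrete, here is a rough
sketch of probabilistic assignment.  It is not the Postal Code
Conversion File's actual layout or algorithm; the variable names
(pcode, share_a), the codes, and the shares are all hypothetical, with
share_a standing for the proportion of a postal code's households that
fall in "county A".

```stata
* Sketch only: hypothetical data, one row per person
clear
input str4 pcode double share_a
"X1" 0.74
"X1" 0.74
"X1" 0.74
"Y2" 1.00
"Z3" 0.00
end

set seed 2010

* Random assignment: each person lands in county A with probability share_a
gen byte county_a_random = runiform() < share_a

* Majority rule: everyone in the postal code goes to the larger share
gen byte county_a_majority = share_a >= 0.5

list
```

Under the majority rule every X1 person is placed in county A, whereas
under random assignment roughly 74% of them would be in a large sample.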

Good luck!

DC Elliott

On 14 September 2010 05:37,  <[email protected]> wrote:
> All,
>
> Thank you so much for your help!
>
> David, the below code works like a dream.  I had to remove the ".*" from
> the start of the expression
> ".*([a-zA-Z]([0-9]|[0-9][0-9]|[0-9][a-zA-Z]|[a-zA-Z][0-9]|[a-zA-Z][0-9][0-9]|[a-zA-Z][0-9][a-zA-Z]))
> ...", for some reason it wasn't picking up the first letter of the
> postcode district, but it works with this removed.  I was not aware of
> these regular expressions, and can see how useful they could be for future
> problems.
>
> Ada was correct in that I needed to create a list of postcode districts to
> identify and extract the postcode from the address variables, but your
> method means that this is no longer necessary and I can now just merge
> with the postcode data file.
>
> David, in answer to your question, UK postcodes are not fully within
> county boundaries.  What we have done is taken the number of households in
> each postcode district and looked at the proportion that is in each
> county.  We have then assigned the postcode district to the county where
> the greatest proportion of households appear.  For example, 74% of
> households in the postcode district SK17 are in the East Midlands region,
> so this is the county we have assigned to SK17.  There is scope for some
> misclassification using this method.  However, for the actual analysis, we
