Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: Formatting a string variable (UK postcodes) to always be seven characters in length
From
"Seed, Paul" <[email protected]>
To
"[email protected]" <[email protected]>
Subject
RE: st: Formatting a string variable (UK postcodes) to always be seven characters in length
Date
Wed, 5 Feb 2014 12:33:01 +0000
Dear Statalist,
I happen to have a complete file of UK residential postcodes (no business codes).
As Nick Cox surmises, they remain unique after omitting central blank(s).
This implies that his advice to Catherine Tisch is sound: generate a new variable
without the central blank in both data sets and match on these.
It's also a lot quicker than what I have been doing up until now.
Thanks again Nick.
(In case anyone wants to know, the file was obtained from http://ukbsrv-at.edina.ac.uk/ukborders/action/restricted/index;
but access is restricted to bona fide academic institutions.)
**********************
. set mem 1g
(1048576k)
. use "C:\Paul Seed\Big data sets\IMD\Postcodes_SOA\pc_soa1_dzone1.dta"
. isid pcd
. gen pcd_nospace = subinstr( pcd, " ", "", .)
. assert wordcount( pcd_nospace) == 1
. isid pcd_nospace
**************************
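
For anyone wanting to try the matching step, a do-file along these lines might serve as a starting point. It is only a sketch: the two toy data sets, the variable names postcode, area and pcd_key, and the area values are all invented for illustration, and upper() is an extra precaution of mine in case the two files differ in case.

```stata
* Toy lookup file standing in for the full postcode list (values are illustrative)
clear
input str8 pcd str4 area
"DH1 2NJ" "A"
"M1  1AA" "B"
end
* Build the space-free key and confirm it still identifies each row
gen pcd_key = upper(subinstr(pcd, " ", "", .))
isid pcd_key
tempfile lookup
save `lookup'

* Toy analysis file with inconsistently spaced postcodes (hypothetical data)
clear
input str8 postcode
"DH12NJ"
"dh1  2nj"
"M1 1AA"
end
gen pcd_key = upper(subinstr(postcode, " ", "", .))

* Many analysis rows may share one postcode, so match m:1 on the key
merge m:1 pcd_key using `lookup'
assert _merge == 3
list postcode pcd area
```

With real data the assert on _merge would probably fail for some rows, and tabulating _merge would show which postcodes did not match.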
Paul T Seed, Senior Lecturer in Medical Statistics,
Division of Women's Health, King's College London
Women's Health Academic Centre, King's Health Partners
(+44) (0) 20 7188 3642.
> From: Nick Cox <[email protected]>
> Subject: Re: st: Formatting a string variable (UK postcodes) to always be seven
> characters in length
>
> For readers worldwide, know that (e.g.) my home postcode is DH1 2NJ
> with a space.
>
> If some postcodes have already lost internal spaces, putting it back
> in each case will be tricky. So, I wonder about taking them all out,
>
> gen postcode2 = subinstr(postcode, " ", "", .)
>
> but I have no idea whether that matches what is outside Stata.
>
> Nick
> [email protected]
>
>
> On 4 February 2014 15:43, <[email protected]> wrote:
>
> > I'm hoping someone can help me with a formatting issue with regard to UK
> > postcodes whereby I want to create a new postcode variable that is always
> > seven characters in length from a postcode variable that has between five
> > and eight characters (e.g. 7 numbers/letters with a space = 8). I want to
> > ultimately join my cleaned postcode variable created in Stata to a postcode
> > shapefile in ArcGIS. The postcode shapefile variable is always seven
> > characters in length, with various combinations of letters, numbers and
> > spaces, e.g.
> >
> > LN  NLL
> > LLN NLL
> > LNN NLL
> > LLNNNLL
> > LLNLNLL
> > LNL NLL
> >
> > Where L represents a letter and N a number. I need to clean my original
> > postcode variable as there is little consistency (e.g. LN  NLL (two spaces)
> > vs LNNLL (no space) vs LN NLL (one space)).
> >
> > Are there any suggestions on how I can create a new postcode variable in
> > Stata that is always seven characters in length bearing in mind that if a
> > space/s is required, there always needs to be three characters at the end
> > after the split? I hope this all makes sense?!
> >
> > I'm using Stata 12.1, any help much appreciated.
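
If the seven-character layout is still needed for the shapefile join, one possible approach is to strip the blanks first and then rebuild the code from its two parts. The sketch below assumes that the last three characters are always the inward part and that the outward part has two to four characters, which agrees with the patterns listed in the question. The example postcodes and the variable names pc_nospace, inward, outward and pc7 are hypothetical.

```stata
* Toy data with inconsistent spacing and case (illustrative values only)
clear
input str10 postcode
"DH1 2NJ"
"dh12nj"
"M1  1AA"
"M11AA"
"SW1A 1AA"
"SW1A1AA"
end

* Remove all blanks and standardise case
gen pc_nospace = upper(subinstr(postcode, " ", "", .))

* Assumption: the inward part is always the last three characters
gen inward  = substr(pc_nospace, -3, 3)
gen outward = substr(pc_nospace, 1, length(pc_nospace) - 3)

* Pad the outward part with blanks to four characters, then append the inward part
gen pc7 = outward + substr("    ", 1, 4 - length(outward)) + inward

* Any postcode that does not fit the assumed pattern will fail here
assert length(pc7) == 7
list postcode pc7
```

length() and upper() are used rather than newer function names so that the code should also run in Stata 12.1.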