RE: st: Formatting a string variable (UK postcodes) to always be seven characters in length
From: "Seed, Paul" <[email protected]>
To: "[email protected]" <[email protected]>
Date: Wed, 5 Feb 2014 12:33:01 +0000

Dear Statalist, 
I happen to have a complete file of UK residential postcodes (no business codes).  
As Nick Cox surmises, they remain unique after omitting central blank(s).
This implies that his advice to Catherine Tisch is sound: generate a new variable 
without the central blank in both data sets and match on these.
It's also a lot quicker than what I have been doing up until now.
Thanks again Nick.
(In case anyone wants to know, the file was obtained from http://ukbsrv-at.edina.ac.uk/ukborders/action/restricted/index; 
but access is restricted to bona fide academic institutions.)
**********************
. set mem 1g
(1048576k)
. use "C:\Paul Seed\Big data sets\IMD\Postcodes_SOA\pc_soa1_dzone1.dta"
. isid pcd
. gen pcd_nospace = subinstr( pcd, " ", "", .)
. assert wordcount( pcd_nospace) == 1
. isid  pcd_nospace
**************************
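
The matching step itself might look like the following minimal sketch. It assumes two small made-up data sets; the variable name postcode, the tempfile name lookup and the postcode values are illustrative only and are not taken from the file above. Both sides get the same space-free key, and the merge is done on that key.

```stata
* Illustrative lookup data, standing in for the full postcode file
clear
input str8 pcd
"DH1  2NJ"
"SE1 7EH"
end
gen pcd_nospace = subinstr(pcd, " ", "", .)
isid pcd_nospace
tempfile lookup
save `lookup'

* Illustrative user data with inconsistent spacing
clear
input str8 postcode
"DH12NJ"
"SE1  7EH"
end
gen pcd_nospace = subinstr(postcode, " ", "", .)
isid pcd_nospace

* Match on the space-free key; every record should find a partner here
merge 1:1 pcd_nospace using `lookup'
assert _merge == 3
```

If the user data can contain the same postcode more than once, merge m:1 would be the form to use instead of merge 1:1, and the isid check on that side would have to go.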
Paul T Seed, Senior Lecturer in Medical Statistics, 
Division of Women's Health, King's College London
Women's Health Academic Centre, King's Health Partners 
(+44) (0) 20 7188 3642.
> From: Nick Cox <[email protected]>
> Subject: Re: st: Formatting a string variable (UK postcodes) to always be seven
> characters in length
> 
> For readers worldwide, know that (e.g.) my home postcode is DH1 2NJ
> with a space.
> 
> If some postcodes have already lost internal spaces, putting it back
> in each case will be tricky. So, I wonder about taking them all out,
> 
> gen postcode2 = subinstr(postcode, " ", "", .)
> 
> but I have no idea whether that matches what is outside Stata.
> 
> Nick
> [email protected]
> 
> 
> On 4 February 2014 15:43,  <[email protected]> wrote:
> 
> > I'm hoping someone can help me with a formatting issue with regard to UK
> > postcodes whereby I want to create a new postcode variable that is always
> > seven characters in length from a postcode variable that has between five
> > and eight characters (e.g. 7 numbers/letters with a space = 8).  I want to
> > ultimately join my cleaned postcode variable created in Stata to a postcode
> > shapefile in ArcGIS.  The postcode shapefile variable is always seven
> > characters in length, with various combinations of letters, numbers and
> > spaces, e.g.
> >
> > LN  NLL
> > LLN NLL
> > LNN NLL
> > LLNNNLL
> > LLNLNLL
> > LNL NLL
> >
> > Where L represents a letter and N a number.  I need to clean my original
> > postcode variable as there is little consistency (e.g. LN  NLL (two spaces)
> > vs LNNLL (no space) vs LN NLL (one space)).
> >
> > Are there any suggestions on how I can create a new postcode variable in
> > Stata that is always seven characters in length bearing in mind that if a
> > space/s is required, there always needs to be three characters at the end
> > after the split?  I hope this all makes sense?!
> >
> > I'm using Stata 12.1, any help much appreciated.
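
For the seven-character layout asked about in the original question, one possible approach is sketched below. It is not from the thread: it assumes that the last three non-blank characters are always the inward part of the postcode, as the question states, and that the outward part has two to four characters. The variable names and the example values are illustrative only. Only functions available in Stata 12.1 are used.

```stata
* Illustrative input only: postcodes typed with inconsistent spacing and case
clear
input str10 postcode
"DH1 2NJ"
"dh12nj"
"M1  1AE"
"M11AE"
"SW1A 1AA"
end

* Strip every blank and standardise to upper case
gen str8 pc_nospace = upper(subinstr(postcode, " ", "", .))

* The inward part is the last three characters; the outward part is the rest
gen str3 inward  = substr(pc_nospace, -3, 3)
gen str4 outward = substr(pc_nospace, 1, length(pc_nospace) - 3)

* Pad the outward part with blanks to four characters, then add the inward part
gen str7 pc7 = substr(outward + "  ", 1, 4) + inward
assert length(pc7) == 7

list postcode pc7
```

Padding with substr(outward + "  ", 1, 4) keeps the rule in one expression: a two-character outward part gets two blanks, a three-character part gets one, and a four-character part gets none. Whether the result agrees with the shapefile's variable would still need checking against the shapefile itself.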