Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: RE: RE: making data duplicate in terms of several variables in case of a given variable taking identical values
From
"Nick Cox" <[email protected]>
To
<[email protected]>
Subject
st: RE: RE: making data duplicate in terms of several variables in case of a given variable taking identical values
Date
Tue, 6 Jul 2010 13:07:10 +0100
Note that Richard Boylan asked essentially the same question on 30 June:
<http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.1006/date/article-1650.html>
Richard's question was about string variables, not numeric variables,
but that difference is quite secondary to the main problem.
See the subsequent thread for suggestions by Martin Weiss and myself,
most conveniently
<http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.1007/date/article-11.html>
Two very simple morals arise:
1. Reading Statalist as well as writing to it will reveal tricks useful
to you.
2. Regardless of that, the underlying techniques are already covered in
two FAQs, named in the posting just referred to, so going directly to
the FAQs would identify a solution.
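As a rough illustration of the kind of within-group check involved (a sketch only, not the FAQs' own code): the toy data below borrow variable names from the example later in this thread, and zipcode "9999999" is a made-up row added so that at least one group really does conflict. The new variable names (town_min, town_max, pref_tag, pref_n and the two *_conflict flags) are purely illustrative.

```stata
* Toy data; "9999999" is hypothetical and deliberately inconsistent
clear
input str7 zipcode str10 prefecture int towncode
"0010029" "hokkaido" 100
"0010029" ""         .
"0200823" ""         .
"0200823" "iwate"    201
"9999999" "iwate"    201
"9999999" "aomori"   202
end

* Numeric variable: egen's min() and max() ignore missing values, so they
* agree whenever a zipcode has at most one distinct nonmissing towncode
egen town_min = min(towncode), by(zipcode)
egen town_max = max(towncode), by(zipcode)
gen byte town_conflict = town_min != town_max

* String variable: tag() marks one observation per distinct nonmissing
* (zipcode, prefecture) pair; counting tags gives distinct values per zipcode
egen byte pref_tag = tag(zipcode prefecture)
egen pref_n = total(pref_tag), by(zipcode)
gen byte pref_conflict = pref_n > 1

* Zipcodes listed here disagree for reasons other than missing values
list if town_conflict | pref_conflict, noobs sepby(zipcode)
```

If nothing is listed for the real data, the only variation within zipcode is missing versus nonmissing, and filling in the gaps is safe.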
Nick
[email protected]
Martin Weiss
"I think that the only cases where prefecture, towncode and areacode vary
while zipcodes are identical are when prefecture, towncode and areacode
are sometimes missing and sometimes not, but I would like to check that
before I do the necessary replacements."
You have to check those conditions one by one:
***********
clear*
input str10(zipcode prefecture) int(towncode areacode)
"0010027" "hokkaido" 100 1100
"0010029" "hokkaido" 100 1100
"0010029" "" . .
"0010030" "hokkaido" 100 1100
"0200822" "iwate" 201 3201
"0200823" "" . .
"0200823" "iwate" 201 3201
"0200831" "iwate" 201 3201
end
compress
li, noo sepby(zipcode)
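// flag zipcodes whose first and last observations differ on each variable
bys zipcode: gen byte prefvaries=prefecture[1]!=prefecture[_N]
by zipcode: gen byte townvaries=towncode[1]!=towncode[_N]
by zipcode: gen byte areavaries=areacode[1]!=areacode[_N]
// count observations per zipcode with any of the three variables missing
by zipcode: egen missings=total(mi(prefecture,towncode, areacode))
// some, but not all, observations in the zipcode have missing values
by zipcode: gen byte onlysomemiss=missings!=_N & missings!=0
drop missings
//all conditions fulfilled?
gen byte complies=prefvaries+townvaries+areavaries+onlysomemiss==4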
li, noo sepby(zipcode) ab(15)
***********
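Once the check confirms that values within a zipcode differ only through missingness, the replacement step the original question asks for could look something like the sketch below. It is an assumption-laden illustration, not part of the posted solution: it relies on Stata's sort order ("" sorts before any nonempty string, numeric missing sorts after any number) and on there being at most one distinct nonmissing value per zipcode.

```stata
* Small subset of the example data, re-entered so the sketch runs on its own
clear
input str7 zipcode str10 prefecture int(towncode areacode)
"0010029" "hokkaido" 100 1100
"0010029" ""         .   .
"0200823" ""         .   .
"0200823" "iwate"    201 3201
end

* Strings: after sorting within zipcode, the last observation holds a
* nonempty prefecture if the zipcode has one at all
bysort zipcode (prefecture): replace prefecture = prefecture[_N] if prefecture == ""

* Numerics: after sorting within zipcode, the first observation holds a
* nonmissing value if the zipcode has one at all
foreach v of varlist towncode areacode {
    bysort zipcode (`v'): replace `v' = `v'[1] if missing(`v')
}

list, noobs sepby(zipcode)
```

A zipcode in which every observation is missing stays missing, since there is nothing to copy from.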
Ekaterina Hertog
I have some data which looks like this
zipcode   prefecture   towncode   areacode
0010027   hokkaido     100        1100
0010029   hokkaido     100        1100
0010029   .            .          .
0010030   hokkaido     100        1100
0200822   iwate        201        3201
0200823   .            .          .
0200823   iwate        201        3201
0200831   iwate        201        3201
I use Stata 11.
I would like to make my observations identical in terms of prefecture,
towncode and areacode when they are identical in terms of zipcode. I
think that the only cases where prefecture, towncode and areacode vary
while zipcodes are identical are when prefecture, towncode and areacode
are sometimes missing and sometimes not, but I would like to check that
before I do the necessary replacements.
I looked into duplicate commands, but did not seem to find a good
solution. I would be most grateful for any pointers.