Re: st: AW: Merging database
On Apr 29, 2009, at 12:03 PM, Nick Cox wrote:
I have various comments on this code.
1. -foreach x in V*- won't work. Eric is probably thinking of
-foreach x of var V*- but in this case -foreach v in V1 V2 V3- takes
no more thought.
-foreach x in V*- works fine for me. The commands inside the loop are
applied to all the variables V1-V3.
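On the -foreach- point, here is a small sketch of how the two forms differ, as far as I understand the syntax. With -in-, V* is taken as one literal list element, so the loop body runs once with `x' holding "V*"; commands that accept a varlist then expand the wildcard themselves, which is why the result looks the same. With -of varlist-, V* is expanded first and the body runs once per variable. The two-row dataset and the counters n_in and n_of are made up for illustration, and the code is meant to be run from a do-file.

```stata
clear
input V1 V2 V3
1 . 1
4 4 .
end

* -foreach ... in- treats V* as a single literal list element
local n_in = 0
foreach x in V* {
    local ++n_in
    describe `x'      // becomes -describe V*-, which covers V1, V2 and V3
}

* -foreach ... of varlist- expands V* first, then loops once per variable
local n_of = 0
foreach x of varlist V* {
    local ++n_of
}

display "passes with -in-: `n_in'; passes with -of varlist-: `n_of'"
```

The difference matters as soon as the loop body uses `x' somewhere a single variable name is required, for example in -gen new_`x' = ...-.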
2. Eric wants to -recode- missings and also concatenate the
identifiers. If so, it is easier to go
egen V_combined = concat(V1 V2 V3), p(_)
replace V_combined = subinstr(V_combined, ".", "x", .)
...
Note that using 99, even temporarily, is dangerous unless one can be
sure that 99 is not a legitimate identifier. In any case, why
recode? A variable with values like "1 . ." is a satisfactory
composite -- if that is what is needed.
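For completeness, a runnable sketch of the -egen, concat()- route applied to the example data from the original question. It assumes the identifiers are integers, because subinstr() would also replace a decimal point in a value like 1.5.

```stata
clear
input V1 V2 V3
1 . 1
2 . 2
3 3 3
4 4 .
. 5 5
6 . 6
end

* concat() converts numeric values to strings; numeric missing becomes "."
egen V_combined = concat(V1 V2 V3), punct(_)

* swap every "." for "x"; the final argument . means "all occurrences"
replace V_combined = subinstr(V_combined, ".", "x", .)

list
```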
Thanks, Nick...this is very helpful.
I wasn't aware of subinstr() and I couldn't get regexr() to work with
the missing value "." (hence the use of 99...but you're right about
that too, using 99 is dangerous, and in the past I've used a much
larger number that was clearly out of the range of the dataset, but
using subinstr() without the recode is a far better approach).
EAB
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
[email protected]
Office: +979.845.6754
Fax: +979.845.0249
Eric A. Booth wrote:
To add to Jochen's comment:
If you were hoping to have a new 'ID' variable that keeps the
information from all the ID variables V1, V2, and V3, you could create
a string variable...here are some examples:
******************
clear
input V1 V2 V3
1 . 1
2 . 2
3 3 3
4 4 .
. 5 5
6 . 6
end
//
foreach x in V* {
recode `x' (.=99) // <-- So that -regexr- isn't tripped up later
tostring `x', replace
}
gen str10 v_combined = V1+"_"+V2+"_"+V3
gen v_combined2 = regexr(v_combined, "99", "x")
sencode v_combined2, gene(uniqueID) gsort(+v_combined2) label(id)
list
On Apr 29, 2009, at 11:36 AM, Jochen Späth wrote:
Hello Sergio,
I'm not quite sure what your problem is; it would help if you were a
little more precise.
Below, I assume that the example you gave is from AFTER your three data
sets have been merged, with v1 coming from the first, v2 from the
second and v3 from the third, and with v1, v2 and v3 all denoting the
same ID. If this is the case you could
-replace v1 = v2 if v1 == . & v2 != .-
-replace v1 = v3 if v1 == . & v2 ==. & v3 != .-
-count if v1 == .- /* should return 0; otherwise some observations in
your data have no ID in any of your three ID variables. */
-drop v2 v3- /* of course, only if you got all IDs caught in v1 */
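The same steps as a self-contained sketch on the example data from the original question. Note that the variables there are named V1-V3 in upper case, and Stata variable names are case-sensitive. The new variable names id and id_alt are placeholders, and the sketch leaves V1-V3 untouched rather than overwriting V1. It also shows -egen, rowfirst()-, which should give the same result in one line. Neither version checks whether the years disagree with each other.

```stata
clear
input V1 V2 V3
1 . 1
2 . 2
3 3 3
4 4 .
. 5 5
6 . 6
end

* take the ID from the first year in which it is observed
gen id = V1
replace id = V2 if missing(id)
replace id = V3 if missing(id)

* should report 0; anything else means an observation has no ID in any year
count if missing(id)

* one-line alternative: first nonmissing value across the three variables
egen id_alt = rowfirst(V1 V2 V3)
assert id == id_alt

list
```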
HTH,
Jochen
-----Original Message-----
From: [email protected] [mailto:[email protected]] On behalf of "SERGIO M. AFCHA CHÁVEZ"
Sent: Wednesday, April 29, 2009 17:55
To: [email protected]
Subject: st: Merging database
Dear Statalisters,
I have a little problem merging a database. I have variables for 3
years showing an ID:
V1 V2 V3
1 . 1
2 . 2
3 3 3
4 4 .
. 5 5
6 . 6
I need only one ID variable. How can I obtain one column with all
the ID numbers?
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/