Dear all,
Thank you very much for all the comments.
The real structure of my data is as I stated originally, rather than as
Martin described it:
e.g.
input id mindesinc_500_999 mindesinc_1000_1499 mindesinc_1500_1999
101 1 1 1 1
102 0 1 1 1
103 0 0 1 1
104 0 0 1 1
105 0 1 1 1
Basically once the person finds a certain income acceptable he or she
finds every income above acceptable too and puts 1s, rather than 0s.
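The cumulative structure described above (once a 1 appears, every band to its right is also 1) can be checked before any recoding. This is a minimal sketch on toy data. It assumes the three band variables shown in the example and no missing values; the variable monotone is hypothetical and introduced only for illustration.

```stata
* Sketch only: toy data with the cumulative structure described above.
clear
input id mindesinc_500_999 mindesinc_1000_1499 mindesinc_1500_1999
101 1 1 1
102 0 1 1
103 0 0 1
end

* Each band should be at least as large as the band to its left.
* Assumes no missing values in the band variables.
gen byte monotone = (mindesinc_1000_1499 >= mindesinc_500_999) & ///
                    (mindesinc_1500_1999 >= mindesinc_1000_1499)

* Records that break the pattern, if any, can then be inspected.
list id if !monotone
```

If any records are listed, the "lowest ticked band" is still defined for them, but they may deserve a closer look before recoding.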
At the moment I am using one of the earlier suggestions, with a slight
modification:
gen mindesinc=0
replace mindesinc=7000 if desired_income_above_7000==1
replace mindesinc=6000 if desired_income_6000_6999==1
replace mindesinc=5000 if desired_income_5000_5999==1
replace mindesinc=4500 if desired_income_4500_4999==1
replace mindesinc=4000 if desired_income_4000_4499==1
replace mindesinc=3500 if desired_income_3500_3999==1
replace mindesinc=3000 if desired_income_3000_3499==1
replace mindesinc=2500 if desired_income_2500_2999==1
replace mindesinc=2000 if desired_income_2000_2499==1
replace mindesinc=1500 if desired_income_1500_1999==1
replace mindesinc=1000 if desired_income_1000_1499==1
replace mindesinc=0 if desired_income_0_999==1 | no_desired_income==1
I know it is not very elegant, but I thought this would pick up the
lowest acceptable income.
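The same cascade could perhaps be written as a loop. This is a minimal sketch, assuming the band variables follow the desired_income_LOWER_UPPER naming used in the replace statements and that the 500-wide bands are the ones looped over. The toy data are hypothetical. Starting from missing rather than 0 is a design choice made here, so that people who ticked nothing stay distinguishable from those in the lowest band.

```stata
* Sketch only: names follow the replace statements above; toy data are hypothetical.
clear
input id desired_income_1000_1499 desired_income_1500_1999 desired_income_2000_2499
101 1 1 1
102 0 1 1
103 0 0 1
104 0 0 0
end

gen mindesinc = .
* Walk from the highest band down to the lowest, so that the last
* replacement made is the lowest band ticked.
foreach lb in 2000 1500 1000 {
    local ub = `lb' + 499
    replace mindesinc = `lb' if desired_income_`lb'_`ub' == 1
}
list
```

The wider bands (for example 5000_5999) do not fit the "+ 499" rule and would still need their own replace lines.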
I was also wondering whether anyone has thoughts on a related question
that is not specifically about Stata.
Apart from the variables on minimum desired income, my dataset (it comes
from a marriage agency) contains a host of variables on desired
height, weight, marital status, etc.
In particular, I am now thinking about the variables on desired
height. They differ from the income variables in that, for many
people, they describe both a desired minimum and a desired maximum. So
the data look something like this:
input id mindesheight_below_150 mindesheight_150_154
mindesheight_155_159 mindesheight_160_164
mindesheight_165_169 ... mindesheight_above_180
101 1 1 1 1 0
102 0 1 1 1 0
103 0 0 1 0 0
104 0 0 1 1 0
105 0 1 1 1 1
I am restructuring these variables into two: minimum desired height and
maximum desired height. I am not sure how to treat the open-ended
categories, that is, "below 150" when constructing the minimum and
"above 180" when constructing the maximum, since I cannot assign a
definite numeric value to either. Few people have such preferences, so
if I cannot find a good way of dealing with them I could simply drop
the observations in question, but I was wondering whether anyone has a
good idea or knows of a paper that offers a good solution to this
issue?
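One possible way to build the two height variables without inventing a number for the open-ended bands is sketched below. It is not a recommendation from the thread. All variable names (h_below_150, h_150_154, h_155_159, h_above_180, min_open, max_open) and the toy data are hypothetical, and only two closed bands are shown. The open-ended bands are recorded with a flag and an extended missing value, so they can later be dropped, recoded, or treated as censored.

```stata
* Sketch only: variable names and toy data are hypothetical.
clear
input id h_below_150 h_150_154 h_155_159 h_above_180
101 1 1 1 0
102 0 1 1 0
103 0 0 1 1
end

gen minheight = .
gen maxheight = .
* Lowest ticked closed band: walk from the top band down.
foreach lb in 155 150 {
    local ub = `lb' + 4
    replace minheight = `lb' if h_`lb'_`ub' == 1
}
* Highest ticked closed band: walk from the bottom band up.
foreach lb in 150 155 {
    local ub = `lb' + 4
    replace maxheight = `ub' if h_`lb'_`ub' == 1
}
* Open-ended bands: keep a flag and use extended missing values
* rather than an arbitrary number.
gen byte min_open = h_below_150 == 1
gen byte max_open = h_above_180 == 1
replace minheight = .a if min_open
replace maxheight = .b if max_open
list
```

Keeping the flags means the decision about these few observations can be postponed to the analysis stage.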
I would be very grateful for advice,
sincerely yours,
Ekaterina
Jeph Herrin wrote:
Well, on inspection, I see that her data have multiple
tags per record, so that 1s are filled to the right after
the first (left to right) 1; I was misled by
Martin's faux dataset. Her stated logic would then
require:
gen min=mininc1*500+(mininc2-mininc1)*1000+(mininc3-mininc2)*1500
Pending clarification from Ekaterina about the _real_ structure
of her data...
Jeph
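The difference formula above can be checked on data with the fill-to-the-right structure. This is a minimal sketch using the five records from Ekaterina's original table. The names mininc1 to mininc3 are the shorthand used in the formula, not the real variable names.

```stata
* Sketch only: mininc1-mininc3 stand for the 500-999, 1000-1499 and
* 1500-2000 bands; the data are the five records from the original table.
clear
input id mininc1 mininc2 mininc3
101 0 0 1
102 0 1 1
103 0 0 1
104 1 1 1
105 0 1 1
end

* Each difference is 1 only at the first band that is ticked.
gen min = mininc1*500 + (mininc2-mininc1)*1000 + (mininc3-mininc2)*1500
list
```

A record with no band ticked gets 0 from this formula, so such cases would need separate handling.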
Nick Cox wrote:
Good!
I didn't spell out that I feared that there are yet other variables in
what might be Ekaterina's _real_ problem, lurking behind her stated
problem, making a more general approach attractive too.
Nick
Jeph Herrin wrote:
Briefer yet:
gen min=mininc1*500+mininc2*1000+mininc3*1500
which also traps the missings Nick cautions about.
Nick Cox wrote:
A variation on the same idea:
gen min = 500
foreach v in 1000 1500 2000 {
    replace min = `v' if mindesinc_`v' == 1
}
To be careful, check
egen row = rowtotal(mindesinc*)
assert row == 1
Nick
Martin Weiss wrote:
*reconstruct Ekaterina's data
clear*
input id mindesinc_500_999 mindesinc_1000_1499 mindesinc_1500_1999
101 1 0 0 0
102 0 1 0 0
103 0 0 1 0
104 0 0 1 0
105 0 1 0 0
end
*construct the minimum desired income
g mindesinc=500 if mindesinc_500_999
replace mindesinc=1000 if mindesinc_1000_1499
replace mindesinc=1500 if mindesinc_1500_1999
l
Ekaterina Hertog wrote:
I am dealing with a dataset from a private company, so my data often
come in a rather strange format, and I have now run into the following
problem:
I have a set of individuals who answered questions about desired income.
It looks as follows:
Individ nmb | Min desired income 500 - 999 | 1000 - 1499 | 1500 - 2000
101         | 0                            | 0           | 1
102         | 0                            | 1           | 1
103         | 0                            | 0           | 1
104         | 1                            | 1           | 1
105         | 0                            | 1           | 1
Is there a way to automatically recode these binary minimum desired
income variables into a numerical variable which would state the minimum
acceptable figure for each individual?
That is, some routine which would check "Min desired income 500 - 999"
and, if it equals 1, would input 500 for the individual in question
into a newly constructed variable "Minimum acceptable income" and move
on to the next person; if it equals 0, it would look at the value of the
"1000 - 1499" variable and, if that equals 1, would input 1000 for that
person and move on to the next person; and if that is 0, it would look
at the "1500 - 2000" variable?
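The routine described above (check the lowest band first, then fall through to the next one) could be written directly with nested cond() calls. This is a minimal sketch. The variable names inc_500_999, inc_1000_1499, inc_1500_2000 and minacc are hypothetical stand-ins for the columns of the table above.

```stata
* Sketch only: variable names are hypothetical; data are from the table above.
clear
input id inc_500_999 inc_1000_1499 inc_1500_2000
101 0 0 1
102 0 1 1
103 0 0 1
104 1 1 1
105 0 1 1
end

* Take the lower bound of the first band that equals 1, checking
* from the lowest band upwards; missing if no band is ticked.
gen minacc = cond(inc_500_999 == 1, 500,      ///
             cond(inc_1000_1499 == 1, 1000,   ///
             cond(inc_1500_2000 == 1, 1500, .)))
list
```

With many bands the nesting becomes unwieldy, and a loop over the bands may be easier to maintain.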
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
--
Ekaterina Hertog (née Korobtseva)
Career Development Fellow
Department of Sociology and Nissan Institute of Japanese Studies
University of Oxford
27 Winchester Road
Oxford
OX2 6NA
United Kingdom