Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: RE: RE: Copying the contents of existing variables into a new variable
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: RE: st: RE: RE: Copying the contents of existing variables into a new variable
Date: Tue, 29 May 2012 18:48:01 +0100
Quite so: -max()- does not work for strings.
Nick
[email protected]
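For identifiers held as strings, one possible substitute for -max()- is -cond()-, which picks whichever variable is non-empty. The following is only a minimal sketch: the two-row dataset and the str9 width are made up for illustration, and it assumes at most one of var1 and var2 is filled in on any row.

```stata
* Hypothetical example data: string identifiers, one of them empty in each row
clear
input str9 var1 str9 var2
"699659108" ""
"" "571000128"
end

* Take var1 where it is non-empty, otherwise fall back to var2
gen newvar = cond(var1 != "", var1, var2)
list
```

Because the values are strings, no storage-type precision issue arises: the digits are kept exactly as typed.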
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Lucie Vlach
Sent: 29 May 2012 18:39
To: [email protected]
Subject: RE: st: RE: RE: Copying the contents of existing variables into a new variable
Yes, the numbers are just identification numbers for various subjects, so we will never use them to calculate anything and string would be the best bet to keep the numbers looking as they are. That said, I still need to create new variables using them. I imagine then, the commands would be much different, utilizing what works well with strings.
Thank you for your points. Makes sense!
Lucie
________________________________________
From: [email protected] [[email protected]] On Behalf Of Nick Cox [[email protected]]
Sent: May 29, 2012 11:20 AM
To: '[email protected]'
Subject: RE: st: RE: RE: Copying the contents of existing variables into a new variable
What are these numbers like 699659108 and 571000128 anyway? If they are identifiers, they are probably better handled as string variables.
At the risk of rehearsing what you know:
0. Nothing restores bits that were lost somehow, except going back and entering the original data correctly into variables of a storage type sufficient to hold them correctly. People want to believe otherwise, but there's no way to put the toothpaste back in the tube.
1. -format- affects only how values are displayed. It does absolutely nothing to change what is stored. (I lobbied Bill Gould to put that in his precision guide, because it's a common misconception.)
2. -recast- just changes storage type. It can't change what is stored (without -force-). A fortiori, it's not a way of solving loss of precision.
3. As already emphasised, anything that should be -double- that ends up as -float- is likely to have lost bits. But this is under your control. It's not regarded as Stata's fault if you -generate newvar = ...- and forget to specify -double-. The punishment is that you get what you ask for, as in many folktales.
4. You could -set type double- and I've seen a view on this list that that's a good idea, but I disagree, as do all experienced users I've ever discussed this with.
Nick
[email protected]
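Points 1 to 3 above can be checked with a short experiment. This is a sketch on a made-up one-observation dataset (the variable names x and y are illustrative), not part of the original exchange.

```stata
clear
set obs 1

* No storage type given, so x is created as a float and the value is rounded
gen x = 699659108

* -format- changes only the display, not what is stored
format x %13.0f

* -recast- changes the storage type, but the lost bits do not come back
recast double x
assert x == 699659136

* Specifying -double- at creation keeps the value exact
gen double y = 699659108
format y %13.0f
assert y == 699659108
list
```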
Lucie Vlach
Well, I guess I am just anticipating problems in the future.
In the specific data we were working with here, var1 (699659108) came from the master dataset and var2 (571000128) came from an appended dataset.
The problem with accuracy was only in the master set (var1), and that is now resolved. The appended dataset had var2 and there were no problems with generating newvar. However, I noticed that the var2 (from the appended set) was a float, so I did this before generating the newvar:
recast double var2
format var2 %13.0f
But I worry that appending datasets which end up as float by default can affect the accuracy down the road.
That's why I wanted some safeguard for the append command, not just the normal:
append using "file\dataset.dta"
Nick Cox [[email protected]]
That's a rather general question. What specific difficulties have you noticed?
My experience is that -append- looks out for differences of storage types, so that -float-s will be promoted to -double-. If you have had bad experiences, what happened and what might you have (reasonably) expected?
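The promotion behaviour described here can be tried out along the following lines. This is a sketch under assumptions: the variable name id, the tempfile name extra, and the values are invented for the test, and it relies on -append- promoting a float in the master data to double when the using data need it.

```stata
* Using dataset: id stored as double, with a value a float cannot hold exactly
clear
set obs 1
gen double id = 571000129
tempfile extra
save `extra'

* Master dataset: the same variable stored as float
clear
set obs 1
gen float id = 345

* Append, then inspect the resulting storage type and values
append using `extra'
describe id
format id %13.0f
list
```

Note that this only protects values that are still exact in the file being appended. A variable that was already saved as a float has lost its extra digits before -append- ever sees it.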
Lucie Vlach
Thank you very much, Nick!
This: gen double newvar = max(var1, var2) if missing(var1, var2)
works beautifully!
[...]
Is there a way to ensure that when appending new datasets, the variables come in as double? I tried it, but it did not let me state double in the append command.
[...]
Nick Cox [[email protected]]
-generate- without a variable type uses the default numeric storage
type, which in turn defaults to -float-. (That is true unless the
argument is clearly string, which does not apply here.)
So, you have the solution, although you seem unsure about it. You need
-gen double ...- to make one double variable from another.
gen double newvar = max(var1, var2) if missing(var1, var2)
Nick
On Mon, May 28, 2012 at 8:09 PM, Lucie Vlach
<[email protected]> wrote:
> Hello both!
>
> I foolishly did not use the real data in my original example, thinking that I am simplifying, but that was a problem. Next time, real data!
>
> I do not have control over the data I am using, the data comes from another place as an extraction from someone's database, so I am using:
> insheet using "file\abc.000", delim("|") clear
>
> I tried: insheet using "file\abc.000", delim("|") clear double (because double holds more precision, as per the referenced article).
>
> Then just to make sure:
> recast double var1... var2
> format var1 %13.0f
>
> But that did not work either: when I try to generate the new variable, it is still changing the number to 699659136. [Using this command: gen newvar = max(var1, var2) if missing(var1, var2)]
>
> I think my numbers are not getting imported/appended into Stata accurately in the first place. Are there problems with insheet, and does anything special need to be specified when appending datasets?
>
> However, this works, even though it may not be the right way about it:
> gen double newvar =.
> replace newvar=699659108 if var1==699659108
> replace newvar=571000128 if var2==571000128
> format newvar %13.0f
>
> I will need to do this with more appended datasets soon, so any advice of the best route is really appreciated.
Nick Cox [[email protected]]
> What I think you mean is that the value that was supposedly copied
> from -var1- is not identical to what is in -newvar-.
>
> As Ronnie says, this is a precision problem. You should read Bill's
> blog and consider this demonstration:
>
> . set obs 1
>
> . gen newvar1 = 699659108
>
> . gen double newvar2 = 699659108
>
> . format newvar* %13.0f
>
> . l
>
> +-----------------------+
> | newvar1 newvar2 |
> |-----------------------|
> 1. | 699659136 699659108 |
> +-----------------------+
>
>
> . format newvar* %21x
>
> . l
>
> +-----------------------------------------------+
> | newvar1 newvar2 |
> |-----------------------------------------------|
> 1. | +1.4d9f9c0000000X+01d +1.4d9f9b2000000X+01d |
> +-----------------------------------------------+
>
>
> There aren't enough bits in a -float- to hold your number exactly.
>
> I note that your original sample data did not show your problem.
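The arithmetic behind the demonstration can be spelled out. A float has a 24-bit significand, so for values between 2^29 and 2^30 the representable numbers are 2^(29-23) = 64 apart. The sketch below uses only the two values from this thread; it suggests why 699659108 changes while 571000128 does not.

```stata
* Remainder on division by 64: nonzero means the value cannot be a float
display mod(699659108, 64)
display mod(571000128, 64)

* float() rounds to the nearest value a float can hold
display %13.0f float(699659108)
display %13.0f float(571000128)
```

The first remainder is 36, which is more than half of 64, so the value rounds up to the next multiple of 64, namely 699659136. The second remainder is 0, so 571000128 happens to be exactly representable as a float and survives unchanged.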
On Thu, May 24, 2012 at 10:42 PM, Ronnie Babigumira <[email protected]> wrote:
>> This may be useful http://blog.stata.com/2012/04/02/the-penultimate-guide-to-precision/
> On Thursday, May 24, 2012 at 10:31 PM, Lucie Vlach wrote:
>>> Hello again!
>>> Thank you Nick and Ronnie!
>>> egen newvar = rowmax(var1 var2) (from Ronnie) and your confirmation of my command is very helpful!
>>>
>>> But for some reason the var1 values are changed once they end up in var3. Var1 changes from the actual value of 699659108 (the correct one) to 699659136. But var2 (value of 571000128) stays correct in var3.
>>>
>>> Var1 and var2 come from 2 appended datasets (var1 from dataset1 and var2 from dataset2), but I made sure that both variables were recast as double before I tried the var3 creation command.
>>>
>>> I also tested this all on someone else's computer using their Stata, just to make sure my Stata is working fine. The same problem happened there, too.
>>>
>>> Do I need the vars to be in certain format before I create var3?
Nick Cox [[email protected]]
>>> The -generate- command you give is fine.
>>>
>>> . clear
>>>
>>> . input var1 var2
>>>
>>> var1 var2
>>> 1. 345 .
>>> 2. 345 .
>>> 3. 345 .
>>> 4. 345 .
>>> 5. . 678
>>> 6. . 678
>>> 7. . 678
>>> 8. . 678
>>> 9. end
>>>
>>> . gen newvar = max(var1, var2) if missing(var1, var2)
>>>
>>> . l
>>>
>>> +----------------------+
>>> | var1 var2 newvar |
>>> |----------------------|
>>> 1. | 345 . 345 |
>>> 2. | 345 . 345 |
>>> 3. | 345 . 345 |
>>> 4. | 345 . 345 |
>>> 5. | . 678 678 |
>>> |----------------------|
>>> 6. | . 678 678 |
>>> 7. | . 678 678 |
>>> 8. | . 678 678 |
>>> +----------------------+
>>>
>>> It should not change the value of -var1-. If that happens, your copy of Stata is corrupted.
>
> Lucie Vlach
>
>>> I need to create a new variable that will copy data from other variables (2 or more) and combine them into the new one. The existing vars will only have a number or missing value.
>>> I found something similar on this list, and I tried:
>>>
>>> gen newvar = max(var1, var2) if missing(var1, var2)
>>>
>>> But it's changing the value of var1.
>>> I would like to see the newvar look like this:
>>>
>>> SAMPLE DATA:
>>> var1 var2 newvar
>>> 345 . 345
>>> 345 . 345
>>> 345 . 345
>>> 345 . 345
>>> . 678 678
>>> . 678 678
>>> . 678 678
>>> . 678 678
>>> ETC
>>> (I use Stata/IC 11.2)