I agree with Nick and would personally treat the identifiers as
strings. You can separate var1 and var2 using the -substr()-
function. Perhaps the failure of -infix- to read your data and
StatTransfer's conversion of the "first column" to string are both
caused by the presence of an alpha or hidden character.
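
A minimal sketch of that approach, assuming the 22-character identifier
has already been read into a single string variable (called col1 here,
as in Gisella's message). The -input- block only stands in for the real
file and uses two of the sample lines quoted below:

```stata
clear
input str22 col1
"1010100100050101112101"
"1010100100050101112102"
end

* var1 = first 14 characters (person), var2 = next 8 (activity)
generate str14 var1 = substr(col1, 1, 14)
generate str8  var2 = substr(col1, 15, 8)

* stops with an error if var1 and var2 do not jointly identify the rows
isid var1 var2
list var1 var2
```

With the real data, something like
-infix str var1 1-14 str var2 15-22 using "filename"- should read both
identifiers as strings directly, with no -substr()- step. I have not
run that against the actual file.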
-Steve
On Jul 1, 2008, at 2:00 PM, Nick Cox wrote:
> You would be much better off reading in your identifiers either as
> string variables or as doubles. Stata can't hold 14 digit variables
> exactly in floats. This is documented in several places: -search
> precision- for some.
>
> If you input the identifiers as string, I can see no reason why you
> should also want to -encode- them.
>
> -format-ting after input will never put back precision that was lost
> on input. That is shutting the stable door after the horse has bolted.
>
> Nick
>
> Gisella Young
>
> Thank you for the replies. I have been unable to resolve the problem,
> so am copying more details below as requested.
>
> The data in the original text dataset look as follows:
> 1010100100050101112101 var3 var4...
> 1010100100050101112102 var3 var4...
> 1010100100050101112104 var3 var4...
> 1010100100050101112303 var3 var4...
> 1010100100050101113101 var3 var4...
>
> The number in the first column is actually the first 2 variables: var1
> is 14 digits and var2 is 8 digits. In the text dataset there is no
> space between them. Neither var1 nor var2 is supposed to be unique,
> but the combination of them is (and is in the original data). They do,
> however, need to be analysed separately: var1 is the person identifier
> and var2 is the activity.
>
> I am now using StatTransfer to convert the file (specifying the option
> ASCII - Delimited). When I look at the data in the "view" option in
> StatTransfer it looks fine. One relevant point might be that in the
> 'variables' window of StatTransfer, the first variable (which is
> actually var1 and var2, which it is treating as one) is listed as
> string while the others are floats.
>
> The good news is that I can now make the transfer, and the col1
> variable that comes up in Stata (of 22 digits, combining var1 and
> var2) is unique. One problem, however, is that when I try to encode
> this variable 'col1', it does not work: I get error message 134 (that
> I have tried to encode too many values). There are just under 1.5
> million observations.
>
> I then tried specifying 'col1' in StatTransfer as either a float or
> long variable, but neither of these works: with long all the values
> come up in Stata as 0, and with float they are no longer unique (no
> matter how many digits I allow for when formatting the variable).
>
> I guess one option would be to convert them using Stattransfer in the
> original string format, and then find a way of encoding the variables
> (despite the problem of too many observations) and then somehow
> splitting the 'col1' variable into the 2 variables var1 (first 14
> digits) and var2 (next 8 digits).
>
>
> When I try using infix, my command is:
> . infix var1 1-14 var2 15-22 using "filename"
>
> I then format the variables to give them enough places (format %16.0g
> var1 var2). When I sort by var1 var2, my first 3 observations are as
> follows - clearly the combination of var1 and var2 is not unique:
>
> var1 var2
> 10101000765440 1111101
> 10101000765440 1111101
> 10101000765440 1111101
>
>
> Any suggestions would be highly appreciated.
>
> regards,
> Gisella
>
>
> --- On Tue, 7/1/08, Steven Samuels wrote:
>
>> From: Steven Samuels
>> Subject: Re: st: problem in uploading data into Stata - data "changes"
>> Date: Tuesday, July 1, 2008, 3:18 PM
>> Gisella,
>>
>> Show us an example of a data line and your -infix- statements. Also,
>> what are the item separators in your text file (commas, tabs, ...)?
>> If Excel can figure out the variable columns, then StatTransfer can
>> also (see ASCII input options); there is no need to go through Excel.
>>
>> -Steve
>> On Jul 1, 2008, at 11:05 AM, Gisella Young wrote:
>>
>>> Dear all,
>>>
>>> I am trying to load a datafile in text format into Stata. I am
>>> using the infix command. The problem is that 1 column of data (the
>>> firm column, which is the unique identification number for each
>>> observation) is different when I open it in Stata from what I can
>>> see in the original text file. In fact I have several such text
>>> files for various years, and in every case the problem is the same:
>>> all variables upload correctly except for the first one. Not only
>>> is that number different but it is no longer unique to each
>>> observation. It does, however, have the same number of digits as
>>> the original. I have checked that the infix command is specified
>>> correctly (e.g. correct number of digits).
>>>
>>> I have also tried saving the text file into Excel (and applying
>>> text-to-columns) and then converting it into a Stata file using
>>> StatTransfer. When I do this all the variables upload correctly
>>> into Stata. The problem is that I cannot do this for the entire
>>> files because of their size (the limits of Excel mean that only a
>>> small fraction of each file can be accommodated), so this is not a
>>> solution.
>>>
>>> I realise that it may be difficult for someone to suggest an
>>> explanation/solution without seeing the actual data, but I wonder
>>> whether there are any suggestions as to what the problem might
>>> potentially be, and how to get around it?
>>>
>>> Many thanks,
>>> Gisella
>>>
>>>
>>>
>>>
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/support/faqs/res/findit.html
>>> * http://www.stata.com/support/statalist/faq
>>> * http://www.ats.ucla.edu/stat/stata/
Steven Samuels
845-246-0774
18 Cantine's Island
Saugerties, NY 12477
EFax: 208-498-7441