Thank you very much Nick for your further comment.
I will go with that approach, and then split the string variable into 2 (as it is actually 2 separate variables to begin with). In case this is useful to anyone else (unlikely I know as this is probably dead obvious to other people), I split the string variable 'col1' into the 2 original variables as follows:
. gen str var1 = substr(col1,-8,8)
. gen str var2 = substr(col1,-22,14)
And since I do need the second one numerically, I then said
. destring var2, generate(var3)
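
Note that with these commands var1 holds the last 8 digits and var2 the first 14, which is the reverse of the naming in my earlier message below (where var1 is the 14-digit person identifier and var2 the 8-digit activity). A minimal sketch of the same split that follows the earlier naming, assuming col1 always holds exactly 22 characters; the names person_id, activity and activity_num are only illustrative:

```stata
* Assumes col1 is a string variable with exactly 22 digits in every observation.
* person_id, activity and activity_num are illustrative names.
assert length(col1) == 22

* First 14 characters: the person identifier
gen str14 person_id = substr(col1, 1, 14)

* Last 8 characters: the activity code
gen str8 activity = substr(col1, 15, 8)

* Numeric copy of the 8-digit part (leading zeros are dropped in the number)
destring activity, generate(activity_num)

* Check that the pair still identifies each observation uniquely
isid person_id activity
```

Keeping the 14-digit identifier as a string avoids any rounding; if it is needed numerically it has to be stored as a double, not a float.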
best,
Gisella
--- On Tue, 7/1/08, Gisella Young <[email protected]> wrote:
> From: Gisella Young <[email protected]>
> Subject: Re: st: problem in uploading data into Stata - data "changes"
> To: [email protected]
> Date: Tuesday, July 1, 2008, 5:47 PM
> Thank you for the replies. I have been unable to resolve the
> problem, so am copying more details below as requested.
>
> The data in the original text dataset looks as follows
> 1010100100050101112101 var3 var 4...
> 1010100100050101112102 var3 var 4...
> 1010100100050101112104 var3 var 4...
> 1010100100050101112303 var3 var 4...
> 1010100100050101113101 var3 var 4...
>
> The number in the first column is actually the first 2
> variables: var1 is 14 digits and var2 is 8 digits. In the
> text dataset there is no space between them. Neither var1
> nor var2 is supposed to be unique, but the combination of
> them is (and is in the original data), although they do
> need to be analysed separately: var1 is the person
> identifier and var2 is the activity.
>
> I am now using stat transfer to convert the file
> (specifying the option ASCII - Delimited). When I look at
> the data in the "view" option in stat transfer it
> looks fine. One relevant point might be that in the
> 'variables' window of stat transfer, the first
> variable (which is actually var1 and var2 which it is
> treating as one) is listed as string while the others are
> floats.
>
> The good news is that I can now make the transfer and the
> col1 variable that comes up in Stata (of 22 digits,
> combining var1 and var2) is unique. One problem however is
> that when I try to encode this variable 'col1', it
> does not work as I get error message 134 (that I have tried
> to encode too many values). There are just under 1.5 million
> observations.
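
As far as I understand it, -encode- fails here because it stores every distinct string as an entry in a value label, and a value label can hold at most 65,536 entries, far fewer than 1.5 million. If all that is needed is a numeric identifier, one possible alternative is -egen- with group(), which numbers the distinct values without creating labels. A sketch only; obs_id is an illustrative name:

```stata
* obs_id is an illustrative name.
* group() numbers the distinct values of col1 as 1, 2, 3, ...
* and attaches no value labels, so the -encode- limit does not apply.
egen long obs_id = group(col1)
```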
>
> I then tried specifying 'col1' in stat transfer as
> either a float or long variable, but neither of these works
> - with long all the values come up in Stata as 0, and
> with float they are no longer unique (no matter how many
> digits I allow for when formatting the variable).
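
The likely explanation is storage precision rather than display format. A float represents integers exactly only up to 16,777,216 (about 7 digits), a long goes up to 2,147,483,620, and a double is exact for integers up to 9,007,199,254,740,992 (16 digits). A 22-digit number therefore fits none of the numeric types, while the 14-digit and 8-digit parts each fit in a double. A small illustration that can be typed directly into Stata:

```stata
* A float cannot hold 16,777,217 exactly, so this comparison displays 0
display float(16777217) == 16777217

* A double holds a 14-digit integer exactly; this displays 10101001000501
display %14.0f 10101001000501
```

Changing the format only changes how a stored value is shown; it cannot restore digits that were lost when the value was stored as a float.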
>
> I guess one option would be to convert them using
> Stattransfer in the original string format, and then find a
> way of encoding the variables (despite the problem of too
> many observations) and then somehow splitting the
> 'col1' variable into the 2 variables var1 (first 14
> digits) and var2 (next 8 digits).
>
>
> When I try using infix, my command is:
> . infix var1 1-14 var2 15-22 using "filename"
>
> I then format the variables to give them enough places
> (format %16.0g var1 var2). When I sort by var1 var2, my
> first 3 observations are as follows - clearly the
> combination of var1 and var2 is not unique:
>
> var1 var2
> 10101000765440 1111101
> 10101000765440 1111101
> 10101000765440 1111101
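
The duplicates are probably a side effect of -infix- reading numeric variables as float unless a storage type is given, so the 14-digit field is rounded on input. A sketch of the same command with explicit types, keeping "filename" as the placeholder used above:

```stata
* "filename" is the placeholder from the command above.
* double holds the 14-digit field exactly; long is enough for 8 digits.
infix double var1 1-14 long var2 15-22 using "filename"
format %14.0f var1
format %8.0f var2

* Stops with an error if var1 and var2 together do not identify the observations
isid var1 var2
```

Another option is to read both fields as strings (infix str var1 1-14 str var2 15-22 using "filename"), which also preserves any leading zeros.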
>
>
> Any suggestions would be highly appreciated.
>
> regards,
> Gisella
>
>
> --- On Tue, 7/1/08, Steven Samuels
> <[email protected]> wrote:
>
> > From: Steven Samuels <[email protected]>
> > Subject: Re: st: problem in uploading data into Stata - data "changes"
> > To: [email protected]
> > Date: Tuesday, July 1, 2008, 3:18 PM
> > Gisella,
> >
> > Show us an example of a data line and your -infix-
> > statements. Also, what are the item separators in your
> > text file (commas, tabs,..) ? If Excel can figure out the
> > variable columns, then StatTransfer can also (see ASCII
> > input options); there is no need to go through Excel.
> >
> > -Steve
> > On Jul 1, 2008, at 11:05 AM, Gisella Young wrote:
> >
> > > Dear all,
> > >
> > > I am trying to load a datafile in text format into Stata. I am
> > > using the infix command. The problem is that 1 column of data (the
> > > firm column, which is the unique identification number for each
> > > observation) is different when I open it in Stata from what I
> > > can see in the original text file. In fact I have several such text
> > > files for various years, and in every case the problem is the same:
> > > all variables upload correctly except for the first one. Not only
> > > is that number different but it is no longer unique to each
> > > observation. It is however the same number of digits as the
> > > original. I have checked that the infix command is specified
> > > correctly (eg correct number of digits).
> > >
> > > I have also tried saving the text file into excel (and applying
> > > text-to-columns) and then converting it into a stata file using
> > > Stat-transfer. When I do this all the variables upload correctly
> > > into Stata. The problem is that I cannot do this for the entire
> > > files because of their size (the limits of Excel mean that only a
> > > small fraction of each file can be accommodated), so this is not a
> > > solution.
> > >
> > > I realise that it may be difficult for someone to suggest an
> > > explanation/solution without seeing the actual data, but I wonder
> > > whether there are any suggestions as to what the problem might
> > > potentially be, and how to get around it?
> > >
> > > Many thanks,
> > > Gisella
> > >
> > >
> > >
> > >
> > > *
> > > * For searches and help try:
> > > * http://www.stata.com/support/faqs/res/findit.html
> > > * http://www.stata.com/support/statalist/faq
> > > * http://www.ats.ucla.edu/stat/stata/