Replacing the separators in a text editor is a good idea, but doing it
by hand would be quite tedious for the 600 files that Denisa has. It can
be automated, of course, say with a macro in Word, but why not write the
whole conversion in another programming environment?
The final files will not be used in Stata, will they?
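
To illustrate what such a conversion might look like, here is a minimal
sketch in Python. It rests on several assumptions that Denisa would need
to confirm: that "comma" in her example stands for a literal comma, that
"|" never occurs inside an item, that the "Row1"/"Row 2" labels are not
part of the real files, and that a missing item should become an empty
cell. The name expand_rows is made up for this example.

```python
import csv
import sys

FIELD_SEP = ","   # assumption: "comma" in the example is a literal comma
ITEM_SEP = "|"    # separator between items inside one field
MISSING = ""      # assumption: missing items become empty cells


def expand_rows(lines, field_sep=FIELD_SEP, item_sep=ITEM_SEP, missing=MISSING):
    """Split lines into fields and items, padding each field to a common width."""
    # Parse every non-blank line into a list of fields, each a list of items.
    parsed = [
        [field.strip().split(item_sep) for field in line.split(field_sep)]
        for line in lines
        if line.strip()
    ]
    n_fields = max(len(row) for row in parsed)
    # Largest number of items seen in each field position across all lines.
    widths = [
        max(len(row[i]) if i < len(row) else 0 for row in parsed)
        for i in range(n_fields)
    ]
    expanded = []
    for row in parsed:
        flat = []
        for i, width in enumerate(widths):
            items = row[i] if i < len(row) else []
            # Pad short fields so that columns line up across rows.
            flat.extend(items + [missing] * (width - len(items)))
        expanded.append(flat)
    return expanded


# Denisa's two example rows, without the "Row" labels.
example = [
    "Name1|Name2,Address1|Address2,PatClass1|PatClass2|PatClass3",
    "Name3|Name4|Name5,Address3|Address4|Address5,PatClass4",
]
rows = expand_rows(example)
csv.writer(sys.stdout).writerows(rows)
```

With these two rows, each of the three fields is padded to three items,
so the first row gets empty third and sixth cells and the second row gets
two empty cells at the end. Note that the widths are computed per file;
if all 600 files must share the same columns, the widths would have to be
computed over all files first.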
2Friedrich:
-insheet- does not solve the problem in my view, because Stata will
still be limited to 244 characters for strings.
2Denisa: please specify exactly what the rules are for commas, "|",
and missing values in the input file.
Best regards, Sergiy
On 8/24/07, Friedrich Huebler <[email protected]> wrote:
> Denisa,
>
> Is your example an accurate representation of your data? If so, you
> have a problem because there are no delimiters around fields with
> missing data. Here is a partial answer to your question that will read
> the data into Stata, but the columns won't line up.
>
> Step 1: Open the file in a text editor and replace all occurrences of
> " comma " by "|" (without quotes). This will yield the following file:
>
> Row1
> Name1|Name2|Address1|Address2|PatClass1|PatClass2|PatClass3
> Row 2
> Name3|Name4|Name5|Address3|Address4|Address5|PatClass4
>
> Step 2: Read the file into Stata with -insheet-
>
> . insheet using test.txt, delimit("|")
> . clist, noobs
>
> v1 v2 v3 v4 v5 v6 v7
> Row1
> Name1 Name2 Address1 Address2 PatClass1 PatClass2 PatClass3
> Row 2
> Name3 Name4 Name5 Address3 Address4 Address5 PatClass4
>
> Step 3: Delete the "Row" entries.
>
> . drop if mod(_n,2)>0
> (2 observations deleted)
>
> . clist, noobs
>
> v1 v2 v3 v4 v5 v6 v7
> Name1 Name2 Address1 Address2 PatClass1 PatClass2 PatClass3
> Name3 Name4 Name5 Address3 Address4 Address5 PatClass4
>
> Step 4: Save the data as a comma-separated file.
>
> . outsheet using test.csv, comma
>
> When you open the CSV file in a text editor you see this:
>
> v1,v2,v3,v4,v5,v6,v7
> "Name1","Name2","Address1","Address2","PatClass1","PatClass2","PatClass3"
> "Name3","Name4","Name5","Address3","Address4","Address5","PatClass4"
>
> Variable v3 should have a missing value in the first observation.
> Instead it contains Address1. Variables v4 to v7 also contain wrong
> data. I do not know how you can address this problem without
> information on missing values in your original data.
>
> Friedrich
>
> On 8/23/07, Mindruta, Denisa Constanta <[email protected]> wrote:
> > Greetings!
> > I would appreciate any help on the following problem: I need to import a (.csv) file containing several string variables that go well beyond Stata's limits. Is there a way to import the file and, at the same time, parse these string variables into their constituent words (delimited by "|") before saving it as a Stata file?
> >
> > A simple example might help:
> > Row1
> > Name1|Name2 comma Address1|Address2 comma PatClass1|PatClass2|PatClass3
> > Row 2
> > Name3|Name4|Name5 comma Address3|Address4|Address5 comma PatClass4
> >
> > Want to get the following structure:
> > Row1
> > Name1 comma Name2 comma "missing info" comma Address1 comma Address2 comma "missing info" comma PatClass1 comma PatClass2 comma PatClass3
> > Row 2
> > Name3 comma Name4 comma Name5 comma Address3 comma Address4 comma Address5 comma PatClass4 comma "missing info" comma "missing info"
> >
> > Any suggestions on how to approach this problem? (This is just a simple example; the text in a cell could go up to 200 words of 30 characters each, and I have 15 of these variables, and 600 files...) Thanks!
> >
> > Denisa
> > University of Illinois Urbana-Champaign
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>