Re: st: Append multiple files from .txt file with "file read"
From: Nicole Boyle <[email protected]>
To: [email protected]
Subject: Re: st: Append multiple files from .txt file with "file read"
Date: Mon, 9 Dec 2013 12:06:53 -0800

Thanks so much for the thorough explanation, Sergiy! That actually
helps tremendously. Many thanks.
Best,
Nicole
On Fri, Dec 6, 2013 at 3:18 PM, Sergiy Radyakin <[email protected]> wrote:
> Nicole,
>
> the 'enigma' you mentioned is really very simple. Locals are only
> visible within the same 'context' - e.g. within the same program or
> do-file. What happens is that when you execute your example from the
> do-editor line-by-line, Stata creates temporary do-files each time,
> and hence a new context. You can probably see these files' names in
> the output window:
> . do "C:\Users\nboyle\AppData\Local\Temp\STD1c000000.tmp"
> (or something similar)
> Hence the second line can't see the results of the first line. When
> you execute two lines together, they are in the same context, and
> hence the result of the first line is available to the second.
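>
> A minimal sketch of the scoping rule described above, assuming the whole
> block is run in one go from a do-file; the program name show_scope is
> hypothetical and used only for illustration:
>
> ```stata
> * Each program (and each do-file) gets its own set of local macros.
> capture program drop show_scope
> program define show_scope
>     * this x exists only while show_scope is running
>     local x "inside"
>     display "in program: `x'"
> end
>
> * this x belongs to the calling context
> local x "outside"
> show_scope
> * the caller's x is untouched by the program's x
> display "in caller:  `x'"
> ```
>
> The same isolation applies when the do-editor wraps each selected line in
> its own temporary do-file.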
>
> Hope this helps,
> Best Sergiy Radyakin
>
> PS: There is, however, an undocumented command, c_local, that allows one
> to jump across this boundary; see
> http://www.stata.com/statalist/archive/2003-12/msg00385.html
> Use of this command is discouraged.
>
> On Fri, Dec 6, 2013 at 5:47 PM, Nicole Boyle <[email protected]> wrote:
>> Thanks to David Radwin, Sergiy Radyakin, David Kantor, and Matt Vivier
>> for your very helpful replies!
>> I think I've identified one area where I went wrong. When I was
>> initially attempting to run my original code yesterday, I was trying
>> to run the first few lines "line-by-line" (since I'm not yet confident
>> in programming, I wanted to make sure what I wanted to happen was
>> _actually_ happening). However, it seems that the error that I
>> originally noted below:
>> ...
>> ! ls *.dta >filelist.txt
>> file open myfile using "filelist.txt", read
>> file read myfile line
>> use `line' /* ERROR HERE */
>> ...
>>
>> didn't occur today when executing the code as a single block.
>>
>> The lesson I'd LIKE to take away from this is that local macros can
>> only be used within the same block of code in which they're created.
>> However, I'm not sure this is truly the case, since something as
>> simple as this:
>>
>> local x="whatever"
>> display "`x'"
>>
>> CAN, in fact, be run successfully line-by-line.
>>
>>
>> Apart from this enigma, I played around with the code each of you
>> kindly posted, and it was extremely helpful. It seems that there are
>> multiple ways of accomplishing the same goal, which is great to know.
>> I ended up using David Kantor's code and replaced -append- with
>> -merge- along with options -nogenerate- and -update-.
>>
>> ...
>> ! ls *.dta >filelist.txt
>> local jj = 0
>> file open myfile using "filelist.txt", read
>> file read myfile line
>> while ~r(eof) {
>>     if `"`line'"' ~= "" {
>>         disp `"`line'"'
>>         if ~`jj++' {
>>             use `"`line'"'
>>         }
>>         else {
>>             merge 1:1 id using `"`line'"', nogenerate update
>>         }
>>     }
>>     file read myfile line
>> }
>>
>>
>> Thanks so much for your help and patience!
>>
>> Best,
>> Nicole
>>
>> On Thu, Dec 5, 2013 at 5:12 PM, Matt Vivier <[email protected]> wrote:
>>> Hi Nicole,
>>>
>>> If -merge- is what you're trying to do, then you were on the right track
>>> with your initial attempt to use loops. This is something I find myself
>>> doing more often than I'd like, typically using a structure like this:
>>>
>>> drop _all
>>> local filelist : dir . files "*.dta"
>>> foreach file of local filelist {
>>>     if _N==0 {
>>>         use `"`file'"'
>>>     }
>>>     else {
>>>         merge 1:1 ID using `"`file'"'
>>>         drop _merge
>>>     }
>>> }
>>>
>>> Three things to look out for:
>>> 1. Make sure you -drop- _merge each time (or use the -nogenerate- option);
>>> otherwise the next -merge- stops with an error because _merge already
>>> exists. I'm guilty of this a little too often.
>>> 2. After 25 of these, your screen will become a mess. Once you're
>>> comfortable with it working correctly you might think about using -qui- to
>>> suppress the output, and maybe just show a count of rows that didn't match.
>>> 3. If you have variables with the same name (but different values) in the
>>> datasets you may find yourself with some unexpected results. You would want
>>> to go through and rename the variables in each file if they matter to your
>>> end result.
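>>>
>>> Points 1 and 2 could be sketched roughly as follows, assuming every file
>>> shares the key variable ID used above; the flag `first' and the message
>>> text are illustrative choices only:
>>>
>>> ```stata
>>> local filelist : dir . files "*.dta"
>>> local first = 1
>>> foreach file of local filelist {
>>>     if `first' {
>>>         use `"`file'"', clear
>>>         local first = 0
>>>     }
>>>     else {
>>>         * -quietly- hides the merge table; report only the unmatched rows
>>>         quietly merge 1:1 ID using `"`file'"'
>>>         quietly count if _merge != 3
>>>         display as text `"`file': "' as result r(N) as text " unmatched rows"
>>>         * drop _merge so that the next -merge- can create it again
>>>         drop _merge
>>>     }
>>> }
>>> ```
>>>
>>> Note that -dir- picks up every .dta file in the folder, including any
>>> combined file saved there earlier.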
>>>
>>> Best,
>>> Matt Vivier
>>> Data Analyst
>>> (203) 541-4665
>>> Remedy Partners, Inc
>>>
>>>
>>> On Thu, Dec 5, 2013 at 7:49 PM, David Kantor <[email protected]> wrote:
>>>> Hello Nicole,
>>>>
>>>> You may want to display `line' to see what you are getting.
>>>> Put in...
>>>> disp "`line'"
>>>> just before
>>>> use `line'
>>>>
>>>> How many words does it comprise?
>>>> You could be failing because there is nothing there, or because there are
>>>> multiple words.
>>>> If there are multiple words, and the file name is all of `line' (there are
>>>> embedded spaces), then you need quotation marks:
>>>> use "`line'"
>>>>
>>>> If there are embedded quotation marks, then use compound quotation marks:
>>>> use `"`line'"'
>>>> -- and that is the safest way, in general.
>>>>
>>>> But if only the first word is the desired filename, then you need to select
>>>> that:
>>>> use "`=word("`line'",1)'"
>>>>
>>>> (Compound quotes may be safer:
>>>> use `"`=word(`"`line'"',1)'"'
>>>> )
>>>>
>>>> Possibly this is an important consideration: you construct the file using
>>>> -! ls-. Does that write information other than the names?
>>>> (You are presumably on Unix; I don't recall exactly what you get from -ls-.)
>>>>
>>>>
>>>> If there are blank lines in the file, you may want a filter to skip them:
>>>>
>>>> file open myfile using "filelist.txt", read
>>>> file read myfile line
>>>> while ~r(eof) & `"`line'"' == "" {
>>>>     file read myfile line
>>>> }
>>>> if `"`line'"' ~= "" {
>>>>     disp `"`line'"'
>>>>     use `"`line'"'
>>>>     file read myfile line
>>>> }
>>>> while ~r(eof) {
>>>>     append using `"`line'"'
>>>>     file read myfile line
>>>> }
>>>>
>>>> I might write it a bit differently; this may be simpler:
>>>>
>>>> local jj = 0
>>>>
>>>> file open myfile using "filelist.txt", read
>>>> file read myfile line
>>>> while ~r(eof) {
>>>>     if `"`line'"' ~= "" {
>>>>         disp `"`line'"'
>>>>         if ~`jj++' {
>>>>             use `"`line'"'
>>>>         }
>>>>         else {
>>>>             append using `"`line'"'
>>>>         }
>>>>     }
>>>>     file read myfile line
>>>> }
>>>>
>>>> That is, the -use- or -append- both appear inside the loop; -use- occurs on
>>>> the first pass, -append- on all subsequent passes.
>>>>
>>>> Again, pay attention to what is in `line'; you may want only part of it. The
>>>> code above presumes you want all of `line' as the filename; you will need to
>>>> modify it if you need only part.
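>>>>
>>>> A variant of the same loop, as a sketch under the assumption that each
>>>> nonblank line holds one complete filename; the temporary handle fh, the
>>>> counter n, and the closing -file close- are additions for illustration:
>>>>
>>>> ```stata
>>>> tempname fh
>>>> local n = 0
>>>> file open `fh' using "filelist.txt", read text
>>>> file read `fh' line
>>>> while r(eof) == 0 {
>>>>     * strip leading and trailing blanks before testing the line
>>>>     local line = trim(`"`line'"')
>>>>     if `"`line'"' != "" {
>>>>         if `n' == 0 {
>>>>             use `"`line'"', clear
>>>>         }
>>>>         else {
>>>>             append using `"`line'"'
>>>>         }
>>>>         local ++n
>>>>     }
>>>>     file read `fh' line
>>>> }
>>>> * release the handle so that the file can be reopened later
>>>> file close `fh'
>>>> display as text "files combined: " as result `n'
>>>> ```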
>>>>
>>>> As for why your test loop displays the second but not the first line, I
>>>> cannot say. (I've heard of failing to get the final line, but you don't seem
>>>> to have that problem.)
>>>>
>>>> Note that your first -save master_data- is unnecessary.
>>>> HTH
>>>> --David
>>>>
>>>>
>>>>
>>>> At 06:30 PM 12/5/2013, you wrote:
>>>>>
>>>>> Hello all,
>>>>>
>>>>> First and foremost, I have yet to fully understand how to use macros,
>>>>> so please forgive me if the solution to this problem is painfully
>>>>> obvious. I actually hope it's painfully obvious.
>>>>>
>>>>> I'm trying to combine multiple .dta files (1:1 horizontally appended)
>>>>> by calling several .dta filenames stored in a .txt file. However, in
>>>>> the process of doing this, whenever I try to run:
>>>>>
>>>>> . use `line'
>>>>>
>>>>> Stata returns the error:
>>>>>
>>>>> . invalid file specification
>>>>>
>>>>>
>>>>> Here's the code I'm trying to execute (sourced from here*). To start,
>>>>> I'm trying to execute this code on a .txt file containing just two
>>>>> lines (aka: two .dta filenames), but the final file will have 25
>>>>> lines:
>>>>>
>>>>> pwd
>>>>> cd ~/Desktop/merge
>>>>> ! ls *.dta >filelist.txt
>>>>> file open myfile using "filelist.txt", read
>>>>> file read myfile line
>>>>> use `line' /* ERROR HERE */
>>>>> save master_data, replace
>>>>> file read myfile line
>>>>> while r(eof)==0 {
>>>>>     append using `line'
>>>>>     file read myfile line
>>>>> }
>>>>> file close myfile
>>>>> save master_data, replace
>>>>>
>>>>>
>>>>> I first thought the problem was that "filelist.txt" wasn't being read.
>>>>> However, I believe it IS being read, since running the following:
>>>>>
>>>>> ! ls *.dta >filelist.txt
>>>>> file open myfile using "filelist.txt", read
>>>>> file read myfile line
>>>>> while r(eof)==0 {
>>>>>     display "`=word("`line'",1)'"
>>>>>     file read myfile line
>>>>> }
>>>>>
>>>>> only displays the second (but not the first) line of the two-line .txt
>>>>> file.
>>>>>
>>>>> Perhaps my issue has something to do with Stata overlooking the first
>>>>> line of the .txt file? Or perhaps my general macro-incompetence (more
>>>>> likely)?
>>>>>
>>>>> Any help will be greatly appreciated. Thanks so much for your
>>>>> consideration.
>>>>>
>>>>> Nicole
>>>>
>>>>
>>>> *
>>>> * For searches and help try:
>>>> * http://www.stata.com/help.cgi?search
>>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>>> * http://www.ats.ucla.edu/stat/stata/
>>>