Re: st: How to reference results from a big dataset within a program
From: Richard Williams <[email protected]>
To: [email protected], <[email protected]>
Date: Wed, 28 Aug 2013 08:26:16 -0500
At 06:06 AM 8/28/2013, Phil Schumm wrote:
On Aug 27, 2013, at 4:25 PM, "Chen,Minxing" <[email protected]> wrote:
> Basically, in the program I submitted, I had to reference results
> from a big pre-simulated dataset (four variables, but around
> 400,000 observations). In my previous submission, I simply
> submitted the pre-simulated dataset with my program, and within the
> program I loaded that dataset with code such as
> "use c:\ado\personal\simudata". I was hoping that when people download
> the program from SSC, the pre-simulated dataset would also be
> downloaded to the directory "c:\ado\personal\".
>
> Now my reviewer has indicated that I can't expect users to do that. I
> can't even tell the user to place the file there, because such a
> directory may not be creatable for the user (e.g., they might not
> have a C: drive). The reviewer suggested that I find some other way
> to get the information in my pre-simulated dataset, such as
> incorporating the data into the program.
>
> I tried to copy the simulated data into my program by using
> syntax such as "input x y z k". However, since there are so many
> observations (a little more than 400,000), and there is a system
> limit on the maximum number of lines within a program (around
> 3500), I was not able to do it this way. The reviewer also mentioned
> that I may use a "Mata library" function, but I am pretty new to
> Mata. Is there anyone who may be able to help with this issue?
Basically you have two options. The first would be to deliver the
dataset (i.e., .dta file) automatically along with the package. See
-help usersite- or [R] net for the complete details, but essentially
you'll want to use "F mydata.dta" rather than "f mydata.dta" to
force the dataset to be installed in the system directories rather
than the user's current working directory. You then call the dataset with
    sysuse mydata
This way, everything will "just work" regardless of the user's local
setup, and users don't need to know (or worry) about where the file
is located. This also makes it easy for you to update the file at a
later date, if necessary.
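As a rough sketch of how the pieces might fit together (the names myprog, myprog.pkg, and myprog_data are hypothetical, and the package-file lines are shown only as comments), the package file would list the dataset with a capital F, and the ado-file would load it with -sysuse- instead of a hard-coded path:

```stata
* Hypothetical myprog.pkg, shown here as comments. The capital F asks -net-
* to install the .dta alongside the ado-files:
*   v 3
*   d myprog: example command that needs a lookup dataset
*   f myprog.ado
*   F myprog_data.dta

* Hypothetical myprog.ado:
program define myprog
    version 12
    preserve                        // the user's data come back on -restore-
    sysuse myprog_data, clear       // located along the adopath, no fixed path
    * ... look up the values the command needs here ...
    restore
end
```

The -preserve-/-restore- pair is just one way to avoid disturbing the data in memory; how the lookup itself is done depends on the program.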
The alternative would be to place the dataset on the web somewhere,
and access it from within your code using the URL. The downside to
this is that your command won't work unless the user has an internet
connection, which would be annoying.
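For completeness, a minimal sketch of the web alternative, assuming the file were hosted at a made-up address (the URL below is a placeholder, not a real location), could trap the failure so the user at least gets a readable message:

```stata
* Placeholder URL: replace with wherever the file is actually hosted.
local url "http://www.example.com/myprog_data.dta"

capture use "`url'", clear          // -use- accepts a URL in place of a filename
local rc = _rc                      // keep the return code before doing anything else
if `rc' {
    display as error "could not load `url' (is there an internet connection?)"
    exit `rc'
}
```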
You learn something new every day. I would add two suggestions: (a) give the
dataset a name that is somewhat esoteric and unlikely to be otherwise
used, and (b) give it a name that associates it with the program
so that people don't wonder where it came from, e.g., myprog_data. Of
course, I would give the same advice for all the files that will be installed.
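One possible way to check a candidate name before release is -findfile-, which searches the adopath; the name myprog_data.dta below is only the illustrative one from above:

```stata
* Check whether a file with the candidate name is already on the adopath.
capture findfile myprog_data.dta
if _rc == 0 {
    display as text "already installed at: " as result "`r(fn)'"
}
else {
    display as text "no file named myprog_data.dta found along the adopath"
}
```

This only reports what is installed on the machine where it is run, so it cannot rule out a clash on someone else's system.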