
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: How to reference results from a big dataset within a program

From	Richard Williams <[email protected]>
To	[email protected], <[email protected]>
Subject	Re: st: How to reference results from a big dataset within a program
Date	Wed, 28 Aug 2013 08:26:16 -0500

At 06:06 AM 8/28/2013, Phil Schumm wrote:

On Aug 27, 2013, at 4:25 PM, "Chen,Minxing" <[email protected]> wrote:
> Basically, in the program I submitted, I had to reference results from a big pre-simulated dataset (four variables, but around 400,000 observations). In my previous submission, I simply submitted the pre-simulated dataset with my program, and within the program I called up that simulated dataset by using code such as "use c:\ado\personal\simudata". I was hoping that when people download the program from SSC, the pre-simulated dataset would also be downloaded to the directory "c:\ado\personal\".
>
> Now my reviewer indicated that I can't expect users to do that. I can't even tell the user to place the file there, because such a directory may not be creatable for the user (e.g. they might not have a C: drive). The reviewer suggested that I find some other way to get the information in my pre-simulated dataset, such as incorporating the data into the program.
>
> I tried to put a copy of the simulated data within my program by using syntax such as "input x y z k". However, since there are so many observations (a little more than 400,000), and there is a system limit on the maximum number of lines of syntax within a program (around 3500), I was not able to do it this way. The reviewer also mentioned that I may use the "Mata library" function, but I am pretty new to Stata Mata. Is there anyone who may be able to help with this issue?
Basically you have two options. The first would be to deliver the dataset (i.e., .dta file) automatically along with the package. See -help usersite- or [R] net for the complete details, but essentially you'll want to use "F mydata.dta" rather than "f mydata.dta" to force the dataset to be installed in the system directories rather than the user's current working directory. You then call the dataset with
    sysuse mydata
This way, everything will "just work" regardless of the user's local setup, and users don't need to know (or worry) about where the file is located. This also makes it easy for you to update the file at a later date, if necessary.
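As a rough sketch of the first option, the package (.pkg) file might look something like the following. The names myprog and myprog_data are hypothetical placeholders, not the actual submission; -help usersite- remains the authority on the file format.

```stata
v 3
d myprog. Hypothetical command that relies on a pre-simulated dataset
f myprog.ado
f myprog.sthlp
F myprog_data.dta
```

Inside the ado-file, the dataset could then be loaded without any hard-coded path. This is only an illustration under the same assumed names:

```stata
*! myprog.ado (hypothetical command name, for illustration only)
program define myprog
    version 12
    preserve                      // keep the user's data in memory intact
    sysuse myprog_data, clear     // searched for along the adopath
    * ... look up the pre-simulated values here ...
    restore                       // bring the user's data back
end
```

If the full path to the file is needed instead (for example, for -merge ... using-), -findfile myprog_data.dta- searches the adopath and leaves the location in r(fn).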
The alternative would be to place the dataset on the web somewhere, and access it from within your code using the URL. The downside to this is that your command won't work unless the user has an internet connection, which would be annoying.
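For comparison, a minimal sketch of the web-based alternative, meant to sit inside the program. The URL below is a made-up placeholder, and the error handling is just one way to fail gracefully when there is no connection:

```stata
* Hypothetical URL, for illustration only
capture noisily use "http://www.example.com/myprog_data.dta", clear
if _rc {
    display as error "myprog_data.dta could not be loaded; is there an internet connection?"
    exit _rc
}
```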

You learn something new every day. I would add: (a) give the dataset a name that is somewhat esoteric and unlikely to be otherwise used, and (b) give it a name that associates it with the program so that people don't wonder where it came from, e.g. myprog_data. Of course, I would give the same advice for all the files that will be installed.


