Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: How to Correctly Structure a CSV before Loading it into STATA
From: Austin Nichols <[email protected]>
To: [email protected]
Subject: Re: st: How to Correctly Structure a CSV before Loading it into STATA
Date: Thu, 27 May 2010 10:08:32 -0400
Stephen R. Clark <[email protected]> :
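No, you don't need to add any rows (AKA observations) to your file; leave
it in long form. After you load it, -tsset- your data with the market
identifier as the panel variable and the relative day as the time variable,
then use time-series operators such as L. to define lags etc. (see
-help tsvarlist-). Lags defined this way never reach back into the
preceding market.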
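A minimal sketch of that workflow follows. It uses a made-up two-market dataset typed in with -input- in place of the CSV; the variable names market, relday and y, and the file name markets.csv, are hypothetical.

```stata
* Hypothetical long-format panel: two markets with unequal numbers of days
clear
input str8 market relday y
"A" 1 10
"A" 2 12
"A" 3 11
"B" 1 20
"B" 2 21
end

* With the real file, something like this would replace -input- in Stata 10
* (-import delimited- only exists from Stata 13 on):
* insheet using markets.csv, comma clear

* -tsset- needs a numeric panel identifier, so encode the market name first
encode market, generate(mkt)
tsset mkt relday

* L.y is the previous day's y within the same market; the first day of each
* market gets missing rather than the last value of the preceding market
generate ylag = L.y
list market relday y ylag, sepby(mkt)
```

Because L. looks up the observation whose time value is one less within the same panel, a market that starts on a later relative day, or has a gap in the middle, simply gets missing lags at those points, so padding rows with NA is not needed. If explicit rows are wanted anyway, -tsfill- (with its -full- option, to balance the panel) can add them after -tsset-.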
On Thu, May 27, 2010 at 12:42 AM, Stephen R. Clark
<[email protected]> wrote:
> Dear Statalisters:
>
> Hello. I am a long-time member, but a first-time writer.
>
> I am using STATA/IC 10.1.
>
> I have primarily used STATA for cross-sectional analysis, but I now need to
> use it to engage in panel data analysis. Thankfully, from my reading of
> posts to this forum, I have learned that STATA has very powerful panel data
> analysis features.
>
> Now, let me get to my question. I have an unbalanced panel of data that
> consists of 20 cross-sectional units (markets). Each of these markets
> contains a different number of time-series (daily) observations. These range
> from 31 days for the shortest market to 48 days for the longest market.
>
> I currently have the data in stacked (long) form in a CSV file. I am
> dealing with "relative dates," so I am just using integer values (not actual
> dates) for the date variable. The data are, somewhat arbitrarily, organized
> in this stacked format according to alphabetical order of the cross-section
> name. To be as clear as possible, let me specify in more detail how
> the data are arranged in the CSV file:
>
> Relative-Day | Market (# of observations) | Dependent Variable | Independent Variables
>
> Under the relevant headings, I have 43 observations for "Market A." I then
> have 41 observations for "Market B," and so on until "Market T" (the 20th
> and final market), which has 40 observations.
>
> The missing data values can arguably be considered as randomly missing, so I
> am not concerned about any potential inferential problems associated with
> having an unbalanced panel. What I am concerned with is how to structure the
> data in the CSV file before importing it into STATA.
>
> Since the longest market has 48 observations, do I need to have 48 rows for
> each cross-section, with blank cells where the data are missing? In other
> words, do I need to "artificially balance" the data before importing it into
> STATA? If not, then will I be fine leaving the data in stacked (long)
> format, given an unequal number of days for each of the cross-sections?
>
> In considering my question, please be advised that my analysis will involve
> the use of lagged values of the dependent variable. In other words, I will
> be conducting dynamic panel data analysis. As such, I need STATA to
> recognize the panel structure of the data and not "lag into" the values for
> the preceding cross-section.
>
> Finally, if I need to "artificially balance" the data prior to importing it
> into STATA, then should I enter the NA values at the beginning or at the end
> of the respective markets? For instance, say that I am dealing with Market
> A, which has 43 observations. With the maximum number of observations at 48,
> I would need to enter 5 NA values. Should I do this as:
>
> NA
> NA
> NA
> NA
> NA
> 43 values
>
> or as
>
> 43 values
> NA
> NA
> NA
> NA
> NA
>
> Thanks in advance for your help.
>
> Stephen Clark
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/