Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Re: Your paper on Stata,SAS and SPSS

From	Alan Acock <[email protected]>
To	[email protected]
Subject	Re: st: Re: Your paper on Stata,SAS and SPSS
Date	Tue, 10 Aug 2010 17:41:53 -0700

On Aug 10, 2010, at Tue Aug 5 12:48 , John F Hall wrote:
> Alan
> 
> I only joined the list two days ago, so I haven't had a chance to find much Stata syntax to set alongside SPSS.  Listers have sent one or two one-liners, but with no accompanying output examples.
> I'm talking about reading from a raw data matrix, adding variable and value labels, declaring missing values, data transformations, index construction and the like (possibly via correlation) followed by simple analysis like frequency counts, barcharts and contingency tables using %%, not fancy multivariate inferential statistics.  Had I still been teaching, that would have come much later in my course, but far too late for the survey report that had to be on the client's desk by yesterday.
> You're welcome to download data sets and tutorials from my site and offer Stata examples to set alongside the SPSS syntax and output (no GUI for me: far too cumbersome, complex and tiresome).
> John Hall
> http://surveyresearch.weebly.com

John, 

To read the following you should use a fixed-width font, e.g., Courier, and you may have some problems if your email system wraps lines.

I sent one-line commands because that is how simple the syntax is. Here is a complete program. It uses auto.dta, a dataset that is installed on your PC when you install Stata.

Here is the entire program:
********begin*********
sysuse auto
tab foreign
fre foreign
ttest mpg, by(foreign)
tab rep78 foreign, col chi2 V
pwcorr weight trunk headroom length price, obs sig
regress price weight trunk headroom length, beta
********end***********

Let me elaborate.

The sysuse auto command loads auto.dta, one of the sample datasets that come with Stata, into memory.
The tab foreign command produces a frequency distribution:
==========
. tab foreign

   Car type |      Freq.     Percent        Cum.
------------+-----------------------------------
   Domestic |         52       70.27       70.27
    Foreign |         22       29.73      100.00
------------+-----------------------------------
      Total |         74      100.00
==========

I prefer the frequency distribution output that SPSS has. A user wrote a command, fre, that produces it. From the Stata command line you can type findit fre and follow the link to install it (with one click). As an SPSS person you will probably also prefer this output. Here is what you get with that command:
===========
. fre foreign

foreign -- Car type
----------------------------------------------------------------
                   |      Freq.    Percent      Valid       Cum.
-------------------+--------------------------------------------
Valid   0 Domestic |         52      70.27      70.27      70.27
        1 Foreign  |         22      29.73      29.73     100.00
        Total      |         74     100.00     100.00
----------------------------------------------------------------
===========
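
For readers who prefer to stay at the command line, a minimal sketch of an alternative to the findit route follows. It assumes an internet connection and relies on fre being distributed through the SSC archive; fre is user-written, not part of official Stata.

```stata
* Assumes internet access; fre is a user-written command hosted on SSC
ssc install fre

* Load the shipped example data and request the SPSS-style frequency table
sysuse auto, clear
fre foreign
```

The installation is needed only once per Stata installation; after that, fre behaves like any built-in command.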

As an example of an independent-samples t-test, you may want to know whether mileage (mpg) is significantly different depending on whether the car is domestic (U.S.) or foreign (not U.S.). The ttest command gives you this:
===========
. ttest mpg, by(foreign)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
Domestic |      52    19.82692     .657777    4.743297    18.50638    21.14747
 Foreign |      22    24.77273     1.40951    6.611187    21.84149    27.70396
---------+--------------------------------------------------------------------
combined |      74     21.2973    .6725511    5.785503     19.9569    22.63769
---------+--------------------------------------------------------------------
    diff |           -4.945804    1.362162               -7.661225   -2.230384
------------------------------------------------------------------------------
    diff = mean(Domestic) - mean(Foreign)                         t =  -3.6308
Ho: diff = 0                                     degrees of freedom =       72

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0003         Pr(|T| > |t|) = 0.0005          Pr(T > t) = 0.9997
===========

To get what SPSS calls a crosstabulation of two variables, along with a chi-square test and Cramér's V, you use the next one-line command:
===========

. tab rep78 foreign, col chi2 V

+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

    Repair |
    Record |       Car type
      1978 |  Domestic    Foreign |     Total
-----------+----------------------+----------
         1 |         2          0 |         2
           |      4.17       0.00 |      2.90
-----------+----------------------+----------
         2 |         8          0 |         8
           |     16.67       0.00 |     11.59
-----------+----------------------+----------
         3 |        27          3 |        30
           |     56.25      14.29 |     43.48
-----------+----------------------+----------
         4 |         9          9 |        18
           |     18.75      42.86 |     26.09
-----------+----------------------+----------
         5 |         2          9 |        11
           |      4.17      42.86 |     15.94
-----------+----------------------+----------
     Total |        48         21 |        69
           |    100.00     100.00 |    100.00

          Pearson chi2(4) =  27.2640   Pr = 0.000
               Cramér's V =   0.6286
==============

If you want a correlation matrix with the pairwise N and the level of significance, you use the next line:
==============
. pwcorr weight trunk headroom length price, obs sig

             |   weight    trunk headroom   length    price
-------------+---------------------------------------------
      weight |   1.0000
             |
             |       74
             |
       trunk |   0.6722   1.0000
             |   0.0000
             |       74       74
             |
    headroom |   0.4835   0.6620   1.0000
             |   0.0000   0.0000
             |       74       74       74
             |
      length |   0.9460   0.7266   0.5163   1.0000
             |   0.0000   0.0000   0.0000
             |       74       74       74       74
             |
       price |   0.5386   0.3143   0.1145   0.4318   1.0000
             |   0.0000   0.0064   0.3313   0.0001
             |       74       74       74       74       74
============== 

If you want to do a simple multiple regression and get R-squared, B's, betas, etc., you use the next line:
==============
. regress price weight trunk headroom length, beta

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  4,    69) =   10.20
       Model |   236016580     4    59004145           Prob > F      =  0.0000
    Residual |   399048816    69  5783316.17           R-squared     =  0.3716
-------------+------------------------------           Adj R-squared =  0.3352
       Total |   635065396    73  8699525.97           Root MSE      =  2404.9

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
      weight |   4.753066   1.120054     4.24   0.000                 1.252435
       trunk |   114.0859   109.9488     1.04   0.303                 .1654491
    headroom |  -711.5679   445.0204    -1.60   0.114                -.2040968
      length |  -101.7092   42.12534    -2.41   0.018                -.7678236
       _cons |   11488.47   4543.902     2.53   0.014                        .
------------------------------------------------------------------------------

==============

All of these are very basic commands for a beginning course. Stata has menus where you can point and click, but you can see why many users don't bother with these. In my book on Stata I reproduce most of the sorts of commands you cover in your tutorials. The fact that you make these available at no charge for SPSS people is very nice of you. 

There are some areas where SPSS has an advantage. People doing traditional ANOVA find SPSS easier to use, for example. As far as data management goes, it is a mixed picture. I work with some complex datasets, so the added power of Stata is important for data management. Michael Mitchell has a great book on data management (Stata Press). Stata does use a two-step process for labeling values: you first define a set of value labels, then attach it to variables. Some find this awkward. The advantage is that the same value labels, once defined in step one, can be applied broadly to appropriate variables.
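
The two-step labeling process can be sketched with auto.dta as follows. The label name yesno, the two indicator variables, and their cutoffs are hypothetical, chosen only to illustrate the point that one definition can serve several variables.

```stata
sysuse auto, clear

* Step one: define a set of value labels once (the name yesno is hypothetical)
label define yesno 0 "No" 1 "Yes"

* Two illustrative 0/1 indicators; the cutoffs are arbitrary
generate byte heavy  = weight > 3000 if !missing(weight)
generate byte pricey = price > 6000 if !missing(price)

* Step two: attach the same set of labels to both variables
label values heavy pricey yesno

tab heavy pricey
```

Because the labels live under their own name, changing "Yes" to something else later means editing one definition rather than one per variable.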

The extensibility of Stata by users is remarkable. Some of what you see on Statalist is the code they wrote, and this can be complicated even though the resulting command is simple. For example, a user wrote a command, revrs. If I say revrs varlist (after installing the command the first time), Stata will reverse code each of the variables and reassign the value labels for them, generating new variables with rev at the start of their names while keeping the original variables unchanged.

Some of these user-written commands are extremely powerful. Scott Long, also a sociologist, wrote a one-line command that runs a Poisson regression, a negative binomial regression, a zero-inflated Poisson regression, and a zero-inflated negative binomial regression. The output includes the results for each of these and a table helping you decide which model fits the data best. This would not be of much use for a beginning student, but it illustrates the power of the extensibility of Stata.
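
To make the reverse-coding idea concrete, here is a rough sketch of what it amounts to for a single variable, done by hand with built-in commands. It is not the revrs implementation: it handles one variable, does not reassign value labels, and the new variable name revrep78 simply mimics the rev prefix described above.

```stata
sysuse auto, clear

* Store the observed minimum and maximum of rep78 in r(min) and r(max)
summarize rep78, meanonly

* Reverse code: the highest value becomes the lowest and vice versa;
* missing values of rep78 stay missing in the new variable
generate revrep78 = r(max) + r(min) - rep78

* Check the mapping against the original variable
tab rep78 revrep78, missing
```

A user-written command earns its keep by wrapping steps like these, plus the label bookkeeping, in a loop over a whole varlist.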

Michael mentioned the price difference and it is really dramatic. When you buy (not lease) Stata you get everything. The price is not an annual fee.

Many people still use SPSS and I hope IBM invests enough to make it a more competitive product for social science researchers. I'm concerned that their primary interest may be in the predictive analytics applications for marketing researchers, but I hope this is a mistaken concern.

--alan