Re: st: expandcl and not small datasets

From: Aljar Meesters <[email protected]>
To: [email protected]
Subject: Re: st: expandcl and not small datasets
Date: Wed, 19 Dec 2012 17:34:32 +0100

Hi Tomas,

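I get the same result in Stata 11.2. Apparently one of the temporary
variables has a storage type that is not wide enough to hold the large
id values exactly (presumably the default float). I have modified the
expandcl program so that this variable is created as a long, which
solves your issue, but maybe you should also file a bug report with Stata.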
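
To illustrate the storage-type problem, here is a minimal sketch. It assumes the affected variable was created with Stata's default float type and that the example data have 10000 clusters (30000 rows in groups of 3). A float stores integers exactly only up to 2^24 = 16,777,216, while the id formula used in the program below, (cid-1)*ncl + _n, reaches values near 100 million, so neighbouring ids can round to the same number.

```stata
* Integers up to 2^24 survive a round trip through float; beyond that, not all do.
display %12.0f float(16777216)
display %12.0f float(16777217)
* The second line shows 16777216: the value was rounded.

* Assumed setting: ncl = 10000 clusters, cluster number cid = 6335.
* Copies _n = 1 and _n = 2 get the same float id, so they are not split:
display float((6335-1)*10000 + 1) == float((6335-1)*10000 + 2)
* displays 1 (true); copy _n = 3 rounds to a different value:
display float((6335-1)*10000 + 2) == float((6335-1)*10000 + 3)
* displays 0 (false)
```

A long holds integers up to 2,147,483,620 exactly, which is why changing the storage type is enough for a dataset of this size.
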
Best,

Aljar

---- Begin expandclAdj.ado
*! Adjusted version of expandcl
program expandclAdj, sortpreserve
        version 9
        gettoken equal : 0, parse("=")
        if "`equal'" != "=" {
                local 0 `"= `0'"'
        }
        syntax =exp [if] [in],          ///
                GENerate(name)          ///
                CLuster(varlist)

        confirm new variable `generate'

        marksample touse, novarlist

        tempvar vexp oid cid eid

quietly {

        gen `vexp' `exp'

        // generate cluster id variable that contains the contiguous integers
        // 1, ..., `ncl'; where `ncl' is the number of clusters

        sort `touse' `cluster', stable
        capture by `touse' `cluster': ///
                assert int(`vexp') == int(`vexp'[1]) if `touse'
        if c(rc) {
                di as err "expression is not constant within clusters"
                exit 198
        }

        by `touse' `cluster': gen `cid' = _n==1
        replace `cid' = sum(`cid')
        local ncl = `cid'[_N]
        gen `oid' = _n

        noisily expand `exp' if `touse'

        // generate the cluster id variable that is unique between the copies
        // of the original clusters

        sort `touse' `cluster' `oid', stable
        // changed to long so that large values of `eid' are stored exactly
        by `touse' `cluster' `oid': gen long `eid' = (`cid'-1)*`ncl'+_n
        sort `touse' `cluster' `eid', stable
        drop `oid'
        by `touse' `cluster' `eid': gen `oid' = _n==1 if `touse'
        replace `eid' = sum(`oid') if `touse'
        rename `eid' `generate'

} // quietly

end

---- End expandclAdj.ado
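
A possible way to check the adjusted program on data shaped like the example in the original post is sketched below. It assumes the code above has been saved as expandclAdj.ado somewhere on the adopath. The id variable is built here with ceil(_n/3) rather than egen's fill(), purely for brevity.

```stata
* Rebuild fake data: 10000 clusters of 3 rows, each to be copied 3 times
clear
set obs 30000
gen long id = ceil(_n/3)
gen tot = 3

expandclAdj tot, generate(id2) cluster(id)

* 30000 rows x 3 copies = 90000 rows in total
assert _N == 90000
* every new cluster must contain exactly the 3 rows of one original cluster
bysort id2: assert _N == 3
bysort id2 (id): assert id[1] == id[_N]
* 10000 original clusters x 3 copies = 30000 new cluster ids
summarize id2, meanonly
assert r(max) == 30000
```

If any of the assert lines fails, some copies have been merged into one id2 value, which is the symptom described in the original post.
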


2012/12/19 Tomas Lind <[email protected]>:
> Hi all,
>
> Problem with -expandcl- when expanding a dataset with 30 000 rows.
>
>
> I'm preparing a dataset for a case-crossover analysis. A CC-group may have 4 or 5 rows of data. Depending on the number of cases in this CC-group (as given by a variable "tot"), I duplicate this CC-group tot times using -expandcl- (because each of these groups has a case at a different point in time). This doesn't work in a large dataset. The fake data generated below illustrate the problem. Stata doesn't give any error message and I can't find any warnings or limits in the help and documentation. Does anyone have more information on this? One solution may be to split the dataset into smaller parts. Maybe there are other solutions?
>
>
> ** Generate some fake data ------------------
> clear*
> set obs 30000
> egen  double id = fill(1,1,1   2,2,2)
> gen tot=3
>
> expandcl tot, generate(id2) cluster(id)
>
> sort id2 id  , stable
> order id  id2
>
>
> ** Look at data -----------------------------
> * In the list below everything looks fine
> * id=1 has been split into 3 groups, which are identified by id2
> * The same applies to id=2
> list  in 1/18          , sepby(id2)
>
> * But here id=6335 has not been split into 3 groups identified by id2
> * and the same applies to a lot of other ids.
> list  in 57000/57018   , sepby(id2)
>
>
>
> I run Stata 11.1 on a PC with Windows
>
> /Tomas
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

