It is not what was asked for but you
could hold the same information in another
way:
collapse (sum) revenue, by(customer)
gen mult999 = floor(revenue/999)
gen surplus = mod(revenue,999)
There doesn't seem any special virtue
in holding lots of 999s with arbitrary
identifiers.
Nick
[email protected]
Phil Schumm
>
> At 5:29 PM -0400 9/27/04, MITRA PINAKI (MAR1PXM) wrote:
> > I have a dataset on multiple customer's revenue
> information like the
> >following:
> >
> >Id customer revenue
> >Z11 xx1 y1
> >Z21 xx1 y2
> >Z31 xx1 y3
> >Z12 xx2 k1
> >Z22 xx2 k2
> >Z32 xx2 k3 and so on.
> >
> >I need to create a new data based on the following criteria.
> If the sum of
> >revenue by customer is <=999, I need to write
> >Customer revenue
> >xx1 sum of y
> >xx2 sum of k
> >
> >But if the sum of revenue by customer is >999 and <=1998, I
> need to write
> >Id Customer revenue
> >001 xx1 999
> >002 xx1 sum of y-999
> >001 xx2 999
> >002 xx2 sum of k-999
> >
> >and if the sum of revenue by customer is >1998 and <=2997, I
> need to write
> >Id Customer revenue
> >001 xx1 999
> >002 xx1 999
> >003 xx1 sum of y-1998
> >001 xx2 999
> >002 xx2 999
> >003 xx2 sum of k-1998
> >
> >Can anyone please provide me any help on this?
>
>
> Here's one alternative:
>
>
> collapse (sum) revenue, by(customer)
> expand ceil(revenue/999)
> bys customer: replace revenue = ((_n==_N)*mod(revenue,999)) +
> ((_n!=_N)*999) if 999<revenue
>
>
> Briefly, -collapse- computes the total revenue for each customer,
> resulting in a dataset with one record per customer. -expand- then
> creates the additional records (e.g., if a customer's total is 2000,
> this will create two additional records for a total of 3). Finally,
> the last line replaces revenue either with the value 999 (all records
> but the last one for each customer) or with the remainder once the
> multiples of 999 are accounted for (last record for each customer).
> If you wish, you could then generate your ids for each record
> (consecutive integers within customer in leading-zero format) using
> the following:
>
>
> gsort + customer - revenue
> by customer: gen id = string(_n,"%03.0f")
>
>
> Note that with this solution, records with missing values for revenue
> in the original dataset will essentially be ignored. However, if a
> customer has nothing but missing values in the original dataset, the
> resulting dataset will contain a single record with a revenue of zero
> (this is because -collapse- treats missing values as zeros). If this
> is not what you want, you'll need to modify the code accordingly.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/