[email protected]
> I have a dataset which has customer's payment amount by
> month by year. Month
> ranges from 01 to 12 for years 2000 & 2001 and from 01 to
> 09 for the 2002. But
> all customers don't have data for each month. The dataset
> looks like the
> following.
>
> customer month year amount
> x1 01 2001 50.45
> x1 03 2001 60.00
> x2 04 2001 70.00
> x2 06 2001 80.00
>
> I would like to create a data set where each customer will have 12
> observations for years 2000 & 2001 and 9 obs. for 2002, and
> amount will be
> zero for the months they don't have any original data. I
> tried with couple of
> different ways, but didn't work. Could anyone please help me?
>
First you can create
. gen time = 12 * (year - 2000 ) + month
which runs 1 ... 33 for January 2000 ... September 2002.
Then you can
. tabulate time
and see whether all values 1 ... 33 are represented
in your data. If they aren't, you should make sure
that they are. One way to do this is to add a
pseudocustomer, whose records are complete,
. local N = _N + 33
. set obs `N'
. replace customer = "Stata Corporation" in -33/l
. replace time = _N - _n + 1 in -33/l
and then
. fillin customer time
. replace amount = 0 if missing(amount)
. drop if customer == "Stata Corporation"
The pseudocustomer should of course not have
the same name as an actual customer.
Nick
[email protected]
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/