
RE: st: RE: a specific data management problem

From	"Martin Weiss" <[email protected]>
To	<[email protected]>
Subject	RE: st: RE: a specific data management problem
Date	Wed, 23 Dec 2009 10:32:56 +0100

<>

True, my solution depended critically on two assumptions:

1) every orphan, i.e. a group with only one observation, should be kept;

2) in groups with more than one observation, the "total" observation comes
first (_n==1).

Any departure from these rules will indeed cause problems. What is the rule in
your data?

Re strings, what is the problem there? I split the strings into tokens and
used the first one to form my groups. Where does this approach lead to
errors?
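A minimal illustration of the token approach, assuming tokens are separated by spaces: split with no options parses item on blanks, so item1 holds the first word (for example "material" in "material includes:A") and item2 the remainder, while word(item, 1) returns the same first token. The three-row dataset and the variable name firstword are made up for illustration.

```stata
* illustrative data only
clear
input str20 item
"material includes:A"
"labor includes:b"
manufacturing
end

* parse on spaces: item1 = first word, item2 = remainder (empty if none)
split item
* the same first token, computed without creating item1, item2, ...
generate str20 firstword = word(item, 1)
list item item1 item2 firstword
```

Grouping on the first token therefore puts "material" and "material includes:A" in the same id-item1 group; which row of that group is treated as the total is a separate question.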


HTH
Martin


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of [email protected]
Sent: Wednesday, 23 December 2009 10:19
To: statalist
Subject: Re: st: RE: a specific data management problem

Martin,
thank you very much for your help.
There is a problem with the solution: the variable "order" generated within
each id does not work correctly when I change the order in which the data are
input, as in this example:

clear
input id str20 item amount
1 "material includes:A" 550
1 "material includes:B" 300
1 labor 400
1 manufacturing 200
2 material 800
2 labor 500
2 "labor includes:a" 300
2 "labor includes:b" 200
3 labor 600
3 material 700
1 material 1000
end

The result is as follows, which is not what I expect:


  +-----------------------------------+
  | id                  item   amount |
  |-----------------------------------|
  |  1   material includes:A      550 |
  |  1                 labor      400 |
  |  1         manufacturing      200 |
  |-----------------------------------|
  |  2      labor includes:a      300 |
  |  2      labor includes:b      200 |
  |  2              material      800 |
  |-----------------------------------|
  |  3                 labor      600 |
  |  3              material      700 |
  +-----------------------------------+
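With this input order, the "material 1000" total row in id 1 comes after its details. Assuming the row order within id 1 is otherwise preserved, the first "material" row is "material includes:A" (550), so Martin's code treats it as the total, compares it with the sum of the other rows (300 + 1000 = 1300) and keeps only that row. A minimal order-independent sketch classifies rows by their labels instead of their position. It assumes (this is not stated in the thread) that every detail row contains "includes:", that the first word of item names the total it belongs to, and that each group with details has exactly one total row; the variable names obsno, parent, isdetail, detailsum, totalamt, hasdetail and tokeep are illustrative.

```stata
clear
input id str20 item amount
1 "material includes:A" 550
1 "material includes:B" 300
1 labor 400
1 manufacturing 200
2 material 800
2 labor 500
2 "labor includes:a" 300
2 "labor includes:b" 200
3 labor 600
3 material 700
1 material 1000
end

* remember the input order so it can be restored at the end
generate long obsno = _n
* assumption: the first word of item names the total, e.g. "material"
generate str20 parent = word(item, 1)
* assumption: detail rows contain "includes:" (1 = detail, 0 = total)
generate byte isdetail = strpos(item, "includes:") > 0

* per id-parent group: sum of the details, amount of the total row,
* and whether the group has any details at all
bysort id parent: egen double detailsum = total(amount * isdetail)
bysort id parent: egen double totalamt  = total(amount * (isdetail == 0))
bysort id parent: egen byte   hasdetail = max(isdetail)

* no details: keep the row; details add up to the total: keep the details;
* otherwise keep the total and drop the details
generate byte tokeep = cond(hasdetail == 0, 1, cond(detailsum == totalamt, isdetail, isdetail == 0))
keep if tokeep

* restore input order within id and remove helper variables
sort id obsno
drop obsno parent isdetail detailsum totalamt hasdetail tokeep
list, noobs sepby(id)
```

Because egen's total() and max() do not depend on the order of rows within a group, reordering the input does not change which rows are kept; obsno only restores the input order for display.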



By the way, another problem is how to check whether the value of a string
variable is the same in every group.
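If the question is whether a string variable takes a single value within each group, a common idiom is to sort on the string within the by-group and compare the first and last values: they are equal exactly when all values in the group are equal. A minimal sketch, where id and region are hypothetical stand-ins for the group identifier and the string variable:

```stata
* hypothetical example data
clear
input id str10 region
1 "north"
1 "north"
2 "south"
2 "east"
3 "west"
end

* after sorting by region within id, first == last iff the group is constant
bysort id (region): generate byte same = region[1] == region[_N]
list, sepby(id)
```

Here same is 1 for ids 1 and 3 (a single-observation group counts as constant) and 0 for id 2; running assert same afterwards would stop with an error if any group is not constant.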

Thank you for any help.
Best regards,
Rose

----- Original Message -----
From: Martin Weiss <[email protected]>
To: <[email protected]>
Subject: st: RE: a specific data management problem
Date: 2009-12-23 15:51:57


<>

*******
clear
input id str20 item amount
1 material 1000
1 "material includes:A" 550
1 "material includes:B" 300
1 labor 400
1 manufacturing 200
2 material 800
2 labor 500
2 "labor includes:a" 300
2 "labor includes:b" 200
3 labor 600
3 material 700
end


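* number the observations within each id
bys id: gen order=_n
* split item on spaces: item1 = first word (e.g. "material"), item2 = the rest
split item
* within each id-item1 group, total of amount over all but the first observation
bys id item1 (order): egen subtotal=total((_n>1)*amount)
* flag single-observation groups
bys id item1:gen byte keepobs=_N==1
* keep the first observation if it differs from the sum of the others
* (singletons have subtotal 0, so they stay flagged unless their amount is 0)
bys id item1: replace keepobs=_n==1 & amount!=subtotal
* flag groups whose first observation equals the sum of the others
bys id item1 (order): gen byte first=amount[1]==subtotal[1]
* in those groups, keep the detail observations (all but the first)
bys id item1 (order): gen byte dummy=(_n!=1) & (first)
keep if keepobs | dummy
* restore the original order and remove the helper variables
sort id order
drop item1 item2 subtotal keepobs first dummy order
list, noobs sepby(id)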


*******


HTH
Martin

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of [email protected]
Sent: Wednesday, 23 December 2009 06:48
To: statalist
Subject: st: a specific data management problem

Dear Statalisters,
I have encountered a data management problem. Let me use an excerpt of my
data to explain it.

clear
input id str20 item amount
1 material 1000
1 "material includes:A" 550
1 "material includes:B" 300
1 labor 400
1 manufacturing 200
2 material 800
2 labor 500
2 "labor includes:a" 300
2 "labor includes:b" 200
3 labor 600
3 material 700
end

The characteristic of the data is that the item(s) broken down into details
vary from id to id.
What I want is the following. Within each id, if the sum of the detail
observations equals the related total, drop the total observation and keep the
detail ones. Otherwise, keep the total observation and drop the detail ones.

Specifically, for the data above the result should be:
1 material 1000
1 labor 400
1 manufacturing 200
2 material 800
2 labor includes:a 300
2 labor includes:b 200
3 labor 600
3 material 700

Could anyone help me? Thank you very much.

Best regards,
Rose.


*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
