Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Deleting Duplicates based on criteria
From: Katie Farrin <[email protected]>
To: [email protected]
Subject: Re: st: Deleting Duplicates based on criteria
Date: Thu, 11 Jul 2013 10:56:45 -0400
Could you change your dummies into a single variable, with the most severe
charge taking a value of 1 and lesser charges taking higher values?
Then you could sort by that variable within each case and keep only the
first observation:
sort caseid charge
quietly by caseid: gen dup = cond(_N==1,0,_n)
drop if dup>1
I'm sure there are more efficient ways to do this, but it's just a suggestion.
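For what it's worth, here is a minimal sketch of the whole approach on the example data from the thread. The variable names caseid and id are assumptions (the post only shows column labels), and only the three dummies shown in the example are coded; the remaining charges would follow the same pattern with their ranks from the posted ordering.

```stata
* Hypothetical reconstruction of the example data from the thread
* (caseid and id are assumed names, not taken from the original dataset).
clear
input str13 caseid id robberydummy burglarydummy homicidedummy
"000000038CFMA" 6 1 0 0
"000000038CFMA" 6 1 0 0
"000000038CFMA" 6 0 1 0
"000000045CFMA" 8 1 0 0
"000000045CFMA" 8 0 0 1
end

* Collapse the dummies into one severity rank, 1 = most severe.
* Ranks follow the posted ordering: homicide 1, robbery 3, burglary 6.
* Less severe charges are assigned first so that a more severe charge
* overwrites them if an observation has more than one dummy set.
gen byte charge = .
replace charge = 6 if burglarydummy == 1
replace charge = 3 if robberydummy == 1
replace charge = 1 if homicidedummy == 1

* Sort by charge within case and keep the first (most severe) observation.
* Missing values sort last, so uncoded observations are kept only if a
* case has nothing else. Ties are broken arbitrarily.
bysort caseid (charge): keep if _n == 1

list caseid id charge
```

On the example this should leave one robbery observation for id 6 and the homicide observation for id 8. The bysort line does the same job as the sort/gen dup/drop sequence above in a single step.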
Katie
On Thu, Jul 11, 2013 at 10:43 AM, Dirlam, Jonathan C.
<[email protected]> wrote:
> Highest charge determined by this order: 1. Homicide, 2. Sex offense, 3. Robbery, 4. Agg Assault, 5. Drug Trafficking, 6. Burglary, 7. Larceny Theft, 8. Motor Vehicle Theft, 9. Drug Sales, 10. Weapon, 11. DUI, 12. Drug Possession, 13. Other
>
> Example of data with 3 of 13 dummies:
> Court case number  id  robberydummy  burglarydummy  homicidedummy
> 000000038CFMA       6             1              0              0
> 000000038CFMA       6             1              0              0
> 000000038CFMA       6             0              1              0
> 000000045CFMA       8             1              0              0
> 000000045CFMA       8             0              0              1
>
> In this example, I want one of the robbery observations for id=6 and the homicide observation for id=8.
> Thanks.
>
> ________________________________________
> From: [email protected] [[email protected]] on behalf of Nick Cox [[email protected]]
> Sent: Thursday, July 11, 2013 10:23 AM
> To: [email protected]
> Subject: Re: st: Deleting Duplicates based on criteria
>
> Yes, but tell us the rules for determining the highest charge and
> give us a realistic example of a block of observations for some court
> case. (Need not be real, just realistic.)
> Nick
> [email protected]
>
>
> On 11 July 2013 15:18, Dirlam, Jonathan C. <[email protected]> wrote:
>> Dear Statalist,
>> I have duplicate observations that share the same court case number. I want to eliminate all the observations for a court case except the one with the highest charge (homicide, robbery, etc.). I have 12 dummy variables that capture charges and used the duplicates command to get unique ids for each court case number. Is there a way to write a program that eliminates or keeps duplicates based on criteria you give it (for example, homicidedummy==1) and stops once all but one observation are eliminated?
>> Thanks.
>>
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/