Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: RE: RE: RE: Easy Question? Counting cases based on a "target" case
From: "David Radwin" <[email protected]>
To: <[email protected]>
Subject: st: RE: RE: RE: Easy Question? Counting cases based on a "target" case
Date: Wed, 26 Dec 2012 16:52:40 -0800 (PST)
OK, I'm not sure I understand, but how about this?
sysuse auto, clear
keep in 1/20
g id=_n
g pricek=int(price/1000) //I am simplifying the levels of price to the 1000's
keep id price pricek //to clean out unwanted variables
gen countnear=.
levelsof pricek, local(pricesk)
foreach p of local pricesk {
    quietly count if inrange(pricek, `=`p'-2', `=`p'+2')
    replace countnear = `r(N)' if pricek == `p'
    display as result _newline "There are `r(N)' obs with value near `p'."
}
The result is:
. table pricek countnear
----------------------------------
          |       countnear
   pricek |    2     5    15    16
----------+-----------------------
        3 |                5
        4 |                6
        5 |                      4
        7 |          1
       10 |    1
       11 |    1
       14 |    1
       15 |    1
----------------------------------
If you want a dataset where each observation is a different value of
pricek and the count of observations near to that value of pricek, you
could -collapse- the data afterward like this:
. collapse (first) countnear, by(pricek)
. list
     +-------------------+
     | pricek   countn~r |
     |-------------------|
  1. |      3         15 |
  2. |      4         15 |
  3. |      5         16 |
  4. |      7          5 |
  5. |     10          2 |
     |-------------------|
  6. |     11          2 |
  7. |     14          2 |
  8. |     15          2 |
     +-------------------+
I admit it will be harder if you want to use more than one criterion.
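For what it's worth, one way to handle two criteria is to give up on looping over values and loop over observations instead. What follows is only a rough sketch, not something tested on the real data: the variable name countnear2 is made up for illustration, and the windows (price +/- 2000, mpg +/- 3) are taken from the example in the quoted message below. Note that it makes one pass over the data per observation, so it could be slow with a very large number of cases.

```stata
* Hypothetical sketch: count cases near each observation on two criteria.
* countnear2 is an illustrative name, not from the original thread.
sysuse auto, clear
keep in 1/20
gen long countnear2 = .
local N = _N
forvalues i = 1/`N' {
    * cases with price within 2000 AND mpg within 3 of observation i
    quietly count if inrange(price, price[`i'] - 2000, price[`i'] + 2000) ///
        & inrange(mpg, mpg[`i'] - 3, mpg[`i'] + 3)
    quietly replace countnear2 = r(N) in `i'
}
list price mpg countnear2
```

As written, each count includes the observation itself; subtracting 1 from r(N) would exclude it. The /// line continuation assumes the code is run from a do-file rather than typed interactively.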
David
--
David Radwin
Senior Research Associate
MPR Associates, Inc.
2150 Shattuck Ave., Suite 800
Berkeley, CA 94704
Phone: 510-849-4942
Fax: 510-849-0794
www.mprinc.com
> -----Original Message-----
> From: [email protected] [mailto:owner-
> [email protected]] On Behalf Of Ben Hoen
> Sent: Wednesday, December 26, 2012 11:58 AM
> To: [email protected]
> Subject: st: RE: RE: Easy Question? Counting cases based on a "target"
> case
>
> Thanks David, but that does not answer the question I had intended to ask
> (though your answer gives me additional insights).
>
> Let me try to be clearer. Assume you simplified the example to the
> following, and using your suggestion for coding:
>
> sysuse auto, clear
> keep in 1/20
> g id=_n
> g pricek=int(price/1000) //I am simplifying the levels of price to the 1000's
> keep id price pricek //to clean out unwanted variables
> levelsof pricek, local(pricesk)
> foreach p of local pricesk {
> gen near`p' = inrange(pricek, `=`p'-2', `=`p'+2')
> }
> egen countneark = rowtotal(near*)
> drop near*
> tab pricek
> *==========================end
>
> In the above example the correct totals can be calculated based on the
> tabulate output. For the various levels of pricek I should have the
> following counts of "near" cases (assuming the individual case is counted):
>
> pricek count of near cases
> 3 15
> 4 15
> 5 16
> 7 5
> 10 2
> 11 2
> 14 2
> 15 2
>
> This is not what is generated by the countneark variable.
>
> Further, in my real application I have over 170,000 values of the variable
> that is being used as the criterion, so it seems it will be inefficient
> to develop all of the levels based on them. Finally, I should add that I
> had envisioned using more than one criterion, based on more than one
> variable, all relative to the respective case, with which to evaluate the
> cases to be counted. So, for example, I would use price +/- 2000 and mpg
> +/- 3.
>
> Any additional insight would be much appreciated.
>
> Ben
>
> Ben Hoen
> LBNL
> Office: 845-758-1896
> Cell: 718-812-7589
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of David Radwin
> Sent: Wednesday, December 26, 2012 1:25 PM
> To: [email protected]
> Subject: st: RE: Easy Question? Counting cases based on a "target" case
>
> Ben,
>
> I don't think you need to loop over observations, but you can loop over
> values, which is fairly efficient. Something like this:
>
>
> levelsof price, local(prices)
> foreach p of local prices {
> gen near`p' = inrange(price, `=`p'-2000', `=`p'+2000')
> }
> egen countnear = rowtotal(near*)
>
>
> In the example above I use all prices, but you could substitute the
> following line for the first and second line above:
>
> foreach p of numlist 1900 2500 4000 6500 10000 {
>
> David
> --
> David Radwin
> Senior Research Associate
> MPR Associates, Inc.
> 2150 Shattuck Ave., Suite 800
> Berkeley, CA 94704
> Phone: 510-849-4942
> Fax: 510-849-0794
>
> www.mprinc.com
>
>
> > -----Original Message-----
> > From: [email protected] [mailto:owner-
> > [email protected]] On Behalf Of Ben Hoen
> > Sent: Wednesday, December 26, 2012 10:06 AM
> > To: [email protected]
> > Subject: st: Easy Question? Counting cases based on a "target" case
> >
> > I want to perform a function that I think would be easy, but I can't wrap
> > my head around how to perform it without looping through each case.
> >
> > I want to create a count of the number of records in the file that meet
> > a certain criterion based on a respective case's value. So for example,
> > using the auto dataset:
> >
> > *====================begin
> > sysuse auto, clear
> > g id=_n
> > egen nearprice2000=count(id) if... //count the number of other cases in
> > // the dataset if the price of the car is within $2000 of the price of
> > // this case's (i.e., target) car's price
> >
> > *====================end
> >
> > The egen command is how I thought I would resolve this, but I can't
> > figure it out exactly. The nearprice2000 would equal the count for each
> > case of the number of other cases in the dataset that have a price
> > within +/- $2000 of the particular case's price. So if the full dataset
> > had only 5 prices: 1900, 2500, 4000, 6500, and 10000, their respective
> > nearprice2000 values would be: 2, 3, 2, 1, and 1 (if itself would be
> > included in the count) or 1, 2, 1, 0, and 0 (if itself would NOT be
> > included in the count).
> >
> > I might be able to do this by looping through the cases, but I know that
> > is not encouraged by other more experienced users.
> >
> > Any advice would be greatly appreciated.
> >
> > Ben
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/