From: Jeph Herrin <[email protected]>
To: [email protected]
Subject: Re: st: Identify observations within a variable
Date: Mon, 22 Apr 2013 10:40:30 -0400
Sounds like what you want is:
bysort artist isrc_country : egen country_sales=total(sales)
bysort artist (country_sales) : gen origin=isrc_country[_N]
This will assign the country that has the most sales, not the most observations, as the -origin-.
hth,
Jeph
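The two lines above can be tried on a small example. What follows is a minimal sketch, assuming toy data with made-up sales figures that loosely mirror the Shakira case discussed in this thread; only the variable names artist, isrc_country and sales come from the thread.

```stata
* Toy data (hypothetical values, for illustration only)
clear
input str10 artist str10 isrc_country long sales
"Shakira" "Colombia" 1
"Shakira" "Colombia" 1
"Shakira" "Colombia" 1
"Shakira" "USA"      300
"Juanes"  "Colombia" 50
"Juanes"  "USA"      2
end

* Total sales for each artist-country pair
bysort artist isrc_country: egen country_sales = total(sales)

* Within artist, sort on country_sales so that the last observation
* belongs to the country with the largest total, then copy its name
bysort artist (country_sales): gen origin = isrc_country[_N]

list, sepby(artist)
```

If two countries tie on total sales for an artist, the sort order among them is arbitrary. Adding isrc_country inside the parentheses, as in (country_sales isrc_country), is one way to make the choice reproducible.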
On 4/22/2013 10:08 AM, Estrella Gomez wrote:
Hi, Nick
One additional question: I guess that
bysort artist: egen origin = mode(isrc_country)
where isrc_country is the country of origin, gives for each artist the
origin country that is most frequent. Is it possible to weight this
measure by total sales?
This is an example of what I mean: there are many observations in
which the number of sales is only one. If there are, say, 200
observations with one sale each for Shakira recorded as from
Colombia, but one observation with 300 sales for Shakira
recorded as from USA, the egen command would conclude that Shakira is
from Colombia, when it is more reasonable to attribute a US origin in
this case.
Thanks a lot,
Estrella
2013/4/22 Nick Cox <[email protected]>:
As I understand it you want to replace differing values by the most
commonly occurring value. This is just the mode and the -mode()-
function of the -egen- command should suffice.
It has supported string arguments too since birth.
That aside, your question is an FAQ:
FAQ . . . . . . Listing observations in a group that differ on a variable
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
11/01 How do I list observations in a group that differ
on a variable?
http://www.stata.com/support/faqs/data-management/listing-observations-in-group/
Although understanding the principles there will do no harm, my guess
is that you don't need it given -egen-'s -mode()- function.
Nick
[email protected]
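As an illustration of the -mode()- suggestion, here is a minimal sketch, assuming toy data with hypothetical values; the new variable name origin_mode is likewise only illustrative.

```stata
* Toy data (hypothetical values, for illustration only)
clear
input str10 artist str10 isrc_country
"Shakira" "Colombia"
"Shakira" "Colombia"
"Shakira" "USA"
"Juanes"  "Colombia"
end

* Most frequent country within each artist; -mode()- accepts string variables
bysort artist: egen origin_mode = mode(isrc_country)

list, sepby(artist)
```

Note that this counts observations, not sales. When several values tie for the mode, -egen- needs to be told which one to take (see the minmode, maxmode and nummode() options in the help for -egen-).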
On 22 April 2013 11:26, Estrella Gomez <[email protected]> wrote:
I am cleaning a music dataset, and one of the problems I have is that
there are many cases in which there are several different origin countries for
the same artist. For instance, Shakira appears as from USA, Colombia,
Netherlands and UK.
I want to assign one unique origin country to each artist based on the
number of records. I have 94,330,173 observations, so I can't do it
manually.
My problem is that I don't know how to tell Stata that I want to see
those cases in which there are different countries for the same
artist. Both are string variables. Once I identify those
"wrong" observations, I would select one unique country for each
artist according to the number of total sales, which is a numerical
variable.
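One way to see the cases described above is to flag artists whose country is not constant. This is a minimal sketch, assuming toy data with hypothetical values; the helper variables differs and pair are names introduced here for illustration.

```stata
* Toy data (hypothetical values, for illustration only)
clear
input str10 artist str12 isrc_country long sales
"Shakira" "USA"         300
"Shakira" "Colombia"    1
"Shakira" "Netherlands" 4
"Adele"   "UK"          10
"Adele"   "UK"          25
end

* After sorting countries within artist, the first and last values
* differ exactly when the artist has more than one country
bysort artist (isrc_country): gen byte differs = isrc_country[1] != isrc_country[_N]

* Tag one observation per artist-country pair to keep the listing short
egen byte pair = tag(artist isrc_country)
list artist isrc_country if differs & pair, sepby(artist)
```

With many millions of observations, listing only the tagged pairs, or counting them with -count if differs & pair-, is likely to be more practical than listing every flagged observation.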
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/