Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: upper limit on fweights? overflowing into missing values?
From: László Sándor <[email protected]>
To: [email protected]
Subject: Re: st: upper limit on fweights? overflowing into missing values?
Date: Thu, 1 Aug 2013 18:13:00 -0400
Thanks, Richard.
An "observation" was one security for one year. Total holdings of each
would have been the weights. I wanted to calculate how one data
source's prices were correlated with another. Lines/observations with
larger holdings were more important for me in this regard, but as the
larger holdings don't imply a more precise "average price" coming from
a larger sample, these still don't strike me as a case for analytical
weights. So I thought -fweights- would do the trick for -pwcorr-.
Again, I don't find it helpful to think of each dollar in these assets
being individual observations or not. To me this is a straw man.
But yes, there are assets with more than 2 billion held in them, many.
And I still think my confusion and request for help were legitimate even if
the calculation obviously works after a conversion into millions (and
some rounding, which of course shows why "millions" were not really
the unit/level of observation here). Getting missing values back
confused me, as I thought there were some missing prices somewhere
(sure there are) and maybe -casewise- was not doing its job or I
missed something. I still don't follow whether you think it is so obvious
that this will only work in millions that Stata need not warn the user.
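
A minimal sketch of the rescaling workaround described above, for anyone who hits the same silent missing values. The variable names (price_a, price_b, holdings, holdings_mn) and the three data lines are made up for illustration, and the 2,147,483,647 threshold is the one from Richard's test earlier in the thread, not a documented limit. Run it from a do-file, since it uses /// line continuation.

```stata
* Toy data; every name and value here is hypothetical.
clear
input double price_a double price_b double holdings
10.0 10.2 2500000000
20.0 19.5 4100000000
15.0 15.4  900000000
end

* Total of the raw weights; per the thread, totals above 2,147,483,647
* are where -pwcorr- started returning missing values.
summarize holdings, meanonly
display "sum of raw weights: " %20.0fc r(sum)

* Rescale dollars to millions and round to an integer stored as long.
generate long holdings_mn = round(holdings / 1e6)

* Use only lines whose rescaled weight is a positive, nonmissing integer.
pwcorr price_a price_b if holdings_mn > 0 & !missing(holdings_mn) ///
    [fweight = holdings_mn], casewise
```

The rounding is the price of the workaround: lines with holdings under half a million round to zero and drop out, so the choice of unit should be checked against the smallest holdings that matter.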
I hope this is settled, I am glad StataCorp will help future users
with similarly underdeveloped intuition for implicit limits and thus
the right scale for the unit of observation…
Laszlo
On Thu, Aug 1, 2013 at 4:26 PM, Richard Williams
<[email protected]> wrote:
> I still don't understand what the fweights are supposed to represent, i.e.
> what is an observation in these data? If it is the dollar value of the
> portfolio, you could simply measure the value in millions of dollars rather
> than dollars. Or, if it is the number of shares of stock, it could be
> measured in 1000s of shares rather than shares. If you can be clear on what
> an observation is that might help.
>
> Like Nick, I could also see where aweights might be right. According to the
> docs, "Analytic aweights are typically appropriate when you are dealing with
> data containing averages. For instance, you have average income and average
> characteristics on a group of people. The weighting variable contains the
> number of persons over which the average was calculated (or a number
> proportional to that amount)." aweights might be used for things like states
> of the United States. Or maybe even countries. The world population is in
> the 7 billion range, so if you had one record per country with things like,
> say, average income, the aweights could be the population size. Stata should
> be able to handle that fine even though it couldn't handle 7 billion
> records for every person in the world. If, say, you have 100 portfolios with
> a total of 100 billion shares of stock, and for each portfolio you have the
> average value of a share of stock, along with other characteristics of the
> portfolio (e.g. how managed) aweights would sound right to me.
>
> I agree that the documentation should be better and I am glad that Stata
> says it is going to work on it. But, this seems like a wildly esoteric
> problem to me. How many people have 4 billion cases? I don't think many do.
> I can see how this one has slipped through the cracks for the 25+ years
> Stata has been around.
>
> And in this case, I am not sure that you have 4 billion cases either. Again,
> if you can clarify what an observation is, that may help. If it is something
> like dollar value, that doesn't really strike me as being cases, but even if
> it is it seems easy enough to rescale into millions or thousands or whatever
> in order to make the problem manageable.
>
>
>
> At 01:35 PM 8/1/2013, László Sándor wrote:
>>
>> Thanks, Nick.
>>
>> Then maybe I have a terrible understanding of what aweights are. My
>> larger portfolios are not simply more precisely priced, they are,
>> well, larger. I think that enters a pwcorr calculation differently,
>> though maybe not.
>>
>> On semantics: I think an observation is anchored in the actual data in
>> Stata. But whether the weighting is sensible should not depend on
>> whether my dollar-by-dollar comparison uses larger numbers than an
>> investor-by-investor comparison. And I definitely disagree with the
>> notion that the current (undocumented) limits are fine because no one
>> would have this many "observations." Yes, no one would have this many
>> lines in Stata, but fweights are exactly there to talk about larger
>> populations than the aggregates in the data, and the dollar values can
>> easily get this large, even without "genetics." I would push back on
>> monetary amounts not being populations/observations so it is fine that
>> Stata silently overflows if it encounters them.
>>
>> So let's root for more documentation soon.
>>
>> On Tue, Jul 30, 2013 at 8:54 AM, Nick Cox <[email protected]> wrote:
>> > On the contrary, it seems to me that "what is an observation?" is more
>> > than semantic here: it is the nub of the issue!
>> >
>> > It's your problem but this sounds to me like a case for analytic
>> > weights. The use of frequency weights is also suspect unless the
>> > weights are integers (without artifice or rounding).
>> >
>> > As I've said or implied in earlier posts, this all should be a bit
>> > better documented.
>> > Nick
>> > [email protected]
>> >
>> >
>> > On 30 July 2013 13:34, László Sándor <[email protected]> wrote:
>> >> Thanks, Richard.
>> >>
>> >> Stata tech support got back to me and suggested something similar:
>> >> that some operations with fweights do overflow with such large
>> >> weights, others don't. I am not sure whether we shall call it
>> >> hard-coded as a restriction on some number somewhere or simply the C
>> >> implementation of -mf_quadcross- or something.
>> >>
>> >> I think I tried to describe my use case: I wanted to calculate stats
>> >> on portfolios, and it makes sense to weight by the size of them. As
>> >> pwcorr does not allow iweights, and pweights and aweights do something
>> >> completely different, I thought I'd use fweights. It blows up unless I
>> >> rescale the portfolios into thousands, millions or billions.
>> >>
>> >> Not a big deal, but Stata's (non-existent) error message, help and
>> >> documentation were not exactly helpful in resolving this. StataCorp
>> >> says they will address this.
>> >>
>> >> I think what an observation is is a semantic issue here, not very
>> >> helpful. Is an entire portfolio "one observation" or a single share in
>> >> each, or each dollar behind each? I am not sure this should matter
>> >> either for us or for Stata.
>> >>
>> >> Best,
>> >>
>> >> Laszlo
>> >>
>> >> On Mon, Jul 29, 2013 at 9:53 AM, Richard Williams
>> >> <[email protected]> wrote:
>> >>> Just to sum up my current thinking/guesses on this:
>> >>>
>> >>> * the maximum number of observations in Stata is 2,147,483,647
>> >>> * Nonetheless, fweighted data sets can have more observations than that
>> >>> * However, not all routines will work when the fweighted data has more
>> >>> than 2,147,483,647 cases. You can do some simple descriptive things, but
>> >>> you can't do more complicated things like regression or correlations.
>> >>> * As to why that is, I am guessing that some routines have the
>> >>> 2,147,483,647 limit hardcoded in. Or, maybe there just isn't enough
>> >>> precision to handle calculations when the N is larger than that.
>> >>> * Given that most people don't have more than 2,147,483,647 cases (and
>> >>> even if they did, their computer memory couldn't handle them) StataCorp
>> >>> probably hasn't spent a lot of time worrying about this.
>> >>> * Still, an added sentence or two in the fweights documentation or
>> >>> elsewhere warning about limits might be a good idea.
>> >>>
>> >>> I am curious what the original author is doing that requires analyzing
>> >>> 4 billion+ cases. Some sort of genetic research maybe? I've certainly
>> >>> never heard of any kind of Survey research having an N that large.
>> >>>
>> >>>
>> >>>
>> >>> At 06:53 PM 7/28/2013, Nick Cox wrote:
>> >>>>
>> >>>> This is interesting, but in principle I don't see that Stata's limit
>> >>>> on # of observations has any bearing on how big frequency weights can
>> >>>> be. I can imagine people wanting to use frequency weights to subvert
>> >>>> the limit on number of observations.
>> >>>>
>> >>>> A different point is that if there is a limit on how big weights can
>> >>>> be it should be documented e.g. at -help limits-.
>> >>>> Nick
>> >>>> [email protected]
>> >>>>
>> >>>>
>> >>>> On 29 July 2013 00:46, Richard Williams
>> >>>> <[email protected]>
>> >>>> wrote:
>> >>>> > According to -help limits-, the maximum number of observations is
>> >>>> > 2,147,483,647. Your weights give you more than 4 billion cases, well
>> >>>> > above that. Further, the help also says that this is a theoretical
>> >>>> > maximum; memory availability will certainly impose a smaller maximum.
>> >>>> >
>> >>>> > On my computer, I specified [fw = 1073741823] on the pwcorr command and
>> >>>> > it ran. Then I specified [fw = 1073741824] and it did not run. These
>> >>>> > numbers put you just below and just above the maximum number of cases
>> >>>> > that Stata allows.
>> >>>> >
>> >>>> > So in short, it appears that your fweighted cases can't exceed the 2
>> >>>> > billion+ that Stata allows, and memory restrictions may hold you to even
>> >>>> > less than that.
>> >>>> >
>> >>>> > Also, you probably need to specify that the fweight variable is type
>> >>>> > long, e.g.
>> >>>> >
>> >>>> > input y x long fw
>> >>>> >
>> >>>> > Sent from my iPad
>> >>>> >
>> >>>> > On Jul 27, 2013, at 12:36 PM, László Sándor <[email protected]>
>> >>>> > wrote:
>> >>>> >
>> >>>> >> Hi,
>> >>>> >> If you care, here is an example that silently produces missing values.
>> >>>> >> I notified Stata Support.
>> >>>> >>
>> >>>> >> input y x fw
>> >>>> >> 2 1 2147483621
>> >>>> >> 1 2 2147483621
>> >>>> >> end
>> >>>> >> de
>> >>>> >> pwcorr y x [fw=fw]
>> >>>> >> exit
>> >>>> >>
>> >>>> >> Thanks,
>> >>>> >>
>> >>>> >> Laszlo
>> >>>> >>
>> >>>> >> On Sun, Jul 21, 2013 at 5:08 PM, Nick Cox <[email protected]>
>> >>>> >> wrote:
>> >>>> >>> I'd suggest documenting your problems with a reproducible example and
>> >>>> >>> sending it to Stata tech support.
>> >>>> >>>
>> >>>> >>>
>> >>>> >>> Nick
>> >>>> >>> [email protected]
>> >>>> >>>
>> >>>> >>>
>> >>>> >>> On 21 July 2013 21:55, László Sándor <[email protected]> wrote:
>> >>>> >>>> Hi,
>> >>>> >>>> in Stata/MP 12.1 I am getting missing values when using -pwcorr- with
>> >>>> >>>> -fweights-, though the feature works fine with other data or if I
>> >>>> >>>> scale my weights down. Is it possible to simply have too large
>> >>>> >>>> fweights, e.g. if they cannot be of type -long- anymore?
>> >>>> >>>>
>> >>>> >>>> If so, why doesn't Stata warn me about this?
>> >>>> >>>>
>> >>>> >>>> I vaguely remember some Statalist or Stata blog discussion of this,
>> >>>> >>>> but I could not even Google it up, and Stata still did not warn me…
>> >>>> >>>>
>> >>>> >>>> Actually, why didn't Stata complain that I did not have integer
>> >>>> >>>> fweights if obviously the variable wasn't of type byte, int or long?
>> >>>> >>>>
>> >>>> >>>> Thanks,
>> >>>> >>>>
>> >>>> >>>> Laszlo
>> >>>> >>>>
>> >>>> >>>> *
>> >>>> >>>> * For searches and help try:
>> >>>> >>>> * http://www.stata.com/help.cgi?search
>> >>>> >>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> >>>> >>>> * http://www.ats.ucla.edu/stat/stata/
>> >>>
>> >>>
>> >>> -------------------------------------------
>> >>> Richard Williams, Notre Dame Dept of Sociology
>> >>> OFFICE: (574)631-6668, (574)631-6463
>> >>> HOME: (574)289-5227
>> >>> EMAIL: [email protected]
>> >>> WWW: http://www.nd.edu/~rwilliam
>> >>>
>> >>>
>> >>>
>
>
> -------------------------------------------
> Richard Williams, Notre Dame Dept of Sociology
> OFFICE: (574)631-6668, (574)631-6463
> HOME: (574)289-5227
> EMAIL: [email protected]
> WWW: http://www.nd.edu/~rwilliam
>
>