

st: picking the closest source from multiple — nested cond fail

From	László Sándor <[email protected]>
To	[email protected]
Subject	st: picking the closest source from multiple — nested cond fail
Date	Thu, 8 Aug 2013 18:46:21 -0400

Hi all,

I would like to ask for some help, because I cannot figure out why my
code works for some observations but not for others. Maybe this is a
-cond- or precision issue, and we will all learn something.

I need to use financial prices from various sources, but maddeningly,
the sources don't line up completely (there is some ambiguity about
the assets in question). I do have some reference prices for a few
years for a part of the panel. I want to price at least those assets
right.

I think I have some code that calculates the average distance of each
source from the reference, and fills a variable recording which source
came closest. This could be subject to rounding errors, but it seems to
fill in impressively many values, and manual checks verify that the
indicated source is indeed the closest to the reference.

The next round of -cond-s should pick the value from the right source,
but it produces many missing values. Each condition tests whether the
source indicator equals a single-digit integer value, and the variable
does seem to hold such values: for example, the value labels apply to
it, I can -tab- it, etc.

What is going on? I do have further details, but I will add them only
if necessary, so as not to confuse you further.

* Code to generate the closest source for an asset, with panel id isin

foreach v in bb ds ms fs mm {
    // I need this to punish a source that is missing when the "SKV"
    // reference still has a value; otherwise a missing difference would
    // benefit the source by the logic of -egen mean-
    gen fakerawpriceornav_`v' = cond(mi(rawpriceornav_`v'), 2*rawpriceornav_skv, rawpriceornav_`v')
    gen `v'skvdiff = abs(fakerawpriceornav_`v' - rawpriceornav_skv)
    bys isin: egen mean`v'skvdiff = mean(`v'skvdiff)
}
egen minmeanskvdiff = rowmin(mean*skvdiff)

g byte closest = ///
cond(minmeanskvdiff==meanbbskvdiff & !mi(minmeanskvdiff),1, ///
cond(minmeanskvdiff==meanmmskvdiff & !mi(minmeanskvdiff),2, ///
cond(minmeanskvdiff==meanmsskvdiff & !mi(minmeanskvdiff),3, ///
cond(minmeanskvdiff==meanfsskvdiff & !mi(minmeanskvdiff),4, ///
cond(minmeanskvdiff==meandsskvdiff & !mi(minmeanskvdiff),5, ///
.)))))

* So far not many missing values are generated, though the vast majority
* of "closest" is simply the first value. I think this is still correct.
* (There are many ties, and I break ties in favor of Bloomberg.)
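
* A quick way to see how common the ties are, as a sketch only: it
* assumes the mean*skvdiff variables and minmeanskvdiff above exist, and
* "nties" is a hypothetical helper name, not part of my original code.
* It counts how many sources attain the minimum mean distance.
gen byte nties = 0 if !mi(minmeanskvdiff)
foreach v in bb mm ms fs ds {
    replace nties = nties + (mean`v'skvdiff == minmeanskvdiff) if !mi(minmeanskvdiff)
}
tabulate nties, missing
* nties > 1 marks observations where the tie-break order decided "closest".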

la def closest 1 "Bloomberg" 3 "Morningstar" 5 "Datastream" ///
    2 "MoneyMate" 4 "FactSet"
la val closest closest
* This verifies that labels are picked up by the integer values.

g rawpriceornav_pick = ///
cond(closest==1,rawpriceornav_bb, ///
cond(closest==2,rawpriceornav_mm, ///
cond(closest==3,rawpriceornav_ms, ///
cond(closest==4,rawpriceornav_fs, ///
cond(closest==5,rawpriceornav_ds, ///
.)))))
* But here I get back a ton of missing values. Yes, it can happen that
* in my panel a source is the closest for an ISIN but has missing values
* for a few years, so closest will always have fewer missing values than
* this variable. But only 20% or so of this variable gets filled here,
* which is not reasonable.
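
* A possible diagnostic, offered only as a sketch (it assumes the
* variables created above exist): see where the missing picks are
* concentrated, and whether they coincide with the chosen source itself
* being missing in that year.
count if mi(rawpriceornav_pick) & !mi(closest)
tabulate closest if mi(rawpriceornav_pick), missing
* For a single source, e.g. Bloomberg (closest==1): how often is it the
* chosen source while its own price is missing?
count if closest==1 & mi(rawpriceornav_bb)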

Thanks!

Laszlo

Follow-Ups:
- st: Re: st: picking the closest source from multiple — nested cond fail
  - From: Sergiy Radyakin <[email protected]>
