Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: comparing xtdes-like patterns for variables
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: comparing xtdes-like patterns for variables
Date: Thu, 1 Nov 2012 12:46:00 +0000
Sorry for previous premature send.
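If you had several variables, you could try something like this:

local y = 0
* clear the accumulators in case they linger from an earlier run
local call
local Y
gen long obsno = _n
qui foreach v of var <whatever> {
    local ++y
    * a constant row height for this variable, present only where it is missing
    gen y`y' = `y' if missing(`v')
    * label the row with the variable label, or with the name if there is none
    local which : var label `v'
    if "`which'" == "" local which "`v'"
    local call `call' `y' "`which'"
    local Y `Y' y`y'
}
* one row of hollow diamonds per variable, marking its missing observations
scatter `Y' obsno, ms(dh ..) yla(`call', ang(h) noticks) legend(off)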
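
To see what the plot above looks like, here is a minimal self-contained sketch on toy data. The variable names bloomberg_price and reuters_price are borrowed from the original question; the data values, the number of observations, and the label "Bloomberg" are made up purely for illustration.

```stata
* Toy data: 10 observations, two price variables with different
* missing-value patterns (all values are invented for illustration).
clear
set obs 10
gen bloomberg_price = cond(_n <= 3, ., 100 + _n)
gen reuters_price   = cond(_n >= 8, ., 100 + _n)
label var bloomberg_price "Bloomberg"

local y = 0
local call
local Y
gen long obsno = _n
qui foreach v of var bloomberg_price reuters_price {
    local ++y
    * row height for this variable, defined only where it is missing
    gen y`y' = `y' if missing(`v')
    * use the variable label if there is one, else the variable name
    local which : var label `v'
    if "`which'" == "" local which "`v'"
    local call `call' `y' "`which'"
    local Y `Y' y`y'
}
scatter `Y' obsno, ms(dh ..) yla(`call', ang(h) noticks) legend(off)
```

With these toy data, row 1 (labelled "Bloomberg") should show markers at observations 1 to 3 and row 2 (labelled "reuters_price", as it has no variable label) at observations 8 to 10, so a variable that starts late and one that ends early are easy to tell apart. In a real panel you would probably want to sort by panel identifier and year before creating obsno, so that the horizontal axis follows the panel structure.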
>
> On Thu, Nov 1, 2012 at 1:10 AM, Nick Cox <[email protected]> wrote:
>> You could create variables like
>>
>> gen yxmiss = missing(y) - missing(x)
>> gen long obs = _n
>>
>> scatter yxmiss obs if missing(y, x)
>>
>> On Wed, Oct 31, 2012 at 7:39 PM, László Sándor <[email protected]> wrote:
>>> Thanks, Nick.
>>>
>>> The values definitely don't line up that neatly, but that's a worry
>>> for another day.
>>>
>>> Basically my problem is, if I know I can expect differences between
>>> the variables, is there a neat way to compare their missing patterns
>>> (one always starting early, or one mistakenly having the years in
>>> reverse order)?
>>>
>>> On Wed, Oct 31, 2012 at 3:15 PM, Nick Cox <[email protected]> wrote:
>>>> If # different versions of the same data should be the same, there
>>>> will be # duplicates of everything in a combined dataset.
>>>>
>>>> This applies to missings too.
>>>>
>>>> -duplicates- is therefore something that springs to mind. Panels are
>>>> no problem, as panel identifiers are just other variables.
>>>>
>>>> Naturally, if the combined dataset is extremely large, this won't be
>>>> very practical.
>>>>
>>>> Nick
>>>>
>>>> On Wed, Oct 31, 2012 at 7:02 PM, László Sándor <[email protected]> wrote:
>>>>
>>>>> I have a panel-data cleaning problem that probably has some neat
>>>>> solution, probably already out there. I am happy to try any solutions
>>>>> for Stata 12.1 MP.
>>>>>
>>>>> Background: I had to try to look up supposedly the same data from
>>>>> multiple sources. (Financial data for the same securities, but
>>>>> different data sources were expected to cover different subsets of my
>>>>> universe, or for different time periods.)
>>>>>
>>>>> But now I have a panel where I would like to cross-check different
>>>>> versions of the same data, and most crucially, I would like to verify
>>>>> that I got the years correctly for each version. (FYI: financial data
>>>>> sources can be opaque about how they handle missing data if you ask
>>>>> for "end-of-year prices for the last 15 calendar years", and whether
>>>>> they give years in ascending or descending order). For this, I would
>>>>> like to compare what periods I have non-missing values for a family of
>>>>> variables, say, bloomberg_price and reuters_price.
>>>>>
>>>>> Presumably, if I got the start and the end years right, I could hope
>>>>> to -compare- those (e.g. -compare *_price_first-), and hope that the
>>>>> patterns will be clear.
>>>>>
>>>>> That said, I'm afraid some more nuanced analysis of missing value
>>>>> patterns might be justified. What are good tools for that? (How can I
>>>>> "xtdes by variable"? Or "misstable pattern in a panel"?)
>>>>
>>>> *
>>>> * For searches and help try:
>>>> * http://www.stata.com/help.cgi?search
>>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>>> * http://www.ats.ucla.edu/stat/stata/