Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: specifying all values different condition
From: Nick Cox <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: specifying all values different condition
Date: Thu, 13 Feb 2014 12:05:24 +0000
Your syntax does not do that, I think: it checks that every pair is
unequal, which is true only when no two values match.
With three variables,
... if (var1 == var2) | (var1 == var3) | (var2 == var3)
is a way of checking for any matches.
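For a longer varlist the same check can be written as a loop over pairs rather than typed out. With k variables there are k(k-1)/2 pairs to compare. What follows is only a minimal sketch: the variable names var1-var4, the example data and the indicator anymatch are made up for illustration. Note that Stata treats two numeric missings as equal, so missings count as matches here.

```stata
* Hypothetical example data: var1-var4 are illustrative names only
clear
input var1 var2 var3 var4
1 2 3 4
1 2 3 1
5 5 5 5
end

local vars var1 var2 var3 var4
local n : word count `vars'
local nm1 = `n' - 1

* anymatch will be 1 if any two variables agree in an observation
generate byte anymatch = 0

* Compare each variable with every later variable in the list
forvalues i = 1/`nm1' {
    local a : word `i' of `vars'
    local ip1 = `i' + 1
    forvalues j = `ip1'/`n' {
        local b : word `j' of `vars'
        quietly replace anymatch = 1 if `a' == `b'
    }
}

list
```

The condition "all values different" is then just anymatch == 0.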
The more general (and interesting!) question remains. First off,
presumably either all the variables are numeric or all are string;
otherwise comparison makes no sense, to Stata at least. (Exceptions
surely require -destring- or -tostring- to put variables into
comparable storage types.)
In my files I find programs defining two -egen- functions, one for
numeric and one for string variables.
Here is _grownvals.ado for finding the number of distinct values in
each observation of a bunch of numeric variables. If the number of
distinct values is fewer than the number of variables, matches have
been found. I have hacked at the indentation to reduce the chance of
misreading this.
* program begins
* NJC 1.0.1 28 Jan 2009
* NJC 1.0.0 7 Jan 2009
program _grownvals
    version 9
    gettoken type 0 : 0
    gettoken h 0 : 0
    gettoken eqs 0 : 0

    syntax varlist(numeric) [if] [in] [, BY(string) MISSing]
    if `"`by'"' != "" {
        _egennoby rownvals() `"`by'"' /* NOTREACHED */
    }

    marksample touse, novarlist
    local miss = "`missing'" != ""

    quietly {
        mata : row_nvals("`varlist'", "`touse'", "`h'", "`type'", `miss')
    }
end

mata :

void row_nvals(string scalar varnames,
               string scalar tousename,
               string scalar nvalsname,
               string scalar type,
               real scalar miss)
{
    real matrix y
    real colvector nvals, row

    st_view(y, ., tokens(varnames), tousename)
    nvals = J(rows(y), 1, .)

    if (miss) {
        for(i = 1; i <= rows(y); i++) {
            row = y[i,]'
            nvals[i] = length(uniqrows(row))
        }
    }
    else {
        for(i = 1; i <= rows(y); i++) {
            row = y[i,]'
            nvals[i] = length(uniqrows(select(row, (row :< .))))
        }
    }

    st_addvar(type, nvalsname)
    st_store(., nvalsname, tousename, nvals)
}

end
* program ends
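The core of row_nvals() is the pair of calls to uniqrows() and select(). A small interactive sketch may make the two branches clearer. The vector and the names n_with_missing and n_nonmissing are made up for illustration and are not part of the program above.

```stata
mata:
// Hypothetical row of values, transposed to a column, with one missing
row = (1, 2, 2, .)'

// As in the -missing- branch: missing counts as a value, so 1, 2 and . give 3
n_with_missing = length(uniqrows(row))

// As in the default branch: missings are dropped first, so 1 and 2 give 2
n_nonmissing = length(uniqrows(select(row, row :< .)))

n_with_missing, n_nonmissing
end
```

The comparison row :< . is true only for nonmissing values, which is what lets select() drop the missings before the distinct values are counted.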
You would need to put this in a file _grownvals.ado along your
-adopath- and call by
egen nrowvals = rownvals(<varlist>)
and then compare the number of distinct values with the number of
variables. The algorithm here is pretty lousy and I would be very
happy to learn of a smarter one.
Numeric missings are ignored by default. The option -missing- overrides that.
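The comparison with the number of variables might be sketched as follows. This assumes that _grownvals.ado as listed above has been saved along your -adopath-. The variable names var1-var3, the example data and the indicator alldiff are made up for illustration.

```stata
* Assumes _grownvals.ado (listed above) is saved along the adopath.
* var1-var3 and the data values are hypothetical.
clear
input var1 var2 var3
1 2 3
1 2 1
4 . 5
end

local vars var1 var2 var3
local k : word count `vars'

* Number of distinct nonmissing values in each observation
egen nrowvals = rownvals(`vars')

* 1 if every variable holds a different nonmissing value, 0 otherwise
generate byte alldiff = nrowvals == `k'

list
```

With the default, the third observation has only two distinct nonmissing values, so it is not flagged as all different. With the -missing- option the missing would count as a third value.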
This function was mentioned in a review of technique in this
territory, which should interest anyone working on this question.
SJ-9-1  pr0046  . . . . . . . . . . . . . . . . . . Speaking Stata: Rowwise
        (help rowsort, rowranks if installed) . . . . . . . . . . N. J. Cox
        Q1/09   SJ 9(1):137--157
        shows how to exploit functions, egen functions, and Mata
        for working rowwise; rowsort and rowranks are introduced
That is easy to access at
http://www.stata-journal.com/article.html?article=pr0046 (although as I
write I cannot connect to it).
It's stated in that paper that the functions are in -egenmore- (SSC).
That was an intention never carried out, but no one has asked for this
kind of thing before now.
If you want the function for string variables, please ask.
Nick
[email protected]
On 13 February 2014 11:12, Viktor Emonds <[email protected]> wrote:
> I've been breaking my head over something that can probably be very easily done. I want to evaluate whether the value of any variable in a varlist matches the value of any other variable in the list. If it were only three variables, this would amount to something like:
>
> if var1!=var2 & var1!=var3 & var2!=var3
>
> How could I do this in a more elegant way when the varlist becomes much longer and the number of these 1-on-1 dissimilarity conditions would exponentially increase?