A split line in the code was indeed the problem. The files you sent
work nicely. Thank you Nick.
Jessica
On Thu, Jan 8, 2009 at 2:04 PM, Nick Cox <[email protected]> wrote:
> My -egen- function works for me. Any alleged pickiness with Mata and
> -if- cannot possibly bite you as my Mata code makes no use of -if-.
>
> What's much (enormously) more likely is that the code has got mangled
> somewhere en route. For example, the version in the Statalist archives
> at Harvard has a split line, as has the version below.
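
A minimal sketch of the failure mode Nick describes, assuming the code is run from a do-file or ado-file (the variables a, b and total are made up for illustration). Stata treats the end of a line as the end of the command, so a command that was wrapped in transit is read as two separate, incomplete commands. Ending the first line with /// tells Stata that the command continues on the next line.

```stata
* Run this from a do-file: /// is not understood at the interactive prompt.
clear
set obs 3
generate a = _n
generate b = 2 * _n

* One command deliberately spread over two lines.
* Without the trailing ///, Stata would try to run "generate total = a +"
* as a complete command and stop with an error.
generate total = a + ///
    b
```

For the ado-files in this thread, joining the two halves of the wrapped -mata :- line back into a single line is the simplest repair.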
>
> As SSC is frozen in Kit's absence (see earlier today) I will send copies
> of files directly to Jessica.
>
> Nick
>
> Jessica Looze
>
> Thank you Nick and Scott for your suggestions. I tried Nick's
> suggestion first, as an egen command seems the more efficient of the
> two. However, when I entered the command
>
> egen nvals = rownvals(emp1_97 emp2_97 emp3_97 emp1_98 emp2_98 emp3_98)
>
> (after saving Nick's ado files of course) I received the error message
>
> unexpected end of line
> <istmt> incomplete
> r(3000);
>
> Unsure what this meant, I did a search and found a reference to this
> message in an archived Statalist conversation.
>
> http://www.stata.com/statalist/archive/2006-04/msg00434.html
>
> This discussion seems to indicate that this message has to do with the
> pickiness of Mata when "if" is involved. I am not very advanced at
> writing programs, so looking through your programs, Nick, I am
> uncertain how to tweak them (if tweaking is even the issue). Maybe there
> is something else I need to be doing here?
>
> On Wed, Jan 7, 2009 at 10:58 AM, Nick Cox <[email protected]> wrote:
>> The problem is that of counting duplicate _values_ across a varlist and
>> within each observation. (The terminology of duplicate observations
>> would imply a problem for -duplicates-, but that command does not help
>> here.)
>>
>> Jessica's code borrowed from the -egenmore- package is to do with
>> counting values that are positive and non-missing. That won't help
>> either, as the values would be counted regardless of whether they are
>> distinct, as Jessica realises. There isn't a very easy way to go further
>> down that path, although it would be possible.
>>
>> Note that the -egenmore- package is on SSC. (Please remember to explain
>> where programs you use come from.)
>>
>> The problem is however very close to that discussed in an FAQ
>>
>> FAQ . . . . . . . Counting distinct strings across a set of variables
>> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
>> 7/04 How do I count the number of distinct strings
>> across a set of variables?
>>
>> <http://www.stata.com/support/faqs/data/distinctstrings.html>
>>
>> One strategy discussed there starts with a -reshape-. Scott Merryman has
>> followed a similar line in his suggestions.
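
A rough sketch of the -reshape- route, using the two example respondents from this thread. It is not the FAQ's code or Scott's; the names slot, first and njobs are invented here. The idea is to stack all job variables into one column, flag the first occurrence of each job ID within a respondent, and add up the flags.

```stata
clear
input id emp1_97 emp2_97 emp3_97 emp1_98 emp2_98 emp3_98
1 9701 9702 9703 9801 9701 .
2 9701 .    .    9701 .    .
end

* One row per respondent and roster slot; the suffixes (1_97, 2_97, ...)
* are not numeric, hence the string option.
reshape long emp, i(id) j(slot) string

* Flag the first occurrence of each non-missing job ID within a respondent.
bysort id emp: generate byte first = (_n == 1) & !missing(emp)

* The number of flags per respondent is the number of distinct jobs.
collapse (sum) njobs = first, by(id)
list
```

Note that -collapse- replaces the data in memory, so the result would have to be merged back onto the original file if the other variables are needed.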
>>
>> Since that FAQ was written, writing an -egen- function based on a Mata
>> workhorse has come to seem a good way to do this. In fact, the
>> -rowmedian()- function for -egen- in the -egenmore- package has most of
>> the code needed. As the problem arises for numeric variables as well as
>> for string variables, two functions could be useful.
>>
>> * -------------------- put in _grownvals.ado on your adopath
>> * number of distinct non-missing numeric values in each observation
>> * NJC 1.0.0 7 Jan 2009
>> program _grownvals
>> version 9
>> gettoken type 0 : 0
>> gettoken h 0 : 0
>> gettoken eqs 0 : 0
>>
>> syntax varlist(numeric) [if] [in] [, BY(string)]
>> if `"`by'"' != "" {
>> _egennoby rownvals() `"`by'"'
>> /* NOTREACHED */
>> }
>>
>> marksample touse, novarlist
>> quietly {
>> mata : row_nvals("`varlist'", "`touse'", "`h'", "`type'")
>> }
>> end
>>
>> mata :
>>
>> void row_nvals(string scalar varnames,
>> string scalar tousename,
>> string scalar nvalsname,
>> string scalar type)
>> {
>> real matrix y
>> real colvector nvals, row
>>
>> st_view(y, ., tokens(varnames), tousename)
>> nvals = J(rows(y), 1, .)
>>
>> for(i = 1; i <= rows(y); i++) {
>> row = y[i,]'
>> nvals[i] = length(uniqrows(select(row, (row :< .))))
>> }
>>
>> st_addvar(type, nvalsname)
>> st_store(., nvalsname, tousename, nvals)
>> }
>>
>> end
>> * end of _grownvals.ado
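
What the loop in row_nvals() does to each observation can be tried interactively on a single row. This is only an illustration: the values are those of the first example respondent in this thread, and the names row, kept and n are made up.

```stata
mata:
// one observation's values, transposed into a column vector
row = (9701, 9702, 9703, 9801, 9701, .)'

// keep the non-missing entries: every number sorts below missing (.)
kept = select(row, row :< .)

// uniqrows() sorts and drops repeated rows; length() counts what is left
n = length(uniqrows(kept))
n
end
```

Here n is 4, because 9701 appears twice and the missing value is dropped. row_svals() follows the same pattern, with row :!= "" as the test for non-missing strings.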
>>
>> * -------------------- put in _growsvals.ado on your adopath
>> * number of distinct non-missing string values in each observation
>> * NJC 1.0.0 7 Jan 2009
>> program _growsvals
>> version 9
>> gettoken type 0 : 0
>> gettoken h 0 : 0
>> gettoken eqs 0 : 0
>>
>> syntax varlist(string) [if] [in] [, BY(string)]
>> if `"`by'"' != "" {
>> _egennoby rowsvals() `"`by'"'
>> /* NOTREACHED */
>> }
>>
>> marksample touse, novarlist
>> quietly {
>> mata : row_svals("`varlist'", "`touse'", "`h'", "`type'")
>> }
>> end
>>
>> mata :
>>
>> void row_svals(string scalar varnames,
>> string scalar tousename,
>> string scalar svalsname,
>> string scalar type)
>> {
>> string matrix y
>> string colvector row
>> real colvector svals
>>
>> st_sview(y, ., tokens(varnames), tousename)
>> svals = J(rows(y), 1, .)
>>
>> for(i = 1; i <= rows(y); i++) {
>> row = y[i,]'
>> svals[i] = length(uniqrows(select(row, (row :!= ""))))
>> }
>>
>> st_addvar(type, svalsname)
>> st_store(., svalsname, tousename, svals)
>> }
>>
>> end
>> * end of _growsvals.ado
>>
>>
>> You can invoke these functions, once the program files are in place, by
>>
>> egen nvals = rownvals(<numeric varlist>)
>>
>> egen svals = rowsvals(<string varlist>)
>>
>> I'll add those functions to -egenmore- in due course.
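
For a quick check without installing any files, the same calculation can be run from a do-file with Mata called directly. This is a sketch under that assumption, not a substitute for the -egen- functions: it ignores -if- and -in-, and the variable name nvals and the Mata names y and n are chosen here for illustration.

```stata
clear
input id emp1_97 emp2_97 emp3_97 emp1_98 emp2_98 emp3_98
1 9701 9702 9703 9801 9701 .
2 9701 .    .    9701 .    .
end

* Create the result variable first so that Mata can fill it in.
generate nvals = .

mata:
// copy the job variables into a matrix, one row per observation
y = st_data(., tokens("emp1_97 emp2_97 emp3_97 emp1_98 emp2_98 emp3_98"))
n = J(rows(y), 1, .)

for (i = 1; i <= rows(y); i++) {
    row = y[i, .]'
    // count the distinct non-missing values in this observation
    n[i] = length(uniqrows(select(row, row :< .)))
}

// write the counts back to the Stata variable
st_store(., "nvals", n)
end

list id nvals
```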
>>
>> Nick
>>
>> Jessica Looze
>>
>> I am trying to create a variable that indicates the number of jobs an
>> individual has held during a period of years. The dataset I am using,
>> NLSY97, records each respondent's work history in a roster format.
>> This roster assigns each job a unique ID indicating the year the job
>> began. For example, the roster for respondent #1 might look like:
>>
>> ID   Year   Job1   Job2   Job3
>> 1    1997   9701   9702   9703
>> 1    1998   9801   9701   .
>>
>> So, during these two years, this respondent held four different jobs
>> (9701 extending over into 1998).
>>
>> My data looks something like this:
>>
>> ID   EMP1_97   EMP2_97   EMP3_97   EMP1_98   EMP2_98   EMP3_98
>> 1    9701      9702      9703      9801      9701      .
>> 2    9701      .         .         9701      .         .
>>
>> I have been working with the row operations suggested in the egenmore
>> help entry. My current working code looks like that on this manual
>> page:
>>
>> gen any = 0
>> gen all = 1
>> gen count = 0
>> foreach v of varlist emp1_97 emp2_97 emp3_97 emp1_98 emp2_98 emp3_98 {
>> replace any = max(any, inrange(`v', 0, .))
>> replace all = min(all, inrange(`v', 0, .))
>> replace count = count + inrange(`v', 0, .)
>> }
>>
>> From here, I cannot figure out how to modify the variable count, so
>> that it disregards duplicate IDs.
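
One way to stay close to the loop above is to count a value only when it is non-missing and has not already appeared in an earlier variable of the list. A sketch with plain -generate- and -replace-, in which ndistinct, isnew and the local macro seen are names invented here:

```stata
clear
input id emp1_97 emp2_97 emp3_97 emp1_98 emp2_98 emp3_98
1 9701 9702 9703 9801 9701 .
2 9701 .    .    9701 .    .
end

generate ndistinct = 0
local seen                      // variables already processed (empty at first)

foreach v of varlist emp1_97 emp2_97 emp3_97 emp1_98 emp2_98 emp3_98 {
    * a candidate for a new job: any non-missing ID
    generate byte isnew = !missing(`v')

    * not new after all if an earlier variable holds the same ID
    foreach u of local seen {
        replace isnew = 0 if `v' == `u'
    }

    replace ndistinct = ndistinct + isnew
    drop isnew
    local seen `seen' `v'
}

list id ndistinct
```

The inner loop makes this quadratic in the number of variables, which should not matter for a roster of this size.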
>>
>> Any suggestions would be much appreciated.
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/