Stata | FAQ: Creating variables recording whether any or all members of a group possess some characteristic


How do I create a variable recording whether any members of a group (or all members of a group) possess some characteristic?

Title		Creating variables recording whether any or all members of a group possess some characteristic
Author		Nicholas J. Cox, Durham University, UK

In the simplest case, we have a binary variable recording whether, for example, persons are male or female, unemployed or employed, or whatever, and some group variable, like a variable recording a family identifier. For example,

          family     person       female 
  1.         1          1          1  
  2.         1          2          1  
  3.         1          3          1  

  4.         2          1          0  
  5.         2          2          0  
  6.         2          3          0  

  7.         3          1          0  
  8.         3          2          0  
  9.         3          3          0  
 10.         3          4          1  
 11.         3          5          1  
 12.         3          6          1

Suppose that female is recorded as 1 for female and 0 for male. Such 0–1 coding is in a sense arbitrary but makes life easier, especially for statistical modeling in which the response is a binary variable.

Imagine various families:

A family of 3 females has values of female 1, 1, 1.
A family of 3 males has values of female 0, 0, 0.
A family of 3 males and 3 females has values of female 0, 0, 0, 1, 1, 1.

From these examples, we can see a correspondence between two ways of thinking about such families:

If all members of a family are female, the minimum value of female is 1 in that family and vice versa.
If no members of a family are female, the maximum value of female is 0 in that family and vice versa.
If any member of a family is female, the maximum value of female is 1 in that family and vice versa.

Thus egen provides a one-line answer here to each part of the question:

        . egen anyfem = max(female), by(family) 
        . egen allfem = min(female), by(family)

anyfem will be 1 if any member of a family is female and 0 otherwise; allfem will be 1 if all members of a family are female and 0 otherwise.
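To try this yourself, the listing above can be re-created with input. The sketch below is a do-file fragment written for illustration, not part of the original FAQ. Besides the two egen commands, it cross-checks them against counts (the helper variables nfem and nmem are illustrative names): any member is female exactly when the number of females is positive, and all members are female exactly when that number equals the number of members. The cross-check assumes female has no missing values.

```stata
clear
* re-create the example data: one row per person
input family person female
1 1 1
1 2 1
1 3 1
2 1 0
2 2 0
2 3 0
3 1 0
3 2 0
3 3 0
3 4 1
3 5 1
3 6 1
end

* any: the maximum of a 0/1 variable is 1 if at least one value is 1
egen anyfem = max(female), by(family)
* all: the minimum of a 0/1 variable is 1 only if every value is 1
egen allfem = min(female), by(family)

* illustrative cross-check via counts (assumes no missing values in female)
egen nfem = total(female), by(family)
egen nmem = count(female), by(family)
assert anyfem == (nfem > 0)
assert allfem == (nfem == nmem)

list family person female anyfem allfem, sepby(family)
```

Family 1 gets 1 on both anyfem and allfem, family 2 gets 0 on both, and family 3 gets anyfem = 1 but allfem = 0.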

Real examples could be more complicated than this.

First, what if the characteristic of interest is not coded as a 0–1 variable? That is only barely more difficult. The egen functions max() and min() each take an expression as their argument; see [D] egen. We could have typed

 . egen anymale = max(female == 0), by(family) 

 . egen allmale = min(female == 0), by(family) 

 . egen anyDemo = max(pty == "D"), by(family) 

 . egen allDemo = min(pty == "D"), by(family)

In other words, we can use any expression that is true or false. That expression, fed to max() or min(), will be evaluated observation by observation with a result of 1 if true or 0 if false. The expression can refer to numeric or string variables or to a combination of the two.
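The FAQ does not show data containing pty, so the sketch below invents a tiny hypothetical dataset, assuming pty is a string variable holding party codes such as "D" and "R". The last egen command, with the illustrative name anyDfem, combines a string condition and a numeric condition in a single expression.

```stata
clear
* hypothetical data: pty is a string party code, female is 0/1
input family str1 pty female
1 "D" 0
1 "R" 1
2 "R" 1
2 "R" 0
3 "D" 0
3 "D" 1
end

* pty == "D" evaluates to 1 (true) or 0 (false) in each observation
egen anyDemo = max(pty == "D"), by(family)
egen allDemo = min(pty == "D"), by(family)

* hypothetical: is any member both a "D" and female?
egen anyDfem = max(pty == "D" & female == 1), by(family)

list, sepby(family)
```

Here anyDemo is 1 in families 1 and 3, allDemo is 1 only in family 3, and anyDfem is 1 only in family 3, because family 1's "D" member is not female.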

Second, what if missing values are present? For numeric variables, Stata treats missing as higher than any nonmissing value, but egen's max() and min() are smart enough to ignore missing values. Only if all values in a group are missing will the result variable be missing.
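A small hypothetical dataset makes the point concrete. In this sketch, family 1 has one female and one member of unknown sex, family 2 has one male and one member of unknown sex, and in family 3 the sex of every member is unknown.

```stata
clear
* hypothetical data with missing values in female
input family female
1 1
1 .
2 0
2 .
3 .
3 .
end

* egen's max() and min() ignore missing values within each family
egen anyfem = max(female), by(family)
egen allfem = min(female), by(family)

list, sepby(family)
```

Family 1 ends up with allfem = 1 because its missing value is ignored, which is exactly the situation a stricter definition of all is meant to catch. Only family 3, where every value is missing, gets missing results.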

Occasionally, you may want a strict definition of all: literally all values in a group must possess the characteristic, with no missing values allowed. Here is one approach:

 . egen anymiss = max(missing(female)), by(family)

 . egen allfem = min(female) if !anymiss, by(family)

Here is another:

 . egen anymiss = max(missing(female)), by(family)

 . egen allfem = min(female), by(family)

 . replace allfem = 0 if anymiss

The difference is that in the first approach, any family with a member of unknown sex will be coded as missing, whereas in the second, any such family will be coded as 0.
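The two approaches can be compared side by side on hypothetical data in which family 2 has one member of unknown sex. The names allfem_a and allfem_b are illustrative, chosen so that both results can be kept in the same dataset.

```stata
clear
* hypothetical data: family 2 has one member of unknown sex
input family female
1 1
1 1
2 1
2 .
3 0
3 1
end

* flag families with at least one missing value (1 if so, 0 if not)
egen anymiss = max(missing(female)), by(family)

* first approach: families with any missing value get missing
egen allfem_a = min(female) if !anymiss, by(family)

* second approach: families with any missing value get 0
egen allfem_b = min(female), by(family)
replace allfem_b = 0 if anymiss

list, sepby(family)
```

Families 1 and 3 get the same results either way (1 and 0, respectively); family 2 gets missing under the first approach and 0 under the second.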

In expressions, note for example that female==0 is false (0) if female is missing; that is, female==0 does not evaluate to missing. If we had another variable in our data, say grade, taking on values 1, 2, 3, 4, ..., then grade>3 would be true (1) if grade is missing. Think of missing values as positive infinity. In some instances, excluding missing values explicitly is the most appropriate specification.

 . egen anyhigh = max(grade > 3 & grade < .), by(group) 

 . egen allhigh = min(grade > 3 & grade < .), by(group)
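The sketch below, again on a small hypothetical dataset, contrasts the naive expressions with the ones that exclude missing values explicitly. The variables with the suffix _naive are illustrative additions, not part of the FAQ.

```stata
clear
* hypothetical data: grade takes values 1, 2, 3, ... with some missing
input group grade
1 2
1 .
2 4
2 5
3 .
3 .
end

* a missing value counts as larger than any number, and is not equal to 0
assert (. > 3)
assert (. == 0) == 0

* naive versions: missing grades count as "high"
egen anyhigh_naive = max(grade > 3), by(group)
egen allhigh_naive = min(grade > 3), by(group)

* missing values excluded explicitly, as above
egen anyhigh = max(grade > 3 & grade < .), by(group)
egen allhigh = min(grade > 3 & grade < .), by(group)

list, sepby(group)
```

Group 1 gets anyhigh_naive = 1 only because of its missing grade, and group 3, whose grades are all missing, gets allhigh_naive = 1. With the explicit exclusion, both groups get 0 on anyhigh and allhigh. The result for group 3 is 0, not missing: the expression grade > 3 & grade < . is never itself missing, so egen has nonmissing values to work with even when grade is missing throughout.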

Acknowledgement

Thanks to Tom Rogers for highlighting an incorrect detail in an earlier version.
