Re: st: Assigning new values to group variables

From	Florian Seliger <[email protected]>
To	[email protected]
Subject	Re: st: Assigning new values to group variables
Date	Wed, 11 May 2011 14:41:49 -0700

Dear Robert,

It took me a while to understand the logic behind your code, but it seems to work perfectly.

Thank you very much!


On 09.05.2011 at 08:23, Robert Picard wrote:

> There are many issues here but I assume that you want to preserve the
> relationship found in each observation. The following example creates
> a variable called rel_id that identifies each relationship. Your main
> issue of having consistent Group values is done by converting the data
> to long form. Then I create a new variable called gid that identifies
> groups of companies based on the relationships stated in the initial
> dataset. This requires a program of mine called -group_id-, available
> from SSC. Just in case you needed it, I convert back to wide form.
> 
> Hope this helps,
> 
> Robert
> 
> * --------------------- begin example ---------------------
> clear
> input Group1 str10 Var1 Group2 str10 Var2
> 1 companyABC 1 companyABD
> 1 companyABC . .
> 2 companyABD . .
> 3 companyABE . .
> 4 companyABF 2 companyCCC
> 5 companyACF 3 companyDDD
> 6 companyACG . .
> 6 companyACG 4 companyADK
> 7 companyADK . .
> 8 companyADL 5 companyCCD
> 8 companyADL . .
> end
> 
> * Assign a unique identifier to each observation
> * These identify a relationship
> 
> gen rel_id = _n
> 
> * Reshape to long form; drop obs with no company
> 
> reshape long Group Var, i(rel_id) j(j)
> drop if Var == "."
> 
> * Disregard Group values if they are not Group1
> 
> replace Group = . if j > 1
> 
> * Each company should have the same Group value
> 
> sort Var Group
> by Var: replace Group = Group[1]
> 
> * Assign new Group values for companies that were
> * not part of Group1
> 
> by Var: gen first = _n == 1
> sum Group, meanonly
> replace Group = r(max) + sum(first) if Group == .
> drop first
> 
> * Group co_id when they are part of the same
> * relationship. This requires -group_id-, available
> * from SSC. To install, type ssc install group_id
> 
> gen gid = Group
> group_id gid, matchby(rel_id)
> sort gid Var
> list, sepby(gid) noobs
> 
> * If desired, convert back to wide
> 
> sort rel_id
> reshape wide Var Group gid, i(rel_id) j(j)
> list, noobs sep(0)
> * --------------------- end example -----------------------
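> 
> A minimal sketch of how the same steps might carry over to more than
> two company variables. It assumes the variables follow the
> Group#/Var# naming pattern; the Group3/Var3 columns and companyXYZ
> are made-up toy data, not taken from the survey. -reshape long- picks
> up every numeric suffix it finds, so the commands should not need to
> change as columns are added. The final -assert- lines are an added
> sanity check that each company ends up with exactly one Group value.
> 
> * --------------------- begin sketch ----------------------
> clear
> input Group1 str10 Var1 Group2 str10 Var2 Group3 str10 Var3
> 1 companyABC 1 companyABD 1 companyXYZ
> 2 companyABD . . . .
> 3 companyABE 2 companyXYZ . .
> end
> 
> * One identifier per observation (relationship)
> gen rel_id = _n
> 
> * Long form: one row per (relationship, company slot)
> reshape long Group Var, i(rel_id) j(j)
> drop if inlist(Var, "", ".")
> 
> * Trust only the Group1 numbering
> replace Group = . if j > 1
> 
> * Missing sorts last, so Group[1] is the Group1 value if one exists
> bysort Var (Group): replace Group = Group[1]
> 
> * New numbers for companies that never appear in Var1
> by Var: gen first = _n == 1
> summarize Group, meanonly
> replace Group = r(max) + sum(first) if missing(Group)
> drop first
> 
> * Sanity check: exactly one non-missing Group value per company
> assert !missing(Group)
> bysort Var (Group): assert Group[1] == Group[_N]
> 
> list, sepby(Var) noobs
> * --------------------- end sketch ------------------------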
> 
> 
> 
> 
> On Mon, May 9, 2011 at 7:35 AM, Florian Seliger <[email protected]> wrote:
>> Dear Statalist,
>> 
>> I have a dataset from a firm survey containing several thousand observations.
>> 
>> There are six variables with company names (Var1-Var6), in which firms indicate the other firms they have relationships with.
>> 
>> The same company may occur in more than one of Var1-Var6. Companies are grouped as indicated by the variables Group1-Group6.
>> 
>> Var2-Var6 contain many missing values because many firms report a relationship with only a single firm.
>> 
>> The variables Group1-Group6 use different numbers for the same company: for example, a company may have the value 2 in Group1 but the value 1 in Group2. A further problem is that Var2-Var6 may contain companies that do not occur in Var1.
>> 
>> Please see the example below for a few companies.
>> 
>> 
>> 
>> Group1  Var1        Group2  Var2
>> 1       companyABC  1       companyABD
>> 1       companyABC  .       .
>> 2       companyABD  .       .
>> 3       companyABE  .       .
>> 4       companyABF  2       companyCCC
>> 5       companyACF  3       companyDDD
>> 6       companyACG  .       .
>> 6       companyACG  4       companyADK
>> 7       companyADK  .       .
>> 8       companyADL  5       companyCCD
>> 8       companyADL  .       .
>> 
>> 
>> 
>> In the end, each company should have the same group value wherever it occurs across Var1-Var6, namely its value in Group1. In addition, companies that do not occur in Var1 should be assigned a new number. Please see the example below.
>> 
>> 
>> 
>> 
>> 
>> Group1  Var1        Group2  Var2
>> 1       companyABC  1       .
>> 1       companyABC  1       .
>> 2       companyABD  2       companyABD
>> 3       companyABE  3       .
>> 4       companyABF  4       .
>> 5       companyACF  5       .
>> 6       companyACG  6       .
>> 6       companyACG  6       .
>> 7       companyADK  7       companyADK
>> 8       companyADL  8       .
>> 8       companyADL  8       .
>> 9       .           9       companyCCC
>> 10      .           10      companyDDD
>> 11      .           11      companyCCD
>> 
>> 
>> 
>> I have not found a way in Stata to assign new numbers to companies that do not occur in Var1, so I would be grateful for any ideas.
>> 
>> 
>> 
>> Thank you.
>> 
>> 
>> 
>> Best,
>> 
>> Florian
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>> 

References:
- st: Assigning new values to group variables
  - From: "Florian Seliger" <[email protected]>
- Re: st: Assigning new values to group variables
  - From: Robert Picard <[email protected]>
