Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Assigning new values to group variables
From: Florian Seliger <[email protected]>
To: [email protected]
Subject: Re: st: Assigning new values to group variables
Date: Wed, 11 May 2011 14:41:49 -0700
Dear Robert,
It took me a while to understand the logic behind your code, but it seems to work perfectly.
Thank you very much!
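
For readers who find the running-sum step in the quoted code below hard to follow, here is a minimal sketch of the same idea: giving new Group values to companies that have none. It is not Robert's method. It swaps in -egen-'s group() function for the running sum, and the toy data and the names newid and maxgroup are assumptions made up for illustration.

```stata
* Toy long-form data: two companies already have a Group value,
* two (companyCCC, companyDDD) do not. Illustrative data only.
clear
input str10 Var Group
"companyABC" 1
"companyABD" 2
"companyCCC" .
"companyCCC" .
"companyDDD" .
end

* Highest Group value already in use; -summarize- ignores missing values
summarize Group, meanonly
local maxgroup = r(max)

* Number the companies that still lack a Group as 1, 2, ...
* in sort order of Var (hypothetical helper variable newid)
egen newid = group(Var) if missing(Group)

* Shift the new numbers above the existing maximum
replace Group = `maxgroup' + newid if missing(Group)
drop newid

* Checks: no Group value is missing, and each company has one value only
assert !missing(Group)
bysort Var (Group): assert Group[1] == Group[_N]

list, noobs sepby(Var)
```

Because -egen- numbers only the observations selected by the -if- qualifier, the new values start directly after the existing maximum. The two -assert- lines can also be run on the real data after the assignment step to confirm that every company ended up with a single Group value.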
On 09.05.2011 at 08:23, Robert Picard wrote:
> There are many issues here but I assume that you want to preserve the
> relationship found in each observation. The following example creates
> a variable called rel_id that identifies each relationship. Your main
> issue of having consistent Group values is handled by converting the data
> to long form. Then I create a new variable called gid that identifies
> groups of companies based on the relationships stated in the initial
> dataset. This requires a program of mine called -group_id-, available
> from SSC. Just in case you needed it, I convert back to wide form.
>
> Hope this helps,
>
> Robert
>
> * --------------------- begin example ---------------------
> clear
> input Group1 str10 Var1 Group2 str10 Var2
> 1 companyABC 1 companyABD
> 1 companyABC . .
> 2 companyABD . .
> 3 companyABE . .
> 4 companyABF 2 companyCCC
> 5 companyACF 3 companyDDD
> 6 companyACG . .
> 6 companyACG 4 companyADK
> 7 companyADK . .
> 8 companyADL 5 companyCCD
> 8 companyADL . .
> end
>
> * Assign a unique identifier to each observation
> * These identify a relationship
>
> gen rel_id = _n
>
> * Reshape to long form; drop obs with no company
>
> reshape long Group Var, i(rel_id) j(j)
> drop if Var == "."
>
> * Disregard Group values if they are not Group1
>
> replace Group = . if j > 1
>
> * Each company should have the same Group value
>
> sort Var Group
> by Var: replace Group = Group[1]
>
> * Assign new Group values for companies that were
> * not part of Group1
>
> by Var: gen first = _n == 1
> sum Group, meanonly
> replace Group = r(max) + sum(first) if Group == .
> drop first
>
> * Group companies when they are part of the same
> * relationship. This requires -group_id-, available
> * from SSC. To install, type ssc install group_id
>
> gen gid = Group
> group_id gid, matchby(rel_id)
> sort gid Var
> list, sepby(gid) noobs
>
> * If desired, convert back to wide
>
> sort rel_id
> reshape wide Var Group gid, i(rel_id) j(j)
> list, noobs sep(0)
> * --------------------- end example -----------------------
>
> On Mon, May 9, 2011 at 7:35 AM, Florian Seliger <[email protected]> wrote:
>> Dear Statalist,
>>
>> I have a dataset from a firm survey containing several thousand observations.
>>
>> Six variables (Var1-Var6) hold company names: firms were asked to indicate which other firms they have relationships with.
>>
>> Similar companies may occur within Var1-Var6. These are grouped as indicated by the variables Group1-Group6.
>>
>> Var2-Var6 contain many missing values because many firms report a relationship with only a single firm.
>>
>> The variables Group1-Group6 are numbered differently even when the company is the same in Var1 and Var2 (and Var3, ...); e.g., Group1 may take the value 2 whereas Group2 takes the value 1 for the same company. A further problem is that Var2-Var6 may contain companies that do not occur in Var1.
>>
>> Please see the example below for a few companies.
>>
>>
>>
>> Group1 Var1       Group2 Var2
>> 1      companyABC 1      companyABD
>> 1      companyABC .      .
>> 2      companyABD .      .
>> 3      companyABE .      .
>> 4      companyABF 2      companyCCC
>> 5      companyACF 3      companyDDD
>> 6      companyACG .      .
>> 6      companyACG 4      companyADK
>> 7      companyADK .      .
>> 8      companyADL 5      companyCCD
>> 8      companyADL .      .
>>
>>
>>
>> In the end, all similar companies across Var1-Var6 should have the same group value as in Group1. In addition, companies that do not occur in Var1 should be assigned new numbers. Please see the example below.
>>
>>
>>
>>
>>
>> Group1 Var1       Group2 Var2
>> 1      companyABC 1      .
>> 1      companyABC 1      .
>> 2      companyABD 2      companyABD
>> 3      companyABE 3      .
>> 4      companyABF 4      .
>> 5      companyACF 5      .
>> 6      companyACG 6      .
>> 6      companyACG 6      .
>> 7      companyADK 7      companyADK
>> 8      companyADL 8      .
>> 8      companyADL 8      .
>> 9      .          9      companyCCC
>> 10     .          10     companyDDD
>> 11     .          11     companyCCD
>>
>>
>>
>> As I have not found the right approach in Stata to assign new numbers when a company does not occur in Var1, I would like to ask whether you have any ideas.
>>
>>
>>
>> Thank you.
>>
>>
>>
>> Best,
>>
>> Florian
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/
>>