Re: st: stacking unique values of several variables under one new variable
From: Nick Cox <[email protected]>
To: [email protected]
Date: Mon, 25 Feb 2013 08:44:37 +0000
For "unique" read "distinct".
My code is very similar to Maarten's but I will post it nevertheless.
If it's as simple as your example implies then you can do this:
. gen long obs = _n
. split technology , p(,)
variables created as string:
technology1 technology2
. local k = r(nvars)
. expand `k'
(4 observations created)
. forval j = 1/`k' {
  2.     bysort obs : replace technology = technology`j'[1] if _n == `j'
  3. }
(2 real changes made)
(4 real changes made)
. drop if missing(technology)
(2 observations deleted)
. replace technology = trim(technology)
(2 real changes made)
. drop technology?
. duplicates drop technology, force
Duplicates in terms of technology
(1 observation deleted)
. list
     +-------------------+
     |  technology   obs |
     |-------------------|
  1. | Monoclonals     1 |
  2. |    Vaccines     2 |
  3. |    Adjuvant     3 |
  4. |     Vaccine     3 |
  5. |  Combinchem     4 |
     +-------------------+
Here's the code in one piece:
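* Tag each original observation
gen long obs = _n
* Split on commas into technology1, technology2, ...
split technology , p(,)
local k = r(nvars)
* Make k copies of each observation, one per possible value
expand `k'
* Within each obs, put the j-th split value into the j-th copy
forval j = 1/`k' {
    bysort obs : replace technology = technology`j'[1] if _n == `j'
}
* Remove empty slots and stray spaces
drop if missing(technology)
replace technology = trim(technology)
* The ? wildcard matches one character, so this assumes fewer than 10 parts
drop technology?
duplicates drop technology, force
list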
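
For comparison, here is a sketch of the same stacking done with -reshape long- instead of -expand- and a loop. It is only one possible alternative, not the code posted elsewhere in this thread. It enters the example data from the question so that it can be run on its own; the lower-case variable name "technology" and the names "tech" and "which" are assumptions made for illustration.

```stata
* Example data from the question; the lower-case name "technology" is assumed
clear
input str30 technology
"Monoclonals"
"Vaccines"
"Adjuvant, Vaccine"
"Combinchem, Monoclonals"
end

gen long obs = _n
* "tech" is a hypothetical stub for the new variables tech1, tech2, ...
split technology, p(,) gen(tech)
drop technology
* One row per (obs, position) pair instead of expand plus a loop
reshape long tech, i(obs) j(which)
drop if missing(tech)
replace tech = trim(tech)
duplicates drop tech, force
list
```

On the four example records this should leave the same five distinct values as the log above.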
Notes: Knowing that "Vaccines" and "Vaccine" mean the same, and
anything similar, will have to be part of extra code.
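
A minimal sketch of such extra code, assuming the only variant to reconcile is the singular "Vaccine" and that "Vaccines" is the spelling to keep (both are assumptions; real data will need a mapping of its own):

```stata
* Stacked values as they stand before harmonization
clear
input str20 technology
"Monoclonals"
"Vaccines"
"Adjuvant"
"Vaccine"
"Combinchem"
end

* Assumed mapping: treat the singular "Vaccine" as "Vaccines"
replace technology = "Vaccines" if technology == "Vaccine"
duplicates drop technology, force
list
```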
Maarten's code assumes that the separator is always ", ". I don't
assume that there is a space always, so I am obliged to trim spaces
afterwards.
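
A small illustration of why the trimming matters, using two made-up records that differ only in the space after the comma:

```stata
clear
input str30 technology
"Adjuvant, Vaccine"
"Adjuvant,Vaccine"
end

* Splitting on the comma alone leaves a leading space in the first record
split technology, p(,)
assert technology2[1] != technology2[2]

* After trimming, the two values agree
replace technology2 = trim(technology2)
assert technology2[1] == technology2[2]
```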
Nick
On Mon, Feb 25, 2013 at 6:15 AM, James Bernard <[email protected]> wrote:
> I have been struggling with the following. I would appreciate your help.
>
> I have a variable ("Technology") that indicates the type(s) of technology
> for each record. I want to aggregate the unique values of this
> variable under one new variable, say, called "Type":
>
>
> Technology
> -------------------------
> Monoclonals
> Vaccines
> Adjuvant, Vaccine
> Combinchem, Monoclonals
>
> Now, I want to create a variable that stores unique values:
>
> Type
> -----------
> Monoclonals
> Vaccines
> Adjuvant,
> Combinchem
>