st: loops and strpos to test if an observations string variable value is also in another string variable
From Dan Blanchette <[email protected]>
To [email protected]
Subject st: loops and strpos to test if an observations string variable value is also in another string variable
Date Fri, 25 Jun 2010 09:32:46 -0400 (EDT)
You can give strpos() a variable name as its second argument instead of a string literal or local macro:
(input the last example dataset you posted)
. list if strpos(class_sem2_names,Child)
     +-----------------------------------------------------------------------------------------------+
     | Class    Teacher   Subject   Semester     Child   in_sem_b                   class_sem2_names |
     |-----------------------------------------------------------------------------------------------|
  1. |     1   Mrs. Fox      Math          a     Smith          1   SmithJonesFoxTolmieKershawBarker |
  2. |     1   Mrs. Fox      Math          a     Jones          1   SmithJonesFoxTolmieKershawBarker |
  3. |     1   Mrs. Fox      Math          a    Barker          1   SmithJonesFoxTolmieKershawBarker |
  4. |     1   Mrs. Fox      Math          a   Kershaw          1   SmithJonesFoxTolmieKershawBarker |
  6. |     1   Mrs. Fox      Math          a    Tolmie          1   SmithJonesFoxTolmieKershawBarker |
     +-----------------------------------------------------------------------------------------------+
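Since strpos() is evaluated observation by observation, the same expression can build the flag directly, with no need to name each child. A minimal sketch, assuming the one-class example data from the original post (the variable names are the poster's; the storage types are my guess):

```stata
clear
input byte Class str8 Teacher str4 Subject str1 Semester str7 Child
1 "Mrs. Fox" "Math" "a" "Smith"
1 "Mrs. Fox" "Math" "a" "Jones"
1 "Mrs. Fox" "Math" "a" "Barker"
1 "Mrs. Fox" "Math" "a" "Kershaw"
1 "Mrs. Fox" "Math" "a" "Tanner"
1 "Mrs. Fox" "Math" "a" "Tolmie"
end
generate str32 class_sem2_names = "SmithJonesFoxTolmieKershawBarker"

* Each row's own value of Child is the string searched for.
* The comparison evaluates to 1 when the name is found, 0 otherwise.
generate byte in_sem_b = strpos(class_sem2_names, Child) > 0

list Child in_sem_b
```

Here only Tanner should end up with in_sem_b equal to 0, which matches the flags in the posted data.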
To search all observations for the value of Child from one particular observation, say observation 4, subscript the variable:
list if strpos(class_sem2_names,Child[4])
You could also loop through all observations:
forvalues n = 1/`c(N)' {
    list if strpos(class_sem2_names,Child[`n'])
}
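One caveat with any of these strpos() calls: the match is a plain substring match, so a short name can be found inside a longer one. If that is a concern, one possible workaround is to store the names with a delimiter and search for the delimited name. The sketch below is only an illustration: the variable names_delim, the "|" delimiter, and the name "Smithson" are hypothetical and not part of the posted data.

```stata
clear
input str8 Child str30 names_delim
"Smith" "|Smithson|Jones|Fox|"
"Jones" "|Smithson|Jones|Fox|"
"Fox"   "|Smithson|Jones|Fox|"
end

* Plain substring search: "Smith" is found inside "Smithson"
generate byte loose = strpos(names_delim, Child) > 0

* Delimited search: the needle must be a whole entry such as "|Smith|"
generate byte strict = strpos(names_delim, "|" + Child + "|") > 0

list
```

With these made-up rows, loose flags all three children, while strict leaves Smith unflagged because only Smithson is on the list.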
HTH,
Dan Blanchette
Research Associate
Center for Entrepreneurship and Innovation
Duke University's Fuqua School of Business
[email protected]
From dckersh <[email protected]>
To <[email protected]>
Subject st: loops and strpos to test if an observations string variable value is also in another string variable
Date Thu, 24 Jun 2010 21:42:04 -0400
I could use some help on using loops and strpos() to see whether an
observation's string variable value is present in another string variable.
I have statewide roster data for a number of different years. Within each
year, schools track students in classes in different ways. Some schools are
very detailed and keep track of classrooms at numerous times throughout the
years. So you will have different records for a child that will contain the
same teacher, course title, course code, but different sections, semesters,
meeting times, etc. The result is a situation where the same "class" has
different students (some students move, etc.). I am trying to find a way to
identify which students (and the overall proportion of students that) were
in a class in each instance. To do this, I want to be able to flag the
first semester students who were also present in the second semester.
A fairly simple way to do this is to select the second semester classes,
reshape the data to one record per class, capture the names of students in
the class in one variable, merge those names (that variable) onto the first
semester version of that class, and then search for the last names of the
students in the first semester of the class within the variable that
captures the last names of the students in the class during the second
semester. Skipping the reshaping and merging, which I've figured out, I
theoretically get what I want with the following code on data similar to
below:
gen in_sem_b = 0
replace in_sem_b = 1 if Child=="<NAME>" & strpos(class_sem2_names,"<NAME>")>0
replace in_sem_b = 1 if Child=="Smith" & strpos(class_sem2_names,"Smith")>0
*using data structured like this*
Class  Teacher   Subject  Semester  Child    in_sem_b  class_sem2_names
1      Mrs. Fox  Math     a         Smith    1         SmithJonesFoxTolmieKershawBarker
1      Mrs. Fox  Math     a         Jones    1         SmithJonesFoxTolmieKershawBarker
I have seen some code in older listserv posts that substitutes a local
macro `X' within strpos() [strpos(class_sem2_names,`X')], but I am at a
loss as to what code I can use to simultaneously take each observation's
name from the Child variable and search for it in the class_sem2_names
variable (which will vary by class). Note that the same child may be in
other classes, so the child names are not unique.
I am open to all suggestions. Thanks for any assistance.
Warm Regards,
Dave Kershaw
HERE'S MORE DETAILED DATA TO HIGHLIGHT WHAT I'M DOING
*TRUNCATED RAW DATA - one class, two semesters
Class  Teacher   Subject  Semester  Child
1      Mrs. Fox  Math     a         Smith
1      Mrs. Fox  Math     a         Jones
1      Mrs. Fox  Math     a         Barker
1      Mrs. Fox  Math     a         Kershaw
1      Mrs. Fox  Math     a         Tanner
1      Mrs. Fox  Math     a         Tolmie
2      Mrs. Fox  Math     b         Smith
2      Mrs. Fox  Math     b         Jones
2      Mrs. Fox  Math     b         Fox
2      Mrs. Fox  Math     b         Tolmie
2      Mrs. Fox  Math     b         Kershaw
2      Mrs. Fox  Math     b         Barker
.
.
.
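For reference, the step of capturing the second-semester names in one variable can also be done without reshaping, by building a running string within each class. This is only a sketch of one possible approach, not necessarily the code used here. The variable name names and the storage types are assumptions, and the names come out in alphabetical order rather than the order shown in this post, which makes no difference to strpos().

```stata
clear
input byte Class str8 Teacher str4 Subject str1 Semester str7 Child
1 "Mrs. Fox" "Math" "a" "Smith"
1 "Mrs. Fox" "Math" "a" "Jones"
1 "Mrs. Fox" "Math" "a" "Barker"
1 "Mrs. Fox" "Math" "a" "Kershaw"
1 "Mrs. Fox" "Math" "a" "Tanner"
1 "Mrs. Fox" "Math" "a" "Tolmie"
2 "Mrs. Fox" "Math" "b" "Smith"
2 "Mrs. Fox" "Math" "b" "Jones"
2 "Mrs. Fox" "Math" "b" "Fox"
2 "Mrs. Fox" "Math" "b" "Tolmie"
2 "Mrs. Fox" "Math" "b" "Kershaw"
2 "Mrs. Fox" "Math" "b" "Barker"
end

* Keep only the second-semester roster
keep if Semester == "b"

* Within each class, start the string with the first child ...
bysort Class (Child): generate names = Child if _n == 1
* ... then append each later child to the running string from the row above
by Class: replace names = names[_n-1] + Child if _n > 1
* The last row of each class now holds every name in that class
by Class: keep if _n == _N

keep Class Teacher Subject names
list
```

The result is one record per second-semester class, ready to be merged onto the first-semester records by whatever key identifies the same class across semesters.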
*Data for the first semester of a class only
Class  Teacher   Subject  Semester  Child
1      Mrs. Fox  Math     a         Smith
1      Mrs. Fox  Math     a         Jones
1      Mrs. Fox  Math     a         Barker
1      Mrs. Fox  Math     a         Kershaw
1      Mrs. Fox  Math     a         Tanner
1      Mrs. Fox  Math     a         Tolmie
*Data for the first semester of a class with names of 2nd merged, kids flagged.
Class  Teacher   Subject  Semester  Child    in_sem_b  class_sem2_names
1      Mrs. Fox  Math     a         Smith    1         SmithJonesFoxTolmieKershawBarker
1      Mrs. Fox  Math     a         Jones    1         SmithJonesFoxTolmieKershawBarker
1      Mrs. Fox  Math     a         Barker   1         SmithJonesFoxTolmieKershawBarker
1      Mrs. Fox  Math     a         Kershaw  1         SmithJonesFoxTolmieKershawBarker
1      Mrs. Fox  Math     a         Tanner   0         SmithJonesFoxTolmieKershawBarker
1      Mrs. Fox  Math     a         Tolmie   1         SmithJonesFoxTolmieKershawBarker
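Once the flag exists, the overall proportion of first-semester students who were also present in the second semester is just the within-class mean of the 0/1 flag. A minimal sketch on the flagged data above (the new variable name share_in_sem_b is hypothetical):

```stata
clear
input byte Class str7 Child byte in_sem_b
1 "Smith"   1
1 "Jones"   1
1 "Barker"  1
1 "Kershaw" 1
1 "Tanner"  0
1 "Tolmie"  1
end

* The mean of a 0/1 flag within a class is the share of flagged students
bysort Class: egen share_in_sem_b = mean(in_sem_b)

list Class Child in_sem_b share_in_sem_b
```

For this class, five of the six first-semester students are flagged, so the share is 5/6 on every row.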