Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: Data Manipulation Question
From
Alex Warofka <[email protected]>
To
statalist <[email protected]>
Subject
st: Data Manipulation Question
Date
Thu, 19 Sep 2013 11:24:43 -0400
Hi,
I'm trying to perform what seems like a relatively simple data
manipulation task on a very large dataset (~20GB with 20 million
observations), but I'm having difficulty working out the best way to
do so in Stata.
I have four variables, including individual ID (not unique because one
individual can work for multiple employers in the same period),
employer EIN, and quarter, and am attempting to flag events where >=80%
of the employees working at a given EIN in quarter 1 move to a single
different EIN in quarter 2 AND >=80% of the employees working at that
new EIN in quarter 2 came from the original EIN in quarter 1. In
essence, the goal
is to flag spurious transition events where an employer appears to
change but in fact only their EIN has changed. This is the same
procedure used in building the successor-predecessor file for the QWI
and described in Census technical paper TP-2006-01.
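To make the rule concrete, here is a minimal sketch of the flagging
logic on a toy dataset, written in Python only for illustration. The
function name flag_ein_changes, the (ID, EIN, quarter) tuple layout,
and the choice to count a worker as having moved whenever they appear
at a different EIN in quarter 2 (even if they also remain at the
original EIN) are assumptions of the sketch, not something taken from
the technical paper. It also holds everything in memory, so it is not
a solution at the scale of my data.

```python
from collections import Counter, defaultdict


def flag_ein_changes(records, q1, q2, threshold=0.8):
    """Return sorted (old_ein, new_ein) pairs that look like a pure EIN change.

    records: iterable of (person_id, ein, quarter) tuples (assumed layout);
    a person may appear under several EINs in the same quarter.
    """
    # (ein, quarter) -> set of person IDs employed there
    emp = defaultdict(set)
    for pid, ein, quarter in records:
        if quarter in (q1, q2):
            emp[(ein, quarter)].add(pid)

    # person -> set of EINs held in each of the two quarters
    jobs_q1 = defaultdict(set)
    jobs_q2 = defaultdict(set)
    for (ein, quarter), people in emp.items():
        target = jobs_q1 if quarter == q1 else jobs_q2
        for pid in people:
            target[pid].add(ein)

    # Count workers on each (q1 EIN -> different q2 EIN) flow.
    # Assumption: a worker counts even if they also stay at the old EIN.
    flows = Counter()
    for pid, eins1 in jobs_q1.items():
        for old in eins1:
            for new in jobs_q2.get(pid, ()):
                if old != new:
                    flows[(old, new)] += 1

    flagged = []
    for (old, new), n in flows.items():
        share_out = n / len(emp[(old, q1)])  # share of old EIN's q1 staff
        share_in = n / len(emp[(new, q2)])   # share of new EIN's q2 staff
        if share_out >= threshold and share_in >= threshold:
            flagged.append((old, new))
    return sorted(flagged)


# Toy data: all 5 workers at "A" move to "B" (which also has 1 new hire);
# only 1 of 2 workers at "X" moves to "Y".
toy = [(i, "A", 1) for i in range(5)] + [(i, "B", 2) for i in range(6)]
toy += [(10, "X", 1), (11, "X", 1), (10, "Y", 2), (11, "X", 2)]
print(flag_ein_changes(toy, q1=1, q2=2))  # [('A', 'B')]
```

Here A -> B is flagged because 5/5 of A's quarter 1 workers appear at
B and 5/6 of B's quarter 2 workers came from A, while X -> Y is not
because only 1/2 of X's workers moved.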
My initial thought was to use levelsof and loop over EINs, pulling a
local macro containing the IDs of employees for each EIN, then looping
through these employees to see where they are working in Q2 and so on.
This doesn't work as I run into the 67,784 character macro length
limit. Splitting the dataset by quarter, merging, and then using
_merge to track individual movements between firms doesn't work as my
IDs are not unique.
Does anyone have any recommendations for handling this in Stata? At
this point, I'm becoming tempted to just write a Ruby script to do
this, but would be thrilled to discover it was possible in Stata.
Thanks,
--
Alex Warofka
Research Associate | California Center for Population Research, UCLA
[email protected] | nomad.cm | @AlexWarofka