Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Creating long, filledin dataset from two, year variables
Date: Mon, 7 Mar 2011 09:54:39 +0000

Thanks. This is much clearer to me.

You need to -reshape long-. To do that, you need an identifier. Despite
what you say, suppose that "Project" here might be repeated. You might
then guess that

    egen id = concat(project doctor1 doctor2)

is sufficient to identify projects uniquely. You can test any potential
identifier using -isid-.

Then it's a standard -reshape-:

    reshape long doctor startdate enddate, i(id)

You can then -sort- as you wish. You can coarsen to months, if you
like, but that's throwing away data.

Nick

On Mon, Mar 7, 2011 at 9:35 AM, Adrian Stork <storkadrian@googlemail.com> wrote:
> Hi Nick,
>
> First of all thanks for your answer! Here's a more detailed example of
> my dataset:
>
> project   | startdate1 | enddate1  | doctor1   | startdate2 | enddate2   | doctor2   | startdate3 | enddate3 | doctor3 .... s60 | e60 | d60
> ------------------------------------------------------------------------------------------------------------
> Infection   10Jan1995    03Dec2008   J.Smith     23Dec1976    12Feb2009    R.Andrews   .......
> Vaccine     15Feb1990    05Jun2007   A.Calvin    12Aug1988    13Sept2004   H.Hollen    .......
> Cancer      12Sept1987   12Dec2009   R.Jackson   14Sep1973    23Dec2006    V.Karren    ........
> Diabetes    05Jan1992    13Nov2007   P.Stevens   03Jan1981    17Aug2001    A.Calvin    ........
> Cardiol.    07Feb1977    09Mar2007   S.Devin     04Apr1985    14Jan2003    J.Smith     ........
>
> "Project" and "doctor" are strings. Start and end dates are floats.
> Sometimes the end date is missing (".") meaning that the project is
> still advised by that doctor. Each project is mentioned only once in
> my dataset, so it should in fact be my identifier; however, I actually
> want to focus on the doctors in order to see which projects each doctor
> had in each month and finally to count the number of projects he had
> in each month. As you can see, J.Smith had a project "Infection"
> from 10Jan1995 until 03Dec2008 but he also had a project "Cardiol."
> from 04Apr1985 until 14Jan2003 (similar case also for A.Calvin).
> This means that from 04Apr1985 until 10Jan1995 J.Smith had exactly one
> project ("Cardiol."), from 10Jan1995 until 14Jan2003 he had two
> projects ("Cardiol." and "Infection") and from 14Jan2003 until
> 03Dec2008 again only one project ("Infection").
> This is also why I want the date to be in a panel on a monthly basis
> that should be like:
>
> Doctor   | Date    | project1  | project2  | project3 | ...
> J.Smith    Apr1985   Cardiol.
> J.Smith    May1985   Cardiol.
> ..         ..        ..
> J.Smith    Jan1995   Cardiol     Infection
> J.Smith    Feb1995   Cardiol     Infection
> ..         ..        ..          ..
> J.Smith    Jan2003   Cardiol     Infection
> J.Smith    Feb2003   Infection
> ...        ..        ..
> J.Smith    Dec2008   Infection
> A.Calvin   ...
> A.Calvin   ..
> ..
>
> This is anything but easy. Somehow I need to bring the dates at
> least into one column.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
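
As an illustration of the -reshape long- step suggested in the reply above, here is a minimal sketch on a two-row toy version of the example data. Several things here are assumptions made only for the illustration and are not from the thread: the data are cut down to two doctor slots, the dates are typed in as strings (the hypothetical variables s1, e1, s2, e2) and converted with date(), and the name startmonth is invented. When j() is omitted, -reshape- stores the slot number in a variable called _j.

```stata
* Toy data in the wide layout from the question (two doctor slots only).
* s1 e1 s2 e2 are hypothetical string versions of the date variables.
clear
input str10 project str9 s1 str9 e1 str10 doctor1 str9 s2 str9 e2 str10 doctor2
"Infection" "10jan1995" "03dec2008" "J.Smith" "23dec1976" "12feb2009" "R.Andrews"
"Cardiol."  "07feb1977" "09mar2007" "S.Devin" "04apr1985" "14jan2003" "J.Smith"
end

* Convert the date strings to numeric Stata daily dates.
forvalues k = 1/2 {
    generate startdate`k' = date(s`k', "DMY")
    generate enddate`k'   = date(e`k', "DMY")
    format startdate`k' enddate`k' %td
}
drop s1 e1 s2 e2

* Candidate identifier; -isid- stops with an error if it is not unique.
egen id = concat(project doctor1 doctor2)
isid id

* One observation per project and doctor slot; _j holds the slot number.
reshape long doctor startdate enddate, i(id)

* Optional coarsening to months (this discards the day of the month).
generate startmonth = mofd(startdate)
format startmonth %tm

sort doctor startdate
list project doctor startdate enddate startmonth, sepby(doctor)
```

After the -reshape-, each doctor's spells sit in a single doctor column with one start date and one end date per observation, which is the "dates in one column" layout the question asks for. Building the month-by-month panel from that would be a further step not shown here.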