Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Relative Comparision between Observations
From
"Jens Kruk" <[email protected]>
To
[email protected]
Subject
Re: st: Relative Comparision between Observations
Date
Thu, 25 Aug 2011 16:56:16 +0200
One important note: the number of transactions per spread should be small, probably fewer than 10 for virtually every spread.
Jens
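
A minimal sketch of the "long, not wide" layout on the four-observation example quoted below, offered as an illustration rather than as anyone's actual solution. Because each spread matches only a handful of transactions, one row per matched (spread, transaction) pair stays small. The names trid, tr and j and the tempfile trans are assumptions introduced here. -cross- forms every spread-transaction pair, which is fine for four observations but not for 7 million spreads times 500 000 transactions, so on the real data the pairing would have to be done in blocks.

```stata
* Toy data from the example quoted below
* (ids 1 and 3 are spreads, ids 2 and 4 are transactions)
clear
input id isspread start end transaction
1 1 3 6 .
2 0 . . 5
3 1 2 5 .
4 0 . . 5.5
end

* Put the transactions in their own file (trid is a hypothetical name)
preserve
keep if isspread == 0
keep id transaction
rename id trid
tempfile trans
save `trans'
restore

* Keep only the spreads in memory
keep if isspread == 1
keep id start end

* Form all spread-transaction pairs, then keep those with
* start <= transaction <= end (weak inequality, as in the example)
cross using `trans'
keep if inrange(transaction, start, end)

* Number the matches within each spread and go wide: tr1, tr2, ...
bysort id (transaction trid): gen j = _n
keep id trid j
rename trid tr
reshape wide tr, i(id) j(j)
list
```

On the toy data this should leave spread 1 with tr1 = 2 and tr2 = 4, and spread 3 with tr1 = 2 and tr2 missing, which is the result asked for in the example. A -merge- on id would then bring these variables back to the original file.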
-------- Original Message --------
> Date: Thu, 25 Aug 2011 15:52:23 +0100
> From: Nick Cox <[email protected]>
> To: [email protected]
> Subject: Re: st: Relative Comparision between Observations
> With that number, "my" approach can't get its shoes on, let alone run
> down the track -- although it is I think what you asked for.
>
> I think you really need advice from people who do your kind of thing
> in Stata, but unfortunately I am not one of them. I only have two
> broad thoughts: think long not wide, and at some point -merge- will be
> your friend.
>
> Nick
>
> On Thu, Aug 25, 2011 at 3:43 PM, <[email protected]> wrote:
> > Hi Nick,
> > thanks a lot.
> > The dataset contains 500 000 transactions (in addition to the 7 million
> > spreads), but I will use your approach as a starting point for an algorithm
> > that can cope with this large dataset.
> >
> > Any suggestion to get this done quickly is still very welcome.
> >
> >
> > Best regards and thanks again,
> >
> > Jens
> >
> > -------- Original Message --------
> >> Date: Thu, 25 Aug 2011 15:20:58 +0100
> >> From: Nick Cox <[email protected]>
> >> To: [email protected]
> >> Subject: Re: st: Relative Comparision between Observations
> >
> >> For -transaction[2]- (e.g.) you can generate
> >>
> >> . gen within_2 = inrange(transaction[2], start, end) & isspread
> >>
> >> Is the number of transactions small enough to allow a variable for
> >> every one of them?
> >>
> >> If so, this is crude but should work
> >>
> >> forval i = 1/`=_N' {
> >>     if isspread[`i'] == 0 gen within_`i' = inrange(transaction[`i'], start, end) & isspread
> >> }
> >>
> >> A visceral reaction is that getting the wrong data structure is
> >> horribly easy here, but people who work with this kind of data may be
> >> able to advise constructively.
> >>
> >> Nick
> >>
> >> On Thu, Aug 25, 2011 at 2:55 PM, Jens Kruk <[email protected]> wrote:
> >> > Hi Nick,
> >> > let's say the data looks like this:
> >> >
> >> > id____isspread____start____end____transaction
> >> > 1_____1___________3________6______.
> >> > 2_____0___________.________.______5
> >> > 3_____1___________2________5______.
> >> > 4_____0___________.________.______5.5
> >> >
> >> >
> >> >
> >> > Now what I want Stata to do is to tell me (for example, by creating
> >> > additional variables that contain the ids) that ids 2 and 4 occurred
> >> > between the start and end date of observation 1 (5 and 5.5 are between
> >> > 3 and 6) and that id 2 occurred between the start and end date of
> >> > spread 3 (5 is weakly between 2 and 5).
> >> > A perfect result of the procedure would look like this:
> >> >
> >> > id____isspread____start____end____transaction____tr1___tr2
> >> > 1_____1___________3________6______.______________2_____4__
> >> > 2_____0___________.________.______5______________._____.__
> >> > 3_____1___________2________5______.______________2_____.__
> >> > 4_____0___________.________.______5.5____________._____.__
> >> >
> >> >
> >> > Best, Jens
> >> >
> >> >
> >> >
> >> >
> >> > -------- Original Message --------
> >> >> Date: Thu, 25 Aug 2011 14:22:19 +0100
> >> >> From: Nick Cox <[email protected]>
> >> >> To: [email protected]
> >> >> Subject: Re: st: Relative Comparision between Observations
> >> >
> >> >> Please show a representative chunk of your data so that precisely what
> >> >> are your variables and your observations becomes clear.
> >> >>
> >> >> Nick
> >> >>
> >> >> On Thu, Aug 25, 2011 at 2:09 PM, <[email protected]> wrote:
> >> >>
> >> >> > I want to perform the following task for a very large dataset (so
> >> >> > writing a Mata loop is probably not the solution): the dataset consists
> >> >> > of two sorts of data: spreads and transactions. Spreads have a start and
> >> >> > an end date, while transactions only have a transaction date. Now I want
> >> >> > to know whether some transaction happened between the start and end date
> >> >> > of a spread. Ideally, I would like to have, for each spread, variables
> >> >> > containing all the ids of transactions that occurred between the start
> >> >> > and end date of that spread. Is there a way to use inexact matching or
> >> >> > merging for this?
> >> >> > This should be a familiar problem; however, I do not have a clue how to
> >> >> > solve it.
> >>
> >> *
> >> * For searches and help try:
> >> * http://www.stata.com/help.cgi?search
> >> * http://www.stata.com/support/statalist/faq
> >> * http://www.ats.ucla.edu/stat/stata/
> >
> >
>