Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: AW: st: nearmrg for strings (titles)

From	Daniel Feenberg <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: AW: st: nearmrg for strings (titles)
Date	Fri, 2 Sep 2011 09:28:26 -0400 (EDT)

I spent yesterday trying to fix the program, but I seem to have left it 2 years ago in a broken state, and wasn't able to make it work. I'll keep at it over the weekend, but possibly you shouldn't wait. I am sorry to raise your hopes.


dan

On Tue, 30 Aug 2011, Hoecher, Michaela (0613xxx) wrote:

Hello,

thanks a lot for your response. I tried to test it but I think I didn't understand how to use it (I'm a beginner).

This is my example:

sample.raw
-----------------------------------
       1  "manuela Hech"
       2  "Chris Mueller"
       3  "Fanzisa Haller "
       4  "Ulrike Loerr"
-----------------------------------

universe.raw
----------------------------------
       1  "manuela Hecher"
       2  "Christian Mueller"
       3  "Fanzisa Haller "
       4  "Ulrike Loerr"
---------------------------------


When I execute imatch.exe, it doesn't create the expected merge.txt; instead it creates these empty files:
canons.txt
fort.33
fort.34
fort.35
merge.raw


The output I get:

-----------------------------------
          1            32
          2            32
          3            32
          4            32
          5            32
          6            32
          7            32
          8            32
          9 1          49
         10            32
         11            32
         12 "          34
         13 m         109
         14 a          97
         15 n         110
         16 u         117
         17 e         101
         18 l         108
         19 a          97
         20            32
         21 H          72
         22 e         101
         23 c          99
         24 h         104
         25 e         101
         26 r         114
         27 "          34
         13
         29
         10
          1 1
         30            32
         31            32
         32            32
         33            32
         34            32
         35            32
         36            32
         37            32
         38 2          50
         39            32
         40            32
         41 "          34
         42 C          67
         43 h         104
         44 r         114
         45 i         105
         46 s         115
         47 t         116
         48 i         105
         49 a          97
         50 n         110
         51            32
         52 M          77
         53 u         117
         54 e         101
         55 l         108
         56 l         108
         57 e         101
         58 r         114
         59 "          34
         13
         61
         10
          2 2
         62            32
         63            32
         64            32
         65            32
         66            32
         67            32
         68            32
         69            32
         70 3          51
         71            32
         72            32
         73 "          34
         74 F          70
         75 a          97
         76 n         110
         77 z         122
         78 i         105
         79 s         115
         80 a          97
         81            32
         82 H          72
         83 a          97
         84 l         108
         85 l         108
         86 e         101
         87 r         114
         88            32
         89 "          34
         13
         91
         10
          3 3
         92            32
         93            32
         94            32
         95            32
         96            32
         97            32
         98            32
         99            32
        100 4          52
        101            32
        102            32
        103 "          34
        104 U          85
        105 l         108
        106 r         114
        107 i         105
        108 k         107
        109 e         101
        110            32
        111 L          76
        112 o         111
        113 e         101
        114 r         114
        115 r         114
        116 "          34
        117            -1

      3 records in universe.raw
      9 words
      6 unique words

Enter number of best match and return.
Enter an empty line for no match.

          1            32
          2            32
          3            32
          4            32
          5            32
          6            32
          7            32
          8            32
          9 1          49
         10            32
         11            32
         12 "          34
         13 m         109
         14 a          97
         15 n         110
         16 u         117
         17 e         101
         18 l         108
         19 a          97
         20            32
         21 H          72
         22 e         101
         23 c          99
         24 h         104
         25 "          34
         13
         27
         10
 manuela          2.19

*  "manuela Hech"


1. "manuela Hecher"

0-1:>

-----------------------------------


Thanks, Michaela

________________________________________
From: [email protected] [[email protected]] on behalf of Daniel Feenberg [[email protected]]
Sent: Tuesday, 30 August 2011 14:13
To: [email protected]
Subject: Re: st: nearmrg for strings (titles)

On Tue, 30 Aug 2011, Hoecher, Michaela (0613xxx) wrote:

Hello!

I would like to merge two datasets (variables: title, date, publisher).
The problem is that strings (book titles) that are not exactly the same should still be merged/matched.
- Does it make sense to use nearmrg for this?
- In which way are strings merged/matched?
- What would you recommend?


Some time ago I wrote a program to help a clerical worker do this rapidly. The
program finds up to 5 likely matches, and lets the operator select the
best match. I used it once to go through a few thousand journal article
matches but it hasn't been used since. There is documentation at:

  http://www.nber.org/imatch

and I would be interested in having a few more users. It is interactive,
but it isn't a GUI program - it runs from the command line and the
operator makes selections with the keyboard.
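
As a rough illustration of the "up to 5 likely matches" idea, here is a minimal Python sketch. It is not imatch and makes no claim about how imatch scores candidates; it substitutes the character-based similarity from Python's standard difflib module. The function name candidate_matches, the 0.6 cutoff, and the normalization step are all assumptions made for this example. The names are the ones from the sample.raw and universe.raw files quoted in this thread.

```python
import difflib


def candidate_matches(query, universe, max_candidates=5, cutoff=0.6):
    """Return up to max_candidates (original_string, score) pairs, best first.

    Hypothetical helper: similarity is difflib's ratio on normalized strings,
    which is only a stand-in for whatever a real matching program uses.
    """
    def norm(s):
        # Lower-case and collapse whitespace so "Haller " equals "haller".
        return " ".join(s.lower().split())

    # Map each normalized string back to the first original that produced it.
    by_norm = {}
    for item in universe:
        by_norm.setdefault(norm(item), item)

    q = norm(query)
    close = difflib.get_close_matches(q, list(by_norm), n=max_candidates, cutoff=cutoff)
    return [
        (by_norm[key], round(difflib.SequenceMatcher(None, q, key).ratio(), 2))
        for key in close
    ]


universe = ["manuela Hecher", "Christian Mueller", "Fanzisa Haller ", "Ulrike Loerr"]
sample = ["manuela Hech", "Chris Mueller", "Fanzisa Haller ", "Ulrike Loerr"]

for name in sample:
    # An operator would pick the best candidate from this short list by hand.
    print(repr(name), "->", candidate_matches(name, universe))
```

An empty list means no candidate reached the cutoff, which corresponds to the operator entering "no match". For book titles, a word-based score may behave better than a character-based one; that choice is left open here.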

Note that most commercial code to do matching is oriented towards
address matching, and won't be particularly adept at author/title
matching.

Dan Feenberg


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

