Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: AW: st: nearmrg for strings (titles)
From: Daniel Feenberg <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: AW: st: nearmrg for strings (titles)
Date: Fri, 2 Sep 2011 09:28:26 -0400 (EDT)
I spent yesterday trying to fix the program, but it seems I left it in a
broken state two years ago, and I wasn't able to make it work. I'll keep at
it over the weekend, but you probably shouldn't wait for it. I am sorry to
have raised your hopes.
dan
On Tue, 30 Aug 2011, Hoecher, Michaela (0613xxx) wrote:
Hello,
Thanks a lot for your response. I tried to test it, but I don't think I understood how to use it (I'm a beginner).
This is my example:
sample.raw
-----------------------------------
1 "manuela Hech"
2 "Chris Mueller"
3 "Fanzisa Haller "
4 "Ulrike Loerr"
-----------------------------------
universe.raw
----------------------------------
1 "manuela Hecher"
2 "Christian Mueller"
3 "Fanzisa Haller "
4 "Ulrike Loerr"
---------------------------------
When I run imatch.exe, it doesn't create the expected merge.txt; instead it creates 4 empty files:
canons.txt
fort.33
fort.34
fort.35
merge.raw
The output I get:
-----------------------------------
1 32
2 32
3 32
4 32
5 32
6 32
7 32
8 32
9 1 49
10 32
11 32
12 " 34
13 m 109
14 a 97
15 n 110
16 u 117
17 e 101
18 l 108
19 a 97
20 32
21 H 72
22 e 101
23 c 99
24 h 104
25 e 101
26 r 114
27 " 34
28 13
29 10
1 1
30 32
31 32
32 32
33 32
34 32
35 32
36 32
37 32
38 2 50
39 32
40 32
41 " 34
42 C 67
43 h 104
44 r 114
45 i 105
46 s 115
47 t 116
48 i 105
49 a 97
50 n 110
51 32
52 M 77
53 u 117
54 e 101
55 l 108
56 l 108
57 e 101
58 r 114
59 " 34
60 13
61 10
2 2
62 32
63 32
64 32
65 32
66 32
67 32
68 32
69 32
70 3 51
71 32
72 32
73 " 34
74 F 70
75 a 97
76 n 110
77 z 122
78 i 105
79 s 115
80 a 97
81 32
82 H 72
83 a 97
84 l 108
85 l 108
86 e 101
87 r 114
88 32
89 " 34
90 13
91 10
3 3
92 32
93 32
94 32
95 32
96 32
97 32
98 32
99 32
100 4 52
101 32
102 32
103 " 34
104 U 85
105 l 108
106 r 114
107 i 105
108 k 107
109 e 101
110 32
111 L 76
112 o 111
113 e 101
114 r 114
115 r 114
116 " 34
117 -1
3 records in universe.raw
9 words
6 unique words
Enter number of best match and return.
Enter an empty line for no match.
1 32
2 32
3 32
4 32
5 32
6 32
7 32
8 32
9 1 49
10 32
11 32
12 " 34
13 m 109
14 a 97
15 n 110
16 u 117
17 e 101
18 l 108
19 a 97
20 32
21 H 72
22 e 101
23 c 99
24 h 104
25 " 34
26 13
27 10
manuela 2.19
* "manuela Hech"
1. "manuela Hecher"
0-1:>
-----------------------------------
Thanks, Michaela
________________________________________
From: [email protected] [[email protected]] on behalf of Daniel Feenberg [[email protected]]
Sent: Tuesday, 30 August 2011 14:13
To: [email protected]
Subject: Re: st: nearmrg for strings (titles)
On Tue, 30 Aug 2011, Hoecher, Michaela (0613xxx) wrote:
Hello!
I would like to merge two datasets (variables: title, date, publisher).
The problem is that strings (book titles) that are not exactly identical should be merged/matched.
- Does it make sense to use nearmrg for this?
- In which way are strings merged/matched?
- What would you recommend?
Some time ago I wrote a program to help a clerk do this rapidly. The
program finds up to 5 likely matches, and lets the operator select the
best match. I used it once to go through a few thousand journal article
matches, but it hasn't been used since. There is documentation at:
http://www.nber.org/imatch
and I would be interested in having a few more users. It is interactive,
but it isn't a GUI program - it runs from the command line and the
operator makes selections with the keyboard.
Note that most commercial code to do matching is oriented towards
address matching, and won't be particularly adept at author/title
matching.
Dan Feenberg
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*