RE: st: identifying strings that differ on one or two letters


From   Nick Cox <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   RE: st: identifying strings that differ on one or two letters
Date   Fri, 19 Nov 2010 16:16:12 +0000

On -strgroup-, the pertinent information appears to be within the help file:

"strgroup is implemented as a plugin in order to minimize memory requirements and to maximize speed.  Unfortunately, plugins are specific to the hardware architecture and software framework of your computer, i.e., plugins are not cross-platform.  Define a platform by two characteristics: machine type and operating system.  Stata stores these characteristics in c(machine_type) and c(os), respectively. strgroup supports the following platforms at this time:

         Machine type                   Operating system
         PC                             Windows
         PC (64-bit x86-64)             Unix
         Macintosh                      MacOSX
         Macintosh (Intel 64-bit)       MacOSX"

The message appears to imply that your platform is not supported. 

On -soundex()-, evidently that function classifies more coarsely than you need. 
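
To illustrate how coarse that classification is, here is a minimal sketch of the standard American Soundex rules. It is written in Python purely for illustration and is an assumption about the general algorithm, not Stata's own implementation of soundex(). Applied to the four company names quoted in the message below, it returns the same code for each, because only the first letter and the first three coded consonants are kept.

```python
# Minimal sketch of standard American Soundex (illustrative; not Stata's code).
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return a 4-character Soundex code, ignoring non-letters."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate repeated codes
        code = CODES.get(ch, "")  # vowels and y have no code
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code
    return result.ljust(4, "0")  # pad short codes with zeros


names = [
    "suniti commercials ltd",
    "sunnytex investments pvt ltd",
    "sunteck realty & infrastructure ltd",
    "syndicate bank",
]
for n in names:
    print(soundex(n), n)  # each line starts with S532
```

Everything after the third coded consonant is discarded, so long company names that share an opening sound pattern collapse into one group.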

These string matching problems are very difficult to automate in the sense of replicating what a knowledgeable human would do. 

Nick 
[email protected] 

Dalhia

I tried both techniques suggested by the list (thank you Dmitry and Scott). But neither seems to work, and I am hoping you can tell me what is wrong. 

I can't seem to load "strgroup." When I try to install it on Stata 11, it gives me the following message:

"package does not contain strgroup.plugin for WIN64A platform could not load strgroup.pkg from http://fmwww.bc.edu/RePEc/bocode/s/"

I'm sure there is a simple fix, but my Stata code knowledge is very basic, and I'm not sure how to fix this problem. 

I also tried Soundex, but it identifies completely different companies as the same. For example, suniti commercials ltd, sunnytex investments pvt ltd, sunteck realty & infrastructure ltd, syndicate bank, all get the same soundex code S532. And soundex does not seem to allow any options that might limit matches to names that are very similar. 
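
One common way to limit matches to names that differ by only one or two letters is an edit-distance (Levenshtein) threshold. The following is a minimal Python sketch under that assumption; it is not what -strgroup- does internally, the function names are hypothetical, and the misspelled variant "sindicate bank" is invented for illustration. Any pairs it flags would still need checking by a person.

```python
# Minimal sketch: flag pairs of names within a small edit distance.
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def near_duplicates(names, max_distance=2):
    """Return (name1, name2, distance) for pairs that differ slightly."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            d = edit_distance(names[i], names[j])
            if 0 < d <= max_distance:
                pairs.append((names[i], names[j], d))
    return pairs


names = [
    "syndicate bank",
    "sindicate bank",          # hypothetical misspelling
    "suniti commercials ltd",
]
print(near_duplicates(names))
```

The pairwise loop compares every name with every other, so it is only practical for modest lists; for large files some blocking step (for example, comparing only names with the same first letter) would be needed.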

