<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent changes to bugs</title><link>https://sourceforge.net/p/simmetrics/bugs/</link><description>Recent changes to bugs</description><atom:link href="https://sourceforge.net/p/simmetrics/bugs/feed.rss" rel="self"/><language>en</language><lastBuildDate>Sun, 07 Dec 2014 17:12:31 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/simmetrics/bugs/feed.rss" rel="self" type="application/rss+xml"/><item><title>#7 Loop in TokeniserWhitespace.tokenizeToArrayList</title><link>https://sourceforge.net/p/simmetrics/bugs/7/?limit=25#0a82</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;I have a fixed version here:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/mpkorstanje/simmetrics/blob/master/src/main/java/uk/ac/shef/wit/simmetrics/tokenisers/TokeniserCSVBasic.java" rel="nofollow"&gt;https://github.com/mpkorstanje/simmetrics/blob/master/src/main/java/uk/ac/shef/wit/simmetrics/tokenisers/TokeniserCSVBasic.java&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;But you'll have to build from source. I'm doing an overhaul of the whole thing.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">mpkorstanje</dc:creator><pubDate>Sun, 07 Dec 2014 17:12:31 -0000</pubDate><guid>https://sourceforge.net7e70c80474ae78a3449c85a9f21d6c6c98e3e372</guid></item><item><title>Loop in TokeniserWhitespace.tokenizeToArrayList</title><link>https://sourceforge.net/p/simmetrics/bugs/7/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Never ending loop with specific inputs in uk.ac.shef.wit.simmetrics.tokenisers.TokeniserWhitespace.tokenizeToArrayList&lt;/p&gt;
&lt;p&gt;sample program:&lt;br /&gt;
      public static void main(String[] args) {&lt;br /&gt;
        System.out.println("start");&lt;br /&gt;
        InterfaceStringMetric l_metric = new MongeElkan();&lt;br /&gt;
        String l_address_a = "POST OFFICE HOLD &amp;amp; PHONE";&lt;br /&gt;
        String l_address_b = "2665 - C   N HIGHLAND AVE";&lt;br /&gt;
        float l_score = l_metric.getSimilarity(l_address_a, l_address_b);&lt;br /&gt;
        System.out.println("end, score=" + l_score);&lt;br /&gt;
      }&lt;/p&gt;
&lt;p&gt;stack trace:&lt;br /&gt;
    "main" prio=10 tid=0x00007ff70800d800 nid=0x5451 runnable &lt;span&gt;[0x00007ff70ead2000]&lt;/span&gt;&lt;br /&gt;
       java.lang.Thread.State: RUNNABLE&lt;br /&gt;
        at uk.ac.shef.wit.simmetrics.tokenisers.TokeniserWhitespace.tokenizeToArrayList(TokeniserWhitespace.java:121)&lt;br /&gt;
        at uk.ac.shef.wit.simmetrics.similaritymetrics.MongeElkan.getSimilarity(MongeElkan.java:170)&lt;br /&gt;
        at com.mm.server.inventory.app.MergeWorklistFunctions.main(MergeWorklistFunctions.java:213)&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mitch Claborn</dc:creator><pubDate>Wed, 12 Feb 2014 17:13:58 -0000</pubDate><guid>https://sourceforge.net5a993f8cfc9c0853ed070416777b11a2ed21e2db</guid></item><item><title>String tokeniser break down method runs in deadlock</title><link>https://sourceforge.net/p/simmetrics/bugs/6/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;TokeniserWhitespace.tokenizeToArrayList(String input) runs into deadlock when the input string contains more than one whitespace character in a row.&lt;br /&gt;
I've fixed it in the source code of version 1.6, but I can't commit to your provided svn.&lt;/p&gt;
&lt;p&gt;Thanks&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Zubair Ahmed</dc:creator><pubDate>Fri, 07 Jan 2011 11:02:39 -0000</pubDate><guid>https://sourceforge.netfba415302606a50cba7eb289693e61ca322e42c6</guid></item><item><title>TagLink constructor message massive performance impact</title><link>https://sourceforge.net/p/simmetrics/bugs/5/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;The (misspelled) performance message "WARNING - this metric is not recomended for fast processing..." is causing a massive performance issue when used in a multithreaded environment.&lt;/p&gt;
&lt;p&gt;If several threads each create a TagLink algorithm instance, they contend for the System.out PrintStream, which queues the threads behind one another until the performance message has been output.&lt;/p&gt;
&lt;p&gt;I have moved (and spellchecked!) the warning to a static block so it is only output when the class is loaded, and not on every construction, and I see a tenfold performance improvement. I have also made the code stricter around the use of generics to avoid unnecessary casts.&lt;/p&gt;
&lt;p&gt;The irony is not lost on me that a message warning of poor performance is such a massive bottleneck!&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Anonymous</dc:creator><pubDate>Tue, 20 Apr 2010 13:47:48 -0000</pubDate><guid>https://sourceforge.net7848087cbc5feb21a3f56a85ad69a4e30e6f7567</guid></item><item><title>Non-breaking space causes infinite loop</title><link>https://sourceforge.net/p/simmetrics/bugs/4/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;The method tokenizeToArrayList in TokeniserWhitespace.java uses two methods to look at whitespace characters: Character.isWhitespace() and the characters in its delimiters field: "\r\n\t \u00A0". Unfortunately, isWhitespace() does not regard a non-breaking space (\u00A0) as a whitespace character, which ends up causing an infinite loop when tokenising a string.&lt;/p&gt;
&lt;p&gt;The fix is to test for a non-breaking space character when testing isWhitespace() on line 116:&lt;br /&gt;
if (Character.isWhitespace(ch) || (int)ch == 160) {&lt;br /&gt;
curPos++;&lt;br /&gt;
}&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Craig</dc:creator><pubDate>Mon, 21 Sep 2009 13:59:22 -0000</pubDate><guid>https://sourceforge.net68e648776949f60b93111547bff13945128adf9d</guid></item><item><title>Bug with character encoding</title><link>https://sourceforge.net/p/simmetrics/bugs/3/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Some of the metrics (for example BlockDistance) fail if one of the strings contains Unicode character 160 (a non-breaking space).&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Anonymous</dc:creator><pubDate>Mon, 07 Jul 2008 16:28:07 -0000</pubDate><guid>https://sourceforge.net2eeaa3d28fdfcb9b0c3cad43a5822f6e3d069c39</guid></item><item><title>Euclidean Distance always returns 0.0</title><link>https://sourceforge.net/p/simmetrics/bugs/2/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;I recently installed simmetrics v1.6 and I'm getting a strange result from the Euclidean distance function.&lt;/p&gt;
&lt;p&gt;I just downloaded the jar and the source and started playing with the SimpleExample.java file. With the Euclidean distance, it always returns 0.0 unless the two input strings are equal.&lt;/p&gt;
&lt;p&gt;abb aba return 0.0&lt;/p&gt;
&lt;p&gt;abc abd return 0.0&lt;/p&gt;
&lt;p&gt;abc abc return 1.0&lt;/p&gt;
&lt;p&gt;I tried with a lot of Strings of different sizes, and had the same result.&lt;/p&gt;
&lt;p&gt;Luis Ibáñez&lt;br /&gt;
ldibanyez@gmail.com&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Anonymous</dc:creator><pubDate>Thu, 28 Jun 2007 21:18:19 -0000</pubDate><guid>https://sourceforge.nete9e7fa17c129507913cd5b54a868eee2b2186701</guid></item><item><title>Jaro implementation</title><link>https://sourceforge.net/p/simmetrics/bugs/1/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;I’ve found two things wrong in the implementation of the Jaro algorithm, and since I am not familiar with CVS I thought I should post them here.&lt;/p&gt;
&lt;p&gt;1) In the computation of the distance, the line should be:&lt;br /&gt;
this.Distance = Math.Min(string1.Length, string2.Length) / 2 + Math.Min(string1.Length, string2.Length) % 2; so that the distance is rounded properly.&lt;/p&gt;
&lt;p&gt;2) To avoid the left vs. right distance difference that sometimes shows up, the following line has to be edited:&lt;br /&gt;
//compare char with range of characters to either side&lt;br /&gt;
for (int j = Math.Max (0, i - distance); !foundIt &amp;amp;&amp;amp; j &amp;lt;= Math.Min(i + distance, string2.Length - 1 ); j++)&lt;/p&gt;
&lt;p&gt;Keep up the good work!&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Anonymous</dc:creator><pubDate>Tue, 02 Jan 2007 16:18:14 -0000</pubDate><guid>https://sourceforge.net86b803325080a992d0b66f098c5c0c3222237957</guid></item></channel></rss>