[open-ils-commits] r1155 - servres/trunk/conifer/syrup (gfawcett)

svn at svn.open-ils.org svn at svn.open-ils.org
Thu Dec 30 11:48:28 EST 2010


Author: gfawcett
Date: 2010-12-30 11:48:25 -0500 (Thu, 30 Dec 2010)
New Revision: 1155

Modified:
   servres/trunk/conifer/syrup/models.py
Log:
Tune item-sorting to ignore punctuation

Article titles with quotation marks in them were throwing off the sort.

Modified: servres/trunk/conifer/syrup/models.py
===================================================================
--- servres/trunk/conifer/syrup/models.py	2010-12-30 16:37:15 UTC (rev 1154)
+++ servres/trunk/conifer/syrup/models.py	2010-12-30 16:48:25 UTC (rev 1155)
@@ -307,10 +307,14 @@
         # TODO: internationalize the stopwords list.
         STOPWORDS = set(['a', 'an', 'that', 'there', 'the', 'this'])
 
+        RE_PUNCTUATION = re.compile("""[,'".:;]""")
+
         def sort_title(item):
             """First cut of a stop words routine."""
-            normal_text = [t for t in item.lower().split() if t not in STOPWORDS]
-            return  " ".join(normal_text)
+            text = item.lower()
+            text = RE_PUNCTUATION.sub('', text) # remove common punctuation
+            words = [t for t in text.split() if t not in STOPWORDS]
+            return  " ".join(words)
 
         items = self.items()
         # make a node-lookup table
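
For illustration, the revised sort key can be tried outside the model class. The sketch below lifts STOPWORDS, RE_PUNCTUATION and sort_title from the diff above to module level; the sample titles are made up for this example and are not taken from Syrup data.

```python
import re

# Copied from the diff above; hoisted to module level for this sketch only.
STOPWORDS = set(['a', 'an', 'that', 'there', 'the', 'this'])
RE_PUNCTUATION = re.compile("""[,'".:;]""")


def sort_title(item):
    """Build a sort key: lowercase, strip common punctuation, drop stopwords."""
    text = item.lower()
    text = RE_PUNCTUATION.sub('', text)  # remove common punctuation
    words = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(words)


# Hypothetical titles, chosen to show the effect of a leading quotation mark.
titles = ['"Zebra" crossings', 'The apple: a history', 'Banana republics']

# With punctuation stripped, the quoted title sorts under "z" rather than
# ahead of every title that starts with a letter.
print(sorted(titles, key=sort_title))
```

Note that punctuation is removed before the text is split into words, so a stopword followed by a colon or comma (for example "The:") is still recognized and dropped.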


