[open-ils-commits] r1155 - servres/trunk/conifer/syrup (gfawcett)
svn at svn.open-ils.org
Thu Dec 30 11:48:28 EST 2010
Author: gfawcett
Date: 2010-12-30 11:48:25 -0500 (Thu, 30 Dec 2010)
New Revision: 1155
Modified:
servres/trunk/conifer/syrup/models.py
Log:
Tune item-sorting to ignore punctuation
Article titles with quotation marks in them were throwing off the sort.
Modified: servres/trunk/conifer/syrup/models.py
===================================================================
--- servres/trunk/conifer/syrup/models.py 2010-12-30 16:37:15 UTC (rev 1154)
+++ servres/trunk/conifer/syrup/models.py 2010-12-30 16:48:25 UTC (rev 1155)
@@ -307,10 +307,14 @@
# TODO: internationalize the stopwords list.
STOPWORDS = set(['a', 'an', 'that', 'there', 'the', 'this'])
+ RE_PUNCTUATION = re.compile("""[,'".:;]""")
+
def sort_title(item):
"""First cut of a stop words routine."""
- normal_text = [t for t in item.lower().split() if t not in STOPWORDS]
- return " ".join(normal_text)
+ text = item.lower()
+ text = RE_PUNCTUATION.sub('', text) # remove common punctuation
+ words = [t for t in text.split() if t not in STOPWORDS]
+ return " ".join(words)
items = self.items()
# make a node-lookup table
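
For readers who want to try the change outside of models.py, here is a minimal standalone sketch of the old and new sort keys. It assumes the helper can be lifted out of its enclosing method unchanged; the name sort_title_old and the sample titles are hypothetical and only serve to show the difference.

```python
import re

# Copied from the diff above.
STOPWORDS = set(['a', 'an', 'that', 'there', 'the', 'this'])
RE_PUNCTUATION = re.compile("""[,'".:;]""")

def sort_title_old(item):
    """Pre-r1155 behaviour (hypothetical name): drop stop words only."""
    normal_text = [t for t in item.lower().split() if t not in STOPWORDS]
    return " ".join(normal_text)

def sort_title(item):
    """r1155 behaviour: strip common punctuation, then drop stop words."""
    text = item.lower()
    text = RE_PUNCTUATION.sub('', text)  # remove common punctuation
    words = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(words)

# Hypothetical sample titles, not taken from the Syrup data.
titles = ['"Zebra" crossings', 'The Apple', 'An "early" bird']

# Old key: a leading quotation mark sorts ahead of every letter.
old_order = sorted(titles, key=sort_title_old)

# New key: quotation marks are ignored, so titles sort by their words.
new_order = sorted(titles, key=sort_title)
```

With the old key the quoted titles jump ahead of 'The Apple', because the double-quote character compares lower than any letter. With the new key the three titles sort on 'apple', 'early bird' and 'zebra crossings'.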