[OPEN-ILS-DEV] Link Checker Staff Client patch

Paul Hoffman paul at flo.org
Thu Dec 1 10:37:06 EST 2011


I can see that a lot of effort has gone into this patch.  I hope you 
won't mind my butting in to comment on this -- we're not running 
Evergreen (yet), and I haven't read your code closely, but I've run into 
problems caused by link checkers in the past and have some thoughts on 
how you might rethink things in order to avoid adding to the problem.

In my opinion, it's much better to separate the code that "harvests" 
URLs to check from the code that actually performs the checking.
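
As a rough illustration of that separation, here is a minimal Python sketch of the harvesting half only.  It assumes the records are available as plain strings; the names harvest and write_urls and the URL pattern are hypothetical, not anything from the patch or from Evergreen.  The harvester just emits one URL per line, so any checker can consume the list.

```python
import re
import sys

# Deliberately simple pattern (an assumption, not a full URL grammar).
# It will keep trailing punctuation such as a final period.
URL_RE = re.compile(r"https?://[^\s<>\"]+")

def harvest(records):
    """Yield each distinct URL found in an iterable of record strings."""
    seen = set()
    for text in records:
        for url in URL_RE.findall(text):
            if url not in seen:
                seen.add(url)
                yield url

def write_urls(urls, out=sys.stdout):
    """Write URLs one per line; the checker is a separate program."""
    for url in urls:
        out.write(url + "\n")
```

The output of write_urls could be saved to a file and handed to whatever checker runs later, on its own schedule.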

Why?  First and foremost, link checking is tricky -- it's best done a 
little at a time during off-peak hours (whatever those might be!); you 
have to (or *should*) consult robots.txt once a day (or so) for each 
host; and you really should randomize the order in which you check links 
so that you don't end up unknowingly burying a server in a flurry of 
requests.  (This last point is where I've been burned in the past.)  
Also, I assume (blithely!) that there are already plenty of good link 
checkers out there -- you could even use something as simple as curl or 
wget with the proper options.
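
To make two of those points concrete, here is a small Python sketch of randomizing the check order and honoring robots.txt.  It is only a sketch under assumptions: the function names and the "example-link-checker" agent string are hypothetical, the robots.txt lines are assumed to have been fetched already (and cached per host for a day or so), and off-peak scheduling is left to something like cron.

```python
import random
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

def host_of(url):
    """Return the host part of a URL, e.g. as a key for a robots.txt cache."""
    return urlsplit(url).netloc

def shuffled(urls, seed=None):
    """Return the URLs in random order so no single host gets a burst."""
    urls = list(urls)
    random.Random(seed).shuffle(urls)
    return urls

def allowed(url, robots_lines, agent="example-link-checker"):
    """Check a URL against robots.txt lines already fetched for its host."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)
```

A checker built this way would shuffle the harvested list, skip any URL for which allowed() is false, and pause between requests.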

Finally, separating the two simply gives you a lot more flexibility 
while keeping things simpler -- the UNIX philosophy, in a nutshell.

Sorry if this comes across as harsh; it's meant as constructive 
criticism.

Paul.

On Wed, Nov 30, 2011 at 10:31:09PM -0500, Liam Whalen wrote:
> I have been working with George Duimovich to integrate the Link Checker code that he sent to the list in this message http://markmail.org/message/kgbpzgg25cm6fqcs into the Staff Client.  I have completed a working version that has been integrated with Evergreen 2.1.1.   I have included a patch that will upgrade a 2.1.1 branch with the needed changes to run the Link Checker code.  Once the patch is applied and integrated into a working Evergreen install, a new Staff Client needs to be built to provide access to the Link Checker menu item.  The changes to the Staff Client add a Link Checker menu item underneath the Cataloging menu.
> 
> I have included some basic documentation with this email that explains how to use the Link Checker.  It is very simple.  Before you can use the Link Checker code, you need to create the nrcan_contrib schema that will hold the database tables necessary for the Link Checker code to work.  The patch will create a small script (nrcan_contrib.sql) that can be used with the psql command to create the necessary tables.  The documentation goes over these details.
> 
> Because I don't have access to the IRC channel at work, and I did not want to pepper the dev list with questions every time I hit a snag, I spent a lot of time grepping the existing code base, which means I am probably doing things sub-optimally.  Any pointers or feedback about my code are greatly appreciated.  If there are any questions, please feel free to ask me; I will be happy to answer them.
> 
> I started developing this code on a 2.0.8 system, and I'm fairly certain that the LinkChecker.pm file, the fm_IDL.xml file, and the various xul/server/cat/ files can be copied into a 2.0.8 system to get the server side of the code working. However, I don't think the Staff Client modifications will port very well to 2.0.8 due to changes in the po files.  I still have a 2.0.8 client that works with my changes.  If someone would like to try this out on a 2.0.8 system, let me know and I will dig up my modified 2.0.8 Staff Client.
> 
> Regarding the po files and internationalization in general, this code needs some work.  Most of the strings are still hard-coded.  I started moving strings into Entities and the cat.properties file, but I ran into an error with the catStrings variable that I couldn't resolve, so I decided to get something working and return to the strings once I could get some feedback about catStrings.  Also, I have built the client with access to 5 different locales to make sure my code works in each of them.  It seems the changes I made to the various po files are working, but I'm not sure if I made enough changes.  I added specific entries for variables in en-CA, en-GB, and fr-CA, and I added entries to the .pot files, which I assumed control the entries for locales where there is no specific entry.  Is this a correct assumption?
> 
> Some things that could be added to this code include a method to share reports among different Staff.  A simple related table with a report id and a staff id could be used to give others access to reports they did not create themselves.  The biggest piece of work in that case would be some kind of staff picker that would allow people to choose which Staff to share a report with.  As well, I have left a column in one of the database tables to hold the URLs returned by web servers with 300-level HTTP (redirect) responses.  This column could be used to suggest possible replacement URLs.  A search and replace feature would also be a nice addition to the code.
> 
> I hope this code proves useful to someone.  Please contact me with any questions.
> 
> Liam Whalen
> NRCan Library
> lwhalen at nrcan.gc.ca <- reach me at work
> lwhalen at uwo.ca <- reach me at home
> 
> 
> DCO for NRCan Library
> ------------------------
> Developer's Certificate of Origin 1.1
> By making a contribution to this project, I certify that:
> (a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or
> (b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or
> (c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it.
> (d) I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved.
> Signed-off-by: Liam Whalen
> 


-- 
Paul Hoffman <paul at flo.org>
Systems Librarian
Fenway Libraries Online
c/o Wentworth Institute of Technology
550 Huntington Ave.
Boston, MA 02115
(617) 445-2914
(617) 442-2384 (FLO main number)

