View and Rank Trains
John Hurst
Version 1.0.0
20161223:102357
1. Overview
This document describes several Python CGI scripts for viewing
the author's railway photograph collection. The collection is
held in a database (a directory plus sub-directories) that can be
accessed from a variety of web pages (see, for example, my school
server page). There are two main scripts: one to view an
image at full resolution, and one to view the popularity
rankings of the images. The other scripts described here help
maintain the system.
The first script defined is <viewtrains.py 2.1,2.2>. This script takes a
single parameter, which is the short name of an image, and
searches the database for that image, and then renders it at
full resolution, along with data about the image retrieved from
the associated XML file. The original version of this script
generated the output HTML directly. This has been re-engineered
(20161222) to generate an XML file first, which is then
translated to HTML. This is to make the script consistent with
all the static files in the train catalogue.
The other main script is <ranktrains.py 3.1>, which delivers a page of
thumbnail images in order of popularity. Because of the large
number of images, each such page delivers only a few of the
total number of images, but buttons are included to navigate
around the complete rankings.
<tidytrains.py 4.1> reorganizes the ranking
files. It reads two files to determine current rankings. The
first file contains a single entry for each image, containing
the current voting score, and the second file contains a list of
votes cast since that score was recorded. This ranking is then
used to determine whether any images should be removed from the
system, depending on whether or not their rank is less than some
threshold.
<ranking.py 5.1,5.2> reads two files to
determine current rankings. The first file contains a single
entry for each image, containing the current voting score, and
the second file contains a list of votes cast since that score
was recorded. A third file is then constructed, containing the
updated rankings.
<rank.py 6.1> provides three procedures
(strtotime, rankdata, ranklog) for other
programs.
2. viewtrains.py Main Body
"viewtrains.py" 2.1 =
Here we do all the definitions. There is a subtlety here, as we
cannot define all the global constants until we know which
server we are running on. Hence the true globals are defined
first, then we identify the server (which also defines globals
local to the particular server OS), and then we define all the
globals relevant to this particular program.
Note that <define global constants 2.4> and <determine server environment 2.9> are also used by other
programs defined within this literate program.
"viewtrains.py" 2.2 =clientIP=os.environ["REMOTE_ADDR"]
if debug: sys.stderr.write("\nLOGFILE={0}\n".format(LOGFILE))
# start constructing the temporary xml file
rawxml=open(TEMPFILE,'w')
print "Content-Type: text/html\n\n";
<collect cgi parameters 2.12>
<collect previous rankings 2.13>
<viewtrains add header 2.14>
# This note is by way of an interim explanation. Remove after 20171224.
rawxml.write("""<p><b>Note:</b> <i>this page has been re-engineered to
be consistent with all the other railway pages. The good news
(?) is that the fault which precluded the use of Microsoft
Explorer has now gone, although whether that is because
Microsoft have fixed their software in the meantime, or the new
design avoids the Microsoft bug is not clear.
Anyway, enjoy!</i></p>\n""")
if imageisrelative:
    display(imageparm)
else:
    res=visit(top,0,imageparm)
scriptnameparm="viewtrains.py?image=%s" % (escimageparm)
<viewtrains add trailer 2.15>
rawxml.close()
<viewtrains translate XML to HTML 2.10>
The main work of this script is done in one of the two
procedures display and visit, depending upon
whether the user offered a relative image path or not.
Somewhat contradictory to normal usage, a relative path refers
to the location of the image relative to the base trains
directory, rather than the root directory, hence the
terminology. A non-relative path simply gives the image name,
and hence a full directory search must be carried out in order
to locate the image.
Note that some global constants are determined only after the
server has been identified. For this reason, the <determine server environment 2.9> must be invoked after the
other global constants are defined.
2.1 Initialisation
2.1.1 Viewtrains Imports
<viewtrains imports 2.3> =import cgi
import os
import datetime
import math
import rank
from rank import DECAY
import re
import socket
import string
from subprocess import Popen,PIPE
import sys
import time
import urllib
from xml.dom.minidom import parse, parseString, Node
import cgitb
cgitb.enable()
2.1.2 Define Constants
<define global constants 2.4> =(year, month, day, hour, minute, second, weekday, yday, DST) = \
time.localtime(time.time())
debug=0
debugFlag=0
convertXML=False
cachedHTML=False
htmlpath=''
EXTN=".xml"
<globals for macosx 2.5> =CGIBIN="http://%s/~ajh/cgi-bin" % (server)
HOMEPAGE="http://%s/~ajh" % (server)
BASEPAGE="/home/ajh/www"
LOGFILE="/home/ajh/local/%s/logs" % server
<globals for linux 2.6> =CGIBIN="http://%s/~ajh/cgi-bin" % (server)
HOMEPAGE="http://%s/~ajh" % (server)
BASEPAGE="/home/ajh/www"
LOGFILE="/home/ajh/local/%s/logs" % server
<define constants for viewtrains 2.7> =# define constants for viewtrains
tm = "%4d%02d%02d:%02d%02d" % (year, month, day, hour, minute)
now=datetime.datetime.now()
startnow=datetime.datetime.now()
tsstring=now.strftime("%Y%m%d:%H%M")
today=startnow.strftime("%Y%m%d")
todayStr=now.strftime("%d %b %Y")
TEMPFILE='/tmp/viewtrain.xml'
2.1.3 Define Regular Expression Patterns
<define regular expression patterns 2.8> =ignoredirs = re.compile('(tmp)|(units)')
lepatstr=r"(\d{4})" # year
lepatstr+=r"(\d{2})" # month
lepatstr+=r"(\d{2}):" # day
lepatstr+=r"(\d{2})" # hour
lepatstr+=r"(\d{2}) " # minute
lepatstr+=r"([0-9a-f\.:]+ )?" # ip address
lepatstr+="trains/" # match and discard base of image directories
lepatstr+="(.*)$" # image path
logentrypat=re.compile(lepatstr)
notserved=re.compile(r".*\*\*\* .* \*\*\*")
<determine server environment 2.9> =# determine which host/server environment
(system,host,release,version,machine)=os.uname()
server=''
if os.environ.has_key("SERVER_NAME"):
    server=os.environ["SERVER_NAME"]
else:
    server=socket.gethostname()
if debug:
    sys.stderr.write("server={}; host={}\n".format(server,host))
if system=='Darwin':
    <globals for macosx 2.5>
elif system=='Linux':
    <globals for linux 2.6>
else:
    print "Unknown system %s<br/>" % (system)
    sys.exit(1)
SCRIPT=CGIBIN+"/viewtrains.py"
top=BASEPAGE+"/trains"
XSLTPROC='/usr/bin/xsltproc'
if debug:
    sys.stderr.write("server={}; host={}\n".format(server,host))
2.2 Perform the XML Translation
<viewtrains translate XML to HTML 2.10> =filename=TEMPFILE
relcwd='/home/ajh/www/trains'
URL=os.getenv('HTTP_REFERER')
BASE='/home/ajh/www'
PRIVATE="/home/ajh/local/"+server
# define the parameters to the translation
filestat=os.stat(TEMPFILE)
filemod=filestat.st_mtime
dtfilemod=datetime.datetime.fromtimestamp(filemod)
dtstring=dtfilemod.strftime("%Y%m%d:%H%M")
parms=""
parms+="--param xmltime \"'%s'\" " % (dtstring)
parms+="--param htmltime \"'%s'\" " % (tsstring)
parms+="--param filename \"'%s'\" " % (filename)
parms+="--param relcwd \"'%s'\" " % (relcwd)
parms+="--param URL \"'%s'\" " % (URL)
parms+="--param today \"'%s'\" " % (todayStr)
parms+="--param host \"'%s'\" " % (host)
parms+="--param server \"'%s'\" " % (server)
parms+="--param base \"'%s'\" " % (BASE)
for key in form:
    if debug:
        sys.stderr.write('form[{0}]={1}\n'.format(key,form[key]))
    value=form[key].value
    parms+="--param "+key+" \"'%s'\" " % (value)
xslfile='/home/ajh/lib/xsl/ajhwebrail.xsl'
requestedFile=TEMPFILE
<viewtrains convert XML to HTML 2.11>
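The parameter string above is spliced into a shell command, so form keys and values supplied by the client reach the shell: a value containing a double quote, backquote or `$` is interpreted by the shell, and one containing a single quote breaks the XPath string literal. A minimal sketch of a safer construction, assuming the installed xsltproc supports `--stringparam` (which passes the value as a plain string rather than an XPath expression). `build_xsltproc_argv` and `run_xsltproc` are hypothetical helpers, not part of these scripts. Running the command without a shell and collecting both streams with `communicate()` also avoids the deadlock that can occur when stdout is read to completion while the child is blocked writing to a full stderr pipe.

```python
import subprocess

def build_xsltproc_argv(xsltproc, xslfile, xmlfile, params):
    """Return an argument list for xsltproc (hypothetical helper).

    Each parameter becomes '--stringparam NAME VALUE'; because no shell is
    involved and xsltproc treats the value as a literal string, no extra
    quoting is needed.
    """
    argv = [xsltproc, "--xinclude"]
    for name in sorted(params):          # sorted only to give a stable order
        argv += ["--stringparam", name, str(params[name])]
    argv += [xslfile, xmlfile]
    return argv

def run_xsltproc(argv):
    """Run the command without a shell; return (stdout, stderr) as text."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    return proc.communicate()            # reads both pipes concurrently
```

For example, `run_xsltproc(build_xsltproc_argv(XSLTPROC, xslfile, TEMPFILE, {'today': todayStr}))` would return the generated HTML together with any error text for the error log.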
2.2.1 Convert XML to HTML
<viewtrains convert XML to HTML 2.11> =cmd=XSLTPROC+" --xinclude %s%s %s " % (parms,xslfile,requestedFile)
#(pipein,pipeout,pipeerr)=os.popen3(cmd)
pid=Popen(cmd,shell=True,stdout=PIPE,stderr=PIPE,close_fds=True)
(pipeout,pipeerr)=(pid.stdout,pid.stderr)
if debugFlag:
    cwd=os.getcwd()
    print "<p>%s: (cwd:%s) %s</p>" % (tsstring,cwd,cmd)
    sys.stderr.write("(cwd:%s) %s: %s\n" % (cwd,tsstring,cmd))
# report the fact, and the context (debugging purposes)
if debugFlag:
    print "%s: converting %s with %s\n" % (tsstring,requestedFile,xslfile)
# process the converted HTML
convertfn="/home/ajh/www/tmp/convert.html"
if convertXML:
    try:
        htmlfile=open(convertfn,'w')
    except:
        msg="couldn't open HTML conversion file %s" % convertfn
        sys.stderr.write("%s: %s\n" % (tsstring,msg))
        convertXML=False
# check that directory exists
dirpath=os.path.dirname(htmlpath)
#if not os.path.isdir(dirpath):
#    os.makedirs(dirpath,0777)
#htmlfile2=open(htmlpath,'w')
for line in pid.stdout.readlines():
    print line,
    #htmlfile2.write(line)
    #if convertXML:
    #    htmlfile.write("%s\n" % line)
#if convertXML:
#    htmlfile.close()
#htmlfile2.close()
#os.chmod(htmlpath,0666)
errs=[]
for line in pipeerr.readlines():
    errs.append(line)
logfile=PRIVATE+'/xmlerror.log'
logfiled=open(logfile,'a')
if errs:
    logfiled.write("%s: ERROR IN REQUEST %s\n" % (tsstring,requestedFile))
    print "<HR/>\n"
    print "<H3>%s: MESSAGES GENERATED BY: %s</H3>\n" % (tsstring,requestedFile)
    print "<PRE>"
    for errline in errs:
        logfiled.write("%s: %s" % (tsstring,errline))
        errline=cgi.escape(errline)
        errline=errline.rstrip()
        print "%s: %s" % (tsstring,errline)
    print "</PRE>"
    print "<p>Please forward these details to "
    print "<a href='mailto:ajh@csse.monash.edu.au'>John Hurst</a>"
else:
    logfiled.write("%s: %s: NO ERRORS IN %s\n" % (tsstring,clientIP,requestedFile))
logfiled.close()
pipeout.close(); pipeerr.close()
2.3 Supporting Code
2.3.1 Collect cgi parameters
<collect cgi parameters 2.12> =form = cgi.FieldStorage()
if form.has_key('debug'):
    debug=form['debug'].value
if debug:
    print form
    cgi.print_environ()     # prints the environment itself; returns None
    print os.environ
ipadr=convertIPtoHex(os.getenv("REMOTE_ADDR"))
gotparms=0; dontlog=0
if form.has_key("image"):
    imageparm=form["image"].value
    gotparms=1
    # discard any ".jpg" suffix (the dot must be escaped)
    res=re.match(r'([^.]+)\.jpg$',imageparm)
    if res:
        imageparm=res.group(1)
    res=re.match('^trains/',imageparm)
    if res:
        imageisrelative=1
    else:
        imageisrelative=0
if form.has_key("disablevote"):
    dontlog=1
if not gotparms:
    rawxml.write("<H1>Error!</H1>\n")
    rawxml.write("<P>You are using a browser which has not passed in the ")
    rawxml.write("cgi parameters ")
    rawxml.write("correctly. Please use a different browser that does ")
    rawxml.write("handle parameters properly. ")
    rawxml.write("(Mozilla, Safari, Epiphany, Firefox are known to work).</P>")
    rawxml.write("<P>Alternatively, type the name of an image into the following ")
    rawxml.write("box and click submit/hit enter. Note: just the image name ")
    rawxml.write("is required, don't use any path prefix.</P>")
    rawxml.write(" <p></p>\n")
    # SCRIPT already ends in /viewtrains.py
    rawxml.write(" <form action=\"%s\" method=\"post\">\n" % (SCRIPT))
    rawxml.write(" <input type=\"submit\" value=\"submit\"/>")
    rawxml.write(" <input type=\"text\" size=\"30\" name=\"image\" value=\"%s\"/>" % (""))
    rawxml.write(" </form>")
    rawxml.close()
    rawxml=open(TEMPFILE,'r')
    for l in rawxml.readlines():
        print l
    rawxml.close()
    sys.exit(0)
escimageparm=urllib.quote(imageparm)
Use the Python library to retrieve cgi parameters. The principal
parameter is image, which is the name of an image in the train
library (the optional parameters debug and disablevote are also
recognised). Two alternatives are available for image:
- The parameter starts with "trains/", in which
case it is a relative pathname into the trains directory,
and no searching is required; or
- It does not, in which case the name must be searched
against the image library to find the required
image.
The choice between these two is flagged in the variable
imageisrelative.
Discard any ".jpg" suffix.
Escape any suspect URL parameter characters.
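The two cases above, together with the suffix removal and escaping, can be condensed into a single function. This is a sketch only: `normalise_image_param` is a hypothetical name, and the import fallback is there so that it runs under both Python 2 (as used by these scripts) and Python 3.

```python
import re

try:                                    # Python 3
    from urllib.parse import quote
except ImportError:                     # Python 2, as used by the scripts
    from urllib import quote

def normalise_image_param(value):
    """Return (name, is_relative, escaped) for a raw 'image' parameter."""
    m = re.match(r'([^.]+)\.jpg$', value)   # discard any ".jpg" suffix
    if m:
        value = m.group(1)
    is_relative = value.startswith('trains/')  # relative to the trains directory
    return value, is_relative, quote(value)    # '/' is left unescaped by default
```

A non-relative result would then be located by a directory search, as in visit.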
2.3.2 Collect Previous Rankings
All previous rankings have been reduced to a single vote
value for each image. These values are stored in a file
RANKINGS, together with the date and time of the
rankings. These votes are exponentially decayed, and used as
the base values for any additional votes cast since that
date.
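As a sketch of the arithmetic: a vote cast d days ago contributes exp(-d/DECAY), the same expression used when the VIEWINGS log is read in getrank. The sketch assumes the stored base score decays at the same rate as individual votes; `decay_days` stands in for `rank.DECAY`, whose value is not given in this section, and both function names are hypothetical.

```python
import math
from datetime import datetime

def vote_weight(vote_time, now, decay_days):
    """Weight exp(-days_since_vote / decay_days) of one vote seen at time now."""
    delta = now - vote_time
    days = delta.days + delta.seconds / 86400.0   # fractional days, as in getrank
    return math.exp(-days / decay_days)

def current_score(base_score, base_time, votes, now, decay_days):
    """Decay the stored base score, then add the weight of each newer vote.

    Assumes (hypothetically) that base scores decay like individual votes.
    """
    score = base_score * vote_weight(base_time, now, decay_days)
    for t in votes:
        score += vote_weight(t, now, decay_days)
    return score
```

With a decay constant of 30 days, for instance, a vote keeps e^-1 (about 0.37) of its weight after 30 days.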
<collect previous rankings 2.13> =
RANKINGS=LOGFILE+"/trainrank"
VIEWINGS=LOGFILE+"/trainview"
totalimages, datatime, votefactor, table = rank.rankdata(RANKINGS)
Call the rank module to collect current ranking data.
This consists of a quadruple:
- totalimages: the total number of images in the ranking data;
- datatime: the date and time at which these rankings were
  recorded (essential for computing the decay factor in rankings);
- votefactor: the decay applied since the last rankings were
  computed;
- table: a dictionary of images and their current votes, indexed
  by the canonical image name (see
  <viewtrains: getrank: get relative path to image 2.20>).
2.3.3 Headers and Trailers
<viewtrains add header 2.14> =# viewtrains add header
#rawxml.write("QUERY_STRING=",os.getenv("QUERY_STRING"),"<br/>")
#rawxml.write("USER_AGENT=",os.getenv("USER_AGENT"),"<br/>")
rawxml.write('<?xml version="1.0"?>\n')
rawxml.write('<?xml-stylesheet href="file:///home/ajh/www/lib/xsl/ajhwebrail.xsl"?>\n')
rawxml.write('<!DOCTYPE TrainPage SYSTEM "file:///home/ajh/www/trains/TrainPage.dtd">\n')
rawxml.write('<TrainPage xmlns:xi="http://www.w3.org/2001/XInclude"\n')
rawxml.write(' system="" page="index"\n')
rawxml.write(' searchbutton="Railway"\n')
rawxml.write(' search="trains/">\n')
rawxml.write(' <border>Trains:AJH</border>\n')
rawxml.write(' <TrainHeader>\n')
rawxml.write(' <Shunting system="Central"/>\n')
rawxml.write(' </TrainHeader>\n')
<viewtrains add trailer 2.15> =# viewtrains add trailer
#rawxml.write(' <includeHTML file="/trains/WebRing.html"/>\n')
rawxml.write(' <TrainTrailer/>\n')
rawxml.write('</TrainPage>\n')
2.4 Define Subroutines
2.4.1 Define Subroutines
<define subroutines 2.16> =
These procedures are of sufficient significance that they
have been moved to a separate section.
2.4.2 Define function convertIPtoHex
<define function convertIPtoHex 2.17> =# define function convertIPtoHex
def convertIPtoHex(ipadrDec):
    ipadrHex=ipadrDec
    res=re.match(r'(\d+)\.(\d+)\.(\d+)\.(\d+)',ipadrDec)
    if res:
        d1=int(res.group(1))
        d2=int(res.group(2))
        d3=int(res.group(3))
        d4=int(res.group(4))
        ipadrHex = "%02x%02x%02x%02x" % (d1,d2,d3,d4)
    return ipadrHex
The logic of this function is simple enough: extract the
integer (decimal) values of each field in an IP address, and
convert each to a two-digit hexadecimal value. Concatenate
all these into a single hex string, which is returned.
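The same conversion as a self-contained sketch for Python 2 or 3. `ip_to_hex` follows convertIPtoHex, except that its pattern is anchored at the end, so a string such as 1.2.3.4.5 is left unchanged rather than partly converted; `ip_to_hex_stdlib` shows an alternative built on `socket.inet_aton`. Both names are illustrative.

```python
import binascii
import re
import socket

def ip_to_hex(addr):
    """Dotted-quad IPv4 address -> 8 hex digits; anything else is returned unchanged."""
    m = re.match(r'(\d+)\.(\d+)\.(\d+)\.(\d+)$', addr)
    if not m:
        return addr                      # e.g. an IPv6 address is kept as-is
    return ''.join('%02x' % int(part) for part in m.groups())

def ip_to_hex_stdlib(addr):
    """Alternative using the socket module; raises socket.error on bad input."""
    return binascii.hexlify(socket.inet_aton(addr)).decode('ascii')
```

Keeping non-IPv4 addresses unchanged matches the log pattern, whose address field accepts hex digits, dots and colons.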
2.4.3 Define procedure log
<define procedure log 2.18> =def log(ipadr,image,acc,ok):
    global dontlog
    if dontlog:
        return
    access=""
    if not acc:
        refer = os.getenv("HTTP_REFERER")   # CGI spells it REFERER
        access=" *** not served *** (ref: %s)" % (refer)
    elif not ok:
        access=" *** already voted ***"
    try:
        f=open(VIEWINGS,'a')
    except IOError:
        print "Cannot open logfile %s" % (VIEWINGS)
        return                              # f does not exist, so give up
    f.write("%s %s %s%s\n" % (tm,ipadr,image,access))
    f.close()
    #print "<P>Logged %s %s</P>" % (tm,image)
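Every image access is logged, to record its popularity. The
only exception is an explicit request not to log (dontlog is
true). Accesses where the image could not be delivered, or where
the client has already voted for it today, are still logged, but
are flagged with *** not served *** or *** already voted *** so
that they can be excluded when votes are counted.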
2.4.4 Determine rank of image
<define getrank routine 2.19> =
Match the various fields in a logfile entry. These are the
year, month, day, hour and minute of the entry (in the
format YYYYMMDD:hhmm), followed by the IP address
(now stored in hexadecimal, but originally in decimal, and
before that, not at all), and then the image address,
including the base directory trains/, which is
stripped off. Note that any logging of whether the image
actually was served, or had already been voted upon, has
been discarded previously.
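To make the entry layout concrete, here is a sketch that writes a line in the form produced by log() and reads it back with the same pattern as <define regular expression patterns 2.8>. Skipping lines flagged with *** markers is an assumption standing in for the code of <viewtrains: getrank: read the VIEWINGS file 2.22>, which is not shown here; `format_entry` and `parse_entry` are hypothetical names.

```python
import re
from datetime import datetime

# Same pattern as <define regular expression patterns 2.8>, as raw strings.
LOGENTRY = re.compile(r"(\d{4})(\d{2})(\d{2}):(\d{2})(\d{2}) "
                      r"([0-9a-f\.:]+ )?trains/(.*)$")
NOTSERVED = re.compile(r".*\*\*\* .* \*\*\*")

def format_entry(when, ipadr, image, flag=""):
    """Build 'YYYYMMDD:hhmm ip image[flag]', the layout written by log()."""
    return "%s %s %s%s" % (when.strftime("%Y%m%d:%H%M"), ipadr, image, flag)

def parse_entry(line):
    """Return (datetime, ip or None, image path), or None if not a countable vote."""
    if NOTSERVED.match(line):
        return None                      # flagged entries skipped (assumption)
    m = LOGENTRY.match(line)
    if not m:
        return None
    when = datetime(*[int(g) for g in m.groups()[:5]])
    ip = m.group(6).strip() if m.group(6) else None   # old entries have no IP
    return when, ip, m.group(7)          # 'trains/' prefix already discarded
```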
<viewtrains: getrank: get relative path to image 2.20> =res=re.match(r'trains/(.*)$',imageparm)
if res:
    imageparm=res.group(1)
thisImageIsScorable=1
res=re.match('.*trains/(.*)$',path)
if res:
    path=res.group(1)
The relative path to the image is the absolute pathname for
the image, with all prefix directories up to and including
trains/ stripped off. This gives a unique canonical
form for the image. Note that the image type .jpg
has already been stripped.
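A sketch of the canonical form as a function (the name `canonical_image_path` is hypothetical). Because `.*` is greedy, everything up to the last occurrence of trains/ is removed, so a sub-directory whose name merely ends in "trains" (say oldtrains/) would also be stripped.

```python
import re

def canonical_image_path(path):
    """Strip all directories up to and including the last 'trains/'."""
    m = re.match(r'.*trains/(.*)$', path)   # greedy, as in getrank
    return m.group(1) if m else path
```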
<viewtrains: getrank: open the VIEWINGS file 2.21> =try:
    data=open(VIEWINGS)
except:
    print "Cannot open logfile %s" % (VIEWINGS)
    sys.exit(1)
<viewtrains: getrank: read the VIEWINGS file 2.22> =
<getrank: extract data from a single logfile entry 2.23> =res=logentrypat.match(l)
if res:
    logdate=l[0:8]
    year=int(res.group(1))
    month=int(res.group(2))
    day=int(res.group(3))
    hour=int(res.group(4))
    minute=int(res.group(5))
    accesstime = datetime.datetime(year,month,day,hour,minute)
    timesinceaccess=now-accesstime
    dayssinceaccess=timesinceaccess.days+timesinceaccess.seconds/86400.0
    expval=-dayssinceaccess/DECAY
    voteval=math.exp(expval)
    if debug:
        sys.stderr.write("%1.4f %2.6f %2.5f\n" % \
            (voteval,expval,dayssinceaccess))
    thisipadr=res.group(6)
    if thisipadr:
        thisipadr=thisipadr.strip()
    imagepath=res.group(7)
    if debug:
        sys.stderr.write("%s %s %s : %s %s <%s> %s\n" % \
            (year,month,day,hour,minute,thisipadr,imagepath))
    if table.has_key(imagepath):
        table[imagepath]+=voteval
    else:
        table[imagepath]=voteval
    if debug:
        sys.stderr.write("{}={}, {}={}, {}={}\n".format(\
            thisipadr,ipadr,imagepath,imageparm,logdate,today))
    if thisipadr==ipadr and imagepath==imageparm and logdate==today:
        thisImageIsScorable = 0
<viewtrains: getrank: compute new ranking of this image 2.24> =list=[]
for key in sorted(table.keys()):
    list.append((table[key],key))
    #print "%f %s" % (table[key],key)
sortlist=sorted(list,reverse=True)
totalimages=len(sortlist)
i=1; last=0.0; rank=1; thisrank=0
for (n,k) in sortlist:
    if last!=n:
        rank=i
    #print "%4d %2.6f %s<br/>" % (rank,n,k)
    i+=1
    last=n
    if k==path:
        thisrank=rank
        break
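The loop above assigns standard competition ranks: tied scores share a rank, and the next distinct score takes its position in the list, so scores 3, 2, 2, 1 rank 1, 2, 2, 4. A sketch of the same rule applied to a whole table (the name `competition_ranks` is hypothetical):

```python
def competition_ranks(scores):
    """Map each key to its rank, highest score first; ties share a rank."""
    ordered = sorted(((s, k) for k, s in scores.items()), reverse=True)
    ranks = {}
    last = None
    rank = 0
    for position, (score, key) in enumerate(ordered, 1):
        if score != last:
            rank = position              # a new score takes its list position
        ranks[key] = rank
        last = score
    return ranks
```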
2.4.5 Define the XML Dispatch Routines
<define the XML dispatch routines 2.25> =def escape(t):
    return re.sub('&','&amp;',t)
def doname(elem,path):
    if elem.firstChild:
        text=elem.firstChild.nodeValue
        pathsplit=re.match(BASEPAGE+'/(.*)/([^/]*)$',path)
        if pathsplit:
            pathbase=pathsplit.group(1)
            pathfile=pathsplit.group(2)
            dirattr=elem.getAttribute('dir')
            dir=""
            if dirattr:
                dir=dirattr
            page=""
            pageattr=elem.getAttribute('page')
            if pageattr:
                url=pathbase+'/'+pageattr+EXTN+'#'+pathfile
                page=pageattr
            else:
                url=pathbase+'/index'+EXTN+'#'+pathfile
            rawxml.write("<item><i>name:</i> <uri href=\"%s\">%s</uri> dir=%s page=%s</item>\n" % (url,text,dir,page))
        else:
            rawxml.write("<item><i>name:</i> %s</item>\n" % (text))
def dothumb(elem,path):
    pass
    #text=elem.firstChild.nodeValue
    #print "<LI><I>thumb:</I> %s</LI>" % text
def dosize(elem,path):
    bytes=pixels=""
    if debug:
        # debugging output goes to the error log, not into the page
        sys.stderr.write("%s\n" % (elem))
    attrs=elem.attributes
    for i in range(attrs.length):
        attr = attrs.item(i)
        if attr.name=='bytes':
            bytes=attr.value
        if attr.name=='pixels':
            pixels=attr.value
    rawxml.write("<item><i>size:</i> %s bytes, %s pixels</item>\n" % (bytes,pixels))
def dodate(elem,path):
    taken=catalogued=""
    attrs=elem.attributes
    for i in range(attrs.length):
        attr = attrs.item(i)
        if attr.name=='taken':
            taken=attr.value
        if attr.name=='catalogued':
            catalogued=attr.value
    rawxml.write("<item><i>date:</i> taken: %s, catalogued %s</item>\n" % (taken,catalogued))
def dodir(elem,path):
    text=elem.firstChild.nodeValue
    rawxml.write("<item><i>directory:</i> %s</item>\n" % text)
def dopage(elem,path):
    text=elem.firstChild.nodeValue
    rawxml.write("<item><i>page:</i> %s</item>\n" % text)
def dophotographer(elem,path):
    text=elem.firstChild.nodeValue
    rawxml.write("<item><i>photographer:</i> %s</item>\n" % text)
def doindex(elem,path):
    if elem.firstChild:
        text=escape(elem.firstChild.nodeValue)
        rawxml.write("<item><i>index terms:</i> %s</item>\n" % text)
def totext(node,path):
    if node.nodeType==node.TEXT_NODE:
        # escape each text node exactly once, so nested elements
        # are not escaped again at every level of the recursion
        return escape(node.nodeValue)
    elif node.nodeType==node.ELEMENT_NODE:
        text=''
        for n in node.childNodes:
            text=text+totext(n,path)
        if node.tagName=='narrower':
            return "<DIV style=\"margin-left:20;font-style:italic\">%s</DIV>" % text
        elif node.tagName=='uri':
            href='unknown'
            attributes=node.attributes
            if debug:
                sys.stderr.write("%s\n" % (attributes))
            for i in range(attributes.length):
                attr = attributes.item(i)
                #print "Looking at attribute name %s" % (attr)
                if attr.name=='href':
                    href=attr.value
            return "<uri href=\"%s\">%s</uri>" % (href,text)
        elif node.tagName=='p':
            return "<p>%s</p>" % (text)
        elif node.tagName=='b':
            return "<B>%s</B>" % (text)
        elif node.tagName=='i':
            return "<i>%s</i>" % (text)
        elif node.tagName=='em':
            return "<EM>%s</EM>" % (text)
        elif node.tagName=='dq':
            return "\"%s\"" % (text)
        elif node.tagName=='medium':
            return text
        elif node.tagName=='catalogue':
            return text
        elif node.tagName=='description':
            return text
        else:
            return "&lt;%s>%s&lt;/%s>" % (node.tagName,text,node.tagName)
    elif node.childNodes:
        text=''
        for n in node.childNodes:
            text=text+totext(n,path)
        return text
    else:
        return "**unknown node**"
def dodescription(elem,path):
    text=totext(elem,path)
    rawxml.write("<item><i>description:</i> %s</item>\n" % (text))
def domedium(elem,path):
    text=totext(elem,path)
    rawxml.write("<item><i>medium:</i> %s</item>\n" % text)
def docatalogue(elem,path):
    text=totext(elem,path)
    rawxml.write("<item><i>catalogue:</i> %s</item>\n" % text)
dispatch={'name':doname,
'thumb':dothumb,
'size':dosize,
'date':dodate,
'dir':dodir,
'page':dopage,
'photographer':dophotographer,
'index':doindex,
'medium':domedium,
'catalogue':docatalogue,
'description':dodescription}
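The dispatch dictionary maps each child element of <image> to a handler, and unknown tags fall through to a "could not dispatch" message. A reduced, self-contained sketch of the same pattern with `xml.dom.minidom`, collecting lines in a list instead of writing to rawxml; the handlers and the sample document are illustrative, with element and attribute names taken from the table above.

```python
from xml.dom.minidom import parseString, Node

def handle_name(elem, out):
    out.append("name: " + elem.firstChild.nodeValue)

def handle_date(elem, out):
    # getAttribute returns '' when the attribute is absent
    out.append("date: taken %s, catalogued %s"
               % (elem.getAttribute("taken"), elem.getAttribute("catalogued")))

HANDLERS = {"name": handle_name, "date": handle_date}

def describe(xml_text):
    """Dispatch each element child of <image> to a handler by tag name."""
    out = []
    image = parseString(xml_text).getElementsByTagName("image")[0]
    for node in image.childNodes:
        if node.nodeType != Node.ELEMENT_NODE:
            continue                     # skip whitespace text nodes
        handler = HANDLERS.get(node.tagName)
        if handler:
            handler(node, out)
        else:
            out.append("could not dispatch tag " + node.tagName)
    return out
```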
2.4.6 Define procedure to display the image
This is the procedure that does most of the real work in
displaying the full image.
<define procedure to display the image 2.26> =def display(image):
    global totalimages,imageparm
    #sys.stderr.write("display({})\n".format(image))
    path=top[0:len(top)-6]+image
    xmlfile=path+EXTN
    jpgfile=path+".jpg"
    acc=os.access(jpgfile,os.R_OK)
    pathsplit=re.match(BASEPAGE+'/(.*)/([^/]*)$',path)
    if pathsplit:
        pathbase=pathsplit.group(1)
        pathfile=pathsplit.group(2)
    if acc:
        <display can access file 2.27>
    else:
        <display cannot access file 2.28>
    #rawxml.write('<P><A href="%s">Go to %s</A></P>' % (image,image))
    # get image name
    imagename=image
    res=re.match('(.*/)?([^/]*)$',image)
    if res:
        imagename=res.group(2)
    else:
        print "Shouldn't happen!"
    (totalimages,rank,ok)=getrank(path,image)
    log(ipadr,image,acc,ok)
    startrank = 25 * ((rank-1) // 25)
    <Print ranking information 2.30>
    if not ok:
        rawxml.write('<p>You have already voted for this image today!</p>\n')
    return
<display can access file 2.27> =#rawxml.write("<P>%s,%08x</P>" % (jpgfile,acc))
rawxml.write("<img src=\"%s\"/>\n" % (image+".jpg"))
rawxml.write('<itemize>\n')
rawxml.write("<item><i>file:</i> %s </item>\n" % (xmlfile))
dom=parse(xmlfile)
elems=dom.getElementsByTagName('image').item(0).childNodes
for n in elems:
    if n.nodeType == Node.ELEMENT_NODE:
        if dispatch.has_key(n.tagName):
            dispatch[n.tagName](n,path)
        else:
            rawxml.write("Could not dispatch tag %s<br/>" % (n.tagName))
rawxml.write('</itemize>\n')
The file to display is accessible, so generate the HTML
reference to it, print the name of the XML file, then parse
it in order to display the various attributes relating to
the image (as defined in the XML file).
<display cannot access file 2.28> =if pathsplit:
    gifpath=pathbase+'/thumb/'+pathfile+'.gif'
    gifacc=os.access(BASEPAGE+'/'+gifpath,os.R_OK)
else:
    gifpath=''
    gifacc=False
#rawxml.write("<P>%s,%08x</P>" % (jpgfile,acc))
rawxml.write("<P><B>The file %s is not available</B>" % (jpgfile))
if gifacc:
    rawxml.write('<img src="'+gifpath+'"/></P>')
    rawxml.write('<P>The image has been removed for space reasons. ')
    rawxml.write('It will be retrieved overnight.</P>')
else:
    rawxml.write("</P>")
    res=re.match(".*/([^/]*)$",path)
    name="Sorry"
    if res:
        name=res.group(1)
    <Explain missing images 2.29>
<Explain missing images 2.29> =rawxml.write("<P>This may be because the file has been relocated to a ")
rawxml.write("different location. Try clicking this link to search the ")
rawxml.write("website:\n")
rawxml.write('<form action="{}" method="post" name="image">\n'.format(SCRIPT))
rawxml.write('<button type="submit" name="image" value="{}">'.format(name))
rawxml.write('<img src="{}"/></button>\n'.format(gifpath))
rawxml.write('</form>\n')
rawxml.write('If that does not work, it may be that the image has been ')
rawxml.write('removed for space reasons. Sorry.')
<Print ranking information 2.30> =rawxml.write('<form action="{}#{}" method="get">\n'.format(os.getenv('HTTP_REFERER'),imagename))
rawxml.write('<input type="hidden" name="return" value="back"/>\n')
rawxml.write('</form>\n')
rawxml.write('<form action="{0}/ranktrains.py" method="post">\n'.format(CGIBIN))
rawxml.write('This image ranks {0:d} out of {1:d}\n'.format(rank,totalimages))
rawxml.write('<button type="submit" name="startnum" value="{0:d}">{1:04d}-{2:04d}</button>\n'.format(startrank,startrank+1,startrank+25))
rawxml.write('\n')
rawxml.write('</form>\n')
2.4.7 Define Procedure to Search Directories
The procedure visit is called when we have an image
name, but no path to the image. The procedure recursively
visits all directories reachable from the initial parameter
dir, and if it finds the image, calls display
to perform the actual display of the image. It then
returns. Thumbnail directories are skipped. It is assumed
that the initial dir parameter contains the substring
trains/, indicating where the trains subdirectory
begins.
<define procedure to search directories 2.31> =def visit(dir,level,image):
    list = os.listdir(dir)
    for f in list:
        if f == 'thumb':
            continue
        path = dir + "/" + f
        if f == image+'.jpg':
            res=re.match(r'(.*)(trains/[^.]*)\.jpg',path)
            if res:
                display(res.group(2))
                return 1
        if os.path.isdir(path):
            res=visit(path,level+2,image)
            if res:
                return 1
    return 0
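For comparison, a sketch of the same search built on `os.walk` (the name `find_image` is hypothetical, and it returns the path instead of calling display). Thumbnail directories are pruned by editing `dirnames` in place, which stops a top-down `os.walk` from descending into them. Because all files in a directory are examined before any of its sub-directories, the match found may differ from visit's when the same name occurs in more than one place.

```python
import os

def find_image(top, image, skip=("thumb",)):
    """Return the path of the first '<image>.jpg' below top, or None."""
    target = image + ".jpg"
    for dirpath, dirnames, filenames in os.walk(top):
        # prune in place so os.walk never enters thumbnail directories
        dirnames[:] = sorted(d for d in dirnames if d not in skip)
        if target in filenames:
            return os.path.join(dirpath, target)
    return None
```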
3. ranktrains.py
ranktrains.py is a web script that delivers pages of
rankings for the railway database. Each page is limited in size
(currently 25 images), and there are navigation buttons to
browse forwards and backwards through the rankings. Images are
buttons that take the viewer to the full-size image (via
viewtrains), while image titles are links that take the viewer
to the home page of the image.
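The paging arithmetic shared by the two scripts can be sketched as follows, assuming 25 images per page as stated above: viewtrains computes the start of the page holding a given rank, and ranktrains derives the Prev/Next start numbers and the NNNN-NNNN button labels. The function names are hypothetical.

```python
PAGE = 25  # images per page, as stated for ranktrains.py

def page_start(rank, page=PAGE):
    """0-based start number of the page holding a 1-based rank."""
    return page * ((rank - 1) // page)

def page_label(startnum, page=PAGE):
    """Button label such as 0026-0050 for the page starting at startnum."""
    return "%04d-%04d" % (startnum + 1, startnum + page)

def prev_next(startnum, page=PAGE):
    """Start numbers for the Prev and Next buttons; None when there is no Prev page."""
    prev = startnum - page if startnum - page >= 0 else None
    return prev, startnum + page
```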
"ranktrains.py" 3.1 =
3.1 ranktrains: Define constants
This macro is also defined in other sections.
<ranktrains: define constants 3.2> =
3.2 ranktrains: Collect cgi parameters
<ranktrains: collect cgi parameters 3.3> =form = cgi.FieldStorage()
gotparms=0
numtodisplay=25; startnum=0
if form.has_key("number"):
    numtodisplay=int(form["number"].value)
    gotparms=1
if form.has_key("startnum"):
    startnum=int(form["startnum"].value)
    gotparms=1
stopnum=startnum+numtodisplay
if debug:
    print "numtodisplay = %d, startnum = %d" % (numtodisplay,startnum)
3.3 ranktrains: Collect previous ranking information
<ranktrains: collect previous ranking information 3.4> =RANKINGS=LOGFILE+"/trainrank"
VIEWINGS=LOGFILE+"/trainview"
totalimages, datatime, votefactor, table = rank.rankdata(RANKINGS)
if debug:
    print "totalimages=%s<br/>" % (totalimages)
    print "datatime=%s<br/>" % (datatime)
    print "votefactor=%s<br/>" % (votefactor)
    print "table=%s<br/>" % (table)
3.4 ranktrains: Update Rankings with Latest Log Info
<ranktrains: update rankings with latest log info 3.5> =# ranktrains: update rankings with latest log info
<define regular expression patterns 2.8>
totalimages, logcount, ranktime, sorttime, sortlist = \
    rank.ranklog(VIEWINGS,table,notserved)
if debug:
    print "totalimages=%s<br/>" % (totalimages)
    print "logcount=%s<br/>" % (logcount)
    print "ranktime=%s<br/>" % (ranktime)
    print "sorttime=%s<br/>" % (sorttime)
    print "sortlist=%s<br/>" % (sortlist)
3.5 rankings: Print Forward and Backward Buttons
<rankings: print forward and backward buttons 3.6> =rawxml.write("<form action=\""+CGIBIN+"/ranktrains.py\" method=\"post\">\n")
rawxml.write("<table align=\"center\"><tr>\n")
#rawxml.write("<input type=\"hidden\" name=\"number\" value=\"%d\"/>" % (numtodisplay))
if startnum-numtodisplay>=0:
    rawxml.write("<td><button type=\"submit\" name=\"startnum\" ")
    rawxml.write("value=\"%d\">Prev (%d-%d)</button></td>" % \
        (startnum-numtodisplay,startnum-numtodisplay+1,startnum))
else:
    rawxml.write("<td><button type=\"submit\">(Prev)</button></td>")
rawxml.write("<td><button type=\"submit\" name=\"startnum\" ")
rawxml.write("value=\"%d\">Next (%d-%d)</button></td>" % \
(startnum+numtodisplay,startnum+numtodisplay+1,startnum+2*numtodisplay))
rawxml.write("</tr></table>\n")
rawxml.write("</form>\n")
3.6 ranktrains: Generate Image Rankings
<ranktrains: generate image rankings 3.7> =rawxml.write("<title>Image Rankings %d - %d</title>\n" % (startnum+1,stopnum))
rawxml.write("<form action=\""+CGIBIN+"/viewtrains.py\" ")
rawxml.write("method=\"post\" name=\"image\"><table align=\"center\">\n")
perline=5; posonline=0
i=1; last=0.0; rank=1
for (n,k) in sortlist:
    #rawxml.write("%1.4f %s <br/>" % (n,k))
    #(n,k) = sortlist[i]
    if last!=n:
        rank=i
    if i>startnum:
        if posonline==0:
            rawxml.write("<tr>\n")
        res=re.match("(.*)/([^/]*)",k)
        path="" ; image=""
        if res:
            path=res.group(1)
            image=res.group(2)
        else:
            image=k
        caption=k
        if len(caption)>17:
            caption=path+"<br/>"+image
        try:
            xmlfname=BASEPAGE+"/trains/"+path+"/"+image+".xml"
            xmlfile = open(xmlfname)
            dom = parse(xmlfile)
            nameelem=dom.getElementsByTagName('name').item(0)
            pageattr=nameelem.getAttributeNode('page')
            if pageattr:
                page=pageattr.nodeValue
            else:
                page='index'
            xmlfile.close()
            rawxml.write("<td align=\"center\">\n")
            rawxml.write("<table><tr><td>%4d</td>\n" % (rank))
            rawxml.write(" <td align=\"right\">%2.6f</td></tr>\n" % (n))
            rawxml.write(" <tr><td colspan=\"2\" align=\"center\">")
            #rawxml.write(' <input type="hidden" name="disablevote" value="1"/>')
            rawxml.write(" <button type=\"submit\" name=\"image\" value=\"trains/%s\">\n" % (k))
            rawxml.write(" <img align=\"center\" alt=\"click me for full image\" ")
            rawxml.write(" src=\"trains/%s/thumb/%s.gif\"/>\n" % (path,image))
            rawxml.write(" </button></td></tr><tr><td colspan=\"2\" align=\"center\">")
            rawxml.write(" <uri href=\"trains/%s/%s.xml#%s\">%s</uri></td></tr></table></td>" % \
                (path,page,image,caption))
        except IOError:
            rawxml.write(" <td align=\"center\">")
            rawxml.write(" <table><tr><td>%4d</td>" % (rank))
            rawxml.write(" <td align=\"right\">%2.6f</td></tr>" % (n))
            rawxml.write(" <tr><td colspan=\"2\" align=\"center\">")
            rawxml.write("Cannot access<br/>")
            rawxml.write("%s at %s</td></tr></table></td>" % (caption,xmlfname))
            pass
        posonline+=1
        if posonline==perline:
            rawxml.write("</tr>\n")
            posonline=0
    i+=1
    if i>stopnum:
        break
    last=n
rawxml.write("</table></form>\n")
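One caveat with the loop above: a row is opened when posonline is 0 and closed only after perline cells, so if the rankings run out part-way through a row the final </tr> is never written. A sketch of a row builder that always closes what it opens (the name `grid_rows` is hypothetical; cells are assumed to be pre-rendered <td> strings):

```python
def grid_rows(cells, perline=5):
    """Wrap pre-rendered <td> cells in <tr> rows of at most perline cells.

    Every row that is opened is also closed, even when the last row is short.
    """
    rows = []
    for i in range(0, len(cells), perline):
        rows.append("<tr>" + "".join(cells[i:i + perline]) + "</tr>")
    return "\n".join(rows)
```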
3.7 ranktrains: Print rankings table
<ranktrains: print rankings table 3.8> =rawxml.write("<title>Rankings Images Table</title>\n\n")
rawxml.write("<form action=\""+CGIBIN+"/ranktrains.py\" method=\"post\">\n\n")
rawxml.write("<table align=\"center\"><tr>\n\n")
#rawxml.write("<input type=\"hidden\" name=\"number\" value=\"%d\"/>\n" % (numtodisplay))
linecount=0
# compute score from first image
(score,key)=sortlist[0]
rawxml.write("<td align=\"left\">%7.4f</td>\n" % (score))
nimages=len(sortlist)-1
for i in range(0,totalimages,numtodisplay):
    rawxml.write("<td align=\"center\">\n")
    rawxml.write("<button type=\"submit\" name=\"startnum\" \n")
    rawxml.write("value=\"%d\">%04d-%04d</button></td>\n" % (i,i+1,i+numtodisplay))
    linecount+=1
    # compute score from image number i+numtodisplay
    j=i+numtodisplay-1
    if j > nimages: j = nimages
    (score,key)=sortlist[j]
    if linecount % 6 == 2:
        rawxml.write("<td align=\"left\">%7.4f</td>\n" % (score))
    if linecount % 6 == 4:
        rawxml.write("<td align=\"left\">%7.4f</td>\n" % (score))
    if linecount % 6 == 0:
        rawxml.write("<td align=\"left\">%7.4f</td>\n" % (score))
        rawxml.write("</tr><tr>\n\n")
        # compute score from image number i+numtodisplay+1
        j=i+numtodisplay
        if j > nimages: j = nimages
        (score,key)=sortlist[j]
        rawxml.write("<td align=\"left\">%7.4f</td>\n" % (score))
# compute score from last image
(score,key)=sortlist[len(sortlist)-1]
rawxml.write("<td align=\"left\">%7.4f</td>\n" % (score))
rawxml.write("</tr></table>\n\n")
rawxml.write("</form>\n\n")
rawxml.write("""
<p>
A full explanation of how these rankings are computed can be found
on the <uri href=\""""+HOMEPAGE+"""/trains/pops/index"""+EXTN+"""">Vox Pops Page</uri>
</p>
\n""")
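The navigation loop above walks through the sorted list in blocks of numtodisplay images and labels each button with a 1-based range. The sketch below pulls that arithmetic out into a standalone function. The name page_buttons is hypothetical. It also differs from the loop above in one respect: it clamps the last label to the number of images, whereas the loop as written labels a final partial page with the full block size.

```python
def page_buttons(totalimages, numtodisplay):
    """Return (startnum, label) pairs for the ranking navigation buttons.

    startnum is the 0-based index a button submits, and the label shows the
    1-based range of images on that page. Hypothetical helper: unlike the
    loop in chunk 3.8, the last label is clamped to totalimages.
    """
    if numtodisplay <= 0:
        raise ValueError("numtodisplay must be positive")
    buttons = []
    for i in range(0, totalimages, numtodisplay):
        last = min(i + numtodisplay, totalimages)
        buttons.append((i, "%04d-%04d" % (i + 1, last)))
    return buttons
```

For example, page_buttons(45, 20) returns [(0, '0001-0020'), (20, '0021-0040'), (40, '0041-0045')].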
3.8 ranktrains: Print Ranking Analysis
<ranktrains: print ranking analysis 3.9> =rawxml.write("<title>Ranking Analysis Data</title>\n\n")
rawxml.write('<p>Time analyses based on wallclock times</p>\n')
rawxml.write('votefactor=%f ' % (votefactor) + \
'(this is the decay since last rankings were computed)<br/>\n')
rawxml.write("Ranking data input took %d.%06d seconds for %d images<br/>\n" % \
(datatime.seconds,datatime.microseconds,totalimages))
rawxml.write("Logfile input took %d.%06d seconds for %d entries<br/>\n" % \
(datatime.seconds,datatime.microseconds,logcount))
rawxml.write("Input analysis and sorting took %d.%06d seconds<br/>\n" % \
(sorttime.seconds,sorttime.microseconds))
rawxml.write("Data ranking took %d.%06d seconds<br/>\n" % \
(ranktime.seconds,ranktime.microseconds))
3.8.1 Print Header of HTML Page
This code prints the header part of the HTML page.
<print header of html page 3.10> =print """
<html>
<head>
print the starting lines, then ...
<print header of html page 3.11> = <title>""",
print htmltitle,
print """</title>
print the page title, including the "MONASH
UNIVERSITY", "INFORMATION TECHNOLOGY" and
"Clayton School" parts.
<print header of html page 3.12> = <base href=\""""+HOMEPAGE+"""/"/>
<link rel="stylesheet" HREF="styles/monash.css" type="text/css" />
</head>
<body>
<div id="global-header">
<div id="global-images">
<table width="100%" bgcolor="white">
<tr width="100%">
<td align="left">
<table>
<tr>
<td align="left">
<a href="http://www.monash.edu.au">
<span style="font-family:sans-serif;font-size:+160%;font-weight:bold;
background-color:#ffffff;color:black">
MONASH UNIVERSITY
</span>
</a>
</td>
</tr>
<tr>
<td align="left">
<a href="http://www.infotech.monash.edu.au" COLOR="black">
<span style="font-family:sans-serif;font-size:+140%;font-weight:bold;
background-color:#ffffff;color:black">INFORMATION TECHNOLOGY</span>
</a>
</td>
</tr>
<tr>
<td align="left">
<a href="http://www.csse.monash.edu.au" COLOR="black">
<span style="font-family:sans-serif;font-size:+120%;
font-weight:bold;background-color:#ffffff;color:black">
Clayton School</span>
</a>
</td>
</tr>
</table>
</td>
<td align="right">
""",
Generate the train image in the page banner. It is chosen
from the list of available images in web/images/banner (each
of which must be added to this list manually) by indexing the
list with the current microsecond value modulo the length of
the list, that is, pseudo-randomly.
<print header of html page 3.13> =rightnow=datetime.datetime.now()
locos=["R707-1.jpg", "6029-32.jpg", "4472=R761-11.jpg",\
"621-16.jpg", "F255-1.jpg", "5910-4.jpg",\
"3813-5.jpg", "5112+5910-4a.jpg","W933-8.jpg",\
"520-6.jpg", "3203-3.jpg", "3642-1.jpg",\
"Rx207-15.jpg", "38=D3+K=R-3.jpg", "D3-639+R707-1.png",\
"J549-21.jpg", "5367-2.jpg", "W22-1.png",\
"3813-5a.jpg", "tgv-13.jpg", "S300-1.jpg",\
"4472=R761-11.jpg","6029-6.jpg", "Callington-1.jpg",\
"NYCHudson-1.jpg", "R711-3.jpg", "MerddinEmrys-2.jpg",\
"3801+3813+3820-2.jpg"]
loco=locos[rightnow.microsecond % len(locos)]
print "<img align=\"right\" SRC=\"images/banner/" + loco + \
"\" height=\"79\" alt=\"steam loco " + loco + "\"/>",
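The selection above can be isolated into a small function, which makes it easy to test with a fixed timestamp. This is only a sketch: pick_banner and selection_weights are hypothetical names and are not part of the script. Note that "4472=R761-11.jpg" appears twice in locos. Unless that duplication is deliberate, the image is shown about twice as often as the others, and selection_weights makes duplicates like this easy to spot.

```python
import datetime

def pick_banner(images, now=None):
    """Return one banner image name, indexed by the microsecond field of a
    timestamp (the same approach as chunk 3.13)."""
    if not images:
        raise ValueError("no banner images configured")
    if now is None:
        now = datetime.datetime.now()
    return images[now.microsecond % len(images)]

def selection_weights(images):
    """Count how often each name appears; a name listed twice is chosen
    roughly twice as often as a name listed once."""
    counts = {}
    for name in images:
        counts[name] = counts.get(name, 0) + 1
    return counts
```

The standard library's random.choice(locos) is the conventional alternative to indexing by the clock.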
Now complete the final part of the header. A warning message
about using Internet Explorer is also added, as that
application is not W3C compliant.
<print header of html page 3.14> =print """
</td>
</tr>
</table>
</div>
<div class="spacer"></div>
<table style="background-color:#339933;border-top:1px solid #000000"
width="100%" id="global-nav" summary="Layout for site-wide navigation">
<tr>
<td valign="center">
<div style="font-size:+140%;margin-left:1em">
<a HREF="index"""+EXTN+"""">JOHN HURST</a>
Warning: This page works with any browser EXCEPT Internet Explorer!
<xsl:copy-of select="$GlobalNavBar"/>
</div>
</td>
</tr>
</table>
<!-- U T I L I T Y N A V I G A T I O N -->
<table style="background-color:#3c6;color:#fff;vertical-align:middle;
text-align: right" width="100%" id="global-utils"
summary="Layout for utility navigation">
<tr>
<td align="left">
<a HREF="position/index"""+EXTN+"""">Position</a> |
<a HREF="research/index"""+EXTN+"""">Research</a> |
<a HREF="teaching/index"""+EXTN+"""">Teaching</a> |
<a HREF="admin/index"""+EXTN+"""">Administration</a> |
<a HREF="professional/index"""+EXTN+"""">Professional</a> |
<a HREF="personal/index"""+EXTN+"""">Personal</a> |
<a HREF="trains/index"""+EXTN+"""">Railways</a>
|
<a HREF=\""""+CGIBIN+"""/train-map.py">Site map</a>
</td>
</tr>
</table>
<TABLE WIDTH="100%" BGCOLOR="#fff" CELLSPACING="0" CELLPADDING="0">
<TR><TD COLOR="#ffffff" BGCOLOR="#33cc66" COLSPAN="40" ALIGN="center">
<B>Central Shunting Yard</B></TD></TR>
<TR>
<TD ALIGN="center" BGCOLOR="silver" COLSPAN="3">Main</TD>
<TD ALIGN="center" BGCOLOR="lightgreen" COLSPAN="7">Australia</TD>
<TD ALIGN="center" BGCOLOR="lightpink" COLSPAN="3">Miscellaneous</TD>
<TD ALIGN="center" BGCOLOR="lightblue" COLSPAN="5">Rest of World</TD>
</TR><TR>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/./index"""+EXTN+"""">
<IMG SRC="trains/./trains.gif" ALT="Main Railway Page"
HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/new/index"""+EXTN+"""">
<IMG SRC="trains/new/trains.gif" ALT="The Latest Additions"
HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/pops/index"""+EXTN+"""">
<IMG SRC="trains/pops/thumb/ajh.gif" ALT="The Most Popular Images"
HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white">
<A HREF="trains/anr/index"""+EXTN+"""">
<IMG SRC="trains/anr/trains.gif" ALT="Australian National Railways"
HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/nsw/index"""+EXTN+"""">
<IMG SRC="trains/nsw/trains.gif" ALT="New South Wales Railways"
HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/qld/index"""+EXTN+"""">
<IMG SRC="trains/qld/trains.gif" ALT="Queensland Railways" HEIGHT="30"
WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/sa/index"""+EXTN+"""">
<IMG SRC="trains/sa/trains.gif" ALT="South Australian Railways"
HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/tas/index"""+EXTN+"""">
<IMG SRC="trains/tas/trains.gif" ALT="Tasmanian Railways" HEIGHT="30"
WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/vic/index"""+EXTN+"""">
<IMG SRC="trains/vic/trains.gif" ALT="Victorian Railways" HEIGHT="30"
WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/wa/index"""+EXTN+"""">
<IMG SRC="trains/wa/trains.gif" ALT="West Australian Railways"
HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/misc/index"""+EXTN+"""">
<IMG SRC="trains/misc/trains.gif" ALT="Miscellaneous Railway Items"
HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A
HREF="trains/private/index"""+EXTN+"""">
<IMG SRC="trains/private/trains.gif" ALT="Private Railways"
HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A
HREF="trains/tourist/index"""+EXTN+"""">
<IMG SRC="trains/tourist/trains.gif" ALT="Tourist and Preservation"
HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/./rest"""+EXTN+"""">
<IMG SRC="trains/./thumb/rest.gif" ALT="African/Asian Railways"
HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/br/index"""+EXTN+"""">
<IMG SRC="trains/br/trains.gif" ALT="British Railways" HEIGHT="30"
WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/europe/index"""+EXTN+"""">
<IMG SRC="trains/europe/trains.gif" ALT="Continental European Railways"
HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/nz/index"""+EXTN+"""">
<IMG SRC="trains/nz/trains.gif" ALT="New Zealand Railways"
HEIGHT="30" WIDTH="30"></A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/usa/index"""+EXTN+"""">
<IMG SRC="trains/usa/trains.gif" ALT="North American Railways"
HEIGHT="30" WIDTH="30"></A></TD>
</TR><TR>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/./index"""+EXTN+"""">Central</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/new/index"""+EXTN+"""">Latest</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/pops/index"""+EXTN+"""">VoxPop</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/anr/index"""+EXTN+"""">ANR</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/nsw/index"""+EXTN+"""">NSW</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/qld/index"""+EXTN+"""">QLD</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/sa/index"""+EXTN+"""">SA</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/tas/index"""+EXTN+"""">TAS</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/vic/index"""+EXTN+"""">VIC</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/wa/index"""+EXTN+"""">WA</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/misc/index"""+EXTN+"""">Misc</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/private/index"""+EXTN+"""">Private</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/tourist/index"""+EXTN+"""">Tourist</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/./rest"""+EXTN+"""">Rest</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/br/index"""+EXTN+"""">BR</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/europe/index"""+EXTN+"""">Europe</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/nz/index"""+EXTN+"""">NZ</A></TD>
<TD ALIGN="center" BGCOLOR="white"><A HREF="trains/usa/index"""+EXTN+"""">US&</A></TD>
</TR>
</TABLE></div>"""
3.8.2 Print Trailer of HTML Page
<print trailer of html page 3.15> =print """
<HR SIZE="4" NOSHADE="on" COLOR="#339"/>
<TABLE width="100%" align="center" border="0" cellspacing="0" cellpadding="0">
<TR><TD height="10"></TD></TR>
<TR>
<TD>This page maintained by John Hurst. <BR/>
Copyright
<A HREF="http://www.adm.monash.edu.au/unisec/pol/itec12.html">
Monash University Acceptable Use Policy
</A>
</TD>
<TD><xsl:copy-of select="$GlobalCounter"/></TD>
<TD ALIGN="right" ROWSPAN="2">
<IMG VALIGN="bottom" SRC="images/MadeOnMac.gif"/>
<A HREF="index"""+EXTN+"""">
<IMG ALIGN="center" height="50" width="33"
SRC="family/john9808.gif"
alt="My Photo"/></A>
<A HREF="trains/index"""+EXTN+"""">
<IMG ALIGN="center" height="50" width="33"
SRC="images/train.gif"
alt="Train Photo"/></A>
</TD>
</TR>
<TR>
<TD ALIGN="left" valign="bottom" COLSPAN="3">
<SPAN STYLE="font-size:80%">
<P>
Dynamically generated at """+\
tm+"""\
<BR/>
Maintainer use only; not generally accessible:
<!-- **** NB! The "localhost" in the following MUST
be split to avoid being converted for
other server contexts -->"""
ind=' '
print ind+'<A href="http://local'+'host/~ajh/cgi-bin/'+scriptnameparm+'">Local Server</A>'
print ind+"<xsl:text>\n</xsl:text>"
print ind+'<A href="http://www.ajh.id.au/~ajh/cgi-bin/'+scriptnameparm+'">Home Server</A>'
print ind+"<xsl:text>\n</xsl:text>"
print ind+'<A href="http://www.ajhurst.org/~ajh/cgi-bin/'+scriptnameparm+'">Hurst Server</A>'
print ind+"<xsl:text>\n</xsl:text>"
print ind+'<A href="http://dimboola.infotech.monash.edu.au/~ajh/cgi-bin/'+scriptnameparm+'">'
print ind+'Work Server</A>'
print ind+"<xsl:text>\n</xsl:text>"
print ind+'<A href="http://www.csse.monash.edu.au/cgi-bin/cgiwrap/ajh/'+scriptnameparm+'">'
print ind+'CSSE Server</A>'
print """
<xsl:text>
</xsl:text>
</P>
</SPAN>
</TD>
</TR>
</TABLE>
</body>
</html>
"""
4. tidytrains.py
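tidytrains.py reads two files to determine the current rankings.
The first file contains a single entry for each image, holding
its current voting score, and the second contains a list of the
votes cast since then. The resulting ranking is then used to
decide whether any images should be removed from the system,
according to whether their score falls below a threshold
(THRESHOLD). In the current version the removal itself is
disabled: the script only reports what it would remove. A second
pass then uses rsync to re-copy the images that appear in the
view log for the selected period from the master directory to
the server.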
"tidytrains.py" 4.1 =#!/usr/bin/python
import datetime
import getopt
import os
import rank
import re
import shutil
import socket
import sys
import time
<define global constants 2.4>
<determine server environment 2.9>
THRESHOLD=0.000170
MASTERDIR=BASEPAGE+'/trains'
WEBDIR=BASEPAGE
WEBLOG=LOGFILE
WEBPAGE=WEBDIR+'/trains/'
WEBRANK=WEBLOG+'/trainrank'
WEBVIEW=WEBLOG+'/trainview'
opts, args = getopt.getopt(sys.argv[1:],"s:f:")
#CURRENT=WEBRANK
#LOGFILE=WEBVIEW
CURRENT=WEBLOG+'/trainrank'
LOGFILE=WEBLOG+'/trainview'
LISTFILE=WEBLOG+'/trainlist'
command="find %s -name \*.jpg >%s" % (MASTERDIR,LISTFILE)
print command
status=os.system(command)
if status:
    print "Urrk 2! %d" % (status)
    sys.exit(status)
available=[]
avfile=open(LISTFILE)
for l in avfile.readlines():
    l=l.strip()
    l=l[11:] # strip off www/trains/
    #print ">%s<" % l
    available.append(l)
avfile.close()
#os.remove(LISTFILE)
oneday=datetime.timedelta(days=1)
yesterday=datetime.datetime.now()-oneday
starttime=yesterday.strftime("%Y%m%d:000000")
finishtime="20201231:235959"
for opt,val in opts:
    print "%s %s" % (opt,val)
    if opt=='-s':
        starttime=val
    elif opt=='-f':
        finishtime=val
    else:
        print "Unknown option %s" % (opt)
print "starttime = %s" % (starttime)
totalimages, datatime, votefactor, table = rank.rankdata(CURRENT)
ignorepat=re.compile(".*\*\*\* [^*]* \*\*\*")
totalimages,logcount, ranktime,sorttime,sortlist = \
    rank.ranklog(LOGFILE,table,ignorepat,starttime,finishtime)
for (val,key) in sortlist:
    if val<THRESHOLD:
        #path=WEBPAGE+key+'.jpg'
        path=SCSSEPAGE+key+'.jpg'
        if key in available:
            print 'removing %s (not really)' % (path)
            #command='ssh -1 nexus.csse.monash.edu.au "rm %s"' % (path)
            #status=os.system(command)
            #os.remove(path)
        else:
            print '%s already removed' % (path)
    pass
table = {}
ignorepat=re.compile(r'.*((-[0-9a-z]+)|(already voted \*\*\*))$')
totalimages,logcount, ranktime,sorttime,sortlist = \
    rank.ranklog(LOGFILE,table,ignorepat,starttime,finishtime)
pathpat=re.compile(r'([^ ]+) ')
for (val,key) in sortlist:
    #print "%s %7.6f" % (key,val)
    res=pathpat.match(key)
    if res:
        path=res.group(1)
        webpath = MASTERDIR+path+'.jpg'
        #sitepath = WEBPAGE+path+'.jpg'
        sitepath = SCSSEPAGE+path+'.jpg'
        #webacc=os.access(webpath,os.F_OK)
        #siteacc=os.access(sitepath,os.F_OK)
        #if webacc:
        #    if siteacc:
        #        pass
        #    else:
        #        print 'cp -p %s %s' % (webpath,sitepath)
        #        shutil.copy2(webpath,sitepath)
        #else:
        #    if siteacc:
        #        print 'Anomalous: %s exists but %s does not' % (sitepath,webpath)
        #    else:
        #        print 'Cannot recover %s as there is no master copy' % (sitepath)
        # os.system runs the command through /bin/sh. Where that shell is not
        # bash (e.g. dash), "&>" is parsed as "&" followed by ">", which would
        # run rsync in the background and always report success, so use the
        # portable ">/dev/null 2>&1" instead.
        command='/usr/local/bin/rsync -auv %s nexus.csse.monash.edu.au:%s >/dev/null 2>&1' % \
            (webpath,sitepath)
        status=os.system(command)
        if not status:
            print 'recovered %s' % (sitepath)
        else:
            print 'Could NOT recover %s, status:%d' % (sitepath,status)
sys.exit(0)
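The removal test in the first loop of tidytrains.py can be sketched as a pure function that only classifies images and leaves any actual deletion to the caller. classify_low_scorers is a hypothetical name. One assumption should be stated explicitly. The entries read from find keep their .jpg suffix, whereas the ranking keys do not (the script appends .jpg itself when it builds path). The sketch therefore assumes that both sides have been put into the same form before they are compared.

```python
def classify_low_scorers(sortlist, threshold, available):
    """Split images scoring below threshold into (to_remove, already_gone).

    sortlist  -- (score, key) pairs, e.g. as returned by rank.ranklog
    available -- image keys known to exist, assumed to be in the same
                 form as the keys in sortlist (no ".jpg" suffix)
    """
    present = set(available)  # set membership is O(1); a list scan is O(n)
    to_remove, already_gone = [], []
    for score, key in sortlist:
        if score < threshold:
            if key in present:
                to_remove.append(key)
            else:
                already_gone.append(key)
    return to_remove, already_gone
```

With THRESHOLD = 0.000170, the pairs (0.5, 'vic/R707-1'), (0.0001, 'nsw/3801-2') and (0.00005, 'sa/520-6') tested against the available keys ['vic/R707-1', 'nsw/3801-2'] give (['nsw/3801-2'], ['sa/520-6']). These paths are invented examples. Using a set also avoids a linear scan of the available list for every ranked image.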
5. ranking.py
This program brings the rankings data up to date. Normal
browsing leaves the trainrank file untouched: each image view
simply logs an extra entry in the trainview file. Running
ranking.py analyses the votes in trainview, adjusts the
rankings from trainrank accordingly, and outputs a new
version of trainrank.
It was designed this way to avoid race conditions if two or more
users vote for railway images simultaneously, and to avoid
conflicting concurrent updates to the rankings file. In practice,
I am not sure whether this is warranted, or indeed whether it
avoids the problem. More thought is needed.
Currently, the first two parameters to ranking.py are the
current trainrank and trainview files (in that order),
and the third parameter is the new file that is to take the
place of trainrank. The optional -s and -f flags restrict
the votes counted to those logged between a start and a finish
time, each given as YYYYMMDD:HHMMSS (defaults 20050101:000000
and 20201231:235959). It is the user's responsibility to check
that these files have been processed correctly, then to move the
new file in place of the old one and to empty the trainview
votes file. Perhaps this will be done automatically sometime in
the future?
"ranking.py" 5.1 =#!/usr/bin/python
# read two files to determine current rankings. The first file
# contains a single entry for each image, containing the current
# voting score, and the second file contains a list of votes since
# then. A third file is then constructed, containing the updated
# rankings.
# DO NOT EDIT THIS FILE!
# use ~/Computers/python/viewtrains/viewtrains.xlp instead
import re,string,sys,datetime
import cgi,os
import getopt
import socket
import time
import urllib
import math
<define global constants 2.4>
<determine server environment 2.9>
tm = time.asctime(time.localtime(time.time())) + ["", "(Daylight savings)"][DST]
startnow=datetime.datetime.now()
ignoredirs = re.compile('(tmp)|(units)')
top=BASEPAGE+"/trains"
jpgpat=re.compile(r'(.*)\.jpg$')
xmlpat=re.compile(r'.*\.xml$')
datepat=re.compile(r'(\d{4})(\d{2})(\d{2}):(\d{2})(\d{2})(\d{2})')
def strtotime(str,default):
    res=datepat.match(str)
    if res:
        thisdatetime=datetime.datetime(int(res.group(1)), # year
                                       int(res.group(2)), # month
                                       int(res.group(3)), # day
                                       int(res.group(4)), # hour
                                       int(res.group(5)), # minute
                                       int(res.group(6))) # second
        return thisdatetime
    else:
        return default
opts, args = getopt.getopt(sys.argv[1:],"s:f:")
CURRENT=args[0]
LOGFILE=args[1]
NEW=args[2]
starttime="20050101:000000"
finishtime="20201231:235959"
for opt,val in opts:
    print "%s %s" % (opt,val)
    if opt=='-s':
        starttime=val
    elif opt=='-f':
        finishtime=val
    else:
        print "Unknown option %s" % (opt)
print "%s %s" % (starttime,finishtime)
starttime=strtotime(starttime,None)
finishtime=strtotime(finishtime,None)
print "%s %s" % (starttime,finishtime)
currcount=0
currlist=file(CURRENT,"r")
table={}
DECAY=15.0
currdate=currlist.readline()
currdatetime=strtotime(currdate,startnow)
timesincelast=startnow-currdatetime
dayssincelast=timesincelast.days+timesincelast.seconds/86400.0
expval=-dayssincelast/DECAY
votefactor=math.exp(expval)
print 'votefactor=%f' % (votefactor)
totalimages=0
for l in currlist.readlines():
    res=re.match(r'([^ ]+) +([0-9.]+)$',l)
    if res:
        lastvote=float(res.group(2))
        nowvote=votefactor*lastvote
        table[res.group(1)]=nowvote
    else:
        print 'bad format in %s' % (l)
    totalimages+=1
currlist.close()
datanow=datetime.datetime.now()
datatime = datanow-startnow
print "Data input took %d.%06d seconds for %d images<BR/>" % \
(datatime.seconds,datatime.microseconds,totalimages)
data=open(LOGFILE)
pat=re.compile("(\d{4})(\d{2})(\d{2}):(\d{2})(\d{2}) ([0-9a-f\.:]+ )?(trains)?(.*)$")
20100610:143701 Note that the : in the IP-address field
(group 6) was added to cope with IPv6 address
notation.
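The effect of the IPv6 change is easiest to see by running the pattern on its own. The sketch below wraps it in a hypothetical parse_entry function. The sample log lines are invented. It also assumes that entries carry YYYYMMDD:HHMM timestamps, because the pattern requires a space straight after the minutes.

```python
import re

# The trainview log-entry pattern from ranking.py, as a raw string.
# Groups 1-5: date and HHMM time; 6: optional IP address plus trailing
# space; 7: optional literal "trains"; 8: the rest (the image path).
LOGPAT = re.compile(r"(\d{4})(\d{2})(\d{2}):(\d{2})(\d{2}) ([0-9a-f\.:]+ )?(trains)?(.*)$")

def parse_entry(line):
    """Return (ipaddr, imagepath) for a log line, or None if it does not match."""
    res = LOGPAT.match(line)
    if not res:
        return None
    ipadr = res.group(6)
    if ipadr:
        ipadr = ipadr.strip()
    return ipadr, res.group(8)
```

For example, parse_entry("20100610:1437 2001:db8::1 trains/nsw/3801-1") returns ('2001:db8::1', '/nsw/3801-1'). The character class lists only lower-case hex digits, so an address written in upper case (2001:DB8::1) does not match group 6 and ends up at the front of the image path instead.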
"ranking.py" 5.2 =
<ranking: parse and process a ranking entry 5.3> =year=int(res.group(1))
month=int(res.group(2))
day=int(res.group(3))
hour=int(res.group(4))
minute=int(res.group(5))
accesstime = datetime.datetime(year,month,day,hour,minute)
#print "%s %s %s" % (accesstime,starttime,finishtime)
if accesstime < starttime or accesstime > finishtime:
    print "ignoring %s" % (l)
    continue
timesinceaccess=startnow-accesstime
dayssinceaccess=timesinceaccess.days+timesinceaccess.seconds/86400.0
expval=-dayssinceaccess/DECAY
voteval=math.exp(expval)
#print "%1.4f %2.6f %2.5f" % (voteval,expval,dayssinceaccess)
ipadr=res.group(6)
if ipadr:
    ipadr=ipadr.strip()
imagepath=res.group(8)
# collapse any "/name/../" segments in the image path
res=re.match('(.*)(/[^/.]+/\.\./)(.*)$',imagepath)
while res:
    imagepath=res.group(1)+'/'+res.group(3)
    res=re.match('(.*)(/[^/.]+/\.\./)(.*)$',imagepath)
print imagepath
imagepath=imagepath[1:] # drop the leading '/'
if table.has_key(imagepath):
    table[imagepath]+=voteval
else:
    table[imagepath]=voteval
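The while loop in the chunk above removes "/name/../" segments one at a time. The sketch below isolates that loop as a hypothetical collapse_updirs function, so that its behaviour can be compared with the standard library's posixpath.normpath.

```python
import posixpath
import re

# Same pattern as chunk 5.3: a "/name/../" segment, where name contains
# neither "/" nor ".". The greedy (.*) finds the last such segment.
UPDIR = re.compile(r'(.*)(/[^/.]+/\.\./)(.*)$')

def collapse_updirs(path):
    """Repeatedly replace "/name/../" with "/" until none remain."""
    res = UPDIR.match(path)
    while res:
        path = res.group(1) + '/' + res.group(3)
        res = UPDIR.match(path)
    return path
```

For simple paths the two agree: both turn /a/b/c/../../d into /a/d. They are not interchangeable, however. The regular expression skips segments that contain a dot, so /t/a.b/../v is left unchanged where normpath gives /t/v. In addition, normpath removes ./ segments and doubled slashes, which the loop does not.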
<ranking: build new ranking list 5.4> =currlist=file(NEW,"w")
currlist.write('%04d%02d%02d:%02d%02d%02d\n' % (startnow.year,
startnow.month,
startnow.day,
startnow.hour,
startnow.minute,
startnow.second))
for (val,key) in sortlist:
    currlist.write('%s %f\n' % (key,val))
currlist.close()
<ranking: print closing summary 5.5> =closenow=datetime.datetime.now()
sorttime = closenow-ranknow
print "Data sorting took %d.%06d seconds<BR/>" % (sorttime.seconds,sorttime.microseconds)
6. rank.py
This module provides three procedures for analysing the
rankings of John's railway photographs. Ranking data is stored
in two files, generically known as the trainrank and
trainview files. The first stores ranking data for a
set of images, as computed at a specified date and time. The
second stores access requests for images in the database,
together with the time and IP address of each request. Note
that there is one entry for each unique image in the first
file, whereas there may be multiple entries in the second file
for any given image.
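The trainrank format described above is simple enough to write and read back in a few lines. The sketch below mirrors what chunk 5.4 writes and what rankdata parses. format_rankings and parse_rankings are hypothetical names, and the paths and scores used with them are made-up examples.

```python
import re

ENTRY = re.compile(r'([^ ]+) +([0-9.]+)$')  # same line pattern as rankdata

def format_rankings(when, sortlist):
    """Render a trainrank file: a YYYYMMDD:HHMMSS creation stamp, then one
    'path score' line per (score, path) pair in sortlist."""
    lines = [when.strftime('%Y%m%d:%H%M%S')]
    lines.extend('%s %f' % (key, val) for (val, key) in sortlist)
    return '\n'.join(lines) + '\n'

def parse_rankings(text):
    """Inverse of format_rankings: return (stamp string, {path: score})."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("empty rankings file")
    table = {}
    for line in lines[1:]:
        res = ENTRY.match(line)
        if res:
            table[res.group(1)] = float(res.group(2))
    return lines[0], table
```

Scores are written with %f, which gives six decimal places. Every stored score is therefore rounded to the nearest millionth, and very small scores are saved as 0.000000.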
"rank.py" 6.1 =
6.1 rank: define the strtotime function
<rank: define the strtotime function 6.2> =def strtotime(str,default):
    res=datepat.match(str)
    if res:
        thisdatetime=datetime.datetime(int(res.group(1)), # year
                                       int(res.group(2)), # month
                                       int(res.group(3)), # day
                                       int(res.group(4)), # hour
                                       int(res.group(5)), # minute
                                       int(res.group(6))) # second
        return thisdatetime
    else:
        return default
6.2 rank: define rankdata procedure
The rankdata procedure reads the trainrank file (the
path to which is passed as the parameter), which contains the
date and time when the file was created, followed by a line for
every image in the database. Each line contains the image path,
starting at the trains subdirectory, together with the
(decayed) vote value. Each vote is decayed by an exponential
factor votefactor = exp(-d/DECAY), where d is the number
of days elapsed since the creation date of the file. The
updated vote values are entered into an associative
table, which is returned along with votefactor and the
housekeeping values totalimages (the total number of
distinct images processed) and datatime (the elapsed
wall time spent processing this file).
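The decay rule can be computed on its own. The sketch below works it out the same way as rankdata, with DECAY = 15.0 as in ranking.py. decay_factor is a hypothetical name. Like the original, it ignores the microseconds part of the elapsed time.

```python
import math

DECAY = 15.0  # decay constant in days, as in ranking.py

def decay_factor(then, now, decay=DECAY):
    """Return exp(-elapsed_days / decay) for two datetime values."""
    elapsed = now - then
    days = elapsed.days + elapsed.seconds / 86400.0
    return math.exp(-days / decay)
```

With DECAY = 15.0, a vote loses half its weight in 15 ln 2, which is about 10.4 days. Because exp(-a/D) * exp(-b/D) = exp(-(a+b)/D), multiplying a stored score by votefactor gives the same result as decaying each of its underlying votes individually. This is what allows the trainrank file to hold just one number per image.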
<rank: define rankdata procedure 6.3> =def rankdata(CURRENT):
    datastart=datetime.datetime.now()
    totalimages=0
    table={}
    currlist=file(CURRENT,"r")
    currdate=currlist.readline()
    currdatetime=strtotime(currdate,startnow)
    timesincelast=startnow-currdatetime
    dayssincelast=timesincelast.days+timesincelast.seconds/86400.0
    expval=-dayssincelast/DECAY
    votefactor=math.exp(expval)
    for l in currlist.readlines():
        res=re.match(r'([^ ]+) +([0-9.]+)$',l)
        if res:
            lastvote=float(res.group(2))
            nowvote=votefactor*lastvote
            table[res.group(1)]=nowvote
        else:
            print 'bad format in %s' % (l)
        totalimages+=1
    currlist.close()
    datanow=datetime.datetime.now()
    datatime = datanow-datastart
    return totalimages, datatime, votefactor, table
6.3 rank: define ranklog procedure
<rank: define ranklog procedure 6.4> =def ranklog(LOGFILE,table,ignorepat,
            starttime="20050101:000000",finishtime="20201231:235959"):
    logstart=datetime.datetime.now()
    data=open(LOGFILE)
    logcount=0
    starttime=strtotime(starttime,None)
    finishtime=strtotime(finishtime,None)
    for l in data.readlines():
        logcount+=1
        l=l.strip()
        res=ignorepat.match(l)
        if res:
            continue
        res=pat.match(l)
        if res:
            year=int(res.group(1))
            month=int(res.group(2))
            day=int(res.group(3))
            hour=int(res.group(4))
            minute=int(res.group(5))
            accesstime = datetime.datetime(year,month,day,hour,minute)
            #print "%s %s %s" % (accesstime,starttime,finishtime)
            if accesstime < starttime or accesstime > finishtime:
                #print "ignoring %s" % (l)
                continue
            timesinceaccess=startnow-accesstime
            dayssinceaccess=timesinceaccess.days+timesinceaccess.seconds/86400.0
            expval=-dayssinceaccess/DECAY
            voteval=math.exp(expval)
            #print "%1.4f %2.6f %2.5f" % (voteval,expval,dayssinceaccess)
            ipadr=res.group(6)
            if ipadr:
                ipadr=ipadr.strip()
            imagepath=res.group(8)
            if table.has_key(imagepath):
                table[imagepath]+=voteval
            else:
                table[imagepath]=voteval
        pass
    ranknow=datetime.datetime.now()
    ranktime = ranknow-logstart
    list=[]
    for key in sorted(table.keys()):
        list.append((table[key],key))
    sortlist=sorted(list,reverse=True)
    totalimages = len(sortlist)
    closenow=datetime.datetime.now()
    sorttime = closenow-ranknow
    return totalimages, logcount, ranktime, sorttime, sortlist
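The core of ranklog, which adds one decayed vote per log entry and then sorts by score, can be separated from the file handling. In this sketch accumulate_votes is a hypothetical name. Log parsing is assumed to have been done already, and now plays the role of startnow.

```python
import math

def accumulate_votes(table, accesses, now, decay=15.0):
    """Add one exponentially decayed vote per access to table, then return
    (score, key) pairs sorted from most to least popular.

    table    -- {imagepath: score}, e.g. from rank.rankdata; updated in place
    accesses -- iterable of (accesstime, imagepath) pairs (datetime, str)
    now      -- the datetime that votes are decayed towards
    """
    for accesstime, imagepath in accesses:
        elapsed = now - accesstime
        days = elapsed.days + elapsed.seconds / 86400.0
        table[imagepath] = table.get(imagepath, 0.0) + math.exp(-days / decay)
    # Keys are unique, so sorting the tuples gives a total order: by score,
    # then by key for equal scores.
    return sorted(((score, key) for key, score in table.items()), reverse=True)
```

Every key is unique, so sorting the (score, key) tuples already fixes the order of equal scores. As a result, the preliminary sorted(table.keys()) in ranklog does not change the result.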
7. The Makefile
The Makefile handles the nitty-gritty of copying
files to the right places, and setting permissions, etc.
"Makefile" 7.1 =default=viewtrains
flags=-2 bash
WEBPAGE=/home/ajh/public_html/research/literate
FILES=.bash_profile .bashrc .bash_logout
XSLLIB=/home/ajh/lib/xsl
XSLFILES=$(XSLLIB)/lit2html.xsl $(XSLLIB)/tables2html.xsl
CGIBIN=${HOME}/www/cgi-bin
INSTALLFILES=${CGIBIN}/viewtrains.py
include $(HOME)/etc/MakeXLP
viewtrains.tangle viewtrains.xml: viewtrains.xlp
xsltproc --xinclude -o viewtrains.xml $(XSLLIB)/litprog.xsl viewtrains.xlp
touch viewtrains.tangle
viewtrains.html: viewtrains.xml $(XSLFILES)
xsltproc --xinclude $(XSLLIB)/lit2html.xsl viewtrains.xml >viewtrains.html
install: $(INSTALLFILES)
${HOME}/%: %
@if [ -f $@ ] ; then \
diff -q $@ $< >/dev/null ;\
if [ $$? -eq 1 ] ; then \
echo cp $< $@ ;\
cp $< $@ ;\
fi ;\
else \
echo cp $< $@ ;\
cp $< $@ ;\
fi
${CGIBIN}/%: %
@if [ -f $@ ] ; then \
diff -q $@ $< >/dev/null ;\
if [ $$? -eq 1 ] ; then \
echo chmod 755 $< ; cp $< $@ ;\
chmod 755 $< ;\
echo cp $< $@ ;\
cp $< $@ ;\
else \
echo "$< has not been altered" ;\
fi ;\
else \
echo chmod 755 $< ; cp $< $@ ;\
chmod 755 $< ;\
cp $< $@ ;\
fi
all: install viewtrains.py
Makefile: viewtrains.tangle
8. TO DO
- Make link of "rank n" to anchor point in ranktrains.py.
First need to add anchors to ranktrains.py to make this
work.
9. Document History
20161223:102357 | ajh | 1.0.0 | viewtrains redesigned to create dynamic XML, making its look and feel consistent with the regular train pages.
<version number .1> = 1.0.0
<version date .2> = 20161223:102357
10. Indices
10.1 Identifier Index
Identifier | Defined in | Used in
BASEPAGE | 2.5 | 2.9, 2.25, 2.26, 2.28, 3.7
BASEPAGE | 2.6 | 2.9, 2.25, 2.26, 2.28, 3.7
RANKINGS | 2.13 |
VIEWINGS | 2.13 |
datatime | 6.3 |
dir | 2.31 | 2.31, 2.31
display | 2.26 | 2.2, 2.31
imageisrelative | 2.12 | 2.2
logentrypat | 2.8 | 2.23
rankdata | 6.3 |
ranklog | 6.4 | 3.5
strtotime | 6.2 | 5.1
table | 2.13 | 2.19, 2.19, 2.19
table | 6.3 | 2.19, 2.19, 2.19
top | 2.9 | 2.2, 2.26, 2.26
totalimages | 6.3 |
visit | 2.31 | 2.2, 2.31
votefactor | 6.3 |
10.2 Chunk Index
Chunk Name | Defined in | Used in
Explain missing images | 2.29 | 2.28
Print ranking information | 2.30 | 2.26
collect cgi parameters | 2.12 | 2.2
collect previous rankings | 2.13 | 2.2
define constants for viewtrains | 2.7 | 2.1, 3.1
define function convertIPtoHex | 2.17 | 2.16
define getrank routine | 2.19 | 2.16
define global constants | 2.4 | 2.1, 3.1, 3.2, 4.1, 5.1
define procedure log | 2.18 | 2.16
define procedure to display the image | 2.26 | 2.16
define procedure to search directories | 2.31 | 2.16
define regular expression patterns | 2.8 | 2.1, 3.5
define subroutines | 2.16 | 2.1
define the XML dispatch routines | 2.25 | 2.16
determine server environment | 2.9 | 2.1, 3.1, 4.1, 5.1
display can access file | 2.27 | 2.26
display cannot access file | 2.28 | 2.26
getrank: extract data from a single logfile entry | 2.23 | 2.22
globals for linux | 2.6 | 2.9
globals for macosx | 2.5 | 2.9
print header of html page | 3.10, 3.11, 3.12, 3.13, 3.14 |
print trailer of html page | 3.15 |
rank: define rankdata procedure | 6.3 | 6.1
rank: define ranklog procedure | 6.4 | 6.1
rank: define the strtotime function | 6.2 | 6.1
ranking: build new ranking list | 5.4 | 5.2
ranking: parse and process a ranking entry | 5.3 | 5.2
ranking: print closing summary | 5.5 | 5.2
rankings: print forward and backward buttons | 3.6 | 3.1, 3.1
ranktrains: collect cgi parameters | 3.3 | 3.1
ranktrains: collect previous ranking information | 3.4 | 3.1
ranktrains: define constants | 3.2 | 3.1
ranktrains: generate image rankings | 3.7 | 3.1
ranktrains: print ranking analysis | 3.9 | 3.1
ranktrains: print rankings table | 3.8 | 3.1
ranktrains: update rankings with latest log info | 3.5 | 3.1
version date | .2 |
version number | .1 |
viewtrains add header | 2.14 | 2.2, 3.1
viewtrains add trailer | 2.15 | 2.2, 3.1
viewtrains convert XML to HTML | 2.11 | 2.10
viewtrains imports | 2.3 | 2.1, 3.1
viewtrains translate XML to HTML | 2.10 | 2.2, 3.1
viewtrains: getrank: compute new ranking of this image | 2.24 | 2.19
viewtrains: getrank: get relative path to image | 2.20 | 2.19
viewtrains: getrank: open the VIEWINGS file | 2.21 | 2.19
viewtrains: getrank: read the VIEWINGS file | 2.22 | 2.19
10.3 File Index
File Name | Defined in
Makefile | 7.1
rank.py | 6.1
ranking.py | 5.1, 5.2
ranktrains.py | 3.1
tidytrains.py | 4.1
viewtrains.py | 2.1, 2.2