Apr 12 2008

lists

Published by jfrank under resources

I found an interesting site called MaxMind while looking for a database of country information.

They have a simple ISO 3166 country code list,

http://www.maxmind.com/app/iso3166

and a database of world cities,

http://www.maxmind.com/app/worldcities

both free.

Feb 11 2008

GSA Statistics python scripts

Published by jfrank under python

I have been tasked with upgrading Google Search Appliances (GSAs), and in doing so I wanted to calculate some statistics.

Comparing Crawled Pages Across GSAs (or Minis)

You can use this to compare any two URL XML files exported from Crawl Diagnostics –> Export All Pages To a File.
I could have called them A and B, but I was comparing a Mini and a GSA at the time, hence the names in the script. It uses plain Python sets to see which URLs were crawled on one machine but not the other, and so on. It expects two local files, mini-urls.xml and gsa-urls.xml; either one can be a GSA or a Mini export.

import xml.dom.minidom

def main():
    # Parse both exports and reduce each one to a set of URLs.
    miniUrlSet = extractUrls(xml.dom.minidom.parse('mini-urls.xml'))
    gsaUrlSet = extractUrls(xml.dom.minidom.parse('gsa-urls.xml'))
    print('mini', len(miniUrlSet))
    print('gsa', len(gsaUrlSet))
    print('intersections', len(miniUrlSet & gsaUrlSet))
    print('mini is sub of gsa?', miniUrlSet <= gsaUrlSet)
    # Set difference: URLs crawled by one machine but not the other.
    gsaNotMini = gsaUrlSet - miniUrlSet
    print('things in gsa but not mini:', len(gsaNotMini))
    for i in gsaNotMini:
        print(i)
    miniNotGsa = miniUrlSet - gsaUrlSet
    print('things in mini but not gsa', len(miniNotGsa))
    for i in miniNotGsa:
        print(i)

def extractUrls(dom):
    urls = set()
    for node in dom.getElementsByTagName("loc"):
        # I know all loc nodes have a single child text node,
        # and text nodes have a data property.
        urls.add(node.firstChild.data)
    return urls

if __name__ == '__main__':
    main()

Calculating Search Keywords Density Over Time

This runs against an export from the Search Logs feature under Status and Reports. Export the timeframe you want to look at, then run the script against the log file. You get the top 100 keywords that people searched for, with a count of how many times each one was searched. It expects a local file named log.log.

import re
from datetime import datetime
from operator import itemgetter

def getQueryCounts(f):
    words = {}
    # Capture the value of the q parameter, up to the next '&'.
    qReg = re.compile(r'&q=(.*?)&')
    for l in f:
        match = qReg.search(l)
        # Skip lines with no q parameter or an empty query.
        if match and match.group(1):
            words[match.group(1)] = words.get(match.group(1), 0) + 1
    return words

start = datetime.now()
with open('log.log') as f:
    words = getQueryCounts(f)
# Sort by count, highest first, and keep the top 100.
top = sorted(words.items(), key=itemgetter(1), reverse=True)[:100]
print('Top Words')
print('---------')
for word, num in top:
    print(word, num)
print('runtime:', datetime.now() - start)
input("press enter")
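A possible variation on the keyword count, sketched in Python 3 with collections.Counter. It assumes the query shows up as a q= parameter anywhere in the request (first or last), and it decodes and lowercases each query so that "annual+report" and "Annual%20Report" are counted together. The count_queries name and the sample log lines are made up for illustration; a real export may look different.

```python
import re
from collections import Counter
from urllib.parse import unquote_plus

# Assumption: the query is the q parameter, preceded by '?' or '&'
# and ended by '&', whitespace, or the end of the line.
Q_PARAM = re.compile(r'[?&]q=([^&\s]*)')

def count_queries(lines, top_n=100):
    """Return the top_n (keyword, count) pairs, most frequent first."""
    counts = Counter()
    for line in lines:
        match = Q_PARAM.search(line)
        if match and match.group(1):
            # Decode '+' and %xx escapes, then normalize case and spacing.
            keyword = unquote_plus(match.group(1)).strip().lower()
            if keyword:
                counts[keyword] += 1
    return counts.most_common(top_n)

# Hypothetical log lines, only here to exercise the function.
sample_log = [
    'GET /search?q=annual+report&site=default_collection HTTP/1.1',
    'GET /search?site=default_collection&q=Annual%20Report&num=10 HTTP/1.1',
    'GET /search?site=default_collection&q=holidays HTTP/1.1',
    'GET /search?site=default_collection&q=&num=10 HTTP/1.1',
]
for keyword, count in count_queries(sample_log):
    print(keyword, count)
```

Whether to fold case and decode escapes depends on what you want to measure; the raw strings are arguably closer to what people actually typed.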

Dec 20 2007

python. (not ruby)

Published by jfrank under python

I finally got shared folders up and running on my virtual Fedora box. This required a little kernel and kernel headers upgrading, plus compiling the VMware Tools for my box, but it works like a charm. It even gives me cut and paste to the Windows XP desktop, which is… cool, I guess.

I decided to go with Python, which has a plethora of tools. Pylons is a piecemeal web framework that is closest to my liking. Migrate is a library for schema migration, and it works nicely with SQLAlchemy, a monster ORM.

SQLAlchemy is cool because you can use parts of it totally independently. Coming from a CF background, I am used to having nice named/pooled connections that I don’t have to think about. The base layer of SQLAlchemy is exactly that: a database type abstraction and connection pooling. Then you are free to go crazy with ORMish things or not; it’s up to you.

It is so reusable that many people have written layers on top of it for even more magical coding… but it’s nice to have all the options.

Migrate, a RoR knockoff, is the real find though. It looks young (as far as projects go), but I watched a demo of it used in another Python framework and it was exactly what I expected, like something we use at work for CF. It has a schema version table that holds the app’s current schema version, and version files with ‘up/down’ methods. My main issue with many of these ‘scaffold magic’ things is that no one bothered to mention how you get from one version to the next… or back again. You can’t build the model right the first time, and iterative programming is a fact of life. This library addresses that.
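To make the version table and up/down idea concrete, here is a rough sketch of the general pattern in Python 3, using only the standard library's sqlite3. This is not Migrate's actual API; the MIGRATIONS list, the schema_version table layout, and the function names are all made up for illustration.

```python
import sqlite3

# Hypothetical migrations: (version, "up" SQL, "down" SQL).
MIGRATIONS = [
    (1, "CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT)",
        "DROP TABLE post"),
    (2, "CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT)",
        "DROP TABLE comment"),
]

def current_version(conn):
    """Read the schema version, creating the version table at 0 if needed."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT version FROM schema_version").fetchone()
    if row is None:
        conn.execute("INSERT INTO schema_version (version) VALUES (0)")
        return 0
    return row[0]

def migrate_to(conn, target):
    """Run 'up' steps to move forward or 'down' steps to move back."""
    version = current_version(conn)
    if target > version:
        # Apply each pending 'up' step in ascending order.
        steps = [(v, up) for v, up, _ in MIGRATIONS if version < v <= target]
    else:
        # Undo applied steps in descending order; undoing v lands on v - 1.
        steps = [(v - 1, down) for v, _, down in reversed(MIGRATIONS)
                 if target < v <= version]
    for new_version, sql in steps:
        conn.execute(sql)
        conn.execute("UPDATE schema_version SET version = ?", (new_version,))
    conn.commit()
    return current_version(conn)

def table_names(conn):
    """List the tables in the database, sorted by name."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [name for (name,) in rows]

conn = sqlite3.connect(":memory:")
print(migrate_to(conn, 2), table_names(conn))  # upgrade to version 2
print(migrate_to(conn, 1), table_names(conn))  # roll back one version
```

The point of the pattern is that the database itself records which version it is on, so getting from any version to any other is just a matter of replaying the right steps in the right order.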

Dec 19 2007

cf admin api

Published by jfrank under coldfusion

Cookie name “CFAUTHORIZATION_SPLAT SPLAT” is a reserved token
The error occurred in administrator.cfc: line 116

Today I ran into a weird bug in the CF admin API. It shows up if you attempt to perform a login such as this:

<cfscript>
loggedin = createObject("component", "cfide.adminapi.administrator").login('dsafdsafsad');
</cfscript>

It will bomb with the above error if your application name contains a space. The error is slightly different depending on whether you use Application.cfc or <cfapplication> style.

The fix, thanks to Barney, is to remove the space.

Dec 15 2007

virtualization

Published by jfrank under setup

I am using a virtual Linux server via the free VMware Player on my development box, which runs Windows XP. I found several groups who produce free stock distributions packaged in virtual machine format, which means I can run locally on literally the same stack as my real server.

I haven’t got there quite yet, but I intend to use a Windows Eclipse IDE mapped into the virtual box via shared folders.

Because everything up to this point lives in Subversion, syncing the two servers’ configurations comes down to a few simple tasks.

Next up… Python or Ruby.

Dec 07 2007

WordPress is easy.

Published by jfrank under setup

Hi everyone who isn’t there.

I have really enjoyed setting up my new server, and my first project was to get a blog up and running under my domain name. I have had this domain for years but never got around to developing anything on it. So here it is. It didn’t take me long to set up, and I version controlled the whole thing as I went. So if someone were to wipe out my server right now, I would still be able to regenerate this…

Er, I take it back.

I haven’t set up backups yet… and although what I said is true for the blog itself, because I have a working copy of it, I would lose this post because I don’t have MySQL backing up yet. What I meant was that I could regenerate this blog in a couple of commands.

Anyway, the first few posts are going to be about server setup, and me learning Linux.
