<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>jfrank &#187; python</title>
	<atom:link href="http://www.joshuafrankamp.com/blog/category/python/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.joshuafrankamp.com/blog</link>
	<description>technology and some random stuff</description>
	<lastBuildDate>Wed, 01 Sep 2010 22:18:12 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>python packaging</title>
		<link>http://www.joshuafrankamp.com/blog/python/</link>
		<comments>http://www.joshuafrankamp.com/blog/python/#comments</comments>
		<pubDate>Fri, 28 Nov 2008 09:12:11 +0000</pubDate>
		<dc:creator>jfrank</dc:creator>
				<category><![CDATA[python]]></category>
		<category><![CDATA[rant]]></category>

		<guid isPermaLink="false">http://www.joshuafrankamp.com/blog/?p=19</guid>
		<description><![CDATA[Python packaging is a pain in the ass.  There are some tools to make it easy, so easy in fact that it becomes even worse&#8230;
easy_install is the easiest thing since sliced bread. What does it do? Everything. It&#8217;s so magic it probably installs itself recursively just for fun.
You want a package?
Ok just type this: [...]]]></description>
			<content:encoded><![CDATA[<p>Python packaging is a pain in the ass.  There are some tools to make it easy, so easy in fact that it becomes even worse&#8230;</p>
<p>easy_install is the easiest thing since sliced bread. What does it do? Everything. It&#8217;s so magic it probably installs itself recursively just for fun.</p>
<p>You want a package?</p>
<p>Ok just type this: easy_install sqlalchemy (for the awesome ORM package for python)</p>
<p>It magically goes and finds sqlalchemy, and installs it INTO your system python&#8217;s install path.</p>
<p>Why is the standard assumption that if I want to use a python package, say as a dependency for my project, I want to INSTALL IT INTO the PYTHON running on my system?</p>
<p>What kind of crazy idea is this? It causes all kinds of issues. The first and most obvious is: what if I have two programs that expect different versions of a given package? Since the packages are installed into the runtime and not into my app, I have to know about this issue and work around it.</p>
<p>If packages were managed the Java way, the assumption would be that I want to install the package into the app that I am working on, not into /systemjdk/extensions/somePackage.</p>
<p>The only argument FOR doing it this way that I can think of is saving disk space. Disk space is cheap.</p>
<p>/rant.</p>
<p>Ok so honestly, can anyone tell me why this is?</p>
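<p>For reference, the install-into-my-app idea can be approximated by hand in python. This is only a minimal sketch, not something easy_install does for you: the lib folder, the helper name use_app_local_packages and the module bundled_demo_pkg are all made up for the demo. It puts an app-local folder at the front of sys.path so anything bundled there wins over whatever is installed in the system python.</p>

```python
import os
import sys
import tempfile


def use_app_local_packages(app_dir, lib_name="lib"):
    """Put app_dir/lib_name first on sys.path (hypothetical helper, not a packaging API)."""
    lib_dir = os.path.join(app_dir, lib_name)
    if lib_dir not in sys.path:
        sys.path.insert(0, lib_dir)
    return lib_dir


# Demo: build a throwaway app folder holding one bundled module.
app_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(app_dir, "lib"))
with open(os.path.join(app_dir, "lib", "bundled_demo_pkg.py"), "w") as handle:
    handle.write("VERSION = '0.1'\n")

lib_dir = use_app_local_packages(app_dir)

# The import is resolved from the app folder, not from the system install path.
import bundled_demo_pkg

print(bundled_demo_pkg.VERSION, "loaded from", os.path.dirname(bundled_demo_pkg.__file__))
```

<p>Two apps that each carry their own lib folder can then hold different versions of the same package without stepping on each other.</p>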
]]></content:encoded>
			<wfw:commentRss>http://www.joshuafrankamp.com/blog/python/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>GSA Statistics python scripts</title>
		<link>http://www.joshuafrankamp.com/blog/gsa-statistics-python-scripts/</link>
		<comments>http://www.joshuafrankamp.com/blog/gsa-statistics-python-scripts/#comments</comments>
		<pubDate>Mon, 11 Feb 2008 23:06:23 +0000</pubDate>
		<dc:creator>jfrank</dc:creator>
				<category><![CDATA[python]]></category>
		<category><![CDATA[gsa statistics]]></category>

		<guid isPermaLink="false">http://www.joshuafrankamp.com/blog/?p=9</guid>
		<description><![CDATA[I have been tasked with upgrading Google search appliances and in doing so I wanted to calculate some statistics.
Comparing Crawled Pages Across GSAs (or minis)
You could use this to compare any two URL XML files from the Crawl Diagnostics &#8211;&#62; Export All Pages To a File
I could have called them A and B, but I [...]]]></description>
			<content:encoded><![CDATA[<p>I have been tasked with upgrading Google search appliances and in doing so I wanted to calculate some statistics.</p>
<p><strong>Comparing Crawled Pages Across GSAs (or minis)</strong></p>
<p>You could use this to compare any two URL XML files from the Crawl Diagnostics &#8211;&gt; Export All Pages To a File.<br />
I could have called them A and B, but I was comparing a mini and a gsa at the time, so that is how things are named in the script. It uses simple python sets to see what is crawled on one machine but not the other, and so on. It expects two local files, mini-urls.xml and gsa-urls.xml; either could be a gsa or mini export.</p>
<pre>
import xml.dom.minidom

def main():
    # parse both exports and collect the url from every loc node
    miniUrlSet = extractUrls(xml.dom.minidom.parse('mini-urls.xml'))
    gsaUrlSet = extractUrls(xml.dom.minidom.parse('gsa-urls.xml'))
    print('mini', len(miniUrlSet))
    print('gsa', len(gsaUrlSet))
    print('intersections', len(miniUrlSet &amp; gsaUrlSet))
    print('mini is sub of gsa?', miniUrlSet &lt;= gsaUrlSet)
    # urls crawled by the gsa that the mini never saw
    gsaNotMini = gsaUrlSet - miniUrlSet
    print('things in gsa but not mini:', len(gsaNotMini))
    for i in gsaNotMini:
        print(i)
    # urls crawled by the mini that the gsa never saw
    miniNotGsa = miniUrlSet - gsaUrlSet
    print('things in mini but not gsa:', len(miniNotGsa))
    for i in miniNotGsa:
        print(i)

def extractUrls(dom):
    nodelist = dom.getElementsByTagName("loc")
    urls = set()
    for node in nodelist:
        # i know all loc nodes have a single child text node, text nodes have a data property
        urls.add(node.firstChild.data)
    return urls

if __name__ == '__main__':
    main()</pre>
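<p>If the set operations are not familiar, here is a tiny sketch of what each one gives you. The urls are made up and stand in for the two exports; the named set methods do the same job as the operators in the script.</p>

```python
# Made-up urls standing in for the two exports.
mini_urls = {"http://intranet.example/a", "http://intranet.example/b"}
gsa_urls = {
    "http://intranet.example/a",
    "http://intranet.example/b",
    "http://intranet.example/c",
}

in_both = mini_urls.intersection(gsa_urls)     # crawled by both machines
mini_is_subset = mini_urls.issubset(gsa_urls)  # True when the gsa has everything the mini has
gsa_only = gsa_urls.difference(mini_urls)      # crawled by the gsa but not the mini
mini_only = mini_urls.difference(gsa_urls)     # crawled by the mini but not the gsa

print("in both:", sorted(in_both))
print("mini is sub of gsa?", mini_is_subset)
print("gsa only:", sorted(gsa_only))
print("mini only:", sorted(mini_only))
```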
<p><strong>Calculating Search Keyword Density Over Time</strong></p>
<p>This runs against an export from the search logs feature under status and reports. You export the timeframe you want to compare over, and then run this against the log file. You get the top 100 keywords that people searched for, along with how many times each one was searched. It expects a local file named log.log.</p>
<pre>
import re
from datetime import datetime
from operator import itemgetter

# pull the q parameter out of a logged search url; q may be the first or the last parameter
qReg = re.compile(r'[?&amp;]q=([^&amp;\s]*)')

def getQueryCounts(f):
    words = {}
    for l in f:
        match = qReg.search(l)
        # skip lines with no q parameter or an empty one
        if match and match.group(1):
            keyword = match.group(1)
            words[keyword] = words.get(keyword, 0) + 1
    return words

start = datetime.now()
with open('log.log') as f:
    words = getQueryCounts(f)
# sort by count, biggest first, and keep the top 100
top = sorted(words.items(), key=itemgetter(1), reverse=True)[:100]
print('Top Words')
print('---------')
for word, num in top:
    print(word, num)
print('runtime:', datetime.now() - start)
input("press enter")</pre>
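<p>The same counting can also be sketched with the standard library doing more of the work: collections.Counter for the tally and urllib.parse for decoding the q value, so that vacation+policy comes out as vacation policy. This is a rough sketch and not a drop-in replacement. The log line layout in fake_line is an assumption made up for the demo, and a real search log export may look different.</p>

```python
from collections import Counter
from urllib.parse import parse_qs, urlencode, urlsplit


def count_queries(lines):
    """Count decoded q values, assuming each line contains a request url with a query string."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            query = urlsplit(token).query
            for value in parse_qs(query).get("q", []):
                counts[value] += 1
    return counts


def fake_line(term):
    # Hypothetical log line; the real export format is not shown here.
    params = urlencode({"site": "default", "q": term})
    return '10.0.0.1 - - "GET /search?' + params + ' HTTP/1.1" 200'


sample = [
    fake_line("vacation policy"),
    fake_line("expense report"),
    fake_line("vacation policy"),
]

for word, num in count_queries(sample).most_common(100):
    print(word, num)
```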
]]></content:encoded>
			<wfw:commentRss>http://www.joshuafrankamp.com/blog/gsa-statistics-python-scripts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>python. (not ruby)</title>
		<link>http://www.joshuafrankamp.com/blog/python-not-ruby/</link>
		<comments>http://www.joshuafrankamp.com/blog/python-not-ruby/#comments</comments>
		<pubDate>Thu, 20 Dec 2007 08:19:16 +0000</pubDate>
		<dc:creator>jfrank</dc:creator>
				<category><![CDATA[python]]></category>
		<category><![CDATA[vmware setup]]></category>

		<guid isPermaLink="false">http://www.joshuafrankamp.com/blog/?p=7</guid>
		<description><![CDATA[I finally got shared folders up and running on my virtual fedora box. This required a little kernel/kernel headers upgrading, and compiling the vmware tools for my box, but it works like a charm. It even gives me cut and paste to the win xp desktop, which is&#8230; cool i guess.
I decided to go with [...]]]></description>
			<content:encoded><![CDATA[<p>I finally got shared folders up and running on my virtual fedora box. This required a little kernel/kernel headers upgrading, and compiling the vmware tools for my box, but it works like a charm. It even gives me cut and paste to the win xp desktop, which is&#8230; cool i guess.</p>
<p>I decided to go with python, which has a plethora of tools. <a href="http://pylonshq.com/">Pylons</a> is a piecemeal web framework that is closest to my liking, <a href="http://code.google.com/p/sqlalchemy-migrate/">migrate</a> is a library for schema migration, which works nicely with <a href="http://www.sqlalchemy.org/">sqlalchemy</a>, a monster ORM.</p>
<p>sqlalchemy is cool because you can use parts of it totally independently. Coming from a CF background I am used to having nice named/pooled connections that I don&#8217;t have to think about. The base layer of sqlalchemy is exactly that: database type abstraction and connection pooling. Then you are free to go crazy with ORMish things or not, it&#8217;s up to you.</p>
<p>It is so reusable that many people have written layers on top of it for even more magical coding&#8230; but it&#8217;s nice to have all the options.</p>
<p>Migrate, a RoR knockoff, is the real find though. It looks young (as far as projects go), but I watched a demo of it used in another python framework and it was exactly what I expected, like something we use at work for CF. It has a schema version table that holds the app state version, and version files with &#8216;up/down&#8217; methods. My main issue with many of these &#8216;scaffold magic&#8217; things is that no one bothered to mention how you get from one version to the next&#8230; or back again. You can&#8217;t build the model right the first time, and iterative programming is a fact of life. This library addresses that.</p>
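<p>To make the version table and up/down idea concrete, here is a minimal sketch using plain sqlite3 from the standard library. It is not the migrate API: the schema_version table name, the MIGRATIONS dict and the example tables are all hypothetical, and it only shows the general shape of walking a schema forward and back.</p>

```python
import sqlite3

# Hypothetical migrations: version number mapped to (up statement, down statement).
MIGRATIONS = {
    1: ("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)", "DROP TABLE users"),
    2: ("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)", "DROP TABLE posts"),
}


def current_version(conn):
    """Read the schema version, creating the one-row version table on first use."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT version FROM schema_version").fetchone()
    if row is None:
        conn.execute("INSERT INTO schema_version (version) VALUES (0)")
        return 0
    return row[0]


def migrate(conn, target):
    """Run up steps (or down steps in reverse order) until the schema is at target."""
    version = current_version(conn)
    # Going up: this range is empty when the target is at or below the current version.
    for number in range(version + 1, target + 1):
        conn.execute(MIGRATIONS[number][0])
        conn.execute("UPDATE schema_version SET version = ?", (number,))
    # Going down: this range is empty when the target is at or above the current version.
    for number in range(version, target, -1):
        conn.execute(MIGRATIONS[number][1])
        conn.execute("UPDATE schema_version SET version = ?", (number - 1,))
    conn.commit()


def table_names(conn):
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
migrate(conn, 2)
print(current_version(conn), table_names(conn))
migrate(conn, 1)
print(current_version(conn), table_names(conn))
```

<p>The point is that every schema change ships with its own way back, so moving between versions is a loop over small steps rather than a hand-edited database.</p>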
]]></content:encoded>
			<wfw:commentRss>http://www.joshuafrankamp.com/blog/python-not-ruby/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
