<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>jfrank &#187; python</title>
	<atom:link href="http://www.joshuafrankamp.com/blog/category/python/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.joshuafrankamp.com/blog</link>
	<description>technology and some random stuff</description>
	<lastBuildDate>Wed, 04 Jan 2012 20:29:39 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>It&#8217;s Not the Critic Who Counts: 2011 Part 1</title>
		<link>http://www.joshuafrankamp.com/blog/not-the-critic-2011-1/</link>
		<comments>http://www.joshuafrankamp.com/blog/not-the-critic-2011-1/#comments</comments>
		<pubDate>Fri, 30 Dec 2011 23:23:39 +0000</pubDate>
		<dc:creator>jfrank</dc:creator>
				<category><![CDATA[magnolia]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.joshuafrankamp.com/blog/?p=263</guid>
		<description><![CDATA[This has been a big year for me. So big that you all get a year-end recap of it because you are here, reading my blog. Except it&#8217;s too big to write in one post. So if you didn&#8217;t think you were going to get personal stuff mixed into this mostly tech blog, now [...]]]></description>
			<content:encoded><![CDATA[<p>This has been a big year for me. So big that you all get a year-end recap of it because you are here, reading my blog. Except it&#8217;s too big to write in one post. So if you didn&#8217;t think you were going to get personal stuff mixed into this mostly tech blog, now is the time to unsubscribe.</p>
<p>One of my favorite quotes is this:</p>
<blockquote><p>It is not the critic who counts: not the man who points out how the strong man stumbles or where the doer of deeds could have done better. The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood, who strives valiantly, who errs and comes up short again and again, because there is no effort without error or shortcoming, but who knows the great enthusiasms, the great devotions, who spends himself for a worthy cause; who, at the best, knows, in the end, the triumph of high achievement, and who, at the worst, if he fails, at least he fails while daring greatly, so that his place shall never be with those cold and timid souls who knew neither victory nor defeat &#8211; Roosevelt</p></blockquote>
<p>What this means to me is that I am getting comfortable not with success, but with failure. And with failure, slowly, haltingly, comes some measure of success.</p>
<h2>cloud surfing</h2>
<p>At the beginning of 2010, at my job at Mentor, I enjoyed moving more systems from an acquisition&#8217;s internal server farms to AWS. I closed out six years at Mentor, an excellent chapter in my life that I was sad to see end. I miss my co-workers, and agree with <a href="https://twitter.com/#!/barneyb/status/147888504733573120">Barney</a>: &#8220;Twitter is such a poor excuse for seeing them every day.&#8221; When I put in my notice, most were happy for me; some could hardly believe what I was doing. Others didn&#8217;t believe that I had no position lined up at another safe corporation. They kept asking what my real plan was, and I kept replying: &#8220;I&#8217;m taking a year off to study, to grow, to try new things.&#8221;</p>
<p><em>Technologies used: Bash, Railo, AWS APIs, Python</em></p>
<h2>life changes. no really it does!</h2>
<p>I<br />
quit my job in March,<br />
moved out of downtown in June,<br />
rented a big old house a week later,<br />
became a foster parent in July,<br />
and enrolled as a fake student at PSU.</p>
<h2>it slices, it dices. well no, actually it only slices.</h2>
<p>I had a poor experience with a cloud dashboard company and built autosnappy.com in response. It creates snapshots of AWS volumes on a schedule. So simple. Happy dance. It broke even almost immediately, and although it doesn&#8217;t make tons of money, there is a lot of room for growth. A customer is currently asking to pay me to develop new features for it, so I may revisit it and roll out new things. I wanted to build it to be as compartmentalized as possible. One of my design goals was for the front end to know as little as possible about the backing AWS services. So it talks exclusively to the middle Python tier, even though the front end is powerful enough to do both jobs. This means that, if I needed to, I could scale those components separately, and keep the user-facing process under a separate Linux user/group from the process that talks to AWS and holds the security keys.</p>
<p><em>Technologies used: Java Magnolia and Railo templating front end, Python web service backend, AWS SimpleDB storage. Scalable! Stateless! Cloud!</em></p>
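<p>The scheduling core of a tool like this can be small. What follows is a minimal sketch, not autosnappy&#8217;s actual code: it assumes every volume has a fixed snapshot interval, and the function that really creates a snapshot is passed in (in production it might wrap boto3&#8217;s EC2 create_snapshot call). Keeping that callable in the backend tier is what lets the front end stay ignorant of AWS and its keys. The function and parameter names are hypothetical.</p>

```python
from datetime import datetime, timedelta
from typing import Callable, Dict


def run_due_snapshots(
    schedules: Dict[str, timedelta],
    last_run: Dict[str, datetime],
    now: datetime,
    take_snapshot: Callable[[str], str],
) -> Dict[str, str]:
    """Snapshot every volume whose interval has elapsed; return {volume_id: snapshot_id}."""
    started = {}
    for volume_id, interval in schedules.items():
        previous = last_run.get(volume_id)
        # A volume that has never been snapshotted is always due.
        if previous is None or now - previous >= interval:
            started[volume_id] = take_snapshot(volume_id)
            # Record the run so the next check waits a full interval.
            last_run[volume_id] = now
    return started


# Usage with a fake snapshot function standing in for the real AWS call.
now = datetime(2011, 12, 30, 12, 0)
schedules = {"vol-a": timedelta(hours=1), "vol-b": timedelta(days=1)}
last_run = {"vol-a": now - timedelta(hours=2), "vol-b": now - timedelta(hours=3)}
print(run_due_snapshots(schedules, last_run, now, lambda vid: "snap-for-" + vid))
```

<p>Only vol-a is due here: its hourly interval has passed, while vol-b was snapshotted three hours into a one-day interval.</p>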
<h2>on being a fake student</h2>
<p>Being a <em>fake</em> student is the best! When you take a one credit class at <a href="http://www.pdx.edu">PSU</a> you have access to super high speed internet, a research library, and a place to go work. I spent a large part of my year hacking code in Food For Thought cafe, next to artists, musicians, and hippies. But that&#8217;s not all. I also have access to the new(ish) Rec Center which has a pool, spa, rock wall and all the normal workout stuff.</p>
<p>Oh, and my one credit class? Yoga. So stressful.</p>
<p>Next post: <a href="http://www.joshuafrankamp.com/blog/not-the-critic-2011-2/">Basic Tools of Science and Flying Helicopters Upside Down</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.joshuafrankamp.com/blog/not-the-critic-2011-1/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Un-Shredding</title>
		<link>http://www.joshuafrankamp.com/blog/unshredding/</link>
		<comments>http://www.joshuafrankamp.com/blog/unshredding/#comments</comments>
		<pubDate>Fri, 09 Dec 2011 23:35:43 +0000</pubDate>
		<dc:creator>jfrank</dc:creator>
				<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.joshuafrankamp.com/blog/?p=229</guid>
		<description><![CDATA[About a week before it closed I decided to start working on the DARPA Shredder Challenge. This challenge is to reassemble shredded documents using only the shredded pieces. Here is a quick video depicting an early part of the algorithm that I worked on.

Algorithm Notes
This video is a bunch of visualizations put back to back [...]]]></description>
			<content:encoded><![CDATA[<p>About a week before it closed, I decided to start working on the <a href="http://www.shredderchallenge.com/">DARPA Shredder Challenge</a>. The challenge is to reassemble shredded documents using only the shredded pieces. Here is a quick video depicting an early part of the algorithm that I worked on.</p>
<p><iframe width="480" height="360" src="http://www.youtube.com/embed/lJPmdJ4YQYM" frameborder="0" allowfullscreen></iframe></p>
<h2>Algorithm Notes</h2>
<p>This video is a series of visualizations, put back to back, of the process described below. The algorithm is run against piece 1 of puzzle 1 of the challenge, shown below. You&#8217;ll notice that it is upside down in my visualization; this is simply because the graphing application puts the x,y origin at the lower left, while the image library gave me x and y starting from the upper left.</p>
<p><a href="http://www.joshuafrankamp.com/blog/wp-content/uploads/2011/12/1.png"><img class="alignnone size-full wp-image-250" title="1" src="http://www.joshuafrankamp.com/blog/wp-content/uploads/2011/12/1.png" alt="" width="157" height="486" /></a></p>
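<p>The upside-down effect is just the two origins disagreeing. A minimal sketch of the conversion, assuming integer pixel rows numbered from 0 at the top of an image of the given height (the function name is hypothetical):</p>

```python
def to_lower_left_origin(points, height):
    """Map (x, y) pixels from an upper-left origin (image libraries) to a
    lower-left origin (typical plotting), so plots are not upside down."""
    # Row 0 (top of the image) becomes y = height - 1 (top of the plot).
    return [(x, height - 1 - y) for x, y in points]


print(to_lower_left_origin([(0, 0), (10, 485)], height=486))  # [(0, 485), (10, 0)]
```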
<h3>Edging</h3>
<p>Since I had very little time, I wanted to use as much existing code as possible. To find edges I used GIMP/Python-Fu to posterize each piece 5 times. Each pass simplified the colors of the image further. I then saved out the green channel. This made edge finding as simple as locating the uniform green edge regions and walking around the border of each piece. The edge I found is shown on the right-hand side as black pixels.</p>
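<p>The posterizing happened in GIMP, but once the green channel is thresholded into a piece/background mask, picking out edge pixels is simple. This is a minimal stand-in, not the code used here: it marks every piece pixel that has a 4-connected neighbor in the background or off the image. It returns the pixels unordered; walking them into a closed loop around the piece (for example with Moore-neighbor tracing) is a separate step not shown.</p>

```python
def boundary_pixels(mask):
    """Return (x, y) for every piece pixel that touches background or the image border.

    mask[y][x] is True where a pixel belongs to the piece (assumed to be the
    posterized green channel, already thresholded to booleans).
    """
    height, width = len(mask), len(mask[0])
    edge = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            # 4-connected neighbors: right, left, down, up
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                outside = nx not in range(width) or ny not in range(height)
                if outside or not mask[ny][nx]:
                    edge.append((x, y))
                    break
    return edge


# A 3x3 solid square: every pixel except the center is on the boundary.
square = [[True] * 3 for _ in range(3)]
print(len(boundary_pixels(square)))  # 8
```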
<h3>Segmentation</h3>
<p>Because each piece could match on only part of another piece, I wanted to find good matches by matching local parts of the shape. I segmented each edge by stepping through every edge location and taking the neighboring edge points within a fixed distance. The red X represents the current location, surrounded on either side by an arbitrarily sized window shown as a blue highlight.</p>
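<p>A sketch of the sliding window, assuming the edge is already an ordered, closed list of points and that the distance is counted in points along the contour on each side. The original window size is arbitrary, so half_width here is a hypothetical parameter.</p>

```python
def contour_windows(contour, half_width):
    """Yield (index, window) for each point of a closed contour.

    The window holds the point plus half_width neighbors on either side,
    wrapping around because the contour is a closed loop.
    """
    n = len(contour)
    for i in range(n):
        yield i, [contour[(i + k) % n] for k in range(-half_width, half_width + 1)]


loop = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1)]
print(next(contour_windows(loop, 1)))  # (0, [(1, 1), (0, 0), (1, 0)])
```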
<h3>Rotation Invariance</h3>
<p>Each piece is given to you in a random rotation, so segments from different pieces needed to be fitted together in a way that ignores how they were originally rotated. Next I found a &#8216;landscape horizon&#8217;: a line fitted through the segment&#8217;s points. I chose PCA for the fit because x and y are interchangeable in this problem. <a href="http://en.wikipedia.org/wiki/Ordinary_least_squares">Ordinary least squares</a> only penalizes error in y, whereas I needed <a href="http://en.wikipedia.org/wiki/Total_least_squares">total least squares</a>, which penalizes error in both x and y equally. The horizon is shown as a red line.</p>
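<p>For a 2D point set, the PCA (total least squares) line has a closed form: it passes through the centroid at angle theta = 0.5 * atan2(2*sxy, sxx - syy), where sxx, syy and sxy are the summed squared and cross deviations from the centroid. A minimal sketch, not the original code; the function name is hypothetical:</p>

```python
import math


def principal_axis(points):
    """Fit a line by total least squares (the first principal component).

    Returns (cx, cy, theta): the centroid and the line angle in radians.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Entries of the (unnormalized) 2x2 covariance matrix.
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    # Angle of the eigenvector with the largest eigenvalue. Unlike an OLS
    # slope, this also works for vertical segments.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return cx, cy, theta


_, _, theta = principal_axis([(0, 0), (1, 1), (2, 2)])
print(round(math.degrees(theta), 6))  # 45.0
```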
<h3>Rotate!</h3>
<p>After finding the horizon, it becomes the new relative x axis for that segment. I use a linear transformation to rotate the segment to a normalized, inward-facing view, and then translate it so it is centered at (0, 0). This is the lower left portion of the video. At that point most flat sides lie nearly along the x axis, with any variation easy to see. If you watch only this window during the video, it looks as if a camera is circling the piece, looking in on it with smooth transitions.</p>
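<p>A minimal sketch of the normalization, assuming the segment centroid and horizon angle from the previous step are known. It subtracts the center before rotating, so the result is centered at (0, 0); rotating first and then subtracting the unrotated center leaves the segment off-center. The inward-facing flip is omitted, and the function name is hypothetical.</p>

```python
import math


def normalize_segment(points, center, theta):
    """Translate a segment so `center` is at (0, 0), then rotate by -theta so
    the fitted horizon lies along the x axis."""
    cx, cy = center
    cos_t, sin_t = math.cos(-theta), math.sin(-theta)
    normalized = []
    for x, y in points:
        # Translate first, about the segment's own center...
        dx, dy = x - cx, y - cy
        # ...then rotate about the origin.
        normalized.append((dx * cos_t - dy * sin_t, dx * sin_t + dy * cos_t))
    return normalized


segment = [(1, 1), (2, 2), (3, 3)]
for x, y in normalize_segment(segment, (2, 2), math.pi / 4):
    print(round(x, 3), round(y, 3))  # y is (approximately) zero for every point
```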
<h3>Representing Shape</h3>
<p>Next I built a representation of the surrounding shape at every edge location, based on the normalized x,y from the previous step. The surrounding shape should also include other markings, such as page lines and pen marks, but I haven&#8217;t included them yet. I converted the rotated Cartesian coordinates to a log-polar system and then used binning to produce a log-polar 2D histogram of the surrounding shape. This is the upper right picture. I adjusted my binning to be more sensitive to changes in y near the horizon than in the middle, because most shapes lie fairly close to the x axis after they have been rotated. Each histogram has 100 bins in total, each holding the count of the segment&#8217;s points that fall into it. Angle runs left to right, while distance is the vertical dimension. Inspiration for this shape representation came from <a href="http://www.cs.berkeley.edu/~malik/papers/BMP-shape.pdf">Shape Matching and Object Recognition</a>.</p>
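<p>A simplified log-polar histogram in the spirit of the shape context from that paper. This sketch assumes 10 angle bins by 10 distance bins (to reach 100 total), uniform angle bins over the full circle, and log-spaced distance bins between hypothetical limits r_min and r_max; it does not reproduce the finer binning near the horizon described above.</p>

```python
import math


def log_polar_histogram(points, n_theta=10, n_r=10, r_min=1.0, r_max=64.0):
    """Shape-context style histogram of points around the origin.

    Rows are log-spaced distance bins, columns are angle bins; the result is
    flattened to n_r * n_theta counts (100 with the defaults).
    """
    hist = [[0] * n_theta for _ in range(n_r)]
    log_min, log_span = math.log(r_min), math.log(r_max) - math.log(r_min)
    for x, y in points:
        r = math.hypot(x, y)
        if not r_min <= r < r_max:
            continue  # ignore points at the center or too far away
        angle = math.atan2(y, x) % (2 * math.pi)
        t_bin = min(int(angle / (2 * math.pi) * n_theta), n_theta - 1)
        r_bin = min(int((math.log(r) - log_min) / log_span * n_r), n_r - 1)
        hist[r_bin][t_bin] += 1
    return [count for row in hist for count in row]


h = log_polar_histogram([(2, 0), (0, 2), (0.5, 0)])
print(len(h), sum(h))  # 100 2
```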
<h3>Cluster</h3>
<p>I then clustered the 100-integer shape histograms using k-means, to simplify comparisons between groups of ordered segments. As a result, I can classify a segment&#8217;s shape by a single cluster membership, and an array of segments (a side of a piece) as an ordered list of those cluster assignments. (Not shown)</p>
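<p>A plain Lloyd&#8217;s k-means sketch showing how a side turns into an ordered list of cluster labels. Assumptions: the first k vectors seed the centroids (for determinism; real use would pick random or k-means++ seeds), and the toy 2D vectors stand in for the 100-bin histograms. In practice you would cluster the segments from all pieces together, then read off each side&#8217;s labels in order.</p>

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Lloyd's k-means. Returns (centroids, labels), one label per vector."""
    # Deterministic start for this sketch: the first k vectors become centroids.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Toy stand-ins for histograms, listed in order along one side of a piece.
side = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0]]
_, side_labels = kmeans(side, k=2)
print(side_labels)  # [0, 1, 0, 1, 0]
```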
<h3>Fuzzy Segment-Cluster-Array Matching and Reassembly</h3>
<p>The next task is to match each segment-cluster array with the others that are closest to it, which indicates a general shape match (not shown). This produces lists of candidate matches to aid in manual or automatic assembly.</p>
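<p>The post does not say which fuzzy measure was used, so this sketch swaps in plain Levenshtein edit distance between label sequences and ranks candidate sides by it. The names are hypothetical. Depending on the direction each side is traversed, a mating side may also need to be compared in reverse order; that is an assumption worth checking.</p>

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of cluster labels."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute (free if labels match)
            ))
        previous = current
    return previous[-1]


def rank_candidates(side, others):
    """Sort other sides (name -> label list) from closest to farthest match."""
    return sorted(others, key=lambda name: edit_distance(side, others[name]))


others = {"a": [0, 1, 2], "b": [3, 3, 3, 3], "c": [0, 1, 1, 2]}
print(rank_candidates([0, 1, 1, 2], others))  # ['c', 'a', 'b']
```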
<h3>Running Out Of Time</h3>
<p>
At about this point the contest ended. I am still working on getting the first puzzle assembled completely automatically. I learned a lot and had a good time, but the primary takeaway was this: visualize, visualize, and then when in doubt, visualize. Visualization costs a lot, and building pretty graphs for yourself is a complete waste of time if you are a perfect coder. Countless times during this process I implemented a part, moved on, and only later came back and visualized what I had actually done. Often that immediately showed what went wrong. One mistake it helped me find was that my original rotation and translation rotated correctly but translated based on the original center, not the rotated center, so the histograms were erratic. Once I could see that the middle was not at (0, 0) in the lower left picture, I immediately knew what the issue was. Slowly I came to learn that every minute spent visualizing a problem like this pays off immediately.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.joshuafrankamp.com/blog/unshredding/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>python packaging</title>
		<link>http://www.joshuafrankamp.com/blog/python/</link>
		<comments>http://www.joshuafrankamp.com/blog/python/#comments</comments>
		<pubDate>Fri, 28 Nov 2008 09:12:11 +0000</pubDate>
		<dc:creator>jfrank</dc:creator>
				<category><![CDATA[python]]></category>
		<category><![CDATA[rant]]></category>

		<guid isPermaLink="false">http://www.joshuafrankamp.com/blog/?p=19</guid>
		<description><![CDATA[Python packaging is a pain in the ass.  There are some tools to make it easy, so easy in fact that it becomes even worse&#8230;
easy_install is the easiest thing since sliced bread. What does it do? Everything. It&#8217;s so magic it probably installs itself recursively just for fun.
You want a package?
Ok just type this: [...]]]></description>
			<content:encoded><![CDATA[<p>Python packaging is a pain in the ass.  There are some tools to make it easy, so easy in fact that it becomes even worse&#8230;</p>
<p>easy_install is the easiest thing since sliced bread. What does it do? Everything. It&#8217;s so magic it probably installs itself recursively just for fun.</p>
<p>You want a package?</p>
<p>OK, just type this: easy_install sqlalchemy (for SQLAlchemy, the awesome ORM package for Python)</p>
<p>It magically finds sqlalchemy and installs it INTO your system Python&#8217;s install path.</p>
<p>Why is the standard assumption that if I want to use a Python package that is, say, a dependency for my project, I want to INSTALL IT INTO THE PYTHON running on my system?</p>
<p>What kind of crazy idea is this? It causes all kinds of issues. The first and most obvious is: what if I have two programs that expect different versions of a given package? Since the packages are installed into the runtime and not into my app, you have to know about this issue and work around it.</p>
<p>If packages were managed the Java way, the assumption would be that I want to install the package in the app that I am working on, not into /systemjdk/extensions/somePackage.</p>
<p>The only argument FOR doing it this way that I can think of is saving disk space. Disk space is cheap. </p>
<p>/rant.</p>
<p>OK, so honestly, can anyone tell me why this is?</p>
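<p>For comparison, the per-application install this post asks for can be sketched with the standard library venv module (added in Python 3.3, years after this post; python -m venv is the command-line equivalent). A minimal sketch with a hypothetical helper name: each app gets its own environment, and packages installed there stay out of the system Python.</p>

```python
import os
import venv


def make_app_env(env_dir, with_pip=False):
    """Create an isolated environment for one application; return its python path."""
    # system_site_packages=False: the app does not see packages installed into
    # the system interpreter, and nothing installed here leaks back into it.
    venv.EnvBuilder(system_site_packages=False, with_pip=with_pip).create(env_dir)
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")
```

<p>With with_pip=True, running the returned interpreter as python -m pip install sqlalchemy installs the package into that app&#8217;s environment only, so two apps can depend on different versions.</p>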
]]></content:encoded>
			<wfw:commentRss>http://www.joshuafrankamp.com/blog/python/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>GSA Statistics python scripts</title>
		<link>http://www.joshuafrankamp.com/blog/gsa-statistics-python-scripts/</link>
		<comments>http://www.joshuafrankamp.com/blog/gsa-statistics-python-scripts/#comments</comments>
		<pubDate>Mon, 11 Feb 2008 23:06:23 +0000</pubDate>
		<dc:creator>jfrank</dc:creator>
				<category><![CDATA[python]]></category>
		<category><![CDATA[gsa statistics]]></category>

		<guid isPermaLink="false">http://www.joshuafrankamp.com/blog/?p=9</guid>
		<description><![CDATA[I have been tasked with upgrading Google search appliances and in doing so I wanted to calculate some statistics.
Comparing Crawled Pages Across GSA&#8217;s (or mini&#8217;s)
You could use this to compare any two url xml files from the Crawl Diagnostics &#8211;&#62; Export All Pages To a File
I could have called them A and B, but I [...]]]></description>
			<content:encoded><![CDATA[<p>I have been tasked with upgrading Google Search Appliances, and in doing so I wanted to calculate some statistics.</p>
<p><strong>Comparing Crawled Pages Across GSAs (or Minis)</strong></p>
<p>You can use this to compare any two URL XML files exported via Crawl Diagnostics &#8211;&gt; Export All Pages To a File.<br />
I could have called them A and B, but I was comparing a Mini and a GSA at the time, so that is the naming convention in the script. It uses simple Python sets to find what was crawled on one machine but not the other, and so on. It expects two local files, mini-urls.xml and gsa-urls.xml; either could be a GSA or a Mini export.</p>
<pre>
import xml.dom.minidom

def main():
    miniUrlSet = extractUrls(xml.dom.minidom.parse('mini-urls.xml'))
    gsaUrlSet = extractUrls(xml.dom.minidom.parse('gsa-urls.xml'))
    print('mini', len(miniUrlSet))
    print('gsa', len(gsaUrlSet))
    print('intersections', len(miniUrlSet &amp; gsaUrlSet))
    print('mini is subset of gsa?', miniUrlSet &lt;= gsaUrlSet)
    gsaNotMini = gsaUrlSet - miniUrlSet
    print('things in gsa but not mini:', len(gsaNotMini))
    for url in sorted(gsaNotMini):
        print(url)
    miniNotGsa = miniUrlSet - gsaUrlSet
    print('things in mini but not gsa:', len(miniNotGsa))
    for url in sorted(miniNotGsa):
        print(url)

def extractUrls(dom):
    # every loc element in the export holds one crawled URL
    urls = set()
    for node in dom.getElementsByTagName('loc'):
        # each loc node has a single child text node; its data property is the URL
        urls.add(node.firstChild.data.strip())
    return urls

if __name__ == '__main__':
    main()</pre>
<p><strong>Calculating Search Keyword Density Over Time</strong></p>
<p>This runs against an export from the Search Logs feature under Status and Reports. Export the time frame you want to analyze, then run the script against the log file. You get the top 100 keywords that people searched for, along with how many times each was searched. It expects a local file named log.log.</p>
<pre>
import re
from collections import Counter
from datetime import datetime
from urllib.parse import unquote_plus

# Capture the q= parameter whether it comes first (?q=), in the middle (&amp;q=...&amp;)
# or last in the query string; stop at the next &amp; or whitespace.
qReg = re.compile(r'[?&amp;]q=([^&amp;\s]*)')

def getQueryCounts(f):
    words = Counter()
    for line in f:
        match = qReg.search(line)
        if match and match.group(1):
            # decode %XX escapes and '+' so equivalent queries are counted together
            words[unquote_plus(match.group(1))] += 1
    return words

start = datetime.now()
with open('log.log') as f:
    words = getQueryCounts(f)
print('Top Words')
print('---------')
for word, num in words.most_common(100):
    print(word, num)
print('runtime:', datetime.now() - start)
input('press enter')</pre>
]]></content:encoded>
			<wfw:commentRss>http://www.joshuafrankamp.com/blog/gsa-statistics-python-scripts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>python. (not ruby)</title>
		<link>http://www.joshuafrankamp.com/blog/python-not-ruby/</link>
		<comments>http://www.joshuafrankamp.com/blog/python-not-ruby/#comments</comments>
		<pubDate>Thu, 20 Dec 2007 08:19:16 +0000</pubDate>
		<dc:creator>jfrank</dc:creator>
				<category><![CDATA[python]]></category>
		<category><![CDATA[vmware setup]]></category>

		<guid isPermaLink="false">http://www.joshuafrankamp.com/blog/?p=7</guid>
		<description><![CDATA[I finally got shared folders up and running on my virtual fedora box. This required a little kernel/kernel headers upgrading, and compiling the vmware tools for my box, but it works like a charm. It even gives me cut and paste to the win xp desktop, which is&#8230; cool i guess.
I decided to go with [...]]]></description>
			<content:encoded><![CDATA[<p>I finally got shared folders up and running on my virtual Fedora box. This required a little upgrading of the kernel and kernel headers, plus compiling VMware Tools for my box, but it works like a charm. It even gives me cut and paste to the Windows XP desktop, which is&#8230; cool, I guess.</p>
<p>I decided to go with Python, which has a plethora of tools. <a href="http://pylonshq.com/">Pylons</a> is a piecemeal web framework that is closest to my liking; <a href="http://code.google.com/p/sqlalchemy-migrate/">migrate</a> is a library for schema migration that works nicely with <a href="http://www.sqlalchemy.org/">sqlalchemy</a>, a monster ORM.</p>
<p>sqlalchemy is cool because you can use parts of it totally independently. Coming from a CF (ColdFusion) background, I am used to having nice named, pooled connections that I don&#8217;t have to think about. The base layer of sqlalchemy is exactly that: a database-type abstraction plus connection pooling. On top of that you are free to go crazy with ORM-ish things or not; it&#8217;s up to you.</p>
<p>It is so reusable that many people have written layers on top of it for even more magical coding&#8230; but it&#8217;s nice to have all the options.</p>
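<p>A small illustration of using only that base layer, assuming a recent SQLAlchemy (1.4 or 2.x) is installed. The in-memory SQLite URL and the greet function are hypothetical; pointing create_engine at a different database URL swaps the dialect without changing the calling code.</p>

```python
from sqlalchemy import create_engine, text

# The engine is the base layer: it knows the database type (the dialect) and
# keeps a pool of connections that callers borrow and give back.
engine = create_engine("sqlite:///:memory:")


def greet(name):
    # Plain SQL through the pooled engine; no ORM classes or mappings involved.
    with engine.connect() as conn:
        return conn.execute(text("SELECT 'hello ' || :name"), {"name": name}).scalar()


print(greet("world"))  # hello world
```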
<p>Migrate, a knockoff of Ruby on Rails migrations, is the real find, though. It looks young as far as projects go, but I watched a demo of it used in another Python framework, and it was exactly what I expected, like something we use at work for CF. It has a schema version table that holds the app&#8217;s current schema version, and version files with &#8216;up/down&#8217; methods. My main issue with many of these &#8216;scaffoldmagic&#8217; things is that no one bothered to mention how you get from one version to the next&#8230; or back again. You can&#8217;t build the model right the first time, and iterative programming is a fact of life. This library addresses that.</p>
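<p>The version-table idea itself is small enough to sketch. This is not sqlalchemy-migrate&#8217;s API; it is a generic, standard-library-only illustration with a hypothetical MIGRATIONS list of up/down SQL pairs and no error handling: the stored version decides which &#8216;up&#8217; or &#8216;down&#8217; steps run to reach a target version.</p>

```python
import sqlite3

# Hypothetical migration list: entry i moves the schema from version i to
# i + 1 ("up") and back again ("down").
MIGRATIONS = [
    ("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)", "DROP TABLE users"),
    ("CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)",
     "DROP TABLE posts"),
]


def current_version(conn):
    """Read the schema version, creating the one-row version table if needed."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT version FROM schema_version").fetchone()
    if row is None:
        conn.execute("INSERT INTO schema_version (version) VALUES (0)")
        return 0
    return row[0]


def migrate_to(conn, target):
    """Run 'up' steps or 'down' steps until the schema is at `target`."""
    if not 0 <= target <= len(MIGRATIONS):
        raise ValueError(f"unknown schema version {target}")
    version = current_version(conn)
    while version < target:
        conn.execute(MIGRATIONS[version][0])  # up
        version += 1
    while version > target:
        version -= 1
        conn.execute(MIGRATIONS[version][1])  # down
    conn.execute("UPDATE schema_version SET version = ?", (version,))
    conn.commit()
    return version


conn = sqlite3.connect(":memory:")
print(migrate_to(conn, 2), migrate_to(conn, 1))  # 2 1
```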
]]></content:encoded>
			<wfw:commentRss>http://www.joshuafrankamp.com/blog/python-not-ruby/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

