Hey folks,

I'm rewriting a short URL processor for my job. I originally wrote it as a multi-threaded Perl script, which works but has socket problems that cause memory leaks. Since I'm rebuilding it to use memcache, and since I was learning Python outside of work anyway, I figured I'd rewrite it in Python.

I'm using BaseHTTPServer, overriding do_GET and do_POST, and I want to set up a custom logging mechanism so that I don't have to rewrite my separate log parser right away (I'll eventually rewrite that in Python as well).

The problem I'm having, though, is that the BaseHTTPServer setup is outputting what appears to be an Apache-style log to STDOUT, and the logging.debug or logging.info calls I make in the code are also going to STDOUT, despite my attempts to override the defaults with logging.basicConfig() by setting a filename, etc.

Here are the basics of what I'm doing. Forgive my code; I've already been told it's "ugly". I'm new to Python and come from a background of Perl/PHP.

import struct
import string,cgi,time
import psycopg
import logging
import re
import memcache
from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
from time import strftime,localtime

class clientThread(BaseHTTPRequestHandler):
        def log_my_request(self, method, request, short_url, http_code, long_url, cached, notes):
                # one log line per request; the format string below is only a placeholder
                logging.info("%s %s %s %s %s %s %s %s",
                        time.strftime("%Y-%m-%d %H:%M:%S", localtime()),
                        method, # get or post
                        request, # requested entity
                        short_url, # matching short_url based on entity, if any
                        http_code, # 200, 301, 302, 404, etc
                        long_url, # url to redirect to, if there was one
                        cached, # 'hit', 'miss', 'miss-db', 'error'
                        notes) # extra notes for the log file only

        def do_GET(self):
                # logic goes here for finding a short url from memcache, then writing the appropriate
                # output data to the socket, then logging happens:
                pass

def main():
        if mc.get('dbcheck'): # memcache already has some data
                print("memcache already primed with data")
        else: # nothing in memcache, so load it up from database
                print('Connecting to PG')
                cur.execute("SELECT count(*) FROM short_urls")
                mycount = cur.fetchone()[0] # fetchone() returns a row; take its only column
                print("fetching %s entries" % mycount)
                cur.execute("SELECT short_url,long_url FROM short_urls")
                giant_list = cur.fetchall()

                # cache a marker that tells us we've already initialized memcache with db data

                # I'm sure there's a MUCH more efficient way of doing this ... multi-set of some sort?
                for short_url, long_url in giant_list:
                        if short_url and long_url:
                                mc.set(short_url, long_url)

                print("finished retrieving %s entries plus set up a new dictionary with all values" % mycount)

        #{{ set up the socket, bind to port, and wait for incoming connections
        try:
                server = HTTPServer(('',8083), clientThread)
                print('short url processing has begun')

                # this is where I try to tell Python that I only want my message in my log:
                # no INFO:username prefix, etc., and also to write it to a file
                logging.basicConfig(format='%(message)s', filename='/tmp/ian.txt')

                server.serve_forever()
        except KeyboardInterrupt:
                print('^C received, shutting down server')
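
On the "multi-set" question in the comment above the priming loop: the python-memcached Client has a set_multi() method that takes a dict of key/value pairs, which looks like the kind of batch call being asked about. What follows is only a sketch under that assumption. The names prime_cache and batch_size are made up for illustration, and the batch size of 1000 is an arbitrary choice, not a tuned value.

```python
def prime_cache(mc, rows, batch_size=1000):
    """Load (short_url, long_url) rows into memcache in batches.

    `mc` is assumed to be a python-memcached Client; any object with a
    set_multi(mapping) method will do. Returns the number of key/value
    pairs handed to set_multi().
    """
    sent = 0
    batch = {}
    for short_url, long_url in rows:
        if not (short_url and long_url):
            continue  # skip rows with an empty key or value
        batch[short_url] = long_url
        if len(batch) >= batch_size:
            mc.set_multi(batch)
            sent += len(batch)
            batch = {}
    if batch:  # flush the final partial batch
        mc.set_multi(batch)
        sent += len(batch)
    return sent
```

With the names used in main(), the call would be something like prime_cache(mc, giant_list).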

My code runs without any errors, though I have left out some code that I didn't feel was relevant to this email: the logic for seeing if a short url exists in memcache, trying to fetch from the db if there was no match, and, if the db lookup also fails, force-deleting short urls from memcache based on other instructions, that sort of thing. None of it deals with logging or the BaseHTTPServer code.
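
For the file-logging half of the problem, one option that avoids logging.basicConfig() altogether is a named logger with its own FileHandler. Two documented behaviours of the logging module are relevant here: the root logger defaults to the WARNING level, so info() and debug() records are dropped unless a level is set, and basicConfig() does nothing once the root logger already has a handler (which an earlier module-level logging.info() call can install implicitly). The sketch below is a minimal illustration, not a drop-in fix; make_request_logger and the logger name "shorturl" are placeholder names.

```python
import logging


def make_request_logger(path, name="shorturl"):
    """Return a logger that writes bare messages to `path` and nowhere else."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)   # without this, the WARNING default applies
    logger.propagate = False        # keep records away from any root (console) handler
    if not logger.handlers:         # avoid stacking handlers on repeated calls
        handler = logging.FileHandler(path)
        handler.setFormatter(logging.Formatter("%(message)s"))
        logger.addHandler(handler)
    return logger
```

Inside log_my_request() this would be used as request_log.info(...) on the returned logger, in place of the module-level logging.info(...).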

To recap, the code runs, redirects are working, but ALL output goes to STDOUT. I can understand that print statements would go to STDOUT, but the BaseHTTPServer seems to want to write the Apache-style log to STDOUT, and my logging.info() call also prints to STDOUT instead of my file.
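
On the Apache-style lines: BaseHTTPRequestHandler produces them in its log_message() method, which writes to sys.stderr (not STDOUT, although both end up on the terminal). Overriding that method in the handler subclass is one way to redirect or silence them. This is a minimal sketch, assuming a logger named "shorturl.access" (a placeholder) that is given a file handler elsewhere; QuietHandler is likewise a made-up name, and the import fallback covers both the Python 2 module name used in this post and Python 3's http.server.

```python
import logging

try:
    from http.server import BaseHTTPRequestHandler  # Python 3
except ImportError:
    from BaseHTTPServer import BaseHTTPRequestHandler  # Python 2

access_log = logging.getLogger("shorturl.access")  # placeholder logger name


class QuietHandler(BaseHTTPRequestHandler):
    def log_message(self, format, *args):
        # The base class writes this line to sys.stderr. Hand it to the
        # logging module instead; replace the body with `pass` to drop it.
        access_log.info(format, *args)
```

The same override could go directly on clientThread, next to do_GET and do_POST.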

I'd love to hear any thoughts from people that have had to deal with this. The logging is the last piece of the puzzle for me.

