Apr 122011
Yesterday I needed to decipher a log file in which a dozen threads were simultaneously logging messages. Surely there must be tools for this out there. But I couldn’t find one, so I wrote a Python script to indent each line differently based on thread id. I then looked at it in LibreOffice, but just reading on the terminal would have sufficed. Here’s a trivial demo:
2011-04-11 09:40:12,004 [INFO] [http-3] Hello 2011-04-11 09:40:13,554 [DEBUG] [http-1] Wikipedia (pronounced /ˌwɪkɨˈpiːdi.ə/ WIK-i-PEE-dee-ə) 2011-04-11 09:40:13,605 [INFO] [http-2] PCC Natural Markets 2011-04-11 09:40:13,688 [INFO] [http-3] World 2011-04-11 09:40:14,015 [INFO] [http-2] began as a food-buying club of 15 families in 1953. 2011-04-11 09:40:16,032 [INFO] [http-1] is a multilingual, web-based, free-content encyclopedia project based on an openly editable model 2011-04-11 09:40:17,775 [INFO] [http-2] Today, it's the largest consumer-owned natural food retail co-operative in the United States.
becomes:
09:40:12,004 Hello 09:40:13,554 Wikipedia 09:40:13,554 (pronounced /ˌwɪkɨˈpiːdi.ə/ WIK-i-PEE-dee-ə) 09:40:13,605 PCC Natural Markets 09:40:13,688 World 09:40:14,015 began as a food-buying club of 15 families in 1953. 09:40:16,032 is a multilingual, web-based, free-content encyclopedia project based on an openly editable model 09:40:17,775 Today, it's the largest consumer-owned natural food retail co-operative in the United States. http-3 http-1 http-2
Just read down the columns vertical for a clear chain of events on each thread. Code below is maintained on GitHub.
#!/usr/bin/env python # Looks at token in a particular position in each line and indents the line # differently for each unique identifier found in the file. For example, given # a log file which contains a thread identifier, contents for each thread will # be separated out into distinct columns. # # Lines not matching the pattern (e.g. stack traces) are presumed to have # occurred at the time of and belong to the same identifier as the preceding line. # # Default pattern is: <date> <stamp> ignored [thread_id] <message> # Yielding output: <stamp><tabs><message> # # An alternate regular expression can be supplied on the command line; it must # include named capture groups 'stamp', 'id', and 'message'. The default regex # is: ^\S+ (?P<stamp>\S+) \S+ \[(?P<id>[^\]]+)\] (?P<message>.*) # # If the input contains very long lines it can be helpful to truncate them # beforehand by e.g. piping through awk '{print substr($0,0,400)}' import sys,re if len(sys.argv) > 1: pattern = re.compile(sys.argv[1]) else: pattern = re.compile('^\S+ (?P<stamp>\S+) \S+ \[(?P<id>[^\]]+)\] (?P<message>.*)') delimiter='\t' max_level = 1 categories = {} legend = None indent = "" stamp = "" try: for line in [l.strip() for l in sys.stdin]: m = pattern.match(line) if m: stamp,identifier,message = [m.group(x) for x in ['stamp','id','message']] indent = categories.get(identifier) if not legend: legend = " " * len(stamp) if not indent: indent = delimiter * max_level categories[identifier] = indent max_level += 1 legend += delimiter + identifier print stamp + indent + message else: # carry over stamp and indent from previous line print stamp + indent + line print legend except IOError: pass