Thursday, July 19, 2012

Timeline Analysis - What's missing & What's coming..

If you missed my SANS 360 on timeline analysis...

What the heck is timeline analysis??  

Timeline creation and presentation is the concept of normalizing event data by time and presenting it in chronological order for review. This sequence of event data becomes a narrative, a “story,” of events over a period of time. It can be used to put events into context, interpret complex data, and identify anomalies or patterns. The concept of timeline creation and presentation is widely used across many practices, including Digital Forensics and Incident Response (DFIR).

For DFIR purposes, timeline creation and presentation primarily consists of recursively scanning through a file system (or linearly through a physical or partition disk image) and extracting forensic artifacts and associated timestamp data. The data is then converted to a normalized, structured format in which it can subsequently be reviewed in chronological order.
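To make that concrete, here is a minimal Python sketch (the events and fields are made up for illustration) showing that "normalize by time and present chronologically" is, at its core, a sort on a common timestamp:

```python
from datetime import datetime

# Hypothetical, simplified events pulled from three different artifact sources.
# Real tools extract far richer metadata; this only illustrates the concept.
events = [
    {"timestamp": datetime(2012, 7, 1, 14, 3, 22), "source": "FILE",
     "desc": "MACB change on C:/Users/dave/secret.docx"},
    {"timestamp": datetime(2012, 7, 1, 13, 59, 5), "source": "EVT",
     "desc": "Successful logon for user dave"},
    {"timestamp": datetime(2012, 7, 1, 14, 5, 47), "source": "WEBHIST",
     "desc": "Visit to http://webmail.example.com"},
]

# Normalizing by time and presenting chronologically boils down to a sort.
for e in sorted(events, key=lambda e: e["timestamp"]):
    print(e["timestamp"].isoformat(), e["source"], e["desc"], sep=" | ")
```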

Creation and Filtering

A tool named “log2timeline”, by Kristinn Gudjonsson, is an example of a framework for the automated creation of forensic timeline data. If you are interested in learning more about timeline creation and analysis using log2timeline, I suggest starting with Kristinn's list of resources or taking the NEW SANS 508 class (here's a review I authored based on my experience). The main purpose of log2timeline is to provide a single interface for parsing the various log files and artifacts found in evidence, such as file system data, Windows event logs, Windows registry last-written times, and Internet history. The data is then output to a structured format such as CSV, SQLite, or TLN.
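For a feel of what that CSV output looks like programmatically, here is a short, hedged sketch that walks a log2timeline CSV in Python. It assumes the documented l2t_csv column layout and a hypothetical file name:

```python
import csv

# Hedged sketch: assumes the documented l2t_csv layout, whose header row is
# date,time,timezone,MACB,source,sourcetype,type,user,host,short,desc,
# version,filename,inode,notes,format,extra. "timeline.csv" is hypothetical.
with open("timeline.csv", newline="") as fh:
    for row in csv.DictReader(fh):  # one normalized, timestamped event per row
        print(row["date"], row["time"], row["source"], row["short"])
```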

After the timeline is created, it can be filtered using “l2t_process”. This tool allows a user to “reduce” the size of the timeline by creating a subset of the data responsive to certain keywords or time/date restrictions. For instance, a 5 GB timeline file could be filtered down to 3 GB by running l2t_process with a date filter that only shows events that occurred between 2009 and 2010.
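Under the hood, a date filter amounts to something like the following rough Python sketch (this is not l2t_process itself, just the same idea; the file names and the MM/DD/YYYY date assumption are mine):

```python
import csv
from datetime import datetime

def filter_by_date(in_path, out_path, start, end):
    """Copy only rows whose date falls within [start, end] - the same idea
    as running l2t_process with a date range filter."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Assumes l2t_csv dates formatted as MM/DD/YYYY.
            when = datetime.strptime(row["date"], "%m/%d/%Y")
            if start <= when <= end:
                writer.writerow(row)

filter_by_date("timeline.csv", "timeline_2009-2010.csv",
               datetime(2009, 1, 1), datetime(2010, 12, 31))
```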

Presentation

At the time of writing, there is no commercial or open-source tool specifically designed for DFIR professionals to review the output of log2timeline, or forensic timeline data in general. Therefore, DFIR professionals are limited to using tools not specifically designed for forensic timeline presentation, such as Microsoft Excel, Splunk, or grep. This limitation decreases productivity and increases the risk of error.

Some deficiencies of current presentation options
Microsoft Office Excel is a common method of reviewing forensic timeline data. Although Microsoft Excel is an intuitive and robust application, it has fundamental limitations. For example, the average output from log2timeline (based on a 320 GB hard drive) is 5-10 million rows of data, equaling approximately 3-5 GB. Microsoft Excel 2010 has a limitation of 1,048,576 rows, and version 2003 has a limitation of 65,536 rows. This restricts DFIR professionals to viewing parts (“mini-timelines”) of the overall timeline, often based on filtering criteria (pivot points) such as date ranges, keyword searches, or source types. As a result, context can be lost by not having the entire timeline to review. It can also make reviewing timeline data an iterative process, requiring review of multiple mini-timelines.

Slide from my SANS 360 talk
On November 19, 2011, Klien&Co published an article documenting how to empower Splunk to review timeline data. Splunk is a robust, enterprise-grade application that collects and indexes data from various data sources. Splunk stores both raw and rich data indexes in an efficient, compressed, filesystem-based datastore, with optional data signing and auditing to prove data integrity. However, Splunk is complicated to use, as it requires knowledge of a Command Line Interface (CLI) and specific training on the tool. It is also difficult for a user to generate reports and administer.

grep, a CLI tool, is another option for parsing and reviewing forensic timeline data. However, for the average DFIR professional who is not familiar with the CLI, it can be a complicated and inefficient method.

The Need

A better #$%!$@ way to review timelines [period].

The goal of my first phase of development was to create a forensic presentation tool specifically for timeline data. This would be a robust Graphical User Interface application that does the following:
  • Import structured timeline data, such as a log2timeline CSV file, into a structured database. This would allow for fast indexed searches across large data sets (see the sketch after this list).
  • Upon import, the application would allow the user to preserve source information. This would allow a practitioner to review data from multiple data sources in a SUPER timeline and easily correlate events across these different sources.
  • Subsequently, the forensic timeline data would be displayed for review in a Graphical User Interface (GUI) data grid similar to Microsoft Excel. It would have familiar features such as the ability to sort, filter, and color code rows by column headings or values. For instance, a user could import timeline data from 10 different hosts, filter to only show successful logons (based on evt log source types) between 2009 and 2010, and color code the results by host to make the review process easy on the eyes :-)
  • Unlike Excel, make filtering transparent: visually see and understand how the buttons you press interact with the database and the results you are presented with -- an SQL query builder.
  • The interface would also be intuitive to the extent that a user could create user-defined tags, comments, and bookmarks for the purpose of reporting, filtering, and assisting review. For instance, a user could create the tag “evidence of IP theft” and subsequently select one or multiple rows in the data grid and associate them with this tag -- just like you can in eDiscovery!!
  • At any point, generate reports or export timeline data from the grid view. For example, export a filtered subset of data back into CSV format to open in Excel or send to someone else.
  • Ability to create custom queries, so the user is not limited by the GUI -- think plugins!!!
  • Also, basic charting capability, because "a picture can sometimes tell a thousand words".
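To give a feel for the plumbing behind the first few bullets, here is a rough sketch of importing an l2t CSV into SQLite and running an indexed, filtered query. The table and column names are my own invention, not the tool's actual schema:

```python
import csv
import sqlite3

conn = sqlite3.connect("timeline.db")

# Hypothetical schema: a cut-down set of l2t_csv columns, enough to illustrate.
conn.execute("""CREATE TABLE IF NOT EXISTS events (
                    date TEXT, time TEXT, source TEXT, sourcetype TEXT,
                    host TEXT, user TEXT, desc TEXT)""")

with open("timeline.csv", newline="") as fh:
    rows = ((r["date"], r["time"], r["source"], r["sourcetype"],
             r["host"], r["user"], r["desc"]) for r in csv.DictReader(fh))
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Indexes are what make searches across millions of rows fast.
conn.execute("CREATE INDEX IF NOT EXISTS idx_date ON events (date)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_source ON events (source)")
conn.commit()

# Example filter: event log entries mentioning a logon, limited to 2009-2010.
# Dates are stored here as MM/DD/YYYY strings, so we match the year portion.
query = """SELECT date, time, host, user, desc FROM events
           WHERE source = 'EVT' AND desc LIKE '%logon%'
             AND substr(date, 7, 4) IN ('2009', '2010')
           ORDER BY host"""
for row in conn.execute(query):
    print(row)
```

A GUI filter button would simply assemble a query like the one above, which is exactly the kind of transparency the query-builder bullet is after.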
The Solution
Let me start off by asking: does anyone know what it feels like to stare at code for 5 hours (on a Saturday afternoon when it's 80 degrees and sunny out, with no bathroom/food breaks, and all of your friends are at the beach) trying to figure out why your code is broken, only to find out it's because you're missing a single curly bracket somewhere? Well, that's been my life for the last 12 months since I started my coding project. If you don't believe me -- ask my friends. Oh wait, I don't have any anymore - this tool has ruined my life :-)
Picture of new GUI with undockable panes for multiple monitor setups

If you have not had an opportunity to watch the recorded video (1:06:38 mark) of my SANS DFIR Summit 360 talk or review the slides, that is where I introduced the proof-of-concept tool I have been coding. Here is a short video (no sound) of the tool in action (note this is the first release - the GUI has significantly changed since).

The tool consists of (see the rough sketch after this list):
  • WX GUI front-end
  • Python code
  • SQLite backend
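As a toy illustration of how those three pieces fit together (this is not the tool's actual code), a wxPython grid can be populated straight from a SQLite query, with rows color-coded by source type:

```python
import sqlite3
import wx
import wx.grid

class TimelineFrame(wx.Frame):
    """Minimal sketch: show SQLite-backed timeline rows in a wx data grid.
    Assumes the hypothetical "events" table from the earlier import sketch."""
    def __init__(self, db_path):
        super().__init__(None, title="Timeline Review (sketch)", size=(900, 400))
        grid = wx.grid.Grid(self)

        conn = sqlite3.connect(db_path)
        rows = conn.execute(
            "SELECT date, time, source, host, desc FROM events "
            "ORDER BY date, time").fetchall()

        cols = ["date", "time", "source", "host", "desc"]
        grid.CreateGrid(len(rows), len(cols))
        for c, label in enumerate(cols):
            grid.SetColLabelValue(c, label)
        for r, row in enumerate(rows):
            for c, value in enumerate(row):
                grid.SetCellValue(r, c, str(value))
            # Toy version of auto-highlighting: color event log rows.
            if row[2] == "EVT":
                for c in range(len(cols)):
                    grid.SetCellBackgroundColour(r, c, wx.Colour(255, 255, 180))

app = wx.App()
TimelineFrame("timeline.db").Show()
app.MainLoop()
```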
Shout out to my high school science teacher, Mr. Wilson, who introduced me to Python. I used Python because it's cross-platform. My development and testing platform is Windows 7. At the DFIR Summit, I gave Tom Yarrish a copy of my tool and within minutes he had it running on his MacBook Pro running OS X. Pretty cool..

You can see auto-highlighting by source type and POC charting here..
I will never understand why, in the year 2012, people still prefer to type things into green-and-black console windows. Therefore, I used WX as a GUI front-end. Why did I use WX? Simple: it's the first thing that came up in my Google search for "Python+GUI+programming". In hindsight, I wish Google had just told me to quit.

I also used SQLite3 as a back-end because: A) it's lightweight - no install required; B) you know it's fast if high-frequency traders use it; and C) it's scalable enough for reviewing timeline data.
Overview of current process and development phases:
 


Overview of data flow:

The items in red are what I am working on in Phase 2.

When can you get it?
 
I currently have someone doing a code review. It will be posted VERY soon on the log2timeline-tools Google Code page - http://code.google.com/p/l2t-tools/

As I stated in my SANS 360 talk, "it will be free to corporate and private but LE has to pay for this one.. you guys need to pay me back for all those parking tickets!" -- I might also post a donation page or something.. so I can buy myself a vacation.


Also, I really look forward to feedback, positive or negative, so I can improve and include your thoughts in my future employer performance discussions so I don't wind up becoming a Walmart greeter :-)


Saturday, July 7, 2012

SANS DFIR Summit, Forensic4cast award, my presentations, now back to work!


The SANS Digital Forensic Incident Response Summit in Austin ROCKED! Rob Lee and all the SANS folks put on an awesome show.

SANS 508

For me it started with the new SANS 508 class. If you haven't seen the advertisements, check out "The APT is already in your network. Time to go hunting -- Learn how in the new training course SANS FOR508". All I can say is it's true.. Here are a few reasons why:

  • Conducting APT investigations requires "outside of the box" thinkers, and 508 framed that picture well with a cutting-edge curriculum. For instance, you learn necessities that are not even in commercial products yet, such as Volume Shadow Copies. How are you going to mount those with EnCase or FTK??
  • I have experience teaching and know first-hand how difficult it is to create labs. It was obvious that months, if not years, of effort were put into the new 508 lab. Also, having had real-world experience conducting APT investigations, I can tell you that the labs are so real it's scary! No joke, 2 weeks later and I am still playing with the lab images provided.
  • Speaking of labs, almost every section in the course has a lab associated with it. So not only do you learn the concepts, you get to apply them hands-on. The labs aren't point-and-click like some other training providers'; these actually require thinking! I was also told that the labs build on each other throughout other SANS courses. For example, the malware you recover in the SANS 508 lab is the same malware you analyze in the SANS 610 - Reverse Engineering class.
  • Unlike other classes, where I have always been the first one to finish and solve all the problems, I can honestly say I was challenged in 508 (yes, Rob Lee, I was paying attention between conference calls :-)). For me, the memory analysis (Volatility and Redline) section was the biggest learning curve. Advanced topics like these can be eye-openers to the fact that there is always room to improve skills and keep learning at any level.
  • I was most impressed by how all the section content (e.g. file systems, memory analysis, timeline analysis, etc.) came together. Every investigation starts the same way, or at least should, with an analysis plan. 508 did a great job explaining how and when to use the various tools/methods introduced, from a tactical perspective.
  • Oh yeah, how can I forget.. We also got a copy of F-Response Tactical, a 64 GB thumb drive, and the book File System Forensic Analysis.. now that is awesome!!
Overall, SANS 508 was an awesome class. I have to give a shout-out to Alissa Torres, who taught part of the memory analysis section and did a GREAT job. She also gave one of the best presentations at the summit, "Reasons Not to "Stay in Your Lane" as a Digital Forensics Examiner".

Also, the SANS @ Night presentation by Paul Henry on setting up VMware ESXi on Mac Minis was really different and cool. I might have to go buy a few Mac minis now..

SANS DFIR Summit

Now on to the summit part.. Having been to a lot of industry conferences, I can say that if you are a person who enjoys hard-core DFIR and don't want to be annoyed by eDiscovery nuisances, the SANS DFIR Summit is the premier place to learn, network, and collaborate.

In my opinion, the networking and collaboration opportunities alone are worth attending for. By all means I have some good friends at home in Chicago, but they're not geeks. Sometimes all I want to do is talk DFIR. On that note, I did nothing but talk geek with folks (too many to list) whom I had never met in person before, as well as old friends. In fact, I think David Kovar, Tom Yarrish, and I collaborated a little too much... we could keep a team of programmers (or maybe just Steve Gibson) busy for the next year with all the great ideas we cooked up. Speaking of Steve Gibson, thanks for being a great local host in your home town, Austin.

Stemming from a conversation with David Kovar and Rob Lee's panel, if I could give one suggestion for next year, it would be great to have some round-table discussions on various topics. For instance, bring representatives from Guidance, Access Data, Internet Evidence Finder, etc. into a room with the community and discuss, in an open forum, how we can standardize things such as timeline outputs, evidence file formats, etc. Alternatively, perhaps have small break-off round-table discussions (focus groups) led by experts. For example, Kristinn could lead a break-off session on log2timeline, with a bunch of fans or interested users talking openly about thoughts, wish-list items, challenges, etc.

Oh yeah, if you haven't seen the Closing Remarks from the SANS DFIR Summit, they are a MUST WATCH!!!

Forensic4cast Award

My Forensic4cast Award!
I am happy to announce that I won a Forensic4cast Award last week -- for writing the "best forensic article of the year". For anyone who is not familiar, the article was titled Digital Forensics Sifting: Cheating Timelines with log2timeline and had an accompanying reference guide that could be downloaded.

Thank you, everyone who voted for me. It's great motivation to continue taking initiative. What's next? Vote "davnads" for prezident!

Also thank you Lee Whitfield for putting this all together!

My DFIR presentations CEIC and SANS

I received positive feedback on a blog post I published a few months ago about Intellectual Property theft. So I decided to expand on this topic at Guidance Software's CEIC user conference. Ed Goings, Rick Lutkus, Dave Skidmore, and I organized a panel titled "Investigating Intellectual Property Theft". This turned out fantastic with our combined legal, corporate, and consulting perspectives. In fact, I was shocked we had people standing at the back of the room at 8 AM in Las Vegas. I wasn't even sure if I would make it ;-) If you would like a copy of the presentation, feel free to contact me.

Chad Tilbury, an AWESOME forensicator and SANS instructor, invited me to speak on his SANS DFIR Summit panel on "Building and Maintaining Digital Forensic Labs". I was excited to hear from people, including Ken Johnson (who blogged about it in "DFIR Summit - Through the Eyes of a Summit Noob"), that they found this presentation valuable.

I also gave a SANS 360 talk on the tool I have been developing. This was recorded for viewing, and my presentation can be found last (1:06:38 mark). Sorry the sound quality is not so great, and the SANS laptop had technical difficulties (awk!) displaying the embedded video in my presentation. The actual embedded movie can be downloaded as well (there is no sound).



A slide from my SANS 360 talk
More to come on my tool soon -- In summary, if you are not familiar with Kristinn Gudjonsson’s log2timeline, a framework for the automated creation of timeline data, it's the "go to" tool for anything DFIR timeline analysis. If you have used the tool, you’ll also know that the output for even just one computer can be a tremendous amount of data to review. Also, there is no method specifically designed for reviewing timeline data.

Therefore, I created a proof-of-concept front-end for log2timeline data output. It allows for easy filtering and review of timeline data. It is coded in Python (cross-platform) with a SQLite database backend and a WX GUI. An example of its use is aggregating timeline data from multiple hosts into one timeline to see lateral movement (a sketch of such a query follows).
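For instance, against the kind of hypothetical schema sketched earlier (not the tool's actual database), spotting lateral movement is largely a matter of ordering logon events from every imported host in one view:

```python
import sqlite3

conn = sqlite3.connect("timeline.db")  # hypothetical combined multi-host database

# One chronological view of logon activity across every imported host makes
# an account hopping from machine to machine stand out.
query = """SELECT date, time, host, user, desc FROM events
           WHERE source = 'EVT' AND desc LIKE '%logon%'
           ORDER BY date, time"""
for date, time, host, user, desc in conn.execute(query):
    print(f"{date} {time}  {host:<15} {user:<12} {desc}")
```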

All SANS Summit presentations can be downloaded.

Now that my speaking engagements and conferences are over and my training budget is all dried up, I will get back to saving the world one megabyte a day :-)