Hi everyone!

I’m writing regarding a pull request I submitted (#614).

My workgroup is working on a project that uses machine learning and
software-defined networking to detect and respond to malicious network
activity. We are currently focused on internal Ethernet traffic, and one of our
big challenges is capturing enough network data to train our models. We are
working with a number of organizations that wish to share data but want some
basic level of sanitization first. A lack of modern, internal, benign traffic
is a challenge for data science teams.

In order to better facilitate data sharing between collaborating organizations, 
we attempted to address some common privacy/sensitivity issues by expanding 
tcpdump to create the following options:

- Strip out the packet payload after the TCP/UDP headers; and

- Mask external IP addresses (i.e., those not included in the RFC 5735
  reserved netblocks).
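
To make the masking rule concrete, here is a rough Python sketch. It is not the
code from the pull request (which is C inside tcpdump); the function names and
the example mask address are made up for illustration, and the netblock list is
my reading of the summary table in RFC 5735.

```python
import ipaddress

# Special-use IPv4 blocks as summarized in RFC 5735 (assumed list).
RESERVED_NETBLOCKS = [ipaddress.ip_network(n) for n in (
    "0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16",
    "172.16.0.0/12", "192.0.0.0/24", "192.0.2.0/24", "192.88.99.0/24",
    "192.168.0.0/16", "198.18.0.0/15", "198.51.100.0/24",
    "203.0.113.0/24", "224.0.0.0/4", "240.0.0.0/4",
    "255.255.255.255/32",
)]


def is_external(addr: str) -> bool:
    """True if addr (IPv4 only) is outside every reserved netblock."""
    ip = ipaddress.IPv4Address(addr)
    return not any(ip in net for net in RESERVED_NETBLOCKS)


def mask_address(addr: str, mask_ip: str) -> str:
    """Replace external addresses with mask_ip; keep internal ones as-is."""
    return mask_ip if is_external(addr) else addr
```

For example, mask_address("8.8.8.8", "203.0.113.1") gives "203.0.113.1", while
mask_address("192.168.1.10", "203.0.113.1") leaves the address unchanged.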

We have been using our modifications internally and they appear to be stable.
Our initial testing using machine learning based on this approach was pretty
successful, and we would like to open up our research to collaboration with
other entities. Tcpdump is so common in our circles that, when we suggested
enhancing it, everyone we work with agreed it was a great option. Our proposed
modification performs the above operations when writing to a savefile. The
flags that I’ve added are:

- -0 to zero out packet data after TCP/UDP headers

- -00 to truncate the packet data entirely (this saves space for large
  packet captures)

- -* [mask_ip] to mask external IP addresses with a user-specified IP.
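
The difference between zeroing and truncating can be sketched in Python as
follows. Again, this is only an illustration and not the tcpdump patch itself:
the helper names are hypothetical, it assumes untagged Ethernet frames carrying
IPv4, and it ignores IP fragments and the pcap record lengths that a real
savefile writer would also have to adjust when truncating.

```python
import struct

ETH_HEADER_LEN = 14
ETHERTYPE_IPV4 = 0x0800
IPPROTO_TCP = 6
IPPROTO_UDP = 17
UDP_HEADER_LEN = 8


def payload_offset(frame: bytes):
    """Offset of the first byte after the TCP/UDP header, or None."""
    if len(frame) < ETH_HEADER_LEN + 20:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return None
    ihl = (frame[ETH_HEADER_LEN] & 0x0F) * 4   # IPv4 header length in bytes
    if ihl < 20:
        return None
    proto = frame[ETH_HEADER_LEN + 9]
    l4_start = ETH_HEADER_LEN + ihl
    if proto == IPPROTO_UDP:
        offset = l4_start + UDP_HEADER_LEN
    elif proto == IPPROTO_TCP:
        if len(frame) < l4_start + 13:
            return None
        data_off = (frame[l4_start + 12] >> 4) * 4  # TCP header length
        if data_off < 20:
            return None
        offset = l4_start + data_off
    else:
        return None
    return offset if offset <= len(frame) else None


def zero_payload(frame: bytes) -> bytes:
    """Like -0: keep the frame length, overwrite the payload with zeros."""
    off = payload_offset(frame)
    if off is None:
        return frame
    return frame[:off] + bytes(len(frame) - off)


def truncate_payload(frame: bytes) -> bytes:
    """Like -00: drop everything after the TCP/UDP header."""
    off = payload_offset(frame)
    return frame if off is None else frame[:off]
```

Zeroing keeps packet sizes intact (useful if length is a model feature), while
truncating trades that for smaller capture files.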

With our enhancements, these flags are available both when reading from an
existing pcap file and when performing a live capture. The caveats are that
this currently works only for the Ethernet link layer (the scope of our
project), IPv6 is not yet supported, and the flags do not work when printing
to screen (although the user will be warned at the outset). Nevertheless,
my workgroup would love to open this up to the rest of the open source
community to facilitate broader information sharing and make network
collections more accessible to data scientists.

If there are other enhancements that would be helpful in this area, please
let me know!


Thanks,
Alice
(@lilchurro on github)

P.S. If folks are curious, we have published some of our work, including:
https://blog.cyberreboot.org/deep-session-learning-for-cyber-security-e7c0f6804b81


--
🙋 Alice Chang
👾 Cyber Reboot Software Engineer @ In-Q-Tel