Hi everyone! I’m writing regarding a pull request I submitted (#614).
My workgroup is currently working on a project that uses machine learning and software-defined networking to detect and respond to malicious network activity. We are currently focused on internal Ethernet traffic, and one of our big challenges is capturing enough network data to train our models. We are working with a number of organizations that wish to share data but want some basic level of sanitization. The lack of modern, internal, benign traffic is a challenge for data science teams.

To better facilitate data sharing between collaborating organizations, we attempted to address some common privacy/sensitivity issues by extending tcpdump with options to:

- Strip out the packet payload after the TCP/UDP headers; and
- Mask external IP addresses (i.e., those not included in the RFC 5735 reserved netblocks).

We have been using our modifications internally and they appear to be stable. Our initial testing of machine learning on data sanitized this way was pretty successful, and we would like to open up our research to collaboration with other entities. Tcpdump is so common in our circles that when we suggested enhancing it, everyone we work with agreed it was a great option.

Our proposed modification performs the above operations when writing to a savefile. The flags that I've added are:

- -0 to zero out packet data after TCP/UDP headers
- -00 to truncate the packet data entirely (this saves space for large packet captures)
- -* [mask_ip] to mask external IP addresses with a user-specified IP.

These flags are available both when reading from an existing pcap file and when performing a live capture.

The caveats are:

- this currently works only for the Ethernet link layer (the scope of our project);
- IPv6 is not yet supported; and
- it does not work when printing to screen (although the user will be warned at the outset).
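For readers who want a concrete picture of the two operations described above, here is a minimal sketch in Python. It is not the code from the pull request (tcpdump itself is written in C), and the names sanitize_frame, is_external and the payload modes "zero"/"truncate" are hypothetical, chosen only to mirror the -0, -00 and mask options. It assumes untagged Ethernet frames carrying IPv4, treats the netblocks in the RFC 5735 summary table as "internal", and does not recompute checksums or handle IP fragments.

```python
import ipaddress
import struct
from typing import Optional

# Special-use IPv4 blocks from the RFC 5735 summary table
# (255.255.255.255/32 is already covered by 240.0.0.0/4).
RFC5735_BLOCKS = [ipaddress.ip_network(n) for n in (
    "0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16",
    "172.16.0.0/12", "192.0.0.0/24", "192.0.2.0/24", "192.88.99.0/24",
    "192.168.0.0/16", "198.18.0.0/15", "198.51.100.0/24",
    "203.0.113.0/24", "224.0.0.0/4", "240.0.0.0/4",
)]

ETH_HLEN = 14            # untagged Ethernet header length
ETHERTYPE_IPV4 = 0x0800
PROTO_TCP, PROTO_UDP = 6, 17


def is_external(addr: bytes) -> bool:
    """True if the 4-byte address is outside every RFC 5735 block."""
    ip = ipaddress.IPv4Address(addr)
    return not any(ip in net for net in RFC5735_BLOCKS)


def sanitize_frame(frame: bytes, mask_ip: Optional[str] = None,
                   payload: str = "keep") -> bytes:
    """Return a sanitized copy of one Ethernet frame.

    mask_ip: replacement for external source/destination addresses.
    payload: "keep", "zero" (like -0) or "truncate" (like -00).
    Frames that are not IPv4-over-Ethernet are returned unchanged.
    """
    buf = bytearray(frame)
    if len(buf) < ETH_HLEN + 20:
        return bytes(buf)
    if struct.unpack_from("!H", buf, 12)[0] != ETHERTYPE_IPV4:
        return bytes(buf)

    ihl = (buf[ETH_HLEN] & 0x0F) * 4          # IPv4 header length in bytes
    if ihl < 20 or len(buf) < ETH_HLEN + ihl:
        return bytes(buf)
    l4 = ETH_HLEN + ihl                        # start of the TCP/UDP header

    if mask_ip is not None:
        mask = ipaddress.IPv4Address(mask_ip).packed
        # Source address is at IP offset 12, destination at offset 16.
        for off in (ETH_HLEN + 12, ETH_HLEN + 16):
            if is_external(bytes(buf[off:off + 4])):
                buf[off:off + 4] = mask

    proto = buf[ETH_HLEN + 9]
    if proto == PROTO_TCP and len(buf) >= l4 + 20:
        data_off = l4 + (buf[l4 + 12] >> 4) * 4   # TCP data offset field
    elif proto == PROTO_UDP and len(buf) >= l4 + 8:
        data_off = l4 + 8                          # fixed UDP header size
    else:
        return bytes(buf)
    data_off = min(data_off, len(buf))

    if payload == "zero":
        buf[data_off:] = bytes(len(buf) - data_off)
    elif payload == "truncate":
        del buf[data_off:]
    return bytes(buf)
```

Calling sanitize_frame(frame, mask_ip="10.0.0.1", payload="zero") would leave the headers and any internal addresses in place, overwrite external addresses with 10.0.0.1, and replace everything after the TCP/UDP header with zero bytes. Because the checksums are left untouched in this sketch, the sanitized packets will no longer verify.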
Despite these caveats, my workgroup would love to open this up to the rest of the open source community to facilitate broader information sharing and make network collections more accessible to data scientists. If there are other enhancements that might be helpful for this use case, please let me know!

Thanks,
Alice (@lilchurro on github)

P.S. If folks are curious, we have published some of our work, including:
https://blog.cyberreboot.org/deep-session-learning-for-cyber-security-e7c0f6804b81

--
🙋 Alice Chang
👾 Cyber Reboot Software Engineer @ In-Q-Tel

_______________________________________________
tcpdump-workers mailing list
tcpdump-workers@lists.tcpdump.org
https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers