Summary of the netfilter developer workshop at Linux-Kongress 2001
Enschede, NL, 26/27/28 November 2001
======================================================================

The netfilter/iptables project: Firewalling for Linux >= 2.4.x
http://www.netfilter.org/

Participants:
    Paul 'Rusty' Russell    coreteam, founder
    Marc Boucher            coreteam
    Harald Welte            coreteam
    Andras Kis-Szabo        IPv6 maintainer
    Jozsef Kadlecsik        conntrack, newnat
    Balazs Scheidler        transparent proxies
    Fabrice Marie           various extensions
    Jay Schulist            nfnetlink, ctnetlink
    Lennert Buytenhek       netfilter bridging code
    Tommi Virtanen          Debian developer
    Gianni Tedesco          NETLINK target
    Michael Bellion         student
    Thomas Heinz            student


1. Conntrack timeouts (ACK/FIN issue)

Several users have reported TCP ACK/FIN packets being erroneously
classified as 'INVALID'. The cause is half-closed TCP connections without
traffic. It seems that several implementations of the HTTP 1.1 keepalive
mechanism produce half-closed TCP connections without any payload for a
significant amount of time.

Rusty has proposed a patch which increases the CLOSE_WAIT timeout from 60
to 120 seconds. Jozsef has done some testing with the timeout increased to
5 days.

Neither of the two solutions sounds good: 120 seconds is most likely not
long enough, and 5 days is too long (somebody reboots his box and the
connection is never closed => conntrack entries stay for 5 days).

To find a solution (some appropriate value), we should gather some
empirical statistics. Jozsef is going to prepare a patch against the
current stable kernel which produces the necessary output. The patch
increases the timeout to some big value (some hours or one day) and prints
the remaining timeout at the time the connection is closed in the other
direction.

By putting this patch into a couple of setups, we should get statistical
data about how long most half-closed connections remain in this state
before being closed.


2. TCP window tracking

Jozsef has provided us with an implementation of full TCP window tracking.
The code has been stable for more than a year, and he is keeping it up to
date with current kernels.

The coreteam likes the code, and it should have gone into the mainstream
kernel long ago. We definitely want to put it into the kernel, but first
want to make sure that we don't have any false positives (out-of-window
packets, ...).

We need more debugging on when the out-of-window packets appear. To debug
this, people need to fully tcpdump their traffic and then get an
out-of-window log with exact timestamps (to check against the tcpdump
logs).

There is also one remaining issue with TCP window tracking in general: as
soon as we do a connection pickup without seeing the SYN packets, we don't
know whether a window scaling option was used. There is no way to find
this out other than by heuristics. As connection pickup shouldn't happen
that often, this is considered a minor issue.

The patch will contain a debug mode and a normal mode. In normal mode
there are no sysctl values; in debug mode sysctls for all timeouts are
available.


3. newnat infrastructure

The current netfilter conntrack/nat helper API can only deal with one
expectation per master connection at a time. While this works for simple
protocol helpers (FTP), it doesn't work with IRC and more sophisticated
protocols like H.323, RealAudio, ...

Harald Welte redesigned the conntrack/nat framework at the beginning of
this year in order to support multiple outstanding expectations at a time.
Jozsef picked up Harald's initial patches and is finishing development
right now.

A slightly out-of-date description of the architectural changes that the
newnat patches introduce can be found at
netfilter/patches/newnat-summary.txt in our CVS tree
(http://pserver.samba.org/cgi-bin/cvsweb/netfilter/patches/newnat-summary.txt?rev=1.3).

The current newnat code (as in patch-o-matic) will get into the stable
kernel soon.
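
The difference between one expectation and several outstanding
expectations per master connection can be pictured with a small model.
This is only a sketch in Python and not the kernel data structure:
OldMaster, NewMaster and announce_two_channels are hypothetical names, an
expectation is reduced to a (protocol, port) pair, and the "second
expectation replaces the first" behaviour of OldMaster is an assumption
made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical model, not kernel code: an expectation is reduced to the
# (protocol, destination port) pair that a helper predicts.
Expectation = Tuple[str, int]


@dataclass
class OldMaster:
    """Master connection with room for a single expectation."""
    expected: Optional[Expectation] = None

    def expect(self, exp: Expectation) -> None:
        # Assumption for this sketch: a second expectation replaces the first.
        self.expected = exp

    def is_related(self, pkt: Expectation) -> bool:
        return self.expected == pkt


@dataclass
class NewMaster:
    """Master connection with several outstanding expectations."""
    expected: List[Expectation] = field(default_factory=list)

    def expect(self, exp: Expectation) -> None:
        self.expected.append(exp)

    def is_related(self, pkt: Expectation) -> bool:
        # An expectation is consumed once a matching packet arrives.
        if pkt in self.expected:
            self.expected.remove(pkt)
            return True
        return False


def announce_two_channels(master) -> List[bool]:
    """A helper announces two data channels before either one shows up."""
    master.expect(("udp", 5004))
    master.expect(("udp", 5005))
    return [master.is_related(("udp", 5004)),
            master.is_related(("udp", 5005))]
```

In this model, announce_two_channels(OldMaster()) recognises only the
second channel, while announce_two_channels(NewMaster()) recognises both,
which is the situation protocols with several data channels run into.
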
All new conntrack/nat helpers should be developed for the new API rather
than for the current kernel.

To make life easier for protocols with dynamic port numbers (H.323, ...),
we will remove the resetting of expectfn in alter_reply.

Additionally, the #ifdef CONFIG_IP_NF_FTP / #endif style construct in
ip_conntrack.h is going to be removed for all helpers. This means that
struct ip_conntrack will have the same size regardless of which protocol
helpers are compiled in.


4. bridging firewall

Lennert gave a nice overview of his current work on bridging firewalls.
He has put a significant amount of time into developing complete
firewalling and NAT support on a bridge.

Having packet filters, even stateful packet filters, on a bridge rather
than a router is not completely new. But doing network address
translation on a bridge seems to be a totally new concept.

There are no big issues with his code. It is complete and fairly stable.
The only problematic part is interface matching. If people want to match
on incoming/outgoing interfaces, they can only match the whole bridge
interface (br0) instead of individual ethernet interfaces (eth0).

The patch that makes matching on individual interfaces possible adds new
members to struct sk_buff in the core kernel, which is a religious issue
with the core networking people.

We made a decision on how to move ahead:

- Lennert submits support for bridging packet filtering to the netfilter
  core team, which will put this patch in patch-o-matic and, after some
  testing, submit it to the main kernel. This patch is not allowed to
  create new members of struct sk_buff, for obvious reasons.

- bridge support for NAT will stay outside of the 2.4.x kernels but exist
  as an incremental patch inside netfilter patch-o-matic.


5. Conntrack exemptions

Our current connection tracking system either tracks all connections going
through the machine or none.
Though this is the right thing to do in 99.9% of all cases, there are some
very special setups that need conntrack exemptions. The exemptions have to
be treated very carefully, since it is extremely easy to break
conntrack-using systems like NAT.

In order to do exemptions from connection tracking, we would need a table
attached to the PRE_ROUTING hook with a priority that places it before
conntrack. The table's name is 'notrack'.

The table would set the nfct field of the skb to point to a dummy
conntrack entry. The connection tracking core would then check against
this dummy entry before setting nfct.

The state match is going to be extended with a --state UNTRACKED
extension, which checks nfct against the special dummy conntrack entry.

This feature will only get into patch-o-matic and 2.5.x.


6. Five hooked mangle table

The current mangle table only attaches to three of the five netfilter
hooks. There are complex usage scenarios with policy routing where users
need to be able to MARK packets at different hooks. As marking is only
possible in the mangle table, the mangle table needs to be extended to all
five netfilter hooks.

Brad Chapman has provided a patch. This patch is going to be added to
patch-o-matic and submitted to the core kernel soon.


7. Future of the in-kernel data structure of an IP table

Currently an IP table is a contiguous chunk of memory containing all the
rules, with lots of relative pointers inside. This has the advantage that
the whole table can be replaced atomically, but at a high cost: dynamic
rule changes are extremely expensive, and the kernel has to do lots of
checking.

We consider this a mistake and want to change the structure back to linked
lists of rules for 2.5.x. A table is thus a linked list of chains, each of
which is itself a linked list of rules.

This will go hand in hand with iptnetlink, as described below.


8. nfnetlink / ctnetlink / iptnetlink

Jay Schulist has done some excellent work on the kernel/userspace
interface of netfilter/iptables. It began with a userspace interface to
manipulate conntrack entries via netlink, and after some discussions with
the core team we decided to have a generalized netlink-based
kernel/userspace interface in the future.

Jay gave an overview of his work, which is greatly appreciated by the core
team.

As for the future:

- nfnetlink will be a generic layer for all netfilter-related
  kernel/userspace communication

- ctnetlink will sit on top of nfnetlink and provide mechanisms for
  manipulating conntrack entries, expectations and nat mappings

- iptnetlink will obsolete the current getsockopt/setsockopt interface for
  IP table manipulations. It will offer manipulation of IP tables at
  per-rule granularity, in combination with the linked-list IP tables
  (see 7.)

Administrative:

- nfnetlink and ctnetlink are aimed at 2.4.x inclusion

- ULOG and ip_queue will be ported to nfnetlink for 2.5.x. This would mean
  that all firewalling-related kernel/userspace communication uses one
  unified interface.


9. Userspace commandline tool 'iptables'

The userspace commandline tool iptables is used for manipulation of
in-kernel IP tables. It is based on a thin low-level library (libiptc)
which encapsulates rule+counter manipulation as well as kernel/userspace
communication.

For new extension kernel modules, iptables provides plugins. Every new
match or target has to be accompanied by its respective userspace
counterpart, an iptables plugin.

There are a couple of problems with the current approach of iptables:

- the plugins are bound to commandline parsing (getopt, ...) and thus not
  flexible enough for alternative interfaces (GUI, firewall languages, ...)

- libiptc is extremely low-level and of no use if you don't know about the
  plugins and their data structures

- the macro-based approach for iptables / ip6tables shared code is not the
  best idea.
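
The first problem in the list above can be pictured with a small model: a
match extension that describes its options as data, so that a commandline
parser is just one of several possible front ends. This is only a sketch
in Python and not iptables code; OptionSpec, MatchPlugin and parse_cli are
hypothetical names, and the dictionary returned by build() merely stands
in for whatever rule data a real extension would produce.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class OptionSpec:
    name: str          # option name without leading dashes
    description: str   # help text any front end can display


@dataclass
class MatchPlugin:
    name: str
    options: List[OptionSpec]

    def build(self, values: Dict[str, str]) -> Dict[str, str]:
        """Check the option names and return the match data for one rule."""
        known = {opt.name for opt in self.options}
        unknown = set(values) - known
        if unknown:
            raise ValueError(
                f"unknown options for {self.name}: {sorted(unknown)}")
        return {"match": self.name, **values}


def parse_cli(plugin: MatchPlugin, argv: List[str]) -> Dict[str, str]:
    """One possible front end: '--opt value' pairs from a commandline."""
    if len(argv) % 2:
        raise ValueError("every option needs a value")
    values = {}
    for flag, value in zip(argv[0::2], argv[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"expected an option, got {flag!r}")
        values[flag[2:]] = value
    return plugin.build(values)


# Hypothetical description of a TCP match with a single option.
tcp = MatchPlugin("tcp", [OptionSpec("dport", "destination port")])
from_cli = parse_cli(tcp, ["--dport", "80"])
from_gui = tcp.build({"dport": "80"})   # e.g. filled in from a form
```

Both front ends end up with the same match data because the option
knowledge lives in the plugin description rather than in the parsing code.
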

Results of the discussion:

- libiptc is going to disappear, since in 2.5.x all rule changes (of the
  new linked-list internal data structures) will be made using iptnetlink.

- the plugin handling and big parts of iptables will move into a new
  library called libiptables. This library can be used to query the
  available matches and targets as well as their parameters, valid values,
  help, ...

- multiple frontends (iptables style, ip/tc style, ...) will run on top of
  this high-level library. The iptables style will be supported by the
  netfilter project, others by third parties.


10. Transparent proxying

The transparent proxying shortcomings in the 2.4.x kernel are obvious.
Balazs and Rusty discussed possible implementation details.

Transparent proxies are not very closely related to iptables/netfilter,
thus the discussion should take place within the general Linux kernel
networking development community (netdev@oss.sgi.com).

Another issue that came up in the discussion was a two-phase accept, with
which a transparent proxy could become even more transparent to the
application. This matters especially in the case where the real server
returns a connection refused, which is not propagated to the client but
results in a connection reset after the connection to the proxy has been
established.

Marc and Lennert are especially interested in this, and they will take
care of further discussion + implementation.


11. Debugging aids

Debugging complex rulesets within the different tables, policy routing,
etc. is extremely difficult.

Ideally we would have some packet tracing functionality, where a user
could get a detailed log of what happened to his packet: when the packet
traversed which chain, which decision was made, where it was altered and
in which way, which routing table made the routing decision and where to,
...

Of course we cannot just do this for every packet, because most people
would want to run this in a production environment, where we would produce
tons of uninteresting logs per second.
So we need some classification to tell the packet tracing system which
packets we want to trace.

The proposed way is to add a debug table (priority before everything else)
which marks the packet in a certain way (_not_ nfmark) and then have a
special macro called at particular places in the network stack. The macro
generates event messages about every debug-marked packet. The event
messages are sent to userspace (netlink/syslog/...) for further analysis.


12. failover / state replication

Lots of people are interested in highly available firewalls (failover).
While this works without any problems with stateless packet filters, we
run into problems as soon as the firewall keeps any state (conntrack/nat).

Harald has set up a netfilter-failover mailing list and started a
discussion with interested people. Unfortunately the project came to a
standstill, as Harald didn't have the time anymore. Hopefully we will find
some sponsoring in the future.

There are two basic ways of implementation:

1 Full state replication from one master to multiple clients, using
  ctnetlink and a userspace process which distributes state changes to
  the slave boxes.

    + reliable, solid solution
    + does support NAT out of the box
    - complex implementation
    - lots of overhead on the master
    - lots of state update traffic between the firewalls

  The overhead could be reduced if we integrated the state update protocol
  sender/receiver into the kernel. But it would still be big.

2 "poor man's failover", where all firewalls are connected to all
  interfaces, and each one does its own tracking. Only one box has
  forwarding enabled; all other ones just do state tracking and enable
  forwarding once failover occurs.

    + extremely easy to implement
    + same amount of work on all machines, no performance degradation at all
    - no NAT support
    - what to do after one machine is rebooted (initial sync?)
    - would need shared medium (hubs) on all interfaces => no full duplex

Currently there are no implementation plans (missing sponsor), although
Harald is thinking about implementing the poor man's approach once he has
some time ;)


13. multi-packet expectation causes

We have a problem with all of our conntrack and NAT helpers as soon as an
expectation cause (e.g. a PORT command) is spread over multiple packets.

Currently we just drop the packet, hoping that the sending TCP stack
coalesces the packets before retransmission. However, there seem to be
some implementations which don't behave like this.

We should ask our users to report 'partial match' messages in syslog to
us, so we can get an overview of how often this happens and results in a
permanent error.

The code is going to be changed in order to only report cases where the
'drop and wait for retransmit' strategy fails.


14. Organizational stuff

- patch-o-matic scoreboard
  Users should be able to vote for patches in patch-o-matic and give more
  feedback about what is working for them or not.

- testsuite
  Rusty will update the testsuite or delete it from CVS.

- milestones
  We don't want milestones like other projects (Mozilla, ...). We don't
  think that this is necessary.

- bugtracking system
  We need to find a volunteer who will alter and install a suitable system
  on netfilter.gnumonks.org

- new homepage
  The page looks nice, but the way it was built from templates is
  unsuitable for our needs since it requires proprietary Windows software
  (eek!). The new homepage will appear under www.netfilter.org /
  www.iptables.org, which are round-robin DNS entries for the three (or
  even more) sites.
  (Update: New homepage in place since 09 Jan 2002)

- example repository
  Put a CGI on the new homepage and make an announcement asking people to
  contribute their firewall scripts as examples.

- cvs snapshots
  ... are a good idea. Harald needs to write a small script for creating
  them.

- CVS server
  will move to a separate machine at gnumonks.org and get rsync'ed to
  samba.org for public access. This way we can give more people CVS write
  access without needing more samba.org accounts.


15. Thanks

The netfilter coreteam and developers want to thank the organizers of
Linux-Kongress 2001 for hosting this first netfilter developer workshop.

We also want to thank the generous sponsors of this event, especially the
German Ministry of Education and Research.

-- 
Live long and prosper
- Harald Welte / laforge@gnumonks.org               http://www.gnumonks.org/
============================================================================
GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M-
V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI? !D G+ e* h+ r% y+(*)