$Id: tunnel-alt.html,v 1.5 2005/01/19 18:13:10 jrandom Exp $

1) Tunnel overview
2) Tunnel operation
2.1) Message preprocessing
2.2) Gateway processing
2.3) Participant processing
2.4) Endpoint processing
2.5) Padding
2.6) Tunnel fragmentation
2.7) Alternatives
2.7.1) Adjust tunnel processing midstream
2.7.2) Use bidirectional tunnels
2.7.3) Backchannel communication
2.7.4) Variable size tunnel messages
3) Tunnel building
3.1) Peer selection
3.1.1) Exploratory tunnel peer selection
3.1.2) Client tunnel peer selection
3.2) Request delivery
3.3) Pooling
3.4) Alternatives
3.4.1) Telescopic building
3.4.2) Non-exploratory tunnels for management
4) Tunnel throttling
5) Mixing/batching

1) Tunnel overview

Within I2P, messages are passed in one direction through a virtual tunnel of peers, using whatever means are available to pass the message on to the next hop. Messages arrive at the tunnel's gateway, get bundled up and/or fragmented into fixed size tunnel messages, and are forwarded on to the next hop in the tunnel, which processes and verifies the validity of the message and sends it on to the next hop, and so on, until it reaches the tunnel endpoint. That endpoint takes the messages bundled up by the gateway and forwards them as instructed - either to another router, to another tunnel on another router, or locally.

Tunnels all work the same, but can be segmented into two different groups - inbound tunnels and outbound tunnels. The inbound tunnels have an untrusted gateway which passes messages down towards the tunnel creator, which serves as the tunnel endpoint. For outbound tunnels, the tunnel creator serves as the gateway, passing messages out to the remote endpoint.

The tunnel's creator selects exactly which peers will participate in the tunnel, and provides each with the necessary configuration data. Tunnels may have any number of hops, but may be constrained with various proof-of-work requests to add on additional steps. It is the intent to make it hard for either participants or third parties to determine the length of a tunnel, or even for colluding participants to determine whether they are a part of the same tunnel at all (barring the situation where colluding peers are next to each other in the tunnel).

Beyond their length, there are additional configurable parameters for each tunnel that can be used, such as a throttle on the frequency of messages delivered, how padding should be used, how long a tunnel should be in operation, whether to inject chaff messages, and what, if any, batching strategies should be employed.

In practice, a series of tunnel pools are used for different purposes - each local client destination has its own set of inbound tunnels and outbound tunnels, configured to meet its anonymity and performance needs. In addition, the router itself maintains a series of pools for participating in the network database and for managing the tunnels themselves.

I2P is an inherently packet switched network, even with these tunnels, allowing it to take advantage of multiple tunnels running in parallel, increasing resilience and balancing load. Outside of the core I2P layer, there is an optional end to end streaming library available for client applications, exposing TCP-esque operation, including message reordering, retransmission, congestion control, etc.

2) Tunnel operation

Tunnel operation has four distinct processes, taken on by various peers in the tunnel. First, the tunnel gateway accumulates a number of tunnel messages and preprocesses them into something for tunnel delivery. Next, that gateway encrypts that preprocessed data, then forwards it to the first hop. That peer, and subsequent tunnel participants, unwrap a layer of the encryption, verifying that it isn't a duplicate, then forward it on to the next peer. Eventually, the message arrives at the endpoint where the messages bundled by the gateway are split out again and forwarded on as requested.

Tunnel IDs are 4 byte numbers used at each hop - participants know what tunnel ID to listen for messages with and what tunnel ID they should be forwarded on as to the next hop, and each hop chooses the tunnel ID on which it receives messages. Tunnels themselves are short lived (10 minutes at the moment). Depending upon the tunnel's purpose, subsequent tunnels may be built using the same sequence of peers, but even then each hop's tunnel ID will change.

2.1) Message preprocessing

When the gateway wants to deliver data through the tunnel, it first gathers zero or more I2NP messages, selects how much padding will be used, fragments it across the necessary number of 1KB tunnel messages, and decides how each I2NP message should be handled by the tunnel endpoint, encoding that data into the raw tunnel payload:

the first 4 bytes of the SHA256 of the remaining preprocessed data concatenated with the preIV, using the preIV as will be seen on the tunnel endpoint (for outbound tunnels) or the preIV as was seen on the tunnel gateway (for inbound tunnels) (see below for preIV processing).
0 or more bytes containing random nonzero integers
1 byte containing 0x00
a series of zero or more { instructions, message } pairs
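
As a rough illustration of the layout above, the following Python sketch assembles and parses such a payload. It assumes that "the remaining preprocessed data" means everything after the 4 byte checksum, and it treats the already encoded { instructions, message } pairs as opaque bytes. The function names and the payload_size parameter are hypothetical, not part of any I2P API.

```python
import hashlib
import os

def build_payload(pairs: bytes, pre_iv: bytes, payload_size: int) -> bytes:
    """Assemble checksum + nonzero padding + 0x00 + pairs (illustrative only)."""
    if payload_size % 16 != 0:
        raise ValueError("payload must be a multiple of 16 bytes")
    pad_len = payload_size - 4 - 1 - len(pairs)
    if pad_len < 0:
        raise ValueError("pairs do not fit in the requested payload size")
    # Padding bytes must be nonzero so the 0x00 delimiter is unambiguous.
    # (The mapping below is slightly biased, which is fine for a sketch.)
    padding = bytes(b % 255 + 1 for b in os.urandom(pad_len))
    remaining = padding + b"\x00" + pairs
    # Assumption: the checksum covers everything after itself, then the preIV.
    checksum = hashlib.sha256(remaining + pre_iv).digest()[:4]
    return checksum + remaining

def parse_payload(payload: bytes, pre_iv: bytes) -> bytes:
    """Verify the checksum and return the { instructions, message } pairs."""
    checksum, remaining = payload[:4], payload[4:]
    if hashlib.sha256(remaining + pre_iv).digest()[:4] != checksum:
        raise ValueError("checksum mismatch")
    # Skip the nonzero padding up to and including the first 0x00 byte.
    return remaining[remaining.index(0) + 1:]
```

Because the padding never contains a zero byte, the first 0x00 found after the checksum is always the delimiter, so the parser needs no explicit padding length.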

The instructions are encoded with a single control byte, followed by any necessary additional information. The first bit in that control byte determines how the remainder of the header is interpreted - if it is not set, the message is either not fragmented or this is the first fragment in the message. If it is set, this is a follow on fragment.

With the first bit being 0, the instructions are:

1 byte control byte:

      bit 0: is follow on fragment?  (1 = true, 0 = false, must be 0)
   bits 1-2: delivery type
             (0x0 = LOCAL, 0x01 = TUNNEL, 0x02 = ROUTER)
      bit 3: delay included?  (1 = true, 0 = false)
      bit 4: fragmented?  (1 = true, 0 = false)
      bit 5: extended options?  (1 = true, 0 = false)
   bits 6-7: reserved

if the delivery type was TUNNEL, a 4 byte tunnel ID
if the delivery type was TUNNEL or ROUTER, a 32 byte router hash

if the delay included flag is true, a 1 byte value:

      bit 0: type (0 = strict, 1 = randomized)
   bits 1-7: delay exponent (2^value minutes)

if the fragmented flag is true, a 4 byte message ID

if the extended options flag is true:

   = a 1 byte option size (in bytes)
   = that many bytes

2 byte size of the I2NP message or this fragment
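
To make the field order concrete, here is a small Python sketch that encodes the instructions for an unfragmented message or a first fragment. It assumes that "bit 0" is the most significant bit of the control byte and that multi-byte integers are big endian; neither is stated above. The function name and its parameters are hypothetical, and the delay value is passed in as an already encoded byte.

```python
import struct

LOCAL, TUNNEL, ROUTER = 0x0, 0x1, 0x2

def encode_first_instructions(delivery_type, size, tunnel_id=None,
                              router_hash=None, delay=None,
                              message_id=None, options=b""):
    """Encode first-fragment instructions (bit 0 assumed to be the MSB)."""
    control = (delivery_type & 0x3) << 5       # bits 1-2: delivery type
    if delay is not None:
        control |= 0x10                        # bit 3: delay included
    if message_id is not None:
        control |= 0x08                        # bit 4: fragmented
    if options:
        control |= 0x04                        # bit 5: extended options
    out = bytes([control])                     # bit 0 stays 0, bits 6-7 reserved
    if delivery_type == TUNNEL:
        out += struct.pack(">I", tunnel_id)    # 4 byte tunnel ID
    if delivery_type in (TUNNEL, ROUTER):
        if router_hash is None or len(router_hash) != 32:
            raise ValueError("a 32 byte router hash is required")
        out += router_hash
    if delay is not None:
        out += bytes([delay])                  # 1 byte delay value
    if message_id is not None:
        out += struct.pack(">I", message_id)   # 4 byte message ID
    if options:
        out += bytes([len(options)]) + options # 1 byte size, then the options
    out += struct.pack(">H", size)             # 2 byte size of message/fragment
    return out
```

For example, encode_first_instructions(LOCAL, 100) yields just three bytes: a zero control byte followed by the 2 byte size.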

With the first bit being 1, the instructions are:

1 byte control byte:

      bit 0: is follow on fragment?  (1 = true, 0 = false, must be 1)
   bits 1-6: fragment number
      bit 7: is last? (1 = true, 0 = false)

4 byte message ID (same one defined in the first fragment)
2 byte size of this fragment
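
The follow on fragment header is a fixed 7 bytes, which the Python sketch below encodes and decodes. As an assumption, "bit 0" is taken to be the most significant bit of the control byte and integers are big endian; the function names are hypothetical.

```python
import struct

def encode_follow_on(fragment_number, is_last, message_id, size):
    """Encode follow on fragment instructions (bit 0 assumed to be the MSB)."""
    if not 0 <= fragment_number < 64:
        raise ValueError("fragment number must fit in 6 bits")
    # bit 0 set, bits 1-6 fragment number, bit 7 "is last"
    control = 0x80 | (fragment_number << 1) | (1 if is_last else 0)
    return bytes([control]) + struct.pack(">IH", message_id, size)

def decode_follow_on(data):
    """Return (fragment_number, is_last, message_id, size)."""
    control = data[0]
    if not control & 0x80:
        raise ValueError("not a follow on fragment")
    message_id, size = struct.unpack(">IH", data[1:7])
    return (control >> 1) & 0x3F, bool(control & 0x01), message_id, size
```

With a 6 bit fragment number, a single I2NP message can span at most 64 distinct fragment numbers under this encoding.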

The I2NP message is encoded in its standard form, and the preprocessed payload must be padded to a multiple of 16 bytes.

2.2) Gateway processing

After the preprocessing of messages into a padded payload, the gateway builds a random 16 byte preIV value, iteratively encrypting it and the tunnel message as necessary, and forwards the tuple {tunnelID, preIV, encrypted tunnel message} to the next hop.

How encryption at the gateway is done depends on whether the tunnel is an inbound or an outbound tunnel. For inbound tunnels, the gateway simply selects a random preIV, postprocessing and updating it to generate the IV for the gateway and using that IV alongside its own layer key to encrypt the preprocessed data. For outbound tunnels, the gateway must iteratively decrypt the (unencrypted) preIV and preprocessed data with the layer keys for all hops in the tunnel. The result of the outbound tunnel encryption is that when each peer encrypts it, the endpoint will recover the initial preprocessed data.

The preIV postprocessing should be a secure invertible transform of the received value capable of providing the full 16 byte IV necessary for AES256. At the moment, the plan is to use AES256 against the received preIV using that layer's IV key (a separate session key delivered to the tunnel participant by the creator).

2.3) Participant processing

When a peer receives a tunnel message, it checks that the message came from the same previous hop as before (initialized when the first message comes through the tunnel). If the previous peer is a different router, the message is dropped. The participant then postprocesses and updates the preIV received to determine the current hop's IV, using that with the layer key to encrypt the tunnel message. The IV is added to a bloom filter maintained for that tunnel - if it is a duplicate, the message is dropped. (The details of the hash functions used in the bloom filter are not yet worked out. Suggestions?) The participant then forwards the tuple {nextTunnelID, nextPreIV, encrypted tunnel message} to the next hop.
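
The previous hop check and the duplicate detection can be sketched in Python as follows. Since the bloom filter's hash functions are still undecided, this sketch simply assumes indexes derived from SHA-256 over a counter byte plus the IV; the filter size, the number of hashes, and all class and method names are hypothetical.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=8192, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Assumption: one SHA-256 per hash function, keyed by a counter byte.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:4], "big") % self.size_bits

    def add_if_new(self, item):
        """Insert item; return False if it was (probably) already present."""
        positions = list(self._positions(item))
        seen = all(self.bits[p // 8] & (1 << (p % 8)) for p in positions)
        for p in positions:
            self.bits[p // 8] |= 1 << (p % 8)
        return not seen

class Participant:
    def __init__(self):
        self.previous_hop = None      # set by the first message received
        self.seen_ivs = BloomFilter()

    def accept(self, from_peer, iv):
        """Return True if the message should be processed and forwarded."""
        if self.previous_hop is None:
            self.previous_hop = from_peer
        elif from_peer != self.previous_hop:
            return False              # different previous hop: drop
        return self.seen_ivs.add_if_new(iv)   # duplicate IV: drop
```

Note that a bloom filter can report false positives, so a small fraction of legitimate messages would be dropped as duplicates; sizing the filter for a tunnel's lifetime is part of the open question above.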

2.4) Endpoint processing

After receiving and validating a tunnel message at the last hop in the tunnel, how the endpoint recovers the data encoded by the gateway depends upon whether the tunnel is an inbound or an outbound tunnel. For outbound tunnels, the endpoint encrypts the message with its layer key just like any other participant, exposing the preprocessed data. For inbound tunnels, the endpoint is also the tunnel creator so they can merely iteratively decrypt the preIV and message, using the layer keys (both message and IV keys) of each step in reverse order.

At this point, the tunnel endpoint has the preprocessed data sent by the gateway, which it may then parse out into the included I2NP messages and forward them as requested in their delivery instructions.
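
The layering for both tunnel directions can be illustrated with a toy Python sketch. Loud caveat: the cipher below is a SHA-256 based stand-in chosen only because it needs no external library; it is not AES256, it is not secure, and its layers happen to commute, which real layered encryption would not. The preIV and IV handling is omitted entirely, and all function names are hypothetical.

```python
import hashlib

def _keystream(key, length):
    # Toy keystream: SHA-256 of the key plus a counter (NOT a real cipher).
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def toy_encrypt(key, data):
    ks = _keystream(key, len(data))
    return bytes((d + k) % 256 for d, k in zip(data, ks))

def toy_decrypt(key, data):
    ks = _keystream(key, len(data))
    return bytes((d - k) % 256 for d, k in zip(data, ks))

def outbound_gateway(layer_keys, preprocessed):
    """The creator pre-decrypts with every hop's key, last hop first."""
    data = preprocessed
    for key in reversed(layer_keys):
        data = toy_decrypt(key, data)
    return data

def run_hops(layer_keys, data):
    """Each hop in turn applies one layer of encryption."""
    for key in layer_keys:
        data = toy_encrypt(key, data)
    return data

def inbound_endpoint(layer_keys, data):
    """The creator strips every layer, using the keys in reverse order."""
    for key in reversed(layer_keys):
        data = toy_decrypt(key, data)
    return data
```

For an outbound tunnel, run_hops(keys, outbound_gateway(keys, msg)) returns msg, so the last hop's encryption exposes the preprocessed data. For an inbound tunnel, the keys passed to run_hops are those of the gateway and participants, and inbound_endpoint(keys, run_hops(keys, msg)) recovers msg at the creator.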

2.5) Padding

Several tunnel padding strategies are possible, each with their own merits:

No padding
Padding to a random size
Padding to a fixed size
Padding to the closest KB
Padding to the closest exponential size (2^n bytes)

Which to use? No padding is most efficient, random padding is what we have now, and a fixed size would either be an extreme waste or force us to implement fragmentation. Padding to the closest exponential size (a la Freenet) seems promising. Perhaps we should gather some stats on the net as to what size messages are, then see what costs and benefits would arise from different strategies? See gathered stats. The current plan is to pad to a fixed 1024 byte message size with fragmentation.
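
To compare the overhead of these strategies on a given message size, a small Python helper such as the one below may be useful. The strategy names, the 1024 byte upper bound on random padding, and the reading of "closest KB" as rounding up to the next multiple of 1024 are assumptions made for this sketch.

```python
import random

def padded_size(length, strategy, fixed=1024, rng=random):
    """Return the on-the-wire size of a message under a padding strategy."""
    if strategy == "none":
        return length
    if strategy == "random":
        # Hypothetical bound: up to 1024 extra bytes of padding.
        return length + rng.randint(0, 1024)
    if strategy == "fixed":
        if length > fixed:
            raise ValueError("message needs fragmentation")
        return fixed
    if strategy == "closest_kb":
        # Round up to a multiple of 1024 (ceiling division), at least 1KB.
        return max(1, -(-length // 1024)) * 1024
    if strategy == "exponential":
        size = 1
        while size < length:
            size *= 2
        return size
    raise ValueError("unknown strategy: " + strategy)
```

For a 3000 byte message this gives 3072 bytes when padding to the closest KB and 4096 bytes with exponential padding, while a fixed 1024 byte size cannot carry it without fragmentation.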

2.6) Tunnel fragmentation

To prevent adversaries from tagging the messages along the path by adjusting the message size, all tunnel messages are a fixed 1KB in size. To accommodate larger I2NP messages as well as to support smaller ones more efficiently, the gateway splits up the larger I2NP messages into fragments contained within each tunnel message. The endpoint will attempt to rebuild the I2NP message from the fragments for a short period of time, but will discard them as necessary.
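
A minimal Python sketch of the splitting and rebuilding steps follows. The per-fragment capacities depend on how much of each 1KB tunnel message is taken by headers, so they are left as hypothetical parameters, and numbering the first fragment as 0 is an assumption of this sketch.

```python
def fragment(message, first_capacity, follow_capacity):
    """Split a message into (fragment_number, is_last, chunk) tuples."""
    chunks = [message[:first_capacity]]
    pos = first_capacity
    while pos < len(message):
        chunks.append(message[pos:pos + follow_capacity])
        pos += follow_capacity
    if len(chunks) - 1 > 63:
        raise ValueError("too many fragments for a 6 bit fragment number")
    last = len(chunks) - 1
    return [(i, i == last, chunk) for i, chunk in enumerate(chunks)]

def reassemble(fragments):
    """Rebuild the message from fragments in any order, or return None."""
    by_number = {n: (is_last, chunk) for n, is_last, chunk in fragments}
    last_numbers = [n for n, (is_last, _) in by_number.items() if is_last]
    if len(last_numbers) != 1:
        return None                     # last fragment not seen yet
    total = last_numbers[0] + 1
    if set(by_number) != set(range(total)):
        return None                     # some fragment is still missing
    return b"".join(by_number[n][1] for n in range(total))
```

An endpoint would keep the partial fragments keyed by message ID and discard them once the reassembly window expires; that bookkeeping is left out here.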

2.7) Alternatives

2.7.1) Adjust tunnel processing midstream

While the simple tunnel routing algorithm should be sufficient for most cases, there are three alternatives that can be explored:

Have a peer other than the endpoint temporarily act as the termination point for a tunnel by adjusting the encryption used at the gateway to give them the plaintext of the preprocessed I2NP messages. Each peer could check to see whether they had the plaintext, processing the message when received as if they did.
Allow routers participating in a tunnel to remix the message before forwarding it on - bouncing it through one of that peer's own outbound tunnels, bearing instructions for delivery to the next hop.
Implement code for the tunnel creator to redefine a peer's "next hop" in the tunnel, allowing further dynamic redirection.

2.7.2) Use bidirectional tunnels

The current strategy of using two separate tunnels for inbound and outbound communication is not the only technique available, and it does have anonymity implications. On the positive side, by using separate tunnels it lessens the traffic data exposed for analysis to participants in a tunnel - for instance, peers in an outbound tunnel from a web browser would only see the traffic of an HTTP GET, while the peers in an inbound tunnel would see the payload delivered along the tunnel. With bidirectional tunnels, all participants would have access to the fact that e.g. 1KB was sent in one direction, then 100KB in the other. On the negative side, using unidirectional tunnels means that there are two sets of peers which need to be profiled and accounted for, and additional care must be taken to address the increased speed of predecessor attacks. The tunnel pooling and building process outlined below should minimize the worries of the predecessor attack, though if it were desired, it wouldn't be much trouble to build both the inbound and outbound tunnels along the same peers.

2.7.3) Backchannel communication

At the moment, the preIV values used are random values. However, it is possible for that 16 byte value to be used to send control messages from the gateway to the endpoint, or on outbound tunnels, from the gateway to any of the peers. The inbound gateway could encode certain values in the preIV once, which the endpoint would be able to recover (since it knows the endpoint is also the creator). For outbound tunnels, the creator could deliver certain values to the participants during the tunnel creation (e.g. "if you see 0x0 as the preIV, that means X", "0x1 means Y", etc). Since the gateway on the outbound tunnel is also the creator, they can build a preIV so that any of the peers will receive the correct value. The tunnel creator could even give the inbound tunnel gateway a series of preIV values which that gateway could use to communicate with individual participants exactly one time (though this would have issues regarding collusion detection).

This technique could later be used to deliver messages mid stream, or to allow the inbound gateway to tell the endpoint that it is being DoS'ed or otherwise soon to fail. At the moment, there are no plans to exploit this backchannel.

2.7.4) Variable size tunnel messages

While the transport layer may have its own fixed or variable message size, using its own fragmentation, the tunnel layer may instead use variable size tunnel messages. The difference is an issue of threat models - a fixed size at the transport layer helps reduce the information exposed to external adversaries (though overall flow analysis still works), but for internal adversaries (aka tunnel participants) the message size is exposed. Fixed size tunnel messages help reduce the information exposed to tunnel participants, but do not hide the information exposed to tunnel endpoints and gateways. Fixed size end to end messages hide the information exposed to all peers in the network.

As always, it's a question of who I2P is trying to protect against. Variable sized tunnel messages are dangerous, as they allow participants to use the message size itself as a backchannel to other participants - e.g. if you see a 1337 byte message, you're on the same tunnel as another colluding peer. Even with a fixed set of allowable sizes (1024, 2048, 4096, etc), that backchannel still exists as peers could use the frequency of each size as the carrier (e.g. two 1024 byte messages followed by an 8192). Smaller messages do incur the overhead of the headers (IV, tunnel ID, hash portion, etc), but larger fixed size messages either increase latency (due to batching) or dramatically increase overhead (due to padding).

Perhaps we should have I2CP use small fixed size messages which are individually garlic wrapped so that the resulting size fits into a single tunnel message so that not even the tunnel endpoint and gateway can see the size. We'll then need to optimize the streaming lib to adjust to the smaller messages, but should be able to squeeze sufficient performance out of it. However, if the performance is unsatisfactory, we could explore the tradeoff of speed (and hence userbase) vs. further exposure of the message size to the gateways and endpoints. If even that is too slow, we could then review the tunnel size limitations vs. exposure to participating peers.

3) Tunnel building

When building a tunnel, the creator must send a request with the necessary configuration data to each of the hops in turn, starting with the endpoint, waiting for their reply, then moving on to the next earlier hop. These tunnel request messages and their replies are garlic wrapped so that only the router who knows the key can decrypt them, and the path taken in both directions is tunnel routed as well. There are three important dimensions to keep in mind when producing the tunnels: what peers are used (and where), how the requests are sent (and replies received), and how they are maintained.

3.1) Peer selection

Beyond the two types of tunnels - inbound and outbound - there are two styles of peer selection used for different tunnels - exploratory and client. Exploratory tunnels are used for both network database maintenance and tunnel maintenance, while client tunnels are used for end to end client messages.

3.1.1) Exploratory tunnel peer selection

Exploratory tunnels are built out of a random selection of peers from a subset of the network. The particular subset varies with the local router and with what its tunnel routing needs are. In general, the exploratory tunnels are built out of randomly selected peers who are in the peer's "not failing but active" profile category. The secondary purpose of the tunnels, beyond merely tunnel routing, is to find underutilized high capacity peers so that they can be promoted for use in client tunnels.

3.1.2) Client tunnel peer selection

Client tunnels are built with a more stringent set of requirements - the local router will select peers out of its "fast and high capacity" profile category so that performance and reliability will meet the needs of the client application. However, there are several important details beyond that basic selection that should be adhered to, depending upon the client's anonymity needs.

For some clients who are worried about adversaries mounting a predecessor attack, the tunnel selection can keep the peers selected in a strict order - if A, B, and C are in a tunnel, the hop after A is always B, and the hop after B is always C. A less strict ordering is also possible, assuring that while the hop after A may be B, B may never be before A. Other configuration options include the ability for just the inbound tunnel gateways and outbound tunnel endpoints to be fixed, or rotated on an MTBF rate.
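
The text does not say how a consistent ordering would be derived, so the Python sketch below shows just one possible approach for the less strict variant: sort the selected peers by an HMAC of their router hash under a secret known only to the local router. The function name and the use of HMAC-SHA256 are assumptions made for illustration.

```python
import hashlib
import hmac

def order_peers(peers, ordering_secret):
    """Order peers consistently, so B never precedes A if A once preceded B."""
    return sorted(
        peers,
        key=lambda peer: hmac.new(ordering_secret, peer, hashlib.sha256).digest(),
    )
```

Because the sort key depends only on the peer and the local secret, any two peers appear in the same relative order in every tunnel the client builds, while other routers cannot predict that order without the secret.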

3.2) Request delivery

As mentioned above, once the tunnel creator knows what peers should go into a tunnel and in what order, the creator builds a series of tunnel request messages, each containing the necessary information for that peer. For instance, peers participating in a tunnel will be given the 4 byte nonce with which to reply, the 4 byte tunnel ID on which they are to send out the messages, the 32 byte hash of the next hop's identity, the 32 byte layer key used to remove a layer from the tunnel, and a 32 byte layer IV key used to transform the preIV into the IV. Of course, outbound tunnel endpoints are not given any "next hop" or "next tunnel ID" information. To allow replies, the request contains a random session tag and a random session key with which the peer may garlic encrypt their decision, as well as the tunnel to which that garlic should be sent. In addition to the above information, various client specific options may be included, such as what throttling to place on the tunnel, what padding or batch strategies to use, etc.

After building all of the request messages, they are garlic wrapped for the target router and sent out an exploratory tunnel. Upon receipt, that peer determines whether they can or will participate, and if it will, it selects the tunnel ID on which it will receive messages. It then garlic wraps and tunnel routes that agreement, tunnel ID, and the nonce provided in the request using the supplied information (session tag, garlic session key, tunnel ID to reply to, and router on which that tunnel listens). Upon receipt of the reply at the tunnel creator, the tunnel is considered valid on that hop (if accepted). Once all peers have accepted, the tunnel is active.
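
The request and reply exchange for an outbound tunnel can be sketched in Python as below. Since each hop picks its own receive tunnel ID in its reply, the creator works from the endpoint back towards the gateway, so the "next tunnel ID" for each hop is already known when its request is built. The class names, the 32 byte session tag size, and the send_request callable (standing in for garlic wrapping plus tunnel routing) are all hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Optional

@dataclass
class TunnelRequest:
    nonce: bytes                     # 4 bytes, echoed back in the reply
    next_tunnel_id: Optional[int]    # None for the outbound endpoint
    next_hop_hash: Optional[bytes]   # None for the outbound endpoint
    layer_key: bytes                 # 32 bytes
    iv_key: bytes                    # 32 bytes
    reply_session_tag: bytes         # size assumed to be 32 bytes
    reply_session_key: bytes         # 32 bytes

@dataclass
class TunnelReply:
    nonce: bytes
    accepted: bool
    receive_tunnel_id: Optional[int]

def build_tunnel(hop_hashes, send_request):
    """Ask each hop in turn, endpoint first; return receive IDs or None."""
    next_id, next_hash = None, None
    receive_ids = []
    for hop in reversed(hop_hashes):
        request = TunnelRequest(
            nonce=os.urandom(4),
            next_tunnel_id=next_id,
            next_hop_hash=next_hash,
            layer_key=os.urandom(32),
            iv_key=os.urandom(32),
            reply_session_tag=os.urandom(32),
            reply_session_key=os.urandom(32),
        )
        reply = send_request(hop, request)
        if reply.nonce != request.nonce or not reply.accepted:
            return None              # one refusal fails the whole tunnel
        next_id, next_hash = reply.receive_tunnel_id, hop
        receive_ids.append(reply.receive_tunnel_id)
    receive_ids.reverse()            # gateway-to-endpoint order
    return receive_ids
```

A real router would also keep the generated keys for later use and time out hops that never reply; both are left out to keep the sketch short.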

3.3) Pooling

To allow efficient operation, the router maintains a series of tunnel pools, each managing a group of tunnels used for a specific purpose with their own configuration. When a tunnel is needed for that purpose, the router selects one out of the appropriate pool at random. Overall, there are two exploratory tunnel pools - one inbound and one outbound - each using the router's exploration defaults. In addition, there is a pair of pools for each local destination - one inbound and one outbound tunnel. Those pools use the configuration specified when the local destination connected to the router, or the router's defaults if not specified.

Each pool has within its configuration a few key settings, defining how many tunnels to keep active, how many backup tunnels to maintain in case of failure, how frequently to test the tunnels, how long the tunnels should be, whether those lengths should be randomized, how often replacement tunnels should be built, as well as any of the other settings allowed when configuring individual tunnels.
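
As a rough model of such a pool, the Python sketch below groups the settings listed above into one structure and shows the random selection and replenishment logic. All field names and default values are hypothetical placeholders rather than actual router defaults.

```python
import random
from dataclasses import dataclass, field

@dataclass
class PoolSettings:
    active_tunnels: int = 2             # tunnels to keep active
    backup_tunnels: int = 1             # spares kept in case of failure
    test_interval_seconds: int = 60     # how frequently to test tunnels
    tunnel_length: int = 2              # hops per tunnel
    length_variance: int = 0            # randomize length by +/- this much
    rebuild_interval_seconds: int = 600 # how often to build replacements

@dataclass
class TunnelPool:
    settings: PoolSettings
    tunnels: list = field(default_factory=list)

    def tunnels_to_build(self):
        """How many more tunnels are needed to reach active + backup."""
        wanted = self.settings.active_tunnels + self.settings.backup_tunnels
        return max(0, wanted - len(self.tunnels))

    def next_length(self, rng=random):
        """Pick the length of the next tunnel, randomized if configured."""
        variance = self.settings.length_variance
        return max(0, self.settings.tunnel_length + rng.randint(-variance, variance))

    def select(self, rng=random):
        """Pick one tunnel at random, or None if the pool is empty."""
        return rng.choice(self.tunnels) if self.tunnels else None
```

A router would hold one such pool per direction for exploratory tunnels, plus an inbound and outbound pair for each local destination.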

3.4) Alternatives

3.4.1) Telescopic building

One question that may arise regarding the use of the exploratory tunnels for sending and receiving tunnel creation messages is how that impacts the tunnel's vulnerability to predecessor attacks. While the endpoints and gateways of those tunnels will be randomly distributed across the network (perhaps even including the tunnel creator in that set), another alternative is to use the tunnel pathways themselves to pass along the request and response, as is done in TOR. This, however, may lead to leaks during tunnel creation, allowing peers to discover how many hops there are later on in the tunnel by monitoring the timing or packet count as the tunnel is built. Techniques could be used to minimize this issue, such as using each of the hops as endpoints (per 2.7.1) for a random number of messages before continuing on to build the next hop.

3.4.2) Non-exploratory tunnels for management

A second alternative to the tunnel building process is to give the router an additional set of non-exploratory inbound and outbound pools, using those for the tunnel request and response. Assuming the router has a well integrated view of the network, this should not be necessary, but if the router was partitioned in some way, using non-exploratory pools for tunnel management would reduce the leakage of information about what peers are in the router's partition.

4) Tunnel throttling

Even though the tunnels within I2P bear a resemblance to a circuit switched network, everything within I2P is strictly message based - tunnels are merely accounting tricks to help organize the delivery of messages. No assumptions are made regarding reliability or ordering of messages, and retransmissions are left to higher levels (e.g. I2P's client layer streaming library). This allows I2P to take advantage of throttling techniques available to both packet switched and circuit switched networks. For instance, each router may keep track of the moving average of how much data each tunnel is using, combine that with all of the averages used by other tunnels the router is participating in, and be able to accept or reject additional tunnel participation requests based on its capacity and utilization. On the other hand, each router can simply drop messages that are beyond its capacity, exploiting the research used on the normal internet.
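
The admission control idea can be sketched in Python with an exponential moving average per tunnel. The class name, the smoothing factor, and the rule of comparing summed averages plus the expected load against a single capacity figure are assumptions for illustration only.

```python
class ParticipationThrottle:
    def __init__(self, capacity_bytes_per_sec, alpha=0.2):
        self.capacity = capacity_bytes_per_sec
        self.alpha = alpha            # weight given to the newest sample
        self.averages = {}            # tunnel ID -> moving average

    def record(self, tunnel_id, bytes_per_sec):
        """Fold a new usage sample into that tunnel's moving average."""
        previous = self.averages.get(tunnel_id, bytes_per_sec)
        self.averages[tunnel_id] = (
            (1 - self.alpha) * previous + self.alpha * bytes_per_sec
        )

    def utilization(self):
        """Combined average usage across all participating tunnels."""
        return sum(self.averages.values())

    def accept_request(self, expected_bytes_per_sec):
        """Accept a new tunnel only if it fits within remaining capacity."""
        return self.utilization() + expected_bytes_per_sec <= self.capacity
```

The alternative mentioned above, simply dropping messages beyond capacity, needs no per-tunnel state at all, which is the main tradeoff between the two approaches.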

5) Mixing/batching

What strategies should be used at the gateway and at each hop for delaying, reordering, rerouting, or padding messages? To what extent should this be done automatically, how much should be configured as a per tunnel or per hop setting, and how should the tunnel's creator (and in turn, user) control this operation? All of this is left as unknown, to be worked out for I2P 3.0.