AWS NAT Gateway Timeout Behaviour

Over the last few weeks I worked a lot with Java Netty sockets and ran into half-open TCP connections. This means the connection is still open on the client side, but the server has already closed it.
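To make the symptom concrete, here is a small, self-contained Java sketch (no NAT gateway involved, and timing-dependent, so treat it as an illustration rather than a test): the client never notices the server's close until it tries to write.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

// The "server" closes immediately, yet the client socket still reports
// isConnected() == true: the FIN only says the peer will send no more
// data. The client's send direction looks healthy until a write finally
// provokes an RST from the peer.
public class HalfOpenDemo {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            Socket client = new Socket("localhost", server.getLocalPort());
            server.accept().close();   // server side closes right away (FIN)
            Thread.sleep(500);         // give the FIN time to arrive

            System.out.println("isConnected: " + client.isConnected()); // still true

            OutputStream out = client.getOutputStream();
            try {
                out.write(1);          // usually still succeeds, but triggers an RST
                out.flush();
                Thread.sleep(500);     // give the RST time to arrive
                out.write(1);          // this write now fails
                out.flush();
            } catch (IOException e) {
                System.out.println("write failed: " + e.getMessage()); // e.g. "Broken pipe"
            }
        }
    }
}
```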

I could not reproduce the issue in my local environment, refactoring the application source didn't help, and so I burned a lot of hours trying to get a stable connection to the socket server, without success.

After a while we went down to the TCP level to get more information about what was happening.

The environment with this issue was on AWS, in a private subnet behind a NAT gateway.

The socket server timed out the connection after 15 seconds and sent a TCP FIN. The FIN ended at the AWS NAT gateway, which does not forward it; when the application server later tried to continue the connection, the gateway answered with a TCP RST. Until then the application server believed the connection was still established, but it wasn't.

It could be fixed by replacing the NAT gateway with a NAT instance on AWS, which also forwards the TCP FIN, or by connecting the application server directly to the public network.
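Independent of the infrastructure fix, an application-level heartbeat at least makes the half-open state visible: a periodic write runs into the RST and fails the channel, which can then be closed and reconnected. A minimal Netty client sketch along these lines (the host, port, handler wiring, and the 10-second interval are illustrative, not the setup from this project):

```java
import io.netty.bootstrap.Bootstrap;
import io.netty.buffer.Unpooled;
import io.netty.channel.*;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioSocketChannel;
import io.netty.handler.timeout.IdleStateEvent;
import io.netty.handler.timeout.IdleStateHandler;
import io.netty.util.CharsetUtil;

public class HeartbeatClient {
    public static void main(String[] args) throws Exception {
        NioEventLoopGroup group = new NioEventLoopGroup();
        try {
            Bootstrap b = new Bootstrap()
                .group(group)
                .channel(NioSocketChannel.class)
                .handler(new ChannelInitializer<SocketChannel>() {
                    @Override
                    protected void initChannel(SocketChannel ch) {
                        // Fire an IdleStateEvent if nothing was written for 10s.
                        ch.pipeline().addLast(new IdleStateHandler(0, 10, 0));
                        ch.pipeline().addLast(new ChannelInboundHandlerAdapter() {
                            @Override
                            public void userEventTriggered(ChannelHandlerContext ctx, Object evt) {
                                if (evt instanceof IdleStateEvent) {
                                    // A write on a half-open connection provokes the
                                    // RST; on failure the channel is closed, so a
                                    // reconnect can be triggered elsewhere.
                                    ctx.writeAndFlush(Unpooled.copiedBuffer("PING\n", CharsetUtil.UTF_8))
                                       .addListener(ChannelFutureListener.CLOSE_ON_FAILURE);
                                } else {
                                    ctx.fireUserEventTriggered(evt);
                                }
                            }
                        });
                    }
                });
            b.connect("example.com", 9000).sync().channel().closeFuture().sync();
        } finally {
            group.shutdownGracefully();
        }
    }
}
```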

In the end there were also some other issues with the NAT gateway and long idle timeouts at the WebSocket level.
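The NAT gateway also drops connections that stay idle too long (AWS documents an idle timeout of 350 seconds), which hits long-lived, mostly quiet WebSockets. A common workaround, sketched here with an illustrative handler, is to send WebSocket pings well inside that window, e.g. combined with an IdleStateHandler(0, 60, 0) placed in front of it:

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.handler.codec.http.websocketx.PingWebSocketFrame;
import io.netty.handler.timeout.IdleStateEvent;

// Goes into the pipeline after the WebSocket codec. With an
// IdleStateHandler(0, 60, 0) before it, a ping is sent after 60s
// without writes, well below the NAT gateway's idle timeout.
public class WebSocketKeepAliveHandler extends ChannelInboundHandlerAdapter {
    @Override
    public void userEventTriggered(ChannelHandlerContext ctx, Object evt) {
        if (evt instanceof IdleStateEvent) {
            ctx.writeAndFlush(new PingWebSocketFrame());
        } else {
            ctx.fireUserEventTriggered(evt);
        }
    }
}
```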

Some links with more information:

From the AWS NAT gateway documentation: “Timeout behavior: When a connection times out, a NAT gateway returns an RST packet to any resources behind the NAT gateway that attempt to continue the connection (it does not send a FIN packet).”

Netty GitHub issue discussion

Half-open connection on Wikipedia