What does WPA2 traffic look like to a packet sniffer that is not connected to the network?

It doesn't ultimately matter whether the attacker is connected to the WPA2-PSK network or not: an attacker who knows the pre-shared key and captures a client's 4-way handshake can decrypt all of that client's traffic for the session.

Theory

That said, if you just look at individual 802.11 datagrams without analysing their flow any further, they are link layer (L2) datagrams containing link layer information: up to four MAC addresses.

Datagram format

Those addresses represent the transmitter (TA), receiver (RA), source (SA) and destination (DA) addresses, but their meaning and placement in the header depend on the values of the To DS and From DS bits in the header, as described in the document IEEE 802.11-05/0710r0:

"Note that Address 1 always holds the receiver address of the intended receiver (or, in the case of multicast frames, receivers), and that Address 2 always holds the address of the station that is transmitting the frame."

Table: Address field contents
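As a rough illustration of that address table, here is a minimal Python sketch of the usual mapping for ordinary data frames. The names ADDRESS_ROLES and address_roles are made up for this example, and the mapping is the commonly cited one (it ignores special cases such as A-MSDU), so treat it as an approximation rather than a copy of the table in the IEEE document.

```python
# Commonly cited roles of the four address fields, keyed by (To DS, From DS).
# RA/TA = receiver/transmitter, DA/SA = final destination/original source.
# Hypothetical helper for illustration; special cases (e.g. A-MSDU) are ignored.
ADDRESS_ROLES = {
    (0, 0): ("RA=DA", "TA=SA", "BSSID", None),    # station to station, management
    (0, 1): ("RA=DA", "TA=BSSID", "SA", None),    # AP -> station
    (1, 0): ("RA=BSSID", "TA=SA", "DA", None),    # station -> AP
    (1, 1): ("RA", "TA", "DA", "SA"),             # wireless bridge (WDS)
}


def address_roles(to_ds: int, from_ds: int) -> dict:
    """Return the role of each address field that is present in the header."""
    roles = ADDRESS_ROLES[(to_ds, from_ds)]
    return {
        f"Address {index}": role
        for index, role in enumerate(roles, start=1)
        if role is not None
    }


if __name__ == "__main__":
    for bits in sorted(ADDRESS_ROLES):
        print(bits, address_roles(*bits))
```

In every row Address 1 is the receiver and Address 2 the transmitter, which matches the note quoted above.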

WPA2 (a.k.a. IEEE 802.11i-2004) doesn't encrypt these L2 headers, but it does encrypt everything in the frame body, i.e. everything from the network layer (L3) up to the application layer (L7).
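To make the cleartext part concrete, the following Python sketch reads the fields a passive sniffer can see in the first 24 bytes of a data frame. It assumes a bare 802.11 header, i.e. any radiotap header added by the capture tool has already been stripped. The function name parse_dot11_header and the sample frame are invented for this example.

```python
import struct


def format_mac(raw: bytes) -> str:
    """Render six raw bytes as a colon-separated MAC address."""
    return ":".join(f"{octet:02x}" for octet in raw)


def parse_dot11_header(frame: bytes) -> dict:
    """Parse the unencrypted 24-byte header of an 802.11 frame (sketch)."""
    if len(frame) < 24:
        raise ValueError("need at least 24 bytes of 802.11 header")
    # Frame Control and Duration are little-endian 16-bit fields.
    frame_control, _duration = struct.unpack_from("<HH", frame, 0)
    flags = frame_control >> 8
    return {
        "type": (frame_control >> 2) & 0x3,      # 0 = mgmt, 1 = control, 2 = data
        "subtype": (frame_control >> 4) & 0xF,
        "to_ds": bool(flags & 0x01),
        "from_ds": bool(flags & 0x02),
        "protected": bool(flags & 0x40),         # frame body is encrypted
        "addr1": format_mac(frame[4:10]),
        "addr2": format_mac(frame[10:16]),
        "addr3": format_mac(frame[16:22]),
    }


# Made-up example: protected data frame from a station to its AP,
# ultimately destined for an IPv6 multicast address.
SAMPLE_FRAME = bytes.fromhex(
    "0841" "0000"          # frame control (data, To DS, Protected), duration
    "aabbcc000001"         # Address 1: BSSID (receiver)
    "001122334455"         # Address 2: station (transmitter)
    "333300000001"         # Address 3: destination
    "0000"                 # sequence control
)

if __name__ == "__main__":
    print(parse_dot11_header(SAMPLE_FRAME))
```

Everything after this header (plus the CCMP or TKIP header) is ciphertext when the Protected bit is set.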

To address your concerns:

  • IP addresses are network layer (L3) information → encrypted.
  • TCP/UDP ports are transport layer (L4) information → encrypted.

On the other hand, there's still some information disclosed:

  • MAC addresses may reveal the manufacturer of the devices, which could be used for finding possibly vulnerable access points or e.g. IoT devices.
  • IPv4 & IPv6 multicast can be detected: although the IP headers including the multicast addresses are encrypted, the destination MAC addresses are standard and known (01-00-5E-, 33-33-).
  • The BSSID ties each datagram to a particular AP and SSID pair: the AP is identified solely by its BSSID, whether it is transmitting or receiving, and the SSID is typically broadcast in the IEEE 802.11 Beacon frame. Therefore, the amount and timing of the data each device transmits can be associated with the network it is using. E.g. if someone had an IoT washing machine we could identify, we could guess their laundry day.
  • Even if the SSID is not broadcast in the beacon frames, devices connecting to the network will reveal the SSID in probe request frames. Also, computers trying to connect to saved networks will leak this information periodically even when they are not near any access point of that network. E.g. from my home I can guess several companies my neighbours work for.
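The MAC-based leaks in the list above can be sketched in a few lines of Python. The function name classify_mac is made up for this example. Mapping the OUI to a vendor name would additionally need the IEEE registry, which is not included here, and a device using a randomized (locally administered) address will not reveal its real manufacturer.

```python
def classify_mac(mac: str) -> dict:
    """Classify a MAC address seen in a cleartext 802.11 header (sketch)."""
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError("a MAC address has exactly 6 octets")

    if octets == b"\xff" * 6:
        kind = "broadcast"
    elif octets[:3] == b"\x01\x00\x5e" and not octets[3] & 0x80:
        kind = "IPv4 multicast"          # 01-00-5E-00-00-00 .. 01-00-5E-7F-FF-FF
    elif octets[:2] == b"\x33\x33":
        kind = "IPv6 multicast"
    elif octets[0] & 0x01:
        kind = "other group address"     # group bit set, unknown prefix
    else:
        kind = "unicast"

    return {
        "kind": kind,
        "oui": octets[:3].hex("-").upper(),               # vendor prefix
        "locally_administered": bool(octets[0] & 0x02),   # e.g. randomized MAC
    }


if __name__ == "__main__":
    for address in ("01-00-5E-00-00-FB", "33:33:00:00:00:01", "00:11:22:33:44:55"):
        print(address, classify_mac(address))
```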

To be more exact, in both CCMP (AES) and TKIP the data is encrypted and authenticated, whereas the 802.11 headers are only authenticated, not encrypted. That is summarized in these two diagrams from 802.11i Overview, IEEE 802.11-04/0123r1:

CCMP MPDU Format

TKIP Design (1) – MPDU Formats
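One more cleartext detail from the CCMP MPDU format: the 8-byte CCMP header that sits between the MAC header and the encrypted data is itself not encrypted, so a sniffer can read the 48-bit packet number (PN) and the key ID. The Python sketch below assumes the usual octet order PN0, PN1, reserved, key ID octet, PN2 to PN5. The function name parse_ccmp_header is invented, and locating the header inside a real frame (after a 24-byte or longer MAC header) is left out.

```python
def parse_ccmp_header(header: bytes) -> dict:
    """Decode the 8-byte cleartext CCMP header (sketch, assumed layout)."""
    if len(header) < 8:
        raise ValueError("the CCMP header is 8 bytes long")
    pn0, pn1, _reserved, key_octet, pn2, pn3, pn4, pn5 = header[:8]
    # PN5 is the most significant octet of the 48-bit packet number.
    packet_number = int.from_bytes(bytes([pn5, pn4, pn3, pn2, pn1, pn0]), "big")
    return {
        "pn": packet_number,
        "ext_iv": bool(key_octet & 0x20),   # set for CCMP
        "key_id": key_octet >> 6,
    }


if __name__ == "__main__":
    # Made-up header: PN = 1, ExtIV set, key ID 0.
    print(parse_ccmp_header(bytes([0x01, 0x00, 0x00, 0x20, 0x00, 0x00, 0x00, 0x00])))
```

Since the PN increases with every frame sent under a key, it is another piece of metadata visible without the PSK.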


Practice

If you want to see for yourself what it looks like in detail, you could e.g.

  1. Put a wireless adapter into monitor mode using airmon-ng. No need to connect to the network.

  2. Capture some packets using airodump-ng.

  3. Investigate the capture file (.cap) using Wireshark.

    • First without decrypting anything. (That's what you were asking.)

      Wireshark: list of 802.11 datagrams

      Wireshark: details of a datagram

  4. Configure Wireshark to decrypt the WPA2 traffic and inspect the capture again.

    • Go to Preferences > Protocols > IEEE 802.11,
      [x] Enable decryption
      and add the PSK to the decryption keys.

    • Open the capture file (.cap) again and let Wireshark decrypt it. The capture must include the 4-way handshake for this to work.
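For the key in step 4, Wireshark accepts either the passphrase together with the SSID (key type wpa-pwd) or the raw 256-bit PSK as 64 hex digits (key type wpa-psk). The PSK is derived from the passphrase and the SSID with PBKDF2-HMAC-SHA1 using 4096 iterations. A minimal Python sketch, where the function name wpa2_psk is made up for this example:

```python
import hashlib


def wpa2_psk(passphrase: str, ssid: str) -> str:
    """Derive the 256-bit WPA2 pre-shared key as 64 hex digits (sketch)."""
    if not 8 <= len(passphrase) <= 63:
        raise ValueError("a WPA2 passphrase is 8 to 63 characters long")
    psk = hashlib.pbkdf2_hmac(
        "sha1",
        passphrase.encode("utf-8"),
        ssid.encode("utf-8"),   # the SSID acts as the salt
        4096,                   # iteration count used for WPA/WPA2 PSKs
        32,                     # 256-bit output
    )
    return psk.hex()


if __name__ == "__main__":
    # Example values only; substitute your own network's SSID and passphrase.
    print(wpa2_psk("password", "IEEE"))
```

Because the SSID is the salt, the same passphrase gives a different PSK on a differently named network.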