How can I stream live video with minimal latency (ffplay, mplayer), and what kind of wrapper could be used with ffplay?

Well, for a really low-latency streaming scenario, you could try NTSC. Ideally its latency can be under 63 µs (microseconds), about one scan line.

For digital streaming with quality approaching NTSC and a 40 ms latency budget, see rsaxvc's answer at 120 Hz. If you need over-the-air streaming, it's the best low-latency option I've seen: it's very well thought out, and the resolution will scale with hardware capability.

If you mean digital streaming and you want good compression ratios, i.e. 1080p over WiFi, then you are out of luck if you want less than 100 ms of latency on today's commodity hardware, because a compression algorithm needs a lot of context to achieve a good compression ratio. For example, MPEG-1 used 12 frames in an IBBPBBPBBPBB GOP (group of pictures) arrangement, where an I frame is an 'intra' frame (effectively a JPEG still), a P frame is a predictive frame that encodes motion relative to earlier I and P frames, and B frames encode spot fixups where the prediction didn't work very well. Twelve frames, even at 60 fps, is still 200 ms: that's 200 ms just to capture the data, then some time to encode it, then some time to transmit it, then some time to decode it, then some time to buffer the audio so the sound card doesn't run out of data while the CPU is sending a new block to the DMA memory region, and at the same time 2-3 frames of video need to be queued up for the display in order to prevent tearing on a digital panel. So really there's a minimum of 15 frames, or 250 ms, plus the latency incurred in transmission.

NTSC doesn't have such latencies because it's transmitted analog, with the only 'compression' being two sneaky tricks. The first is interlacing: only half of each frame is transmitted at a time, as alternate rows - even rows on one field, odd rows on the next. The second is colour-space compression: the colour information rides on a subcarrier whose phase determines the hue shown, so colour is transmitted at roughly 1/3 the bandwidth of the brightness (luma) signal. Cool, eh? And I guess you could say the audio has a sort of 'compression' as well, in that automatic gain control could make a 20 dB analog audio signal appear to provide closer to a 60 dB experience - by blasting our ears out of our heads at commercials, due to the AGC jacking up the volume during the 2-3 seconds of silence between the show and the commercial.
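The GOP arithmetic above can be sketched as a quick shell calculation; nothing is assumed beyond the 12-frame GOP, 60 fps, and ~3 display-queue frames mentioned in the text:

```shell
#!/bin/sh
# Latency contributed by a 12-frame MPEG-1 style GOP at 60 fps,
# in milliseconds: frames / fps * 1000.
gop_frames=12
fps=60
echo "GOP capture latency: $((gop_frames * 1000 / fps)) ms"              # 200 ms

# Add ~3 more frames queued for the display to prevent tearing.
echo "Minimum pipeline latency: $(( (gop_frames + 3) * 1000 / fps )) ms" # 250 ms
```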
Later when we got higher fidelity audio circuits, commercials were actually broadcast louder than shows, but that was just their way of providing the same impact as the older TVs had given the advertisers.

This walk down memory lane brought to you by Nostalgia (tm). Buy Nostalgia brand toilet soap! ;-)

Here's the best I've achieved under Ubuntu 18.04 with stock ffmpeg and mpv. This requires a 3rd-gen Intel Core processor or later for VAAPI hardware encoding; see the ffmpeg site for directions on using NVIDIA hardware encoding instead.

ffmpeg -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
  -vaapi_device /dev/dri/renderD128 \
  -vf 'format=nv12,hwupload,scale_vaapi=w=1920:h=1080' \
  -c:v h264_vaapi -qp:v 26 -bf 0 -tune zerolatency -f mpegts \
  udp://$HOST_IP:12345

And then on the Media box:

mpv --no-cache --untimed --no-demuxer-thread --video-sync=audio \
  --vd-lavc-threads=1 udp://$HOST_IP:12345 
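Newer mpv releases (0.29 and later) bundle most of these flags into a built-in low-latency profile; if your mpv is recent enough, something like this is roughly equivalent (a sketch, using the same placeholder URL as above):

```shell
# Roughly equivalent on mpv >= 0.29: the built-in profile enables
# single-threaded decode, reduced demuxer buffering, and latency hacks.
# Needs a live UDP stream to actually test.
mpv --profile=low-latency udp://$HOST_IP:12345
```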

This achieves about 250 ms latency for 1080p@60 at around 3 Mbps, which is OK for streaming shows over WiFi. mpv can adjust for lip sync (Ctrl + and Ctrl - during playback). It's tolerable for streaming desktop mouse/keyboard interactions for media control, but it's unusable for real-time gaming (see NVIDIA Shield or Google Stadia for remote gaming).

One other thing: LCD/OLED/plasma TVs and some LCD monitors have frame interpolation, either via de-interlacing or via motion smoothing such as "SmoothVision" (the "Soap Opera Effect"). This processing adds input lag. You can usually turn it off in the display's settings, or by connecting to the "PC" or "Console" input port if the display has a port marked that way. Some displays let you rename the inputs; in that case, selecting "PC" or "Console" may reduce the input lag, but you may notice colour banding, flickering, etc. as a result of the extra processing being turned off.

CRT monitors have effectively zero input lag. But you'll get baked with ionizing radiation. Pick your poison.


The problem with traditional media players like VLC, ffplay, and to some extent mplayer, is that they'll try to play at a consistent framerate, and this requires some buffering, which kills the latency target. The alternative is to render the incoming video as fast as you can and not care about anything else.

@genpfault and I made a custom UDP protocol, planned for flying RC cars and quads. It targets low latency at the expense of pretty much everything else (resolution, bitrate, packet rate, compression efficiency). At smaller resolutions, we got it to run over a 115200-baud UART and XBee, but video under those restrictions was not as useful as we'd hoped. Today I'm testing a 320x240 configuration, running on a laptop (Intel i5-2540M), since I no longer have the original setup.

You need to plan your latency budget; here's where I spent mine:

  1. Acquisition - We picked 125 FPS PS3 Eye cameras, so our latency here is at most a little over 8 ms. 'Smarter' cameras that do onboard compression (either H.264 or MJPEG) are to be avoided. Also, if your camera has any sort of auto-exposure timing, you'll need to disable it to lock in the fastest framerate, or provide ample lighting (today, my built-in webcam is only doing 8 FPS due to AE).
  2. Conversion - If possible, have the camera emit frames in a format you can compress directly (generally a YUV format, which the Eye supports natively). Then you can skip this step; I'm spending 0.1 ms here.
  3. Encoding - We used a specially tuned H.264 encoder. It takes ~2.5 ms and requires no buffering of future frames, at the cost of compression ratio.
  4. Transport - We used UDP over WiFi: <5 ms when working correctly, without a bunch of other radios interfering.
  5. Decoding - This is pretty much limited by the receiver's CPU. The encoder can help by emitting a bitstream that is multithread-decodable. Decoding is usually faster than encoding; ~1.5 ms today.
  6. Conversion - Your decoder might do this step for you, but generally encoders/decoders speak YUV and displays speak RGB, and someone has to convert between them. 0.1 ms on my laptop.
  7. Display - Without VSYNC, a 60 FPS monitor has up to ~17 ms of latency, plus some LCD latency, maybe 6 ms? It really depends on the display, and I'm not sure which panel this laptop has.

The total comes to 40.2 ms.
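The per-stage budget tallies up as follows; the numbers are copied straight from the list above, and awk is used only for the decimal arithmetic:

```shell
#!/bin/sh
# Sum the latency budget, in milliseconds:
# acquisition + conversion + encode + transport + decode
# + conversion + display vsync + LCD panel guess.
awk 'BEGIN {
    total = 8 + 0.1 + 2.5 + 5 + 1.5 + 0.1 + 17 + 6
    printf "total latency budget: %.1f ms\n", total   # 40.2 ms
}'
```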

Encoding:

At the time, x264 was the best H.264 Annex B encoder we could find. We had to control bitrate, slice-max-size, vbv-bufsize, and vbv-maxrate. Start with the defaults for the "superfast" preset and the "zerolatency" tune, which disables B-frames.

Additionally, intra-refresh is a must! Effectively this chops up the normal 'I' frame and mingles it with the following P-frames. Without it, you'll have 'bubbles' in the bitrate demand that temporarily clog your transport, increasing latency.
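Putting those knobs together, here's a sketch of what such an invocation can look like via ffmpeg's libx264 wrapper. The capture device, rates, and destination address are illustrative assumptions, not the settings from our original build:

```shell
# Sketch only: /dev/video0, the bitrates, and the UDP target are
# placeholders. slice-max-size keeps each NALU under one UDP payload;
# intra-refresh spreads the I-frame across successive P-frames.
ffmpeg -f v4l2 -framerate 125 -video_size 320x240 -i /dev/video0 \
  -c:v libx264 -preset superfast -tune zerolatency \
  -x264-params "slice-max-size=1400:intra-refresh=1:vbv-maxrate=1000:vbv-bufsize=100" \
  -f mpegts "udp://192.168.1.10:5000?pkt_size=1316"
```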

Encoding-Transport-Planning:

The encoder was tuned to generate UDP-sized H.264 NALUs. That way, when a UDP packet was dropped, an entire H.264 NALU was dropped, and we didn't have to resynchronize; the decoder just sort of... burped... and continued with some graphical corruption.
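"UDP-sized" follows from standard MTU arithmetic; a minimal sketch, assuming the common 1500-byte Ethernet MTU and IPv4 (the 1400-byte slice target is my illustration, not our exact figure):

```shell
#!/bin/sh
# Max UDP payload on a 1500-byte MTU link:
# 1500 - 20 (IPv4 header) - 8 (UDP header) = 1472 bytes.
mtu=1500
echo "max UDP payload: $((mtu - 20 - 8)) bytes"   # 1472
# A slice-max-size around 1400 leaves headroom so each H264 NALU
# fits in a single unfragmented datagram.
```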

Final Results: 320x240


It's... faster than I can measure reliably with a cell phone pointed at a camera pointed at my laptop. Compression ratio: 320x240 at 2 bytes/pixel = 150 kB/frame raw, compressed down to a little over 3 kB/frame.
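The compression-ratio arithmetic checks out; a quick sketch, assuming 2 bytes/pixel (a YUV 4:2:2 style raw format) and the ~3 kB compressed frames observed:

```shell
#!/bin/sh
# Raw 320x240 frame at 2 bytes/pixel vs ~3 kB compressed frames.
raw=$((320 * 240 * 2))
echo "raw frame: ${raw} bytes"                 # 153600 (~150 kB)
echo "compression ratio: ~$((raw / 3072)):1"   # ~50:1
```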