Windows Media Foundation MFT buffering and video quality issues (loss of colors, not-so-smooth curves, especially text)

Most consumer H.264 encoders subsample the color information to 4:2:0 (the RGB input is converted to YUV first). This means that before the encode process even starts, your RGB bitmap loses 75% of its color information. H.264 was designed more for natural content than for screen capture, but there are codecs specifically designed to achieve good compression for screen content, for example the Windows Media Video 9 Screen codec: https://docs.microsoft.com/en-us/windows/desktop/medfound/usingthewindowsmediavideo9screencodec Even if you increase the bitrate of your H.264 encode, you are still working with only 25% of the original color information to start with.

So your format changes look like this:

You start with 1920x1080 red, green and blue pixels. You transform them to YUV. Now you have a 1920x1080 luma plane (Y) plus Cb and Cr planes, where Cb and Cr are color difference components; this is just a different way of representing the same colors. Then you scale the Cb and Cr planes down to half their width and half their height, so the resulting Cb and Cr planes are 960x540 while the luma plane is still 1920x1080. By scaling the color information from 1920x1080 down to 960x540 you are left with 25% of the original chroma samples. Passing the full-size luma plane and the quarter-size color difference planes into the encoder is what is called subsampling to 4:2:0. The subsampled input is required by the encoder, and the conversion is done automatically by the media framework; there is not much you can do to escape it, aside from choosing a different format.
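
To make the 25% figure concrete, here is a minimal C++ sketch of the 2x2 chroma averaging that 4:2:0 subsampling implies (the plane layout and the simple box filter are illustrative assumptions; the converter inside the framework may filter differently):

    #include <cstdint>
    #include <vector>

    // Downsample one full-resolution chroma plane (Cb or Cr) to half width
    // and half height by averaging each 2x2 block. This is the data reduction
    // that 4:2:0 subsampling performs.
    std::vector<uint8_t> DownsampleChroma420(const std::vector<uint8_t>& full,
                                             int width, int height)
    {
        const int halfW = width / 2;
        const int halfH = height / 2;
        std::vector<uint8_t> half(static_cast<size_t>(halfW) * halfH);

        for (int y = 0; y < halfH; ++y)
        {
            for (int x = 0; x < halfW; ++x)
            {
                // Average the four samples of the 2x2 block.
                const int sum =
                    full[(2 * y)     * width + 2 * x]     +
                    full[(2 * y)     * width + 2 * x + 1] +
                    full[(2 * y + 1) * width + 2 * x]     +
                    full[(2 * y + 1) * width + 2 * x + 1];
                half[y * halfW + x] = static_cast<uint8_t>(sum / 4);
            }
        }
        return half; // 1920x1080 -> 960x540: 25% of the original chroma samples
    }

This is also part of why fine colored detail such as the edges of text comes out washed: the chroma of a one-pixel-wide edge gets averaged with its neighbors.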

R = red
G = green
B = blue

Y = luma (brightness)
U = blue difference  (Cb)
V = red difference  (Cr)

YUV is used to separate out a luma signal (Y) that can be stored with high resolution or transmitted at high bandwidth, and two chroma components (U and V) that can be bandwidth-reduced, subsampled, compressed, or otherwise treated separately for improved system efficiency. (Wikipedia)

Original format

RGB (4:4:4) 3 bytes per pixel

R  R  R  R   R  R  R  R    R  R  R  R   R  R  R  R
G  G  G  G   G  G  G  G    G  G  G  G   G  G  G  G
B  B  B  B   B  B  B  B    B  B  B  B   B  B  B  B

Encoder input format - before H.264 compression

YUV (4:2:0) 1.5 bytes per pixel (6 bytes per 4 pixels)

Y  Y  Y  Y   Y  Y  Y  Y   Y  Y  Y  Y   Y  Y  Y  Y
    UV           UV           UV           UV
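
Assuming 8 bits per sample, the per-frame sizes behind those two diagrams work out like this (a small arithmetic sketch; note that MFVideoFormat_RGB32 actually stores 4 bytes per pixel in memory because of a padding byte, but only 3 of them carry color):

    #include <cstdio>

    int main()
    {
        const long long width = 1920, height = 1080, pixels = width * height;

        // RGB 4:4:4: three color bytes per pixel.
        const long long rgbBytes = pixels * 3;

        // YUV 4:2:0 (NV12/I420): one Y byte per pixel plus one Cb and one Cr
        // byte shared by each 2x2 block -> 1.5 bytes per pixel.
        const long long yuvBytes = pixels + 2 * (pixels / 4);

        std::printf("RGB 4:4:4: %lld bytes per frame\n", rgbBytes); // 6220800
        std::printf("YUV 4:2:0: %lld bytes per frame\n", yuvBytes); // 3110400
        return 0;
    }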

I'm trying to understand your problem.

My program ScreenCaptureEncode uses the default Microsoft encoder settings:

  • Profile: baseline
  • Level: 40
  • CODECAPI_AVEncCommonQuality: 70
  • Bitrate: 2000000

From my results, I think the quality is good/acceptable.

You can change the profile/level/bitrate with MF_MT_MPEG2_PROFILE/MF_MT_MPEG2_LEVEL/MF_MT_AVG_BITRATE (sketched below). As for CODECAPI_AVEncCommonQuality, it seems like you are using a locally registered encoder (because you are on Windows 7) in order to set that value to 100, I guess.

But I do not think that will change things significantly.
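
For reference, this is roughly how those attributes go onto the H.264 output type that is handed to the SinkWriter (a sketch only, not the exact code of ScreenCaptureEncode; error handling is reduced to HRESULT checks and the frame size/rate are placeholders):

    #include <mfapi.h>
    #include <mfidl.h>
    #include <mfreadwrite.h>
    #include <codecapi.h>

    HRESULT ConfigureH264Output(IMFSinkWriter* writer, DWORD* streamIndex)
    {
        IMFMediaType* type = nullptr;
        HRESULT hr = MFCreateMediaType(&type);
        if (FAILED(hr)) return hr;

        // H.264 output with an explicit profile, level and bitrate.
        type->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
        type->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264);
        type->SetUINT32(MF_MT_MPEG2_PROFILE, eAVEncH264VProfile_Base); // baseline
        type->SetUINT32(MF_MT_MPEG2_LEVEL, eAVEncH264VLevel4);         // level 4.0
        type->SetUINT32(MF_MT_AVG_BITRATE, 2000000);                   // 2 Mbit/s
        type->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
        MFSetAttributeSize(type, MF_MT_FRAME_SIZE, 1920, 1080);        // placeholder
        MFSetAttributeRatio(type, MF_MT_FRAME_RATE, 30, 1);            // placeholder

        hr = writer->AddStream(type, streamIndex);
        type->Release();
        return hr;
    }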

So.

Here are 3 screenshots taken with the keyboard Print Screen key:

  • the screen
  • the encoded screen, played by a video player in fullscreen mode
  • the encoded screen, played by a video player in non-fullscreen mode

[screenshots]

The last two pictures are from the same encoded video file. The video player introduces aliasing when it is not playing in fullscreen mode. Played in fullscreen mode, the same encoded file is not so bad compared to the original screen, even with default encoder settings. You should try this; I think we have to look at this more closely.

I think the aliasing comes from your video player, because it is not playing in fullscreen mode.

PS: I use the MPC-HC video player.

PS2: my program needs to be improved:

  • (not sure) Use IDirect3D9Ex to improve the buffering mechanism. On Windows 7, IDirect3D9Ex is better for rendering (no swap buffer); perhaps it is the same for screen capture (on the todo list).
  • I should use two threads, one for screen capture and one for encoding (a possible shape is sketched below).
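
A possible shape for that two-thread split, using a simple frame queue between the capture thread and the encode thread (the Frame type and the CaptureScreen/WriteSampleToSinkWriter names are placeholders, not functions from ScreenCaptureEncode):

    #include <condition_variable>
    #include <cstdint>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    // Placeholder frame type; in the real program this would wrap an
    // IMFSample or the raw captured surface.
    struct Frame { std::vector<uint8_t> pixels; };

    class FrameQueue
    {
    public:
        void Push(Frame f)
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(f));
            cv_.notify_one();
        }
        // Returns false once Finish() was called and the queue is drained.
        bool Pop(Frame& out)
        {
            std::unique_lock<std::mutex> lock(m_);
            cv_.wait(lock, [this] { return !q_.empty() || done_; });
            if (q_.empty()) return false;
            out = std::move(q_.front());
            q_.pop();
            return true;
        }
        void Finish()
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
            cv_.notify_all();
        }
    private:
        std::queue<Frame> q_;
        std::mutex m_;
        std::condition_variable cv_;
        bool done_ = false;
    };

    // Usage sketch: one thread captures, the other feeds the SinkWriter.
    //   FrameQueue queue;
    //   std::thread capture([&] { while (running) queue.Push(CaptureScreen()); queue.Finish(); });
    //   std::thread encode ([&] { Frame f; while (queue.Pop(f)) WriteSampleToSinkWriter(f); });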

EDIT

Did you read this:

CODECAPI_AVLowLatencyMode

Low-latency mode is useful for real-time communications or live capture, when latency should be minimized. However, low-latency mode might also reduce the decoding or encoding quality.
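
If you want to try it, the property is reachable through ICodecAPI obtained from the SinkWriter stream (a sketch, assuming the selected encoder exposes ICodecAPI; double-check the expected VARIANT type against the CODECAPI_AVLowLatencyMode documentation):

    #include <windows.h>
    #include <cguid.h>
    #include <mfreadwrite.h>
    #include <codecapi.h>
    #include <icodecapi.h>

    HRESULT EnableLowLatency(IMFSinkWriter* writer, DWORD streamIndex)
    {
        ICodecAPI* codecApi = nullptr;
        // Ask the SinkWriter for the encoder's ICodecAPI interface.
        HRESULT hr = writer->GetServiceForStream(streamIndex, GUID_NULL,
                                                 IID_PPV_ARGS(&codecApi));
        if (FAILED(hr)) return hr;

        VARIANT var;
        VariantInit(&var);
        var.vt = VT_BOOL;              // assumed variant type, verify in the docs
        var.boolVal = VARIANT_TRUE;
        hr = codecApi->SetValue(&CODECAPI_AVLowLatencyMode, &var);

        codecApi->Release();
        return hr;
    }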

About why my program uses MFVideoFormat_RGB32 and yours uses MFVideoFormat_YUY2: by default, the SinkWriter has converters enabled. The SinkWriter converts MFVideoFormat_RGB32 to a format the H.264 encoder accepts. For the Microsoft encoder, read this: H.264 Video Encoder

Input formats:

  • MFVideoFormat_I420
  • MFVideoFormat_IYUV
  • MFVideoFormat_NV12
  • MFVideoFormat_YUY2
  • MFVideoFormat_YV12

So there is no MFVideoFormat_RGB32. The SinkWriter does the conversion using the Color Converter DSP, I think.

So, definitely, the problem does not come from converting RGB to YUV before encoding (the input type setup is sketched below).
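
For completeness, this is roughly how an RGB32 input type is handed to the SinkWriter so that its built-in converters take care of the RGB-to-YUV step (a sketch; the output stream is assumed to be configured as in the earlier snippet, and the frame size/rate are placeholders):

    #include <mfapi.h>
    #include <mfidl.h>
    #include <mfreadwrite.h>

    HRESULT SetRgb32Input(IMFSinkWriter* writer, DWORD streamIndex)
    {
        IMFMediaType* type = nullptr;
        HRESULT hr = MFCreateMediaType(&type);
        if (FAILED(hr)) return hr;

        // Uncompressed RGB32 frames; the SinkWriter inserts the Color Converter
        // DSP (or another converter) to produce the YUV format the encoder needs.
        type->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
        type->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32);
        type->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
        MFSetAttributeSize(type, MF_MT_FRAME_SIZE, 1920, 1080);   // placeholder
        MFSetAttributeRatio(type, MF_MT_FRAME_RATE, 30, 1);       // placeholder

        // Passing NULL for the encoding parameters keeps the defaults, where
        // format converters are enabled.
        hr = writer->SetInputMediaType(streamIndex, type, nullptr);
        type->Release();
        return hr;
    }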

PS (last)

Like Markus Schumann said:

H.264 was designed more for natural content than for screen capture.

He should have mentioned that the problem is particularly related to text capture.

You have just found an encoder limitation. I just think that no encoder is optimized for text encoding with acceptable stretching, as I mentioned regarding the video player rendering.

You see aliasing in the final video capture because the text is baked in as fixed pixel information inside the movie. Playing this movie in fullscreen (the same size as the capture) is OK.

On Windows, text is rendered according to the screen resolution, so the display is always sharp.

This is my last conclusion.