Oldskooler Ramblings

the unlikely child born of the home computer wars

Using A Sharper Stick

Posted by Trixter on November 22, 2007


I met with Brian the other day to help out with some MobyGames infrastructure activities and we rekindled the idea of allowing video submissions. These could be actual videos of gameplay, or video reviews, or commercials… but before we work all that out, we have to work out how feasible it is from a technical standpoint. Which gives me yet another excuse to toy with video :-)

Before we posed our own internal technical questions, I decided to try to figure out how YouTube was encoding their videos, so that we could have a baseline to work from in terms of user expectations. The viewing public seems to accept YouTube as generally “watchable”. If MobyGames were to implement video, it would have to be at least as decent as YouTube.

So we’ve got a few questions to answer here: What video and audio codecs is YouTube using? What parameters (approximately) is YouTube using to encode video?

Thankfully for me, my friend Mike Melanson has already answered most of these questions: It’s Flash video, using Sorenson Spark (H.263 with some limitations) for video and MP3 for audio. To determine the encoding parameters, I grabbed a few .flvs from YouTube and ran “ffmpeg -i myvideo.flv” to see what ffmpeg could glean from the file format. It identified the audio stream as 22050Hz, 1 channel (mono), at 56kbps, but it couldn’t tell me anything about the video stream other than that it was indeed using flv’s limited H.263. I asked Mike how he thought it was being encoded, and also what best practices were.

“No idea; I’m not really that much of an encoder guy.” Luckily, I am.

The first thing I wanted to determine was how the bitrate was being controlled. Was it quality-based or bandwidth-based? I encoded two test videos, a simple one (few changes per frame) and a complex one (many changes per frame), both one minute in length. (By the way, I noticed a few people other than Mike and myself doing this, with names from the innocent “video quality test” to the unabashedly-named “ffmpeg FLV encoding raw audio mux test”.) After letting YouTube encode them, I sucked the end results back down, and it was immediately apparent that the encoder was bandwidth-based because both file sizes were identical. If it were quality-based (“constant quantization” for the encoding nerds out there), the simple file would have been substantially smaller.
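The file-size comparison is easy to script. Here is a minimal Python sketch; the function names, the 5% tolerance, and the byte counts in the usage note are all made up for illustration and are not measurements from the test uploads.

```python
def average_bitrate_kbps(size_bytes: int, duration_s: float) -> float:
    """Average total bitrate in kilobits per second (1 kb = 1000 bits)."""
    return size_bytes * 8 / duration_s / 1000


def looks_bandwidth_based(simple_bytes: int, complex_bytes: int,
                          tolerance: float = 0.05) -> bool:
    """Guess whether an encoder is bandwidth-based from two equal-length clips.

    If a low-motion clip and a high-motion clip of the same duration come
    back at nearly the same size, the encoder is spending a fixed bit budget.
    A quality-based encoder would make the simple clip much smaller.
    """
    larger = max(simple_bytes, complex_bytes)
    smaller = min(simple_bytes, complex_bytes)
    if larger == 0:
        return True  # two empty files are trivially "the same size"
    return (larger - smaller) / larger <= tolerance
```

For example, `looks_bandwidth_based(2_220_000, 2_215_000)` is true, while `looks_bandwidth_based(900_000, 2_220_000)` is false.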

After figuring this out, of course I had to determine what that constant bitrate setting was. After some manual binary partitioning trials, I narrowed it down to 240kb/s. Matching YouTube’s 56kbps MP3 audio and encoding my originals, a setting of 240kb/s for video resulted in nearly the same filesize as the encoded YouTube .flvs.
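The binary partitioning can be automated along these lines. This is only a sketch: `encoded_size` is a hypothetical callback that would run a trial encode at the given video bitrate and return the output size in bytes, and `ideal_cbr_size` is a stand-in model that ignores container overhead so the example runs without an encoder.

```python
from typing import Callable


def find_video_bitrate(target_bytes: int,
                       encoded_size: Callable[[int], int],
                       lo: int = 50, hi: int = 1000) -> int:
    """Bisect whole kb/s settings between lo and hi.

    Returns the smallest bitrate whose encoded size reaches target_bytes.
    Assumes output size never shrinks as the bitrate goes up.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if encoded_size(mid) < target_bytes:
            lo = mid + 1   # too small: the answer is above mid
        else:
            hi = mid       # big enough: the answer is mid or below
    return lo


def ideal_cbr_size(video_kbps: int, audio_kbps: int = 56,
                   duration_s: int = 60) -> int:
    """Stand-in for a real encode: pure bitrate x time, no container overhead."""
    return (video_kbps + audio_kbps) * 1000 * duration_s // 8
```

With the stand-in model, a one-minute clip at 240kb/s video plus 56kb/s audio comes to 2,220,000 bytes, and `find_video_bitrate(2_220_000, ideal_cbr_size)` returns 240 after about ten trials instead of hundreds.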

I was happy with this until I actually viewed my encoding results, which looked substantially worse than YouTube’s files. It took a few more viewings of the “simple” example before I noticed that the YouTube video had a very long GOP (group of pictures). Meaning, a very long time went by before the entire frame was repainted with a keyframe. In my trials, I was emitting a keyframe every 15 frames; that’s twice a second @ 30fps. Since intraframes (keyframes) are much larger than interframes (frames where only the differences are stored), this was eating up my bandwidth, and visual quality suffered. I was able to determine YouTube’s intraframe interval by staring at a static section of the “simple” file, hitting a stopwatch when it changed, then hitting the stopwatch again when it changed again. I measured 8 seconds, which is 240 frames @ 30fps. Setting the GOP to an interval of 240 frames, my encoded files now matched YouTube’s results. For the video encoding nerds out there, I can even see some of the same DCTs being used in the same places :-)
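The stopwatch arithmetic, as a small Python sketch. The function names are mine; the numbers are the ones from the paragraph above.

```python
def gop_length_frames(seconds_between_keyframes: float, fps: float) -> int:
    """Convert a measured keyframe interval into a GOP length in frames."""
    return round(seconds_between_keyframes * fps)


def keyframes_per_minute(gop_frames: int, fps: float) -> float:
    """How many (expensive) intraframes get encoded in one minute of video."""
    return 60 * fps / gop_frames


def average_bits_per_frame(video_kbps: float, fps: float) -> float:
    """The average per-frame budget a fixed bitrate leaves, keyframes included."""
    return video_kbps * 1000 / fps
```

An 8-second interval at 30fps gives `gop_length_frames(8, 30) == 240`. A one-minute clip then needs 7.5 keyframes on average instead of the 120 that a 15-frame GOP demands, and all of them have to come out of an average budget of only 8000 bits per frame at 240kb/s.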

If you want to reproduce YouTube encoding yourself, possibly to see how your encoded video will look on YouTube without waiting for the upload+processing, grab yourself a command-line version of ffmpeg and use this syntax:

ffmpeg -i (inputfilename.whatever) -s 320x240 -b 240kb -ab 56kb -ar 22050 -ac 1 -g 240 (outputfilename.flv)

I’m sure it’s not 100% complete, but it certainly gets you 98% of the way there.
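For batch use, the same settings can be assembled in a script. The sketch below only builds the argument list. It assumes the option spellings used by more recent ffmpeg builds (`-b:v` and `-b:a` rather than the `-b` and `-ab` shown above) and a build that includes the `flv` video encoder and `libmp3lame`; the function name is hypothetical.

```python
from typing import List


def youtube_like_ffmpeg_args(src: str, dst: str) -> List[str]:
    """Build an ffmpeg command line approximating the settings above.

    Assumption: newer ffmpeg option names and an ffmpeg build with the
    'flv' (Sorenson Spark) encoder and libmp3lame available.
    """
    return [
        "ffmpeg", "-i", src,
        "-s", "320x240",        # frame size
        "-c:v", "flv",          # Flash video's limited H.263
        "-b:v", "240k",         # video bitrate
        "-g", "240",            # keyframe every 240 frames
        "-c:a", "libmp3lame",   # MP3 audio
        "-b:a", "56k",          # audio bitrate
        "-ar", "22050",         # audio sample rate
        "-ac", "1",             # mono
        dst,
    ]
```

The list can be handed to `subprocess.run(args, check=True)`; passing a list rather than a shell string avoids quoting trouble with odd filenames.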

Will MobyGames improve on this? Most likely. For one thing, I don’t agree with the decision to limit everything to mono sound. For the typical YouTube talking head over laptop speakers, it makes perfect sense, but for stereo video game music or positional sound effects, it doesn’t. I was also not happy with the video quality at 240kb/s because, for fast-moving sources like FPS games, everything becomes a blocky mess. We will also probably come up with some convention (read: Encode It Yourself Dammit) for frame sizes over 320×240, like games that run at 640×480 and are simple (i.e., have very little motion or change between frames). One of the “obvious” optimizations that turns out to be not so obvious is 2-pass encoding to spread out the bits more optimally, but this was only a benefit (in my tests) for complex sources. It actually made simple/static sources more blocky in the static parts.

After all this discovery, I was bothered by something. Check YouTube’s parameters again:

  • 240 lines (framesize is 320×240)
  • 240 kbps (video bitrate)
  • 240 frames in a GOP

I’m a big fan of Occam’s Razor, but I was depressed to apply the Razor here because, after doing so, it sure seems like the YouTube guys were taking blind stabs at encoding parameters (i.e., “Hey, let’s set everything to 240 and see what happens!”).

14 Responses to “Using A Sharper Stick”

  1. a said

    .flv supports VP6 video.

  2. Trixter said

    Yes, but ffmpeg (via libavcodec) can’t encode to vp6. Yet, anyway. We need to use an automated process — we’re not going to be loading all the videos into premiere and encoding them by hand…

  3. […] how YouTube works its magic so that MobyGames may one day take user’s video submissions, Trixter has been doing his own poking at the video giant to determine what parameters the site uses to transcode user videos. What he […]

  4. Isn’t there at least a little coincidence at work here? 320×240 is a very common video resolution. 300 kbps is a common bitrate for streaming multimedia, and splitting 240/60 between video and audio is not unusual.

  5. Trixter said

    320×240 is the common part, but 240kbps for video and 240 frames in a GOP?

    I wouldn’t call 300 common; it’s only “common” now because of YouTube. I remember when RealVideo was starting to target broadband, their templates were things like 112kbps and 224kbps (multiples of ISDN’s bitrate).

  6. I’m coming from a perspective of Windows Media music videos. Common rates included 56k, 128k, 300k, 500k, 700k, 1500k. I have a large collection of 300k videos which were the best offered on certain sites.

  7. astrange said

    I have this mencoder command left over from experimenting at beating YouTube:

    mencoder -oac pcm -ovc lavc -vf scale=320:240:::::::1 -sws 7 -o flv.avi -msglevel mencoder=1 -lavcopts vcodec=flv:mbd=2:v4mv:vrc_eq=\(\tex+100000000\*mcVar\)\^qComp:scplx_mask=0.1:naq:cmp=266:subcmp=2:dia=-1:trell:cbp:mv0:mv0_threshold=0:qprd:sc_factor=2:vbitrate=250:vpass=1

    mencoder -oac pcm -ovc lavc -vf scale=320:240:::::::1 -sws 7 -o flv.avi -msglevel mencoder=1 -lavcopts vcodec=flv:mbd=2:v4mv:vrc_eq=\(\tex+100000000\*mcVar\)\^qComp:scplx_mask=0.1:naq:cmp=266:subcmp=2:dia=-1:trell:cbp:mv0:mv0_threshold=0:qprd:sc_factor=2:vbitrate=250:vpass=2

    It’s probably translatable to ffmpeg. Quality should be much better. (this is targeted against blockiness artifacts, they really should have used h264 or something with a loop filter)

  8. Jeremy said

    Don’t know if you saw this but last week YouTube announced upcoming support for higher resolutions. Check the link in my name for the article. Then again, they were supposed to have their entire catalog encoded in H.264 to support Apple products by this Fall, which hasn’t happened, so who knows.

    In any case, nice detective work. Totally agree on the stereo issue.

  9. Trixter said

    astrange: Most of it translates; since posting the article, I’ve found that this exceeds standard YouTube in both visual quality and PSNR:

    ffmpeg -i $1.avi -psnr -cmp 3 -subcmp 3 -mbd 2 -flags aic+cbp+mv0+mv4+trell -b 248kb -ab 56kb -ar 22050 -ac 1 -g 240 -pass 1 -y $1_2pass.flv

    (run again with -pass 2 of course). The slight increase to video bitrate (240 to 248) is to compensate for the conservative bit bucket when using 2-pass mode.

    I’m not totally on the “anti-blocking” bandwagon since in some cases it noticeably smooths the video out. When trying to represent video game footage at 320×240, you need as many details as you can get, quantization noise or not :-)

    It’s important to note that my post wasn’t necessarily HACKING YouTube. For that, just generate any .flv you want and hex-edit the end result’s duration such that the new “duration” makes it look like the file has a total bitrate of 350kbps or lower. YouTube will accept any such file without complaints. It’s not much of a hack, though, because they will still stream at 350kbps, so your 720p trailer will take a few minutes to start playing :)

  10. astrange said

    Right, I didn’t try to oversmooth either. That uses NSSEW (-cmp 10/266) which tries to keep noise; apparently it thinks it’s making smooth video when it introduces blocking otherwise. It might be kind of slow, though. (but at least -qns didn’t turn out to help, that’s REALLY slow)

  11. - said

    Flash 9 r115 is out already, supporting H.264 and HE-AAC (in mp4 files), which would dramatically increase the quality (and compatibility, for that matter, since .flv is… well, .flv) of the files.

  12. scharfis_brain said

    Fortunately now there is a method to hack youtube to show
    – high quality (>350kbps AVG) short videos
    or
    – low quality looong (> 11 minutes runtime) videos.

    have a look at:
    http://forum.videohelp.com/topic336882.html

    I ran some tests and was pretty astonished:

    1) A capture of a friend playing some tetris:
    320×240 @ 1400kbps with 60fps
    44100 Hz Stereo MP3 @128 kbps

    2) Christina Aguilera crying my ears to ear-cancer (sorry, hadn’t a better sample at hand the day I discovered the method):
    480×380 @ 800kbps with 30fps
    44100 Hz Stereo MP3 @128 kbps

    So you actually CAN do high quality with Youtube, but it is very uncomfortable to do.
    But the good thing is: you can do everything on your own without letting youtube fucking up the video.

  13. Trixter said

    Hi Scharfis!!

    While you can indeed do that, there is a drawback: YouTube’s servers will still only stream to you at their regular rate of 350kbps. So you have to pause the video yourself for 2 minutes or more in order to get much of it, or else it will pause repeatedly during playback. But it is indeed a good hack, especially for times when you need 60fps video and it will fit within the constraints of the bandwidth.

  14. Thad The Fly said

    brilliant site, and brilliant article. trixter you are the man!
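The 350kbps figure that comes up in comments 9 and 13 invites two bits of arithmetic, sketched here in Python. The function names are hypothetical, the stream is treated as constant-bitrate, and the 350kbps cap is taken from those comments rather than from anything YouTube has published.

```python
def minimum_claimed_duration_s(size_bytes: int,
                               max_total_kbps: float = 350.0) -> float:
    """Shortest 'duration' a file can claim and still appear to be at or
    under the bitrate cap (size in bits divided by the cap)."""
    return size_bytes * 8 / (max_total_kbps * 1000)


def prebuffer_wait_s(duration_s: float, file_kbps: float,
                     stream_kbps: float = 350.0) -> float:
    """How long to leave the player paused so playback never stalls.

    The download must stay ahead of playback until the end of the clip:
    stream_kbps * (wait + duration) >= file_kbps * duration.
    """
    if file_kbps <= stream_kbps:
        return 0.0
    return duration_s * (file_kbps - stream_kbps) / stream_kbps
```

For example, a 10,500,000-byte file has to claim at least 240 seconds, and a one-minute clip encoded at 700kbps needs about a minute of paused buffering on a 350kbps stream.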
