macOS screen updating, 2017 edition
February 2, 2017
TL;DR: Retina iMac (4k/5k) owners can greatly improve the graphics performance of many applications (including REAPER) by setting the color profile (in System Preferences, Displays, Color tab) to "Generic RGB" or "Adobe RGB" (and then restarting REAPER and/or any other applications being tested).
I previously wrote in mid-2014 about the state of blitting bitmaps to screen on modern OS X (now macOS) versions. Since then, Apple has released new hardware (including Retina iMacs) and a couple of new macOS versions.
Much of that article is still useful today, but I made a mistake in the second update:
"OK, if you provide a bitmap that is twice the size of the drawing rect, you can avoid argb32_image_mark_RGBXX, and get the Retina display to update in about 5-7ms, which is a good improvement (but by no means impressive, given how powerful this machine is). I made a very simple software scaler (that turns each pixel into 4), and it uses very little CPU."
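That scaler is just a point-sampled 2x blowup; here's a minimal sketch of the idea (not the actual WDL code; it assumes 32-bit packed pixels and spans measured in pixels):

  /* Sketch of a point-sampled 2x upscale: each 32-bit source pixel becomes
     a 2x2 block in the destination. Spans are in pixels, not bytes, and the
     destination must be at least (w*2) x (h*2). */
  static void scale_2x(const unsigned int *src, int src_span,
                       unsigned int *dest, int dest_span, int w, int h)
  {
    for (int y = 0; y < h; y++)
    {
      const unsigned int *in = src + y * src_span;
      unsigned int *out0 = dest + (y * 2) * dest_span, *out1 = out0 + dest_span;
      for (int x = 0; x < w; x++)
      {
        const unsigned int p = in[x];
        out0[x*2] = out0[x*2+1] = p;
        out1[x*2] = out1[x*2+1] = p;
      }
    }
  }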
I discovered this mistake while researching how to improve REAPER's graphics performance on the iMac 5k in particular, so I started benchmarking. This time around, I figured I should measure how many screen pixels are updated and divide that by how long the update takes. Some results, from memory (I'm not going to rerun them for this article; laziness).
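The measurement is nothing fancy: pixels drawn divided by wall-clock time. A rough sketch (the context, image, and iteration count here are placeholders, not the actual benchmark harness):

  #include <CoreGraphics/CoreGraphics.h>
  #include <sys/time.h>
  #include <stdio.h>

  /* Rough benchmark sketch: draw an image a bunch of times and report
     megapixels per second. ctx, img, w and h are assumed to come from the
     application's own drawing path; only the measurement is shown. */
  static void bench_blit(CGContextRef ctx, CGImageRef img, int w, int h)
  {
    const int iters = 100;
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    for (int i = 0; i < iters; i++)
      CGContextDrawImage(ctx, CGRectMake(0, 0, w, h), img);
    CGContextFlush(ctx);
    gettimeofday(&t1, NULL);

    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) * 1e-6;
    printf("%.0f MPix/sec\n", (double)w * h * iters / 1.0e6 / sec);
  }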
Initial version (REAPER 5.32 state, using the retina hack described above, public WDL as of today):
- old C2D iMac, 10.6: 350MPix/sec
- mid-2012 RMBP 15", 10.12, Thunderbolt display (non-retina): 1500MPix/sec
- mid-2012 RMBP 15", 10.12, built-in display (retina): 800MPix/sec
- late-2015 Retina iMac 5k, 10.12: 192MPix/sec
After I realized the hack above was actually doing more work (thank you, Xcode instrumentation), I did some more experiments without it. I found that the newer SDKs have kCGImageByteOrderXYZ flags (I don't believe these were in previous SDKs), that they are aliases of kCGBitmapByteOrderXYZ, and that passing kCGBitmapByteOrder32Host in the pixel format for CGImageCreate() etc. speeds things up (see the sketch after the numbers below). With the retina hack removed:
- mid-2012 RMBP 15", 10.12, built-in display (retina): 300MPix/sec
- late-2015 Retina iMac 5k, 10.12: 152MPix/sec
With kCGBitmapByteOrder32Host added as well:
- old C2D iMac, 10.6: 350MPix/sec
- mid-2012 RMBP 15", 10.12, Thunderbolt display (non-retina): 1500MPix/sec
- mid-2012 RMBP 15", 10.12, built-in display (retina): 720MPix/sec
- late-2015 Retina iMac 5k, 10.12: 200MPix/sec
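For reference, the kind of setup being discussed looks roughly like this. It's a sketch rather than the actual REAPER/WDL code; the alpha flag, the no-interpolation/copy-blend draw settings, and the buffer handling are illustrative:

  #include <CoreGraphics/CoreGraphics.h>

  /* Sketch: wrap an existing 32bpp buffer in a CGImage using the host byte
     order, then draw it with interpolation off and copy blending. */
  static void draw_frame(CGContextRef ctx, void *pixels, int w, int h)
  {
    CGColorSpaceRef cs = CGColorSpaceCreateDeviceRGB();
    CGDataProviderRef dp =
        CGDataProviderCreateWithData(NULL, pixels, (size_t)w * h * 4, NULL);
    CGImageRef img = CGImageCreate(w, h, 8, 32, w * 4, cs,
        kCGImageAlphaNoneSkipFirst | kCGBitmapByteOrder32Host,
        dp, NULL, false, kCGRenderingIntentDefault);

    CGContextSetInterpolationQuality(ctx, kCGInterpolationNone);
    CGContextSetBlendMode(ctx, kCGBlendModeCopy);
    CGContextDrawImage(ctx, CGRectMake(0, 0, w, h), img);

    CGImageRelease(img);
    CGDataProviderRelease(dp);
    CGColorSpaceRelease(cs);
  }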
From profiling and looking at the code, this blit performance could easily be improved by Apple -- the inner loop where most time is being spent does a lot more than it needs to. Come on Apple, make us happy. Details offered on request.
Of course, none of this really helps the iMac 5k -- 200MPix/sec is *TERRIBLE*. The full screen is about 15 megapixels, so at best that gets you around 13fps, and that's at 100% CPU use. After some more profiling, I found that the function chewing the most CPU had a name ending in "64". Then it hit me -- was this display running at 16 bits per channel? A quick google search later, it was clear: the Retina iMacs have 10-bit panels, and macOS can drive them at 10 bits per channel, which means 64 bits per pixel. macOS is converting all of our pixels to 64 bits per pixel (and, I should mention, it seems to be doing a very slow job of it). Luckily, changing the color profile (in System Preferences, Displays) to "Generic RGB" or similar disables this, and performance comes back to roughly the ~800MPix/sec level of the RMBP, which is at least tolerable.
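Rough numbers behind the "around 13fps" figure, using the panel size and the measured rates above:

  #include <stdio.h>

  /* Back-of-the-envelope numbers for the 5k panel, using the measured
     ~200 MPix/sec (10-bit path) and ~800 MPix/sec (Generic RGB) rates. */
  int main(void)
  {
    const double panel_mpix = 5120.0 * 2880.0 / 1e6;              /* ~14.7 MPix */
    printf("frame size at 32bpp: %.0f MB\n", panel_mpix * 4);      /* ~59 MB */
    printf("frame size at 64bpp: %.0f MB\n", panel_mpix * 8);      /* ~118 MB */
    printf("max fps at 200 MPix/sec: %.1f\n", 200.0 / panel_mpix); /* ~13.6 */
    printf("max fps at 800 MPix/sec: %.1f\n", 800.0 / panel_mpix); /* ~54.3 */
    return 0;
  }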
Sorry for the long, wordy mess above; I'm posting it here so that google finds it and anybody looking into why their software is slow on macOS 10.11 or 10.12 on Retina iMacs has some explanation.
Also: please please please, Apple, optimize CGContextDrawImage()! I'm drawing an image with no alpha channel, no interpolation, and no blend mode, and the inner loop is checking each pixel to see if its alpha is 255? I mean, wtf. You can do better. Hell, you've done way better. All that "new" Retina code needs optimizing!
Update a few hours later:
While fixing various issues with the updated byte ordering, I noticed that CoreText produces quite different output for CGBitmapContexts created with different byte orderings.
Hmph! Not sure which of the two is "correct" there... If you use kCGImageAlphaPremultipliedFirst for the CGBitmapContext rather than kCGImageAlphaNoneSkipFirst, it looks closer to the original, maybe?
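The two configurations being compared differ only in the alpha-info part of the bitmapInfo passed to CGBitmapContextCreate(). Roughly (a sketch; the CoreText drawing calls themselves aren't shown):

  #include <CoreGraphics/CoreGraphics.h>

  /* The two bitmap-context configurations being compared; only the
     alpha-info part of bitmapInfo differs. */
  static CGContextRef make_text_ctx(void *buf, int w, int h, int premul)
  {
    CGColorSpaceRef cs = CGColorSpaceCreateDeviceRGB();
    CGBitmapInfo info = kCGBitmapByteOrder32Host |
        (premul ? kCGImageAlphaPremultipliedFirst : kCGImageAlphaNoneSkipFirst);
    CGContextRef ctx = CGBitmapContextCreate(buf, w, h, 8, w * 4, cs, info);
    CGColorSpaceRelease(cs);
    return ctx;
  }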
One other caveat: NSBitmapImageRep can't seem to deal with that ARGB format either, so if you use it you need to manually bswap the pixels...
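The swap itself is trivial, something like this (a sketch, assuming 32-bit packed pixels):

  #include <stdint.h>
  #include <stddef.h>

  /* Byte-swap a 32bpp buffer in place (e.g. ARGB <-> BGRA in memory order)
     before handing it to an API that expects the other ordering. */
  static void bswap_pixels(uint32_t *pix, size_t count)
  {
    for (size_t i = 0; i < count; i++)
      pix[i] = __builtin_bswap32(pix[i]);
  }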
Update (2019): worked around most of this issue by using Metal, read here.