Basically, what we're doing is this:
First, in RGSS, we create our target viewport, and a bitmap set to its dimensions. We pass the dimensions, and a pointer to the first byte of the bitmap's data, to our dll.
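For reference, the dll side of that handoff could look roughly like this. The export name, signature, and struct here are just illustrative (the real dll's interface may differ); the point is that Ruby hands over the dimensions plus the raw pointer once, and the dll keeps them around:

```c
/* Sketch of a hypothetical DLL entry point the RGSS script would call.
 * "bitmap_data" is the address of the first byte of the RGSS Bitmap's
 * pixel buffer, passed in from Ruby. Names here are made up. */
#include <stdint.h>

typedef struct {
    int      width;   /* target viewport / bitmap width  */
    int      height;  /* target viewport / bitmap height */
    uint8_t *pixels;  /* first byte of the bitmap's pixel data */
} RenderTarget;

static RenderTarget g_target;

__declspec(dllexport) void __stdcall GLInit(int width, int height, uint8_t *bitmap_data)
{
    g_target.width  = width;
    g_target.height = height;
    g_target.pixels = bitmap_data;
    /* ...create the GL context and the offscreen FBO here (see below)... */
}
```

On the RGSS side that call would go through Win32API, something along the lines of `Win32API.new('renderer.dll', 'GLInit', 'iil', 'v')` - again, the dll and export names are placeholders, not the actual ones.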
In the dll, we first create an FBO (via extension, so pre-3.0 OpenGL is still supported), sized to the target viewport's dimensions. Rendering directly to the bitmap data would force OpenGL to bypass the hardware and fall back to software rasterization (a DirectDraw wrapper). Ewwwwww.
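For the curious, the FBO setup boils down to something like this. I'm sketching it with the EXT_framebuffer_object entry points (the pre-3.0 route); the real dll might just as well use the ARB/core variants, and the function pointers have to be fetched with wglGetProcAddress or a loader like GLEW before any of this runs:

```c
/* Sketch only: create an offscreen FBO sized to the target viewport,
 * assuming the EXT_framebuffer_object functions are already loaded. */
#include <GL/glew.h>

static GLuint g_fbo, g_color_rb;

static int create_offscreen_fbo(int width, int height)
{
    /* Color renderbuffer sized to the viewport. */
    glGenRenderbuffersEXT(1, &g_color_rb);
    glBindRenderbufferEXT(GL_RENDERBUFFER_EXT, g_color_rb);
    glRenderbufferStorageEXT(GL_RENDERBUFFER_EXT, GL_RGBA8, width, height);

    /* Attach it to a new framebuffer object; all drawing targets this,
     * so rendering stays on the GPU instead of hitting the software path. */
    glGenFramebuffersEXT(1, &g_fbo);
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, g_fbo);
    glFramebufferRenderbufferEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                                 GL_RENDERBUFFER_EXT, g_color_rb);

    return glCheckFramebufferStatusEXT(GL_FRAMEBUFFER_EXT)
           == GL_FRAMEBUFFER_COMPLETE_EXT;
}
```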
So instead we render to this offscreen FBO, gaining all the perks of hardware acceleration. All that's left is getting the pixels onto the screen: we take that bitmap pointer and, having worked out the memory structure of an RM bitmap, use glReadPixels to copy the FBO straight into the target bitmap, with no additional copies or conversions. This was the tricky part, and the credit mostly goes to Glitchfinder for wrapping his head around how the Ruby/RGSS bitmap is laid out in RAM so that OpenGL could write to it directly.
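The readback itself is basically a single glReadPixels call. This sketch assumes the RM bitmap is a bottom-up, 32-bit BGRA buffer (DIB-style); if that assumption holds, OpenGL's bottom-left origin lines up with it, which is exactly why no flip or per-pixel conversion is needed:

```c
/* Sketch of the once-per-frame readback. Assumes the RGSS bitmap stores
 * 32-bit BGRA rows bottom-up, matching OpenGL's bottom-left origin. */
static void present_frame(GLuint fbo, int width, int height, uint8_t *pixels)
{
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
    glPixelStorei(GL_PACK_ALIGNMENT, 4);      /* 32-bit pixels, GL's default packing */
    glReadPixels(0, 0, width, height,         /* the whole FBO */
                 GL_BGRA, GL_UNSIGNED_BYTE,   /* assumed bitmap layout */
                 pixels);                     /* straight into the RGSS bitmap memory */
}
```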
So yes, it does technically stall the card to read the pixels back, but that happens exactly once per frame, and it's lightning fast compared to vanilla's software rendering (DirectDraw).
Hell, it runs quite playably on my laptop, which can't even handle the vanilla tilemap, heh.
Edited for clarity.