Wow, some pretty amazing contributions in this thread. I've been away from this subject for a while, and it's so nice to see everyone keeping this discussion thread alive.
@JoergZdarsky - Unity 5.6 is out, and check this:
In 5.6 we now support Procedural Instancing, where instance data is supplied via a custom source in the Shader, rather than from Material Property Blocks and Support for DrawMeshInstancedIndirect, where draw arguments are supplied from a ComputeBuffer. This new way of rendering instances via script has almost no CPU overhead, resulting in a massive performance boost, assuming the CPU is the limiting factor for your framerate. -- source
That sounds like exactly what we need to solve the batching issue.
And the developments on asynchronous data transfer (GPU->CPU) in this thread excite me even more! (You're already aware of this, but I figured everyone else might benefit from the link being re-posted.)
With these two pieces of the puzzle solved, it might be time for me to revisit this subject. I've recently started writing a number of simple games (one-a-week in scope) to improve my Unity chops, and have come to appreciate more of what Unity can do despite the limitations it places upon one's code architecture. Your recent move to handle most of the scene graph outside of Unity and simply use Unity as the rendering "front-end" is a great solution, and likely worth the time cost of setting it up.
Anyhow, that's all I have to share for now, other than to say hello again to my fellow devs and hope life is treating you all well!
-Navy (although "Civilian" is a more accurate title now...)