I love this kind of thing! That’s a good update!
Well, well. Seems like it’s Christmas come early! Or late. Either way, it’s Christmas come now!
There are no words in any human tongue that can adequately describe my excitement.
Amazing info, and amazingly interesting writing. Nice!
Wow. Kinda like a KSP blog. FUN!
And looking forward to that ED block … don’t forget to copy-paste that last sentence to the front of that
In addition, the part that we can multithread ( applying the particle behaviors ) isn’t currently a bottleneck. Writing the particle data to GPU buffers is.
Will DirectX 12 help with this?
Great blog, thanks @INovaeFlavien, hope you can keep up the updates for those 2-3 months.
On a different note, a quick search on Google for Unity3D finds some discussions saying that their particle system can handle ~150k particles CPU side without a significant FPS drop, and millions of particles using DX11 GPU particles. Your numbers are somewhat on the low side, at ~20k; have you pushed it higher to see when it really starts hitting performance?
Our implementation is already pretty optimized ( other than multithreading ), so most of the overhead comes from fillrate. The shaders already handle the camera alignment, and we’re using instancing for the rendering calls.
I just did a test and pushed a test particle system to 100K particles, and it rendered at around 90 fps, provided that I reduced the fillrate to nothing ( by reducing the particle sizes to avoid overdraw ).
I don’t really see how a 150K particle system could not have a significant fps drop, unless “significant” has a different meaning to me. If you had a scene rendering at 100 fps, and you added a 150K particle system and it went down to 90 fps, then that could be considered an insignificant cost. But that’d mean it’d take 1.11 ms to render that particle system, meaning that you could render 135 million particles per second on the CPU ( 150000 * 1000 / 1.11 ). So those numbers aren’t coherent; a 150K particle system most definitely would have a significant cost on the CPU, no matter what your implementation is ^^
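For anyone who wants to check the arithmetic above, here is a minimal sketch in Python. The function names are made up for illustration; the only inputs are the figures quoted in the post ( 100 fps before, 90 fps after, 150K particles ).

```python
def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def added_cost_ms(fps_before, fps_after):
    """Extra time per frame after adding the particle system."""
    return frame_time_ms(fps_after) - frame_time_ms(fps_before)


def implied_particles_per_second(particles, cost_ms):
    """Throughput implied if `particles` are processed in `cost_ms`."""
    return particles * 1000.0 / cost_ms


# Hypothetical scenario from the post: 100 fps drops to 90 fps
# once a 150K particle system is added.
cost = added_cost_ms(100, 90)
rate = implied_particles_per_second(150_000, cost)

print(f"added cost: {cost:.2f} ms per frame")
print(f"implied throughput: {rate / 1e6:.0f} million particles per second")
```

The drop from 100 to 90 fps looks small, but it only leaves about 1.11 ms per frame for the whole particle system, which is where the implausibly high implied throughput comes from.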
Unless you start to consider things such as splitting your 150K particles into hundreds of particle systems and culling away the ones that aren’t seen. Of course we do that too, but that’s not the kind of performance test I’ve been doing.
Yes, this is precisely the problem Mantle and DX12 were designed to solve (among a few others).
Awesome blog! Hope to see more in the future!
Keep it up, guys.
I don’t know the details, maybe the guys are running some wild hardware. I just made a particle emitter in Unity that spawns 10k particles per second; those particles seem to last around 10-20 seconds, and the scene is running at ~400fps. What I find strange is that Unity’s stats window is reporting only 2 draw calls and only some 34k tris in the scene.
I’m sure you have compared your results against at least UE and maybe others, so if you think 20k particles is ok, I suppose it is.
It’s 20K in my test with big particles and overdraw. It’s 100K per frame at 85 fps ( hence 8.5 million particles per second ) in a stress test with no fillrate.
I can’t test Unity atm, but as I was curious I replicated the test in UE4. It’s a bit trickier since UE4 seems to have a limit per emitter, so I couldn’t recreate one big emitter. Instead I created 280 emitters, each spawning 515 particles ( UE4’s limit ), and it ran at 30 fps. That’s about 4.3 million particles per second.
So as a rough test, it seems our particle renderer is about twice as fast as UE4’s. Although UE4 is probably a bit more flexible / has more functionality than ours.
So I don’t think Unity will be much faster than these numbers.
It probably has some limit per emitter like UE4 too. 34K tris would mean it’s rendering 17K particles per frame; at 400 fps that’s 6.8 million particles per second. That one is not a fair comparison though since it’s not the same machine ( mine vs yours ), but it should more or less prove that our engine is at least on the same performance level as Unity and UE4.
Out of curiosity, what are your machine’s specs?
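To put the three figures side by side, here is a rough sketch in Python using only the numbers quoted in this thread. The labels and the helper name are illustrative, and the Unity entry assumes two triangles per particle and comes from a different machine, so treat it as an estimate rather than a benchmark.

```python
def particles_per_second(particles_per_frame, fps):
    """Throughput = particles drawn each frame times frames per second."""
    return particles_per_frame * fps


results = {
    # 100K particles per frame at 85 fps, no fillrate
    "this engine (stress test)": particles_per_second(100_000, 85),
    # 280 emitters of 515 particles each at 30 fps
    "UE4 (280 x 515)": particles_per_second(280 * 515, 30),
    # 34K tris, assumed 2 tris per particle, at 400 fps (other machine)
    "Unity (estimate)": particles_per_second(34_000 // 2, 400),
}

for name, rate in results.items():
    print(f"{name}: {rate / 1e6:.1f} million particles per second")

ratio = results["this engine (stress test)"] / results["UE4 (280 x 515)"]
print(f"this engine vs UE4: {ratio:.2f}x")
```

The ratio against UE4 comes out just under 2, which matches the "roughly twice" estimate; the three tests are not identical setups, so the comparison is only indicative.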
Thing is that by upping the emitter rate the FPS does go down and the other way around…
I’m running nothing fancy, an i7 4770k with a GTX 660 that is up for replacement.
I have an i5 4670k ( so slightly slower than yours ) and a Radeon 7970 ( slightly faster ):
http://cpuboss.com/cpus/Intel-Core-i7-4770K-vs-Intel-Core-i5-4670K
http://gpuboss.com/gpus/Radeon-HD-7970-vs-GeForce-GTX-660
So our systems should have comparable performance.
Out of curiosity, have you guys pushed everything to the limit, to the point where the game becomes a slideshow?
Yes, with everything on maximum detail I believe there are parts of our KS video that (currently) run at 35fps on @INovaeGene’s dual gtx 980 setup if I’m not mistaken :p. That qualifies as a slideshow for 99% of everybody else ;).
Any chance of a sneak peek of said scene?
Absolutely none
Not even an MS paint interpretation? It was worth a shot