CUDA Subdivide Edges

Hi

I have just committed a version of the CUDASubdivideEdges plugin that uses a device implementation of create_edge_adjacency_lookup. The only remaining functions that are not handled by the GPU are validate_polyhedra and merge_selection.

We can clearly see from the performance that although serial code can run on the GPU, it does not run very fast - especially if the behaviour due to the depth of nested loops is to be avoided. It may be possible to improve the performance of the edge_index_calculator somewhat by moving the loop over the edges in a loop back into the kernel.

In the implementation of the edge_adjacency lookup, I had tried a slightly more parallel version of the calculation of first_edges, but I switched to the serial implementation to start with as the other one was giving some hassles.

I think my next step will be to convert a mesh source (MeshArray) so that I can gauge the performance there and then have a look at porting validate_polyhedra as well as merge_selection so that I can test a simple device-only pipeline as Bart requested.

I know that the deadline is coming ever closer - and I do still need to get some of the documentation done, but I think I will leave that for this weekend.

Evan

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
K3d-development mailing list
K3d-development@...
https://lists.sourceforge.net/lists/listinfo/k3d-development
Re: CUDA Subdivide Edges

On Tuesday 12 August 2008 17:49:29 Evan Lezar wrote:
> We can clearly see from the performance, that although serial code can run
> on the GPU, it does not run very fast - especially if the behaviour due to
> the depth of nested loops is to be avoided. It may be possible to improve
> the performance of the edge_index_calculator somewhat by moving the loop
> over the edges in a loop back into the kernel.
>
> In the implementation of the edge_adjacency lookup, I had tried a slightly
> more parallel version of the calculation of first_edges, but I switched to
> the serial implementation to start with as the other one was giving some
> hassles.

Hi Evan,

Perhaps rather than a head-on serial run, it would be better to attempt to get a correct execution order using __syncthreads(). For the index_map in edge_index_calculator, for example, each thread could keep a local count of split edges, which could be offset by the counts obtained from the other threads after a __syncthreads() call.

For the midpoint indices, you could store the thread number with each midpoint found, and then process the midpoint index array a second time and only keep those associated with the lowest thread numbers, and add the associated edges to the edge_list. This should avoid having separate midpoints for an edge and its companion.

> I think my next step will be to convert a mesh source (MeshArray) so that I
> can gauge the performance there and then have a look at porting
> validate_polyhedra as well as merge selection so that I can test a simple
> device-only pipeline as Bart requested.

OK, as mentioned in the other thread this should be PolyGrid ;)

Cheers,
Bart
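
The local-count idea for the index_map could look roughly like the sketch below. It is only an illustration, not the plugin's code: `selected`, `index_map_kernel` and `build_index_map` are hypothetical names, the selection is assumed to be a plain array of 0/1 flags per edge, and everything runs in a single thread block so that one __syncthreads() call is enough.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the index_map step: every split (selected) edge
// gets its rank among the split edges, every other edge gets -1.
// Assumes a single thread block.
__global__ void index_map_kernel(const int* selected, int* index_map, int edge_count)
{
    extern __shared__ int counts[];  // one counter per thread

    const int tid = threadIdx.x;
    const int chunk = (edge_count + blockDim.x - 1) / blockDim.x;
    const int begin = min(tid * chunk, edge_count);
    const int end = min(begin + chunk, edge_count);

    // Pass 1: each thread counts the split edges in its own chunk.
    int local_count = 0;
    for (int e = begin; e < end; ++e)
        if (selected[e]) ++local_count;
    counts[tid] = local_count;
    __syncthreads();  // every thread's count is visible after this point

    // Offset = number of split edges owned by lower-numbered threads.
    int offset = 0;
    for (int t = 0; t < tid; ++t)
        offset += counts[t];

    // Pass 2: write the indices in the same order a serial loop would.
    for (int e = begin; e < end; ++e)
        index_map[e] = selected[e] ? offset++ : -1;
}

// Host wrapper (hypothetical): copies the flags over, runs one block, copies back.
std::vector<int> build_index_map(const std::vector<int>& selected, int threads = 64)
{
    assert(threads > 0);
    const int n = static_cast<int>(selected.size());
    std::vector<int> result(n, -1);
    if (n == 0) return result;

    int* d_selected = nullptr;
    int* d_map = nullptr;
    cudaMalloc(&d_selected, n * sizeof(int));
    cudaMalloc(&d_map, n * sizeof(int));
    cudaMemcpy(d_selected, selected.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    index_map_kernel<<<1, threads, threads * sizeof(int)>>>(d_selected, d_map, n);

    cudaMemcpy(result.data(), d_map, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_selected);
    cudaFree(d_map);
    return result;
}
```

Each thread sums the counts of the lower-numbered threads with a simple loop here; a parallel prefix sum in shared memory would be the usual replacement once the block gets large.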
Re: CUDA Subdivide Edges

On Thu, Aug 14, 2008 at 1:47 AM, Bart Janssens <bart.janssens@...> wrote:

The problem with __syncthreads() is that it only synchronizes threads in a block - but I will have a look at the code again to see if I can get some parallelism from it.

Yes - as I mentioned in the other thread, it had been a rough day (week :) )

Thanks
Evan
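
One common way around the block-only scope of __syncthreads() is to split the work into two kernel launches, since all blocks of the first launch have finished before the second one starts. The sketch below assumes the same hypothetical 0/1 `selected` flag array per edge; `count_per_block`, `write_indices` and `build_index_map_multiblock` are made-up names and not part of the plugin.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Launch 1: one thread per edge; each block counts its own split edges.
__global__ void count_per_block(const int* selected, int* block_counts, int n)
{
    __shared__ int block_total;
    if (threadIdx.x == 0) block_total = 0;
    __syncthreads();

    const int e = blockIdx.x * blockDim.x + threadIdx.x;
    if (e < n && selected[e]) atomicAdd(&block_total, 1);
    __syncthreads();  // only synchronizes the threads of this block

    if (threadIdx.x == 0) block_counts[blockIdx.x] = block_total;
}

// Launch 2: block_counts is complete here because launch 1 has finished.
__global__ void write_indices(const int* selected, const int* block_counts,
                              int* index_map, int n)
{
    const int e = blockIdx.x * blockDim.x + threadIdx.x;
    if (e >= n) return;
    if (!selected[e]) { index_map[e] = -1; return; }

    int index = 0;
    for (int b = 0; b < blockIdx.x; ++b)  // split edges in earlier blocks
        index += block_counts[b];
    for (int i = blockIdx.x * blockDim.x; i < e; ++i)  // earlier edges in this block
        index += selected[i] ? 1 : 0;
    index_map[e] = index;
}

// Host wrapper (hypothetical): the launch boundary acts as the global barrier.
std::vector<int> build_index_map_multiblock(const std::vector<int>& selected,
                                            int threads = 128)
{
    assert(threads > 0);
    const int n = static_cast<int>(selected.size());
    std::vector<int> result(n, -1);
    if (n == 0) return result;
    const int blocks = (n + threads - 1) / threads;

    int* d_selected = nullptr;
    int* d_counts = nullptr;
    int* d_map = nullptr;
    cudaMalloc(&d_selected, n * sizeof(int));
    cudaMalloc(&d_counts, blocks * sizeof(int));
    cudaMalloc(&d_map, n * sizeof(int));
    cudaMemcpy(d_selected, selected.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    count_per_block<<<blocks, threads>>>(d_selected, d_counts, n);
    write_indices<<<blocks, threads>>>(d_selected, d_counts, d_map, n);

    cudaMemcpy(result.data(), d_map, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_selected);
    cudaFree(d_counts);
    cudaFree(d_map);
    return result;
}
```

The per-thread loops in the second kernel are kept deliberately simple; the point is only that the counts from all blocks can be combined safely once the first launch is done.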