CUDA Subdivide Edges

by Evan Lezar

Hi

I have just committed a version of the CUDASubdivideEdges plugin that uses a device implementation of create_edge_adjacency_lookup.  The only remaining functions that are not handled by the GPU are validate_polyhedra and merge_selection.

We can clearly see from the performance that, although serial code can run on the GPU, it does not run very fast, especially if the behaviour due to the depth of nested loops is to be avoided.  It may be possible to improve the performance of the edge_index_calculator somewhat by moving the iteration over the edges in a loop back into the kernel.

In the implementation of the edge_adjacency lookup, I had tried a slightly more parallel version of the calculation of first_edges, but I switched to the serial implementation to start with as the other one was giving some hassles.

I think my next step will be to convert a mesh source (MeshArray) so that I can gauge the performance there, and then have a look at porting validate_polyhedra as well as merge_selection so that I can test a simple device-only pipeline as Bart requested.

I know that the deadline is coming ever closer - and I do still need to get some of the documentation done, but I think I will leave that for this weekend.

Evan

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
K3d-development mailing list
K3d-development@...
https://lists.sourceforge.net/lists/listinfo/k3d-development

Re: CUDA Subdivide Edges

by Bart Janssens

On Tuesday 12 August 2008 17:49:29 Evan Lezar wrote:

> We can clearly see from the performance, that although serial code can run
> on the GPU, it does not run very fast - especially if the behaviour due to
> the depth of nested loops is to be avoided.  It may be possible to improve
> the performance of the edge_index_calculator somewhat by moving the loop
> over the edges in a loop back into the kernel.
>
> In the implementation of the edge_adjacency lookup, I had tried a slightly
> more parallel version of the calculation of first_edges, but I switched to
> the serial implementation to start with as the other one was giving some
> hassles.

Hi Evan,

Perhaps rather than a head-on serial run, it would be better to attempt to get
a correct execution order using __syncthreads(). For the index_map in
edge_index_calculator, for example, each thread could keep a local count of
split edges, which could be offset by the counts obtained from the other
threads after a __syncthreads() call.
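
A minimal sketch of the counting idea above, assuming a single thread block and a hypothetical layout in which split[e] != 0 marks an edge to be split and index_map[e] receives that edge's rank among the split edges (or -1). The kernel and wrapper names are made up for illustration; the real edge_index_calculator may define index_map differently.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

constexpr int THREADS = 128;  // one block only, so __syncthreads() covers every thread

// Abort on any CUDA runtime error instead of silently continuing.
static void check(cudaError_t err) { assert(err == cudaSuccess); (void)err; }

// Hypothetical kernel: assigns consecutive indices to the split edges.
__global__ void index_map_single_block(const int* split, int n, int* index_map)
{
    __shared__ int counts[THREADS];
    const int t = threadIdx.x;
    const int chunk = (n + THREADS - 1) / THREADS;  // contiguous edges per thread
    const int begin = t * chunk;
    const int end = min(begin + chunk, n);

    // Step 1: each thread keeps a local count of split edges in its chunk.
    int local = 0;
    for (int e = begin; e < end; ++e)
        local += split[e] ? 1 : 0;
    counts[t] = local;
    __syncthreads();  // every thread's count is visible past this point

    // Step 2: offset by the counts obtained from all lower-numbered threads.
    int next = 0;
    for (int k = 0; k < t; ++k)
        next += counts[k];

    // Step 3: hand out consecutive indices in the original edge order.
    for (int e = begin; e < end; ++e)
        index_map[e] = split[e] ? next++ : -1;
}

// Host wrapper so the sketch can be called from ordinary C++.
std::vector<int> compute_index_map(const std::vector<int>& split)
{
    const int n = static_cast<int>(split.size());
    std::vector<int> result(n);
    if (n == 0) return result;

    int *d_split = nullptr, *d_map = nullptr;
    check(cudaMalloc(&d_split, n * sizeof(int)));
    check(cudaMalloc(&d_map, n * sizeof(int)));
    check(cudaMemcpy(d_split, split.data(), n * sizeof(int), cudaMemcpyHostToDevice));

    index_map_single_block<<<1, THREADS>>>(d_split, n, d_map);
    check(cudaGetLastError());

    check(cudaMemcpy(result.data(), d_map, n * sizeof(int), cudaMemcpyDeviceToHost));
    check(cudaFree(d_split));
    check(cudaFree(d_map));
    return result;
}
```

The linear sum in step 2 keeps the sketch short; a shared-memory scan would probably be the better choice for larger blocks.
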
For the midpoint indices, you could store the thread number with each midpoint
found, and then process the midpoint index array a second time and only keep
those associated with the lowest thread numbers, and add the associated edges
to the edge_list. This should avoid having separate midpoints for an edge and
its companion.
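
The midpoint idea could be sketched roughly as follows. This is only an illustration under assumptions: one thread per edge (so the thread number equals the edge index), a hypothetical companion array holding the index of each edge's companion (-1 for a boundary edge), and a split array marking selected edges. None of these names come from the plugin. The two passes are separate kernel launches, so the second pass only starts once every thread of the first has finished.

```cuda
#include <cassert>
#include <climits>
#include <cuda_runtime.h>
#include <vector>

// Abort on any CUDA runtime error instead of silently continuing.
static void check(cudaError_t err) { assert(err == cudaSuccess); (void)err; }

// An edge and its companion share one slot: the lower of the two edge indices.
__device__ int pair_slot(const int* companion, int e)
{
    const int c = companion[e];  // assumption: -1 marks a boundary edge
    return (c >= 0 && c < e) ? c : e;
}

// Pass 1: every thread that finds a midpoint stores its thread number.
__global__ void claim_midpoints(const int* companion, const int* split, int n, int* owner)
{
    const int e = blockIdx.x * blockDim.x + threadIdx.x;
    if (e < n && split[e])
        atomicMin(&owner[pair_slot(companion, e)], e);
}

// Pass 2: only the lowest thread number recorded for a pair keeps its midpoint.
__global__ void keep_lowest(const int* companion, const int* split, int n,
                            const int* owner, int* keep)
{
    const int e = blockIdx.x * blockDim.x + threadIdx.x;
    if (e < n)
        keep[e] = (split[e] && owner[pair_slot(companion, e)] == e) ? 1 : 0;
}

// Host wrapper: returns 1 for each edge whose midpoint should be kept.
std::vector<int> select_midpoint_owners(const std::vector<int>& companion,
                                        const std::vector<int>& split)
{
    assert(companion.size() == split.size());
    const int n = static_cast<int>(split.size());
    std::vector<int> keep(n);
    if (n == 0) return keep;

    const std::vector<int> unclaimed(n, INT_MAX);  // "no thread has claimed this yet"
    const size_t bytes = n * sizeof(int);
    int *d_comp = nullptr, *d_split = nullptr, *d_owner = nullptr, *d_keep = nullptr;
    check(cudaMalloc(&d_comp, bytes));
    check(cudaMalloc(&d_split, bytes));
    check(cudaMalloc(&d_owner, bytes));
    check(cudaMalloc(&d_keep, bytes));
    check(cudaMemcpy(d_comp, companion.data(), bytes, cudaMemcpyHostToDevice));
    check(cudaMemcpy(d_split, split.data(), bytes, cudaMemcpyHostToDevice));
    check(cudaMemcpy(d_owner, unclaimed.data(), bytes, cudaMemcpyHostToDevice));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    claim_midpoints<<<blocks, threads>>>(d_comp, d_split, n, d_owner);
    check(cudaGetLastError());
    keep_lowest<<<blocks, threads>>>(d_comp, d_split, n, d_owner, d_keep);
    check(cudaGetLastError());

    check(cudaMemcpy(keep.data(), d_keep, bytes, cudaMemcpyDeviceToHost));
    check(cudaFree(d_comp));
    check(cudaFree(d_split));
    check(cudaFree(d_owner));
    check(cudaFree(d_keep));
    return keep;
}
```

If only one edge of a pair is selected, that edge still keeps its midpoint, because it is the only thread that recorded its number for the pair.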

> I think my next step will be to convert a mesh source (MeshArray) so that I
> can gauge the performance there and then have a look at porting
> validate_polyhedra as well as merge selection so that I can test a simple
> device-only pipeline as Bart requested.

OK, as mentioned in the other thread this should be PolyGrid ;)

Cheers,

Bart


Re: CUDA Subdivide Edges

by Evan Lezar

On Thu, Aug 14, 2008 at 1:47 AM, Bart Janssens <bart.janssens@...> wrote:
> On Tuesday 12 August 2008 17:49:29 Evan Lezar wrote:
> > We can clearly see from the performance, that although serial code can run
> > on the GPU, it does not run very fast - especially if the behaviour due to
> > the depth of nested loops is to be avoided.  It may be possible to improve
> > the performance of the edge_index_calculator somewhat by moving the loop
> > over the edges in a loop back into the kernel.
> >
> > In the implementation of the edge_adjacency lookup, I had tried a slightly
> > more parallel version of the calculation of first_edges, but I switched to
> > the serial implementation to start with as the other one was giving some
> > hassles.
>
> Hi Evan,
>
> Perhaps rather than a head-on serial run, it would be better to attempt to get
> a correct execution order using __syncthreads(). For the index_map in
> edge_index_calculator, for example, each thread could keep a local count of
> split edges, which could be offset by the counts obtained from the other
> threads after a __syncthreads() call.
> For the midpoint indices, you could store the thread number with each midpoint
> found, and then process the midpoint index array a second time and only keep
> those associated with the lowest thread numbers, and add the associated edges
> to the edge_list. This should avoid having separate midpoints for an edge and
> its companion.

The problem with __syncthreads() is that it only synchronizes threads within a block, but I will have a look at the code again to see if I can get some parallelism from it.
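
One common way around the block-only limitation is to let the boundary between two kernel launches act as the grid-wide synchronization point. The sketch below is an assumption-laden illustration, not code from the plugin: flags[e] != 0 marks a split edge, all names are hypothetical, the per-block totals are scanned on the host purely for brevity (a device-side scan would avoid the round trip), and __syncthreads_count() needs compute capability 2.0 or later.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

constexpr int BLOCK = 256;  // both kernels must be launched with this block size

// Abort on any CUDA runtime error instead of silently continuing.
static void check(cudaError_t err) { assert(err == cudaSuccess); (void)err; }

// Launch 1: each block counts its own split edges.
__global__ void count_per_block(const int* flags, int n, int* block_counts)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    const int pred = (i < n) ? flags[i] : 0;
    const int total = __syncthreads_count(pred);  // block-wide count of non-zero pred
    if (threadIdx.x == 0)
        block_counts[blockIdx.x] = total;
}

// Launch 2: scan inside the block, then add the offset of all earlier blocks.
__global__ void assign_indices(const int* flags, int n, const int* block_offsets,
                               int* index_map)
{
    __shared__ int scan[BLOCK];
    const int t = threadIdx.x;
    const int i = blockIdx.x * blockDim.x + t;
    const int f = (i < n && flags[i]) ? 1 : 0;
    scan[t] = f;
    __syncthreads();
    // Inclusive Hillis-Steele scan; every thread reaches every barrier.
    for (int stride = 1; stride < BLOCK; stride *= 2) {
        const int add = (t >= stride) ? scan[t - stride] : 0;
        __syncthreads();  // all reads finish before any write
        scan[t] += add;
        __syncthreads();  // all writes finish before the next read
    }
    if (i < n)
        index_map[i] = f ? block_offsets[blockIdx.x] + scan[t] - 1 : -1;
}

std::vector<int> build_index_map(const std::vector<int>& flags)
{
    const int n = static_cast<int>(flags.size());
    std::vector<int> map(n);
    if (n == 0) return map;
    const int blocks = (n + BLOCK - 1) / BLOCK;

    int *d_flags = nullptr, *d_counts = nullptr, *d_offsets = nullptr, *d_map = nullptr;
    check(cudaMalloc(&d_flags, n * sizeof(int)));
    check(cudaMalloc(&d_counts, blocks * sizeof(int)));
    check(cudaMalloc(&d_offsets, blocks * sizeof(int)));
    check(cudaMalloc(&d_map, n * sizeof(int)));
    check(cudaMemcpy(d_flags, flags.data(), n * sizeof(int), cudaMemcpyHostToDevice));

    count_per_block<<<blocks, BLOCK>>>(d_flags, n, d_counts);
    check(cudaGetLastError());

    // Exclusive scan of the block totals on the host (this copy waits for launch 1).
    std::vector<int> counts(blocks), offsets(blocks);
    check(cudaMemcpy(counts.data(), d_counts, blocks * sizeof(int), cudaMemcpyDeviceToHost));
    int running = 0;
    for (int b = 0; b < blocks; ++b) { offsets[b] = running; running += counts[b]; }
    check(cudaMemcpy(d_offsets, offsets.data(), blocks * sizeof(int), cudaMemcpyHostToDevice));

    assign_indices<<<blocks, BLOCK>>>(d_flags, n, d_offsets, d_map);
    check(cudaGetLastError());

    check(cudaMemcpy(map.data(), d_map, n * sizeof(int), cudaMemcpyDeviceToHost));
    check(cudaFree(d_flags));
    check(cudaFree(d_counts));
    check(cudaFree(d_offsets));
    check(cudaFree(d_map));
    return map;
}
```
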
 

> > I think my next step will be to convert a mesh source (MeshArray) so that I
> > can gauge the performance there and then have a look at porting
> > validate_polyhedra as well as merge selection so that I can test a simple
> > device-only pipeline as Bart requested.
>
> OK, as mentioned in the other thread this should be PolyGrid ;)

Yes - as I mentioned in the other thread, it had been a rough day (week :) )


Thanks
Evan
 

> Cheers,
>
> Bart

