CUDA benchmarking

Hi Evan,

I've been going through the CUDA docs, and from the looks of it there are no technical obstacles to creating a chain of K-3D plugins that keep the main part of their data (e.g. a mesh or a bitmap) in GPU memory throughout the pipeline. I especially like the fact that CUDA can directly access OpenGL buffer objects, which means the result of the pipeline calculation can be displayed directly.

Before going ahead and designing this system, though, we should do some more benchmarking on the existing module. I propose splitting the bitmap_add_entry method into three functions:

1. Copy from host memory to device memory
2. Do the calculation on the device
3. Copy from device to host

We can then use the timer from k3dsdk/high_res_timer.h to determine the overhead of the memory operations, and thus the potential gains of directly connecting plugins with GPU data.

Cheers,
--
Bart

-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
Don't miss this year's exciting event. There's still time to save $100.
Use priority code J8TL2D2.
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
K3d-development mailing list
K3d-development@...
https://lists.sourceforge.net/lists/listinfo/k3d-development
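The three-way split proposed above could be timed along the following lines. This is only a minimal sketch under stated assumptions: std::chrono stands in for the timer in k3dsdk/high_res_timer.h, a second std::vector stands in for device memory, and stage_timings, time_stage and timed_bitmap_add are hypothetical names, not part of K-3D or CUDA.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Wall-clock seconds spent in each of the three proposed stages.
struct stage_timings
{
    double host_to_device = 0.0;
    double compute = 0.0;
    double device_to_host = 0.0;

    double total() const { return host_to_device + compute + device_to_host; }

    // Fraction of the total time spent moving data rather than computing.
    double transfer_fraction() const
    {
        const double t = total();
        return t > 0.0 ? (host_to_device + device_to_host) / t : 0.0;
    }
};

// Run one stage and return its duration in seconds.
template<typename stage_t>
double time_stage(stage_t&& stage)
{
    const auto start = std::chrono::steady_clock::now();
    stage();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical stand-in for the split bitmap_add_entry. The "device" is an
// ordinary std::vector here; a real module would copy to and from GPU memory.
stage_timings timed_bitmap_add(const std::vector<float>& input, float value, std::vector<float>& output)
{
    stage_timings timings;
    std::vector<float> device;

    // Stage 1: host to device
    timings.host_to_device = time_stage([&] { device = input; });
    assert(device.size() == input.size());

    // Stage 2: calculation on the device
    timings.compute = time_stage([&] { for(float& pixel : device) pixel += value; });

    // Stage 3: device to host
    timings.device_to_host = time_stage([&] { output = device; });

    return timings;
}
```

Calling timed_bitmap_add on a test bitmap and reading transfer_fraction() gives the share of time spent on memory operations, which is the number the design decision hinges on. With a real GPU the kernel launch is asynchronous, so the device would presumably have to be synchronized before stopping the timer for stage 2.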
Re: CUDA benchmarking

Bart Janssens wrote:
> I've been going through the CUDA docs, and from the looks of it there
> are no technical obstacles to creating a chain of K-3D plugins that
> keep the main part of their data (i.e. a mesh, bitmap, ...) in GPU
> memory throughout the pipeline. I especially like the fact that CUDA
> can directly access OpenGL buffer objects, which means the result of
> the pipeline calculation can be displayed directly.
>

Forgot to mention my skepticism on this issue earlier. When a call to iproperty::property_value() returns, its result must be completely up-to-date, otherwise the caller will fail. Thus, a CUDA plugin will still have to marshal data back to the host whenever the pipeline executes.

FWIW, I didn't really expect much of an improvement in image-processing speed - just like our experiments in threading, we're going to have to tackle larger problems before we see real improvements in performance.

Cheers,
Tim
Re: CUDA benchmarking

On Fri, May 2, 2008 at 5:43 AM, Timothy M. Shead <tshead@...> wrote:
> Forgot to mention my skepticism on this issue earlier. When a call to
> iproperty::property_value() returns, its result must be completely
> up-to-date, otherwise the caller will fail. Thus, a CUDA plugin will
> still have to marshal data back to the host whenever the pipeline executes.

Well, if the benchmarks confirm it's worth it, I think we should have a system with properties that keep data on the host that would work as follows:
- Connection between CUDA plugins: pass data purely in device memory
- Connection to a non-CUDA plugin: copy data to host

> FWIW, I didn't really expect much of an improvement in image-processing
> speed - just like our experiments in threading, we're going to have to
> tackle larger problems before we see real improvements in performance.

I think this depends on the length of the pipeline. From what I understand, device memory is a lot faster than host memory. For a chain of mesh modifiers, if they are all implemented in CUDA, the mesh data would never leave the device memory, since the result can be dumped directly into a VBO for OpenGL rendering. Only control data, such as component selection and transformation matrices, would have to be transferred.

Either way, we'll have to see what the benchmarking says. If memory transfers are only 1% of the time spent, it's not worth the trouble :)

Cheers,
--
Bart
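The pipeline-length argument can be put into numbers with a simple cost model. The sketch below is an illustration only: it assumes every plugin has the same per-stage costs and ignores any overlap between transfers and kernels, and the names stage_cost, round_trip_cost, device_resident_cost and speedup are hypothetical. The inputs are placeholders to be replaced by measured values.

```cpp
#include <cassert>

// Per-plugin costs in arbitrary time units (placeholders for measurements).
struct stage_cost
{
    double host_to_device;
    double kernel;
    double device_to_host;
};

// Every plugin uploads its input and downloads its result.
double round_trip_cost(const stage_cost& c, unsigned plugins)
{
    return plugins * (c.host_to_device + c.kernel + c.device_to_host);
}

// Data is uploaded once, stays on the device between plugins, and is
// downloaded once at the end of the chain.
double device_resident_cost(const stage_cost& c, unsigned plugins)
{
    assert(plugins > 0);
    return c.host_to_device + plugins * c.kernel + c.device_to_host;
}

// Gain from keeping the data on the device for a chain of the given length.
double speedup(const stage_cost& c, unsigned plugins)
{
    return round_trip_cost(c, plugins) / device_resident_cost(c, plugins);
}
```

Under this model the gain grows with the number of chained plugins but is bounded by (host_to_device + kernel + device_to_host) / kernel. If the transfers really are about 1% of the time per plugin, the bound is roughly 1.01, which matches the "not worth the trouble" case.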
Re: CUDA benchmarking

Hi Bart and Tim

I think the splitting of the entry function is a good idea, both in terms of accurate benchmarking as well as extensibility and reducing code duplication.

Regarding keeping the data in device memory, there is definitely merit in this, and the speedup will increase with the length of the device-only pipeline. As Tim mentioned, however, care must be taken to ensure that host data is available when required, although using different connection types may give us a means to handle this.

As mentioned in my earlier mail, I will be back online tomorrow and then I will make sure to catch up on the lost hours :)

Cheers
Evan

On 5/2/08, Bart Janssens <bart.janssens@...> wrote:

--
visit http://randomestrandom.blogspot.com
Re: CUDA benchmarking

Bart Janssens wrote:
> On Fri, May 2, 2008 at 5:43 AM, Timothy M. Shead <tshead@...> wrote:
>
>> Forgot to mention my skepticism on this issue earlier. When a call to
>> iproperty::property_value() returns, its result must be completely
>> up-to-date, otherwise the caller will fail. Thus, a CUDA plugin will
>> still have to marshal data back to the host whenever the pipeline executes.
>>
>
> Well, if the benchmarks confirm it's worth it, I think we should have
> a system with properties that keep data on the host that would work
> as follows:
> - Connection between CUDA plugins: Pass data purely in device memory
> - Connection to a non-CUDA plugin: copy data to host
>

[...] value of a property. When the value of the property is read, the caller must receive an up-to-date value for that property. I'm just emphasizing that, regardless of any work you guys do with out-of-band transfers, that's going to be a hard requirement.

Cheers,
Tim
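One possible way to reconcile device-resident data with this requirement is to copy back lazily, at the moment the host value is read. The sketch below is an assumption for illustration, not how K-3D properties work: mirrored_buffer, run_on_device, host_value and downloads are hypothetical names, and the "device" side is a plain std::vector standing in for GPU memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical buffer holding a host copy and a "device" copy of the same data.
class mirrored_buffer
{
public:
    explicit mirrored_buffer(std::vector<float> initial) :
        m_host(initial), m_device(initial), m_host_stale(false), m_downloads(0)
    {
    }

    // Used by "CUDA" plugins: modify the device data only and mark the host copy stale.
    template<typename kernel_t>
    void run_on_device(kernel_t kernel)
    {
        kernel(m_device);
        m_host_stale = true;
    }

    // Used by ordinary readers: always returns current data, copying back
    // from the device only when the host copy is out of date.
    const std::vector<float>& host_value()
    {
        if(m_host_stale)
        {
            m_host = m_device; // stand-in for a device-to-host copy
            m_host_stale = false;
            ++m_downloads;
        }
        assert(m_host.size() == m_device.size());
        return m_host;
    }

    // Number of device-to-host copies performed so far.
    std::size_t downloads() const { return m_downloads; }

private:
    std::vector<float> m_host;
    std::vector<float> m_device;
    bool m_host_stale;
    std::size_t m_downloads;
};
```

With this arrangement a reader of host_value() never sees stale data, yet two device operations in a row cost only one copy back, and none at all if nobody reads the host value in between.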
Re: CUDA benchmarking

Hi folks!

I'd like to suggest a workaround for the problem of moving things to and from GPU memory.

My idea is to divide the plugins into two types:
1- Modifiers of the data
2- Evaluators of the data

The "data" is the definition of a GPU pipeline operation, in the same way that a 3D mesh is data for the 3D modifiers.

This way the user could define a GPU pipeline (data) with a pipeline of K-3D modifiers. And at any point of the pipeline we could connect that data to a "GPUPipelineEvaluator" plugin that gives the output through a property.

What about this idea?

Cheers!
Joaquín
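The modifier/evaluator split can be pictured as building a description of the work first and running it later. The following is a minimal sketch under assumptions: gpu_pipeline, add_constant, scale and evaluate are hypothetical names, and the operations run on a std::vector on the CPU rather than as CUDA kernels.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// The "data" flowing between modifiers: a description of GPU work, not its result.
struct gpu_pipeline
{
    std::vector<float> source;
    std::vector<std::function<void(std::vector<float>&)>> operations;
};

// Modifier: returns the description with one more operation appended.
// Nothing is computed at this point.
gpu_pipeline add_constant(gpu_pipeline pipeline, float value)
{
    pipeline.operations.push_back(
        [value](std::vector<float>& data) { for(float& x : data) x += value; });
    return pipeline;
}

// Modifier: appends a scaling operation, again without computing anything.
gpu_pipeline scale(gpu_pipeline pipeline, float factor)
{
    pipeline.operations.push_back(
        [factor](std::vector<float>& data) { for(float& x : data) x *= factor; });
    return pipeline;
}

// Evaluator: runs the whole description in one go and returns the result.
std::vector<float> evaluate(const gpu_pipeline& pipeline)
{
    std::vector<float> device = pipeline.source; // stand-in for a single upload
    for(const auto& operation : pipeline.operations)
    {
        assert(operation); // calling an empty std::function would throw
        operation(device);
    }
    return device; // stand-in for a single download
}
```

Because the modifiers only pass a description along, the ordinary property mechanism always carries complete, up-to-date values, and the transfers happen only inside the evaluator.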
Re: CUDA benchmarking

On 5/3/08, Joaquín Duo <hoakoduo@...> wrote:
> This way the user could define a GPU pipeline (data) with a pipeline of K-3D
> modifiers. And at any point of the pipeline we could connect that data to a
> "GPUPipelineEvaluator" plugin that gives the output through a property.
> What about this idea?

It's a good idea! Having a CUDAtoK3D or K3DtoCUDA node whenever a conversion is required would not require any change to the SDK. However, we should probably wait for some benchmarking results before we fix the design.

Cheers,
--
Bart
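To make the converter-node idea concrete, the sketch below walks a chain of nodes and inserts a K3DtoCUDA or CUDAtoK3D node wherever neighbouring nodes live in different memory spaces. It is an illustration only: the node struct and with_converters function are hypothetical, and in the proposal above the user (or a script) would place the converter nodes explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of a pipeline node.
struct node
{
    std::string name;
    bool on_device; // true for CUDA plugins, false for ordinary K-3D plugins
};

// Return the node names with a converter inserted wherever two consecutive
// nodes keep their data in different memory spaces.
std::vector<std::string> with_converters(const std::vector<node>& chain)
{
    std::vector<std::string> result;
    for(std::size_t i = 0; i != chain.size(); ++i)
    {
        assert(!chain[i].name.empty());
        if(i > 0 && chain[i - 1].on_device != chain[i].on_device)
            result.push_back(chain[i].on_device ? "K3DtoCUDA" : "CUDAtoK3D");
        result.push_back(chain[i].name);
    }
    return result;
}
```

For a chain of one host node, two CUDA nodes and one host node, this yields exactly two converters, one at each boundary, so the number of transfers depends on the number of boundaries rather than on the number of CUDA nodes.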
Re: CUDA benchmarking

Hi

Yes ... I think that benchmarks are the first order of business :)

Oh ... I also noticed that on the CUDA website there is a doc about writing CUDA image-processing plugins for Photoshop, with a number of samples. I will have a look at them when I have some spare time - still working on the hardware problems, but I do have my internet connection back - so all is right with the world again, or will be once I get the backlog sorted out.

With regard to the benchmarking and splitting the process of copying the data to the GPU and actually calling the kernel: what would be the best way to handle the pointer to the data in device memory? If we aim to chain a number of plugins, then this should be available to pass to plugins down the chain.

Thanks
Evan

On Sat, May 3, 2008 at 11:49 PM, Bart Janssens <bart.janssens@...> wrote:
Re: CUDA benchmarking

Bart Janssens wrote:
> On 5/3/08, Joaquín Duo <hoakoduo@...> wrote:
>
>> This way the user could define a GPU pipeline (data) with a pipeline of K-3D
>> modifiers. And at any point of the pipeline we could connect that data to a
>> "GPUPipelineEvaluator" plugin that gives the output through a property.
>> What about this idea?
>>
>
> It's a good idea! Having a CUDAtoK3D or K3DtoCUDA node whenever a
> conversion is required would not require any change to the SDK.
> However, we should probably wait for some benchmarking results before
> we fix the design.
>

[...] pipeline to work normally is a clever one, albeit with a significant downside: it requires the user to explicitly create a "CUDA-aware" pipeline. Each of the three alternative approaches has its pros and cons.

Let's ensure that we have some significant examples to benchmark before making recommendations. At a minimum we need to see some mesh computations, both embarrassingly parallel and not.

Cheers,
Tim
Re: CUDA benchmarking

Hi

I have split the various steps in the GPU-based calculation process and have up until now simply printed the measured times for the various stages out to the screen using k3d::log().

What are my options with respect to gaining access to those times from a regression test? I could define node properties, but this does not strike me as the best solution.

Thanks in advance
Evan

On Mon, May 5, 2008 at 4:32 AM, Timothy M. Shead <tshead@...> wrote:
Re: CUDA benchmarking

On 5/5/08, Evan Lezar <evanlezar@...> wrote:
> What are my options with respect to gaining access to those times from a
> regression test? I could define node properties, but this does not strike
> me as the best solution.

The timer is available in Python, but obviously can only be called between methods. See tests/ngui.performance.sds.polycube.py. I see two options here:

1. Split the node in 3: K3DBitmapToCudaBitmap, CUDABitmapAdd and CUDABitmapToK3DBitmap. The properties connecting these 3 nodes would contain a CUDA pointer. When constructing the pipeline in a script, you should make sure you read the output, so you can time each step.
2. Implement the k3d::icommand_node interface in your current node, exposing a text command for each step you want to benchmark. These commands can be called from Python, with timer evaluations between each of them.

I'd choose 1 if preliminary testing using the k3d::log method indicates that memory transfer time is significant.

Cheers,
--
Bart
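For option 1, the value passed between the three nodes could be a small reference-counted handle around the device pointer. The sketch below is only an assumption about what such a handle might look like: device_bitmap, upload, add and download are hypothetical names standing in for K3DBitmapToCudaBitmap, CUDABitmapAdd and CUDABitmapToK3DBitmap, and ordinary heap memory stands in for memory that a real module would manage with cudaMalloc, cudaMemcpy and cudaFree.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical property value: a shared handle to a block of "device" memory
// plus its element count. The block is released when the last handle goes away.
struct device_bitmap
{
    std::shared_ptr<float> pixels;
    std::size_t size;
};

// Stand-in for K3DBitmapToCudaBitmap. A real node would allocate with
// cudaMalloc and release with cudaFree in the deleter.
device_bitmap upload(const std::vector<float>& host)
{
    device_bitmap result;
    result.pixels = std::shared_ptr<float>(new float[host.size()], std::default_delete<float[]>());
    result.size = host.size();
    std::copy(host.begin(), host.end(), result.pixels.get());
    return result;
}

// Stand-in for CUDABitmapAdd: works in place on the device block and passes
// the same handle on, so no host copy is involved.
device_bitmap add(device_bitmap bitmap, float value)
{
    assert(bitmap.pixels);
    for(std::size_t i = 0; i != bitmap.size; ++i)
        bitmap.pixels.get()[i] += value;
    return bitmap;
}

// Stand-in for CUDABitmapToK3DBitmap.
std::vector<float> download(const device_bitmap& bitmap)
{
    assert(bitmap.pixels);
    return std::vector<float>(bitmap.pixels.get(), bitmap.pixels.get() + bitmap.size);
}
```

Timing each of the three calls separately from a script then gives the per-stage numbers. Note that the in-place add also changes the block seen by the upstream node; whether that is acceptable, or whether each node should own its own output block, is a design question this sketch leaves open.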
Re: CUDA benchmarking

Bart

As you mentioned earlier, the hardware version does not work under Linux, but does under Windows. I have a feeling that this may be due to the differences in cutil.h for the different platforms, and more specifically how the modules are being linked.

I am just about to submit some code that displays some timing information for both the CUDA and regular implementation of the BitmapAdd plugin in the INFO log - just as a start.

I have to get to sleep now, but I will continue the investigation tomorrow afternoon. I have also updated my wiki page a little.

Evan

On Mon, May 5, 2008 at 10:24 PM, Bart Janssens <bart.janssens@...> wrote:
Re: CUDA benchmarking

On 5/6/08, Evan Lezar <evanlezar@...> wrote:
> As you mentioned earlier, the hardware version does not work under linux -
> but does under windows. I have a feeling that this may be due to the
> differences in cutil.h for the different platforms - and more specifically
> the modules are being linked.

Yes, I checked my Linux config, but it is fine: the Nvidia SDK examples run perfectly. Maybe all that is needed is to compile the complete module with nvcc?

Cheers,
--
Bart
Re: CUDA benchmarking

Great news!

The bitmap add plugin now works in hardware mode under Linux. Just have a meeting now, but will submit the changes later tonight along with some benchmark results.

Evan

On 5/3/08, Joaquín Duo <hoakoduo@...> wrote:
Re: CUDA benchmarking

Hi Evan!

I checked your benchmarks, very nice report. :-)

On page 2, in the graph "BitmapAdd and CUDABitmapAdd Execution Time", the labels say "BitmapAdd" and "Total Time". Shouldn't they be CudaBitmapAdd time and BitmapAdd time, in that order?

Cheers!
Joaquín
Re: CUDA benchmarking

Joaquín

Yes, it should ... OpenOffice wasn't cooperating and it was late :)

I will give it another bash later today and update the doc.

Evan

On Fri, May 9, 2008 at 12:06 PM, Joaquín Duo <hoakoduo@...> wrote:
> Hi Evan!