Re: CUDA benchmarking
On 5/10/08, Evan Lezar <evanlezar@...> wrote:
> The list is a good idea and a useful reminder of Knuth's words. Very
> interested to see how the CUDA code stacks up to what Intel can offer.

I've added my thoughts.

Cheers,
--
Bart

-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
Don't miss this year's exciting event. There's still time to save $100.
Use priority code J8TL2D2.
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
K3d-development mailing list
K3d-development@...
https://lists.sourceforge.net/lists/listinfo/k3d-development
|
|
Re: CUDA benchmarking
Hi there

I have automated the benchmarking somewhat for the CUDABitmapAdd plugin (and partially for CUDABitmapMultiply and CUDABitmapSubtract). I read the timing values out using private data containers with the script_property property policy in the classes. This method works, but I don't think it is the most elegant solution, because the properties that have been added are now visible in the UI. What would be the right (or better) way of going about this? Setting the properties to read_only_properties would be better in that the user would not be able to set them, but they would still be visible in the UI.

As yet I have not made any changes to the other bitmap modules (with respect to the benchmarking) and would like to sort this out before I do anything too major.

Thanks
Evan

On Sat, May 10, 2008 at 11:34 AM, Bart Janssens <bart.janssens@...> wrote:
--
visit http://randomestrandom.blogspot.com
|
|
Re: CUDA benchmarking
On Sunday 11 May 2008 22:21:09 Evan Lezar wrote:
> I have automated the benchmarking somewhat for the CUDABitmapAdd plugin
> (and partially for CUDABitmapMultiply and CUDABitmapSubtract). The way I

Hi Evan,

I saw your commits before this mail, and have already given this a test-drive. Great work, this will allow us to analyse the results with minimal effort! I have submitted a small patch that correctly synchronises the execution part so it can be timed. Attached are my before and after results. You will see that the execution time now scales by a factor of 4 each time the image width and height double for the larger data sets, which is as expected. The measured memory transfer overhead now varies from 85% for the 32x32 test to 71% for the 4096x4096 case.

> read the timing values out was using private data containers with
> script_property property policy in the classes. This method works, but I
> don't think it is the most elegant solution - in that these properties that
> have been added are now visible in the UI. What would be the right (or
> better) way of going about this? Setting the properties to
> read_only_properties would be better in that the user will not be able to
> set them, but they will still be visible in the UI.

I'd say just switch them to read-only properties for now. The benchmarking system works great this way, for our purposes. I'll add the non-emulated CUDA plugins to my dashboard build.

Cheers,
Bart

Image Width, Image Height, BitmapAdd Time (s), CUDA host to device time (s), CUDA kernel time (s), CUDA device to host time (s)
[8, 8, 4.0531158447265625e-06, 0.1429741382598877, 0.00016117095947265625, 2.7894973754882812e-05]
[16, 16, 1.0013580322265625e-05, 2.9087066650390625e-05, 4.1961669921875e-05, 2.002716064453125e-05]
[32, 32, 3.910064697265625e-05, 0.00022912025451660156, 4.38690185546875e-05, 2.7179718017578125e-05]
[64, 64, 0.00015997886657714844, 0.00023317337036132812, 5.5074691772460938e-05, 5.9843063354492188e-05]
[128, 128, 0.00060105323791503906, 0.0003490447998046875, 0.000125885009765625, 0.00019311904907226562]
[256, 256, 0.0024042129516601562, 0.00096988677978515625, 0.00038409233093261719, 0.00065302848815917969]
[512, 512, 0.0098111629486083984, 0.002838134765625, 0.0014460086822509766, 0.0020120143890380859]
[1024, 1024, 0.039055824279785156, 0.0087108612060546875, 0.0057430267333984375, 0.0063650608062744141]
[2048, 2048, 0.15646195411682129, 0.033174991607666016, 0.022634983062744141, 0.024331808090209961]
[4096, 4096, 0.62535691261291504, 0.13156604766845703, 0.091360092163085938, 0.095906972885131836]

Image Width, Image Height, BitmapSubtract Time (s), CUDA host to device time (s), CUDA kernel time (s), CUDA device to host time (s)
[8, 8, 0.0, 0.14784002304077148, 0.000164031982421875, 2.7894973754882812e-05]
[16, 16, 0.0, 3.2901763916015625e-05, 4.5061111450195312e-05, 1.9073486328125e-05]
[32, 32, 0.0, 0.0002288818359375, 4.7206878662109375e-05, 2.6941299438476562e-05]
[64, 64, 0.0, 0.00026202201843261719, 5.817413330078125e-05, 6.29425048828125e-05]
[128, 128, 0.0, 0.00035691261291503906, 0.00013399124145507812, 0.00019812583923339844]
[256, 256, 0.0, 0.00096082687377929688, 0.00037884712219238281, 0.00067019462585449219]
[512, 512, 0.0, 0.0029101371765136719, 0.001461029052734375, 0.0020508766174316406]
[1024, 1024, 0.0, 0.0091609954833984375, 0.0057461261749267578, 0.0069689750671386719]
[2048, 2048, 0.0, 0.035613059997558594, 0.022637844085693359, 0.026031017303466797]
[4096, 4096, 0.0, 0.13622903823852539, 0.091310024261474609, 0.1020960807800293]

Image Width, Image Height, BitmapMultiply Time (s), CUDA host to device time (s), CUDA kernel time (s), CUDA device to host time (s)
[8, 8, 0.0, 0.14816999435424805, 0.0001659393310546875, 2.8133392333984375e-05]
[16, 16, 0.0, 3.0994415283203125e-05, 4.3153762817382812e-05, 1.9788742065429688e-05]
[32, 32, 0.0, 0.00023293495178222656, 4.6968460083007812e-05, 2.6941299438476562e-05]
[64, 64, 0.0, 0.00025701522827148438, 5.6028366088867188e-05, 6.103515625e-05]
[128, 128, 0.0, 0.00041413307189941406, 0.00012493133544921875, 0.00019121170043945312]
[256, 256, 0.0, 0.0011820793151855469, 0.00039005279541015625, 0.00064992904663085938]
[512, 512, 0.0, 0.0029239654541015625, 0.0014410018920898438, 0.0028681755065917969]
[1024, 1024, 0.0, 0.0095160007476806641, 0.0057530403137207031, 0.0069241523742675781]
[2048, 2048, 0.0, 0.035025119781494141, 0.02261805534362793, 0.026360034942626953]
[4096, 4096, 0.0, 0.13782000541687012, 0.091247081756591797, 0.10477304458618164]

Image Width, Image Height, BitmapAdd Time (s), CUDA host to device time (s), CUDA kernel time (s), CUDA device to host time (s)
[8, 8, 4.0531158447265625e-06, 0.14317798614501953, 0.00011110305786132812, 2.9087066650390625e-05]
[16, 16, 1.1205673217773438e-05, 2.7894973754882812e-05, 2.002716064453125e-05, 3.0994415283203125e-05]
[32, 32, 3.910064697265625e-05, 0.00022101402282714844, 1.9788742065429688e-05, 4.00543212890625e-05]
[64, 64, 0.00015997886657714844, 0.00022912025451660156, 1.8835067749023438e-05, 8.5115432739257812e-05]
[128, 128, 0.00060796737670898438, 0.00033617019653320312, 2.193450927734375e-05, 0.0002880096435546875]
[256, 256, 0.0024089813232421875, 0.00092005729675292969, 2.6941299438476562e-05, 0.00098490715026855469]
[512, 512, 0.0098149776458740234, 0.0028109550476074219, 2.9087066650390625e-05, 0.0032689571380615234]
[1024, 1024, 0.039144039154052734, 0.0086050033569335938, 3.0994415283203125e-05, 0.012082099914550781]
[2048, 2048, 0.16218209266662598, 0.033320903778076172, 3.1948089599609375e-05, 0.046805858612060547]
[4096, 4096, 0.62636399269104004, 0.13098406791687012, 3.1948089599609375e-05, 0.1876990795135498]

Image Width, Image Height, BitmapSubtract Time (s), CUDA host to device time (s), CUDA kernel time (s), CUDA device to host time (s)
[8, 8, 0.0, 0.14423704147338867, 0.00011014938354492188, 2.7894973754882812e-05]
[16, 16, 0.0, 3.0040740966796875e-05, 2.09808349609375e-05, 3.0040740966796875e-05]
[32, 32, 0.0, 0.00021886825561523438, 2.002716064453125e-05, 3.910064697265625e-05]
[64, 64, 0.0, 0.00022792816162109375, 1.9073486328125e-05, 8.487701416015625e-05]
[128, 128, 0.0, 0.00033783912658691406, 2.09808349609375e-05, 0.00028586387634277344]
[256, 256, 0.0, 0.00094914436340332031, 2.7894973754882812e-05, 0.00099706649780273438]
[512, 512, 0.0, 0.0029520988464355469, 3.0994415283203125e-05, 0.0033650398254394531]
[1024, 1024, 0.0, 0.0087320804595947266, 3.2901763916015625e-05, 0.012176036834716797]
[2048, 2048, 0.0, 0.0367889404296875, 3.4809112548828125e-05, 0.046859025955200195]
[4096, 4096, 0.0, 0.13089704513549805, 3.3140182495117188e-05, 0.18763494491577148]

Image Width, Image Height, BitmapMultiply Time (s), CUDA host to device time (s), CUDA kernel time (s), CUDA device to host time (s)
[8, 8, 0.0, 0.14319801330566406, 0.00011515617370605469, 2.8848648071289062e-05]
[16, 16, 0.0, 3.0994415283203125e-05, 2.193450927734375e-05, 3.0040740966796875e-05]
[32, 32, 0.0, 0.00023102760314941406, 2.2172927856445312e-05, 3.8862228393554688e-05]
[64, 64, 0.0, 0.000247955322265625, 2.09808349609375e-05, 8.6069107055664062e-05]
[128, 128, 0.0, 0.0003509521484375, 2.193450927734375e-05, 0.00028181076049804688]
[256, 256, 0.0, 0.00092196464538574219, 3.0040740966796875e-05, 0.0010011196136474609]
[512, 512, 0.0, 0.0029089450836181641, 2.9087066650390625e-05, 0.0032789707183837891]
[1024, 1024, 0.0, 0.0086770057678222656, 3.2901763916015625e-05, 0.012228965759277344]
[2048, 2048, 0.0, 0.033008813858032227, 3.5047531127929688e-05, 0.047010898590087891]
[4096, 4096, 0.0, 0.12997984886169434, 3.3855438232421875e-05, 0.18689799308776855]
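
The overhead percentages and the factor-of-4 scaling quoted in this message can be checked against the first BitmapAdd listing. The following is a minimal Python sketch, assuming that "memory transfer overhead" means the share of host-to-device plus device-to-host time in the total CUDA time; that definition and the helper name `transfer_overhead` are assumptions, not something stated in the thread. Under that assumption the three rows below give roughly 85% at 32x32 and 71% at 4096x4096, matching the figures in the message.

```python
# Rows copied from the first BitmapAdd listing in this message:
# (image width, host-to-device s, kernel s, device-to-host s)
rows = [
    (32, 0.00022912025451660156, 4.38690185546875e-05, 2.7179718017578125e-05),
    (2048, 0.033174991607666016, 0.022634983062744141, 0.024331808090209961),
    (4096, 0.13156604766845703, 0.091360092163085938, 0.095906972885131836),
]


def transfer_overhead(h2d, kernel, d2h):
    """Fraction of total CUDA time spent copying data to and from the device.

    Assumed definition: (h2d + d2h) / (h2d + kernel + d2h).
    """
    transfer = h2d + d2h
    return transfer / (transfer + kernel)


overhead = {width: transfer_overhead(h2d, k, d2h) for width, h2d, k, d2h in rows}
kernel_time = {width: k for width, _, k, _ in rows}

# Doubling width and height quadruples the pixel count, so the kernel
# time should grow by about 4 between 2048x2048 and 4096x4096.
kernel_scaling = kernel_time[4096] / kernel_time[2048]

for width, fraction in overhead.items():
    print(f"{width}x{width}: {fraction:.0%} of CUDA time is memory transfer")
print(f"kernel time 2048 -> 4096: x{kernel_scaling:.2f}")
```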
|
|
Re: CUDA benchmarking
Bart

I just noticed your patch. It seems that it sorts things out. Have you got an explanation or some references that I could have a look at?

Thanks
Evan

On Sun, May 11, 2008 at 10:21 PM, Evan Lezar <evanlezar@...> wrote:
> Hi there

--
visit http://randomestrandom.blogspot.com
|
|
Re: CUDA benchmarking
On Sunday 11 May 2008 23:03:34 Evan Lezar wrote:
> Bart
>
> I just noticed your patch. It seems that it sorts things out. Have you
> got an explanation or some references that I could have a look at?

Well, while looking through the docs to write up my answer, I came across an even simpler method, which I just submitted. Basically, the calls in bitmap_kernel_entry launch threads on the device and return immediately. In the meantime, the calculation runs on the device and you are free to launch additional threads. Some functions, such as cudaMemcpy, wait until the threads have finished executing. The cudaEventSynchronize function I used first blocks until the stop event has been recorded (i.e. the device has reached that point). The new fix uses cudaThreadSynchronize, which simply blocks until all threads have finished.

Cheers,
Bart
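
The timing pitfall described above can be illustrated without a GPU. This is only a Python analogy, not CUDA code: `ThreadPoolExecutor.submit` stands in for an asynchronous kernel launch and `Future.result` stands in for a blocking call such as cudaThreadSynchronize. The names `fake_kernel` and `time_launch` are made up for the illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_kernel(duration):
    """Stand-in for device work; it sleeps instead of computing."""
    time.sleep(duration)


def time_launch(synchronize, duration=0.2):
    """Time an asynchronous 'launch', optionally waiting for it to finish."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        start = time.perf_counter()
        # submit() returns immediately, like a kernel launch.
        future = pool.submit(fake_kernel, duration)
        if synchronize:
            # result() blocks until the work is done, which plays the
            # role of cudaThreadSynchronize in this analogy.
            future.result()
        elapsed = time.perf_counter() - start
    return elapsed


unsynced = time_launch(synchronize=False)
synced = time_launch(synchronize=True)
print(f"without sync: {unsynced:.4f} s, with sync: {synced:.4f} s")
```

Without the blocking call the timer only measures the cost of starting the work, which is consistent with the near-constant kernel times in the second set of listings earlier in the thread.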
|
|
Re: CUDA benchmarking
Evan Lezar wrote:
> I have automated the benchmarking somewhat for the CUDABitmapAdd
> plugin (and partially for CUDABitmapMultiply and CUDABitmapSubtract).
> The way I read the timing values out was using private data containers
> with script_property property policy in the classes. This method
> works, but I don't think it is the most elegant solution - in that
> these properties that have been added are now visible in the UI. What
> would be the right (or better) way of going about this? Setting the
> properties to read_only_properties would be better in that the user
> will not be able to set them, but they will still be visible in the UI.
>
> As yet I have not made any changes to the other bitmap modules (with
> respect to the benchmarking) and would like to sort this out before I
> do anything too major.

Good plan. Summing-up the requirements, we would like to:

* Allow for high-precision timing. I don't trust Python to do this due to the overhead of the interpreter.
* Allow for timing with sub-filter granularity - i.e. time internal processes within the filter.
* Get the results from the filter to an outside observer (a script, the dashboard, etc) with a minimum of coupling and without cluttering the UI.

Fortunately, we're already around 80% of the way there. If you haven't already, I encourage you to try creating a "Pipeline Profiler" panel, and then creating a simple mesh pipeline, maybe a source and a few modifiers. What you will see in the Pipeline Profiler is a two-level hierarchy of execution times for nodes/tasks. What's happening is that nodes call the methods in the k3d::ipipeline_profiler interface to mark the start-and-finish of internal operations, which are labelled with an arbitrary string (the task). The Pipeline Profiler panel acts as an observer of these events as-they-happen, displaying the most-recent times for each node/task. It's a pretty-good solution to a tough problem.

So, it would be easy for you to make calls to the profiler to record your internal events. The only thing missing is a way to get the data to where it can be used, such as a script. My thought is to introduce a simple new class that would do the same thing the Pipeline Profiler Panel does, just without the UI. It would provide an API that could be wrapped into the Python object model and used in tests. You could take a stab at this if you like, or I could knock it out - your choice.

Cheers,
Tim
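
The observer arrangement described in this message can be sketched in Python as follows. Every name here (`PipelineProfiler`, `TimingRecorder`, `connect`, `start`, `finish`, and the task labels) is hypothetical and chosen for illustration; none of it is the k3d::ipipeline_profiler API, which is C++ and is not shown in this thread. The sketch only demonstrates the idea of a UI-free observer that keeps the most recent time for each node/task.

```python
import time
from collections import defaultdict


class PipelineProfiler:
    """Hypothetical profiler: nodes mark the start and finish of named tasks."""

    def __init__(self):
        self._observers = []
        self._started = {}

    def connect(self, observer):
        """Register a callable that receives (node, task, elapsed_seconds)."""
        self._observers.append(observer)

    def start(self, node, task):
        self._started[(node, task)] = time.perf_counter()

    def finish(self, node, task):
        elapsed = time.perf_counter() - self._started.pop((node, task))
        for observer in self._observers:
            observer(node, task, elapsed)


class TimingRecorder:
    """Headless observer: stores the most recent time for each node/task."""

    def __init__(self):
        self.records = defaultdict(dict)

    def __call__(self, node, task, elapsed):
        self.records[node][task] = elapsed


profiler = PipelineProfiler()
recorder = TimingRecorder()
profiler.connect(recorder)

# A node instruments its internal steps with start/finish calls.
for task in ("Copy to device", "Kernel", "Copy to host"):
    profiler.start("CUDABitmapAdd", task)
    time.sleep(0.01)  # placeholder for real work
    profiler.finish("CUDABitmapAdd", task)

for task, seconds in recorder.records["CUDABitmapAdd"].items():
    print(f"{task}: {seconds:.4f} s")
```

Because the recorder is just another observer, the instrumented node does not need to know whether a UI panel, a test script, or nothing at all is listening.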
|
|
Re: CUDA benchmarking
Bart and Tim

I was under the impression that a call to __syncthreads() was supposed to do just this, but obviously I was mistaken :) Thanks for the assistance.

With regards to the non-UI profiler - I would like to give it a shot (Summer of Code is supposed to be a learning experience to some extent) but I will give you a shout if I get stuck.

Thanks
Evan

On Mon, May 12, 2008 at 8:25 AM, Timothy M. Shead <tshead@...> wrote:

--
visit http://randomestrandom.blogspot.com
|
|
Re: CUDA benchmarking
On Mon, May 12, 2008 at 8:35 AM, Evan Lezar <evanlezar@...> wrote:
> I was under the impression that a call to __syncthreads() was supposed to do
> just this, but obviously I was mistaken :) Thanks for the assistance.

Yes, that had me confused too. However, __syncthreads() and cudaThreadSynchronize() do different things. The former is used on the device to make sure that all threads executing in the same block have reached the same point, so that they can, for example, exchange data. The latter makes the host runtime wait until all threads on the device have finished executing.

Cheers,
--
Bart
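
The distinction can be mimicked with Python's standard `threading` module. This is an analogy only, not CUDA: `threading.Barrier.wait` plays the role of __syncthreads() (workers in one group wait for each other before reading shared data), while `Thread.join` plays the role of cudaThreadSynchronize() (the launching side waits for all work to finish). The variable names are invented for the illustration.

```python
import threading

NUM_THREADS = 4
shared = [0] * NUM_THREADS          # plays the role of per-block shared memory
neighbour_sums = [0] * NUM_THREADS
barrier = threading.Barrier(NUM_THREADS)


def worker(i):
    # Phase 1: each thread writes its own slot.
    shared[i] = i + 1
    # Like __syncthreads(): nobody continues until every thread has written.
    barrier.wait()
    # Phase 2: it is now safe to read a slot written by another thread.
    neighbour_sums[i] = shared[i] + shared[(i + 1) % NUM_THREADS]


threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    # Like cudaThreadSynchronize(): the "host" waits for all work to finish.
    t.join()

print(neighbour_sums)
```

The barrier says nothing to the launching side about when the work is done, which is why it cannot be used to make host-side timings accurate.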
|
|
Re: CUDA benchmarking
Tim

I have been having a look at the pipeline_profiler code and have added some functionality that uses it to the CUDABitmapAdd plugin as well as the BitmapAdd plugin to begin with. I have also seen that document().pipeline_profiler() is used to keep track of the execution times for the different start->finish calls.

I have not yet been able to extend the wrapping of idocument to include the pipeline_profiler() method, the necessary (I assume) wrapper for the pipeline_profiler itself, or the added functionality to actually get to the stored times from Python. I will have a look at it again tomorrow, but please let me know if I am barking up the wrong tree.

As I mentioned, I have added the pipeline profiling calls to both versions of the add plugin - and I was wondering why they were not already present in the existing one? Shouldn't the profiling be implemented in every pipeline modifier - the same goes for the pipeline visualization code? What may be needed then would be a flag to turn the profiling of a node on or off if required.

I have submitted the profile changes as well as an early implementation of CUDABitmapMonochrome. I will submit the updated test cases, including the catching of Python exceptions, tomorrow during the day.

Evan

On Mon, May 12, 2008 at 8:25 AM, Timothy M. Shead <tshead@...> wrote:

--
visit http://randomestrandom.blogspot.com
|
|
Re: CUDA benchmarking
Evan Lezar wrote:
> I have been having a look at the pipeline_profiler code and have added
> some functionality that uses it to the CUDABitmapAdd plugin as well as
> the BitmapAdd plugin to begin with. I have also seen that the
> document().pipeline_profiler() is used to keep track of the execution
> times for the different start->finish calls.
>
> I have not yet been able to extend the wrapping of idocument to
> include the pipeline_profiler() method or the necessary (I assume)
> wrapper for the pipeline_profiler itself as well as the added
> functionality to actually get to the stored times from python. I will
> have a look at it again tomorrow, but please let me know if I am
> barking up the wrong tree.

[...] callback mechanism to the Python object model, which would be a) completely new and b) pretty tricky. A better idea would be to create an application plugin to collect profiler events, then have it expose the resulting data as a read-only property or something-else. There's still some trickiness there getting the data into a form where the Python layer can read it, but at least it doesn't introduce this entirely new notion of Python callbacks. And as always plugins are preferred so users don't pay for what they don't use.

This does start to go a little further outside your project than I'd like, so I encourage you to focus on instrumenting your code with profiler calls wherever appropriate, and I'll work out the details on the profiler plugin.

> As I mentioned, I have added the pipeline profiling calls to both
> versions of the add plugin - and I was wondering what the reason was
> that it was not already present in the existing one? Shouldn't the
> profiling be implemented in every pipeline modifier - the same goes
> for the pipeline visualization code? What may be needed then would be
> a flag to turn the profiling of a node on or off if required.

There are always too many things to do, and not enough time to do them. I wrote the pipeline profiler code when I was considering moving the mesh data structures in a new direction (to where they are today) - I put in code where I needed it and moved-on. It's great to have you filling-in the gaps!

Cheers,
Tim
|
|
Re: CUDA benchmarking
Timothy M. Shead wrote:
> A better idea would be to create
> an application plugin to collect profiler events, then have it expose
> the resulting data as a read-only property or something-else.

Correction: a document plugin ;)

Cheers,
Tim
|
|
Re: CUDA benchmarking
Evan Lezar wrote:
> I have submitted the profile changes as well as an early
> implementation of CUDABitmapMonochrome. I will submit the updated
> test cases including the catching of python exceptions tomorrow during
> the day.

Actually, you didn't commit your sources for CUDABitmapMonochrome ... easy mistake to make ;)

Cheers,
Tim
|
|
Re: CUDA benchmarking
D'oh! I was wondering why that test failed :)

I have been making some changes today anyway. I have added the pipeline profiler to all the existing bitmap plugins, as well as profiling of host to device, kernel execution, and device to host for the existing CUDA plugins.

I also just noticed that you have submitted some changes relating to the profiler - I will check them out and incorporate them into my tests - hopefully being able to remove the properties currently being used for benchmarking.

Any tips for the Python usage?

Evan

On Sun, May 18, 2008 at 6:32 PM, Timothy M. Shead <tshead@...> wrote:

--
visit http://randomestrandom.blogspot.com
|
|
Re: CUDA benchmarking
Evan Lezar wrote:
> I also just noticed that you have submitted some changes relating to
> the profiler - I will check them out and incorporate them into my
> tests - hopefully being able to remove the properties currently being
> used for benchmarking.
>
> Any tips for the python usage?

Yes, see the plugin documentation at

http://www.k-3d.org/wiki/PipelineProfiler

and the regression test at

tests/test.PipelineProfiler.py

Cheers,
Tim
|
|
Re: CUDA benchmarking
Thanks

I just noticed the commit for the test. Will have a look and get going.

Evan

On Sun, May 18, 2008 at 7:30 PM, Timothy M. Shead <tshead@...> wrote: