MacOS X: Redirection performance problem

Hello,
I'm facing a performance problem under MacOS X when using gawk's output redirection: it's very slow. I have to process CSV files (~5G lines each) that must be split into separate files (~300) based on a field value, so performance is critical. Right now my old PIII outperforms my Mac Pro, so something clearly isn't right under MacOS.

Here is what I'm using:

    BEGIN { FS = "," }              # set FS before the first record is split
    {
        row = $0
        var = $5
        gsub(/"/, "", var)          # strip double quotes from the key field
        path = dir "/" var ".csv"   # dir is assumed to be passed in with -v dir=...
        print row >> path
        close(path)                 # the next print to this file must reopen it
    }

Find below some simple test cases that compare the performance of my Mac Pro with an old IBM server. Any idea how the redirection could be optimized under MacOS? I'm not a programmer, but I can run tests if necessary, so please don't hesitate to ask; just let me know exactly what you want me to do.

Best regards,
Ben.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-> MacOS 10.5.5 on a Mac Pro (Xeon 2.8GHz) with the default binary:

    $ awk -V
    awk version 20040207

    $ time awk '{ print > "/tmp/output.txt" }' /tmp/input.txt
    real    0m12.071s
    user    0m5.171s
    sys     0m6.171s

    $ time awk '{ print }' < /tmp/input.txt > /tmp/output.txt
    real    0m3.648s
    user    0m2.561s
    sys     0m0.665s

-- MacOS 10.5.5 on a Mac Pro (Xeon 2.8GHz) with the default binary, using /dev/null:

    $ time awk '{ print > "/dev/null" }' /tmp/input.txt
    real    0m7.068s
    user    0m4.752s
    sys     0m2.314s

    $ time awk '{ print }' < /tmp/input.txt > /dev/null
    real    0m2.602s
    user    0m2.425s
    sys     0m0.177s

    $ wc -l /tmp/output.txt
    2000000 /tmp/output.txt
    $ wc -l /tmp/input.txt
    2000000 /tmp/input.txt
    $ ls -lh /tmp/output.txt
    -rw-rw-r-- 1 abc abc 129M Sep 21 00:58 /tmp/output.txt

-> MacOS 10.5.5 on a Mac Pro (Xeon 2.8GHz) with gawk 3.1.6 (built with ./configure --prefix=/usr/local/gawk-3.1.6):

    $ /usr/local/gawk-3.1.6/bin/awk -W version
    GNU Awk 3.1.6

    $ time /usr/local/gawk-3.1.6/bin/awk '{ print > "/tmp/output.txt"}' /tmp/input.txt
    real    0m6.657s
    user    0m3.968s
    sys     0m2.107s

    $ time /usr/local/gawk-3.1.6/bin/awk '{ print }' /tmp/input.txt > /tmp/output.txt
    real    0m6.475s
    user    0m3.757s
    sys     0m2.136s

-- MacOS 10.5.5 on a Mac Pro (Xeon 2.8GHz) with gawk 3.1.6, using /dev/null:

    $ time /usr/local/gawk-3.1.6/bin/awk '{ print > "/dev/null"}' /tmp/input.txt
    real    0m5.341s
    user    0m3.779s
    sys     0m1.561s

    $ time /usr/local/gawk-3.1.6/bin/awk '{ print }' /tmp/input.txt > /dev/null
    real    0m5.192s
    user    0m3.620s
    sys     0m1.570s

Here is an example with gawk 3.1.6 on an old IBM PIII@... server running CentOS 5:

    $ time /usr/src/gawk-3.1.6/gawk '{ print > "/tmp/output.txt" }' < /tmp/input.txt
    real    0m3.334s
    user    0m2.184s
    sys     0m1.150s

    $ time /usr/src/gawk-3.1.6/gawk '{ print }' < /tmp/input.txt > /tmp/output.txt
    real    0m2.969s
    user    0m1.727s
    sys     0m1.243s

-> IBM PIII@... using /dev/null:

    $ time /usr/src/gawk-3.1.6/gawk '{ print > "/dev/null" }' /tmp/input.txt
    real    0m2.614s
    user    0m2.271s
    sys     0m0.343s

    $ time /usr/src/gawk-3.1.6/gawk '{ print }' /tmp/input.txt > /dev/null
    real    0m2.520s
    user    0m2.144s
    sys     0m0.358s
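The splitting script in the post closes each output file after every record, so every input line costs an extra open()/close() pair. A minimal sketch for testing this in isolation, assuming a POSIX sh and awk: `split_closing` mirrors the posted script, while `split_open` (a hypothetical alternative, not something from the original post) opens each output file once and leaves it open until awk exits. The function names, the synthetic input, and the temporary directories are all illustrative assumptions; like the original, it assumes no commas inside quoted fields.

```shell
#!/bin/sh
# Hypothetical sketch: two ways to split a CSV into one file per value of
# field 5 (double quotes stripped from the key). Names and data are made up.
set -eu

# Strategy A (mirrors the posted script): append, then close after every
# record, which costs one open()/close() pair per input line.
split_closing() {  # usage: split_closing INPUT OUTDIR
    awk -v dir="$2" '
    BEGIN { FS = "," }
    {
        key = $5
        gsub(/"/, "", key)
        path = dir "/" key ".csv"
        print >> path
        close(path)
    }' "$1"
}

# Strategy B (assumption): open each output file once and keep it open until
# awk exits. The number of distinct keys must fit within the process's
# open-file limit (see ulimit -n); ~300 keys may exceed a low default limit.
split_open() {  # usage: split_open INPUT OUTDIR
    awk -v dir="$2" '
    BEGIN { FS = "," }
    {
        key = $5
        gsub(/"/, "", key)
        print > (dir "/" key ".csv")
    }' "$1"
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/closing" "$work/open"

# Small synthetic input: 10000 rows whose 5th field is one of 3 quoted keys.
awk 'BEGIN { for (i = 1; i <= 10000; i++) printf "%d,a,b,c,\"k%d\",z\n", i, i % 3 }' \
    > "$work/input.csv"

split_closing "$work/input.csv" "$work/closing"
split_open "$work/input.csv" "$work/open"

# Both strategies should write byte-identical output files.
for f in "$work"/closing/*.csv; do
    cmp "$f" "$work/open/${f##*/}"
done
echo "outputs match"
```

To compare speed on real data, each call can be prefixed with `time` in an interactive shell, e.g. `time split_open /tmp/input.txt /tmp/out`, using an empty output directory for each run since strategy A appends to existing files.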