MacOS X: Redirection performance problem


by Benjamin M.


Hello,

I'm facing a performance problem under MacOS X when using gawk's output
redirection: it's very slow.

I have to process CSV files (~5G lines each) that must be split into
separate files (~300) based on a field value, so performance is
critical. For now my old PIII outperforms my MacPro, so something
clearly isn't right under MacOS. Here is what I'm using:

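# Set the separator before the first record is read; assigning FS
# inside the main block only takes effect from the next record on.
BEGIN { FS="," }
{
row=$0
var=$5
gsub(/\"/,"",var)        # strip the quotes around the key field
path=dir"/"var".csv"     # dir is expected to be set by the caller
print row >> path
close(path)              # reopens the file for every record
}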
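
For comparison, here is a minimal sketch of the same split that leaves each output file open instead of calling close() after every record, so each file is opened only once per run. The sample data, the /tmp/split.XXXXXX template and the variable names outdir and input are made up for illustration. It also assumes the awk in use can keep all ~300 outputs open at once: the gawk manual describes gawk multiplexing file descriptors when it runs out, but that is not tested here, and other awks or a low "ulimit -n" may fail instead.

```shell
#!/bin/sh
# Hypothetical demo: the paths and the three sample rows are assumptions.
outdir=$(mktemp -d /tmp/split.XXXXXX)
input=$(mktemp /tmp/split-input.XXXXXX)

# Field 5 (quoted) selects the output file.
cat > "$input" <<'EOF'
1,a,b,c,"red",x
2,a,b,c,"blue",x
3,a,b,c,"red",x
EOF

awk -F, -v dir="$outdir" '
{
    var = $5
    gsub(/"/, "", var)           # strip the quotes around the key field
    path = dir "/" var ".csv"
    print >> path                # no close(): the file stays open until awk exits
}
' "$input"

ls "$outdir"
```

Running it should leave one .csv file per distinct key in the temporary directory. Whether this is actually faster on the MacPro is exactly what would need to be timed.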

Below are some simple test cases that compare the performance of my MacPro
with an old IBM server. Any idea how the redirection could be optimized
under MacOS? I'm not a programmer, but I can run tests if necessary,
so please don't hesitate to ask; simply let me know exactly what you
want me to do.

Best regards,

Ben.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-> MacOS 10.5.5 on a MacPro (Xeon 2.8GHz) with the default awk binary:

$ awk -V
awk version 20040207

$ time awk '{ print > "/tmp/output.txt" }'  /tmp/input.txt
real    0m12.071s
user    0m5.171s
sys    0m6.171s

$ time awk '{ print }' < /tmp/input.txt  > /tmp/output.txt
real    0m3.648s
user    0m2.561s
sys    0m0.665s

-> MacOS 10.5.5 on a MacPro (Xeon 2.8GHz) with the default awk binary, using /dev/null:

$ time awk '{ print > "/dev/null" }'  /tmp/input.txt
real    0m7.068s
user    0m4.752s
sys    0m2.314s

$ time awk '{ print }' < /tmp/input.txt  > /dev/null
real    0m2.602s
user    0m2.425s
sys    0m0.177s


$ wc -l /tmp/output.txt
2000000 /tmp/output.txt
$ wc -l /tmp/input.txt
2000000 /tmp/input.txt
$ ls -lh /tmp/output.txt
-rw-rw-r-- 1 abc abc 129M Sep 21 00:58 /tmp/output.txt


-> MacOS 10.5.5 on a MacPro (Xeon 2.8GHz) with gawk 3.1.6 (built with ./configure --prefix=/usr/local/gawk-3.1.6):

$ /usr/local/gawk-3.1.6/bin/awk -W version
GNU Awk 3.1.6

$ time /usr/local/gawk-3.1.6/bin/awk '{ print > "/tmp/output.txt"}' /tmp/input.txt

real    0m6.657s
user    0m3.968s
sys    0m2.107s

$ time /usr/local/gawk-3.1.6/bin/awk '{ print }' /tmp/input.txt > /tmp/output.txt

real    0m6.475s
user    0m3.757s
sys    0m2.136s


-> MacOS 10.5.5 on a MacPro (Xeon 2.8GHz) with gawk 3.1.6, using /dev/null:

$ time /usr/local/gawk-3.1.6/bin/awk '{ print > "/dev/null"}' /tmp/input.txt

real    0m5.341s
user    0m3.779s
sys    0m1.561s

$ time /usr/local/gawk-3.1.6/bin/awk '{ print }' /tmp/input.txt > /dev/null

real    0m5.192s
user    0m3.620s
sys    0m1.570s


-> gawk 3.1.6 on an old IBM PIII@... server running CentOS 5:

$ time /usr/src/gawk-3.1.6/gawk '{ print > "/tmp/output.txt" }' < /tmp/input.txt

real    0m3.334s
user    0m2.184s
sys    0m1.150s

$ time /usr/src/gawk-3.1.6/gawk '{ print }' < /tmp/input.txt > /tmp/output.txt

real    0m2.969s
user    0m1.727s
sys    0m1.243s

-> IBM PIII@... with gawk 3.1.6, using /dev/null:

$ time /usr/src/gawk-3.1.6/gawk '{ print > "/dev/null" }' /tmp/input.txt

real    0m2.614s
user    0m2.271s
sys    0m0.343s

$ time /usr/src/gawk-3.1.6/gawk '{ print }' /tmp/input.txt > /dev/null

real    0m2.520s
user    0m2.144s
sys    0m0.358s



