Strange space leak

View: New views
5 Messages — Rating Filter:   Alert me  

Strange space leak

by Grzegorz Chrupala :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi all,
I just noticed that a tiny change to the program which I posted recently in the "More idiomatic use of strictness" thread causes a space leak.
The code is:
{-# LANGUAGE BangPatterns, PatternGuards #-}
import Data.List (foldl')
import Data.Char
split delim s
        | [] <- rest = [token]
        | otherwise = token : split delim (tail rest)
   where (token,rest) = span (/=delim) s

main = do
  putStrLn =<< fmap (show . stats ["the","a","and"] . split "<DOC>" . words) getContents

stats ws docs =  foldl' f ((map (const 0) ws),0) docs
    where f (dfs,n) d = let dfs' = zipWith (\w df -> (df + fromEnum (w `elem` d))) ws dfs
                        in  sum dfs' `seq` (dfs',n+1)

If I change this line:
  putStrLn =<< fmap (show . stats ["the","a","and"] . split "<DOC>" . words) getContents
to this:
  putStrLn =<< fmap (show . stats ["the","a","and"] . split "<DOC>" . words . map toLower) getContents

suddenly the programs starts using tons of memory instead of running in small constant space.
What's going on?

Best,
--
Grzegorz

Re: Strange space leak

by apfelmus :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Grzegorz Chrupala wrote:

> Hi all,
> I just noticed that a tiny change to the program which I posted recently in
> the "More idiomatic use of strictness" thread causes a space leak.
>
> The code is:
> {-# LANGUAGE BangPatterns, PatternGuards #-}
> import Data.List (foldl')
> import Data.Char
> split delim s
> | [] <- rest = [token]
> | otherwise = token : split delim (tail rest)
>    where (token,rest) = span (/=delim) s
>
> main = do
>   putStrLn =<< fmap (show . stats ["the","a","and"] . split "<DOC>" . words)
> getContents
>
> stats ws docs =  foldl' f ((map (const 0) ws),0) docs
>     where f (dfs,n) d = let dfs' = zipWith (\w df -> (df + fromEnum (w
> `elem` d))) ws dfs
>                         in  sum dfs' `seq` (dfs',n+1)
>
> If I change this line:
>   putStrLn =<< fmap (show . stats ["the","a","and"] . split "<DOC>" . words)
> getContents
> to this:
>   putStrLn =<< fmap (show . stats ["the","a","and"] . split "<DOC>" . words
> .. map toLower) getContents
>
> suddenly the programs starts using tons of memory instead of running in
> small constant space.
> What's going on?

Answer:

   split "<DOC>" . words . map toLower = (:[]) . words . map toLower

Since you converted everything to lowercase, the string "<DOC>" will
never appear in the text, resulting in a single huge document.
Furthermore, due to  `elem` d , your  stats  function takes space
proportional to the length of each document it processes.


Beauty & makeup tips:

     putStrLn =<< fmap f getContents
   = putStrLn . f =<< getContents
  ~= interact f

Here's a version with glittering nail polish that should run in constant
space:

   split y xs = zs : case xs' of
           []    -> []
           _:xs' -> split y xs'
       where (zs,xs') = break (==y) xs

   main = interact $
       show . stats ["the","a","and"] . split "<DOC>" . words

   zipWith' f xs ys = zipWith f xs ys `using` rnf

   stats ws = foldl' (zipWith' (+)) zero
       . map (foldl' (zipWith' max) zero . map bits)
       where
       zero   = map (const 0) ws
       bits v = map (fromEnum . (== v)) ws


Regards,
apfelmus

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Strange space leak

by Grzegorz Chrupala :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

apfelmus wrote:
Answer:

   split "<DOC>" . words . map toLower = (:[]) . words . map toLower

Since you converted everything to lowercase, the string "<DOC>" will
never appear in the text, resulting in a single huge document.
Oops, that should have been obvious, sorry for the dumb question.
Thanks,
--
Grzegorz

Re: Strange space leak

by apfelmus :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Grzegorz Chrupala wrote:
>>
>>    split "<DOC>" . words . map toLower = (:[]) . words . map toLower
>>
>> Since you converted everything to lowercase, the string "<DOC>" will
>> never appear in the text, resulting in a single huge document.
>>
>
> Oops, that should have been obvious, sorry for the dumb question.
> Thanks,

No problem, it was not obvious to me and I had fun trying to figure it out :)


Regards,
apfelmus

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Re: Strange space leak

by Jacques Carette :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

apfelmus wrote:

> Grzegorz Chrupala wrote:
>>>
>>>    split "<DOC>" . words . map toLower = (:[]) . words . map toLower
>>>
>>> Since you converted everything to lowercase, the string "<DOC>" will
>>> never appear in the text, resulting in a single huge document.
>>
>> Oops, that should have been obvious, sorry for the dumb question.
>> Thanks,
>
> No problem, it was not obvious to me and I had fun trying to figure it
> out :)

Speaking of not obvious: Haskell's type system catches a lot of bugs --
but still gives no help with this particular 'problem'.  But one can
easily imagine an extension to a type system which could have detected
that "<DOC>" can never occur in the result of words . map toLower, and
then with a bit more work [type-level Nat], the type of the full
expression could have encoded that the result is always going to be of
length 1.  That would surely have been a good hint that something
non-trivial was going on.

Whether a Haskell-friendly type system extension could be
created/implemented which would cover this example, I don't know.  
However, I have had a lot of fun with the underlying idea: anytime
someone encounters a bug in their code (and relates the debugging story
on haskell-cafe), try to imagine how the type system could be extended
to automate that.  In most cases, I don't mean to have the type system
reject the code, but rather to have an inferred type that would make it
obvious that the code did not behave as expected.

Jacques
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe
LightInTheBox - Buy quality products at wholesale price