|
View:
New views
5 Messages
—
Rating Filter:
Alert me
|
|
|
Strange space leakHi all,
I just noticed that a tiny change to the program which I posted recently in the "More idiomatic use of strictness" thread causes a space leak. The code is: {-# LANGUAGE BangPatterns, PatternGuards #-} import Data.List (foldl') import Data.Char split delim s | [] <- rest = [token] | otherwise = token : split delim (tail rest) where (token,rest) = span (/=delim) s main = do putStrLn =<< fmap (show . stats ["the","a","and"] . split "<DOC>" . words) getContents stats ws docs = foldl' f ((map (const 0) ws),0) docs where f (dfs,n) d = let dfs' = zipWith (\w df -> (df + fromEnum (w `elem` d))) ws dfs in sum dfs' `seq` (dfs',n+1) If I change this line: putStrLn =<< fmap (show . stats ["the","a","and"] . split "<DOC>" . words) getContents to this: putStrLn =<< fmap (show . stats ["the","a","and"] . split "<DOC>" . words . map toLower) getContents suddenly the programs starts using tons of memory instead of running in small constant space. What's going on? Best, -- Grzegorz |
|
|
Re: Strange space leakGrzegorz Chrupala wrote:
> Hi all, > I just noticed that a tiny change to the program which I posted recently in > the "More idiomatic use of strictness" thread causes a space leak. > > The code is: > {-# LANGUAGE BangPatterns, PatternGuards #-} > import Data.List (foldl') > import Data.Char > split delim s > | [] <- rest = [token] > | otherwise = token : split delim (tail rest) > where (token,rest) = span (/=delim) s > > main = do > putStrLn =<< fmap (show . stats ["the","a","and"] . split "<DOC>" . words) > getContents > > stats ws docs = foldl' f ((map (const 0) ws),0) docs > where f (dfs,n) d = let dfs' = zipWith (\w df -> (df + fromEnum (w > `elem` d))) ws dfs > in sum dfs' `seq` (dfs',n+1) > > If I change this line: > putStrLn =<< fmap (show . stats ["the","a","and"] . split "<DOC>" . words) > getContents > to this: > putStrLn =<< fmap (show . stats ["the","a","and"] . split "<DOC>" . words > .. map toLower) getContents > > suddenly the programs starts using tons of memory instead of running in > small constant space. > What's going on? Answer: split "<DOC>" . words . map toLower = (:[]) . words . map toLower Since you converted everything to lowercase, the string "<DOC>" will never appear in the text, resulting in a single huge document. Furthermore, due to `elem` d , your stats function takes space proportional to the length of each document it processes. Beauty & makeup tips: putStrLn =<< fmap f getContents = putStrLn . f =<< getContents ~= interact f Here's a version with glittering nail polish that should run in constant space: split y xs = zs : case xs' of [] -> [] _:xs' -> split y xs' where (zs,xs') = break (==y) xs main = interact $ show . stats ["the","a","and"] . split "<DOC>" . words zipWith' f xs ys = zipWith f xs ys `using` rnf stats ws = foldl' (zipWith' (+)) zero . map (foldl' (zipWith' max) zero . map bits) where zero = map (const 0) ws bits v = map (fromEnum . (== v)) ws Regards, apfelmus _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@... http://www.haskell.org/mailman/listinfo/haskell-cafe |
|
|
Re: Strange space leakOops, that should have been obvious, sorry for the dumb question. Thanks, -- Grzegorz |
|
|
Re: Strange space leakGrzegorz Chrupala wrote:
>> >> split "<DOC>" . words . map toLower = (:[]) . words . map toLower >> >> Since you converted everything to lowercase, the string "<DOC>" will >> never appear in the text, resulting in a single huge document. >> > > Oops, that should have been obvious, sorry for the dumb question. > Thanks, No problem, it was not obvious to me and I had fun trying to figure it out :) Regards, apfelmus _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@... http://www.haskell.org/mailman/listinfo/haskell-cafe |
|
|
Re: Re: Strange space leakapfelmus wrote:
> Grzegorz Chrupala wrote: >>> >>> split "<DOC>" . words . map toLower = (:[]) . words . map toLower >>> >>> Since you converted everything to lowercase, the string "<DOC>" will >>> never appear in the text, resulting in a single huge document. >> >> Oops, that should have been obvious, sorry for the dumb question. >> Thanks, > > No problem, it was not obvious to me and I had fun trying to figure it > out :) Speaking of not obvious: Haskell's type system catches a lot of bugs -- but still gives no help with this particular 'problem'. But one can easily imagine an extension to a type system which could have detected that "<DOC>" can never occur in the result of words . map toLower, and then with a bit more work [type-level Nat], the type of the full expression could have encoded that the result is always going to be of length 1. That would surely have been a good hint that something non-trivial was going on. Whether a Haskell-friendly type system extension could be created/implemented which would cover this example, I don't know. However, I have had a lot of fun with the underlying idea: anytime someone encounters a bug in their code (and relates the debugging story on haskell-cafe), try to imagine how the type system could be extended to automate that. In most cases, I don't mean to have the type system reject the code, but rather to have an inferred type that would make it obvious that the code did not behave as expected. Jacques _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@... http://www.haskell.org/mailman/listinfo/haskell-cafe |
| Free Forum Powered by Nabble | Forum Help |