[Ticket #2393] Text.PrettyPrint.HughesPJ: Bug fixes, performance improvement

View: New views
12 Messages — Rating Filter:   Alert me  

Parent Message unknown [Ticket #2393] Text.PrettyPrint.HughesPJ: Bug fixes, performance improvement

by Benedikt Huber :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hello,

I'd like to propose bugfixes, documentation fixes and a performance  
improvement for Text.PrettyPrint.HughesPJ. The changes shouldn't  
effect the expected behaviour of the PP library.

I've written a QuickCheck test suite for the pretty printer (to test  
the improvement), and found two bugs and some misconceptions/
ambiguities in the documentation. Additionally, there is a  
microbenchmark for the suggested improvement.
Both are available at http://code.haskell.org/~bhuber/Text/ 
PrettyPrint/. Note that the QuickCheck tests need access to all top-
level names in HughesPJ (i.e. ignore the export list).

In summary, I propose to
* fix a bug in fillNB and one in fillNB/sepNB
* correct documentation on laws and invariants.
* add more efficient implementations of vcat,hsep,hcat

More specifically:

(1) Bugfix fillNB: Additional case for  fillNB Empty (Empty : ys)

(2) Bugfix [f](cat|sep): do not allow overlapping ($$) in vertical  
composition

(3) Lazy implementations of vcat,hcat and hsep

(4) Law <t2> isn't always true

(5) Invariant 5 should be made stronger

(6) Change the comment about negative indentation



The details follow below (maybe a little long).

best regards,

benedikt


=
=
=
=
=
=
=
=
========================================================================
== Details (Long) ==

Bug Fixes
--------------------------------------


(1) Bugfix fillNB

Law <l1> states that

 > sep (ps++[empty]++qs)   = sep (ps ++ qs)
 >         ...ditto hsep, hcat, vcat, fill...

In the current implementation, this fails for the paragraph fill  
variants.

 > render' $ fill True [ text "c", text "c",empty, text "c", text "b"]
 >   where render' = renderStyle (Style PageMode 7 1.4)
 >> c c c
 >>     b

The reason is a missing test for Empty in fillNB

2) Bugfix: overlap and f?(cat|sep)

The specification for cat/sep:
  * oneLiner (hcat/hsep ps)
   `union`
    vcat ps [*]

But currently cat, sep, fcat and fsep attempt to overlap the second  
line with the first one, i.e. they use
`foldr ($$) empty ps' instead of `foldr ($+$) empty ps' [*]. I assume  
this is a mistake.

This bug can lead to situations, where the line in the right argument  
of Union is actually longer:

 > prettyDoc$ cat [ text "a", nest 2 ( text "b") ]
 >> text "a"; union
 >>           (text "b"; empty)
 >>           (nilabove; nest 1; text "b"; empty)

 > renderStyle (Style PageMode 1 1) $ cat [ text "a", nest 2 ( text  
"b") ]
 >> "a b"

In the implementation, we call `nilAbove False' instead of `nilAbove  
True' (see patch).

Performance Improvements
--------------------------------------

3) Improving the space/time performance of vcat, hsep, hcat

vcat (hsep,cat) is implemented in an unneccessarily strict way.
We only get some output after all of vcat's arguments are evaluated  
and checked against being Empty.
This can be improved by only checking the right argument of foldr  
against being Empty, and
then applying an Empty-filter on the resulting Doc. Space improvement  
is obvious.
The microbenchmark (code.haskell.org/~bhuber/Text/PrettyPrint/
HughesPJPerfCheck.hs) suggests that
the improvements in time are remarkable too.


Documentation fixes
--------------------------------------
The QuickCheck tests revealed that

4) Law <t2> isn't always true

   <t2>    text "" <> x            = x, if x non-empty

only holds if x does not start with nest (obviously, because of law  
<n6>).

5) The invariant 5 should be extended:

    A @NoDoc@ may only appear on the first line of the left argument of
    an union. Therefore, the right argument of an union can never be  
equivalent
    to the empty set.

because this assumption is used when rendering.

6) Change the comment about negative indentation

In the source code we have:

-- (spaces n) generates a list of n spaces
-- It should never be called with 'n' < 0, but that can happen for  
reasons I don't understand

If we compose a <> b, and the first line of b is deeply nested, but  
other lines of b are not,
then, because <> eats the nest, the pretty printer will try to layout  
some of b's lines with
negative indentation:

doc       |0123345
------------------
d1        |a
d2        |   b
          |c
d1<>d2    |ab
         c|

Here is the reason why this is rather unavoidable: In John Hughes  
paper, there is a line stating that
    "composing layouts does not change the layouts being composed".
Another one states that
    "<> should eat up nests"
But this leads to negative indentation - imho a user error.

== End ==






_______________________________________________
Libraries mailing list
Libraries@...
http://www.haskell.org/mailman/listinfo/libraries

HughesPJ.patch (5K) Download Attachment

RE: [Ticket #2393] Text.PrettyPrint.HughesPJ: Bug fixes, performance improvement

by Simon Peyton-Jones :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Thank you for doing this Benedict.  I've added your more detailed comments to ticket #2393 so that they are preserved.

Ian: would you like to apply?  I'm not sure how to integrate the QuickCheck tests, but I bet you know.


Benedict: while you are in the area, would you like to take a swing at http://hackage.haskell.org/trac/ghc/ticket/1337, and 1176?

Simon


| -----Original Message-----
| From: libraries-bounces@... [mailto:libraries-bounces@...] On Behalf Of Benedikt Huber
| Sent: 24 June 2008 13:12
| To: libraries@...
| Subject: [Ticket #2393] Text.PrettyPrint.HughesPJ: Bug fixes, performance improvement
|
| Hello,
|
| I'd like to propose bugfixes, documentation fixes and a performance
| improvement for Text.PrettyPrint.HughesPJ. The changes shouldn't
| effect the expected behaviour of the PP library.
|
| I've written a QuickCheck test suite for the pretty printer (to test
| the improvement), and found two bugs and some misconceptions/
| ambiguities in the documentation. Additionally, there is a
| microbenchmark for the suggested improvement.
| Both are available at http://code.haskell.org/~bhuber/Text/
| PrettyPrint/. Note that the QuickCheck tests need access to all top-
| level names in HughesPJ (i.e. ignore the export list).
|
| In summary, I propose to
| * fix a bug in fillNB and one in fillNB/sepNB
| * correct documentation on laws and invariants.
| * add more efficient implementations of vcat,hsep,hcat
|
| More specifically:
|
| (1) Bugfix fillNB: Additional case for  fillNB Empty (Empty : ys)
|
| (2) Bugfix [f](cat|sep): do not allow overlapping ($$) in vertical
| composition
|
| (3) Lazy implementations of vcat,hcat and hsep
|
| (4) Law <t2> isn't always true
|
| (5) Invariant 5 should be made stronger
|
| (6) Change the comment about negative indentation
|
_______________________________________________
Libraries mailing list
Libraries@...
http://www.haskell.org/mailman/listinfo/libraries

Re[2]: [Ticket #2393] Text.PrettyPrint.HughesPJ: Bug fixes, performance improvement

by Bulat Ziganshin-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hello Neil,

Tuesday, June 24, 2008, 10:09:20 PM, you wrote:
> working on it? I am currently limited to Hugs only (not enough disk
> space to install GHC and no SSH access...)

is it prison or university? ;)

--
Best regards,
 Bulat                            mailto:Bulat.Ziganshin@...

_______________________________________________
Libraries mailing list
Libraries@...
http://www.haskell.org/mailman/listinfo/libraries

Re: [Ticket #2393] Text.PrettyPrint.HughesPJ: Bug fixes, performance improvement

by Neil Mitchell :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Benedikt,

Great work, and many thanks!

One ticket that might be interesting, while you are in the area, is:

http://hackage.haskell.org/trac/ghc/ticket/1217

The code is attached, I went through all the API review months and
months ago, and everyone agreed that the function should be added, but
under the name "zeroWidthText".

I'm guessing it isn't too much effort to tack this on while you are
working on it? I am currently limited to Hugs only (not enough disk
space to install GHC and no SSH access...), so am unable to patch any
libraries myself.

Thanks

Neil


On 6/24/08, Simon Peyton-Jones <simonpj@...> wrote:

> Thank you for doing this Benedict.  I've added your more detailed comments to ticket #2393 so that they are preserved.
>
>  Ian: would you like to apply?  I'm not sure how to integrate the QuickCheck tests, but I bet you know.
>
>
>  Benedict: while you are in the area, would you like to take a swing at http://hackage.haskell.org/trac/ghc/ticket/1337, and 1176?
>
>  Simon
>
>
>
>  | -----Original Message-----
>  | From: libraries-bounces@... [mailto:libraries-bounces@...] On Behalf Of Benedikt Huber
>  | Sent: 24 June 2008 13:12
>  | To: libraries@...
>  | Subject: [Ticket #2393] Text.PrettyPrint.HughesPJ: Bug fixes, performance improvement
>  |
>  | Hello,
>  |
>  | I'd like to propose bugfixes, documentation fixes and a performance
>  | improvement for Text.PrettyPrint.HughesPJ. The changes shouldn't
>  | effect the expected behaviour of the PP library.
>  |
>  | I've written a QuickCheck test suite for the pretty printer (to test
>  | the improvement), and found two bugs and some misconceptions/
>  | ambiguities in the documentation. Additionally, there is a
>  | microbenchmark for the suggested improvement.
>  | Both are available at http://code.haskell.org/~bhuber/Text/
>  | PrettyPrint/. Note that the QuickCheck tests need access to all top-
>  | level names in HughesPJ (i.e. ignore the export list).
>  |
>  | In summary, I propose to
>  | * fix a bug in fillNB and one in fillNB/sepNB
>  | * correct documentation on laws and invariants.
>  | * add more efficient implementations of vcat,hsep,hcat
>  |
>  | More specifically:
>  |
>  | (1) Bugfix fillNB: Additional case for  fillNB Empty (Empty : ys)
>  |
>  | (2) Bugfix [f](cat|sep): do not allow overlapping ($$) in vertical
>  | composition
>  |
>  | (3) Lazy implementations of vcat,hcat and hsep
>  |
>  | (4) Law <t2> isn't always true
>  |
>  | (5) Invariant 5 should be made stronger
>  |
>  | (6) Change the comment about negative indentation
>  |
>
> _______________________________________________
>  Libraries mailing list
>  Libraries@...
>  http://www.haskell.org/mailman/listinfo/libraries
>
_______________________________________________
Libraries mailing list
Libraries@...
http://www.haskell.org/mailman/listinfo/libraries

Re: [Ticket #2393] Text.PrettyPrint.HughesPJ: Bug fixes, performance improvement

by Benedikt Huber :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Simon Peyton-Jones schrieb:
> Thank you for doing this Benedict.  I've added your more detailed comments to ticket #2393 so that they are preserved.
>
> Ian: would you like to apply?  I'm not sure how to integrate the QuickCheck tests, but I bet you know.
I'm glad this is useful - and I'm impressed by the complexity of the
pretty printer library. I'm tackling the other tickets now and adding
more QuickCheck test.

> Benedict: while you are in the area, would you like to take a swing at http://hackage.haskell.org/trac/ghc/ticket/1337, and 1176?
In my opinion, ticket #667 can be marked invalid (see comments).

Concerning ticket #1337, we have to change the formal specification of
fill (it doesn't match the implementation):

-- Current Specification:
--   fill []  = empty
--   fill [p] = p
--   fill (p1:p2:ps) = oneLiner p1 <#> nest (length p1)
--                                          (fill (oneLiner p2 : ps))
--                     `union`
--                      p1 $$ fill ps

Problem 1: We want to `unnest' the second argument of (p1 $$ fill ps),
but not the first one

In the definition above we have e.g.

 > getSecondLayout $
 >   fillDef False [text "a", text "b", text "a"]

 >> text "ab"; nilabove; nest -1; text "a"; empty
 >> |ab|
 >> |.a|

Problem 2: The overlapping $$ should only be used for those layouts of
p1 which aren't one liners (otherwise violating the invariant "Left
union arg has shorter first line").

I suggest the following specification (i believe it almost matches the
current implementation, modulo [fillNB: fix bug #1337] (see below):

-- Revised Specification:
--   fill g docs = fill' 0 docs
--   gap g       = if g then 1 else 0
--   fill' n []  = []
--   fill' n [p] = [p]
--   fill' n (p1:p2:ps) =
--      oneLiner p1 <g> (fill' (n+length p1+gap g) (oneLiner p2 : ps))
--        `union`
--     (p1 $*$ nest (-n) (fill' g ps))
--
-- $*$ is defined for layouts (One-Layout Documents) as
--
-- layout1 $*$ layout2 | isOneLiner layout1 = layout1 $+$ layout2
--                     | otherwise          = layout1 $$ layout2

I've also implemented the specification in HughesPJQuickCheck.hs,
and checked them against the patched pretty printer.

Concerning Bug #1337:
~~~~~~~~~~~~~~~~~~~~~

If the above formal specification is fine, it is easy to fix:
elide the nests of (oneLiner p2) [see attached patch, record bug #1337].

 > PrettyPrint(0) $ ./Bug1337
 > ....ab
 > ...c

The (long) explanation follows below.

I'll look into the other tickets too.
thorkil: Can you help me with a simplified test case (pretty printer
only) of Bug #1176 ?
Also, I didn't find any regressions tests for the pretty printer - I'll
create some if someone points me in the right direction.

best regards,
benedikt

===========================================================
Explanation of Bug #1337:

Consider

 > fcat [ nest 1 $ text "a", nest 2 $ text "b", text "c"]

--> expected: (nest 1; text "a"; text "b"; nest -3; "c")
--> actual  : (nest 1; text "a"; text "b"; nest -5; "c")

Reduction:
=== (nest 1; text a) <> (fill (-2) (p2:ps))
==>                     (nest 2 (text "b") $+$ text "c")
==>                     (nest 2 (text "b")) `nilabove`
                         (nest (-3) (text "c"))
==> (nest 1; text a; text b; nest -5 c)

The problem is that if we decide to layout (p1:p2:ps) as

| p1 p2
| ps

(call it layout A), then we want to have

 > (p1 <> p2) $+$ ps.

But following law <n6> this means that

 > fcat_A [p1:nest k p2:ps]

is equivalent to

 > fcat_A [p1,p2,ps]

so the nest of p2 has to be removed.

This is somewhat similar to bug #667, but easier to fix
from a semantic point of view:
p1,p2 and ps are distinct layouts - we only have to preserve the
individual layouts, and no combinations of them.



>
> Simon
>
>
> | -----Original Message-----
> | From: libraries-bounces@... [mailto:libraries-bounces@...] On Behalf Of Benedikt Huber
> | Sent: 24 June 2008 13:12
> | To: libraries@...
> | Subject: [Ticket #2393] Text.PrettyPrint.HughesPJ: Bug fixes, performance improvement
> |
> | Hello,
> |
> | I'd like to propose bugfixes, documentation fixes and a performance
> | improvement for Text.PrettyPrint.HughesPJ. The changes shouldn't
> | effect the expected behaviour of the PP library.
> |
> | I've written a QuickCheck test suite for the pretty printer (to test
> | the improvement), and found two bugs and some misconceptions/
> | ambiguities in the documentation. Additionally, there is a
> | microbenchmark for the suggested improvement.
> | Both are available at http://code.haskell.org/~bhuber/Text/
> | PrettyPrint/. Note that the QuickCheck tests need access to all top-
> | level names in HughesPJ (i.e. ignore the export list).
> |
> | In summary, I propose to
> | * fix a bug in fillNB and one in fillNB/sepNB
> | * correct documentation on laws and invariants.
> | * add more efficient implementations of vcat,hsep,hcat
> |
> | More specifically:
> |
> | (1) Bugfix fillNB: Additional case for  fillNB Empty (Empty : ys)
> |
> | (2) Bugfix [f](cat|sep): do not allow overlapping ($$) in vertical
> | composition
> |
> | (3) Lazy implementations of vcat,hcat and hsep
> |
> | (4) Law <t2> isn't always true
> |
> | (5) Invariant 5 should be made stronger
> |
> | (6) Change the comment about negative indentation
> |


New patches:

[fillNB bug, lazy vcat
benedikt.huber@...**20080624113715] {
hunk ./Text/PrettyPrint/HughesPJ.hs 410
+  
+** because of law n6, t2 only holds if x doesn't
+** start with `nest'.
+    
hunk ./Text/PrettyPrint/HughesPJ.hs 429
-<m1>    (text s <> x) $$ y = text s <> ((text "" <> x)) $$
-                                         nest (-length s) y)
+<m1>    (text s <> x) $$ y = text s <> ((text "" <> x) $$
+                                         nest (-length s) y)
hunk ./Text/PrettyPrint/HughesPJ.hs 490
+-- lazy list versions
+hcat = reduceAB . foldr (beside_' False) empty
+hsep = reduceAB . foldr (beside_' True)  empty
+vcat = reduceAB . foldr (above_' True) empty
hunk ./Text/PrettyPrint/HughesPJ.hs 495
-hcat = foldr (<>)  empty
-hsep = foldr (<+>) empty
-vcat = foldr ($$)  empty
+beside_' :: Bool -> Doc -> Doc -> Doc
+beside_' _ p Empty = p
+beside_' g p q = Beside p g q
+
+above_' :: Bool -> Doc -> Doc -> Doc
+above_' _ p Empty = p
+above_' g p q = Above p g q
+
+reduceAB :: Doc -> Doc
+reduceAB (Above Empty _ q) = q
+reduceAB (Beside Empty _ q) = q
+reduceAB doc = doc
hunk ./Text/PrettyPrint/HughesPJ.hs 556
-  * The arugment of @TextBeside@ is never @Nest@.
+  * The argument of @TextBeside@ is never @Nest@.
hunk ./Text/PrettyPrint/HughesPJ.hs 564
-  * The right argument of a union cannot be equivalent to the empty set
-    (@NoDoc@).  If the left argument of a union is equivalent to the
-    empty set (@NoDoc@), then the @NoDoc@ appears in the first line.
+  * A @NoDoc@ may only appear on the first line of the left argument of an
+    union. Therefore, the right argument of an union can never be equivalent
+    to the empty set (@NoDoc@).
hunk ./Text/PrettyPrint/HughesPJ.hs 578
-        -- Arg of a NilAbove is always an RDoc
-nilAbove_ :: Doc -> Doc
+-- Invariant: Args to the 4 functions below are always RDocs
+nilAbove_ :: RDoc -> RDoc
hunk ./Text/PrettyPrint/HughesPJ.hs 583
-textBeside_ :: TextDetails -> Int -> Doc -> Doc
+textBeside_ :: TextDetails -> Int -> RDoc -> RDoc
hunk ./Text/PrettyPrint/HughesPJ.hs 586
-        -- Arg of Nest is always an RDoc
-nest_ :: Int -> Doc -> Doc
+nest_ :: Int -> RDoc -> RDoc
hunk ./Text/PrettyPrint/HughesPJ.hs 589
-        -- Args of union are always RDocs
-union_ :: Doc -> Doc -> Doc
+union_ :: RDoc -> RDoc -> RDoc
hunk ./Text/PrettyPrint/HughesPJ.hs 765
-                            nilAboveNest False k (reduceDoc (vcat ys))
+                            nilAboveNest True k (reduceDoc (vcat ys))
hunk ./Text/PrettyPrint/HughesPJ.hs 810
-fillNB g Empty k (y:ys)    = nilBeside g (fill1 g (oneLiner (reduceDoc y)) k1 ys)
+fillNB g Empty k (Empty:ys)  = fillNB g Empty k ys
+fillNB g Empty k (y:ys)    = fillNBE g k y ys
+fillNB g p k ys            = fill1 g p k ys
+
+fillNBE g k y ys           = nilBeside g (fill1 g (oneLiner (reduceDoc y)) k1 ys)
hunk ./Text/PrettyPrint/HughesPJ.hs 816
-                             nilAboveNest False k (fill g (y:ys))
+                             nilAboveNest True k (fill g (y:ys))
hunk ./Text/PrettyPrint/HughesPJ.hs 821
-fillNB g p k ys            = fill1 g p k ys
-
-
hunk ./Text/PrettyPrint/HughesPJ.hs 1039
--- It should never be called with 'n' < 0, but that can happen for reasons I don't understand
--- Here's a test case:
--- ncat x y = nest 4 $ cat [ x, y ]
--- d1 = foldl1 ncat $ take 50 $ repeat $ char 'a'
--- d2 = parens $  sep [ d1, text "+" , d1 ]
--- main = print d2
--- I don't feel motivated enough to find the Real Bug, so meanwhile we just test for n<=0
+-- returns the empty string on negative argument.
+--
hunk ./Text/PrettyPrint/HughesPJ.hs 1045
-{- Comments from Johannes Waldmann about what the problem might be:
+{-
+Concerning negative indentation:
+If we compose a <> b, and the first line of b is deeply nested, but other lines of b are not,
+then, because <> eats the nest, the pretty printer will try to layout some of b's lines with
+negative indentation:
hunk ./Text/PrettyPrint/HughesPJ.hs 1051
-   In the example above, d2 and d1 are deeply nested, but `text "+"' is not,
-   so the layout function tries to "out-dent" it.
-  
-   when I look at the Doc values that are generated, there are lots of
-   Nest constructors with negative arguments.  see this sample output of
-   d1 (obtained with hugs, :s -u)
-  
-   tBeside (TextDetails_Chr 'a') 1 Doc_Empty) (Doc_NilAbove (Doc_Nest
-   (-241) (Doc_TextBeside (TextDetails_Chr 'a') 1 Doc_Empty)))))
-   (Doc_NilAbove (Doc_Nest (-236) (Doc_TextBeside (TextDetails_Chr 'a') 1
-   (Doc_NilAbove (Doc_Nest (-5) (Doc_TextBeside (TextDetails_Chr 'a') 1
-   Doc_Empty)))))))) (Doc_NilAbove (Doc_Nest (-231) (Doc_TextBeside
-   (TextDetails_Chr 'a') 1 (Doc_NilAbove (Doc_Nest (-5) (Doc_TextBeside
-   (TextDetails_Chr 'a') 1 (Doc_NilAbove (Doc_Nest (-5) (Doc_TextBeside
-   (TextDetails_Chr 'a') 1 Doc_Empty))))))))))) (Doc_NilAbove (Doc_Nest
+doc       |0123345
+------------------
+d1        |a
+d2        |   b
+          |c
+d1<>d2    |ab
+         c|
}

[fillNB: fix bug #1337
benedikt.huber@...**20080625114659] {
hunk ./Text/PrettyPrint/HughesPJ.hs 814
-fillNBE g k y ys           = nilBeside g (fill1 g (oneLiner (reduceDoc y)) k1 ys)
+fillNBE g k y ys           = nilBeside g (fill1 g (elideNest . oneLiner . reduceDoc $ y) k1 ys)
hunk ./Text/PrettyPrint/HughesPJ.hs 820
-
+elideNest (Nest k d) = d
+elideNest d = d
}

Context:

[Fix warnings
Ian Lynagh <igloo@...>**20080620135156]
[TAG 2008-05-28
Ian Lynagh <igloo@...>**20080528004408]
Patch bundle hash:
1c1d36f58462f5c86700d37fb4d7729461aea35c

_______________________________________________
Libraries mailing list
Libraries@...
http://www.haskell.org/mailman/listinfo/libraries

Re: [Ticket #2393] Text.PrettyPrint.HughesPJ: Bug fixes, performance improvement

by Benedikt Huber :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Neil Mitchell schrieb:

> Hi Benedikt,
>
> Great work, and many thanks!
>
> One ticket that might be interesting, while you are in the area, is:
>
> http://hackage.haskell.org/trac/ghc/ticket/1217
>
> The code is attached, I went through all the API review months and
> months ago, and everyone agreed that the function should be added, but
> under the name "zeroWidthText".
Hi Neil,

Thanks !

Concerning #1217:
http://www.haskell.org/pipermail/libraries/2007-March/007028.html

I'm not quite sure what you are asking for - I'm not the one to apply
the patches ;)

 From a technical point of view, zeroWidthText wouldn't cause any bugs
in the current implementation, as far as I can see.

Two things should be considered though:

1) when defining zeroTextWidth as (textBeside_ (Str s) 0 Empty), the
client looses the information that (Str s) is a zero-width TextDetail.
When combining the text again (see string_txt for example), it might be
good to know that.
I do not suggest it, but it /might/ be worth to consider using a
different constructor for zero width texts. Of course, this modifies the
API, as TextDetails is exported.

Btw, is there any use for the (PStr :: String -> TextDetails) constructor ?

2) One has to take precautions that future versions of the library do
not use the text length for optimizations.

For example the law
 > text "" <> s = elide_nests s
must not be used by testing against the text length (which is the
obvious and fasted way to do so). Some regression tests would be a good
idea.
--

Btw: the people commenting in the thread linked above seemed to be using
different pretty printer libraries, so maybe some comments from actual
users would be good too.

cheers,
benedikt


>
> I'm guessing it isn't too much effort to tack this on while you are
> working on it? I am currently limited to Hugs only (not enough disk
> space to install GHC and no SSH access...), so am unable to patch any
> libraries myself.
>
> Thanks
>
> Neil
>
>
> On 6/24/08, Simon Peyton-Jones <simonpj@...> wrote:
>> Thank you for doing this Benedict.  I've added your more detailed comments to ticket #2393 so that they are preserved.
>>
>>  Ian: would you like to apply?  I'm not sure how to integrate the QuickCheck tests, but I bet you know.
>>
>>
>>  Benedict: while you are in the area, would you like to take a swing at http://hackage.haskell.org/trac/ghc/ticket/1337, and 1176?
>>
>>  Simon
>>
>>
>>
>>  | -----Original Message-----
>>  | From: libraries-bounces@... [mailto:libraries-bounces@...] On Behalf Of Benedikt Huber
>>  | Sent: 24 June 2008 13:12
>>  | To: libraries@...
>>  | Subject: [Ticket #2393] Text.PrettyPrint.HughesPJ: Bug fixes, performance improvement
>>  |
>>  | Hello,
>>  |
>>  | I'd like to propose bugfixes, documentation fixes and a performance
>>  | improvement for Text.PrettyPrint.HughesPJ. The changes shouldn't
>>  | effect the expected behaviour of the PP library.
>>  |
>>  | I've written a QuickCheck test suite for the pretty printer (to test
>>  | the improvement), and found two bugs and some misconceptions/
>>  | ambiguities in the documentation. Additionally, there is a
>>  | microbenchmark for the suggested improvement.
>>  | Both are available at http://code.haskell.org/~bhuber/Text/
>>  | PrettyPrint/. Note that the QuickCheck tests need access to all top-
>>  | level names in HughesPJ (i.e. ignore the export list).
>>  |
>>  | In summary, I propose to
>>  | * fix a bug in fillNB and one in fillNB/sepNB
>>  | * correct documentation on laws and invariants.
>>  | * add more efficient implementations of vcat,hsep,hcat
>>  |
>>  | More specifically:
>>  |
>>  | (1) Bugfix fillNB: Additional case for  fillNB Empty (Empty : ys)
>>  |
>>  | (2) Bugfix [f](cat|sep): do not allow overlapping ($$) in vertical
>>  | composition
>>  |
>>  | (3) Lazy implementations of vcat,hcat and hsep
>>  |
>>  | (4) Law <t2> isn't always true
>>  |
>>  | (5) Invariant 5 should be made stronger
>>  |
>>  | (6) Change the comment about negative indentation
>>  |
>>
>> _______________________________________________
>>  Libraries mailing list
>>  Libraries@...
>>  http://www.haskell.org/mailman/listinfo/libraries
>>

_______________________________________________
Libraries mailing list
Libraries@...
http://www.haskell.org/mailman/listinfo/libraries

Re: [Ticket #2393] Text.PrettyPrint.HughesPJ: Bug fixes, performance improvement

by Neil Mitchell :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Benedikt,

>  I'm not quite sure what you are asking for - I'm not the one to apply the
> patches ;)

The patch needs redoing with zeroWidthText, rather than zeroWidth. I
can't build a patch, but you are probably at the right point to do so,
and include it into the other patches you are doing. Also, since you
seem to understand the low-level details very well, it was a general
request for comment.

>  Two things should be considered though:
>
>  1) when defining zeroTextWidth as (textBeside_ (Str s) 0 Empty), the client
> looses the information that (Str s) is a zero-width TextDetail.
>  When combining the text again (see string_txt for example), it might be
>  good to know that.

>  I do not suggest it, but it /might/ be worth to consider using a different
> constructor for zero width texts. Of course, this modifies the API, as
> TextDetails is exported.

That seems like a bad idea, as I certainly don't know enough details
about the library to write the correct implementation in every case,
and to be sure there are no pattern-match errors.

>  2) One has to take precautions that future versions of the library do not
> use the text length for optimizations.
>
>  For example the law
>  > text "" <> s = elide_nests s
>  must not be used by testing against the text length (which is the obvious
> and fasted way to do so). Some regression tests would be a good idea.

True, but I guess that it is currently safe?

>  Btw: the people commenting in the thread linked above seemed to be using
> different pretty printer libraries, so maybe some comments from actual users
> would be good too.

I think the discussion proceeded enough to be fairly sure that no one
objected to it, and lots of people wanted it as part of a general
pretty printing library. I think it also showed that lots of serious
pretty-printing users had moved away from HughesPJ - its possible your
fixes and maintenance might tempt them back.

Thanks

Neil

>
> >
> > I'm guessing it isn't too much effort to tack this on while you are
> > working on it? I am currently limited to Hugs only (not enough disk
> > space to install GHC and no SSH access...), so am unable to patch any
> > libraries myself.
> >
> > Thanks
> >
> > Neil
> >
> >
> > On 6/24/08, Simon Peyton-Jones <simonpj@...> wrote:
> >
> > > Thank you for doing this Benedict.  I've added your more detailed
> comments to ticket #2393 so that they are preserved.
> > >
> > >  Ian: would you like to apply?  I'm not sure how to integrate the
> QuickCheck tests, but I bet you know.
> > >
> > >
> > >  Benedict: while you are in the area, would you like to take a swing at
> http://hackage.haskell.org/trac/ghc/ticket/1337, and 1176?
> > >
> > >  Simon
> > >
> > >
> > >
> > >  | -----Original Message-----
> > >  | From: libraries-bounces@...
> [mailto:libraries-bounces@...] On Behalf Of Benedikt Huber
> > >  | Sent: 24 June 2008 13:12
> > >  | To: libraries@...
> > >  | Subject: [Ticket #2393] Text.PrettyPrint.HughesPJ: Bug fixes,
> performance improvement
> > >  |
> > >  | Hello,
> > >  |
> > >  | I'd like to propose bugfixes, documentation fixes and a performance
> > >  | improvement for Text.PrettyPrint.HughesPJ. The changes shouldn't
> > >  | effect the expected behaviour of the PP library.
> > >  |
> > >  | I've written a QuickCheck test suite for the pretty printer (to test
> > >  | the improvement), and found two bugs and some misconceptions/
> > >  | ambiguities in the documentation. Additionally, there is a
> > >  | microbenchmark for the suggested improvement.
> > >  | Both are available at
> http://code.haskell.org/~bhuber/Text/
> > >  | PrettyPrint/. Note that the QuickCheck tests need access to all top-
> > >  | level names in HughesPJ (i.e. ignore the export list).
> > >  |
> > >  | In summary, I propose to
> > >  | * fix a bug in fillNB and one in fillNB/sepNB
> > >  | * correct documentation on laws and invariants.
> > >  | * add more efficient implementations of vcat,hsep,hcat
> > >  |
> > >  | More specifically:
> > >  |
> > >  | (1) Bugfix fillNB: Additional case for  fillNB Empty (Empty : ys)
> > >  |
> > >  | (2) Bugfix [f](cat|sep): do not allow overlapping ($$) in vertical
> > >  | composition
> > >  |
> > >  | (3) Lazy implementations of vcat,hcat and hsep
> > >  |
> > >  | (4) Law <t2> isn't always true
> > >  |
> > >  | (5) Invariant 5 should be made stronger
> > >  |
> > >  | (6) Change the comment about negative indentation
> > >  |
> > >
> > > _______________________________________________
> > >  Libraries mailing list
> > >  Libraries@...
> > >  http://www.haskell.org/mailman/listinfo/libraries
> > >
> > >
> >
>
>
_______________________________________________
Libraries mailing list
Libraries@...
http://www.haskell.org/mailman/listinfo/libraries

Re: [Ticket #2393] Text.PrettyPrint.HughesPJ: Bug fixes, performance improvement

by Duncan Coutts :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On Wed, 2008-06-25 at 16:03 +0200, Benedikt Huber wrote:

> Btw, is there any use for the (PStr :: String -> TextDetails) constructor ?

Previously it was used for PackedString (or some predecessor of it) but
at some point it got changed to String. There is really very little
advantage to PackedString over String because PackedString is very slow.

However we should not get rid of the PStr constructor, we should change
it to use the new Unicode string type that is in the works. That will be
fast and memory efficient (using a similar underlying representation as
ByteStrings).

So yeah, I suggest we leave it for now and change it to the new fast
unicode text type at some point in the future when that is ready.

Duncan

_______________________________________________
Libraries mailing list
Libraries@...
http://www.haskell.org/mailman/listinfo/libraries

Re: [Ticket #2393] Text.PrettyPrint.HughesPJ: Bug fixes, performance improvement

by Thorkil Naur :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hello Benedikt,

Thanks for all this. From memory, notes, and (last resort) trac search, the
HughesPJ-related trac tickets on the GHC trac are:

#669 negative indentation in Text.PrettyPrint.HughesPJ
#1176 Infinite loop when printing error message
#1217 Add zeroText to Text.PrettyPrint.HughesPJ
#1337 Fix wrong indentation from Text.PrettyPrint.HughesPJ fill (fcat and
fsep)

I have added comments to #669 and #1176 that I hope speak for themselves. The
problem reported in #1337 has a fix (Fix of wrong indentation from HughesPJ
fill (fcat and fsep)), it may still provide some inspiration.

On Wednesday 25 June 2008 14:41, Benedikt Huber wrote:
> ...
> thorkil: Can you help me with a simplified test case (pretty printer
> only) of Bug #1176 ?

#1176 is a case of #1337 for the internal GHC version of the HughesPJ library
and I expect a fix of #1337 to be applicable to the internal GHC library,
hence fixing #1176. So I expect the failing cases for #1337 to fail for #1176
as well, but being, unfortunately, somewhat less easily exercised.

> Also, I didn't find any regressions tests for the pretty printer - I'll
> create some if someone points me in the right direction.

The patch (Patch with HughesPJ tests with automated high-level coverage check)
attached to #1337 has some tests, but also some rather bombastic things that
attempts to ensure some coverage of the tests. This was created at a point in
time where hpc was not available.

> ...

Thanks and best regards
Thorkil
_______________________________________________
Libraries mailing list
Libraries@...
http://www.haskell.org/mailman/listinfo/libraries

Re: [Ticket #2393] Text.PrettyPrint.HughesPJ: Bug fixes, performance improvement

by Benedikt Huber :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Thorkil Naur schrieb:

> Hello Benedikt,
>
> Thanks for all this. From memory, notes, and (last resort) trac search, the
> HughesPJ-related trac tickets on the GHC trac are:
>
> #669 negative indentation in Text.PrettyPrint.HughesPJ
> #1176 Infinite loop when printing error message
> #1217 Add zeroText to Text.PrettyPrint.HughesPJ
> #1337 Fix wrong indentation from Text.PrettyPrint.HughesPJ fill (fcat and
> fsep)
>
Hello,
I've added a comment to #2393 commenting those bugs: #669 wontfix, #1217
and #1337 fixed in the patch bundle, #1176 propably fixed.

So, there is one open issue: the specification of fill. Note that the
old specification is flawed and needs to be fixed anyway.

Here's the cleaned up, documented version. It looks implementation
oriented, but is far simpler than the actual implementation.

-- Revised Specification:

--   {- start paragraph fill at column 0 -}
--   fill g docs = fill' 0 docs
--
--   {- base cases -}
--   fill' col []  = []
--   fill' col [p] = [p]
--
--   {- either put p2 beside p1, or below p1 -}
--   fill' col (p1:p2:ps) =
--        {- precondition for p1 beside p2: p1,p2 span only one line -}
--        {- As we build (p1 <> p2), remove the nesting of p2 -}
--            oneLiner p1
--          <g>
--            fill' (col + length p1 + gap g)
--                  (elideNests (oneLiner p2) : ps)
--      `union`
--        {- put p2 below p1; p2 should be aligned with the first
--           argument of fill, which is col columns to the left of p1 -}
--            p1
--          $*$
--            nest (-col) (fill' 0 (p2:ps))
--
--   {- width of space -}
--   gap g       = if g then 1 else 0
--
--
-- $*$ is defined for layouts (One-Layout Documents) as
--
-- layout1 $*$ layout2 | isOneLiner layout1 = layout1 $+$ layout2
--                     | otherwise          = layout1 $$ layout2
--
-- without the the first case, we would violate the one-line lookahead
-- invariant

The specification of fill' - left alone elideNests and $*$ - does not
add any complexity, it is exactly what a paragraph fill should do.

elideNests is an improvement - it makes it possible to have nesting in
the arguments (otherwise, fill fails with nested arguments).

$*$ is a "feature" in the current implementation:
It allows overlapping in certain situations.

Consider
fill |a|  ,  |...c|
      |b|     |...d|
Without $*$ ($+$) we would get
|a|
|b|
|...c|
|...d|
Using $*$ we get
|a|
|b..c|
|...d|

I do not know wheter this might be useful - it is an extra feature
rarely used I suppose ??

Note that the specification of fill has to be changed anyway, either
with $+$ or with $*$.


> I have added comments to #669 and #1176 that I hope speak for themselves. The
> problem reported in #1337 has a fix (Fix of wrong indentation from HughesPJ
> fill (fcat and fsep)), it may still provide some inspiration.
Thank you thorkil, I think your patch was correct; the patches in #2393
achieve the same effect.

>
> On Wednesday 25 June 2008 14:41, Benedikt Huber wrote:
>> ...
>> thorkil: Can you help me with a simplified test case (pretty printer
>> only) of Bug #1176 ?
>
> #1176 is a case of #1337 for the internal GHC version of the HughesPJ library
> and I expect a fix of #1337 to be applicable to the internal GHC library,
> hence fixing #1176. So I expect the failing cases for #1337 to fail for #1176
> as well, but being, unfortunately, somewhat less easily exercised.
Could you have a look at Bug1176a.hs attached to #1337 ? Is this what
happened ?

>
>> Also, I didn't find any regressions tests for the pretty printer - I'll
>> create some if someone points me in the right direction.
>
> The patch (Patch with HughesPJ tests with automated high-level coverage check)
> attached to #1337 has some tests, but also some rather bombastic things that
> attempts to ensure some coverage of the tests. This was created at a point in
> time where hpc was not available.
Ah yes, hpc, thanks for the pointer. Using the quickcheck test suite I
found some unused functions and pattern matches. I've added comments to
the most important ones.

As you seem to know the pretty printer library well, it would be great
if you could comment on the revised specification of fill.

best regards,
benedikt

_______________________________________________
Libraries mailing list
Libraries@...
http://www.haskell.org/mailman/listinfo/libraries

Re: [Ticket #2393] Text.PrettyPrint.HughesPJ: Bug fixes, performance improvement

by Ian Lynagh :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Tue, Jun 24, 2008 at 02:11:34PM +0200, Benedikt Huber wrote:
>
> I'd like to propose bugfixes, documentation fixes and a performance  
> improvement for Text.PrettyPrint.HughesPJ. The changes shouldn't  
> effect the expected behaviour of the PP library.

Thanks Benedikt; I've applied your patch.


Thanks
Ian

_______________________________________________
Libraries mailing list
Libraries@...
http://www.haskell.org/mailman/listinfo/libraries

Re: [Ticket #2393] Text.PrettyPrint.HughesPJ: Bug fixes, performance improvement

by Thorkil Naur :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hello Benedikt,

On Thursday 26 June 2008 20:40, Benedikt Huber wrote:
> Thorkil Naur schrieb:
> > Hello Benedikt,
> >
> > Thanks for all this. From memory, notes, and (last resort) trac search,
the

> > HughesPJ-related trac tickets on the GHC trac are:
> >
> > #669 negative indentation in Text.PrettyPrint.HughesPJ
> > #1176 Infinite loop when printing error message
> > #1217 Add zeroText to Text.PrettyPrint.HughesPJ
> > #1337 Fix wrong indentation from Text.PrettyPrint.HughesPJ fill (fcat and
> > fsep)
> >
> Hello,
> I've added a comment to #2393 commenting those bugs: #669 wontfix,

I suggest to add documentation as suggested by Simon PJ and then close it.

> ...
> #1176 propably fixed.  

I don't think so: The infinite loop is gone, but the GHC version of the
pretty-printing library still has the error reported as #1337 against the
external library (see the comments that I added to #1176.).

>
> So, there is one open issue: the specification of fill. Note that the
> old specification is flawed and needs to be fixed anyway.
>
> Here's the cleaned up, documented version. It looks implementation
> oriented, but is far simpler than the actual implementation.
>
> -- Revised Specification:
>
> --   {- start paragraph fill at column 0 -}
> --   fill g docs = fill' 0 docs
> --
> --   {- base cases -}
> --   fill' col []  = []
> --   fill' col [p] = [p]
> --
> --   {- either put p2 beside p1, or below p1 -}
> --   fill' col (p1:p2:ps) =
> --        {- precondition for p1 beside p2: p1,p2 span only one line -}
> --        {- As we build (p1 <> p2), remove the nesting of p2 -}
> --            oneLiner p1
> --          <g>
> --            fill' (col + length p1 + gap g)
> --                  (elideNests (oneLiner p2) : ps)
> --      `union`
> --        {- put p2 below p1; p2 should be aligned with the first
> --           argument of fill, which is col columns to the left of p1 -}
> --            p1
> --          $*$
> --            nest (-col) (fill' 0 (p2:ps))
> --
> --   {- width of space -}
> --   gap g       = if g then 1 else 0
> --
> --
> -- $*$ is defined for layouts (One-Layout Documents) as
> --
> -- layout1 $*$ layout2 | isOneLiner layout1 = layout1 $+$ layout2
> --                     | otherwise          = layout1 $$ layout2
> --
> -- without the the first case, we would violate the one-line lookahead
> -- invariant
>
> The specification of fill' - left alone elideNests and $*$ - does not
> add any complexity, it is exactly what a paragraph fill should do.
>
> elideNests is an improvement - it makes it possible to have nesting in
> the arguments (otherwise, fill fails with nested arguments).
>
> $*$ is a "feature" in the current implementation:
> It allows overlapping in certain situations.
>
> Consider
> fill |a|  ,  |...c|
>       |b|     |...d|
> Without $*$ ($+$) we would get
> |a|
> |b|
> |...c|
> |...d|
> Using $*$ we get
> |a|
> |b..c|
> |...d|
>
> I do not know wheter this might be useful - it is an extra feature
> rarely used I suppose ??
>
> Note that the specification of fill has to be changed anyway, either
> with $+$ or with $*$.
> ...
> > On Wednesday 25 June 2008 14:41, Benedikt Huber wrote:
> >> ...
> >> thorkil: Can you help me with a simplified test case (pretty printer
> >> only) of Bug #1176 ?
> >
> > #1176 is a case of #1337 for the internal GHC version of the HughesPJ
library
> > and I expect a fix of #1337 to be applicable to the internal GHC library,
> > hence fixing #1176. So I expect the failing cases for #1337 to fail for
#1176
> > as well, but being, unfortunately, somewhat less easily exercised.
> Could you have a look at Bug1176a.hs attached to #1337 ? Is this what
> happened ?

This looks like the same thing, yes, so a correstion to the GHC version of the
pretty-printing library like the one you supplied for #1337 seems called for.

> ...
> As you seem to know the pretty printer library well, it would be great
> if you could comment on the revised specification of fill.

The new specification certainly explained things for me that I had not
understood earlier, so it is definitely an improvement.

> ...

Best regards
Thorkil
_______________________________________________
Libraries mailing list
Libraries@...