|
View:
New views
3 Messages
—
Rating Filter:
Alert me
|
| < Prev | 1 - 2 | Next > |
|
|
Re: Numerical sort, reverse sort, etc?On 2008/04/18, at 03:55, Richard A. O'Keefe wrote: > On 17 Apr 2008, at 9:48 pm, Paulo Moura wrote: >> Just to clarify my suggestion: >> >> sort(L, S, <) - ascending order, remove duplicates, compare >> numerically >> sort(L, S, @<) - ascending order, remove duplicates, compare >> using standard order >> sort(L, S, >) - descending order, remove duplicates, compare >> numerically >> sort(L, S, @>) - descending order, remove duplicates, compare >> using standard order >> sort(L, S, =<) - ascending order, keep duplicates, compare >> numerically >> sort(L, S, @=<) - ascending order, keep duplicates, compare >> using standard order >> sort(L, S, >=) - descending order, keep duplicates, compare >> numerically >> sort(L, S, @>=) - descending order, keep duplicates, compare >> using standard order >> >> sort(L, S, UserFunctor) - whatever the predicate with UserFunctor >> dictates > > Apologies for the upcoming unparliamentary language, but this makes > me sick. Hope you're feeling better now. Of course, better than apologizing is to use nicer language in the first place. > What is so hard to understand about > > "Keep or retain duplicates are not only not the only options, > they aren't even the most important options."? What is so hard to understand that in my suggestion above you're specifying not only keep/discard duplicates but also ascending/ descending order and numeric/standard order (this last one as per Jan initial post)? For someone complaining that other people don't pay attention to what you write, maybe you should try first to actually read what other people wrote? > So the illustrative code below. > > By the way, the third argument > - isn't a functor ((<)/2 is a functor, (<) is an atom) So? Where do I wrote that (<) is a functor??? Above the atoms <, @<, >, @>, =<, @=<, >=, @>= are used to specify *three* different sorting options. When the third argument is a functor you use what the programmer tells you to use. > ... I agree that the compact notation above is not the best option to cover all the requirements in Jan initial post. Using a list of options as the last argument is a better, extendable solution. Cheers, Paulo ----------------------------------------------------------------- Paulo Jorge Lopes de Moura Dep. of Computer Science, University of Beira Interior 6201-001 Covilhã, Portugal Office 4.3 Ext. 3257 Phone: +351 275319891 Fax: +351 275319899 Email: <mailto:pmoura@...> Home page: <http://www.di.ubi.pt/~pmoura> Research: <http://logtalk.org/> ----------------------------------------------------------------- ------------ For further info, please visit http://www.swi-prolog.org/ To unsubscribe, send a plaintext mail with "unsubscribe prolog <e-mail>" in its body to majordomo@... |
|
|
Re: Numerical sort, reverse sort, etc?On 18 Apr 2008, at 8:53 pm, Salvador Fandino wrote:
>> Recall that I proposed the interface >> >> sort(Compare, List, Sorted) >> sort(Compare, Combine, List, Sorted_And_Combined) > > That makes me think that (maybe) we are trying to use the same > "sort" predicate for two unrelated things: > > 1 - sorting > 2 - grouping and processing things that compare equally (eliminating > duplicates is a sub-case of this). They are not unrelated things. The reason that COBOL has had SORT and MERGE statements since the late 1950s is in order to BOTH group and process things that are equal according to some key AND to display results in humans find natural. Find natural for what? For finding EQUAL keys! Why are telephone directories sorted? So that we can find (equal) stuff in them. Why are the indices in books sorted? So that we find (equal) stuff in them. Why are library catalogues sorted? So that we can find stuff in them. Library catalogues also bring together things that are *similar*: if I can't remember that "Greeley" is actually "Greeley, Andrew Moran" or "Greeley, A. M." at least "Greeley*" will find a *range* of things with the *same* (equal!) prefix. I was about to say that the one use of sorting that I was familiar with that *wasn't* about bringing similar things together was sorting a set of integers so that you can use delta encoding (as done in inverted indices), but of course that *isn't* an exception: the aim is to bring similar things together so that they will compress better. There are other ways to bring matching things together. Two of the classic ways to implement JOIN in a relational database are sort/merge join and hash join. Hashing has problems of its own: Andrés Valloud has a very nice new book just out on hashing (in a Smalltalk context, but of very general relevance) showing just how hard it is to hash well. Sort/merge is an excellent general purpose technique that scales well, and it's a major tool for efficient programming in Prolog. > > Are these operation tightly related? My opinion is that they aren't. > Itjust happens that (1) is a convenient way to implement (2), but > this isan implementation detail, not something we should try to > model theinterface after. It doesn't "just happen"; it is central to the nature and uses of sorting. This is not to say that other methods of grouping together matching things do not exist. > > > In my opinion, instead of offering an overcomplicated predicate todo > both things, having two unrelated predicates would be moredesirable. > For instance: > > sort(Compare, List, Sorted) > > group(Compare, List, Combined) ==> creates an unordered list of > lists with the elements comparing equally BUT HOW IS IT SUPPOSED TO DO THAT? Given a comparison predicate, it's going to do it by SORTING. (Oh, and how does it combine them? Your interface doesn't say.) You see, the full name of the paradigm is the sort/merge paradigm. It's not enough to bring things in *one* collection together; it is very often necessary to arrange the elements of *two* collections THE SAME WAY so that the two collections can be efficiently combined. (This is why the ordset representation is so efficient compared with the unordered set representation.) It's not just sort(Comparison, List, Sorted) sort(Comparison, Combination, List, Sorted) that we need, but merge(Comparison, List1, List2, Merged) merge(Comparison, Combination, List1, List2, Combined) > > > and maybe > > combine(Compare, Combine, List, Combined) :- > group(Compare, List, Out), maplist(Combine, Out, Combined). > > unique(Compare, List, Unique) :- ... > > and maybe some extra versions allowing for optimized implementations: Provide as many different extra predicates as you want. I will *still* want one of them to take a comparison predicate and promise me a sorted result. So what the heck could possibly be wrong about calling it 'sort'? Who CARES how many uses it has as long as the specification is the same for all of them? > > > sort(ExtractKey, Compare, List, Sorted) ==> ExtractKey can be > anexpensive operation and compare can be something cheap (for > instance, anumeric comparison, handled as a special case by the C > sortingroutine). There are so many problems with this I don't know where to being. Why assume that there will be such a thing as one underlying method (rather the template expansion) or that it will be written in C (Quintus preferred doing stuff in Prolog; if it wasn't fast enough that meant the Prolog system needed to be faster). > For most sorting algorithms, ExtractKey will be called on theO(N) > part and Compare in the O(NlogN), so this can be a realperformance > boost. In case you hadn't noticed, that's why we have keysort/2. We have maplist/3, so attaching the keys can be done outside the sorting algorithm (it is, as you point out, not really part of the core sorting operation). > group(ExtractKey, Compare, List, Combine) ==> in the same way, > forsome (common) special cases of Compare, a hashing algorithm could > beused to make the groups. Separating the key generation makes > thosecases more likely to happen. Now, tell me please, how is it that the implementation is able to conjure up AT RUN TIME a hashing function that is compatible with the notion of equality implied by the user-defined comparison predicate? (Note that term_hash/2 *cannot* be used all the time; this would break the essential hash/= compatibility axiom for all x, y : equal(x,y) => hash(x) = hash(y).) If group/4 *doesn't* work for *every* reasonable choice of Compare that someone might program, then it simply doesn't work, end of story. I've just spent a couple of hours writing about 250 lines of Smalltalk because I made the mistake of relying on an operation someone had written where "for some special cases ..." it worked, but in general it didn't. > ... and it is not just and optimization matter, I think that they > are also handy predicates! You like 'em, you implement 'em, and post 'em. Note that we *have* maplist, so we don't *need* things with ExtractKey arguments: they add nothing whatever to the efficiency of our programs, and very little to our convenience. The combination operation, in contrast, takes place deep in the guts of sorting, and can make quite a large difference to efficiency. I've got a full implementation of sort/[3,4] that works by template expansion that just needs typing up and will appear in this mailing list some time this week. Te Reo Ingarihi is a taonga of Te Iwi Pakeha, ergo we should keep it pure, sans mélange, ruat caelum. ------------ For further info, please visit http://www.swi-prolog.org/ To unsubscribe, send a plaintext mail with "unsubscribe prolog <e-mail>" in its body to majordomo@... |
|
|
Re: Numerical sort, reverse sort, etc?On 18 Apr 2008, at 10:16 pm, Paulo Moura wrote: > What is so hard to understand that in my suggestion above you're > specifying not only keep/discard duplicates but also ascending/ > descending order and numeric/standard order (this last one as per > Jan initial post)? That is one of the things that is WRONG with that interface! As noted before (why do I have to say this again) - it is NOT true that the only alternatives are ascending/descending. Quite often we want to sort in ascending order of PART of the data and descending order on another part. Your interface prohibits this. Bad mark number 1. - it is NOT true that the only alternatives are keep/discard duplicates. There are literally INFINITELY many other things we might want to do with duplicates, and your interface prohibits all of them. Bad mark number 2. - it is NOT true that the only alternatives are term order/numeric order. Your interface *looks* as though it takes a general two-place predicate for sorting, but precisely BECAUSE it tangles (crippled versions of) the other aspects of sorting into the same argument, it cannot possibly work for other predicates. Bad mark number 3. Suppose, for example, that I wish to sort (Document_Id, Term_Frequency) pairs. Let us for the sake of argument ignore the fact that if I sort such things I either want to reject equal Ids as an error (this is an alternative your interface does not allow) or to add corresponding Frequencies (which is an alternative your interface does not allow). Let's just see what happens when I write id_freq_less((I1,_), (I2,_)) :- I1 < I2. How, precisely, does your interface know whether sort(Pairs, SortedPairs, id_freq_less) should or should not discard duplicates? I suspect that you intended the what-to-do argument to be purely symbolic, not to be a general predicate. If that is so, bad mark number 3 becomes "it deceives users into THINKING it is meant to work for other predicates, but it doesn't". In short, the only virtue of this proposal is compactness. Everything else about it is bad, and the badness is *caused* by the compactness. > Above the atoms <, @<, >, @>, =<, @=<, >=, @>= are used to specify > *three* different sorting options. When the third argument is a > functor you use what the programmer tells you to use. Ah, so it *is* expected to work with other predicates. Too bad that it can't. > > >> ... > > I agree that the compact notation above is not the best option to > cover all the requirements in Jan initial post. But it isn't a good option for anything except brevity.\ > Using a list of options as the last argument is a better, extendable > solution. It isn't extendable BY THE USER. The implementor gets to add new options, but *only* the implementor. sort/3 and sort/4 are better, because they are extendable by the user. ANY FIXED SET OF IMPLEMENTOR-CHOSEN OPTIONS WILL BE INADEQUATE. ------------ For further info, please visit http://www.swi-prolog.org/ To unsubscribe, send a plaintext mail with "unsubscribe prolog <e-mail>" in its body to majordomo@... |
| < Prev | 1 - 2 | Next > |
| Free Forum Powered by Nabble | Forum Help |