casereader numbering

I've started rewriting EXAMINE to make it less crappy.

One thing that would make it easier would be a function of the form

    struct casereader * append_case_numbers (struct casereader *cr);

which returns a new casereader which is identical to CR except that it
has one extra column which contains the ordinal number of each case.

For example if CR contains

    x y
    z .
    a b
    c d

the return will contain

    x y 1
    z . 2
    a b 3
    c d 4

Such a function would also simplify the implementation of RANK.
But it's not obvious to me how to create such a function. Is this a
feasible thing to do?

J'

-- 
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.
_______________________________________________
pspp-dev mailing list
pspp-dev@...
http://lists.gnu.org/mailman/listinfo/pspp-dev

Re: casereader numbering

On Thu, Jul 17, 2008 at 07:31:30PM +0800, John Darrington wrote:
> I've started rewriting EXAMINE to make it less crappy.
>
> One thing that would make it easier would be a function of the form
>
> struct casereader * append_case_numbers (struct casereader *cr);
>
> which returns a new casereader which is identical to CR except that it
> has one extra column which contains the ordinal number of each case.
>
> For example if CR contains
>
> x y
> z .
> a b
> c d
>
> the return will contain
>
> x y 1
> z . 2
> a b 3
> c d 4
>
>
> Such a function would also simplify the implementation of RANK...

This would be valuable for other procedures, too. Anything that
could use permutations could make use of such a function. Also, it
would be useful for bootstrap and jackknife tests.

-Jason

Re: casereader numbering

John Darrington <john@...> writes:

> One thing that would make it easier would be a function of the form
>
> struct casereader * append_case_numbers (struct casereader *cr);
>
> which returns a new casereader which is identical to CR except that it
> has one extra column which contains the ordinal number of each case.

That shouldn't be hard.  I'll try to whip up something like that
over the weekend.

-- 
Ben Pfaff
http://benpfaff.org

Re: casereader numbering

On Thu, Jul 17, 2008 at 03:18:33PM -0400, Jason Stover wrote:

     This would be valuable for other procedures, too. Anything that
     could use permutations could make use of such a function. Also, it
     would be useful for bootstrap and jackknife tests.

How would numbering the cases help for those applications?  I
would have thought that some kind of random/low-discrepancy sequence
iterator would be required.

J'

Re: casereader numbering

On Tue, Jul 22, 2008 at 07:44:47PM +0800, John Darrington wrote:
> On Thu, Jul 17, 2008 at 03:18:33PM -0400, Jason Stover wrote:
>
>      This would be valuable for other procedures, too. Anything that
>      could use permutations could make use of such a function. Also, it
>      would be useful for bootstrap and jackknife tests.
>
>
> How would numbering the cases help for those applications?  I
> would have thought that some kind of random/low-discrepancy sequence
> iterator would be required.

I hadn't thought it through entirely when I typed that, but
permutation tests can be done by generating a random permutation
p(1), p(2), ..., p(n); then instead of computing a statistic from the
data T(X_1, X_2, ..., X_n), we compute T(X_{p(1)}, X_{p(2)}, ..., X_{p(n)}).
Do that for many random permutations and check how far away the
original statistic is from its parameter under the null hypothesis.
"How far" is measured by looking at distances of the statistics from
the permuted data.

In the past, I usually did this in C by generating a random
permutation, then doing something like:

    for (i = 0; i < n; i++)
      {
        y[i] = x[p[i]];
      }
    t[j] = T(y);   /* j counts the random permutations generated so far */

(That bit about computing T(y) is usually more complicated.)

So that's what I was thinking. Having case numbers may not be useful
for permutation tests if we aren't going to just copy data into an
array, but I thought it might be, even if I couldn't see the details
in advance.

-Jason
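
The permutation loop described above can be sketched as a small self-contained C program fragment. This is only an illustration under stated assumptions, not code from PSPP: the names random_permutation, mean_difference and permutation_statistics are hypothetical, the statistic T is assumed to be a difference of group means, and rand() is used purely for brevity.

```c
#include <assert.h>
#include <stdlib.h>

/* Fills P with a random permutation of 0..N-1 (Fisher-Yates shuffle).
   rand() keeps the sketch short; its modulo bias is ignored here. */
static void
random_permutation (size_t *p, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    p[i] = i;
  for (i = n; i > 1; i--)
    {
      size_t j = (size_t) rand () % i;
      size_t tmp = p[i - 1];
      p[i - 1] = p[j];
      p[j] = tmp;
    }
}

/* Hypothetical statistic T: mean of the first M values of Y minus
   the mean of the remaining N - M values. */
static double
mean_difference (const double *y, size_t n, size_t m)
{
  double a = 0.0, b = 0.0;
  size_t i;

  assert (m > 0 && m < n);
  for (i = 0; i < n; i++)
    {
      if (i < m)
        a += y[i];
      else
        b += y[i];
    }
  return a / m - b / (n - m);
}

/* Computes T on N_PERM random permutations of the N values in X and
   stores the results in T[0..N_PERM-1].  Returns 1 on success, 0 if
   memory could not be allocated. */
static int
permutation_statistics (const double *x, size_t n, size_t m,
                        double *t, size_t n_perm)
{
  size_t *p = malloc (n * sizeof *p);
  double *y = malloc (n * sizeof *y);
  size_t i, j;

  if (p == NULL || y == NULL)
    {
      free (p);
      free (y);
      return 0;
    }
  for (j = 0; j < n_perm; j++)
    {
      random_permutation (p, n);
      for (i = 0; i < n; i++)
        y[i] = x[p[i]];          /* y is x reordered by the permutation */
      t[j] = mean_difference (y, n, m);
    }
  free (p);
  free (y);
  return 1;
}
```

The caller would then compare the statistic computed on the original order against the values collected in t. Copying into y is what makes an explicit case index useful here; a streaming design would need a different approach.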

Re: casereader numbering

John Darrington <john@...> writes:

> I've started rewriting EXAMINE to make it less crappy.
>
> One thing that would make it easier would be a function of the form
>
> struct casereader * append_case_numbers (struct casereader *cr);
>
> which returns a new casereader which is identical to CR except that it
> has one extra column which contains the ordinal number of each case.

Here is a patch to try.  I haven't had a chance to test it, but I
think that it should work.  If it works for you I'll push it to
the repository.

commit 003c33f013e762fc162029f37c3f233fd29645d8
Author: Ben Pfaff <blp@...>
Date:   Tue Jul 22 22:11:46 2008 -0700

    New function for adding a numbering column to a casereader.

diff --git a/src/data/casereader-translator.c b/src/data/casereader-translator.c
index b857b5b..229dac2 100644
--- a/src/data/casereader-translator.c
+++ b/src/data/casereader-translator.c
@@ -110,3 +110,58 @@ static const struct casereader_class casereader_translator_class =
     NULL,
     NULL,
   };
+
+struct casereader_arithmetic_sequence
+  {
+    int value_ofs;
+    double first;
+    double increment;
+    casenumber n;
+  };
+
+static void cas_translate (struct ccase *input, struct ccase *output,
+                           void *aux);
+static bool cas_destroy (void *aux);
+
+/* Creates and returns a new casereader whose cases are produced
+   by reading from SUBREADER and appending an additional value,
+   which takes the value FIRST in the first case, FIRST +
+   INCREMENT in the second case, FIRST + INCREMENT * 2 in the
+   third case, and so on.
+
+   After this function is called, SUBREADER must not ever again
+   be referenced directly.  It will be destroyed automatically
+   when the translating casereader is destroyed. */
+struct casereader *
+casereader_create_arithmetic_sequence (struct casereader *subreader,
+                                       double first, double increment)
+{
+  /* This could be implemented with a great deal more efficiency
+     and generality.  However, this implementation is easy.  */
+  struct casereader_arithmetic_sequence *cas = xmalloc (sizeof *cas);
+  cas->value_ofs = casereader_get_value_cnt (subreader);
+  cas->first = first;
+  cas->increment = increment;
+  cas->n = 0;
+  return casereader_create_translator (subreader, cas->value_ofs + 1,
+                                       cas_translate, cas_destroy, cas);
+}
+
+static void
+cas_translate (struct ccase *input, struct ccase *output, void *cas_)
+{
+  struct casereader_arithmetic_sequence *cas = cas_;
+  case_nullify (output);
+  case_move (output, input);
+  case_resize (output, cas->value_ofs + 1);
+  case_data_rw_idx (output, cas->value_ofs)->f
+    = cas->first + cas->increment * cas->n++;
+}
+
+static bool
+cas_destroy (void *cas_)
+{
+  struct casereader_arithmetic_sequence *cas = cas_;
+  free (cas);
+  return true;
+}
diff --git a/src/data/casereader.h b/src/data/casereader.h
index 6d719c6..ba65cb1 100644
--- a/src/data/casereader.h
+++ b/src/data/casereader.h
@@ -112,4 +112,8 @@ casereader_create_translator (struct casereader *, size_t output_value_cnt,
                               bool (*destroy) (void *aux),
                               void *aux);
 
+struct casereader *
+casereader_create_arithmetic_sequence (struct casereader *,
+                                       double first, double increment);
+
 #endif /* data/casereader.h */

-- 
"Note that nobody reads every post in linux-kernel. In fact, nobody who
 expects to have time left over to actually do any real kernel work will
 read even half. Except Alan Cox, but he's actually not human, but about
 a thousand gnomes working in under-ground caves in Swansea." --Linus
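
For readers without a PSPP tree at hand, the idea behind the patch above (wrap a reader, pass each row through, and append FIRST + INCREMENT * k to row k) can be modelled with a toy reader over an in-memory array. Everything named toy_* below is hypothetical and stands in for PSPP's real casereader and translator types; it is a sketch of the technique only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a casereader: yields fixed-width rows of doubles
   from an in-memory array.  Not PSPP's actual type. */
struct toy_reader
  {
    const double *data;         /* n_rows * width values, row-major. */
    size_t width;               /* Values per row. */
    size_t n_rows;              /* Total number of rows. */
    size_t pos;                 /* Index of the next row to read. */
  };

/* Wraps a toy_reader and appends one extra value to every row. */
struct toy_sequence_reader
  {
    struct toy_reader *sub;     /* Underlying reader. */
    double first;               /* Value appended to the first row. */
    double increment;           /* Step between consecutive rows. */
    size_t n;                   /* Rows produced so far. */
  };

/* Returns the number of values in each row that R produces, which is
   one more than the width of the underlying reader. */
static size_t
toy_sequence_width (const struct toy_sequence_reader *r)
{
  return r->sub->width + 1;
}

/* Reads the next row into OUT, which must have room for
   toy_sequence_width (R) values.  Returns 1 on success, 0 at end. */
static int
toy_sequence_read (struct toy_sequence_reader *r, double *out)
{
  struct toy_reader *sub = r->sub;

  assert (out != NULL);
  if (sub->pos >= sub->n_rows)
    return 0;
  memcpy (out, sub->data + sub->pos * sub->width,
          sub->width * sizeof *out);
  out[sub->width] = r->first + r->increment * r->n++;
  sub->pos++;
  return 1;
}
```

With first = 1 and increment = 1 the appended column holds the ordinal case numbers asked for at the start of the thread.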

Re: casereader numbering

On Tue, Jul 22, 2008 at 10:13:31PM -0700, Ben Pfaff wrote:

     John Darrington <john@...> writes:

     > One thing that would make it easier would be a function of the form
     >
     > struct casereader * append_case_numbers (struct casereader *cr);
     >
     > which returns a new casereader which is identical to CR except that it
     > has one extra column which contains the ordinal number of each case.

     Here is a patch to try.  I haven't had a chance to test it, but I
     think that it should work.  If it works for you I'll push it to
     the repository.

On initial tests, it appears to work fine, except that I would have
expected casereader_get_value_cnt on the new casereader to return 1 more
than that of the old one.  But that's not what I am experiencing.

J'

Re: casereader numbering

John Darrington <john@...> writes:

> On Tue, Jul 22, 2008 at 10:13:31PM -0700, Ben Pfaff wrote:
>      John Darrington <john@...> writes:
>
>      > One thing that would make it easier would be a function of the form
>      >
>      > struct casereader * append_case_numbers (struct casereader *cr);
>      >
>      > which returns a new casereader which is identical to CR except that it
>      > has one extra column which contains the ordinal number of each case.
>
>      Here is a patch to try.  I haven't had a chance to test it, but I
>      think that it should work.  If it works for you I'll push it to
>      the repository.
>
> On initial tests, it appears to work fine, except that I would have
> expected casereader_get_value_cnt on the new casereader to return 1 more
> than that of the old one.  But that's not what I am experiencing.

Er, I would expect that too.  On inspection, the code looks
correct; I don't see how casereader_get_value_cnt() could return
a value different from that.  Huh.

-- 
"Then, I came to my senses, and slunk away, hoping no one overheard my
 thinking."
--Steve McAndrewSmith in the Monastery

Re: casereader numbering

On Wed, Jul 23, 2008 at 07:24:21AM -0700, Ben Pfaff wrote:

     > On initial tests, it appears to work fine, except that I would have
     > expected casereader_get_value_cnt on the new casereader to return 1 more
     > than that of the old one.  But that's not what I am experiencing.

     Er, I would expect that too.  On inspection, the code looks
     correct; I don't see how casereader_get_value_cnt() could return
     a value different from that.  Huh.

Maybe I made a mistake then.  I'll have a closer look later.

J'

Re: casereader numbering

On Thu, Jul 24, 2008 at 07:06:05AM +0800, John Darrington wrote:

     On Wed, Jul 23, 2008 at 07:24:21AM -0700, Ben Pfaff wrote:

          > On initial tests, it appears to work fine, except that I would have
          > expected casereader_get_value_cnt on the new casereader to return 1 more
          > than that of the old one.  But that's not what I am experiencing.

          Er, I would expect that too.  On inspection, the code looks
          correct; I don't see how casereader_get_value_cnt() could return
          a value different from that.  Huh.

     Maybe I made a mistake then.  I'll have a closer look later.

On closer inspection it turns out that the culprit is the function
sort_execute:

    struct casereader *
    sort_execute (struct casereader *input, struct case_ordering *ordering);

Instead of returning a casereader the same width as INPUT, it returns
one with the width associated with ORDERING.

I don't see any valid reason for a case_ordering to be aware of the
value_cnt, so I'm proposing this patch which seems to fix this
problem.

diff --git a/src/data/case-ordering.c b/src/data/case-ordering.c
index 7b3948c..c4a716e 100644
--- a/src/data/case-ordering.c
+++ b/src/data/case-ordering.c
@@ -37,8 +37,6 @@ struct sort_key
 /* A set of criteria for ordering cases. */
 struct case_ordering
   {
-    size_t value_cnt;           /* Number of `union value's per case. */
-
     /* Sort keys. */
     struct sort_key *keys;
     size_t key_cnt;
@@ -49,10 +47,9 @@ struct case_ordering
    contains no variables, so that all cases will compare as
    equal.  */
 struct case_ordering *
-case_ordering_create (const struct dictionary *dict)
+case_ordering_create (void)
 {
   struct case_ordering *co = xmalloc (sizeof *co);
-  co->value_cnt = dict_get_next_value_idx (dict);
   co->keys = NULL;
   co->key_cnt = 0;
   return co;
@@ -63,7 +60,6 @@ struct case_ordering *
 case_ordering_clone (const struct case_ordering *orig)
 {
   struct case_ordering *co = xmalloc (sizeof *co);
-  co->value_cnt = orig->value_cnt;
   co->keys = xmemdup (orig->keys, orig->key_cnt * sizeof *orig->keys);
   co->key_cnt = orig->key_cnt;
   return co;
@@ -80,15 +76,6 @@ case_ordering_destroy (struct case_ordering *co)
     }
 }
 
-/* Returns the number of `union value's in the cases that case
-   ordering CO compares (taken from the dictionary used to
-   construct it). */
-size_t
-case_ordering_get_value_cnt (const struct case_ordering *co)
-{
-  return co->value_cnt;
-}
-
 /* Compares cases A and B given case ordering CO and returns a
    strcmp()-type result. */
 int
diff --git a/src/data/case-ordering.h b/src/data/case-ordering.h
index 026cd89..f49f265 100644
--- a/src/data/case-ordering.h
+++ b/src/data/case-ordering.h
@@ -32,7 +32,7 @@ enum sort_direction
   };
 
 /* Creation and destruction. */
-struct case_ordering *case_ordering_create (const struct dictionary *);
+struct case_ordering *case_ordering_create (void);
 struct case_ordering *case_ordering_clone (const struct case_ordering *);
 void case_ordering_destroy (struct case_ordering *);
 
diff --git a/src/language/stats/rank.q b/src/language/stats/rank.q
index 5bc88c4..cb63949 100644
--- a/src/language/stats/rank.q
+++ b/src/language/stats/rank.q
@@ -261,7 +261,7 @@ rank_cmd (struct dataset *ds, const struct case_ordering *sc,
 
           /* Sort this split group by the BY variables as primary
              keys and the rank variable as secondary key.  */
-          ordering = case_ordering_create (d);
+          ordering = case_ordering_create ();
           for (j = 0; j < n_group_vars; j++)
             case_ordering_add_var (ordering, group_vars[j], SRT_ASCEND);
           case_ordering_add_var (ordering,
@@ -778,7 +778,7 @@ cmd_rank (struct lexer *lexer, struct dataset *ds)
   /* Put the active file back in its original order.  Delete
      our sort key, which we don't need anymore.  */
   {
-    struct case_ordering *ordering = case_ordering_create (dataset_dict (ds));
+    struct case_ordering *ordering = case_ordering_create ();
     struct casereader *sorted;
     case_ordering_add_var (ordering, order, SRT_ASCEND);
     /* FIXME: loses error conditions. */
diff --git a/src/language/stats/sort-criteria.c b/src/language/stats/sort-criteria.c
index c84f71d..fd8c7c5 100644
--- a/src/language/stats/sort-criteria.c
+++ b/src/language/stats/sort-criteria.c
@@ -39,7 +39,7 @@ struct case_ordering *
 parse_case_ordering (struct lexer *lexer, const struct dictionary *dict,
                      bool *saw_direction)
 {
-  struct case_ordering *ordering = case_ordering_create (dict);
+  struct case_ordering *ordering = case_ordering_create ();
   const struct variable **vars = NULL;
   size_t var_cnt = 0;
 
diff --git a/src/math/merge.c b/src/math/merge.c
index d56a78c..4fc7c8d 100644
--- a/src/math/merge.c
+++ b/src/math/merge.c
@@ -44,16 +44,18 @@ struct merge
     struct case_ordering *ordering;
     struct merge_input inputs[MAX_MERGE_ORDER];
     size_t input_cnt;
+    size_t value_cnt;
   };
 
 static void do_merge (struct merge *m);
 
 struct merge *
-merge_create (const struct case_ordering *ordering)
+merge_create (const struct case_ordering *ordering, size_t value_cnt)
 {
   struct merge *m = xmalloc (sizeof *m);
   m->ordering = case_ordering_clone (ordering);
   m->input_cnt = 0;
+  m->value_cnt = value_cnt;
   return m;
 }
 
@@ -95,8 +97,7 @@ merge_make_reader (struct merge *m)
     }
   else if (m->input_cnt == 0)
     {
-      size_t value_cnt = case_ordering_get_value_cnt (m->ordering);
-      struct casewriter *writer = mem_writer_create (value_cnt);
+      struct casewriter *writer = mem_writer_create (m->value_cnt);
       r = casewriter_make_reader (writer);
     }
   else
@@ -129,7 +130,7 @@ do_merge (struct merge *m)
 
   assert (m->input_cnt > 1);
 
-  w = tmpfile_writer_create (case_ordering_get_value_cnt (m->ordering));
+  w = tmpfile_writer_create (m->value_cnt);
   for (i = 0; i < m->input_cnt; i++)
     taint_propagate (casereader_get_taint (m->inputs[i].reader),
                      casewriter_get_taint (w));
diff --git a/src/math/merge.h b/src/math/merge.h
index c9c9c48..18322e8 100644
--- a/src/math/merge.h
+++ b/src/math/merge.h
@@ -18,11 +18,12 @@
 #define MATH_MERGE_H 1
 
 #include <stdbool.h>
+#include <stddef.h>
 
 struct case_ordering;
 struct casereader;
 
-struct merge *merge_create (const struct case_ordering *);
+struct merge *merge_create (const struct case_ordering *, size_t);
 void merge_destroy (struct merge *);
 void merge_append (struct merge *, struct casereader *);
 struct casereader *merge_make_reader (struct merge *);
diff --git a/src/math/sort.c b/src/math/sort.c
index e03ef57..10b8a12 100644
--- a/src/math/sort.c
+++ b/src/math/sort.c
@@ -41,6 +41,7 @@ int max_buffers = INT_MAX;
 
 struct sort_writer
   {
+    size_t value_cnt;
     struct case_ordering *ordering;
     struct merge *merge;
     struct pqueue *pqueue;
@@ -52,7 +53,7 @@ struct sort_writer
 
 static struct casewriter_class sort_casewriter_class;
 
-static struct pqueue *pqueue_create (const struct case_ordering *);
+static struct pqueue *pqueue_create (const struct case_ordering *, size_t);
 static void pqueue_destroy (struct pqueue *);
 static bool pqueue_is_full (const struct pqueue *);
 static bool pqueue_is_empty (const struct pqueue *);
@@ -62,15 +63,15 @@ static void pqueue_pop (struct pqueue *, struct ccase *, casenumber *);
 static void output_record (struct sort_writer *);
 
 struct casewriter *
-sort_create_writer (struct case_ordering *ordering)
+sort_create_writer (struct case_ordering *ordering, size_t value_cnt)
 {
-  size_t value_cnt = case_ordering_get_value_cnt (ordering);
   struct sort_writer *sort;
 
   sort = xmalloc (sizeof *sort);
+  sort->value_cnt = value_cnt;
   sort->ordering = case_ordering_clone (ordering);
-  sort->merge = merge_create (ordering);
-  sort->pqueue = pqueue_create (ordering);
+  sort->merge = merge_create (ordering, value_cnt);
+  sort->pqueue = pqueue_create (ordering, value_cnt);
   sort->run = NULL;
   sort->run_id = 0;
   case_nullify (&sort->run_end);
@@ -118,8 +119,7 @@ sort_casewriter_convert_to_reader (struct casewriter *writer, void *sort_)
   if (sort->run == NULL && sort->run_id == 0)
     {
       /* In-core sort. */
-      sort->run = mem_writer_create (case_ordering_get_value_cnt (
-                                       sort->ordering));
+      sort->run = mem_writer_create (casewriter_get_value_cnt (writer));
       sort->run_id = 1;
     }
   while (!pqueue_is_empty (sort->pqueue))
@@ -151,8 +151,7 @@ output_record (struct sort_writer *sort)
     }
   if (sort->run == NULL)
     {
-      sort->run = tmpfile_writer_create (case_ordering_get_value_cnt (
-                                           sort->ordering));
+      sort->run = tmpfile_writer_create (sort->value_cnt);
       sort->run_id = min_run_id;
     }
 
@@ -176,7 +175,8 @@ static struct casewriter_class sort_casewriter_class =
 struct casereader *
 sort_execute (struct casereader *input, struct case_ordering *ordering)
 {
-  struct casewriter *output = sort_create_writer (ordering);
+  struct casewriter *output =
+    sort_create_writer (ordering, casereader_get_value_cnt (input));
   casereader_transfer (input, output);
   return casewriter_make_reader (output);
 }
@@ -201,14 +201,14 @@ static int compare_pqueue_records_minheap (const void *a, const void *b,
                                            const void *pq_);
 
 static struct pqueue *
-pqueue_create (const struct case_ordering *ordering)
+pqueue_create (const struct case_ordering *ordering, size_t value_cnt)
 {
   struct pqueue *pq;
 
   pq = xmalloc (sizeof *pq);
   pq->ordering = case_ordering_clone (ordering);
   pq->record_cap
-    = settings_get_workspace_cases (case_ordering_get_value_cnt (ordering));
+    = settings_get_workspace_cases (value_cnt);
   if (pq->record_cap > max_buffers)
     pq->record_cap = max_buffers;
   else if (pq->record_cap < min_buffers)
diff --git a/src/math/sort.h b/src/math/sort.h
index 7f7b2f8..ea2c16b 100644
--- a/src/math/sort.h
+++ b/src/math/sort.h
@@ -25,7 +25,7 @@ struct case_ordering;
 extern int min_buffers ;
 extern int max_buffers ;
 
-struct casewriter *sort_create_writer (struct case_ordering *, size_t value_cnt);
+struct casewriter *sort_create_writer (struct case_ordering *, size_t value_cnt);
 struct casereader *sort_execute (struct casereader *, struct case_ordering *);
 
 #endif /* math/sort.h */
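
The design point of the patch above, that the sort should take the row width from its input rather than from the ordering, can be illustrated with a toy in-memory sort. This is a hedged sketch, not PSPP code: toy_ordering and toy_sort_execute are hypothetical names, rows are plain arrays of doubles, and a simple insertion sort replaces PSPP's external merge sort.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy ordering: knows only which column to compare.  It deliberately
   carries no row width, mirroring the change proposed in the patch. */
struct toy_ordering
  {
    size_t key;                 /* Index of the column to sort on. */
  };

/* Returns a newly allocated copy of the N_ROWS rows in INPUT, each
   WIDTH values wide, sorted in ascending order of column ORD->key
   (stable insertion sort).  The width comes from the caller's input,
   never from ORD, so any extra columns survive the sort.  Returns
   NULL if memory cannot be allocated; the caller frees the result. */
static double *
toy_sort_execute (const double *input, size_t n_rows, size_t width,
                  const struct toy_ordering *ord)
{
  double *out, *tmp;
  size_t i, j;

  assert (n_rows > 0 && width > 0 && ord->key < width);
  out = malloc (n_rows * width * sizeof *out);
  tmp = malloc (width * sizeof *tmp);
  if (out == NULL || tmp == NULL)
    {
      free (out);
      free (tmp);
      return NULL;
    }
  memcpy (out, input, n_rows * width * sizeof *out);
  for (i = 1; i < n_rows; i++)
    {
      /* Save row i, shift larger rows up by one, then drop it in. */
      memcpy (tmp, out + i * width, width * sizeof *tmp);
      for (j = i; j > 0 && out[(j - 1) * width + ord->key] > tmp[ord->key];
           j--)
        memcpy (out + j * width, out + (j - 1) * width,
                width * sizeof *out);
      memcpy (out + j * width, tmp, width * sizeof *tmp);
    }
  free (tmp);
  return out;
}
```

If the second column holds appended case numbers, sorting on the first column keeps each case number attached to its row, which is the behaviour the numbering column needs from sort_execute.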

Re: casereader numbering

John Darrington <john@...> writes:

> I don't see any valid reason for a case_ordering to be aware of the
> value_cnt, so I'm proposing this patch which seems to fix this
> problem.

I know that there was a reason at one point, but I don't recall
what it was.  Please give me a few days to investigate.

-- 
"Mon peu de succès près des femmes est toujours venu de les trop aimer."
--Jean-Jacques Rousseau

Re: casereader numbering

John Darrington <john@...> writes:

> I don't see any valid reason for a case_ordering to be aware of the
> value_cnt, so I'm proposing this patch which seems to fix this
> problem.

Upon examination, I withdraw my objections.  Clearly this is an
improvement.  Please check it in.

-- 
"In this world that Hugh Heffner had made, he alone seemed forever
 bunnyless."
--John D. MacDonald