3 questions...

by Brian Myers

I have three questions about Metakit for y'all.

First, the background: I'm writing ETL software in Python. This means I
am interfacing with databases, but also storing serialized objects on
disk with ordering. I like both BSDDB and Metakit as libraries for
object serialization because they both allow the data to be stored
either purely in RAM or on disk, transparently. Metakit seems a lot
easier to use, since it's more aware of Python data types than BSDDB is.

I have some questions about Metakit, though. First, can it handle large
files? It's quite common to have to process tables that are 10+ GB in
size when dealing with data warehouses. Are there any limits on the
number of rows, etc.?

Second, is there support for arbitrary types? Currently the only type I
need to handle that isn't supported by either Metakit or BSDDB is the
decimal data type. Databases often have decimal, numeric, or money data
types, which are common in BI data, and I don't really want to convert
them to floats, especially if I'm doing arithmetic on them. Are there
any plans to add a decimal data type to Metakit?

Also, on the topic of data types, how does Metakit deal with Unicode
strings? I don't remember seeing a type for those.

In BSDDB, everything has to be serialized into binary strings. This
means atypical data types like decimals and Unicode strings must be
serialized into binary strings that compare properly with a low-level
memcmp operation. I think I can do that for the decimal type, but I
don't know at all about Unicode strings. If Metakit can handle these
types and create large files, it will definitely be my choice.
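The memcmp requirement can be made concrete. The sketch below is an illustration only, not BSDDB's API: it assumes every decimal has at most a fixed number of fractional digits (the hypothetical `SCALE`) and fits in a fixed 8-byte key (`WIDTH`). Adding an offset turns the signed scaled integer into an unsigned one, so its big-endian bytes sort in numeric order. For Unicode, plain UTF-8 already works, because bytewise comparison of UTF-8 strings gives the same order as comparing code points.

```python
from decimal import Decimal

# Hypothetical key layout: fixed scale and width, chosen for illustration.
SCALE = 4                         # fractional digits kept
WIDTH = 8                         # key size in bytes
OFFSET = 1 << (8 * WIDTH - 1)     # maps signed values onto 0 .. 2**64 - 1


def decimal_key(d: Decimal) -> bytes:
    """Fixed-width big-endian key whose memcmp order matches numeric order."""
    scaled = d.scaleb(SCALE)      # e.g. Decimal("12.34") -> 123400 (as a Decimal)
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{d} has more than {SCALE} fractional digits")
    shifted = int(scaled) + OFFSET
    if not 0 <= shifted < (1 << (8 * WIDTH)):
        raise OverflowError(f"{d} does not fit in a {WIDTH}-byte key")
    return shifted.to_bytes(WIDTH, "big")


def decimal_from_key(key: bytes) -> Decimal:
    """Inverse of decimal_key (the result carries SCALE fractional digits)."""
    return Decimal(int.from_bytes(key, "big") - OFFSET).scaleb(-SCALE)


def text_key(s: str) -> bytes:
    """UTF-8 bytes compare (memcmp-style) in the same order as code points."""
    return s.encode("utf-8")
```

Sorting values by `decimal_key` or `text_key` gives the same order as sorting the original values, which is exactly the property a memcmp-based store needs.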

Brian
_____________________________________________
Metakit mailing list  -  Metakit@...
http://www.equi4.com/mailman/listinfo/metakit

Re: 3 questions...

by Brian Kelley

1) On 32-bit systems, the file size limit is 2GB. It is higher on
64-bit systems. If you are using large tables, use blocked tables;
otherwise you will die a slow, horrible death. If you have indexed
keys, use blocked, ordered tables.

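import metakit
st = metakit.storage("data.mk", 1)  # example file name; 1 = read/write
_vw = st.getas("blocked[_B[id:I,data:S]]")
vw = _vw.blocked()  # use _vw.blocked().ordered() for an ordered table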

2) Decimal types

No, Metakit doesn't support decimal types, but I have had some
success using two integer columns to represent decimal numbers.
Sorting, for instance, then requires a sort on two columns.

Since Metakit's integers are adaptive, this doesn't take up any extra space.
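The two-integer approach can be sketched as follows. This is a minimal illustration under an assumed fixed scale (`SCALE`, hypothetical), not code from the thread. It uses floor division so the fractional part is always non-negative; that is what makes a plain two-column sort (whole part first, then fraction) agree with numeric order, including for negative values.

```python
from decimal import Decimal
from typing import Tuple

SCALE = 4  # hypothetical: number of fractional digits kept in the second column


def split_decimal(d: Decimal) -> Tuple[int, int]:
    """Split d into (whole, frac) with d == whole + frac / 10**SCALE and
    0 <= frac < 10**SCALE, so sorting on (whole, frac) sorts numerically."""
    scaled = d.scaleb(SCALE)
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{d} has more than {SCALE} fractional digits")
    return divmod(int(scaled), 10 ** SCALE)  # floor division keeps frac >= 0


def join_decimal(whole: int, frac: int) -> Decimal:
    """Rebuild the Decimal from its two integer columns."""
    return Decimal(whole * 10 ** SCALE + frac).scaleb(-SCALE)
```

For example, -1.25 becomes (-2, 7500) rather than (-1, -2500). Each half would go into its own integer property, and since the fractional column never exceeds 10**SCALE - 1, an adaptive integer column should keep it fairly narrow.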

The Python interface, at least, doesn't support Unicode very well.

Hope this helps.

Brian

Re: 3 questions...

by Brian Myers

If I use blocked views, does each sub-block constitute a separate file?
It doesn't look like you can control the number of blocks, so will
Metakit automatically split a large view into enough blocks that each
is less than 2GB?

I had thought of using two integer columns to represent a decimal. By
adaptive, do you mean Metakit handles arbitrarily large integers? That
would be awesome if it did.

I can live without Unicode for now, but I hope there are plans to include
it in the future. BSDDB doesn't really support Unicode either. Does the
C interface support it?

Brian

Re: 3 questions...

by Brian Kelley

On 8/7/05, Brian Myers <tarkawebfoot@...> wrote:
> If I use blocked views, does each sub block constitute a separate file?
No, Metakit handles all the blocks internally. The entire file can
still only be 2GB in total.
 
> I had thought of using two integer columns to represent a decimal. By
> adaptive, do you mean Metakit handles arbitrarily large integers? That
> would be  awesome if it did.

It's the other way around: if the largest integer in a column takes
only 4 bits to represent, then the column is only 4 bits wide.
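To make the "adaptive" idea concrete, the sketch below picks, for a column of non-negative integers, the smallest width that holds the largest value. The width ladder `WIDTHS` (0, 1, 2, 4, 8, 16, 32 and 64 bits) is an assumption about Metakit's internals made for illustration; the point is only that the column width follows the largest stored value rather than a declared type.

```python
# Assumed width ladder (bits per value); illustrative, not taken from Metakit docs.
WIDTHS = (0, 1, 2, 4, 8, 16, 32, 64)


def column_width(values):
    """Smallest width in WIDTHS that can hold every (non-negative) value."""
    if any(v < 0 for v in values):
        raise ValueError("this sketch only handles non-negative integers")
    needed = max((v.bit_length() for v in values), default=0)
    for w in WIDTHS:
        if w >= needed:
            return w
    raise OverflowError("a value needs more than 64 bits")


def column_bytes(values):
    """Approximate storage for the packed column, rounded up to whole bytes."""
    return (column_width(values) * len(values) + 7) // 8
```

Under these assumptions a column whose largest value is 15 packs at 4 bits per row, so 1000 such rows take about 500 bytes, while a single value of 16 pushes the whole column to 8 bits.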

> I can live without unicode for now, but I hope there's plans to include
> it in the future. BSDDB doesn't really support unicode either. Does the
> C interface support it?

The only way I know of to store Unicode is to store its repr():

import metakit
st = metakit.storage()  # in-memory storage
vw2 = st.getas("t[a:b]")
p = u'hi'
vw2.append(a=repr(p))
metakit.dump(vw2)
 a    
 -----
 u'hi'
 -----

Unfortunately, you also need to evaluate the string on the way out.
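If you do store repr() strings, a safer way to read them back than `eval` is `ast.literal_eval`, which only accepts Python literals. A minimal sketch; the helper names are hypothetical, and the Metakit calls are left out so it runs on its own:

```python
import ast


def to_stored(text: str) -> str:
    """Encode a Unicode string as its Python literal, e.g. 'hi' -> "'hi'"."""
    return repr(text)


def from_stored(stored: str) -> str:
    """Decode with ast.literal_eval, which refuses arbitrary expressions."""
    value = ast.literal_eval(stored)
    if not isinstance(value, str):
        raise ValueError("stored value is not a string literal")
    return value
```

Older values written as `u'hi'`, like the one in the dump above, still decode, since Python 3 accepts the `u` prefix on string literals.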

Brian
_____________________________________________
Metakit mailing list  -  Metakit@...
http://www.equi4.com/mailman/listinfo/metakit

Re: 3 questions...

by Michael Schlenker

Brian Kelley wrote:

> On 8/7/05, Brian Myers <tarkawebfoot@...> wrote:
>
>>I can live without unicode for now, but I hope there's plans to include
>>it in the future. BSDDB doesn't really support unicode either. Does the
>>C interface support it?
>
> The only way I know for doing unicode is
> vw2 = st.getas("t[a:b]")
> p = u'hi'
> vw2.append(a=repr(p))
> metakit.dump(vw2)
>  a    
>  -----
>  u'hi'
>  -----
>
> unfortunately, you need to evaluate the unicode string on the way out as well.

At least the Tcl binding has no problems using Unicode with Metakit,
if one uses the binary type for fields (S works too, thanks to Tcl's
internal UTF-8 variant, which encodes \0 as the hex bytes C0 80).

I don't know how intelligent Python's Unicode handling is.
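The Tcl trick Michael mentions can be illustrated in Python: encode as standard UTF-8, but write U+0000 as the two-byte sequence C0 80, so the result never contains a zero byte and can live in a NUL-terminated string field. This sketch covers only that NUL handling; it does not claim to reproduce every detail of Tcl's internal format, and the function names are hypothetical.

```python
def to_modified_utf8(s: str) -> bytes:
    """Standard UTF-8, except U+0000 becomes the pair C0 80,
    so the result never contains a 0x00 byte."""
    return s.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def from_modified_utf8(b: bytes) -> str:
    """Undo the NUL substitution; C0 never occurs in valid UTF-8,
    so every C0 80 pair must have come from an encoded NUL."""
    return b.replace(b"\xc0\x80", b"\x00").decode("utf-8")
```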

Michael

Re: 3 questions...

by Brian Kelley

The Metakit wrappers were created long before Python supported Unicode.
I don't think that having 'b' store Unicode would be that difficult
to implement.

Brian

Re: 3 questions...

by Brian Myers

Crap, well, it looks like unicode might be more accessible in Metakit,
but because of the 2GB file limit, I'd better stick with BSDDB. I can
handle decimals in the same way in both as well.

Thanks for the input. Are there plans to increase the file size limit
soon?

Brian

Re: 3 questions...

by Brian Kelley

The file size limit depends on your OS. If you have a 64-bit
system, you can have files larger than 2GB.

You might want to take a look at PyTables, which is very Metakit-like
in some respects:

http://pytables.sourceforge.net/html/WelcomePage.html

I'm not sure how well it maps to SQL, though.

Brian

Re: 3 questions...

by Brian Myers

Yeah, I checked PyTables out and it really is a good solution, but the
parts I don't like are the complexity of the install and the lack of an
explicit division between memory-based and disk-based files. The latter
is not so important, but the former is a problem.

As far as mapping to SQL, I handle all the type conversion since I'm
making database tables and flat files transparent. I would only be
using a small subset of the functionality in pytables, which is OK, but
I've done the install on Windows and it's really difficult because of
all the dependencies. It would be easier on Unix though.

I don't know, I'll have to think about it some more. I sure wish there
were better object serialization solutions for Python. Actually, I wish
Metakit just supported large files on 32-bit platforms. There simply
aren't enough 64-bit platforms around to justify requiring users to
have one if they have large tables.

Brian

Re: 3 questions...

by Brian Kelley

This is probably a non-solution, but I once had a wrapper for multi-file tables.

For example:

vw = CombinedView(viewname, file1, file2, file3)

Python did all the internal work. It was OK for read-only
structures, but when tables got altered, ouch!

It was essentially a "block" that spanned several Metakit databases. I
also kept external index tags for quicker indexing, i.e. metadata on
the ordered columns so I could quickly see which file held a given
value.
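A read-only version of the multi-file idea might look like the sketch below. It is hypothetical, not the original wrapper: each "file" is stood in for by a key-sorted list of `(key, row)` pairs, partitions are assumed not to overlap, and the external index is simply the first key of each partition, which `bisect` uses to pick the right partition before searching inside it.

```python
import bisect


class CombinedView:
    """Read-only view over several key-ordered partitions (stand-ins for
    separate Metakit files). Hypothetical sketch."""

    def __init__(self, *partitions):
        # Each partition: a list of (key, row) tuples sorted by key; partitions
        # are given in key order and their key ranges must not overlap.
        self.partitions = [p for p in partitions if p]
        self.keys = [[k for k, _ in p] for p in self.partitions]
        # External index: first key of each partition.
        self.first_keys = [ks[0] for ks in self.keys]

    def __len__(self):
        return sum(len(p) for p in self.partitions)

    def __iter__(self):
        for p in self.partitions:
            yield from p

    def find(self, key):
        """Return the row stored under key, or None if absent."""
        i = bisect.bisect_right(self.first_keys, key) - 1
        if i < 0:
            return None  # key sorts before every partition
        ks = self.keys[i]
        j = bisect.bisect_left(ks, key)
        if j < len(ks) and ks[j] == key:
            return self.partitions[i][j][1]
        return None
```

Keeping the index outside the partitions is what makes lookups cheap; it is also why writes hurt, since any insert that moves a partition boundary forces the index to be rebuilt.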

One really interesting thing about Metakit is that you can do joins on
databases that are stored in separate files! This won't help you
much, though, since I think you still have a 2GB total limit.

Brian

Re: 3 questions...

by Brian Myers

Seems to me like it would be easier to just recompile Metakit with
large file support enabled. But maybe Metakit uses internal
structures that would be difficult to alter.

I'd like to know if that's the case. It would mean Metakit is unlikely
to support large files anytime soon. On the other hand, if large file
support were to materialize soon, it would be worthwhile just to
wait.

Brian

Re: 3 questions...

by Brian Kelley

Jean-Claude knows more, but I think the problem is that memory-mapping
files is OS-dependent.

Brian

Re: 3 questions...

by jcw

On Aug 9, 2005, at 20:56, Brian Myers wrote:

> Seems to me like it would be easier to just recompile Metakit with  
> large file support enabled. But maybe Metakit is using internal  
> structures that would be difficult to alter.
>
> I'd like to know if that's the case. It would mean it's unlikely  
> Metakit will support large files anytime soon. On the other hand,  
> if large files were to materialize soon, it would be worth while  
> just to wait.

Short answer: forget >2Gb files in MK 2.4, it can't be done.

On 32-bit machines, MK does not support >2Gb files because of an  
assumption in lots of places that the size must fit in an int.  A  
while back, I went through the source and concluded that there are a  
couple of hairy issues preventing an easy fix.

Even on 64-bit machines, MK would need changes to support >2Gb files  
- some in the API but also a change in the file format (there are  
precisely three 32-bit ints in various header/footer fields which  
need to be extended).
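The "size must fit in an int" problem is easy to see with a fixed-size field. The sketch below packs a file offset into a hypothetical header field with `struct`: a signed 32-bit field (`>i`) tops out at 2**31 - 1 bytes, just under 2 GB, while a 64-bit field (`>q`) has room to spare. The field layout is illustrative only, not Metakit's actual file format.

```python
import struct

INT32_MAX = 2**31 - 1  # largest offset a signed 32-bit field can hold


def pack_offset(offset: int, wide: bool = False) -> bytes:
    """Pack a file offset into a hypothetical 4-byte or 8-byte header field.
    Raises struct.error if the offset does not fit the chosen width."""
    return struct.pack(">q" if wide else ">i", offset)


def unpack_offset(field: bytes) -> int:
    """Read the offset back; the field length tells which width was used."""
    return struct.unpack(">q" if len(field) == 8 else ">i", field)[0]
```

Packing `INT32_MAX + 1` into the narrow field fails, which is the 32-bit header problem in miniature; widening the field is easy, but every reader of the old format must then be taught about it.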

I've solved the file format design issues, and have experimental non-
MK code which confirms it works and remains backwards compatible.  
But that is of no use to anyone today.

> On Aug 9, 2005, at 11:30 AM, Brian Kelley wrote:
[...]
>> One really interesting thing about metakit is that you can do  
>> joins on
>> databases that are stored in seperate files!  This won't help you
>> much, since you still have a total 2GB limit I think.

The 2 Gb limit is due to exhaustion of 32-bit memory address space.  
Opening multiple files won't help you get around the fact that only 2  
Gb can be mapped (i.e. open) at any one time.

The *only* option I see with a pure MK solution today is to use a
64-bit environment, open multiple MK files, each under 2 Gb, and
join/concat them as you describe.

For completeness: a workaround is to try and stay under 2 Gb by
keeping the larger bits of the data in separate (non-MK) files and
using MK only to manage that data, not contain it. That is not always
a meaningful option; it depends on the application's needs.
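The workaround of keeping bulky data outside Metakit can be sketched like this. `BlobStore` is a hypothetical helper: values are appended to a plain side file, and only the returned `(offset, length)` pair would be stored in a Metakit row. Because the side file is read with ordinary seeks rather than memory-mapped, it should not be bound by the address-space limit described above, though it still needs large-file support from the OS.

```python
import os
from typing import Tuple


class BlobStore:
    """Append-only side file for large values (hypothetical helper).
    Only the (offset, length) pair returned by put() would go into Metakit."""

    def __init__(self, path: str) -> None:
        self.path = path
        open(path, "ab").close()  # create the file if it does not exist

    def put(self, data: bytes) -> Tuple[int, int]:
        """Append data and return where it landed."""
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # current end of file
            f.write(data)
        return offset, len(data)

    def get(self, offset: int, length: int) -> bytes:
        """Read back a value previously stored with put()."""
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)
```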

-jcw


mk4py binaries for python 2.5?

by Geoffrey Zhu

Hi,

Does anyone know if there are mk4py binaries for Python 2.5 on Windows?

Thanks,
Geoffrey

 

