Question on scalability

by Paul Kling

I am currently working on a project that needs to store 100 million
images every year. We tend to keep the images for around 18 months, so
at any one time we will have about 150 million images in the
repository. Does it sound reasonable to store this quantity of images
in Jackrabbit, or does it sound scary? I worry about retrieval of the
items.

I also noticed there is a clustering feature, but the documentation
seemed to point you to using the DB for file storage. We have been down
the route of letting the DB store file data in the past. It has never
worked well, and I don't think I can convince people to try it again.

We were planning on building something ourselves, but we are getting a
little pushback on why we don't use a JCR. Does anyone have a write-up
on something of this scale that they could share?

A little more information on the images: each node would have some sort
of ID (UUID) that we use to find the images. Each node (with all its
sub-nodes; I may be using the wrong vocabulary here) would have 5-7
images and corresponding data.
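
As a rough sanity check of the figures above, the arithmetic can be sketched as follows. It assumes a constant ingest rate and that every parent node holds between 5 and 7 images; the function names are illustrative only and nothing here is specific to any JCR implementation.

```python
# Rough sizing from the figures in this post (assumes a constant ingest rate).
IMAGES_PER_YEAR = 100_000_000
RETENTION_MONTHS = 18
IMAGES_PER_NODE_RANGE = (5, 7)  # "5-7 images" per parent node


def steady_state_images(per_year, retention_months):
    """Images held at any one time once old images start expiring."""
    return per_year * retention_months // 12


def parent_node_range(total_images, per_node_range):
    """Return (fewest, most) parent nodes needed to hold total_images."""
    low_per_node, high_per_node = per_node_range
    fewest = -(-total_images // high_per_node)  # ceiling division
    most = -(-total_images // low_per_node)
    return fewest, most


total = steady_state_images(IMAGES_PER_YEAR, RETENTION_MONTHS)
fewest, most = parent_node_range(total, IMAGES_PER_NODE_RANGE)
print(f"{total:,} images in roughly {fewest:,} to {most:,} parent nodes")
```

So the repository would hold roughly 21 to 30 million parent nodes, plus whatever sub-nodes each of them carries.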

Thank you for any help, and thank you for your time.

Paul


Re: Question on scalability

by Marcel Reutegger

Hi Paul,

Paul Kling wrote:
> I am currently working on a project that has the need to store 100
> million images every year. We tend to keep the images for around 18
> months. So at any one time we will have about 150 million images in the
> repository. The question I have is does this sound reasonable to store
> this quantity of images in JackRabbit or does this sound scary? I worry
> about retrieval of the items.

It doesn't sound scary to me, though I have to admit that I have never worked
with a repository of that size. I've seen repositories that contain around
10 million nodes run without any issues.

> I also noticed there is a clustering feature but the documentation
> seemed to point you to using the DB for file storage. We have been down
> the route of letting the DB store file data in the past and it has never
> turned out to be something that worked well and I don't think I can get
> people convinced of again.

Jackrabbit does not store the complete file in the database; it stores the
contents of a file in a data store [1]. The database will only contain a
reference into the data store.

[1] http://wiki.apache.org/jackrabbit/DataStore
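
To illustrate the general idea of keeping only a reference in the database, here is a minimal sketch in Python. It is not Jackrabbit's code or API: the class name `SketchDataStore`, the use of a SHA-1 digest as the reference, the two-level directory layout, and the dictionary standing in for the database are all assumptions made for the example. The wiki page [1] describes the real implementation.

```python
import hashlib
import tempfile
from pathlib import Path


class SketchDataStore:
    """Toy content-addressed file store (hypothetical, for illustration)."""

    def __init__(self, root):
        self.root = Path(root)

    def _path_for(self, identifier):
        # Fan out into subdirectories so no single directory grows huge.
        return self.root / identifier[:2] / identifier[2:4] / identifier

    def add(self, content):
        """Write the bytes to disk and return the reference to keep elsewhere."""
        identifier = hashlib.sha1(content).hexdigest()
        path = self._path_for(identifier)
        if not path.exists():  # identical content is written only once
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(content)
        return identifier

    def get(self, identifier):
        """Read the bytes back using only the stored reference."""
        return self._path_for(identifier).read_bytes()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        store = SketchDataStore(tmp)
        # Stand-in for the database: it holds the short reference,
        # never the image bytes themselves.
        database = {"node-1": store.add(b"fake image bytes")}
        reference = database["node-1"]
        return reference, store.get(reference)
```

In this sketch the image bytes live on the file system, and the database row for a node carries just a 40-character reference.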

regards
  marcel