JackRabbit Caching: BundleCache vs ItemManager vs CacheManager

by sbarriba

Hi all,

As output from the concurrency investigation we've dug into the caching and
contention within JackRabbit. So far our understanding is:

-  PersistenceManager Cache:

o   The "bundleCacheSize" determines how many nodes the PersistenceManager
will cache. As this determines the lifetime of the references to the
temporary BLOB cache, if it's not large enough BLOBs will be continually read
from the database (if using externalBlobs=false).

o   Configurable in <PersistenceManager> XML block

o   Default size 8MB

o   This cache is shared by all sessions.

o   Synchronised access using the ISMLocking strategy, e.g. Default or
FineGrained

-  Session ItemManager Cache:

o   Items are cached from the underlying persistence manager on a per
session basis.

o   Limit cannot be set.

o   Uses a ReferenceMap which can be emptied by the JVM GC as required

o   Synchronised access using the itemCache object

-  CacheManager Cache:

o   Limit can only be set programmatically via the Workspace cacheManager

o   http://wiki.apache.org/jackrabbit/CacheManager

o   Defaults to 16MB

o   It's not yet clear how the CacheManager relates, if at all, to the
ItemManager cache

2 questions:

-  What is the purpose of the CacheManager and which caches does it
actually control?

-  For example, for a workspace with 100,000 nodes, what is an
appropriate setting for the CacheManager?

We were originally using a PooledSessionInView pattern for our application
but we've now found that this means we see synchronisation on the
BundleCache as we do not benefit from the Session Cache. It seems the Java
GC cleans up the ItemManager cache fairly aggressively.

Using a GlobalSessionInView pattern (sharing a single session across
threads) also doesn't really help as it moves the contention to Session
ItemManager instead of the Persistence Manager.

..which implies that a SharedSession per X Views is probably the best pattern
e.g. a limited number of threads sharing a single session to stripe the
contention.
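
To make the striping idea concrete, here is a minimal Java sketch. `StripedPool` is a hypothetical helper, not a Jackrabbit class. It creates N shared handles up front (in practice, N logged-in sessions) and maps each request key to exactly one of them, so each handle sees roughly 1/N of the traffic. It deliberately says nothing about whether a given Jackrabbit version tolerates concurrent use of one session; it only shows the striping arithmetic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

/**
 * Hypothetical striped pool: N shared handles (e.g. one session each), with
 * every request key mapped to exactly one of them. Contention on any one
 * handle is then limited to roughly 1/N of the concurrent requests.
 */
final class StripedPool<T> {
    private final List<T> stripes;

    StripedPool(int stripeCount, Supplier<T> factory) {
        if (stripeCount <= 0) {
            throw new IllegalArgumentException("stripeCount must be positive");
        }
        List<T> created = new ArrayList<>(stripeCount);
        for (int i = 0; i < stripeCount; i++) {
            created.add(factory.get()); // e.g. log in one session per stripe
        }
        this.stripes = Collections.unmodifiableList(created);
    }

    /** Same key always gets the same stripe; floorMod keeps negative hashes in range. */
    T forKey(Object requestKey) {
        return stripes.get(Math.floorMod(requestKey.hashCode(), stripes.size()));
    }

    int size() {
        return stripes.size();
    }
}
```

Choosing the stripe count is the "X views per session" trade-off described above. With one stripe you get the global shared session, and with one stripe per request you get the pooled model.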

It would seem that JackRabbit would benefit from consolidating its caching
on a library such as ehcache, which provides more fine-grained and
consistent control over the various caching layers and configuration
mechanisms.

All comments welcome.

Regards,

Shaun


Re: JackRabbit Caching: BundleCache vs ItemManager vs CacheManager

by Marcel Reutegger

Hi,

sbarriba wrote:

> -  PersistenceManager Cache:
>
> o   The "bundleCacheSize" determines how many nodes the PersistenceManager
> will cache. As this determines the lifetime of the references to the
> temporary BLOB cache, if it's not large enough BLOBs will be continually read
> from the database (if using externalBlobs=false).
>
> o   Configurable in <PersistenceManager> XML block
>
> o   Default size 8MB
>
> o   This cache is shared by all sessions.
>
> o   Synchronised access using the ISMLocking strategy, e.g. Default or
> FineGrained

correct, but there's additional synchronization in the persistence manager using
conventional synchronized methods. e.g. see
AbstractBundlePersistenceManager.load(NodeId)
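
A simplified illustration of that point follows. It uses a hypothetical `BundleStore`, not Jackrabbit's own classes. `load()` is a plain synchronized method, so callers always serialise. A bundle-cache hit keeps the critical section short, while a miss holds the monitor for the whole (simulated) database read. For simplicity the cache here is bounded by entry count, whereas the real bundle cache is sized in memory via bundleCacheSize.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical stand-in for a bundle persistence manager: load() is a plain
 * synchronized method (one caller at a time), with an LRU bundle cache
 * consulted inside the lock before the slow backing-store read.
 */
final class BundleStore<K, V> {
    private final Function<K, V> database; // simulated slow read, e.g. a JDBC query
    private final Map<K, V> bundleCache;   // bounded by entry count, not bytes
    private int databaseReads;

    BundleStore(Function<K, V> database, int maxEntries) {
        this.database = database;
        // accessOrder = true: iteration runs from least to most recently used
        this.bundleCache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict the least recently used bundle
            }
        };
    }

    /** Every caller serialises here; a cache hit only shortens the time the lock is held. */
    synchronized V load(K id) {
        V bundle = bundleCache.get(id);
        if (bundle == null) {
            databaseReads++;             // miss: the slow read happens while holding the monitor
            bundle = database.apply(id);
            bundleCache.put(id, bundle);
        }
        return bundle;
    }

    synchronized int databaseReads() {
        return databaseReads;
    }
}
```

This is why a larger bundle cache can relieve contention at the persistence-manager level even though the method stays synchronized: fewer callers hold the lock across a database round trip.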

> -  Session ItemManager Cache:
>
> o   Items are cached from the underlying persistence manager on a per
> session basis.
>
> o   Limit cannot be set.

not sure, but I think this cache is also managed (at least partially) by the
CacheManager.

> o   Uses a ReferenceMap which can be emptied by the JVM GC as required

that's the 'other part' that manages the cache ;)

items that are still referenced in the application will force the reference map
to keep the respective ItemState instances (using weak references).

> o   Synchronised access using the itemCache object
>
> -  CacheManager Cache:
>
> o   Limit can only be set programmatically via the Workspace cacheManager
>
> o   http://wiki.apache.org/jackrabbit/CacheManager
>
> o   Defaults to 16MB
>
> o   It's not yet clear how the CacheManager relates, if at all, to the
> ItemManager cache

this only happens indirectly. see above.

> 2 questions:
>
> -  What is the purpose of the CacheManager and which caches does it
> actually control?

It controls *all* the caches that contain ItemState instances.
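
The sketch below illustrates what "one global limit over several ItemState caches" can look like, using a hypothetical allocator. The policy is an assumption for the example and is not taken from Jackrabbit's CacheManager source: split `maxMemory` by each cache's share of recent accesses, then clamp between a per-cache minimum and maximum.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical allocator for one global memory limit shared by several caches.
 * Assumed policy: each cache gets maxMemory times its share of recent
 * accesses, clamped to [minPerCache, maxPerCache].
 */
final class BudgetAllocator {
    private final long maxMemory;
    private final long minPerCache;
    private final long maxPerCache;

    BudgetAllocator(long maxMemory, long minPerCache, long maxPerCache) {
        if (minPerCache > maxPerCache) {
            throw new IllegalArgumentException("minPerCache > maxPerCache");
        }
        this.maxMemory = maxMemory;
        this.minPerCache = minPerCache;
        this.maxPerCache = maxPerCache;
    }

    /** Maps each cache name to its byte limit, given per-cache access counts. */
    Map<String, Long> allocate(Map<String, Long> accessCounts) {
        long totalAccesses = 0;
        for (long count : accessCounts.values()) {
            totalAccesses += count;
        }
        Map<String, Long> limits = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : accessCounts.entrySet()) {
            // With no accesses recorded yet, fall back to an even split.
            double share = (totalAccesses == 0)
                    ? 1.0 / accessCounts.size()
                    : (double) entry.getValue() / totalAccesses;
            long limit = (long) (maxMemory * share);
            limits.put(entry.getKey(), Math.max(minPerCache, Math.min(maxPerCache, limit)));
        }
        return limits;
    }
}
```

Because of the clamping, the per-cache limits can add up to slightly more or less than `maxMemory`. An idle cache still keeps its minimum.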

> -  For example, for a workspace with 100,000 nodes, what is an
> appropriate setting for the CacheManager?

I guess that depends on your JVM heap settings and the usage pattern. if you
have a lot of random reads over nearly all 100k nodes and performance is
critical you may consider caching all of them. have a look at
ItemState.calculateMemoryFootprint() for a formula on how the memory consumption
is calculated.
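
For the 100,000-node question, the arithmetic is simple once you know the average footprint of a cached item. In the sketch below, the per-item figure is an assumed input, not the formula from ItemState.calculateMemoryFootprint(), so measure typical items in your own repository. Property states are ItemStates too, so the item count is higher than the node count.

```java
/**
 * Back-of-envelope sizing for "cache everything". avgBytesPerItem is an
 * ASSUMED input; derive it from typical items in your own repository.
 */
final class CacheSizing {
    static final long MIB = 1024L * 1024L;

    private CacheSizing() {
    }

    /** Bytes needed to hold every item state at once; throws on overflow. */
    static long bytesToCacheAll(long itemCount, long avgBytesPerItem) {
        return Math.multiplyExact(itemCount, avgBytesPerItem);
    }

    /** How many item states a budget can hold at that average size. */
    static long itemsThatFit(long budgetBytes, long avgBytesPerItem) {
        if (avgBytesPerItem <= 0) {
            throw new IllegalArgumentException("avgBytesPerItem must be positive");
        }
        return budgetBytes / avgBytesPerItem;
    }

    static double toMiB(long bytes) {
        return bytes / (double) MIB;
    }
}
```

For example, at an assumed 1,024 bytes per item, caching 100,000 items needs 102,400,000 bytes (about 97.7 MiB). A 16 MiB budget holds 16,384 such items.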

regards
  marcel


RE: JackRabbit Caching: BundleCache vs ItemManager vs CacheManager

by sbarriba

Hi Marcel et al,
3 suggestions come to mind from this (perhaps for the develop list):

1) The ItemManager should be using soft references rather than weak
references; otherwise a PooledSessionInView pattern is not really effective,
as pooled (but unused) sessions have their caches cleared immediately by
the GC (using weak references).

2) The CacheManager config needs to be externalised so it can be changed
within the XML config rather than programmatically.

3) It's worth considering using a caching library (e.g. ehcache) for the
BundleCache at least. As a case study, we've got multi-GB of binaries in
BLOBs in the database and the BundleCache (at 100MB+) spends 2 hours after
each restart filling /tmp. It would be great to use a caching library which
supported a persistent cache etc. Obviously externalBlobs helps here.

Regards,
Shaun


Re: JackRabbit Caching: BundleCache vs ItemManager vs CacheManager

by Stefan Guggisberg

hi sean

On Tue, Jul 1, 2008 at 7:11 PM, sbarriba <sbarriba@...> wrote:
> Hi Marcel et al,
> 3 suggestions come to mind from this (perhaps for the develop list):
>
> 1) the ItemManager should be using Soft References rather than Weak
> References otherwise a PooledSessionInView pattern is not really effective
> as, pooled (but unused) sessions have their caches cleared immediately by
> the GC (using weak references).

ItemManager caches ItemImpl instances. the 'cache' guarantees that there's
no more than 1 ItemImpl instance per item id and session. weak references
are ideal for this task. ItemManager is not meant to be a 'cache' since
ItemImpl instance creation is IMO not performance critical. i remember that
i once experimented with soft references but they tended to fill the heap
pretty fast, since soft references are typically cleared only when you're
near an OOM error...

ItemState caches are a different matter. LocalItemStateManager and
SharedItemStateManager do cache ItemState instances for performance
reasons. please take a look at the javadoc which should explain
why they're using weak references internally instead of soft references:

http://jackrabbit.apache.org/api/1.4/org/apache/jackrabbit/core/state/ItemStateReferenceCache.html
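
The general pattern of combining weak references with a separate strong cache can be sketched as below. `TwoTierCache` is a hypothetical, simplified class, not Jackrabbit's ItemStateReferenceCache. The weak-valued map guarantees at most one live instance per id, which is the property described above for ItemManager. A small strong LRU keeps recently used entries reachable, so they survive GC after callers let go.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical two-tier cache (not Jackrabbit code):
 *  - a weak-valued map ensures at most one live instance per id;
 *  - a bounded, strongly referenced LRU keeps recently used instances
 *    reachable so the GC does not discard them as soon as callers let go.
 */
final class TwoTierCache<K, V> {
    private final Map<K, WeakReference<V>> canonical = new HashMap<>();
    private final Map<K, V> recent;

    TwoTierCache(int strongCapacity) {
        // accessOrder = true: iteration runs from least to most recently used
        this.recent = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > strongCapacity;
            }
        };
    }

    synchronized V get(K id, Function<K, V> loader) {
        WeakReference<V> ref = canonical.get(id);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // No live instance: create one and register it. Entries whose referent
            // was cleared stay in the map until their id is requested again.
            value = loader.apply(id);
            canonical.put(id, new WeakReference<>(value));
        }
        recent.put(id, value); // refresh LRU position; may evict the eldest strong ref
        return value;
    }

    synchronized int strongSize() {
        return recent.size();
    }
}
```

A fuller version would purge cleared entries with a java.lang.ref.ReferenceQueue. Unlike soft references, the strong tier has an explicit size bound, so heap use does not depend on how close the JVM is to an OutOfMemoryError.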

cheers
stefan

RE: JackRabbit Caching: BundleCache vs ItemManager vs CacheManager

by sbarriba

Hi Stefan,
So the intention is that once the session is no longer used then the
ItemImpl instances are cleared up? That makes sense except that when
investigating the lock contention issues we found that the creation of
ItemImpl can become expensive as they queue up on DefaultISMLocking.

When relying on sessions to cache some item data (with a shared session per
request model) via the ItemManager we found that this significantly reduced
contention as clients using sessions with some ItemImpls didn't hit
DefaultISMLocking. By choosing a suitable X request per 1 session ratio we
could spread the locking to increase throughput.

With a pooled session per view model (where each request exclusively has
access to one session) we found no benefit from the ItemManager cache due to
the weakly referenced data being cleared up after each request.

Are the LocalItemStateManager and SharedItemStateManager intended to help
reduce the load on DefaultISMLocking?

Regards,
Shaun



Re: JackRabbit Caching: BundleCache vs ItemManager vs CacheManager

by Stefan Guggisberg

hi shaun

On Sun, Jul 20, 2008 at 2:13 PM, sbarriba <sbarriba@...> wrote:
> Hi Stefan,
> So the intention is that once the session is no longer used then the
> ItemImpl instances are cleared up?

yes, unless ItemImpl instances are still being externally referenced
by client code.

> That makes sense except that when
> investigating the lock contention issues we found that the creation of
> ItemImpl can become expensive as they queue up on DefaultISMLocking.

i don't think so. i guess there's a misunderstanding and you're confusing
ItemImpl and ItemState instances.

let me try to clear things up.

ItemImpl (i.e. NodeImpl and PropertyImpl) instances implement the JCR
interfaces javax.jcr.Node and javax.jcr.Property. they're dealt with at
the top-most layer in jackrabbit and they're managed by
o.a.j.core.ItemManager. there's one ItemManager per session.
ItemImpl instance creation per se should never be expensive since they
only encapsulate/wrap an ItemState instance.

ItemState instances OTOH represent the core 'data' of a node/property.
they're managed on 3 separate layers:
 - transient (session local, SessionItemStateManager)
 - local (tx local, LocalItemStateManager LISM)
 - shared (global, SharedItemStateManager SISM)

DefaultISMLocking is used by SISM, i.e. at the bottom layer.
SISM maintains a workspace-global cache of ItemState instances
read from the persistence layer. this cache is not affected
by session lifetime since it's shared among all sessions.
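
A compressed sketch of that layering follows, with hypothetical classes standing in for the real state managers. `SessionView` plays the transient and local layers. `SharedLayer` plays the workspace-wide layer: it takes a read lock on every lookup (a stand-in for a read/write ISMLocking) before falling back to a persistence loader. Writes, which would take the write lock, are left out, as are invalidation and transactions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

/** Hypothetical workspace-wide layer: one per workspace, shared by all sessions. */
final class SharedLayer<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock(); // stand-in for ISMLocking
    private final Function<K, V> persistence;                          // stand-in for the PM
    private final AtomicInteger lookups = new AtomicInteger();
    private final AtomicInteger persistenceLoads = new AtomicInteger();

    SharedLayer(Function<K, V> persistence) {
        this.persistence = persistence;
    }

    V get(K id) {
        lookups.incrementAndGet();
        lock.readLock().lock(); // every lookup that reaches this layer takes the shared lock
        try {
            return cache.computeIfAbsent(id, k -> {
                persistenceLoads.incrementAndGet(); // cache miss: go to persistence
                return persistence.apply(k);
            });
        } finally {
            lock.readLock().unlock();
        }
    }

    int lookups() { return lookups.get(); }

    int persistenceLoads() { return persistenceLoads.get(); }
}

/** Hypothetical per-session view: transient (unsaved) states, then local states. */
final class SessionView<K, V> {
    private final Map<K, V> transientStates = new HashMap<>();
    private final Map<K, V> localStates = new HashMap<>();
    private final SharedLayer<K, V> shared;

    SessionView(SharedLayer<K, V> shared) {
        this.shared = shared;
    }

    void setTransient(K id, V value) {
        transientStates.put(id, value);
    }

    V getState(K id) {
        V state = transientStates.get(id);   // 1. unsaved changes in this session
        if (state == null) {
            state = localStates.get(id);     // 2. states this session has already seen
        }
        if (state == null) {
            state = shared.get(id);          // 3. workspace-wide cache, then persistence
            localStates.put(id, state);
        }
        return state;
    }
}
```

Here, two sessions reading the same id cause only one persistence load. A repeat read within one session is answered by its own layer without touching the shared lock, and unsaved changes in one session stay invisible to the other.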

cheers
stefan

Re: JackRabbit Caching: BundleCache vs ItemManager vs CacheManager

by Thomas Müller

Hi,

>> So the intention is that once the session is no longer used then the
>> ItemImpl instances are cleared up?

Unfortunately sessions are kept in a list currently, so you need to
close them manually. See also
https://issues.apache.org/jira/browse/JCR-1216 "Unreferenced sessions
should get garbage collected".

Regards,
Thomas

Re: JackRabbit Caching: BundleCache vs ItemManager vs CacheManager

by Stefan Guggisberg

On Wed, Jul 23, 2008 at 11:21 PM, Thomas Müller <thomas.mueller@...> wrote:
> Hi,
>
>>> So the intention is that once the session is no longer used then the
>>> ItemImpl instances are cleared up?
>
> Unfortunately sessions are kept in a list currently, so you need to
> close them manually. See also

what list are you referring to?

the only place i know where active sessions are kept is in
RepositoryImpl:

    /**
     * active sessions (weak references)
     */
    private final ReferenceMap activeSessions =
            new ReferenceMap(ReferenceMap.WEAK, ReferenceMap.WEAK);

cheers
stefan

> https://issues.apache.org/jira/browse/JCR-1216 "Unreferenced sessions
> should get garbage collected".
>
> Regards,
> Thomas
>

RE: JackRabbit Caching: BundleCache vs ItemManager vs CacheManager

by sbarriba

Hi Stefan et al,

"DefaultISMLocking is used by SISM, i.e. at the bottom layer. SISM maintains
a workspace-global cache of ItemState instances read from the persistence
layer. this cache is not affected by session lifetime since it's shared
among all sessions."

OK, that makes sense.
To summarise what we're seeing, the potential bottlenecks we think we've
found, and how we worked around them (please note I'm not 100% familiar with
the JackRabbit design, so some conclusions may be wrong):

 1) application uses Session to read a Node Property
 2) SessionImpl delegates to ItemManager
 3) ItemManager synchronises on its itemCache (Contention Point 1: session wide)
 4) On cache miss, ItemManager ultimately delegates to the SISM
 5) SISM synchronises via ISMLocking (Contention Point 2: global or per item,
depending on the DefaultISMLocking or FineGrainedISMLocking implementation)
 6) On cache miss, SISM delegates to the persistence manager
 7) AbstractBundlePersistenceManager synchronises on itself (Contention Point 3:
on the persistence manager)

In some cases our web application will read 2,000 or 3,000 Node properties
to deliver a single page request.

Initially we saw 7) as a bottleneck:
 - can JackRabbit leverage multiple database connections if it's synchronised
on a single persistence manager?
 - we resolved this by configuring a large BundleCache

We then saw 5) as a bottleneck:
 - it seems that, as each node property is an item, every property read
contends on ISMLocking. Is that correct? Is there scope for reading/lazy
loading an item's properties in bulk?
 - we partly resolved this by moving from a "pooled session per view"
pattern to a "shared session per view" pattern

We now see contention occasionally on 3).

It feels like there is scope for improving the concurrency in a few places,
and for consolidating the caching configuration, which is currently different
for the BundleCache vs the SISM etc.



Regards,
Shaun




-----Original Message-----
From: stefan.guggisberg@... [mailto:stefan.guggisberg@...] On
Behalf Of Stefan Guggisberg
Sent: 21 July 2008 11:04
To: users@...
Subject: Re: JackRabbit Caching: BundleCache vs ItemManager vs CacheManager

hi shaun

On Sun, Jul 20, 2008 at 2:13 PM, sbarriba <sbarriba@...> wrote:
> Hi Stefan,
> So the intention is that once the session is no longer used then the
> ItemImpl instances are cleared up?

yes, unless ItemImpl instances are still being externally refertenced
by client code.

> That makes sense except that when
> investigating the lock contention issues we found that the creation of
> ItemImpl can become expensive as they queue up on DefaultISMLocking.

i don't think so. i guess there's a misunderstanding and you're confusing
ItemImpl and ItemState instances.

let me try to clear things up.

ItemImpl (i.e. NodeImpl and PropertyImpl) instances implement the JCR
interfaces javax.jcr.Node and javax.jcr.Property. they're dealt with at
the top-most layer in jackrabbit and they're managed by
o.a.j.core.ItemManager. there's one ItemManager per session.
ItemImpl instance creation per se should never be expensive since they
only encapsulate/wrap an Itemstate instance.

ItemState instances OTOH represent the core 'data' of a node/property.
they're managed on 3 separate layers:
 - transient (session local, SessionItemStateManager SISM)
 - local (tx local, LocalItemStateManager LISM)
 - shared (global, SharedItemStateManager SISM)

DefaultISMLocking is used by SISM, i.e. at the bottom layer.
SISM maintains a workspace-global cache of ItemState instances
read from the persistence layer. this cache is not affected
by session lifetime since it's shared among all sessions.

cheers
stefan

>
> When relying on sessions to cache some item data (with a shared session
per
> request model) via the ItemManager we found that this significantly
reduced
> contention as clients using sessions with some ItemImpls didn't hit
> DefaultISMLocking. By choosing a suitable X request per 1 session ratio we
> could spread the locking to increase throughput.
>
> With a pooled session per view model (where each request exclusively has
> access to one session) we found no benefit from the ItemManger cache due
to

> the Weak Referenced data being cleared up after each request.
>
> Are the LocalItemStateManager and SharedItemStateManager intended to help
> reduce the load on DefaultISMLocking?
>
> Regards,
> Shaun
>
>
>
> -----Original Message-----
> From: stefan.guggisberg@... [mailto:stefan.guggisberg@...] On
> Behalf Of Stefan Guggisberg
> Sent: 16 July 2008 13:25
> To: users@...
> Subject: Re: JackRabbit Caching: BundleCache vs ItemManager vs CacheManager
>
> hi shaun
>
> On Tue, Jul 1, 2008 at 7:11 PM, sbarriba <sbarriba@...> wrote:
>> Hi Marcel et al,
>> 3 suggestions come to mind from this (perhaps for the develop list):
>>
>> 1) the ItemManager should be using Soft References rather than Weak
>> References, otherwise a PooledSessionInView pattern is not really effective
>> as pooled (but unused) sessions have their caches cleared immediately by
>> the GC (using weak references).
>
> ItemManager caches ItemImpl instances. the 'cache' guarantees that there's
> no more than 1 ItemImpl instance per item id and session. weak references
> are ideal for this task. ItemManager is not meant to be a 'cache' since
> ItemImpl instance creation is IMO not performance critical. i remember that
> i once experimented with soft references but they tended to fill the heap
> pretty fast since soft references are typically cleared only when you're
> near an OOM error...
>
> ItemState caches are a different matter. LocalItemStateManager and
> SharedItemStateManager do cache ItemState instances for performance
> reasons. please take a look at the javadoc which should explain
> why they're using weak references internally instead of soft references:
>
> http://jackrabbit.apache.org/api/1.4/org/apache/jackrabbit/core/state/ItemStateReferenceCache.html
>
> cheers
> stefan
>
>>
>> 2) the CacheManager config needs to be externalised so it can be changed
>> within the XML config, not programmatically.
>>
>> 3) is it worth considering using a caching library (e.g. ehcache) for the
>> BundleCache at least? As a case study, we've got multi-GB of binaries in
>> BLOBs in the database and the BundleCache (at 100MB+) spends 2 hours after
>> each restart filling /tmp. It would be great to use a caching library which
>> supported a persistent cache etc. Obviously externalBlobs helps here.
>>
>> Regards,
>> Shaun
>>
>> -----Original Message-----
>> From: Marcel Reutegger [mailto:marcel.reutegger@...]
>> Sent: 01 July 2008 09:47
>> To: users@...
>> Subject: Re: JackRabbit Caching: BundleCache vs ItemManager vs CacheManager
>>
>> Hi,
>>
>> sbarriba wrote:
>>> ..        PersistenceManager Cache:
>>>
>>> o   The "bundleCacheSize" determines how many nodes the PersistenceManager
>>> will cache. As this determines the lifetime of the references to the
>>> temporary BLOB cache, if it's not large enough BLOBs will be continually
>>> read from the database (if using externalBlobs=false).
>>>
>>> o   Configurable in <PersistenceManager> XML block
>>>
>>> o   Default size 8MB
>>>
>>> o   This cache is shared by all sessions.
>>>
>>> o   Synchronised access using the ISMLocking strategy e.g. Default or
>>> FineGrained
>>
>> correct, but there's additional synchronization in the persistence manager
>> using conventional synchronized methods. e.g. see
>> AbstractBundlePersistenceManager.load(NodeId)
>>
>>> ..        Session ItemManager Cache:
>>>
>>> o   Items are cached from the underlying persistence manager on a per
>>> session basis.
>>>
>>> o   Limit cannot be set.
>>
>> not sure, but I think this cache is also managed (at least partially) by
>> the CacheManager.
>>
>>> o   Uses a ReferenceMap which can be emptied by the JVM GC as required
>>
>> that's the 'other part' that manages the cache ;)
>>
>> items that are still referenced in the application will force the
>> reference map to keep the respective ItemState instances (using weak
>> references).
>>
>>> o   Synchronised access using the itemCache object
>>>
>>> ..        CacheManager Cache:
>>>
>>> o   Limit can only be set programmatically via the Workspace cacheManager
>>>
>>> o   http://wiki.apache.org/jackrabbit/CacheManager
>>>
>>> o   Defaults to 16MB
>>>
>>> o   It's not clear as yet how the CacheManager relates, if at all, to the
>>> ItemManager cache
>>
>> this only happens indirectly. see above.
>>
>>> 2 questions:
>>>
>>> ..        What is the purpose of the CacheManager and which caches does it
>>> actually control?
>>
>> It controls *all* the caches that contain ItemState instances.
>>
>>> ..        For example, for a workspace with 100,000 nodes what is an
>>> appropriate setting for the Cache Manager?
>>
>> I guess that depends on your JVM heap settings and the usage pattern. if
>> you have a lot of random reads over nearly all 100k nodes and performance
>> is critical you may consider caching all of them. have a look at
>> ItemState.calculateMemoryFootprint() for a formula on how the memory
>> consumption is calculated.
>>
>> regards
>>  marcel
>>
>>
>>
>
>
>
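
Marcel's sizing advice quoted above can be turned into a rough budget: nodes × (node state + properties per node × property state). The per-state byte sizes below are made-up placeholders, not the result of ItemState.calculateMemoryFootprint(). CacheBudget is a hypothetical helper; plug in figures measured on your own content.

```java
// Hypothetical back-of-the-envelope sizing for an ItemState cache budget.
// The byte figures passed in are ASSUMPTIONS supplied by the caller, not
// Jackrabbit's own footprint formula.
final class CacheBudget {

    // Bytes needed to keep every node and property state of a workspace cached.
    static long bytesNeeded(long nodes, long propertiesPerNode,
                            long assumedBytesPerNodeState,
                            long assumedBytesPerPropertyState) {
        return nodes * (assumedBytesPerNodeState
                + propertiesPerNode * assumedBytesPerPropertyState);
    }

    // Whole mebibytes (integer division, rounds down).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }
}
```

With placeholder figures of 100,000 nodes, 10 properties each, 500 bytes per node state and 200 bytes per property state, the estimate is 250,000,000 bytes, about 238 MiB. That is well above the 16MB default mentioned above, so caching every state would need a correspondingly larger heap.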



Re: JackRabbit Caching: BundleCache vs ItemManager vs CacheManager

by Stefan Guggisberg-2

hi shaun

On Thu, Jul 24, 2008 at 1:26 PM, sbarriba <sbarriba@...> wrote:

> Hi Stefan et al,
>
> "DefaultISMLocking is used by SISM, i.e. at the bottom layer. SISM maintains
> a workspace-global cache of ItemState instances read from the persistence
> layer. this cache is not affected by session lifetime since it's shared
> among all sessions."
>
> OK that makes sense.
> To summarise what we're seeing: the potential bottlenecks we think we've
> found and how we worked around them. Please note I'm not 100% familiar with
> the JackRabbit design so some conclusions may be wrong:
>
>  1) application uses Session to read a Node Property
>  2) SessionImpl delegates to ItemManager
>  3) ItemManager synchronises on its itemCache (Contention Point 1: Session
>     Wide)
>  4) On cache miss, ItemManager ultimately delegates to the
>     SharedItemStateManager (SISM)
>  5) SISM synchronises on ISMLocking (Contention Point 2: Global or per item
>     depending on DefaultISM or FineGrainedISM implementation)
>  6) On cache miss, SISM delegates to persistence manager
>  7) AbstractBundlePersistenceManager synchronises on itself (Contention
>     Point 3: On persistence Manager)
>
> In some cases our web application will read 2,000 or 3,000 Node properties
> to deliver a single page request.
>
> Initially we saw 7) as a bottleneck:
>  - can JackRabbit leverage multiple database connections if it's synchronised
>    on a single persistence manager?

no. the PM would need to be adapted/rewritten in order to benefit from
multiple db connections.
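
The following sketch illustrates why a single synchronized persistence manager cannot make use of several database connections. If load() is a synchronized method, at most one thread is ever inside it, however many threads are waiting. SingleConnectionStore is a hypothetical stand-in in which the sleep simulates a database round trip; it is not AbstractBundlePersistenceManager itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: load() is a synchronized method around one simulated
// connection, so concurrent callers are serialized on the store's monitor.
final class SingleConnectionStore {
    private final AtomicInteger inside = new AtomicInteger();
    private final AtomicInteger maxInside = new AtomicInteger();

    synchronized String load(String id) throws InterruptedException {
        int now = inside.incrementAndGet();
        maxInside.accumulateAndGet(now, Math::max); // record peak concurrency
        try {
            Thread.sleep(5); // stand-in for a database round trip
            return "bundle:" + id;
        } finally {
            inside.decrementAndGet();
        }
    }

    int maxConcurrentLoads() {
        return maxInside.get();
    }

    // Starts `threads` concurrent readers and returns the observed peak.
    static int demo(int threads) {
        SingleConnectionStore store = new SingleConnectionStore();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            final String id = "node-" + i;
            Thread t = new Thread(() -> {
                try {
                    store.load(id);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return store.maxConcurrentLoads();
    }
}
```

Every cache miss that reaches this point is serialized. This is why enlarging the bundle cache, as Shaun did, relieves the bottleneck: fewer reads get this far.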

>  - we resolved this by configuring a large BundleCache
>
> We then saw 5) as a bottleneck:
>  - it seems that, as each node property is an item, every property read
>    contends on ISMLocking. Is that correct? Is there scope for reading
>    properties/lazy loading in bulk for an item?

that's what the bundle pm should actually be doing...
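
The sketch below illustrates the bundle idea. It assumes that a node and all of its property values are stored and loaded together as one record, so reading several properties of the same node costs a single storage read. NodeBundle and BundleStore are hypothetical names. The sketch models only the persistence layer; each property ItemState may still be looked up separately in the layers above it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one "bundle" holds a node together with all its properties.
final class NodeBundle {
    final String nodeId;
    final Map<String, String> properties;

    NodeBundle(String nodeId, Map<String, String> properties) {
        this.nodeId = nodeId;
        this.properties = properties;
    }
}

// Hypothetical bundle store: one storage read per node, however many
// of that node's properties are subsequently read.
final class BundleStore {
    private final Map<String, Map<String, String>> backend; // simulated DB rows
    private final Map<String, NodeBundle> bundleCache = new HashMap<>();
    int storageReads;

    BundleStore(Map<String, Map<String, String>> backend) {
        this.backend = backend;
    }

    String getProperty(String nodeId, String propertyName) {
        NodeBundle bundle = bundleCache.get(nodeId);
        if (bundle == null) {
            storageReads++; // a single round trip fetches every property of the node
            bundle = new NodeBundle(nodeId, new HashMap<>(backend.get(nodeId)));
            bundleCache.put(nodeId, bundle);
        }
        return bundle.properties.get(propertyName);
    }
}
```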

>  - we partly resolved this by moving from a "pooled session per view"
>    pattern to a "shared session per view" pattern
>
> We now see contention occasionally on 3).

please note that a JCR session is not thread safe and should therefore not
be shared among multiple threads.

if you're experiencing lock contention on ItemManager.itemCache you
obviously do share sessions...
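
One way to follow this advice while still reusing sessions is to bind each session to a single thread. The sketch below is a hypothetical helper built on java.lang.ThreadLocal. It uses a stand-in FakeSession class instead of a real javax.jcr.Session. With pooled server threads, a real version would need to discard unsaved changes between requests and log the session out when the thread is finished with it.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a session object that is not thread safe.
final class FakeSession {
    final String owner = Thread.currentThread().getName();
}

// Hypothetical helper: every thread gets its own session instead of sharing one.
final class PerThreadSessions<S> {
    private final ThreadLocal<S> current;

    PerThreadSessions(Supplier<S> login) {
        this.current = ThreadLocal.withInitial(login); // login once per thread
    }

    S get() {
        return current.get();
    }

    // A real helper would also log the session out here.
    void release() {
        current.remove();
    }
}
```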

>
> It feels like there is scope for improving the concurrency in a few places,
> plus for consolidating the caching configuration, which is currently
> different for BundleCache vs SISM etc.

absolutely agreed, and thanks for your feedback/analysis. that's very
much appreciated.

cheers
stefan

>
>
>
> Regards,
> Shaun
>
>
>
>
> -----Original Message-----
> From: stefan.guggisberg@... [mailto:stefan.guggisberg@...] On
> Behalf Of Stefan Guggisberg
> Sent: 21 July 2008 11:04
> To: users@...
> Subject: Re: JackRabbit Caching: BundleCache vs ItemManager vs CacheManager
>
> hi shaun
>
> On Sun, Jul 20, 2008 at 2:13 PM, sbarriba <sbarriba@...> wrote:
>> Hi Stefan,
>> So the intention is that once the session is no longer used then the
>> ItemImpl instances are cleared up?
>
> yes, unless ItemImpl instances are still being externally referenced
> by client code.
>
>> That makes sense except that when
>> investigating the lock contention issues we found that the creation of
>> ItemImpl can become expensive as they queue up on DefaultISMLocking.
>
> i don't think so. i guess there's a misunderstanding and you're confusing
> ItemImpl and ItemState instances.
>
> let me try to clear things up.
>
> ItemImpl (i.e. NodeImpl and PropertyImpl) instances implement the JCR
> interfaces javax.jcr.Node and javax.jcr.Property. they're dealt with at
> the top-most layer in jackrabbit and they're managed by
> o.a.j.core.ItemManager. there's one ItemManager per session.
> ItemImpl instance creation per se should never be expensive since they
> only encapsulate/wrap an ItemState instance.
>
> ItemState instances OTOH represent the core 'data' of a node/property.
> they're managed on 3 separate layers:
>  - transient (session local, SessionItemStateManager)
>  - local (tx local, LocalItemStateManager LISM)
>  - shared (global, SharedItemStateManager SISM)
>
> DefaultISMLocking is used by SISM, i.e. at the bottom layer.
> SISM maintains a workspace-global cache of ItemState instances
> read from the persistence layer. this cache is not affected
> by session lifetime since it's shared among all sessions.
>
> cheers
> stefan
>
>
>

Re: JackRabbit Caching: BundleCache vs ItemManager vs CacheManager

by Thomas Müller-2

Hi,

> what list are you referring to?

See https://issues.apache.org/jira/browse/JCR-1216
The problem seems to be TransientRepository.session, which is a HashSet.

Regards,
Thomas

RE: JackRabbit Caching: BundleCache vs ItemManager vs CacheManager

by sbarriba