How to delete lots of related keys at once

View: New views
15 Messages — Rating Filter:   Alert me  

How to delete lots of related keys at once

by Jarom Severson :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

My main confusion or concern with memcache is how to
go about deleting a given set of keys in the
cache. For instance, we probably have 10-15 pages on
our site that hit the "articles" tables in our
database. So, that right there would be 10-15 unique
key/value pairs in memcache, assuming I store the
query as the key, and the result set as the value. Now
add on dynamic search strings, pagination, etc, and
you've probably got 100 unique key/value pairs in
memcache, all pertaining to articles.

So my question is, if we want to make sure our site is
never stale, then we need to invalidate all 100 of
those keys if someone inserts a new article. Now I'm
thinking I need to do a search in my keys to find all
article related keys and invalidate them.

Is the best way to do this then to store ANOTHER key
value pair which essentially stores all the article
related keys as its value? So that I can quickly and
easily invalidate the 100 other keys?

It seems like this might be what "namespaces" are for,
but I can't understand the example given on the
memcached FAG page.

Any help and direction in this area would be much
appreciated, or any other good articles I should read
that talk about best practices, implementation
techniques, etc.

thanks in advance.

Jay


Re: How to delete lots of related keys at once

by Clint Webb :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Why is everyone using memcache for the SQL query as a key?  That is what the query cache on your database is for.  You gain very little doing it this way, except now you have to manage the invalidation yourself, which the query-cache does for you.

Anyway, you can set your keys with an expiry so that they dissapear after a certain time and can be regenerated.  Each site is different, but for any kind of data that could change, and you dont have an easy way of identifying the affected keys when it does change, then you should give it an expiry time.  That way, eventually, changes will be replicated thru the cache as the keys expire and regenerate.

Namespaces probably isnt what you are looking for.  It merely adds a string to the begining of the key.  Useful if you are using the same memcache cluster for multiple applications, but thats about it.  If you normally use a key of "Article:id=5", but use a namespace of "Supersite-" then internally your key will now be "Supersite-Article:id=5".

I use namespaces for my sitelogging (web instances write logging info to cache using incremented keys, external process pulls those log entries out of cache and puts them in a file).  I have multiple sites use the same memcache cluster, and the same piece of code, but the sites logs are kept seperate, and copied out to different files, because when the different sites (and external processing script) connect to the cache, they use different namespaces.

On 8/3/07, Jarom Severson <j.severson@...> wrote:
My main confusion or concern with memcache is how to
go about deleting a given set of keys in the
cache. For instance, we probably have 10-15 pages on
our site that hit the "articles" tables in our
database. So, that right there would be 10-15 unique
key/value pairs in memcache, assuming I store the
query as the key, and the result set as the value. Now
add on dynamic search strings, pagination, etc, and
you've probably got 100 unique key/value pairs in
memcache, all pertaining to articles.

So my question is, if we want to make sure our site is
never stale, then we need to invalidate all 100 of
those keys if someone inserts a new article. Now I'm
thinking I need to do a search in my keys to find all
article related keys and invalidate them.

Is the best way to do this then to store ANOTHER key
value pair which essentially stores all the article
related keys as its value? So that I can quickly and
easily invalidate the 100 other keys?

It seems like this might be what "namespaces" are for,
but I can't understand the example given on the
memcached FAG page.

Any help and direction in this area would be much
appreciated, or any other good articles I should read
that talk about best practices, implementation
techniques, etc.

thanks in advance.

Jay




--
"Be excellent to each other"

RE: How to delete lots of related keys at once

by Ramon Leon-5 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> My main confusion or concern with memcache is how to go about
> deleting a given set of keys in the cache. For instance, we
> probably have 10-15 pages on our site that hit the "articles"
> tables in our database. So, that right there would be 10-15
> unique key/value pairs in memcache, assuming I store the
> query as the key, and the result set as the value. Now add on
> dynamic search strings, pagination, etc, and you've probably
> got 100 unique key/value pairs in memcache, all pertaining to
> articles.
>
> So my question is, if we want to make sure our site is never
> stale, then we need to invalidate all 100 of those keys if
> someone inserts a new article. Now I'm thinking I need to do
> a search in my keys to find all article related keys and
> invalidate them.
>
> Is the best way to do this then to store ANOTHER key value
> pair which essentially stores all the article related keys as
> its value? So that I can quickly and easily invalidate the
> 100 other keys?
>
> It seems like this might be what "namespaces" are for, but I
> can't understand the example given on the memcached FAG page.
>
> Any help and direction in this area would be much
> appreciated, or any other good articles I should read that
> talk about best practices, implementation techniques, etc.
>
> thanks in advance.
>
> Jay

If adding a new article invalidates a hundred keys, wouldn't it seem that
you're caching at the wrong granularity?  

IMHO, memcached isn't about caching result sets whole, it's about caching
individual objects.  Were I to cache a result set, it'd only cache it as a
list of matching id's for a very limited time, but I'd cache each article
individually.  

When a new article was added, I'd save it to the database and to memcached
at the same time, keeping the cache up to date and then allow the database
to do what it's good at, querying, but just for the id's that match a query.


I'd fetch the bulk of the article data individually from memcached.  With a
list of matching id's, you could use getmulti to fetch all the articles.
I'd never bother with expiring keys, I'd let them expire naturally on their
own, or get pushed out by new data.

Ramon Leon


Re: How to delete lots of related keys at once

by Iain Wade-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> So my question is, if we want to make sure our site is
> never stale, then we need to invalidate all 100 of
> those keys if someone inserts a new article. Now I'm
> thinking I need to do a search in my keys to find all
> article related keys and invalidate them.

An idea might be to determine the validity based on something other
than the existance of the cached data.

Keep track of the highest post id, or the most recent post timestamp,
and if the cached value returned is older than that, disregard the
info and refresh it.

--Iain

Re: How to delete lots of related keys at once

by Don MacAskill-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message



Clint Webb wrote:
>
> Why is everyone using memcache for the SQL query as a key?  That is what
> the query cache on your database is for.  You gain very little doing it
> this way, except now you have to manage the invalidation yourself, which
> the query-cache does for you.
>

While I have no experience with, say, PostgreSQL's or Oracle's query
caches, MySQL's is terrible for this.

The way it invalides the cache is "if TABLE has changed in any way, get
rid of any cache entries referencing this TABLE":

INSERT INTO table (id) VALUES (1);
SELECT * FROM table WHERE id=1;
// this query is now cached
INSERT INTO table (id) VALUES (2);
// the first SELECT is now not cached anymore, despite the 2nd insert
having nothing to do with it

You can, of course, do much better than this yourself if you goal is to
cache certain SQL queries.   Which is why some people do this, or
something like it, using memcached.

There are other reasons, too, like memory size.  What if you want to
cache, say, 1TB of queries?  Most of us don't have DB boxes with 1TB of
RAM...  But putting together 32 boxes with 32GB each is easily doable
(easily being relative to getting a 1TB server).

Don

Re: How to delete lots of related keys at once

by Clint Webb :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Yes, of course you are right there.  I was being overly generic.   Or, rather, I was being specific to the original posters problem, along with the rash of people asking questions about similar problems. 

The reason why the query cache on mysql is dumped whenever anything on the table is updated, is because it is the safest option.  You can of course, determine if something is not unsafe yourself, when using memcache in front of it, but then you have this problem... figuring out which keys are now invalid because of some new data that has been added.

Something has to be the source of truth.  In most cases it would be the database.  In a normal real-life situation it is impossible to say that your memcache and your database will be entirely in sync, and that both can be the source of truth.  For example, A node in your cache cluster may die, and you may lose 25% of the keys..  So if a key is missing, build it from the database. 

Therefore, your cache can only be considered "mostly true".  If you've been very careful, it could be very near to be in sync with teh database... if it never goes offline, and none of your keys expire unexpectedly.. etc.  

My long winded point is.... It might be perfectly alright to cache your query results.  But it should be an explicit decision to do that, for a specific reason.  I would hate for people to have picked up memcached, plopped it on their servers and merely cache all the queries that they do, for a quick speed boost, because there can be ramifications that are difficult to overcome.  Especially since you cant do a wildcard delete (yet, maybe ever).

As far as speed boosts go... make sure you have appropriate indexes on your tables.  You might be surprised how many programmers out there have built database apps, and never thought about indexes, because quite frankly, it works without it.   Until you have a million records and your queries start taking 20 seconds to process... of course... the programmers easy fix for that is to add memcache in front of the queries :)

Btw, I agree that spreading the query cache over a cluster is a very good reason.

On 8/3/07, Don MacAskill <don@...> wrote:


Clint Webb wrote:
>
> Why is everyone using memcache for the SQL query as a key?  That is what
> the query cache on your database is for.  You gain very little doing it
> this way, except now you have to manage the invalidation yourself, which
> the query-cache does for you.
>

While I have no experience with, say, PostgreSQL's or Oracle's query
caches, MySQL's is terrible for this.

The way it invalides the cache is "if TABLE has changed in any way, get
rid of any cache entries referencing this TABLE":

INSERT INTO table (id) VALUES (1);
SELECT * FROM table WHERE id=1;
// this query is now cached
INSERT INTO table (id) VALUES (2);
// the first SELECT is now not cached anymore, despite the 2nd insert
having nothing to do with it

You can, of course, do much better than this yourself if you goal is to
cache certain SQL queries.   Which is why some people do this, or
something like it, using memcached.

There are other reasons, too, like memory size.  What if you want to
cache, say, 1TB of queries?  Most of us don't have DB boxes with 1TB of
RAM...  But putting together 32 boxes with 32GB each is easily doable
(easily being relative to getting a 1TB server).

Don



--
"Be excellent to each other"

Re: How to delete lots of related keys at once

by Perrin Harkins :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 8/2/07, Don MacAskill <don@...> wrote:
> The way it invalides the cache is "if TABLE has changed in any way, get
> rid of any cache entries referencing this TABLE":
[...]
> You can, of course, do much better than this yourself if you goal is to
> cache certain SQL queries.

It's pretty hard to correctly determine which values a SQL statement
is going to modify without building your own RDBMS.  If you found a
way to do it for the general case, every database vendor would beat a
path to your door.  I'm sure you could do it for certain special cases
in a specific application though.

I think most people skip worrying about invalidation and opt to only
cache things that they can stand to have displaying incorrect data
until the cache expires.  Since computing a result from the source
data is slow (that's why you cache it), computing the results that are
affected by a change in the source data is also likely to be slow.

- Perrin

Re: How to delete lots of related keys at once

by BabyPunt :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Why is everyone using memcache for the SQL query as a key?  That is what the query cache on your database is for.  You gain very little doing it this way, except now you have to manage the invalidation yourself, which the query-cache does for you.
 
I was wondering about this too, as I was seeing the query cache being just as fast.  What should qualify for memcache then?  I imagine html snippets which includes data already fetched.  However I'm already using Smarty, so I can't just grab the html and stuff it back to memcache, can I?
 
What are some good places to use memcache?
 

 

Re: How to delete lots of related keys at once

by Travis Boucher :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

K J wrote:

>
>     Why is everyone using memcache for the SQL query as a key?  That
>     is what the query cache on your database is for.  You gain very
>     little doing it this way, except now you have to manage the
>     invalidation yourself, which the query-cache does for you.
>
>  
> I was wondering about this too, as I was seeing the query cache being
> just as fast.  What should qualify for memcache then?  I imagine html
> snippets which includes data already fetched.  However I'm already
> using Smarty, so I can't just grab the html and stuff it back to
> memcache, can I?
>  
> What are some good places to use memcache?
>  
>
>  
> !DSPAM:8,46b2d02297611262164653!
Although off-topic, it'd be easy to implement memcached as the cache
back end for Smarty.  See
http://smarty.php.net/manual/en/section.template.cache.handler.func.php.  
However I'd question the gain unless you have tons of templates.

Re: How to delete lots of related keys at once

by BabyPunt :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

My main confusion or concern with memcache is how to
go about deleting a given set of keys in the
cache. For instance, we probably have 10-15 pages on
our site that hit the "articles" tables in our
database. So, that right there would be 10-15 unique
key/value pairs in memcache, assuming I store the
query as the key, and the result set as the value. Now
add on dynamic search strings, pagination, etc, and
you've probably got 100 unique key/value pairs in
memcache, all pertaining to articles.

So my question is, if we want to make sure our site is
never stale, then we need to invalidate all 100 of
those keys if someone inserts a new article. Now I'm
thinking I need to do a search in my keys to find all
article related keys and invalidate them.

Is the best way to do this then to store ANOTHER key
value pair which essentially stores all the article
related keys as its value? So that I can quickly and
easily invalidate the 100 other keys?

It seems like this might be what "namespaces" are for,
but I can't understand the example given on the
memcached FAG page.

Any help and direction in this area would be much
appreciated, or any other good articles I should read
that talk about best practices, implementation
techniques, etc.
 
I'm facing the same problem here.  From the responses I'm guessing that most of you think it's better to cache each individual "article" and perhaps the article titles for display.  Listings still hit the database directly.  The only savings would be that the app wouldn't have to do another SQL to fetch the article titles/links/data, as it can get it from the cache.
 
However, is there no way to cache these pages effectively?  For instance, what if say I set a default expiration time for anything that's not on page 1.  Then whenever an article gets inserted, the first 10 pages are purged, while pages 11 onwards are expired on their own.
 
What do you guys think of this solution?
 

Re: How to delete lots of related keys at once

by BabyPunt :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

For instance, I'd like to cache my search pages, as they are really database/cpu intensive.  For instance, on a social networking site someone could be searching based on sex, height, religion, hobbies, location, and other combined criteria.  This sort of query hits the databse hard, and it wouldn't take too many of those to crash the db server.
 
In this case wouldn't it be great to cache every single search that comes in, so that in a duplicate search the app wouldn't have to hit the database again?
 

Re: How to delete lots of related keys at once

by Dustin Sallings :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On Sep 19, 2007, at 7:02 , K J wrote:

I'm facing the same problem here.  From the responses I'm guessing that most of you think it's better to cache each individual "article" and perhaps the article titles for display.  Listings still hit the database directly.  The only savings would be that the app wouldn't have to do another SQL to fetch the article titles/links/data, as it can get it from the cache. 
 
However, is there no way to cache these pages effectively?  For instance, what if say I set a default expiration time for anything that's not on page 1.  Then whenever an article gets inserted, the first 10 pages are purged, while pages 11 onwards are expired on their own.
 
What do you guys think of this solution?

Have you looked at the proposed solutions (i.e. tags and regex)?

In general, having multiple copies of the same data seems like a bad idea.  When you cache the article,  you're avoiding the DB hit.  When you cache the page, you're memoizing a template render.  These are solving different problems.

On Sep 19, 2007, at 7:07 , K J wrote:

For instance, I'd like to cache my search pages, as they are really database/cpu intensive.  For instance, on a social networking site someone could be searching based on sex, height, religion, hobbies, location, and other combined criteria.  This sort of query hits the databse hard, and it wouldn't take too many of those to crash the db server.
 
In this case wouldn't it be great to cache every single search that comes in, so that in a duplicate search the app wouldn't have to hit the database again?

What do you expect the hit ratio to be on this?  What aspect of your search is intensive?  Are you, perhaps pulling too much data out with your search results such that you're pulling out information on each user that you could look up from cache?

-- 
Dustin Sallings



Re: How to delete lots of related keys at once

by Jarom Severson :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

For what it's worth, here is the solution I came up
with and am now using successfully on our site (which
gets about 300k page views per day, so it's had its
trial by fire so to speak).

Sorry there are so few comments in this file, I
haven't had the time to really clean it up well.

After you've read through the file, here is an example
of how I use it in the ORM to store the query, and
another example of how I bust the cache on a new
article insert:

//Create our memcache key based on our ORM function
name and parameters
$strCacheKey =
MyMemcache::QueryToKey("Article->LoadAllEnabled(" .
serialize($objOptionalClauses) . ")");

//Check memcache
if(false === ($objArticleArray =
MyMemcache::G()->GetQuery($strCacheKey))) {
  //If not in memcache, do the work
  $objArticleArray = Article::QueryArray(
    QQ::AndCondition(
      QQ::Equal(QQN::Article()->User->IsEnabled,true),
      QQ::Equal(QQN::Article()->IsBuried,false),
      QQ::Equal(QQN::Article()->IsLive,true)
    ),
    $objOptionalClauses

  );
  //Stick result set into memcache and put the key
into the "article" set
  MyMemcache::G()->SetQuery($strCacheKey,
$objArticleArray,'article');
}


//When new article is posted, or one is deleted, etc
MyMemcache::G()->DeleteBySet(array('article',MyMemcache::GetUserSetName($intArticleUserId)));

Hopefully that helps.

Jay


--- Dustin Sallings <dustin@...> wrote:

>
> On Sep 19, 2007, at 7:02 , K J wrote:
>
> > I'm facing the same problem here.  From the
> responses I'm guessing  
> > that most of you think it's better to cache each
> individual  
> > "article" and perhaps the article titles for
> display.  Listings  
> > still hit the database directly.  The only savings
> would be that  
> > the app wouldn't have to do another SQL to fetch
> the article titles/
> > links/data, as it can get it from the cache.
> >
> > However, is there no way to cache these pages
> effectively?  For  
> > instance, what if say I set a default expiration
> time for anything  
> > that's not on page 1.  Then whenever an article
> gets inserted, the  
> > first 10 pages are purged, while pages 11 onwards
> are expired on  
> > their own.
> >
> > What do you guys think of this solution?
>
> Have you looked at the proposed solutions (i.e.
> tags and regex)?
>
> In general, having multiple copies of the same data
> seems like a bad  
> idea.  When you cache the article,  you're avoiding
> the DB hit.  When  
> you cache the page, you're memoizing a template
> render.  These are  
> solving different problems.
>
> On Sep 19, 2007, at 7:07 , K J wrote:
>
> > For instance, I'd like to cache my search pages,
> as they are really  
> > database/cpu intensive.  For instance, on a social
> networking site  
> > someone could be searching based on sex, height,
> religion, hobbies,  
> > location, and other combined criteria.  This sort
> of query hits the  
> > databse hard, and it wouldn't take too many of
> those to crash the  
> > db server.
> >
> > In this case wouldn't it be great to cache every
> single search that  
> > comes in, so that in a duplicate search the app
> wouldn't have to  
> > hit the database again?
>
> What do you expect the hit ratio to be on this?
> What aspect of your  
> search is intensive?  Are you, perhaps pulling too
> much data out with  
> your search results such that you're pulling out
> information on each  
> user that you could look up from cache?
>
> --
> Dustin Sallings
>
>
>


MyMemcache.zip (2K) Download Attachment

Re: How to delete lots of related keys at once

by BabyPunt :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Have you looked at the proposed solutions (i.e. tags and regex)?

 
In general, having multiple copies of the same data seems like a bad idea.  When you cache the article,  you're avoiding the DB hit.  When you cache the page, you're memoizing a template render.  These are solving different problems.
 
Where can I find the proposed tags and regex solutions?
 

Re: How to delete lots of related keys at once

by Dustin Sallings :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On Sep 20, 2007, at 1:40 , K J wrote:

Where can I find the proposed tags and regex solutions?

There's not a clear proposal on tags, but this is a description of how each would work:


For tags, the idea is to have some extra commands (tag key tag_name; delete_tag tag_name; maybe a couple others) and have the amortized invalidation mechanism described above.

-- 
Dustin Sallings


LightInTheBox - Buy quality products at wholesale price!