OS and architecture migration for DCC

OS and architecture migration for DCC

by Gary Mills

I'm planning an upgrade of our e-mail server, which is also one of our
dccd and grey servers.  I'm doing this first on a test e-mail server
that has only isolated dccd and grey servers.  It gets a bit of spam,
but the databases are mostly empty.  The migration is from Solaris 9
to Solaris 10, from SPARC (big-endian) to x86 (little-endian), and
from UFS to ZFS.  Talk about a torture test!  I only expect the byte
order to be a problem.

I'm using the same version of DCC in both cases, but recompiled for
the Solaris 10 x86 server.  On that server, I started with a DCC
directory (/usr/local/dcc) that was a copy of the one from Solaris 9
SPARC.  I then reinstalled DCC so that the executables would all be
x86 ones.  The daemons logged some interesting errors, presumably due
to the byte order, but eventually seemed to run normally.  This was
for dccd:

Jun  9 16:03:02 setup01 dccd[1198]: [ID 702911 mail.error] dcc_db has page size 16128 incompatible with 15654912 in dcc_db.hash
Jun  9 16:03:02 setup01 dccd[1198]: [ID 702911 mail.error] dcc_db says it contains 1158682945536393216 bytes or more than the actual size of 8257536
Jun  9 16:03:02 setup01 dccd[1199]: [ID 702911 mail.notice] database initially broken; starting `/usr/local/dcc/libexec/dbclean -Pq4SRbad -i 9003`
Jun  9 16:03:02 setup01 dbclean[1199]: [ID 839192 mail.notice] 1.3.86 repairing /usr/local/dcc/dcc_db
Jun  9 16:03:02 setup01 dbclean[1199]: [ID 702911 mail.error] explicit repair of dcc_db
Jun  9 16:03:02 setup01 dbclean[1199]: [ID 702911 mail.error] unexpected EOF in dcc_db at 0x7e0000 instead of 0x1014780000000000
Jun  9 16:03:02 setup01 dbclean[1199]: [ID 394617 mail.notice] expired 19 records and 17 checksums, obsoleted 63 checksums in dcc_db
Jun  9 16:03:02 setup01 dbclean[1199]: [ID 582593 mail.notice] hashed 262205 records containing 393286 checksums, compressed 25 records
Jun  9 16:03:03 setup01 dbclean[1199]: [ID 838263 mail.notice] 6709240 hash entries total, 131619 or 1% used
Jun  9 16:03:03 setup01 dccd[1198]: [ID 702911 mail.notice] unrecognized hash_len=33554432 in /usr/local/dcc/dccd_clients
Jun  9 16:03:03 setup01 dccd[1198]: [ID 702911 mail.notice] 1.3.86 listening to port 6277  /usr/local/dcc  window=1911MB  real=33,553,660KB  max RSS=1920MB  DB max=2400MB
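One way to test the byte-order theory is to reverse the bytes of the implausible size in the log above. This is only a sketch: the helper name `swap64` is hypothetical, and it assumes the size is stored as a 64-bit integer, which the log does not confirm.

```python
def swap64(value: int) -> int:
    """Reverse the byte order of a 64-bit unsigned integer."""
    return int.from_bytes(value.to_bytes(8, "big"), "little")


# Size that dccd claimed dcc_db contains, copied from the log above.
claimed = 1158682945536393216
# Actual file size reported in the same log line.
actual = 8257536

print(hex(claimed))               # 0x1014780000000000
print(swap64(claimed))            # 7869456
print(swap64(claimed) <= actual)  # True
```

The swapped value is slightly smaller than the real file size, which is what one would expect if the header had been written on the big-endian SPARC host and read back on x86.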

This was for grey:

Jun  9 16:08:38 setup01 dccd grey[1230]: [ID 702911 mail.error] grey_db is not a greylist database but must be
Jun  9 16:08:38 setup01 dccd grey[1232]: [ID 702911 mail.notice] database initially broken; starting `/usr/local/dcc/libexec/dbclean -Pq4SRbad -Gon -i 9003 -Gon`
Jun  9 16:08:38 setup01 dbclean grey[1232]: [ID 839192 mail.notice] 1.3.86 repairing /usr/local/dcc/grey_db
Jun  9 16:08:38 setup01 dbclean grey[1232]: [ID 702911 mail.error] explicit repair of grey_db
Jun  9 16:08:38 setup01 dbclean grey[1232]: [ID 702911 mail.error] unexpected EOF in grey_db at 0x888000 instead of 0x60217b0000000000
Jun  9 16:08:38 setup01 dbclean grey[1232]: [ID 304167 mail.notice] expired 4 records and 10 checksums in grey_db
Jun  9 16:08:38 setup01 dbclean grey[1232]: [ID 582593 mail.notice] hashed 268691 records containing 403101 checksums, compressed 0 records
Jun  9 16:08:38 setup01 dbclean grey[1232]: [ID 838263 mail.notice] 638968 hash entries total, 135365 or 21% used
Jun  9 16:08:39 setup01 dccd grey[1230]: [ID 702911 mail.notice] unrecognized hash_len=33554432 in /usr/local/dcc/grey_clients
Jun  9 16:08:39 setup01 dccd grey[1230]: [ID 702911 mail.notice] 1.3.86 listening to port 6276  /usr/local/dcc  window=273MB  real=33,553,660KB  max RSS=1920MB  DB max=2400MB

dccm was much better behaved:

Jun  9 16:12:06 setup01 dccm[1256]: [ID 702911 mail.notice] 1.3.86 listening to inet:3331 with /usr/local/dcc

When I first ran `cdcc info', I got these errors logged:

Jun  9 16:13:19 setup01 dccd[1198]: [ID 702911 mail.notice] bad client or server-ID 25165824 from 130.179.16.64,33473 for NOP
Jun  9 16:13:19 setup01 dccd grey[1230]: [ID 702911 mail.notice] bad client or server-ID 25165824 from 130.179.16.64,33473 for NOP

I fixed that by removing the `map' file and reloading it from `map.txt'.
After that, `cdcc info' ran normally.
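The bad ID in those messages also looks like a byte-order artifact. The sketch below reinterprets it under the assumption that the ID is held in a 32-bit field of the binary `map' file; the helper name `reinterpret_u32` is hypothetical, and the thread does not say which ID is actually configured here.

```python
import struct


def reinterpret_u32(value: int) -> int:
    """Take a 32-bit value written big-endian and read it as little-endian."""
    return struct.unpack("<I", struct.pack(">I", value))[0]


bad_id = 25165824               # ID reported by dccd in the log above
print(hex(bad_id))              # 0x1800000
print(reinterpret_u32(bad_id))  # 32769
```

Rebuilding `map' from the ASCII `map.txt' avoids the issue because the text file carries no byte order.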

With single dccd and grey servers, is there any other way to do this
migration and still maintain the data in the databases?  Is there a
way to check their consistency, or has this already been done in the
startup?  On the production e-mail server, where there are also dccd and
grey servers running on another machine, is there a better way to use
those to rebuild the database?  I'm assuming that the network protocol
between DCC clients and servers is independent of byte order, so it's
only the on-disk databases that might have problems.

--
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
_______________________________________________
DCC mailing list      DCC@...
http://www.rhyolite.com/mailman/listinfo/dcc

Re: OS and architecture migration for DCC

by Vernon Schryver

> From: Gary Mills

> but the databases are mostly empty.  The migration is from Solaris 9
> to Solaris 10, from SPARC (big-endian) to x86 (little-endian), and
> from UFS to ZFS.  Talk about a torture test!  

If it hurts (and it will), then don't do that.

> I'm using the same version of DCC in both cases, but recompiled for
> the Solaris 10 x86 server.  On that server, I started with a DCC
> directory (/usr/local/dcc) that was a copy of the one from Solaris 9
> SPARC.  I then reinstalled DCC so that the executables would all be
> x86 ones.  The daemons logged some interesting errors, presumably due
> to the byte order, but eventually seemed to run normally.  This was
> for dccd:
>
> Jun  9 16:03:02 setup01 dccd[1198]: [ID 702911 mail.error] dcc_db has page size 16128 incompatible with 15654912 in dcc_db.hash


Don't do that.  Instead, copy only the ASCII files.


> Jun  9 16:12:06 setup01 dccm[1256]: [ID 702911 mail.notice] 1.3.86 listening to inet:3331 with /usr/local/dcc
>
> When I first ran `cdcc info', I got these errors logged:

Instead, copy only the ASCII files, including map.txt, dcc_conf, and
flod, and only some of their contents.  For example, lines for your
external DCC flooding peers must not be in flod files on more than one
system.
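A minimal sketch of "copy only the ASCII files" follows. It is not part of DCC: the function names are hypothetical, and the text test (first 4096 bytes decode as ASCII and contain no NUL byte) is a heuristic chosen for illustration.

```python
import shutil
from pathlib import Path


def looks_like_text(path: Path, sample_size: int = 4096) -> bool:
    """Heuristic: the first bytes decode as ASCII and contain no NUL byte."""
    with path.open("rb") as handle:
        sample = handle.read(sample_size)
    if b"\x00" in sample:
        return False
    try:
        sample.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


def copy_text_files(src: Path, dst: Path) -> list[str]:
    """Copy only the top-level text files from src to dst; return their names."""
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for entry in sorted(src.iterdir()):
        if entry.is_file() and looks_like_text(entry):
            shutil.copy2(entry, dst / entry.name)
            copied.append(entry.name)
    return copied
```

Something like `copy_text_files(Path("/usr/local/dcc"), Path("/tmp/dcc-ascii"))` would leave the binary databases behind; the destination path is only an example. The copied flod file still needs editing by hand so that external flooding peers are listed on one system only.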


> With single dccd and grey servers, is there any other way to do this
> migration and still maintain the data in the databases?

No, there is no way.  Don't bother to waste time copying the database
files among systems.  Instead, let flooding rebuild the greylist and
main DCC databases on the new system from scratch as if the new
system had suffered a disk crash.

While you have both old and new systems running, you will want both
changes to be echoed on all systems.  So you will want to get flooding
going among the old and new systems.  With flooding, just let the new
system act as if it suffered a disk error that trashed the databases.
Disk or RAM problems are not exactly rare on new boxes, and so that
might happen even if you don't plan to start from that state.


Vernon Schryver    vjs@...

Re: OS and architecture migration for DCC

by Gary Mills

On Tue, Jun 10, 2008 at 02:05:49AM +0000, Vernon Schryver wrote:

> > From: Gary Mills
>
> > I'm using the same version of DCC in both cases, but recompiled for
> > the Solaris 10 x86 server.  On that server, I started with a DCC
> > directory (/usr/local/dcc) that was a copy of the one from Solaris 9
> > SPARC.  I then reinstalled DCC so that the executables would all be
> > x86 ones.  The daemons logged some interesting errors, presumably due
> > to the byte order, but eventually seemed to run normally.  This was
> > for dccd:
> >
> > Jun  9 16:03:02 setup01 dccd[1198]: [ID 702911 mail.error] dcc_db has page size 16128 incompatible with 15654912 in dcc_db.hash
>
> Don't do that.  Instead, copy only the ASCII files.

Well, that was certainly cleaner.  The databases are empty with an
isolated server, of course.  It will be adequate for me to test dccm
with sendmail.

> > With single dccd and grey servers, is there any other way to do this
> > migration and still maintain the data in the databases?
>
> No, there is no way.  Don't bother to waste time copying the database
> files among systems.

Okay, I see.

> While you have both old and new systems running, you will want both
> changes to be echoed on all systems.  So you will want to get flooding
> going among the old and new systems.  With flooding, just let the new
> system act as if it suffered a disk error that trashed the databases.

So with two servers with both dccd and grey databases and floods
between them, I should just start the upgraded server up with no
databases.  They will get initialized during startup and populated
from the other server.  Is that the correct procedure?  Of course,
there are external floods as well.  The client (dccm) would initially
communicate with the other server if I started it first and the
mapping file was correct.

--
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-

Re: OS and architecture migration for DCC

by Vernon Schryver

> From: Gary Mills

> So with two servers with both dccd and grey databases and floods
> between them, I should just start the upgraded server up with no
> databases.  They will get initialized during startup and populated
> from the other server.  Is that the correct procedure?

That is what I would do.

It would take a little while for the database to flood to the new
system.  Until the database on the new system is full, filtering
effectiveness would be reduced.  You could fiddle with the map files
to keep dccm from talking to the new system for a day or two.
Or just not worry about it, since after an hour or two the database
will have caught up enough that effectiveness won't be reduced by more
than 10%.


Vernon Schryver    vjs@...