Negative position error

Hi,

we are encountering the following error when initializing HOWL. Do you have any idea what could be causing this?

Thank you,
Miro Halas

Caused by: java.lang.IllegalArgumentException: Negative position
    at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:613)
    at org.objectweb.howl.log.BlockLogBuffer.read(BlockLogBuffer.java:412)
    at org.objectweb.howl.log.LogFileManager.read(LogFileManager.java:641)
    at org.objectweb.howl.log.LogBufferManager.replay(LogBufferManager.java:792)
    at org.objectweb.howl.log.Logger.replay(Logger.java:372)
    at ...

--
You receive this message as a subscriber of the howl@... mailing list.
To unsubscribe: mailto:howl-unsubscribe@...
For general help: mailto:sympa@...?subject=help
ObjectWeb mailing lists service home page: http://www.objectweb.org/wws
|
|
Re: Negative position error

Could you provide a bit more of the stack trace showing how replay was invoked?

Thanks,
Michael Giroux

In reply to: Miro Halas, 11/20/2006 02:56 PM, "[howl] Negative position error"
|
|
Re: Re: Negative position error

Hi,

I sent you more of the trace and also the code yesterday. Did you get it? For some reason it doesn't appear in the mailing list archive.

Thank you,
Miro
|
|
Re: Re: Negative position error

Yes, I did receive it, but it really doesn't help much.

The only thing that comes to mind is a problem we had with an old version of Linux. In that case we were getting some incorrect file positioning. Is it possible this applies? What system are you running on?

Michael

In reply to: Miro Halas, 11/21/2006 12:27 PM, "Re: Re: [howl] Negative position error"
|
|
Re: Re: Re: Negative position error

Hi Michael,

the system is Windows Server 2003 Standard Edition.

I have just noticed something which may be the cause of the problem. Could you please review my logic and see if I am on the right track?

The file size of the cache causing the problem is 2.82 GB. If the file position is an int, this may be the cause of the problem, since the position might be larger than 2 GB and would therefore overflow and become negative.

I think this is caused by me: a long time ago I was caching quite a bit of data, and I configured HOWL with something like this (as you can see from the code I have sent you):

    cacheConfig.setMaxBlocksPerFile(s_iPersistorCount * s_iBundleSize * 100);
    s_iPersistorCount ~= 20
    s_iBundleSize ~= 100

If I understand correctly, the block size is configured using cacheConfig.setBufferSize and is limited to 32, which represents a 32K block, i.e. 32695 bytes. The above config would therefore result in a maximum file size of 6539000000 bytes, or ~6 GB.

If this is the issue, I would recommend that HOWL check for such an illegal maximum size and perhaps throw an exception during configuration.

Please let me know what you think.

Thank you,
Miro Halas
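
The arithmetic in the message above can be checked with a small sketch. It uses only the numbers quoted in the message (20, 100, 100 and 32695); the class and method names are made up for illustration and are not part of HOWL.

```java
final class MaxFileSizeCheck {
    /** Largest value a signed 32-bit int can hold (2 GB minus one byte). */
    static final long INT_LIMIT = Integer.MAX_VALUE;

    /** Maximum journal file size in bytes, computed in 64-bit arithmetic. */
    static long maxFileSizeBytes(int maxBlocksPerFile, int blockSizeBytes) {
        return (long) maxBlocksPerFile * blockSizeBytes;
    }

    public static void main(String[] args) {
        int persistorCount = 20;    // s_iPersistorCount ~= 20
        int bundleSize = 100;       // s_iBundleSize ~= 100
        int maxBlocksPerFile = persistorCount * bundleSize * 100;
        int blockSizeBytes = 32695; // block size as quoted in the message

        long size = maxFileSizeBytes(maxBlocksPerFile, blockSizeBytes);
        System.out.println("max file size in bytes: " + size);
        System.out.println("exceeds int range: " + (size > INT_LIMIT));

        // The same product computed in 32-bit arithmetic silently wraps around.
        int wrapped = maxBlocksPerFile * blockSizeBytes;
        System.out.println("same product as int: " + wrapped);
    }
}
```

Any offset into a file of that size that is held in an int can wrap to a negative value once it passes 2 GB, which is consistent with the "Negative position" exception.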
|
|
Re: Re: Re: Negative position error

Thanks for the additional information.

File positions are always long, but you have uncovered a problem in HOWL. The record key assigned to each record is composed of an int block sequence number (BSN) and an offset within the block. The problem you discovered could occur with relatively small files as well.

In HOWL, a block sequence number is never reused. As files roll over, the BSN continues to increment. This technique protects the application from requesting a block that has been overwritten. If the sequence numbers were related to physical addresses, then the application could request block 1 and would get some data from block 1. However, if the block had been overwritten, that would not be the actual data the application wanted. To solve this, the BSN is constantly incremented, so position 0 of a file would be block 1 initially; then, when the file wraps around, it would be overwritten with block 100, for example.

So if the files are around long enough, we will get into a situation where the BSN approaches 32 bits and the computation for the seek address will result in a negative number.

I'll have to develop a test case for this, then figure out how to resolve it.

Michael

In reply to: Miro Halas, 11/22/2006 11:11 AM, "Re: Re: Re: [howl] Negative position error"
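
A minimal sketch of the idea described above, assuming a single reusable file and a simple modulo mapping from BSN to slot. This is not HOWL's actual code; every name here is hypothetical, and the real seek computation may differ.

```java
final class SeekAddressSketch {
    /** Hypothetical mapping from an ever-increasing BSN to a slot in one reusable file. */
    static int slotInFile(int bsn, int blocksPerFile) {
        return (bsn - 1) % blocksPerFile; // BSN 1 lands in slot 0
    }

    /** A read is only valid if the block stored in the slot still carries the requested BSN. */
    static boolean isStillAvailable(int requestedBsn, int bsnStoredInSlot) {
        return requestedBsn == bsnStoredInSlot;
    }

    /** Seek address in 32-bit arithmetic: can overflow and turn negative. */
    static int seekAsInt(int slot, int blockSize) {
        return slot * blockSize;
    }

    /** Seek address in 64-bit arithmetic: stays positive for these inputs. */
    static long seekAsLong(int slot, int blockSize) {
        return (long) slot * blockSize;
    }

    public static void main(String[] args) {
        // Slot 0 holds BSN 1 at first; after one pass over a 100-block file it holds BSN 101.
        System.out.println(slotInFile(1, 100) + " " + slotInFile(101, 100));
        // A request for BSN 1 is rejected because the slot now carries BSN 101.
        System.out.println(isStillAvailable(1, 101));
        // With a large file, the int computation overflows while the long one does not.
        System.out.println(seekAsInt(100000, 32695) + " vs " + seekAsLong(100000, 32695));
    }
}
```

Because the BSN is stored with the block, a stale request can be detected by comparison instead of silently returning overwritten data.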
|
|
Re: Re: Re: Re: Negative position error

Hello Michael,

thank you for your assistance with this issue.

Do you have any ETA for when this bug could be resolved? Is there anything I can do or help with? We have an upcoming release of our application and I would like to include the updated HOWL if possible.

Thank you,
Miro Halas
|
|
Re: Re: Re: Re: Negative position error

I was hoping there was no pressure on this. I am only in the office for 5 more days, then I'm out till January.

Solving the problem is going to require a little thought. The primary issue is that the block sequence numbers increment continuously. The value is carried in an int, so the easiest solution to avoid it going negative is to increment and then mask to 31 bits. But that leads to a problem that I have not considered previously -- ultimately, the sequence number wraps around to zero.

This creates an issue with restart because I use the BSN to locate the logical end of the log by scanning until I find a block that has a BSN lower than the previous block. Essentially, once the journal space is reused, the previous data will have very old BSNs with values lower than newer blocks, so the last good block is the one with the largest BSN. That strategy needs to be augmented a bit once BSNs wrap around to zero. The easy solution here is to include the Time field as part of the check.

This might be an easy change, but I also need to look into the seek address calculations used by the methods that read the journal. Currently the calculations assume that the BSN is ever increasing. I will have to look into this area as well.

Since the main purpose of HOWL is to support recovery, these areas need to be very stable, so I need to develop some test cases to recreate the situation and verify that any changes do not break recovery once the journal gets into this situation.

The bottom line is that I'm not sure I can get this done in the next week.

Until an update is available, I think the only avoidance is to delete and recreate the journal files periodically. If you ever get to a clean point where there is no data in the journal that needs recovery, you could delete and recreate the files. Not a friendly solution, sorry.

Michael

In reply to: Miro Halas, 12/01/2006 10:48 AM, "Re: Re: Re: Re: [howl] Negative position error"
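
The two mechanisms mentioned above (increment-and-mask, and scanning for a drop in BSN to find the logical end of the log) can be sketched as follows. This is an illustration under simplifying assumptions, with hypothetical names; it treats the journal as a flat array of BSNs and is not taken from HOWL.

```java
final class BsnWrapSketch {
    /** Increment, then mask to 31 bits so the value never goes negative. */
    static int nextBsn(int bsn) {
        return (bsn + 1) & 0x7FFFFFFF;
    }

    /**
     * Index of the last block before the BSN sequence drops,
     * or -1 if there are no blocks at all.
     */
    static int logicalEnd(int[] bsns) {
        if (bsns.length == 0) {
            return -1;
        }
        for (int i = 1; i < bsns.length; i++) {
            if (bsns[i] < bsns[i - 1]) {
                return i - 1; // the block at i is older data from a previous pass
            }
        }
        return bsns.length - 1;
    }

    public static void main(String[] args) {
        // Normal case: new blocks 101..103 followed by old blocks 4 and 5.
        System.out.println(logicalEnd(new int[] {101, 102, 103, 4, 5}));

        // After wrap-around the masked counter restarts at 0, so the scan
        // stops early even though blocks 0 and 1 are the newest data.
        int max = Integer.MAX_VALUE;
        System.out.println(nextBsn(max));
        System.out.println(logicalEnd(new int[] {max - 1, max, 0, 1, 900}));
    }
}
```

The second scan shows why the "largest BSN wins" rule needs an extra tie-breaker, such as the Time field, once the counter wraps.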
|
|
Re: Re: Re: Re: Re: Negative position error

Michael,

thank you for paying attention to this problem. The issue we (and probably other users who may have come across this problem) encounter is that once this problem occurs, the log becomes unusable, since HOWL within the application cannot even start because the replay listener throws an exception. Basically, at this time we do not have a choice and we have to delete/remove the log file and start from scratch. This may actually be happening quite often, since I was getting reports about our app (handling millions of transactions every day) having problems restarting (thankfully when individual servers are taken offline for maintenance, not due to failure) more frequently than I would expect, and the only solution was to remove the old logs.

Regarding your solution, one thing which concerns me about the timestamp approach is that time can change (e.g. DST, synchronization, etc.) and this may cause unexpected situations. Not pretending I know much about the internals of HOWL, have you considered recording in the log file header, during a log file switch, the first and last recorded BSN for the log file and the first BSN for the current use of the log file? During recovery you could use this information to distinguish the old records from the new ones: the old records are in between the first and last BSN for the previous use of the log file (here you have to account for the wrap-around, since first > last), and the new ones are the ones larger than the first written BSN for the current use of the log that are not in the previously mentioned range. The last log record satisfying these two conditions would therefore be the end of the log.

Hope you have a good vacation.

Miro
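
The range test proposed above needs care where the previous range wraps (first > last). A minimal sketch of that check follows, taking the two conditions literally as stated in the message. All names are hypothetical, nothing here exists in HOWL, and the sketch does not cover the case where the current use of the file itself wraps.

```java
final class BsnRangeSketch {
    /**
     * True if bsn lies in the inclusive range [first, last].
     * The range may wrap past the maximum BSN, in which case first > last.
     */
    static boolean inRange(int first, int last, int bsn) {
        if (first <= last) {
            return first <= bsn && bsn <= last;
        }
        return bsn >= first || bsn <= last; // wrapped range
    }

    /**
     * A record counts as "new" if its BSN is at least the first BSN of the
     * current use and it does not fall inside the previous use's range.
     */
    static boolean isNewRecord(int prevFirst, int prevLast, int currentFirst, int bsn) {
        return bsn >= currentFirst && !inRange(prevFirst, prevLast, bsn);
    }

    public static void main(String[] args) {
        int prevFirst = 2147483000; // previous use started near the top of the range
        int prevLast = 50;          // and wrapped around, so first > last
        int currentFirst = 51;

        System.out.println(isNewRecord(prevFirst, prevLast, currentFirst, 60));
        System.out.println(isNewRecord(prevFirst, prevLast, currentFirst, 10));
        System.out.println(isNewRecord(prevFirst, prevLast, currentFirst, 2147483500));
    }
}
```

The third call shows why the second condition matters: a leftover block with a very large BSN passes the "larger than first" test but is excluded because it falls in the previous range.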
|
|
Re: Re: Re: Re: Re: Negative position error

objectweb@... wrote on 12/01/2006 01:20:38 PM:

> Michael,
>
> thank you for paying attention to this problem. The issue we (and
> probably other users who may have come across this problem) encounter is
> that once this problem occurs, the log becomes unusable, since HOWL within
> the application cannot even start because the replay listener throws
> an exception. Basically, at this time we do not have a choice and we have
> to delete/remove the log file and start from scratch. This may actually
> be happening quite often,

I would be surprised if the frequency was very high. You have to write 2.1 billion blocks before the situation occurs. If each block required 1 millisecond to write, and I doubt there is any hardware that can achieve that, it would take 24 days for the problem to occur.

I agree that once it occurs, you are forced to start with new log files, so this is fairly serious. I'll figure something out.

> since I was getting reports about our app
> (handling millions of transactions every day) having problems restarting
> (thankfully when individual servers are taken offline for maintenance,
> not due to failure) more frequently than I would expect, and the only
> solution was to remove the old logs.

Once the situation occurs, this is the only solution.

> Regarding your solution, one thing which concerns me about the timestamp
> approach is that time can change (e.g. DST, synchronization, etc.) and
> this may cause unexpected situations.

The time stamp is System.currentTimeMillis(), so it is not affected by DST.

> Not pretending I know much about the
> internals of HOWL, have you considered recording in the log file header,
> during a log file switch, the first and last recorded BSN for the log file
> and the first BSN for the current use of the log file? During recovery you
> could use this information to distinguish the old records from the new
> ones: the old records are in between the first and last BSN for the
> previous use of the log file (here you have to account for the wrap-around,
> since first > last), and the new ones are the ones larger than the first
> written BSN for the current use of the log that are not in the previously
> mentioned range. The last log record satisfying these two
> conditions would therefore be the end of the log.

Since there have been no bugs reported in a while, I have not had to look at the code for a while. I'll have to get my head back into this before I feel comfortable saying yes or no to any ideas. The first requirement is to write a test case to reproduce this.

It sounds as if you can wait for the fix. Good.

> Hope you have a good vacation.
>
> Miro
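
The 24-day estimate above follows from the size of a signed 32-bit int. A small sketch of the arithmetic, assuming one block written per millisecond as in the message (the class and method names are illustrative only):

```java
final class RolloverEstimate {
    private static final double MILLIS_PER_DAY = 1000.0 * 60 * 60 * 24;

    /** Days until a counter with the given number of usable bits is exhausted. */
    static double daysToExhaust(int bits, double millisPerBlock) {
        double blocks = Math.pow(2, bits); // number of distinct non-negative values
        return blocks * millisPerBlock / MILLIS_PER_DAY;
    }

    public static void main(String[] args) {
        // A signed int has 31 usable bits: 2,147,483,648 blocks ("2.1 billion").
        System.out.printf("%.1f days%n", daysToExhaust(31, 1.0));
    }
}
```

At slower write rates the time scales linearly, so one block every 10 ms would stretch the same counter to roughly 250 days.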
|
|
Re: Re: Re: Re: Re: Negative position error

Miro,

I'm looking into a modification that should prevent this issue from occurring. I would like to check some info with you before I proceed.

The basic problem, as I have described, is that the integer BSN has rolled over on you. I was suggesting that if you write a new journal block every millisecond, then the rollover occurs every 24 days or so.

My thought is to increase the size of the BSN to 40 bits. (I cannot go to a full 64 bits because the keys returned by Logger.put include both a BSN and an offset within the block.) I'm currently reserving 24 bits for the offset; this could be reduced to 16 or 20, but let's look at what 40 bits does for us.

If we assume you are writing a journal block every millisecond continuously, 24/7, then a 40-bit BSN would roll over once every 34 years.

If you think this solution works for you, then I will start investigating the changes that need to be made.

Anyone else on the list who might be watching this is welcome to offer opinions.

Thanks,
Michael

In reply to: Miro Halas, 12/01/2006 01:22 PM, "Re: Re: Re: Re: Re: [howl] Negative position error"
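
A sketch of what a 64-bit key with a 40-bit BSN and a 24-bit offset could look like, together with the 34-year estimate. The bit layout (BSN in the high bits, offset in the low bits) and all names are assumptions made for illustration; the message does not say how HOWL actually lays out the key.

```java
final class LogKeySketch {
    static final int OFFSET_BITS = 24;
    static final long OFFSET_MASK = (1L << OFFSET_BITS) - 1;
    static final long MAX_BSN = (1L << 40) - 1;

    /** Pack a 40-bit BSN and a 24-bit offset into one long (assumed layout). */
    static long pack(long bsn, int offset) {
        if (bsn < 0 || bsn > MAX_BSN) {
            throw new IllegalArgumentException("BSN out of 40-bit range: " + bsn);
        }
        if (offset < 0 || offset > OFFSET_MASK) {
            throw new IllegalArgumentException("offset out of 24-bit range: " + offset);
        }
        return (bsn << OFFSET_BITS) | offset;
    }

    /** Unsigned shift, because keys with the top BSN bit set are negative longs. */
    static long bsnOf(long key) {
        return key >>> OFFSET_BITS;
    }

    static int offsetOf(long key) {
        return (int) (key & OFFSET_MASK);
    }

    /** Years until a 40-bit BSN rolls over at the given write rate. */
    static double yearsToRollOver(double millisPerBlock) {
        double millisPerYear = 1000.0 * 60 * 60 * 24 * 365.25;
        return Math.pow(2, 40) * millisPerBlock / millisPerYear;
    }

    public static void main(String[] args) {
        long key = pack(123456789012L, 4096);
        System.out.println(bsnOf(key) + " / " + offsetOf(key));
        System.out.printf("%.1f years%n", yearsToRollOver(1.0));
    }
}
```

Shrinking the offset to 16 or 20 bits, as mentioned above, would leave 48 or 44 bits for the BSN and push the rollover out correspondingly further.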
|
|
Re: Re: Re: Re: Re: Negative position error

Miro,

I am on vacation till Jan 2, but I decided to look at this a little more.

I managed to reproduce your negative seek issue. This part of the problem is strictly related to journal files > 2 GB. You can avoid this problem with the current version of HOWL by changing the configuration to use files that will be < 2 GB each.

The issue with block sequence numbers is always detected by HOWL and results in an InvalidLogKeyException. This needs to be fixed as well, but it is not as severe as the large file issue.

I would suggest reducing the size of your files until I'm able to generate a fix. Not sure how much time I'll be able to put into this while on vacation 'cause the honeydo list is pretty long :)

Thanks for reporting this problem. I should be able to fix it quickly now that I have managed to generate a test case.

Michael

In reply to: Miro Halas, 12/01/2006 01:22 PM, "Re: Re: Re: Re: Re: [howl] Negative position error"
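
For the workaround above, the largest block count that keeps a file under 2 GB can be derived from the block size. A minimal sketch, assuming the file consists only of fixed-size blocks (any header overhead in real journal files is not accounted for, so leaving some margin would be prudent). The helper name is hypothetical; the resulting value is what would be passed to setMaxBlocksPerFile as in the configuration quoted earlier in the thread.

```java
final class FileSizeLimit {
    /** Largest block count whose total size still fits in a signed 32-bit int. */
    static int maxBlocksUnder2Gb(int blockSizeBytes) {
        if (blockSizeBytes <= 0) {
            throw new IllegalArgumentException("block size must be positive");
        }
        return Integer.MAX_VALUE / blockSizeBytes;
    }

    public static void main(String[] args) {
        int blockSizeBytes = 32695; // block size quoted earlier in the thread
        int limit = maxBlocksUnder2Gb(blockSizeBytes);
        System.out.println("keep maxBlocksPerFile at or below " + limit);
        System.out.println("resulting file size: " + (long) limit * blockSizeBytes);
    }
}
```

The configuration from the earlier message (20 * 100 * 100 = 200,000 blocks) is well above this limit, which matches the 2.82 GB file observed.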
|
|
Re: Re: Re: Re: Negative position error

Miro,

I'm cleaning up my inbox and noticed this message. Just in case you did not notice, I did issue an update that resolves this problem.

Michael

In reply to: Miro Halas, 12/01/2006 10:48 AM