[jira] Created: (JCR-1605) RepositoryLock does not work on NFS sometimes

[jira] Created: (JCR-1605) RepositoryLock does not work on NFS sometimes

by JIRA jira@apache.org

RepositoryLock does not work on NFS sometimes
---------------------------------------------

                 Key: JCR-1605
                 URL: https://issues.apache.org/jira/browse/JCR-1605
             Project: Jackrabbit
          Issue Type: Bug
            Reporter: Thomas Mueller
            Assignee: Thomas Mueller


The RepositoryLock mechanism currently used in Jackrabbit relies on FileLock. This doesn't work on some NFS file systems. It looks like only NFS version 4 and newer support locking. Older implementations may throw an IOException "No locks available", which means the NFS mount does not support byte-range locking.

I propose adding a second locking mechanism, together with a configuration option to select it. For example: <FileLocking class="acme" />. This second mechanism is a cooperative locking protocol that uses a background (watchdog) thread and relies only on regular file operations.
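
To make the proposal concrete, here is a minimal sketch of what a pluggable lock could look like, with the existing FileLock approach as one implementation. The interface name LockMechanism, its methods, and the class NioFileLockMechanism are assumptions made for illustration; they are not Jackrabbit API. Only the java.nio calls are real.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

/** Hypothetical plug-in point; the name and methods are assumptions. */
interface LockMechanism {
    void acquire() throws IOException;
    void release() throws IOException;
}

/** Locking via java.nio FileLock, the approach the report says fails on some NFS mounts. */
class NioFileLockMechanism implements LockMechanism {
    private final File file;
    private RandomAccessFile raf;
    private FileLock lock;

    NioFileLockMechanism(File file) {
        this.file = file;
    }

    @Override
    public void acquire() throws IOException {
        raf = new RandomAccessFile(file, "rw");
        try {
            // tryLock() returns null when another process holds the lock.
            // Per the report, an IOException such as "No locks available"
            // surfaces here on NFS mounts without lock support.
            lock = raf.getChannel().tryLock();
        } catch (IOException | RuntimeException e) {
            raf.close();
            raf = null;
            throw e;
        }
        if (lock == null) {
            raf.close();
            raf = null;
            throw new IOException("Already locked: " + file);
        }
    }

    @Override
    public void release() throws IOException {
        if (lock != null) {
            lock.release();
            lock = null;
        }
        if (raf != null) {
            raf.close();
            raf = null;
        }
    }
}
```

A second implementation of the same interface could then carry the cooperative protocol, and the configuration element would only need to name the class to instantiate.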



--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-1605) RepositoryLock does not work on NFS sometimes

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/JCR-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597165#action_12597165 ]

Julian Reschke commented on JCR-1605:
-------------------------------------

Does this need to be configurable? Wouldn't it be sufficient to catch the Exception and then fall back to the new approach?




[jira] Commented: (JCR-1605) RepositoryLock does not work on NFS sometimes

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/JCR-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597450#action_12597450 ]

Thomas Mueller commented on JCR-1605:
-------------------------------------

> Does this need to be configurable?

Yes, unless we only want to support the new mechanism.

> Wouldn't it be sufficient to catch the Exception and then fall back to the new approach?

No. The file system may not always throw an exception. The message "No locks available" sounds as if there is a limited number of locks, and the exception occurs once more locks are requested than are available. That would mean the new algorithm is used in some cases and the old one in others, which wouldn't work correctly.

The new mechanism I have in mind is a cooperative algorithm. This algorithm is already in use in the H2 database. It goes like this:

* When the lock file does not exist, it is created (using the atomic operation File.createNewFile). The process then waits briefly (20 ms) and checks the file again. If the file was changed during this time, the operation is aborted. This protects against a race condition in which one process deletes the lock file just after another has created it, and a third process creates the file again. The race cannot occur if there are only two writers.

* If the file can be created, a random number is written to it. A watchdog thread is then started that checks regularly (once every second by default) whether the file was deleted or modified by another (challenger) thread or process. Whenever that happens, the file is overwritten with the old data. The watchdog thread runs with high priority so that a change to the lock file does not go undetected even if the system is very busy. It nevertheless uses very few resources (CPU time), because it waits most of the time. As long as nobody challenges the lock, the watchdog only reads from the hard disk and does not write to it.

* If the lock file exists and was modified within the last 20 ms, the process waits for some time and checks again (up to 10 times). If the file is still being changed, an exception is thrown ("locked"). This eliminates race conditions with many concurrent writers. Otherwise, the file is overwritten with a new version (the challenge), and the thread waits for 2 seconds. If a watchdog thread is protecting the file, it will overwrite the change and this process will fail to lock. If there is no watchdog thread, the lock file will still be as written by this thread; in that case the file is deleted and atomically created again, the watchdog thread is started, and the file is locked.
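
The steps above can be sketched roughly as follows. This is a simplified illustration, not the H2 or Jackrabbit code: the class name CooperativeFileLock and its parameters are hypothetical, a UUID stands in for the random number, the challenge wait is taken as twice the watchdog interval, and the "modified within the last 20 ms, retry up to 10 times" step is left out for brevity.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.util.UUID;

/** Simplified sketch of the cooperative protocol; all names here are hypothetical. */
class CooperativeFileLock {
    private final File file;
    private final long settleMillis; // the short pause after writing ("20 ms")
    private final long watchMillis;  // watchdog interval ("once every second")
    private final String token = UUID.randomUUID().toString(); // stands in for the random number
    private volatile boolean held;
    private Thread watchdog;

    CooperativeFileLock(File file, long settleMillis, long watchMillis) {
        this.file = file;
        this.settleMillis = settleMillis;
        this.watchMillis = watchMillis;
    }

    void lock() throws IOException, InterruptedException {
        if (!file.createNewFile()) {
            // The file exists: challenge it and see whether a watchdog defends it.
            String challenge = "challenge-" + UUID.randomUUID();
            write(challenge);
            Thread.sleep(2 * watchMillis);
            if (!challenge.equals(read())) {
                throw new IOException("Locked: " + file);
            }
            // Nobody answered: treat the file as stale and recreate it atomically.
            Files.deleteIfExists(file.toPath());
            if (!file.createNewFile()) {
                throw new IOException("Locked: " + file);
            }
        }
        write(token);
        Thread.sleep(settleMillis);
        if (!token.equals(read())) {
            throw new IOException("Locked: " + file); // someone else got in between
        }
        held = true;
        startWatchdog();
    }

    void unlock() throws InterruptedException {
        held = false;
        if (watchdog != null) {
            watchdog.interrupt();
            watchdog.join();
            watchdog = null;
        }
        file.delete();
    }

    private void startWatchdog() {
        watchdog = new Thread(() -> {
            while (held) {
                try {
                    Thread.sleep(watchMillis);
                    // Read-only unless a challenger changed or removed the file.
                    if (held && !token.equals(read())) {
                        write(token);
                    }
                } catch (InterruptedException e) {
                    return;
                } catch (IOException e) {
                    // I/O error (possibly caused by unlock): the loop condition decides.
                }
            }
        }, "lock-watchdog");
        watchdog.setDaemon(true);
        watchdog.setPriority(Thread.MAX_PRIORITY);
        watchdog.start();
    }

    private String read() throws IOException {
        try {
            return new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
        } catch (NoSuchFileException e) {
            return ""; // a deleted file never matches a token or challenge
        }
    }

    private void write(String content) throws IOException {
        Files.write(file.toPath(), content.getBytes(StandardCharsets.UTF_8));
    }
}
```

With the values from the description, a caller would construct it as new CooperativeFileLock(lockFile, 20, 1000). A second instance calling lock() on the same file while the first holds it should fail after the challenge wait, because the first instance's watchdog restores its token.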





[jira] Commented: (JCR-1605) RepositoryLock does not work on NFS sometimes

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/JCR-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597907#action_12597907 ]

Alexander Klimetschek commented on JCR-1605:
--------------------------------------------

> The message "No locks available" sounds like there is a number of locks, and if more locks are used this exception occurs.

No, this error message means that the file system does not provide file locking at all. NFS 2 or 3 was in use, and neither has the feature. Only NFS 4 allows locking, and even then it's not recommended to rely on it.

Anyway, I would also be against an automatic mix of lock algorithms. For example, the current file locking might work perfectly on the NFS server, since it accesses the local file system directly, which will most probably provide file locking. Another Jackrabbit instance running as an NFS (2/3) client and accessing the same repository directory would get the exception "No locks available". If it then continued with another algorithm, it would not see the repository as locked and would start it.

> RepositoryLock does not work on NFS sometimes
> ---------------------------------------------
>
>                 Key: JCR-1605
>                 URL: https://issues.apache.org/jira/browse/JCR-1605
>             Project: Jackrabbit
>          Issue Type: Bug
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>
> The RepositoryLock mechanism currently used in Jackrabbit uses FileLock. This doesn't work on some NFS file system. It looks like only NFS version 4 and newer supports locking. Older implementations may throw a IOException "No locks available", which means the NFS does not support byte-range locking.
> I propose to add a second locking mechanism, and add a configuration option to use it. For example: <FileLocking class="acme" />. This second locking mechanism is a cooperative locking protocol that uses a background (watchdog) thread and only uses regular file operations.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-1605) RepositoryLock does not work on NFS sometimes

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/JCR-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599299#action_12599299 ]

Thomas Mueller commented on JCR-1605:
-------------------------------------

A configurable lock mechanism will also help support a 'file system less' repository (where no files are stored at all). Use cases: a database-based repository; an in-memory repository (for example, for testing).
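
As an illustration of the in-memory case, a lock that never touches the file system could be as small as the sketch below. The class InMemoryRepositoryLock and the idea of keying locks by a repository identifier are assumptions made for this example, and it only guards against double use within one JVM; a database-based repository would need a lock held in the database itself.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical lock for repositories without a home directory; names are assumptions. */
class InMemoryRepositoryLock {
    // One entry per locked repository, shared by all instances of this class.
    private static final Set<String> LOCKED = ConcurrentHashMap.newKeySet();

    private final String repositoryId;
    private boolean held;

    InMemoryRepositoryLock(String repositoryId) {
        this.repositoryId = repositoryId;
    }

    synchronized void acquire() {
        // add() returns false if another instance already registered this id.
        if (held || !LOCKED.add(repositoryId)) {
            throw new IllegalStateException("Repository already locked: " + repositoryId);
        }
        held = true;
    }

    synchronized void release() {
        if (held) {
            LOCKED.remove(repositoryId);
            held = false;
        }
    }
}
```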
