NTLM usrname/password failure after each 5 mins


Re: NTLM usrname/password failure after each 5 mins

by AsafM

After investigating the issue some more, I've found that the load test problem is not the synchronization issue I raised, but the hiccup you described.

The Hiccup as I've seen it
The transport is not used for 5 minutes because the load test keeps using the same users, so jCIFS serves their sessions from its cache. Once the transport performs the disconnect and then the connect, transport.server.encryptionKey changes. Since the encryptionKey is the challenge jCIFS returns to the browser in the type2 message, it is the heart of the problem.
The problem occurs when we send the challenge in the type2 message before the disconnect. While the browser is processing and preparing the type3 message using that challenge (let's call it A), the disconnect/connect occurs and we have a new encryption key / challenge that we use when communicating with the DC (let's call this challenge B). After the connect completes, we receive the type3 message, which was prepared using challenge A, and send it to the DC, which expects us to use challenge B. That's why we get the "bad username or password" exception.


Trying to solve it
I can detect this situation in the NtlmFilter if I save the challenge used in the type2 message on the session and, before processing the type3 message, compare it with the challenge currently being used in the transport.
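
A minimal sketch of that comparison, for illustration only. It is not jCIFS code: the class, the attribute name and the use of a plain Map in place of the HTTP session are all assumptions, and the current challenge is passed in as a byte array rather than read from the transport.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class ChallengeCheck {
    // Hypothetical session attribute name; not something jCIFS defines.
    static final String ATTR = "ntlm.type2.challenge";

    /** Remember a copy of the challenge that went out in the type2 message. */
    static void rememberChallenge(Map<String, Object> session, byte[] challenge) {
        session.put(ATTR, challenge.clone());
    }

    /**
     * True if no challenge was saved, or if the transport's current challenge
     * differs from the one the browser used to build its type3 message.
     */
    static boolean isStale(Map<String, Object> session, byte[] currentChallenge) {
        byte[] sent = (byte[]) session.get(ATTR);
        return sent == null || !Arrays.equals(sent, currentChallenge);
    }

    public static void main(String[] args) {
        Map<String, Object> session = new HashMap<>();
        byte[] a = {1, 2, 3, 4, 5, 6, 7, 8}; // challenge A, sent in type2
        byte[] b = {8, 7, 6, 5, 4, 3, 2, 1}; // challenge B, after reconnect
        rememberChallenge(session, a);
        System.out.println(isStale(session, a)); // false
        System.out.println(isStale(session, b)); // true
    }
}
```

This only detects the mismatch; what to do once it is detected is the open question.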

I've tried sending a 401 with WWW-Authenticate: NTLM to restart the process, this time with the right challenge, but it didn't work.

I've also tried sending a redirect (301) to the same resource to restart the NTLM handshake, but it fails with a circular redirect.

If the load test uses a small number of threads, the problem doesn't happen, since each type3 message is processed quickly enough, before the disconnect occurs.
If we load it with many threads, it always happens.

Guidance is needed
1. Can we somehow know how much time the socket has left before reaching the timeout? If we did, maybe we could use it when we prepare the type2 message. If we're too close to the disconnect, we could force it, or wait until it happens, and only then send the type2 message.

2. What are the implications of keeping it open indefinitely? Or just "keeping it alive" somehow?

3. Do you have any idea on how to handle this issue?


Thank you,

Asaf


On Mon, Jun 16, 2008 at 1:00 AM, Michael B Allen <ioplex@...> wrote:
On 6/15/08, AsafM <asaf.mesika@...> wrote:
>
>  Hi all,
>
>  I'm reviving a 2-year-old topic regarding load testing.
>  You can take a look at the entire thread of the discussion
>  http://www.nabble.com/NTLM-usrname-password-failure-after-each-5-mins-td5381546.html#a5391633
>  here
>
>  I'll start with a quick summary, and then shed lots of details to make it
>  clearer:
>  After the transport disconnects (due to socket timeout) and connects, the
>  first 10 or so attempts to authenticate against the DC fail with a bad
>  username/password error. After those failures, all attempts succeed.
>  I've gained some knowledge I'll now share, but I'm still missing some key
>  elements to figuring this out.
>
>  Load Testing Setup
>  110 threads, consistently accessing a protected resource on Tomcat, which
>  requires an NTLM authentication.
>  Each thread is using one user. For example: Thread-34 is logging in as user
>  TEST34.
>
>  The turn of events
>  1. The first thread accessing the resource sets up the session
>  (SmbSession.sessionSetup()), which blocks all other threads, since each
>  thread (user) needs to set up a session of its own.
>     The session setup runs Transport.connect(), creates a tree for the
>  default user (to enable SMB signing), and sends the SmbComSessionSetupAndX to
>  the DC for authentication.
>
>  2. Once the 1st session setup is done, all other threads follow, each
>  creating its own session, attached to one transport object (Transport-1
>  thread).
>
>  3. On the second iteration of the test threads, there's no need for session
>  setup. The session object is retrieved from the transport (it's cached
>  there).
>  Because of this cache, the transport socket sees no traffic.
>
>  4. After soTimeout (jcifs constant of 5 min), the loop() method of Transport
>  receives a SocketTimeoutException, and calls Transport.disconnect() which in
>  turn calls SmbTransport.doDisconnect().
>
>  5. The doDisconnect() logs off all sessions attached to the transport
>  object, closes down the socket and finally resets the digest property, which
>  is used to sign each request sent to the DC (this is set in the first
>  sessionSetup in SmbSession).
>
>     ** First Problem**
>  While disconnect() logs off the sessions, other threads were using them and
>  acting as-if the transport is connected.

It is ok for other threads to reference sessions. If there is no
activity on the socket then it should be possible to close the
sessions even if there are 100 threads constantly calling
SmbSession.logon().

But the "acting as-if the transport is connected" sounds suspicious.
When a transport is shutdown it should call logoff() on each session
which should call treeDisconnect() on each transport which should set
treeConnected = false. Then, if threads regain access to calling
SmbSession.logon() they should see treeConnected = false and the first
thread should reconnect the tree, re-logon the session and reconnect
the transport. Then subsequent threads see treeConnected and you're
back in the steady-state.

>  I've bypassed this issue, by:
>  a) Setting the Transport.state to 0 in the Transport.disconnect() function.
>  This causes the Transport.connect() to actually connect.
>  b) Adding a synchronize (this) block on both disconnect() and connect()
>  methods, which prevents running connect() while disconnect() is commencing.

I don't understand this. The Transport.connect()/disconnect() methods
are already synchronized and the transport state is changed to 0 in
disconnect().

>  6. While disconnect() was running, all other threads were waiting in a queue
>  to run transport.connect() in the SmbTree.treeConnect() method.
>     Once the disconnect finished, each thread in turn ran the connect
>  and continued on to create a session by running SmbSession.sessionSetup().
>  Since that function is synchronized on transport(), sessions were created
>  one at a time, for each thread.
>
>  7. The first session to run the setup saw that transport.digest
>  was empty (due to SmbTransport.doDisconnect()), and so ran treeConnect for the
>  default username, used for SMB signing.
>  Once that finished successfully, it sent the SmbComSessionSetupAndX for
>  the user it was trying to authenticate.
>  It failed at the DC. SmbComSessionSetupAndXResponse returned with the error
>  code: Logon failure: unknown user name or bad password
>
>  8. A lot of the threads in line after the first one also failed at the exact
>  same spot in sessionSetup().

There is a known "hiccup" that occurs whenever the connection is
recycled due to the soTimeout. I don't know what the problem is. I
assume the challenge is momentarily wrong.

>  9. For some magical reason, which I've yet to figure out, after 10 or so
>  failures the DC started returning success in the
>  SmbComSessionSetupAndXResponse.

Is the NTLM challenge old? Log the hexdump of the NTLM challenge and
see if it changes with the result of the
SmbComSessionSetupAndXResponse. If it does that confirms that the
challenge isn't being handled properly. If it does not change and the
new challenge is being used correctly, but the DC is returning
different results given the same input then that would be very
interesting.
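
A small helper along these lines could produce the log lines for that comparison. It is only a sketch: the class and method names are made up for illustration, and where to call it from in jCIFS is left open.

```java
class ChallengeLog {
    /** Render a byte array as uppercase hex, two digits per byte. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            // Mask to 0..255 so negative bytes do not sign-extend.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** One log line pairing the challenge in use with the DC's answer. */
    static String logLine(byte[] challenge, String status) {
        return "challenge=" + toHex(challenge) + " status=" + status;
    }

    public static void main(String[] args) {
        byte[] challenge = {0x0A, (byte) 0xFF, 0x00, 0x7F};
        System.out.println(logLine(challenge, "LOGON_FAILURE"));
        // prints "challenge=0AFF007F status=LOGON_FAILURE"
    }
}
```

Logging one such line per session setup attempt makes it easy to see whether the failures line up with a change in the challenge.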

This is the best analysis of the "hiccup" bug that I've seen. Aside
from my comments, everything you say is true and is expected behavior.
The interesting parts are the "acting as-if the transport is
connected" bit and what the challenge is across the authentication
failure / success transition.

Mike

--
Michael B Allen
PHP Active Directory SPNEGO SSO
http://www.ioplex.com/


 


Re: NTLM usrname/password failure after each 5 mins

by Michael B Allen

On 6/24/08, Asaf Mesika <asaf.mesika@...> wrote:

> After investigating some more on the issue, I've found out that the load
> test problem is not in the synchronize issue I've raised, but it's in the
> hiccup you've described.
>
>  The Hiccup as I've seen it
>  The transport is not being used for 5 minutes, since the load test is using
> the same users, thus jCIFS is using the cache. Once the transport performs
> the disconnect and then the connect, the transport.server.encryptionKey
> changes. Since the encryptionKey is the challenge jCIFS returns to the
> browser on the type2 message, it is the heart of the problem.
>  The problem occurs when we send the challenge, before the disconnect, in
> the type2 message.

Aha. Interesting. Nice work!

Unfortunately this is a little difficult to fix because there is no
way to know if or when the HTTP client will submit hashes that use
that challenge. The HTTP client could get the challenge, sleep for an
hour, and then submit the type-3-message. Or it could get the
challenge and never submit hashes at all. There's no way to handle
both of those cases gracefully. It's just a limitation in trying to do
stateful authentication over a stateless protocol.

However, in practice there are idle periods where no clients are
authenticating and it is safe to disconnect the client and thereby
free associated resources. We just need to >>stop the transport from
disconnecting as long as there are outstanding challenges<<.

To implement this, add a "refcount" to jcifs.util.transport.Transport.
In loop(), detect the soTimeout case (I don't recall what type of
Exception this triggers) and do NOT call disconnect() if refcount > 0.

Then, in SmbSession.interrogate(), change trans.server.encryptionKey
to trans.getEncryptionKey() and in SmbTransport.getEncryptionKey(),
increment trans.refcount++. Also, in SmbSession.logon(), decrement
trans.refcount-- if it is > 0 after the challenge is used.

However, this will result in a net positive refcount value because
clients may not send the type-3-message and, again, in general there
are many things that can go wrong that prevent logon from ever being
called. So, you also need to decrement trans.refcount in another more
aggressive way. For example, you could decrement the refcount every
time you get the exception triggered by soTimeout but do not
disconnect. Meaning refcount will be decremented every time soTimeout
fires. That could be enough to offset any misbehaving clients. You
might also cap the refcount value at some high value that would almost
never be reached with HTTP clients that are behaving properly.

I'm just brainstorming at this point but basically you need to stop
the transport from disconnecting if there are outstanding challenges.
How you detect when there are outstanding challenges and when they
should no longer be counted is up to you.
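
As a rough illustration of the bookkeeping described above (not actual jCIFS code): the class below is a standalone counter with hypothetical method names and an arbitrary cap of 1000. Wiring it into Transport.loop(), SmbTransport.getEncryptionKey() and SmbSession.logon() is left out.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ChallengeRefCount {
    // Arbitrary cap, assumed for illustration; tune for real client behavior.
    private static final int CAP = 1000;
    private final AtomicInteger refcount = new AtomicInteger();

    /** Call when a challenge is handed out (the getEncryptionKey() step). */
    void challengeIssued() {
        refcount.updateAndGet(n -> Math.min(n + 1, CAP));
    }

    /** Call after logon has used the challenge; never drops below zero. */
    void challengeUsed() {
        refcount.updateAndGet(n -> Math.max(n - 1, 0));
    }

    /**
     * Call when the soTimeout fires. Returns true if no challenges are
     * outstanding and it is safe to disconnect. Otherwise decrements the
     * count (to age out clients that never reply) and returns false.
     */
    boolean onIdleTimeout() {
        while (true) {
            int n = refcount.get();
            if (n == 0) {
                return true;
            }
            if (refcount.compareAndSet(n, n - 1)) {
                return false;
            }
        }
    }

    int outstanding() {
        return refcount.get();
    }

    public static void main(String[] args) {
        ChallengeRefCount rc = new ChallengeRefCount();
        rc.challengeIssued();                   // type2 sent to client 1
        rc.challengeIssued();                   // type2 sent to client 2
        rc.challengeUsed();                     // client 1 sent its type3
        System.out.println(rc.onIdleTimeout()); // false: client 2 still pending
        System.out.println(rc.onIdleTimeout()); // true: count aged out to 0
    }
}
```

With this aging scheme a client that never answers delays the disconnect by one extra timeout period per outstanding challenge, which is the trade-off being accepted.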

Mike


--
Michael B Allen
PHP Active Directory SPNEGO SSO
http://www.ioplex.com/

Re: NTLM usrname/password failure after each 5 mins

by Kevin Tapperson

I ran into this same problem a few years ago. I implemented a refcount as described and it helped prevent this type of issue. I think I've still got the modified code lying around somewhere. If it helps, I could dig it up and provide a diff of the changes.

http://article.gmane.org/gmane.network.samba.java/4372/match=nt%5fstatus%5faccess%5fviolation





--
Kevin Tapperson
kevin@...
(615) 403-0817

Re: NTLM usrname/password failure after each 5 mins

by Vircos


We have noticed something similar but slightly different. We have pre-authentication enabled. When the first user connects, that user is authenticated fine. If a second user connects after the first, the second user is not able to authenticate until the first user's session has timed out (after 5 minutes).

We are using the latest jCIFS version with Apache/Tomcat 5.5 on a W2k3 server.
Any ideas?

Re: NTLM usrname/password failure after each 5 mins

by Matt Parker-3

>
> Actually accessing IPC$ is normal. I don't know what I was thinking
> before. I don't know what the problem is.
>
> Although I think the jcifs.smb.client.dfs.disabled property was added in 1.2.22.
>
> Mike

Did you ever get any insight into this? I'm now running into this
consistently. I'll post a test case later but thought I'd ask first.