CEE Taxonomy: Enumeration or Language?

heinbockel

While there is general agreement that log types need to be consistently
and accurately identified, there has been no agreement as to the best
way to pursue this need.


GOAL: To be able to unambiguously group logs based upon the
representative event type.


For example, if given a log file, I should be able to quickly identify
all logs dealing with events related to "authentication", "privilege
elevation", or "configuration changes". In order to more closely relate
regulatory and policy-level mandates to the actual logs, this is a
necessity for most (all?) users.


There are two ways that this can be handled:

1. Language -- every event can be described as a (possibly unknown)
SUBJECT performing an ACTION on an OBJECT. The process requires a
developer to select the most appropriate word choice from each of the
3 categories.


2. Enumeration -- provide a listing of all of the event types. Each
event matches to exactly one enumerated type.

In the most recent draft (v2.0) of XDAS, there is a multi-level
dotted-notation (similar to SNMP OIDs) to enumerate events. The first
level is the registry id, then the provider id, followed by the "event
space" (category), and finally the singleton event. For instance,
"0.0.7.0" identifies the "Create association with data item" event type
in the "Data Item or Resource Element Content Access Events" category,
as provided by OpenGroup and defined in the OpenGroup registry. Other
examples of the current categories are "Account Management Events",
"Trust Management Events", and "Peer Association Management Events",
consisting of singleton event types such as "Backup datastore", "Invoke
service or application", "Create account".
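
To make the dotted notation concrete, here is a minimal Python sketch
that splits an identifier such as "0.0.7.0" into the four levels
described above. The names EventId, parse_event_id, and the field names
are illustrative assumptions, not taken from the XDAS draft.

```python
from typing import NamedTuple


class EventId(NamedTuple):
    """Four-level identifier; field names are illustrative, not from XDAS."""
    registry: int
    provider: int
    event_space: int  # the category
    event: int        # the singleton event within the category


def parse_event_id(text: str) -> EventId:
    """Split a dotted identifier such as "0.0.7.0" into its four levels."""
    parts = text.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dotted levels, got {len(parts)}: {text!r}")
    return EventId(*(int(p) for p in parts))
```

With this, parse_event_id("0.0.7.0").event_space gives the category
level (7), which is all a consumer needs in order to group by category.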

(Both approaches agree there must be some expression of the result.)

Now, each of these approaches has merit. What this discussion breaks
down to is that a language, like OVAL, better captures nuances and
allows for more flexibility. An enumeration (e.g., CWE, CVE) is more
precise, requires more "well-defined" boundaries, and is better for
computers.

The CEE Taxonomy problem can be solved with either a language or an
enumeration-based approach. With past standardization efforts, MITRE has
used use-cases as the primary driver. With CVE, the primary use was the
differentiation of vulnerabilities, for which an enumeration works
really well. For OVAL, there are too many different ways of
validation/verification across platforms for an enumeration to be
practical, so a language was the better fit.


Just something to put some thought into over the holiday.
I am interested in hearing any feedback from this group as to which you
think is more appropriate for expressing the type of log.


William Heinbockel
Infosec Engineer, Sr.
The MITRE Corporation
202 Burlington Rd. MS S145
Bedford, MA 01730
heinbockel@...
781-271-2615




Sanford Whitehouse
Re: CEE Taxonomy: Enumeration or Language?
1.  Language.

a.  Enumeration can change.

Logs written with enumerations are subject to keeping track of
enumeration dictionary changes.  Each message will need to have a
version, or some other form of relating the dictionary to the log or
message.  In SNMP, the OID fields have a hierarchy that has been
constant for quite a while.  But use of fields, past the enterprise
identifier, is up to the vendor.  It's mapped to the MIB.  They can use
it any way they like.  (Correct me if I'm wrong.  It's been a while...)

That's not the case here.  The taxonomy definition will have a set of
defined fields, and values for those fields that will, presumably, remain
constant.  There will be changes to the definitions until a balance has
been found.  Changes to enumeration will change archived logs.

b.  The first level users are administrators (system and product.)  The
logs they read should be understandable without translation.  

Administrators and users are not fond of enumeration of any kind, let
alone one that would limit them to guessing what has been logged.

2.  Where can enumeration apply?  At the level where data is exchanged
between log consumers.  

a.  It's a useful shorthand for keeping traffic down and structuring the
representation.  However, I wouldn't use it for long term archival.
Keeping track of dictionaries and log versions is too complex.

b.  Cases where event descriptions, and other values, are enumerated
are usually in databases and database applications, or complex logging
environments such as i5/OS or z/OS.  It's a natural for them, and very
useful for reporting.  All of those have an environment or
infrastructure that supports use.  One can't use the logs or log access
tools without the environment being complete and up to date.  When the
logs are moved out, the dictionaries have to move with them.  It's a big
pain.

3.  If I can't get to it, it's not much use.

The basic argument is today I can read most logs with just about any
editor.  If taxonomy enumeration is used, that may not be possible.  I
like it simple when it comes to me and my logs.

Sanford

David Corlette
Re: CEE Taxonomy: Enumeration or Language?
In reply to this post by heinbockel
> GOAL: To be able to unambiguously group logs based upon the representative event type.


My 2c:

One of the great benefits we get from a multi-level taxonomy (which in Sentinel we've been applying for years by dint of lots of manual effort applied to the chaotic log messages we get from a wide variety of vendors) is the ability to filter and group things easily.  So for example I can say "show me all account management activity (i.e. 0.0.0.* in the current proposed XDAS taxonomy)" and "show me all data access activity (i.e. 0.0.7.*)".  If I only want file/table writes, then I can get more specific "show me only data writes (0.0.7.5)".

Whether this hierarchic taxonomy is expressed as dotted numbers or as words is probably unimportant; with XDAS we went with numbers because they are more compact and can be matched/filtered more easily (and when processing thousands of events per second this becomes important). But there's a one-to-one correspondence with verbs built into the taxonomy already, so conversion is trivial if you want to read the logs manually.

The point being, based on our experience we feel that a true hierarchic taxonomy is critical to a proper event standard.
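
As an illustration of the filtering described above, here is a minimal
Python sketch of level-wise matching against a pattern such as
"0.0.7.*". The function name matches is an assumption for illustration;
it is not how Sentinel or XDAS implements filtering.

```python
def matches(event_id: str, pattern: str) -> bool:
    """Return True if a dotted event id matches a pattern like "0.0.7.*".

    A "*" level matches any value at that level; a trailing "*" also
    matches everything below it in the hierarchy.
    """
    id_levels = event_id.split(".")
    pat_levels = pattern.split(".")
    for i, pat in enumerate(pat_levels):
        if pat == "*" and i == len(pat_levels) - 1:
            return len(id_levels) > i  # trailing wildcard: anything deeper matches
        if i >= len(id_levels) or (pat != "*" and pat != id_levels[i]):
            return False
    return len(id_levels) == len(pat_levels)


events = ["0.0.0.1", "0.0.7.0", "0.0.7.5"]
data_access = [e for e in events if matches(e, "0.0.7.*")]
```

Comparing level by level, rather than with a plain string prefix, keeps
"0.0.70.5" from being picked up by "0.0.7.*".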
heinbockel
Re: CEE Taxonomy: Enumeration or Language?

>-----Original Message-----
>From: David Corlette
>[mailto:DCorlette@...]
>Sent: Wednesday, 02 July 2008 15:34
>To: cee-discussion-list CEE-Related Discussion
>Subject: Re: [CEE-DISCUSSION-LIST] CEE
>Taxonomy: Enumeration or Language?
>
>GOAL: To be able to unambiguously group logs
>based upon the representative event type.
>
>
>My 2c:
>
>One of the great benefits we get from a multi-
>level taxonomy (which in Sentinel we've been
>applying for years by dint of lots of manual
>effort applied to the chaotic log messages we
>get from a wide variety of vendors) is the
>ability to filter and group things easily.  So
>for example I can say "show me all account
>management activity (i.e. 0.0.0.* in the
>current proposed XDAS taxonomy)" and "show me
>all data access activity (i.e. 0.0.7.*)".  If I
>only want file/table writes, then I can get
>more specific "show me only data writes
>(0.0.7.5)".
>
>Whether this hierarchic taxonomy is expressed
>as dotted numbers or as words is probably
>unimportant; with XDAS we went with numbers
>because they are more compact and can be
>matched/filtered more easily (and when
>processing thousands of events per second this
>becomes important). But there's a one-to-one
>correspondence with verbs built into the
>taxonomy already, so conversion is trivial if
>you want to read the logs manually.
Actually, I am not convinced there is a one-to-one correspondence with
the language verbs.

For example take something simple like authentication. Is it important
to distinguish in the taxonomy the difference between a user
authenticating to an operating system, a web service, or su/sudo? What
about things like SSO, where an application authenticates on your
behalf?


>
>The point being, based on our experience we
>feel that a true hierarchic taxonomy is
>critical to a proper event standard.

The problem here is that any "true hierarchic" taxonomy is based on a
single use case. With enumerations, it is easy to create a hierarchy.
By strictly limiting the scope to security audit, the case can be made
to support a security audit taxonomy.



Tina Bird
Re: CEE Taxonomy: Enumeration or Language?
Sanford's response incorporated much of what I was going to say. I strongly
prefer a language-based approach to an enumerative approach, primarily
because as a system administrator, I don't want to be dependent on some sort
of translator that takes me from a numeric event to words (a la SNMP).
Current log messages are a mess, as far as usability goes, but I'd still
rather see "login failed" (even without a failure reason, or a host IP
address) than anything that looks like an OID.

> For example take something simple like authentication. Is it important
> to distinguish in the taxonomy the difference between a user
> authenticating to an operating system, a web service, or su/sudo? What
> about things like SSO, where an application authenticates on your
> behalf?

It's both simpler *and* more complex than this (although it's a complexity I
think we can clarify). From the subject-verb-object point of view that Bill
described originally, the user is the *object*, not the subject. The user
provides credentials, but in each case, the *subject* (i.e. the entity that
is causing the log message to be created) is whichever application or system
process the user is trying to access: login on a UNIX box, the Local
Security Authority on a Windows system, or an application like sudo or a
database. So each of these authentication examples boils down to

subject: system process or application requiring authentication
verb: auth succeeded or auth failed
object: user (or process or application) from which authentication is
required

This is good, because it unifies how we format and interpret all
authentication events, no matter which entities (users, processes, apps) are
involved.
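
A minimal Python sketch of that unified shape, for illustration only.
The class name Event, the field name obj (used instead of "object" to
avoid shadowing the Python builtin), and the sample values are all
assumptions, not part of any CEE proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    subject: str  # process or application requiring the authentication
    verb: str     # e.g. "auth succeeded" or "auth failed"
    obj: str      # user, process, or application being authenticated


# Hypothetical sample events from three different kinds of subject.
events = [
    Event(subject="login", verb="auth failed", obj="alice"),
    Event(subject="sudo", verb="auth succeeded", obj="bob"),
    Event(subject="database", verb="auth failed", obj="backup-service"),
]

# Every authentication failure, regardless of which subject reported it.
failures = [e for e in events if e.verb == "auth failed"]
```

Because all three events share one shape, a single filter on the verb
finds the failures without knowing anything about login, sudo, or the
database.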

It's complex because it's unintuitive, as demonstrated by the familiar
phrase "the user authenticating" (which implies that the user is the subject
of the action, which is not true) or even worse, "log on to our Web site"
(for a site that does not require authentication), which usually means "use
your Web browser to access our nifty content" and has nothing to do with
authentication whatsoever.

Using a subject-verb-object model will require us to provide very clear and
specific instructions on how to *identify* the subject and object correctly,
but that's Merely a Matter of Documentation :-) I'm especially fond of it
because using this format strongly encourages programmers, analysts and
whoever else ends up influencing logging to think rigorously about the
"workflow" for the given event. [I was going to say "forces" rather than
"strongly encourages," but then common sense kicked in.]

> The problem here is that any "true hierarchic" taxonomy is based on a
> single use case. With enumerations, it is easy to create a hierarchy.
> By strictly limiting the scope to security audit, the case can be made
> to support a security audit taxonomy.

I think I agree with this, if I understand what Bill is saying. Defining a
single hierarchy that will incorporate all the various types of logs out
there seems, uh, implausible. But for particular situations -- credit card
transactional data, user management, system and application updates -- we
can probably provide meaningful hierarchies.

cheers -- tbird
Tina Bird
Re: CEE Taxonomy: Enumeration or Language?
One small correction:

> So each of these authentication examples boils down to
>
> subject: system process or application requiring authentication
> verb: auth succeeded or auth failed
> object: user (or process or application) from which authentication is
> required

It's more precise to say that the subject is the process or application
requesting or mediating the authentication...
David Corlette
Re: CEE Taxonomy: Enumeration or Language?
In reply to this post by Tina Bird
> Current log messages are a mess, as far as usability goes, but I'd still
> rather see "login failed" (even without a failure reason, or a host IP
> address) than anything that looks like an OID.

I think it's important to note that this is only one particular use case.  Most of our customers are far more interested in automated processing and analysis than in spending their time poring through logs manually (although of course the latter is necessary on occasion).

>> Is it important to distinguish in the taxonomy the
>> difference between a user authenticating to an
>> operating system, a web service

I say no, the action is still an authentication, but the "target" is different.


>> su/sudo?

We call this privilege escalation, which is fundamentally different, although maybe "escalation" makes too many assumptions.


>> What about things like SSO, where an application authenticates on
>> your behalf?

Still authentication. The act of creating an authorized session is a separate event in my opinion.


> It's both simpler *and* more complex than this (although it's a complexity I
> think we can clarify). From the subject-verb-object point of view that Bill
> described originally, the user is the *object*, not the subject. The user
> provides credentials, but in each case, the *subject* (i.e. the entity that
> is causing the log message to be created) is whichever application or system
> process the user is trying to access: login on a UNIX box, the Local
> Security Authority on a Windows system, or an application like sudo or a
> database. So each of these authentication examples boils down to
>
> subject: system process or application requiring authentication
> verb: auth succeeded or auth failed
> object: user (or process or application) from which authentication is
> required

The semantic confusion caused by the above we find to be disheartening. This is a separate topic from taxonomy, but XDAS defines three different "objects" that relate to any event:

Initiator: The user, service, and/or system that *causes* an event to occur
Target: The user, service, system, trust, or data object that is *affected* by an event
Observer: The service or system that *detected* the event and generates a log message reporting that fact

The above covers the authentication case easily even if there's a third-party authentication engine involved and if the event is actually reported by an IDS or similar system.
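
To illustrate the three roles, here is a minimal Python sketch. The
class name AuditEvent, its field names, and the sample values are
assumptions made for this example; they are not the XDAS record format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEvent:
    action: str
    initiator: str  # who or what caused the event
    target: str     # who or what was affected by it
    observer: str   # who or what detected and reported it


# The same authentication attempt, reported once by the host service
# and once by an IDS (all values are hypothetical).
reports = [
    AuditEvent("authenticate", initiator="workstation-12",
               target="account:alice", observer="sshd"),
    AuditEvent("authenticate", initiator="workstation-12",
               target="account:alice", observer="ids-1"),
]

# Grouping on (action, initiator, target) shows both reports describe one event.
distinct = {(r.action, r.initiator, r.target) for r in reports}
```

Keeping the observer separate from the initiator and target is what
lets the two reports collapse into a single underlying event.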


> It's complex because it's unintuitive, as demonstrated by the familiar
> phrase "the user authenticating" (which implies that the user is the subject
> of the action, which is not true) or even worse, "log on to our Web site"
> (for a site that does not require authentication), which usually means "use
> your Web browser to access our nifty content" and has nothing to do with
> authentication whatsoever.

I don't find this one particularly complex if put in the context of the above structure. If an actual person is authenticating, then that person is the Initiator, but of course real people aren't really represented in computerese (except in Identity Management systems, which we can discuss later).  It should be obvious that that real person is attempting to access an account, which is therefore the Target of the event.

I think the confusion comes about simply because people use the term "user" loosely to represent the actual person, the account, and so forth. If we simply define our terms carefully, the confusion disappears.


> Using a subject-verb-object model will require us to provide very clear and
> specific instructions on how to *identify* the subject and object correctly,
> but that's Merely a Matter of Documentation :-)

I agree with this exactly, I believe we're saying the same thing.  So it just becomes a matter of definition.


>> The problem here is that any "true hierarchic" taxonomy is based on a
>> single use case. With enumerations, it is easy to create a hierarchy.
>> By strictly limiting the scope to security audit, the case can be made
>> to support a security audit taxonomy.
>
> I think I agree with this, if I understand what Bill is saying. Defining a
> single hierarchy that will incorporate all the various types of logs out
> there seems, uh, implausible. But for particular situations -- credit card
> transactional data, user management, system and application updates -- we
> can probably provide meaningful hierarchies.

I think I agree with this too, but the way this is stated sounds a bit limiting. What we have found in working with and taxonomizing event logs from many vendors over many years is that *most* events fall into pretty obvious taxonomic categories that are very useful for a quite broad set of use cases including most forensic analysis, enterprise reporting, compliance, etc.  In many cases the events that "don't fit" are really just a different viewpoint from the vendor - they could easily be rewritten into an equally valid form that would conform to an event standard.

On the other hand, there are always use cases where the events that are produced really don't quite fit the model that we've set up.  In those cases I think it might be perfectly valid to simply come up with a different model.  So for example imagine we have three event models:

1) One model that expresses interactions between domain objects using the Initiator, Target, Observer, Action described above, e.g. XDAS
  - this would cover the vast majority of compliance, enterprise reporting, and most forensic use cases

2) Another model that covers "current state" events, i.e. how much bandwidth, disk space, what state a variable is in, etc
  - This would cover many operational use cases where you want to track statistics and such

3) A third model which covers "debug" events - e.g. stack traces and component failures and that sort of thing
  - This would be more for debug and deep forensic analysis

I don't see a reason why three very simple models couldn't cover virtually all the use cases we are aware of and be flexible enough to adapt to new ones.  But I think trying to force-fit one model from the above list into one of the other models may be tricky, which is what we seem to be trying to do.  If the models follow the same basic expressive structure, a simple flag could tell us which model was in use, and therefore how to parse it (and therefore the overarching model could be extensible).  Finally, the transport and other recommendations that are being defined in CEE could simply say "what you transport over this mechanism is one of the defined event models" (where the definition of those models comes from XDAS and possibly other standards).  So CEE becomes "IP" and XDAS, debug, etc become "UDP", "TCP", etc  ;-)
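
The "simple flag" idea can be sketched in Python as a dispatch table.
This is only an illustration: the flag names ("audit", "state",
"debug"), the line layouts, and every function name below are made up
for the example and are not defined by CEE or XDAS.

```python
from typing import Callable, Dict


def parse_audit(body: str) -> dict:
    # Interaction model: "initiator|action|target|observer" (made-up layout).
    initiator, action, target, observer = body.split("|")
    return {"initiator": initiator, "action": action,
            "target": target, "observer": observer}


def parse_state(body: str) -> dict:
    # "Current state" model: "name=value" (made-up layout).
    name, value = body.split("=", 1)
    return {"name": name, "value": value}


def parse_debug(body: str) -> dict:
    # Debug model: free text such as a stack trace, kept as-is.
    return {"text": body}


PARSERS: Dict[str, Callable[[str], dict]] = {
    "audit": parse_audit,
    "state": parse_state,
    "debug": parse_debug,
}


def parse_event(line: str) -> dict:
    """Read the leading model flag, then hand the rest to that model's parser."""
    model, _, body = line.partition(" ")
    if model not in PARSERS:
        raise ValueError(f"unknown event model: {model!r}")
    return {"model": model, **PARSERS[model](body)}
```

Adding a fourth model then means adding one parser and one table entry,
which is the kind of extensibility the overarching framework would need.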


The overall message here is that what I saw at the SIG was mostly people raising exceptions that wouldn't fit in the proposed models.  To which I say, let's make the model simpler, but have more models (within a common framework), rather than making a ridiculously complex model that no one can understand.

Thoughts?
John Calcote-2
Re: CEE Taxonomy: Enumeration or Language?

David Corlette wrote:

> ...
> The above covers the authentication case easily even if there's a
> third-party authentication engine involved and if the event is
> actually reported by an IDS or similar system.
>
>> It's complex because it's unintuitive, as demonstrated by the
>> familiar phrase "the user authenticating" (which implies that the
>> user is the subject of the action, which is not true) or even worse,
>> "log on to our Web site" (for a site that does not require
>> authentication), which usually means "use your Web browser to access
>> our nifty content" and has nothing to do with authentication
>> whatsoever.
>> ...

It seems to me there are a couple of desirable goals being discussed here:

1) Administrators like to read human language based log messages.
2) Taxonomy must be well-defined so there's no confusion about what is
meant by any given message.

But on further analysis, these two goals are VERY difficult to
reconcile. Human language introduces ambiguity -- almost by definition
-- but numeric or enumerated (whether hierarchical or flat) taxonomy is
difficult to read by an administrator, without the aid of translation tools.

Now, let's look at our motivation for these two goals:

1) Why use text-based values? Because we want to be able to look at a
log without the aid of translation tools. Are there any other reasons? I
can't really conceive of them.

2) Why define a standardized taxonomy? Because we want the entire world
to understand that a given event is a proxy-authentication-by-a-service
event, regardless of the event source.

XDAS (v2.0) chose the hierarchical approach because it's very flexible,
allowing entire sub-taxonomies to be inserted where required.
Additionally, as David mentioned, it's very easy to parse. Given the
explosive growth in volume of events to be processed that we will no
doubt see in the near future (we've already seen some of this growth),
this is VERY important for analysis system scalability. And that
explosive growth creates the very need to target support for such
automated analysis systems.

Another important benefit of a hierarchical taxonomy is that it allows
for future refinement. We can do our best to define a "global" taxonomy,
but we will always make someone angry at our lack of foresight, with
respect to their particular use cases. I picture it as an ongoing
incremental process, allowing individual organizations to introduce
corrections into their own systems by hooking into the existing standard
taxonomy at reasonable points, and then approaching future standards
committee meetings with these changes when they feel they've matured
enough to be accepted by the community.

To use the example already mentioned in this thread, suppose we provide
a standard "authentication" event, and later (probably shortly after the
standard is released), some group feels that there really should be
various sub-types of authentication. They define those sub-types as a
sub-hierarchy beneath the existing (now more generic) authentication event.

As with other ongoing standards processes, the committee members would
then review the proposed additions, modify them so they are more
palatable to the community, and then amend the existing standard with
updates that include a new sub-hierarchy beneath the existing
authentication event type. This is a well-understood -- and more
importantly -- a well-accepted methodology.

Existing analysis software continues to work (albeit, with less
event-type granularity for authentication events), but newer software
can then take advantage of the newer authentication event sub-classes.
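
The backward-compatibility argument can be sketched in Python as a
lookup that falls back to the nearest known ancestor. The function name
describe and the ids and labels in the two tables are invented for
illustration; they are not real XDAS identifiers.

```python
def describe(event_id: str, known: dict) -> str:
    """Return the label for event_id, falling back to the nearest known ancestor."""
    levels = event_id.split(".")
    while levels:
        label = known.get(".".join(levels))
        if label is not None:
            return label
        levels.pop()  # drop the most specific level and retry
    return "unknown event"


# Ids and labels below are invented for illustration only.
OLD_TAXONOMY = {"1.2": "authentication"}
NEW_TAXONOMY = {**OLD_TAXONOMY, "1.2.1": "authentication via SSO"}
```

Software that only knows OLD_TAXONOMY still reports "1.2.1" as plain
"authentication", while software with NEW_TAXONOMY sees the sub-type.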

Now, all of that said, I fully realize that CEE was founded on the very
key concept of language-based event logging. It's a really neat idea --
*IF* it can be done efficiently. Such a standard would necessarily have
to include VERY specific rules for how human language is interpreted by
log analysis engines. The amount of processing power required to
*properly* analyze such a log file would be tremendous.

And to what end? So administrators don't have to use a translation tool
when glancing at a log file, which is quite frankly a secondary form of
analysis in today's world.

For heaven's sake! A high-school student could write a filter in a
half-hour in Bourne shell script (using sed or awk) that would convert
OIDs to human-readable text - using his favorite vernacular, no less!

$ cat event.log | oids2text | more
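
For illustration, here is a rough sketch of what such an oids2text
filter could look like, written in Python rather than sed or awk. The
lookup table holds a single entry taken from the XDAS example earlier in
the thread; the function names and the regular expression are
assumptions for this sketch.

```python
import re

# Placeholder table; a real one would be generated from the published taxonomy.
OID_TEXT = {
    "0.0.7.0": "Create association with data item",
}

# A run of four dot-separated integers, not embedded in a longer dotted number.
OID_RE = re.compile(r"(?<![\d.])\d+(?:\.\d+){3}(?![\d.])")


def oids_to_text(line: str) -> str:
    """Replace each known id in a line with its text; leave unknown ids alone."""
    return OID_RE.sub(lambda m: OID_TEXT.get(m.group(0), m.group(0)), line)


def filter_stream(src, dst) -> None:
    """Copy src to dst line by line, translating known ids on the way."""
    for line in src:
        dst.write(oids_to_text(line))
```

Calling filter_stream(sys.stdin, sys.stdout) from a small script would
give the pipeline behaviour shown above. Note that anything shaped like
four dotted integers (an IPv4 address, say) is looked up too, and is
left unchanged unless it happens to be in the table.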

John

Sanford Whitehouse
Re: CEE Taxonomy: Enumeration or Language?
The root of the taxonomy is a message level schema.  It's a very simple
schema.  It presumes nothing about use.  Only describes the event.  Any
other use, above this root level, is the classification/categorization
aspect of the taxonomy.

The first level might look like ...
        action=login object=user
        action=login object=service

(Note:  this is not to suggest the form or names at this level.  It's
only an example for the language/enumeration discussion).

The classification level looks at all events with the same taxon values
or values that share a similar characteristic.  "login" might be
classified as an Authentication event.  Any Kerberos tickets and file
access controls can be put into an Authorization event classification.
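
A minimal Python sketch of the two levels, for the discussion only. It
parses the example messages above and then applies a classification
table. The function names and the table are assumptions; only the
"login" to Authentication mapping comes from the example, and
"ticket_granted" is a hypothetical action name.

```python
def parse_message(line: str) -> dict:
    """Turn 'action=login object=user' into {'action': 'login', 'object': 'user'}."""
    return dict(pair.split("=", 1) for pair in line.split())


# Classification table: which root-level actions fall under which class.
# Only "login" -> Authentication comes from the example; the rest is assumed.
CLASS_OF_ACTION = {
    "login": "Authentication",
    "ticket_granted": "Authorization",
}


def classify(message: dict) -> str:
    """Look up the class for a parsed message; unknown actions stay unclassified."""
    return CLASS_OF_ACTION.get(message.get("action", ""), "Unclassified")
```

The root-level message carries no class of its own; classification is a
separate lookup, so the table can change without touching archived logs.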

Categorization, by my definition (wholly incorrect), is anything someone
judges to fit within a group.  Risk management might treat system
monitoring and badge use at building entrances as the same thing.
It's up to them.

On to the language part.

At the root level, the action/object and other terms can be enumerations
or words.  Ultimately, enumerations have language equivalents.
Interpretation applies to both, with all the risks of transition and
existing localized definitions.  The method to address it is the same
one other professions use: a simple, unambiguous definition with little
or no overlap.

This has other issues, such as architectural or domain common use.  A
port in a shipping management system and a TCP/IP port use the same term.
(This also touches a number of other issues, such as the desired event
logging level.)

Anyway, the word and the enumeration are synonyms.  Words are easy to
read.

Sanford



 

-----Original Message-----
From: John Calcote [mailto:john.calcote@...]
Sent: Wednesday, July 02, 2008 5:23 PM
To: CEE-DISCUSSION-LIST@...
Subject: Re: [CEE-DISCUSSION-LIST] CEE Taxonomy: Enumeration or
Language?

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

David Corlette wrote:

> ...
> The above covers the authentication case easily even if there's a
> third-party authentication engine involved and if the event is
> actually reported by an IDS or similar system.
>
>> It's complex because it's unintuitive, as demonstrated by the
>> familiar phrase "the user authenticating" (which implies that the
>> user is the subject of the action, which is not true) or even worse,
>> "log on to our Web site" (for a site that does not require
>> authentication), which usually means "use your Web browser to access
>> our nifty content" and has nothing to do with authentication
>> whatsoever.
>> ...

It seems to me there are a couple of desirable goals being discussed
here:

1) Administrators like to read human language based log messages.
2) Taxonomy must be well-defined so there's no confusion about what is
meant by any given message.

But on further analysis, these two goals are VERY difficult to
reconcile. Human language introduces ambiguity -- almost by definition
- -- but numeric or enumerated (whether hierarchical or flat) taxonomy
is difficult to read by an administrator, without the aid of translation
tools.

Now, let's look at our motivation for these two goals:

1) Why use text-based values? Because we want to be able to look at a
log without the aid of translation tools. Are there any other reasons? I
can't really conceive of them.

2) Why define a standardized taxonomy? Because we want the entire world
to understand that a given event is a proxy-authentication-by-a-service
event, regardless of the event source.

XDAS (v2.0) chose the hierarchical approach because it's very flexible,
allowing entire sub-taxonomies to be inserted where required.
Additionally, as David mentioned, it's very easy to parse. Given the
explosive growth in volume of events to be processed that we will no
doubt see in the near future (we've already seen some of this growth),
this is VERY important for analysis system scalability. And that
explosive growth creates the very need to target support for such
automated analysis systems.

Another important benefit of a hierarchical taxonomy is that it allows
for future refinement. We can do our best to define a "global" taxonomy,
but we will always make someone angry at our lack of foresight, with
respect to their particular use cases. I picture it as an ongoing
incremental process, allowing individual organizations to introduce
corrections into their own systems by hooking into the existing standard
taxonomy at reasonable points, and then approaching future standards
committee meetings with these changes when they feel they've matured
enough to be accepted by the community.

To use the example already mentioned in this thread, suppose we provide
a standard "authentication" event, and later (probably shortly after the
standard is released), some group feels that there really should be
various sub-types of authentication. They define those sub-types as a
sub-hierarchy beneath the existing (now more generic) authentication
event.

As with other ongoing standards processes, the committee members would
then review the proposed additions, modify them so they are more
palatable to the community, and then amend the existing standard with
updates that include a new sub-hierarchy beneath the existing
authentication event type. This is a well-understood -- and more
importantly -- a well-accepted methodology.

Existing analysis software continues to work (albeit, with less
event-type granularity for authentication events), but newer software
can then take advantage of the newer authentication event sub-classes.
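
The backward-compatibility argument can be sketched in a few lines of Python: a tool that only knows the generic event type walks up the dotted identifier until it reaches an ancestor it recognizes. The identifier "0.0.2" and its label are placeholders invented for illustration, not real XDAS values, and `classify` is a hypothetical helper.

```python
# Taxonomy table known to an older analysis tool.
# "0.0.2" and its label are placeholders, not real XDAS identifiers.
KNOWN_TYPES = {
    "0.0.2": "authentication",
}


def classify(event_id, known=KNOWN_TYPES):
    """Return the label of the closest known ancestor of a dotted id, or None."""
    parts = event_id.split(".")
    while parts:
        label = known.get(".".join(parts))
        if label is not None:
            return label
        parts.pop()  # drop the most specific component and try again
    return None


# A newer source emits a sub-type the old tool has never seen.
print(classify("0.0.2.5"))  # falls back to the generic "authentication" type
print(classify("0.0.9.1"))  # no known ancestor, so None
```

Matching is done per component rather than by string prefix, so "0.0.20" is not mistaken for a child of "0.0.2".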

Now, all of that said, I fully realize that CEE was founded on the very
key concept of language-based event logging. It's a really neat idea --
*IF* it can be done efficiently. Such a standard would necessarily have
to include VERY specific rules for how human language is interpreted by
log analysis engines. The amount of processing power required to
*properly* analyze such a log file would be tremendous.

And to what end? So administrators don't have to use a translation tool
when glancing at a log file, which is quite frankly a secondary form of
analysis in today's world.

For heaven's sake! A high-school student could write a filter in a
half-hour in Bourne shell (using sed or awk) that would convert
OIDs to human-readable text - using his favorite vernacular, no less!

$ cat event.log | oids2text | more
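
A rough sketch of what such an `oids2text` filter might look like, written here in Python rather than sed or awk. The single table entry is the "0.0.7.0" example quoted at the start of this thread; the regular expression and the function names are assumptions made for illustration, and a real filter would load the full registry.

```python
import re
import sys

# Illustrative table; only this one pair is taken from the thread.
OID_NAMES = {
    "0.0.7.0": "Create association with data item",
}

# Four dot-separated integers that are not part of a longer dotted number.
OID_RE = re.compile(r"(?<![\d.])\d+(?:\.\d+){3}(?![\d.])")


def oids_to_text(line, names=OID_NAMES):
    """Replace each known OID in a line with its name; leave unknown ones alone."""
    return OID_RE.sub(lambda m: names.get(m.group(0), m.group(0)), line)


def main(stream_in=sys.stdin, stream_out=sys.stdout):
    """Filter a log stream line by line, as in the pipeline above."""
    for line in stream_in:
        stream_out.write(oids_to_text(line))


print(oids_to_text("|ACTION|0.0.7.0|"))  # |ACTION|Create association with data item|
```

Calling `main()` at the bottom of a script would let it sit in the middle of the pipeline shown above. Anything that merely looks like an OID (an IPv4 address, for instance) passes through untouched because it is not in the table.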

John

David Corlette
Re: CEE Taxonomy: Enumeration or Language?
In reply to this post by John Calcote-2
Actually I'm not sure we have a problem here at all.  Part of what we've discussed doing with XDAS is defining a number of expressive formats, with well-defined translations between them.

We were thinking about this in the context of JSON, XML, field-delimited, binary, and so forth, with the idea that:

...<initiator><account><name>dcorlette</name><id>130</id><domain>AD-DOMAIN</domain></account></initiator>...

is exactly equivalent to:

...{ Initiator: { account: { name: "dcorlette", id: 130, domain: "AD-DOMAIN" } } }...

is exactly equivalent to:

...|INIT|dcorlette|130|AD-DOMAIN|...


BUT - there's nothing to prevent us from defining equivalencies between compact and readable versions of certain normalized *data* fields too, so in the compact form you use:

...|ACTION|0.0.8.1|...

and in the verbose version you use:

...{ Action: { Taxonomy: "OpenGroup.XDAS.System.Shutdown" } }...


Since these translations would be pre-defined in the standard itself, one would imagine that most tools would provide trivial methods to convert between them - tools like what John mentioned or simply different "representations" depending on whether it's displayed, on the wire, automatically processed, stored in a DB, or stored in a text file.
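
To make the idea concrete, here is a minimal Python sketch of a two-way translation between the compact and verbose forms of the action field. The one OID/name pair is the example given above; the function names and the assumption that the compact field is exactly `|ACTION|<oid>|` are illustrative only, not part of XDAS.

```python
# The single pair given in the post; a real table would come from the standard.
OID_TO_NAME = {"0.0.8.1": "OpenGroup.XDAS.System.Shutdown"}
NAME_TO_OID = {name: oid for oid, name in OID_TO_NAME.items()}


def compact_to_verbose(field):
    """Turn '|ACTION|0.0.8.1|' into the nested, readable representation."""
    tag, oid = field.strip("|").split("|")
    if tag != "ACTION":
        raise ValueError("not an action field: " + field)
    return {"Action": {"Taxonomy": OID_TO_NAME[oid]}}


def verbose_to_compact(record):
    """Inverse of compact_to_verbose for the same field."""
    return "|ACTION|" + NAME_TO_OID[record["Action"]["Taxonomy"]] + "|"


verbose = compact_to_verbose("|ACTION|0.0.8.1|")
print(verbose)
print(verbose_to_compact(verbose))
```

Because both directions are driven by the same table, a round trip returns the original field unchanged, which is what "exactly equivalent" requires.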



Eric Fitzgerald
Re: CEE Taxonomy: Enumeration or Language?
In reply to this post by Tina Bird
I agree with you, Tina.

In general I would like to avoid the unqualified use of the term "user" in events.

I have identified several different use cases for a "User" in an event record:

Subject/Actor
  * primary [e.g. user account identity associated with a running process]
  * impersonated/on-behalf-of [e.g. on whose behalf a task is being performed]
  * caller-of-interface
Object/Target
  * user as account object
  * user as object of authentication/logon
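
One way to avoid the unqualified "user" is to make the role an explicit part of every user reference. The Python sketch below encodes the five use cases listed above as an enumeration; the type names, field names, and the sample accounts are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class UserRole(Enum):
    # Subject/Actor roles
    SUBJECT_PRIMARY = "subject/primary"
    SUBJECT_ON_BEHALF_OF = "subject/on-behalf-of"
    SUBJECT_INTERFACE_CALLER = "subject/caller-of-interface"
    # Object/Target roles
    OBJECT_ACCOUNT = "object/account"
    OBJECT_AUTHENTICATION = "object/authentication"


@dataclass(frozen=True)
class UserRef:
    """A user mentioned in an event record, always qualified by its role."""
    name: str
    role: UserRole


def users_in_role(refs, role):
    """Return the names of all users that appear in the given role."""
    return [ref.name for ref in refs if ref.role is role]


# Hypothetical logon event: a service account authenticates the user "joe".
event_users = [
    UserRef("svc-login", UserRole.SUBJECT_PRIMARY),
    UserRef("joe", UserRole.OBJECT_AUTHENTICATION),
]
print(users_in_role(event_users, UserRole.OBJECT_AUTHENTICATION))
```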



-----Original Message-----
From: Tina Bird [mailto:tbird@...]
Sent: Wednesday, July 02, 2008 2:15 PM
To: CEE-DISCUSSION-LIST@...
Subject: Re: [CEE-DISCUSSION-LIST] CEE Taxonomy: Enumeration or Language?

One small correction:

> So each of these authentication examples boils down to
>
> subject: system process or application requiring authentication
> verb: auth succeeded or auth failed
> object: user (or process or application) from which authentication is required

It's more precise to say that the subject is the process or application
requesting or mediating the authentication...
heinbockel
Re: CEE Taxonomy: Enumeration or Language?
In reply to this post by John Calcote-2

After having this discussion with several coworkers, I would like to
focus this conversation on two points.


Should CEE support:

1. Hierarchical/structured taxonomies

My opinion:
     Any structure applied to data is based on a particular view or
     use case. There are definite benefits to this, as both Dave and
     John have pointed out. However, without narrowing the scope of
     logs and CEE, I do not see how all event types can be represented
     in a universally applicable hierarchy, nor do I see any value in
     defining multiple hierarchies.



2. Enumerated taxonomy choices

My opinion:
     I think that CEE needs to provide some direction as to the words
     (subject, object, action) people should choose. Too many words
     are overloaded ('user', 'logon') and others have many synonyms
     ('logon', 'login', 'authentication', 'password accepted', etc.).
     I don't think it matters whether these are expressed in word
     lists or numbers/indices -- maybe this is a syntax-level
     declaration.

     However, what I do think is important is that there be:
     (1) a way to express event types not defined within the current
     'official' taxonomy, and
     (2) a way to express specific details/names relating to each word
     choice (e.g., 'account' == Joe, 'file' == /etc/passwd)
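
As a rough illustration of the synonym problem and of point (2), the Python sketch below collapses the synonyms listed above onto one canonical word and attaches the specific names as details. The canonical word "authenticate", the table, and the record layout are assumptions made for this example; CEE defines none of them here.

```python
# Hypothetical canonical vocabulary and synonym table.
CANONICAL_ACTIONS = {"authenticate"}
ACTION_SYNONYMS = {
    "logon": "authenticate",
    "login": "authenticate",
    "authentication": "authenticate",
    "password accepted": "authenticate",
}


def normalize_action(word):
    """Map a free-form action word to its canonical form, or None if unlisted."""
    key = " ".join(word.lower().split())  # fold case and collapse whitespace
    if key in CANONICAL_ACTIONS:
        return key
    return ACTION_SYNONYMS.get(key)


def make_event(action, **details):
    """Build a record that pairs the canonical action with specific names."""
    return {"action": normalize_action(action), "details": dict(details)}


event = make_event("Password Accepted", account="Joe", file="/etc/passwd")
print(event)
```

Whether the canonical value is stored as a word or as a number does not change this logic, which fits the view that the choice is a syntax-level one.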



Tina Bird
Re: CEE Taxonomy: Enumeration or Language?
Nicely worded, Bill. I concur. A couple of minor comments below:

> Should CEE support:
>
> 1. Hierarchical/structured taxonomies
>
> My opinion:
>      Any structure applied to data is based on a particular view or
>      use case. There are definite benefits to this, as both Dave and
>      John have pointed out. However, without narrowing the scope of
>      logs and CEE, I do not see how all event types can be represented
>      in a universally applicable hierarchy, nor do I see any value in
>      defining multiple hierarchies.

Perhaps what CEE should incorporate is a method for building these
hierarchies, to maximize the chances that users (and vendors) will create
hierarchies that other organizations would be able to use?

> 2. Enumerated taxonomy choices
>
> My opinion:
>      I think that CEE needs to provide some direction as to the words
>      (subject, object, action) people should choose. Too many words
>      are overloaded ('user', 'logon') and others have many synonyms
>      ('logon', 'login', 'authentication', 'password accepted', etc.).
>      I don't think it matters whether these are expressed in word
>      lists or numbers/indices -- maybe this is a syntax-level
>      declaration.

I agree.

>      However, what I do think is important is that there be:
>      (1) a way to express event types not defined within the current
>      'official' taxonomy, and
>      (2) a way to express specific details/names relating to each word
>      choice (e.g., 'account' == Joe, 'file' == /etc/passwd)

Since there's no way we'll be able to capture every kind of data that all
the humans, machines and software out there are likely to use, we *have* to
have a defined mechanism for defining types locally, whether or not they
would ever be added to the "official" list; as well as having some kind of
mechanism for nominating new event types to the official ones.

Item 2 suggests a specific "structure" that identifies attribute/value pairs
within a given dataset? That makes sense to me.
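
A minimal Python sketch of what a "defined mechanism for defining types locally" could look like: an official list that local definitions may extend but never override, with the local additions doubling as the list of candidates to nominate. The class, its method names, and the sample type names are hypothetical.

```python
class Taxonomy:
    """Official event types plus locally defined extensions."""

    def __init__(self, official):
        self._official = set(official)
        self._local = set()

    def define_local(self, name):
        """Add a site-specific type; official names cannot be redefined."""
        if name in self._official:
            raise ValueError(name + " is already an official type")
        self._local.add(name)

    def is_known(self, name):
        return name in self._official or name in self._local

    def nominations(self):
        """Local types that could be proposed for the official list."""
        return sorted(self._local)


taxonomy = Taxonomy({"authenticate"})   # placeholder official list
taxonomy.define_local("badge-swipe")    # placeholder local type
print(taxonomy.is_known("badge-swipe"), taxonomy.nominations())
```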

cheers -- tbird
Tina Bird
Re: CEE Taxonomy: Enumeration or Language?