|
|
heinbockel
|
While there is general agreement that log types need to be consistently and accurately identified, there has been no agreement on the best way to meet this need.

GOAL: To be able to unambiguously group logs based upon the representative event type.

For example, given a log file, I should be able to quickly identify all logs dealing with events related to "authentication", "privilege elevation", or "configuration changes". This is a necessity for most (all?) users, because it lets regulatory and policy-level mandates be related more closely to the actual logs.

There are two ways that this can be handled:

1. Language -- every event can be described as a (possibly unknown) SUBJECT performing an ACTION on an OBJECT. The process requires a developer to select the most appropriate word from each of the 3 categories.

2. Enumeration -- provide a listing of all of the event types. Each event matches exactly one enumerated type. In the most recent draft (v2.0) of XDAS, a multi-level dotted notation (similar to SNMP OIDs) is used to enumerate events. The first level is the registry id, then the provider id, followed by the "event space" (category), and finally the singleton event. For instance, "0.0.7.0" identifies the "Create association with data item" event type in the "Data Item or Resource Element Content Access Events" category, as provided by OpenGroup and defined in the OpenGroup registry. Other examples of the current categories are "Account Management Events", "Trust Management Events", and "Peer Association Management Events", consisting of singleton event types such as "Backup datastore", "Invoke service or application", "Create account".

(Both approaches agree that there must be some expression of the result.)

Now, each of these approaches has merit. What this discussion breaks down to is that a language, like OVAL, better captures nuances and allows for more flexibility. An enumeration (e.g., CWE, CVE) is more precise, requires more "well-defined" boundaries, and is better for computers.

The CEE Taxonomy problem can be solved with either a language or an enumeration-based approach. With past standardization efforts, MITRE has used use cases as the primary driver. With CVE, the primary use was the differentiation of vulnerabilities, for which an enumeration works really well. For OVAL, there are too many different ways of validation/verification across platforms, so a language was the better fit.

Just something to put some thought into over the holiday. I am interested in hearing any feedback from this group as to which you think is more appropriate for expressing the type of log.

William Heinbockel
Infosec Engineer, Sr.
The MITRE Corporation
202 Burlington Rd. MS S145
Bedford, MA 01730
heinbockel@...
781-271-2615
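To make the two options concrete, here is a minimal Python sketch of how each might represent an event type. The class names, field names, and the subject/action/object words are hypothetical and chosen only for illustration; the dotted identifier "0.0.7.0" and its category and event labels are the only values taken from the XDAS example in the message.

```python
from dataclasses import dataclass


# Approach 1 (language): an event type is a SUBJECT / ACTION / OBJECT triple.
# The words used below are invented; they are not a CEE vocabulary.
@dataclass(frozen=True)
class LanguageEventType:
    subject: str
    action: str
    obj: str


# Approach 2 (enumeration): an event type is exactly one dotted identifier,
# read as registry.provider.event_space.event, following the description of
# the XDAS v2.0 draft in the message.
@dataclass(frozen=True)
class EnumeratedEventType:
    registry: int
    provider: int
    event_space: int
    event: int

    @classmethod
    def parse(cls, dotted: str) -> "EnumeratedEventType":
        parts = dotted.split(".")
        if len(parts) != 4:
            raise ValueError(f"expected 4 levels, got {len(parts)}: {dotted!r}")
        return cls(*(int(p) for p in parts))

    def __str__(self) -> str:
        return f"{self.registry}.{self.provider}.{self.event_space}.{self.event}"


# The single entry quoted in the message; a real registry would list them all.
EVENT_NAMES = {
    "0.0.7.0": (
        "Data Item or Resource Element Content Access Events",
        "Create association with data item",
    ),
}

by_language = LanguageEventType(subject="user", action="authenticate", obj="host")
by_enumeration = EnumeratedEventType.parse("0.0.7.0")
category, name = EVENT_NAMES[str(by_enumeration)]

print(by_language)
print(by_enumeration, "->", category, "/", name)
```

The sketch shows the trade-off in miniature: the triple is readable on its own but depends on everyone choosing the same words, while the identifier is unambiguous but needs a lookup table to be read by a person.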
|||||||||||||||||
|
Sanford Whitehouse
|
1. Language.

a. Enumeration can change. Logs written with enumerations are subject to keeping track of enumeration dictionary changes. Each message will need to have a version, or some other form of relating the dictionary to the log or message.

In SNMP, the OID fields have a hierarchy that has been constant for quite a while. But use of fields, past the enterprise identifier, is up to the vendor. It's mapped to the MIB. They can use it any way they like. (Correct me if I'm wrong. It's been a while...)

That's not the case here. The taxonomy definition will have a set of defined fields, and values for those fields that will, presumably, remain constant. There will be changes to the definitions until a balance has been found. Changes to the enumeration will change how archived logs are read.

b. The first-level users are administrators (system and product). The logs they read should be understandable without translation. Administrators and users are not fond of enumeration of any kind, let alone one that would limit them to guessing what has been logged.

2. Where can enumeration apply? At the level where data is exchanged between log consumers.

a. It's a useful shorthand for keeping traffic down and structuring the representation. However, I wouldn't use it for long-term archival. Keeping track of dictionaries and log versions is too complex.

b. Event descriptions and other values are usually enumerated in databases and database applications, or in complex logging environments such as i5/OS or z/OS. It's a natural fit for them, and very useful for reporting. All of those have an environment or infrastructure that supports use. One can't use the logs or log access tools without the environment being complete and up to date. When the logs are moved out, the dictionaries have to move with them. It's a big pain.

3. If I can't get to it, it's not much use. The basic argument is that today I can read most logs with just about any editor. If taxonomy enumeration is used, that may not be possible.

I like it simple when it comes to me and my logs.

Sanford
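The concern about relating each message to a dictionary version can be sketched as follows. This is only an illustration under assumed names: the `dict_version` and `event` fields, the numeric codes, and their labels are all invented here and do not come from XDAS, SNMP, or any CEE draft.

```python
# Hypothetical enumeration dictionaries, keyed by dictionary version.
# Between versions 1 and 2 the meaning of code 12 was narrowed and a new
# code was added, which is the kind of change the message warns about.
DICTIONARIES = {
    1: {12: "login failed", 13: "login succeeded"},
    2: {
        12: "login failed (bad password)",
        13: "login succeeded",
        14: "login failed (unknown user)",
    },
}


def decode(record: dict) -> str:
    """Translate an enumerated record using the dictionary it was written with."""
    version = record["dict_version"]
    code = record["event"]
    if version not in DICTIONARIES:
        # An archived log is unreadable if its dictionary did not travel with it.
        raise LookupError(f"dictionary version {version} is not available")
    return DICTIONARIES[version].get(code, f"unknown event {code}")


archived = {"dict_version": 1, "event": 12}
current = {"dict_version": 2, "event": 12}

print(decode(archived))
print(decode(current))
```

The same code decodes to different text depending on the version, so every record (or at least every log file) has to carry the version, and every dictionary ever used has to stay available for as long as the logs are kept.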
|||||||||||||||||
|
David Corlette
|
In reply to this post by heinbockel
> GOAL: To be able to unambiguously group logs based upon the representative event type.

My 2c:

One of the great benefits we get from a multi-level taxonomy (which in Sentinel we've been applying for years by dint of lots of manual effort applied to the chaotic log messages we get from a wide variety of vendors) is the ability to filter and group things easily. So for example I can say "show me all account management activity (i.e. 0.0.0.* in the current proposed XDAS taxonomy)" and "show me all data access activity (i.e. 0.0.7.*)". If I only want file/table writes, then I can get more specific: "show me only data writes (0.0.7.5)".

Whether this hierarchic taxonomy is expressed as dotted numbers or as words is probably unimportant; with XDAS we went with numbers because they are more compact and can be matched/filtered more easily (and when processing thousands of events per second this becomes important). But there's a one-to-one correspondence with verbs built into the taxonomy already, so conversion is trivial if you want to read the logs manually.

The point being, based on our experience we feel that a true hierarchic taxonomy is critical to a proper event standard.
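The filtering described above can be sketched in a few lines of Python. This is an assumption-laden illustration, not how Sentinel or XDAS implements matching: the `matches` helper and the sample records are invented, and only the patterns "0.0.0.*", "0.0.7.*", and "0.0.7.5" come from the message.

```python
def matches(event_id: str, pattern: str) -> bool:
    """Return True if a dotted event id matches a dotted pattern.

    A "*" in the pattern matches any single level. Levels are compared as
    whole components, so "0.0.7.*" does not match "0.0.70.1".
    """
    id_parts = event_id.split(".")
    pattern_parts = pattern.split(".")
    if len(id_parts) != len(pattern_parts):
        return False
    return all(p == "*" or p == i for i, p in zip(id_parts, pattern_parts))


# Hypothetical log records: (event id, free-text note).
events = [
    ("0.0.0.1", "some account management event"),
    ("0.0.7.0", "create association with data item"),
    ("0.0.7.5", "data write"),
    ("0.0.70.1", "an unrelated event space"),
]

account_mgmt = [e for e in events if matches(e[0], "0.0.0.*")]
data_access = [e for e in events if matches(e[0], "0.0.7.*")]
data_writes = [e for e in events if matches(e[0], "0.0.7.5")]

print(account_mgmt, data_access, data_writes, sep="\n")
```

Comparing whole components rather than string prefixes is the one design choice worth noting: a plain `startswith("0.0.7")` test would wrongly include event space 70.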
||||
|
heinbockel
|
>-----Original Message-----
>From: David Corlette [mailto:DCorlette@...]
>Sent: Wednesday, 02 July 2008 15:34
>To: cee-discussion-list CEE-Related Discussion
>Subject: Re: [CEE-DISCUSSION-LIST] CEE Taxonomy: Enumeration or Language?
>
>GOAL: To be able to unambiguously group logs based upon the representative event type.
>
>My 2c:
>
>One of the great benefits we get from a multi-level taxonomy (which in Sentinel we've been applying for years by dint of lots of manual effort applied to the chaotic log messages we get from a wide variety of vendors) is the ability to filter and group things easily. So for example I can say "show me all account management activity (i.e. 0.0.0.* in the current proposed XDAS taxonomy)" and "show me all data access activity (i.e. 0.0.7.*)". If I only want file/table writes, then I can get more specific "show me only data writes (0.0.7.5)".
>
>Whether this hierarchic taxonomy is expressed as dotted numbers or as words is probably unimportant; with XDAS we went with numbers because they are more compact and can be matched/filtered more easily (and when processing thousands of events per second this becomes important). But there's a one-to-one correspondence with verbs built into the taxonomy already, so conversion is trivial if you want to read the logs manually.

[...] correspondence with the language verbs.

For example, take something simple like authentication. Is it important to distinguish in the taxonomy the difference between a user authenticating to an operating system, a web service, or su/sudo? What about things like SSO, where an application authenticates on your behalf?

>The point being, based on our experience we feel that a true hierarchic taxonomy is critical to a proper event standard.

The problem here is that any "true hierarchic" taxonomy is based on a single use case. With enumerations, it is easy to create a hierarchy. By strictly limiting the scope to security audit, the case can be made to support a security audit taxonomy.
|||||||||||||||||
|
Tina Bird
|
Sanford's response incorporated much of what I was going to say. I strongly prefer a language-based approach to an enumerative approach, primarily because as a system administrator, I don't want to be dependent on some sort of translator that takes me from a numeric event to words (a la SNMP). Current log messages are a mess, as far as usability goes, but I'd still rather see "login failed" (even without a failure reason, or a host IP address) than anything that looks like an OID.

> For example take something simple like authentication.
> Is it important to distinguish in the taxonomy the difference between a user authenticating to an operating system, a web service, or su/sudo? What about things like SSO, where an application authenticates on your behalf?

It's both simpler *and* more complex than this (although it's a complexity I think we can clarify). From the subject-verb-object point of view that Bill described originally, the user is the *object*, not the subject. The user provides credentials, but in each case, the *subject* (i.e. the entity that is causing the log message to be created) is whichever application or system process the user is trying to access: login on a UNIX box, the Local Security Authority on a Windows system, or an application like sudo or a database. So each of these authentication examples boils down to

subject: system process or application requiring authentication
verb: auth succeeded or auth failed
object: user (or process or application) from which authentication is required

This is good, because it unifies how we format and interpret all authentication events, no matter which entities (users, processes, apps) are involved.

It's complex because it's unintuitive, as demonstrated by the familiar phrase "the user authenticating" (which implies that the user is the subject of the action, which is not true) or even worse, "log on to our Web site" (for a site that does not require authentication), which usually means "use your Web browser to access our nifty content" and has nothing to do with authentication whatsoever.

Using a subject-verb-object model will require us to provide very clear and specific instructions on how to *identify* the subject and object correctly, but that's Merely a Matter of Documentation :-) I'm especially fond of it because using this format strongly encourages programmers, analysts and whoever else ends up influencing logging to think rigorously about the "workflow" for the given event. [I was going to say "forces" rather than "strongly encourages," but then common sense kicked in.]

> The problem here is that any "true hierarchic" taxonomy is based on a single use case. With enumerations, it is easy to create a hierarchy. By strictly limiting the scope to security audit, the case can be made to support a security audit taxonomy.

I think I agree with this, if I understand what Bill is saying. Defining a single hierarchy that will incorporate all the various types of logs out there seems, uh, implausible. But for particular situations -- credit card transactional data, user management, system and application updates -- we can probably provide meaningful hierarchies.

cheers -- tbird
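The subject-verb-object reading of authentication given in this message can be sketched as a single record type. The `AuthEvent` class, the `auth_event` helper, and the account names are hypothetical; the subject/verb/object roles and the example subjects (login, the Local Security Authority, sudo) follow the message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthEvent:
    subject: str  # process or application requiring authentication
    verb: str     # "auth succeeded" or "auth failed"
    obj: str      # user, process, or application being authenticated


def auth_event(subject: str, principal: str, success: bool) -> AuthEvent:
    """Build an authentication event with the roles assigned as in the message."""
    verb = "auth succeeded" if success else "auth failed"
    return AuthEvent(subject=subject, verb=verb, obj=principal)


# Three different authentication paths, one uniform shape.
events = [
    auth_event("login", "alice", True),
    auth_event("Local Security Authority", "bob", False),
    auth_event("sudo", "alice", True),
]

# Because the shape is uniform, one filter covers every source.
failed = [e for e in events if e.verb == "auth failed"]

for e in events:
    print(f"subject={e.subject!r} verb={e.verb!r} object={e.obj!r}")
```

Putting the role assignment inside one helper is a small way of doing what the message calls documentation: callers cannot accidentally make the user the subject.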
|||||||||||||||||
|
Tina Bird
|
One small correction:

> So each of these authentication examples boils down to
>
> subject: system process or application requiring authentication
> verb: auth succeeded or auth failed
> object: user (or process or application) from which authentication is required

It's more precise to say that the subject is the process or application requesting or mediating the authentication...
|||||||||||||||||
|
David Corlette
|
In reply to this post by Tina Bird
> Current log messages are a mess, as far as usability goes, but I'd still rather see "login failed" (even without a failure reason, or a host IP address) than anything that looks like an OID.

I think it's important to note that this is only one particular use case. Most of our customers are far more interested in automated processing and analysis than in spending their time poring through logs manually (although of course the latter is necessary on occasion).

>> Is it important to distinguish in the taxonomy the difference between a user authenticating to an operating system, a web service

I say no, the action is still an authentication, but the "target" is different.

>> su/sudo?

We call this privilege escalation, which is fundamentally different, although maybe "escalation" makes too many assumptions.

>> What about things like SSO, where an application authenticates on your behalf?

Still authentication. The act of creating an authorized session is a separate event in my opinion.

> It's both simpler *and* more complex than this (although it's a complexity I think we can clarify). From the subject-verb-object point of view that Bill described originally, the user is the *object*, not the subject. The user provides credentials, but in each case, the *subject* (ie. the entity that is causing the log message to be created) is whichever application or system process the user is trying to access: login on a UNIX box, the Local Security Authority on a Windows system, or an application like sudo or a database. So each of these authentication examples boils down to
>
> subject: system process or application requiring authentication
> verb: auth succeeded or auth failed
> object: user (or process or application) from which authentication is required

We find the semantic confusion caused by the above to be disheartening. This is a separate topic from taxonomy, but XDAS defines three different "objects" that relate to any event:

Initiator: The user, service, and/or system that *causes* an event to occur
Target: The user, service, system, trust, or data object that is *affected* by an event
Observer: The service or system that *detected* the event and generates a log message reporting that fact

The above covers the authentication case easily even if there's a third-party authentication engine involved and if the event is actually reported by an IDS or similar system.

> It's complex because it's unintuitive, as demonstrated by the familiar phrase "the user authenticating" (which implies that the user is the subject of the action, which is not true) or even worse, "log on to our Web site" (for a site that does not require authentication), which usually means "use your Web browser to access our nifty content" and has nothing to do with authentication whatsoever.

I don't find this one particularly complex if put in the context of the above structure. If an actual person is authenticating, then that person is the Initiator, but of course real people aren't really represented in computerese (except in Identity Management systems, which we can discuss later). It should be obvious that that real person is attempting to access an account, which is therefore the Target of the event. I think the confusion comes about simply because people use the term "user" loosely to represent the actual person, the account, and so forth. If we simply define our terms carefully, the confusion disappears.

> Using a subject-verb-object model will require us to provide very clear and specific instructions on how to *identify* the subject and object correctly, but that's Merely a Matter of Documentation :-)

I agree with this exactly; I believe we're saying the same thing. So it just becomes a matter of definition.

>> The problem here is that any "true hierarchic" taxonomy is based on a single use case. With enumerations, it is easy to create a hierarchy. By strictly limiting the scope to security audit, the case can be made to support a security audit taxonomy.
>
> I think I agree with this, if I understand what Bill is saying. Defining a single hierarchy that will incorporate all the various types of logs out there seems, uh, implausible. But for particular situations -- credit card transactional data, user management, system and application updates -- we can probably provide meaningful hierarchies.

I think I agree with this too, but the way this is stated sounds a bit limiting. What we have found in working with and taxonomizing event logs from many vendors over many years is that *most* events fall into pretty obvious taxonomic categories that are very useful for a quite broad set of use cases including most forensic analysis, enterprise reporting, compliance, etc. In many cases the events that "don't fit" are really just a different viewpoint from the vendor - they could easily be rewritten into an equally valid form that would conform to an event standard.

On the other hand, there are always use cases where the events that are produced really don't quite fit the model that we've set up. In those cases I think it might be perfectly valid to simply come up with a different model. So for example imagine we have three event models:

1) One model that expresses interactions between domain objects using the Initiator, Target, Observer, Action described above, e.g. XDAS - this would cover the vast majority of compliance, enterprise reporting, and most forensic use cases
2) Another model that covers "current state" events, i.e. how much bandwidth, disk space, what state a variable is in, etc - This would cover many operational use cases where you want to track statistics and such
3) A third model which covers "debug" events - e.g. stack traces and component failures and that sort of thing - This would be more for debug and deep forensic analysis

I don't see a reason why three very simple models couldn't cover virtually all the use cases we are aware of and be flexible enough to adapt to new ones. But I think trying to force-fit one model from the above list into one of the other models may be tricky, which is what we seem to be trying to do. If the models follow the same basic expressive structure, a simple flag could tell us which model was in use, and therefore how to parse it (and therefore the overarching model could be extensible).

Finally, the transport and other recommendations that are being defined in CEE could simply say "what you transport over this mechanism is one of the defined event models" (where the definition of those models comes from XDAS and possibly other standards). So CEE becomes "IP" and XDAS, debug, etc become "UDP", "TCP", etc ;-)

The overall message here is that what I saw at the SIG was mostly people raising exceptions that wouldn't fit in the proposed models. To which I say, let's make the model simpler, but have more models (within a common framework), rather than making a ridiculously complex model that no one can understand.

Thoughts?
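The "simple flag" idea, where a shared envelope says which of several small event models a record follows, can be sketched as a dispatch table. Everything named here is an assumption made for illustration: the `model` field, its three values, the parser functions, and the sample field names are not defined by XDAS or CEE. Only the Initiator / Target / Observer / Action roles and the three kinds of model come from the message.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class InteractionEvent:
    """Model 1: Initiator / Target / Observer / Action."""
    initiator: str  # who or what caused the event
    target: str     # who or what was affected by it
    observer: str   # who or what detected and reported it
    action: str


def parse_interaction(record: Dict[str, Any]) -> InteractionEvent:
    return InteractionEvent(
        initiator=record["initiator"],
        target=record["target"],
        observer=record["observer"],
        action=record["action"],
    )


def parse_state(record: Dict[str, Any]) -> Dict[str, Any]:
    """Model 2: a "current state" sample such as disk space or bandwidth."""
    return {"metric": record["metric"], "value": float(record["value"])}


def parse_debug(record: Dict[str, Any]) -> Dict[str, Any]:
    """Model 3: a debug event such as a stack trace or component failure."""
    return {"component": record["component"], "detail": record["detail"]}


# The flag: a hypothetical "model" field selects how the rest is parsed.
PARSERS: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "interaction": parse_interaction,
    "state": parse_state,
    "debug": parse_debug,
}


def parse_event(record: Dict[str, Any]) -> Any:
    model = record.get("model")
    if model not in PARSERS:
        raise ValueError(f"unknown event model: {model!r}")
    return PARSERS[model](record)


login = parse_event({
    "model": "interaction",
    "initiator": "dcorlette",
    "target": "account:dcorlette",
    "observer": "sshd",
    "action": "authenticate",
})
disk = parse_event({"model": "state", "metric": "disk_free_pct", "value": "12.5"})

print(login)
print(disk)
```

Adding a fourth model in this arrangement means adding one parser and one table entry, which is the extensibility the message is arguing for.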
|||||||||||||||||
|
John Calcote-2
|
David Corlette wrote:
> ...
> The above covers the authentication case easily even if there's a third-party authentication engine involved and if the event is actually reported by an IDS or similar system.
>
>> It's complex because it's unintuitive, as demonstrated by the familiar phrase "the user authenticating" (which implies that the user is the subject of the action, which is not true) or even worse, "log on to our Web site" (for a site that does not require authentication), which usually means "use your Web browser to access our nifty content" and has nothing to do with authentication whatsoever.
>> ...

It seems to me there are a couple of desirable goals being discussed here:

1) Administrators like to read human language based log messages.
2) Taxonomy must be well-defined so there's no confusion about what is meant by any given message.

But on further analysis, these two goals are VERY difficult to reconcile. Human language introduces ambiguity -- almost by definition -- but numeric or enumerated (whether hierarchical or flat) taxonomy is difficult for an administrator to read without the aid of translation tools.

Now, let's look at our motivation for these two goals:

1) Why use text-based values? Because we want to be able to look at a log without the aid of translation tools. Are there any other reasons? I can't really conceive of them.
2) Why define a standardized taxonomy? Because we want the entire world to understand that a given event is a proxy-authentication-by-a-service event, regardless of the event source.

XDAS (v2.0) chose the hierarchical approach because it's very flexible, allowing entire sub-taxonomies to be inserted where required. Additionally, as David mentioned, it's very easy to parse. Given the explosive growth in volume of events to be processed that we will no doubt see in the near future (we've already seen some of this growth), this is VERY important for analysis system scalability. And that explosive growth creates the very need to target support for such automated analysis systems.

Another important benefit of a hierarchical taxonomy is that it allows for future refinement. We can do our best to define a "global" taxonomy, but we will always make someone angry at our lack of foresight, with respect to their particular use cases. I picture it as an ongoing incremental process, allowing individual organizations to introduce corrections into their own systems by hooking into the existing standard taxonomy at reasonable points, and then approaching future standards committee meetings with these changes when they feel they've matured enough to be accepted by the community.

To use the example already mentioned in this thread, suppose we provide a standard "authentication" event, and later (probably shortly after the standard is released), some group feels that there really should be various sub-types of authentication. They define those sub-types as a sub-hierarchy beneath the existing (now more generic) authentication event. As with other ongoing standards processes, the committee members would then review the proposed additions, modify them so they are more palatable to the community, and then amend the existing standard with updates that include a new sub-hierarchy beneath the existing authentication event type. This is a well-understood -- and more importantly -- a well-accepted methodology. Existing analysis software continues to work (albeit with less event-type granularity for authentication events), but newer software can then take advantage of the newer authentication event sub-classes.

Now, all of that said, I fully realize that CEE was founded on the very key concept of language-based event logging. It's a really neat idea -- *IF* it can be done efficiently. Such a standard would necessarily have to include VERY specific rules for how human language is interpreted by log analysis engines. The amount of processing power required to *properly* analyze such a log file would be tremendous. And to what end? So administrators don't have to use a translation tool when glancing at a log file, which is quite frankly a secondary form of analysis in today's world.

For heaven's sake! A high-school student could write a filter in a half-hour in Bourne shell script (using sed or awk) that would convert OIDs to human-readable text - using his favorite vernacular, no less!

    $ cat event.log | oids2text | more

John
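The translation filter described above (the message suggests sed or awk) can be sketched in Python as well. This is an illustration under assumptions: `oids2text` is only named in the message and not shown, so the function name `oids_to_text`, the lookup table, and the sample log lines below are invented. The one table entry reuses the "0.0.7.0" example given earlier in the thread.

```python
import re

# Hypothetical lookup table; a real filter would load the full taxonomy.
NAMES = {
    "0.0.7.0": "Create association with data item",
}

# Four dot-separated numeric levels. Note that IPv4 addresses have the same
# shape; they are left untouched only because they are not in the table.
OID_PATTERN = re.compile(r"\b\d+(?:\.\d+){3}\b")


def oids_to_text(line: str, names: dict) -> str:
    """Replace every known dotted event id in a line with its readable name."""
    return OID_PATTERN.sub(lambda m: names.get(m.group(0), m.group(0)), line)


# Invented sample lines standing in for event.log.
sample_log = [
    "2008-07-02T17:23:00 host1 event=0.0.7.0 user=jdoe",
    "2008-07-02T17:23:05 host1 event=0.0.9.9 src=192.168.0.1",
]

for line in sample_log:
    print(oids_to_text(line, NAMES))
```

Unknown identifiers pass through unchanged, so the filter degrades gracefully when the table is older than the log, which is the dictionary-versioning situation raised earlier in the thread.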
|||||||||||||||||
|
Sanford Whitehouse
|
The root of the taxonomy is a message level schema. It's a very simple
schema. It presumes nothing about use. Only describes the event. Any other use, above this root level, is the classification/categorization aspect of the taxonomy. The first level might look like ... action=login object=user action=login object=service (Note: this is not to suggest the form or names at this level. It's only an example for the language/enumeration discussion). The classification level looks at all events with the same taxon values or values that share a similar characteristic. "login" might be classified as an Authentication event. Any Kerberos tickets and file access controls can be put into an Authorization event classification. Categorization, by my definition (wholly incorrect) is anything someone judges to fit within a group. Risk management might use system monitoring and badge use at building entrances to be the same thing. It's up to them. On to the language part. At the root level, the action/object and other terms can be enumerations or words. Ultimately, enumerations have language equivalents. Interpretation applies to both, with all the risks of transition and existing localized definitions. The method to address it is the same other professions use; a simple, unambiguous definition with little or no overlap. This has other issues, such as architectural or domain common use. A port for a shipping management system and TCP/IP port use the same term. (This also touches a number of other issues, such as the desired event logging level.) Anyway, the word and the numeration are synonyms. Words are easy to read. Sanford -----Original Message----- From: John Calcote [mailto:john.calcote@...] Sent: Wednesday, July 02, 2008 5:23 PM To: CEE-DISCUSSION-LIST@... Subject: Re: [CEE-DISCUSSION-LIST] CEE Taxonomy: Enumeration or Language? -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 David Corlette wrote: > ... 
> The above covers the authentication case easily even if there's a
> third-party authentication engine involved and if the event is
> actually reported by an IDS or similar system.
>
>> It's complex because it's unintuitive, as demonstrated by the
>> familiar phrase "the user authenticating" (which implies that the
>> user is the subject of the action, which is not true) or even worse,
>> "log on to our Web site" (for a site that does not require
>> authentication), which usually means "use your Web browser to access
>> our nifty content" and has nothing to do with authentication
>> whatsoever.
>> ...

It seems to me there are a couple of desirable goals being discussed here:

1) Administrators like to read human language based log messages.
2) Taxonomy must be well-defined so there's no confusion about what is meant by any given message.

But on further analysis, these two goals are VERY difficult to reconcile. Human language introduces ambiguity -- almost by definition -- but numeric or enumerated (whether hierarchical or flat) taxonomy is difficult to read by an administrator, without the aid of translation tools.

Now, let's look at our motivation for these two goals:

1) Why use text-based values? Because we want to be able to look at a log without the aid of translation tools. Are there any other reasons? I can't really conceive of them.

2) Why define a standardized taxonomy? Because we want the entire world to understand that a given event is a proxy-authentication-by-a-service event, regardless of the event source.

XDAS (v2.0) chose the hierarchical approach because it's very flexible, allowing entire sub-taxonomies to be inserted where required. Additionally, as David mentioned, it's very easy to parse. Given the explosive growth in volume of events to be processed that we will no doubt see in the near future (we've already seen some of this growth), this is VERY important for analysis system scalability. And that explosive growth creates the very need to target support for such automated analysis systems.

Another important benefit of a hierarchical taxonomy is that it allows for future refinement. We can do our best to define a "global" taxonomy, but we will always make someone angry at our lack of foresight, with respect to their particular use cases. I picture it as an ongoing incremental process, allowing individual organizations to introduce corrections into their own systems by hooking into the existing standard taxonomy at reasonable points, and then approaching future standards committee meetings with these changes when they feel they've matured enough to be accepted by the community.

To use the example already mentioned in this thread, suppose we provide a standard "authentication" event, and later (probably shortly after the standard is released), some group feels that there really should be various sub-types of authentication. They define those sub-types as a sub-hierarchy beneath the existing (now more generic) authentication event.

As with other ongoing standards processes, the committee members would then review the proposed additions, modify them so they are more palatable to the community, and then amend the existing standard with updates that include a new sub-hierarchy beneath the existing authentication event type. This is a well-understood -- and more importantly -- a well-accepted methodology.

Existing analysis software continues to work (albeit, with less event-type granularity for authentication events), but newer software can then take advantage of the newer authentication event sub-classes.

Now, all of that said, I fully realize that CEE was founded on the very key concept of language-based event logging. It's a really neat idea -- *IF* it can be done efficiently. Such a standard would necessarily have to include VERY specific rules for how human language is interpreted by log analysis engines. The amount of processing power required to *properly* analyze such a log file would be tremendous.

And to what end? So administrators don't have to use a translation tool when glancing at a log file, which is quite frankly a secondary form of analysis in today's world.

For heaven's sake! A high-school student could write a filter in a half-hour in Bourne shell script (using sed or awk) that would convert OIDs to human-readable text - using his favorite vernacular, no less!

    $ cat event.log | oids2text | more

John |
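[Editorial sketch] The kind of filter John describes, combined with his point that older software keeps working when sub-types are added, might look roughly like the following. This is only an illustration under assumptions: the pipe-delimited `|ACTION|<oid>|` layout is borrowed from the compact form used elsewhere in this thread, the two table entries are the only OID/name pairs the thread itself gives, and the function names and the "unrecognized subtype" wording are hypothetical.

```python
# Illustrative lookup table. Both entries come from examples in this thread;
# a real table would be generated from the published taxonomy.
OID_NAMES = {
    "0.0.7.0": "Create association with data item",
    "0.0.8.1": "OpenGroup.XDAS.System.Shutdown",
}


def oid_to_text(oid, names=OID_NAMES):
    """Name the longest known prefix of `oid`; return `oid` unchanged if none.

    Falling back to a parent is what lets an older table still describe an
    event whose sub-type was added to the hierarchy later.
    """
    parts = oid.split(".")
    for end in range(len(parts), 0, -1):
        prefix = ".".join(parts[:end])
        if prefix in names:
            suffix = ".".join(parts[end:])
            if suffix:
                # Hypothetical display convention for an unknown sub-type.
                return f"{names[prefix]} (unrecognized subtype {suffix})"
            return names[prefix]
    return oid


def translate_line(line, tag="ACTION", sep="|"):
    """Replace the field that follows `tag` with its readable name."""
    fields = line.rstrip("\n").split(sep)
    for i in range(len(fields) - 1):
        if fields[i] == tag:
            fields[i + 1] = oid_to_text(fields[i + 1])
    return sep.join(fields)


def translate_lines(lines):
    """Translate an iterable of log lines, yielding one output line each."""
    for line in lines:
        yield translate_line(line)
```

Feeding `sys.stdin` to `translate_lines` and printing each result would give a filter usable in a pipeline like the one above. Only the tagged field is rewritten, so other dotted values in a line (IP addresses, for instance) are left alone.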
|||||||||||||||||
|
David Corlette
|
In reply to this post by John Calcote-2
Actually I'm not sure we have a problem here at all. Part of what we've discussed doing with XDAS is defining a number of expressive formats, with well-defined translations between them.
We were thinking about this in the context of JSON, XML, field-delimited, binary, and so forth, with the idea that:

...<initiator><account><name>dcorlette</name><id>130</id><domain>AD-DOMAIN</domain></account></initiator>...

is exactly equivalent to:

...{ Initiator: { account: { name: "dcorlette", id: 130, domain: "AD-DOMAIN" } } }...

is exactly equivalent to:

...|INIT|dcorlette|130|AD-DOMAIN|...

BUT - there's nothing to prevent us from defining equivalencies between compact and readable versions of certain normalized *data* fields too, so in the compact form you use:

...|ACTION|0.0.8.1|...

and in the verbose version you use:

...{ Action: { Taxonomy: "OpenGroup.XDAS.System.Shutdown" } }...

Since these translations would be pre-defined in the standard itself, one would imagine that most tools would provide trivial methods to convert between them - tools like what John mentioned or simply different "representations" depending on whether it's displayed, on the wire, automatically processed, stored in a DB, or stored in a text file.

>>> On Wed, Jul 2, 2008 at 8:23 PM, in message <486C1BE7.9010506@...>, John Calcote <john.calcote@...> wrote:
> ... |
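[Editorial sketch] David's claim that the XML, JSON, and delimited forms are "exactly equivalent" amounts to saying each one round-trips to the same underlying record. A minimal illustration in Python, assuming the three layouts shown in his message; the function names are hypothetical, and the JSON here is strict JSON (quoted keys) rather than the abbreviated notation in the email.

```python
import json
import xml.etree.ElementTree as ET

# The record from David's example, held in one neutral in-memory form.
RECORD = {"name": "dcorlette", "id": 130, "domain": "AD-DOMAIN"}
FIELDS = ("name", "id", "domain")


def to_xml(account):
    initiator = ET.Element("initiator")
    acct = ET.SubElement(initiator, "account")
    for key in FIELDS:
        ET.SubElement(acct, key).text = str(account[key])
    return ET.tostring(initiator, encoding="unicode")


def from_xml(text):
    acct = ET.fromstring(text).find("account")
    return {
        "name": acct.findtext("name"),
        "id": int(acct.findtext("id")),
        "domain": acct.findtext("domain"),
    }


def to_json(account):
    return json.dumps({"Initiator": {"account": account}})


def from_json(text):
    return json.loads(text)["Initiator"]["account"]


def to_delimited(account):
    # Assumes no value contains "|"; a real format would need escaping rules.
    return "|".join(["INIT"] + [str(account[key]) for key in FIELDS])


def from_delimited(text):
    tag, name, ident, domain = text.split("|")
    if tag != "INIT":
        raise ValueError("not an initiator record")
    return {"name": name, "id": int(ident), "domain": domain}
```

Each `from_*` function applied to the matching `to_*` output returns a dictionary equal to `RECORD`, which is the sense in which the formats are interchangeable. The delimited form depends on a fixed field order, so that order would have to be part of the standard.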
|||||||||||||||||
|
Eric Fitzgerald
|
In reply to this post by Tina Bird
I agree with you Tina.
In general I would like to avoid the unqualified use of the term "user" in events. I have identified several different use cases for a "User" in an event record:

Subject/Actor
* primary [e.g. user account identity associated with a running process]
* impersonated/on-behalf-of [e.g. on whose behalf a task is being performed]
* caller-of-interface

Object/Target
* user as account object
* user as object of authentication/logon

-----Original Message-----
From: Tina Bird [mailto:tbird@...]
Sent: Wednesday, July 02, 2008 2:15 PM
To: CEE-DISCUSSION-LIST@...
Subject: Re: [CEE-DISCUSSION-LIST] CEE Taxonomy: Enumeration or Language?

One small correction:

> So each of these authentication examples boils down to
>
> subject: system process or application requiring authentication
> verb: auth succeeded or auth failed
> object: user (or process or application) from which authentication is required

It's more precise to say that the subject is the process or application requesting or mediating the authentication... |
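[Editorial sketch] One way to rule out the unqualified "user" that Eric warns about is to make the role a required, enumerated part of every user reference. The sketch below simply transcribes his five cases; the class names, member names, and string values are hypothetical and not part of any CEE or XDAS draft.

```python
from dataclasses import dataclass
from enum import Enum


class UserRole(Enum):
    # Subject/Actor cases
    SUBJECT_PRIMARY = "subject/primary"
    SUBJECT_ON_BEHALF_OF = "subject/impersonated-on-behalf-of"
    SUBJECT_CALLER = "subject/caller-of-interface"
    # Object/Target cases
    OBJECT_ACCOUNT = "object/account"
    OBJECT_AUTH_TARGET = "object/authentication-logon"

    @property
    def is_subject(self):
        return self.value.startswith("subject/")


@dataclass(frozen=True)
class UserRef:
    """A user mentioned in an event, always qualified by a role."""
    name: str
    role: UserRole

    def __post_init__(self):
        # Reject free-text roles such as the bare word "user".
        if not isinstance(self.role, UserRole):
            raise TypeError("a user reference must carry an explicit UserRole")
```

With this shape, an event in which one account resets another's password would carry two references, for example `UserRef("alice", UserRole.SUBJECT_PRIMARY)` and `UserRef("bob", UserRole.OBJECT_ACCOUNT)` (names invented for illustration), and a consumer never has to guess which one "the user" means.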
|||||||||||||||||
|
heinbockel
|
In reply to this post by John Calcote-2
After having this discussion with several coworkers, I would like to focus this conversation on two points.

Should CEE support:

1. Hierarchical/structured taxonomies

My opinion: Any structure imposed on data is based on a particular view or use case. There are definite benefits to this, as both Dave and John have pointed out. However, without narrowing the scope of logs & CEE, I do not see how all event types can be represented in a universally applicable hierarchy, nor do I see any value in defining multiple hierarchies.

2. Enumerated taxonomy choices

My opinion: I think that CEE needs to provide some direction as to the words (subject, object, action) people should choose. Too many words are overloaded ('user', 'logon') and others have many synonyms ('logon', 'login', 'authentication', 'password accepted', etc.). I don't think it matters whether these are expressed in word lists or numbers/indices -- maybe this is a syntax-level declaration.

However, what I do think is important is that there be:
(1) a way to express event types not defined within the current 'official' taxonomy, and
(2) a way to express specific details/names relating to each word choice (e.g., 'account' == Joe, 'file' == /etc/passwd) |
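[Editorial sketch] The requirements in this message (a guided word list, synonym collapse, a way to express unofficial types, and per-word details) can be pictured with a small data structure. Everything here is an assumption made for illustration: the "official" action list, the `x-` prefix for locally defined words, and the class layout are hypothetical; only the synonym examples and the `account`/`file` detail examples come from the message.

```python
from dataclasses import dataclass, field

# Hypothetical "official" action words.
OFFICIAL_ACTIONS = {"authenticate", "create", "delete", "modify"}

# Synonyms named in the message, collapsed onto one official word.
ACTION_SYNONYMS = {
    "logon": "authenticate",
    "login": "authenticate",
    "authentication": "authenticate",
    "password accepted": "authenticate",
}

# Hypothetical marker for words defined locally, outside the official list.
LOCAL_PREFIX = "x-"


def normalize_action(word):
    """Map a free-form action word onto the vocabulary, or raise ValueError."""
    word = word.strip().lower()
    word = ACTION_SYNONYMS.get(word, word)
    if word in OFFICIAL_ACTIONS or word.startswith(LOCAL_PREFIX):
        return word
    raise ValueError(f"unknown action {word!r}; use an official or x- word")


@dataclass
class Event:
    subject: str
    action: str
    obj: str
    # Attribute/value details for the chosen words, e.g. {"account": "Joe"}.
    details: dict = field(default_factory=dict)

    def __post_init__(self):
        self.action = normalize_action(self.action)
```

For example, `Event("sshd", "Login", "account", {"account": "Joe"})` ends up with the action `authenticate`, while a site-specific action such as `x-badge-swipe` is accepted without being in the official list. Whether the stored value is a word or a number is, as the message says, a syntax-level choice that this structure does not depend on.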
|||||||||||||||||
|
Tina Bird
|
Nicely worded, Bill. I concur. A couple of minor comments below:
> Should CEE support:
>
> 1. Hierarchical/structured taxonomies
>
> My opinion:
> Any structure implied to data is based on a particular
> view or use case. There are definite benefits to this
> as both Dave and John have pointed out. However, without
> narrowing the scope of logs & CEE I do not see how a
> all event types can be represented in a universally
> applicable hierarchy, nor do I see any value in defining
> multiple hierarchies.

Perhaps what CEE should incorporate is a method for building these hierarchies, to maximize the chances that users (and vendors) will create hierarchies that other organizations would be able to use?

> 2. Enumerated taxonomy choices
>
> My opinion:
> I think that CEE needs to provide some direction as to
> the words (subject, object, action) people should choose.
> Too many words are overloaded ('user', 'logon') and others
> have many synonyms ('logon', 'login', 'authentication',
> 'password accepted', etc.).
> I don't think it matters whether these are expressed in
> word lists or numbers/indices -- maybe this is a syntax-level
> declaration.

I agree.

> However, what I do think is important, is that there be:
> (1) a way to express event types not defined within the current
> 'official' taxonomy, and
> (2) a way to express specific details/names relating to each word
> choice (e.g., 'account' == Joe, 'file' == /etc/passwd)

Since there's no way we'll be able to capture every kind of data that all the humans, machines and software out there are likely to use, we *have* to have a defined mechanism for defining types locally, whether or not they would ever be added to the "official" list; as well as having some kind of mechanism for nominating new event types to the official ones.

Item 2 suggests a specific "structure" that identifies attribute/value pairs within a given dataset? That makes sense to me.

cheers -- tbird |
|||||||||||||||||
|
Tina Bird
|
|