Hello,
I agree with this.
I believe that we can carefully choose event fields and taxonomy
so that they can be stored in XML, one-line text, or binary formats. Each
format has its own advantages and drawbacks.
XML is an interesting syntax for people working on a new format,
because everybody can easily understand the fields and quickly develop test implementations
without having to create a new parser/encoder. Then, when it's time for
operational use, the data can always be converted to simple text or binary to be
more efficient.
One useful thing is to keep the XML schema as straightforward as
possible. For example each field should be easily extracted with a simple XPath
expression such as "/event/action" or "/event/source/address".
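To illustrate the kind of extraction described above, here is a minimal sketch using Python's standard xml.etree.ElementTree. The sample document and its values are hypothetical; only the element names come from the paths quoted in the message. ElementTree evaluates paths relative to the root element, so "/event/action" is written as "action" once the <event> root has been parsed.

```python
import xml.etree.ElementTree as ET

# Hypothetical event; only the element names come from the paths in the message.
EVENT_XML = """
<event>
  <action>login</action>
  <source>
    <address>192.0.2.10</address>
  </source>
</event>
"""

def extract_fields(xml_text):
    """Pull the two example fields out of a flat event document."""
    root = ET.fromstring(xml_text)  # root is the <event> element
    return {
        # "/event/action" in XPath terms, relative to the root here
        "action": root.findtext("action"),
        # "/event/source/address" in XPath terms
        "source_address": root.findtext("source/address"),
    }

fields = extract_fields(EVENT_XML)
print(fields)
```

Keeping every field reachable by a short fixed path like this is what makes the later conversion to one-line text or binary mechanical.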
Philippe.
Hey Daniel,
I am not a huge XML fan- I think people often tend to leap to XML as a solution
before they understand the problem- but I don't think it's inappropriate in
this case and I will explain why.
I want to separate the concept of log storage from the concept of data
exchange.
For data interchange between different systems- for example between a host and
a log collector- XML was designed to solve this kind of problem. By
making the data carry its own description (called "infoset" in
XMLspeak) and by using a well-understood format, XML minimizes the chance for
error between the two systems and minimizes developer burden.
For log storage, XML is horrible. First, since the log storage format is
primarily consumed by the system generating the log, which completely understands
the log format, storing an XML infoset (and XML formatting) is a huge waste of
space. Also, the primary file i/o operation on a log is append. You
can't append well-formed XML to well-formed XML and end up with well-formed
XML. QED
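The append problem can be checked with a small sketch, assuming Python's standard xml.etree.ElementTree parser and a hypothetical one-element event record. A single record parses, but two records appended back to back do not, because a well-formed document has exactly one root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical log record, used only for illustration.
RECORD = "<event><action>login</action></event>"

def is_well_formed(text):
    """Return True if the text parses as a single well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

single_ok = is_well_formed(RECORD)
# Simulate the usual log operation: append a second record to the file.
appended_ok = is_well_formed(RECORD + "\n" + RECORD)

print(single_ok, appended_ok)
```

Wrapping the records in an enclosing root element would fix well-formedness, but then every append would have to rewrite the closing tag at the end of the file.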
I don't really even want to discuss local log formats, because there is little
likelihood Microsoft would adopt such a format in Windows. However I
would point out that if your log can be easily translated into XML, then XSL
makes it easy to express in just about any format. I think that if any
non-XML format is proposed by the working group then my recommendation would be
that the format be trivially and unambiguously convertible to XML- that is,
that if the WG proposes something else, that we also demonstrate precisely how
to convert to XML.
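As a rough sketch of what "trivially and unambiguously convertible" could look like, the following converts a pipe-separated one-line record into XML with Python's standard library. The field list and the sample line are assumptions made up for this example, not a proposed format, and escaping of pipes inside values is deliberately left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical field order for the one-line format (not from any proposal).
FIELD_NAMES = ["time", "host", "action", "status"]

def line_to_xml(line):
    """Convert one pipe-separated record into an <event> XML string."""
    values = line.rstrip("\n").split("|")
    if len(values) != len(FIELD_NAMES):
        # Refuse ambiguous input instead of guessing which field is missing.
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(values)}")
    event = ET.Element("event")
    for name, value in zip(FIELD_NAMES, values):
        ET.SubElement(event, name).text = value
    return ET.tostring(event, encoding="unicode")

xml_text = line_to_xml("2007-09-29T06:21:00|host1|login|success")
print(xml_text)
```

The conversion is unambiguous only because the field order is fixed and the field count is checked; a real format would also need a rule for separator characters that appear inside values.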
I disagree that XML parsing is expensive. I would have agreed a couple of
years ago, but since then I prototyped SEM using XML representation of events
and, done properly, it's only marginally more expensive than parsing any other
text format. If you're doing high-volume work, then you need to use
simple schemas and SAX parsing (which validates as a side effect). DOM is
probably not performant enough for most log-related purposes, and DOM with
validation is definitely not. NB There are a number of binary XML
representations that make XML much more compact, and parse more quickly.
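For comparison, here is a minimal sketch of streaming SAX parsing with a simple, flat schema, using Python's standard xml.sax module. The <log>/<event> structure and the field names are hypothetical. Note that the default expat-based parser in Python checks well-formedness as it goes (malformed input raises xml.sax.SAXParseException) but does not validate against a schema.

```python
import xml.sax

class EventHandler(xml.sax.ContentHandler):
    """Collect each flat <event> element into a dict without building a DOM."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._current = None   # dict for the event being read, if any
        self._field = None     # name of the field element being read, if any
        self._text = []

    def startElement(self, name, attrs):
        if name == "event":
            self._current = {}
        elif self._current is not None:
            self._field = name
            self._text = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._field is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "event":
            self.events.append(self._current)
            self._current = None
        elif name == self._field:
            self._current[name] = "".join(self._text)
            self._field = None

# Hypothetical sample data.
SAMPLE = (b"<log>"
          b"<event><action>login</action><status>ok</status></event>"
          b"<event><action>logout</action><status>ok</status></event>"
          b"</log>")

handler = EventHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.events)
```

Because the handler only keeps the event currently being read, memory use stays flat regardless of how many events the stream contains, which is the main reason to prefer this over DOM for high volumes.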
Finally, as a practical matter, another format imposes the burden of writing
parsers, and of figuring out how to check whether a given piece of formatted
data is well formed. IMO developers frequently make mistakes in this
area. XML makes this unambiguous and simple- the tools already exist and
there is an unambiguous way to describe the data format, XSD.
Eric
________________________________________
From: Daniel Cid [dcid@...]
Sent: Saturday, September 29, 2007 6:21 AM
To: Eric Fitzgerald
Cc: CEE-DISCUSSION-LIST@...
Subject: Re: [CEE-DISCUSSION-LIST] Standardizing log contents (aka "taxonomy") for CEE
Hi Eric,
I disagree regarding XML. It is expensive to parse, wasteful of
resources (too much duplicated information) and hard to read. I can't
imagine a project switching from a one-line
log format like syslog to XML (yes, I doubt that OpenSSH, Apache and others
would do that).
In addition to that, most admins* will not like the idea that their
"cat/grep" foo will not work
anymore with a multi-line log...
I like the way CEF works, with pipes as separators. We could use
something like that, or
even tabs or another rarely used character... but let's keep it simple.
UNLESS we do something like tcpdump does for pcaps, where the logs could
even be in binary
format, and we just run "log-dump" and get the logs line by line...
Hum... :)
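A rough sketch of the "log-dump" idea, assuming a made-up binary record layout (4-byte timestamp, 2-byte message length, then the UTF-8 message). Neither the layout nor the function names come from any existing tool; the point is only that a compact binary store can still be rendered one record per line for cat/grep-style use.

```python
import struct

# Hypothetical record header: unsigned 32-bit timestamp, unsigned 16-bit length,
# both in network byte order.
HEADER = struct.Struct("!IH")

def encode_record(timestamp, message):
    """Pack one log record into the hypothetical binary layout."""
    data = message.encode("utf-8")
    return HEADER.pack(timestamp, len(data)) + data

def dump_lines(blob):
    """Yield one pipe-separated text line per binary record."""
    offset = 0
    while offset < len(blob):
        timestamp, length = HEADER.unpack_from(blob, offset)
        offset += HEADER.size
        message = blob[offset:offset + length].decode("utf-8")
        offset += length
        yield f"{timestamp}|{message}"

binary_log = encode_record(1191046860, "sshd login ok") + encode_record(1191046861, "sshd logout")
lines = list(dump_lines(binary_log))
for line in lines:
    print(line)
```

Appending stays cheap because each record is self-delimiting, and the text view is produced only when someone asks for it.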
*Looking from a Unix point of view.
Thanks,
--
Daniel B. Cid
dcid ( at ) ossec.net
On 9/28/07, Eric Fitzgerald <Eric.Fitzgerald@...> wrote:
> One of the 4 core items on the CEE to-do list was to define an interchange
> format.
>
> I am no fan of XML, but it seems we have an appropriate use case here-
> interchange of data between disparate systems.
>
> Don't forget that we have to make the solution attractive to
> developers. "Yet another format == yet another parser" and the
> solution starts looking less desirable.
>
> Local storage is another matter; store in whatever format you want.
> Just expose the data in a standard format, and I think XML is
> appropriate for that.
>
> On a separate note, taxonomy is representable as XML in an unambiguous
> way. Not advocating, just observing.
>
> Eric
>
>
> -----Original Message-----
> From: Raffael Marty [mailto:rmarty@...]
> Sent: Friday, September 28, 2007 12:53 PM
> To: CEE-DISCUSSION-LIST@...
> Subject: Re: [CEE-DISCUSSION-LIST] Standardizing log contents (aka
> "taxonomy") for CEE
>
> > Common XML schema for events and nothing else -> see IDMEF failed
> > standard.
> >
>
> What is this? XML? Don't get me started. And why just a schema for
> events? Who was saying that? We NEED a taxonomy. If anyone wants more
> reasons for having a taxonomy, have a look at the white paper or I
> will paste them here...
>
> -raffy
>