modeling challenges

by geiser ch.

Hi eXist universe
Coming from an SQL world where plenty of modeling tools exist, I feel I need some advice on the best modeling approach in an XML environment.
I am about to start building a new web app and would like to test the XForms/REST/XQuery paradigm, and therefore use eXist as a key piece of it. After some research on the web, I am still facing two main difficulties:
- I am still not sure how to shift away from the entity/relationship conceptual framework, or how to make it fit properly in a hierarchical XML world. Also, though I see XML Schema as a key building block in the modeling process, I wish I could stay at a higher conceptual level in the early phase of modeling. I came across the asset-oriented modeling approach (http://www.aomodeling.org/KLEEN%20WhitePaper.pdf), which seems quite interesting in this case (though I could not make the tool work, and there does not seem to be any active user group). Does anybody use a similar approach, or could anybody recommend one? I guess (hope?) I am not the only one experiencing this as a real challenge.
- This one is more related to eXist. Are there any best practices or resources on how to structure the XML database for a given model (e.g. what would make a good hierarchy of collections, how best to implement multiple relationships, how to enforce referential integrity, ...)? At this stage I am not so concerned with performance issues, but rather trying to come up with a consistent development approach - it often pays off in the long run...
I know there is no simple answer to those questions, but any suggestions or recommendations would be welcome.
Thank you
Christophe

Re: modeling challenges

by Chris Wallace

Hi Christophe

These are all good questions. Here are a few thoughts gleaned from the past couple of years spent working on a project which involves quite a large number of schemas, spanning the range from tabular to hierarchically structured data, and from teaching both relational and XML databases. The application (a faculty information system to support staff, students and administrators) is technically rather benign, but it is rich in data and needs to be very responsive to changing needs - an 'agile information system'.

I believe ER modelling is just as important for an XML database as it is for relational databases. That is, conceptual ER modelling at a high enough level to abstract over the way relationships are implemented. A good tool should be able to generate either a relational model or a multiple-schema model, although it would need some hinting. I haven't found such a tool, although QSEE has some useful features. Largely I think of XML databases as object-oriented databases, but with meaningful keys, no methods (namespaced functions have to do instead) and no inheritance.

The main difference is in the modelling of composition (whole-part) relationships, which XML represents with parent/child nesting rather than with foreign keys, the only mechanism for implementing relationships in an RDBMS. In my domain, composition relationships account for about 80% of relationships, hence a significant reduction in database complexity. The remaining (associative) relationships have to be represented using foreign keys and, as in an RDBMS, any shared data values are usable. However, the ability to repeat elements in XML gives more flexibility in how one-many and especially many-many relationships are implemented. Rather than requiring a link table, it is common to implement a many-many relationship with repeated foreign keys on whichever side of the relationship is most salient - surprisingly often this is an easy decision that reflects the real-world ownership of the relationship. Often the relationships are structured or ordered on one side.
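
To make the two kinds of relationship concrete, here is a minimal sketch in Python using the standard library's ElementTree. The element and attribute names (module, assessment, tutor, person, id, ref) are invented for illustration and are not taken from the faculty system described above. Assessments are nested inside their module (composition), while the many-many link between modules and people is held as repeated tutor references on the module side, with no link table.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: all element and attribute names are illustrative only.
DOC = """
<faculty>
  <module id="M1">
    <title>Databases</title>
    <assessment weight="60">Exam</assessment>
    <assessment weight="40">Coursework</assessment>
    <tutor ref="P1"/>
    <tutor ref="P2"/>
  </module>
  <module id="M2">
    <title>XML</title>
    <assessment weight="100">Project</assessment>
    <tutor ref="P2"/>
  </module>
  <person id="P1"><name>Ann</name></person>
  <person id="P2"><name>Bob</name></person>
</faculty>
"""

root = ET.fromstring(DOC)

# Index people by key so that references can be resolved.
people = {p.get("id"): p.findtext("name") for p in root.iter("person")}


def tutors_of(module_id):
    """Follow the repeated foreign keys from a module, in document order."""
    module = root.find(f"module[@id='{module_id}']")
    return [people[t.get("ref")] for t in module.findall("tutor")]


def modules_of(person_id):
    """Navigate the same relationship from the other side."""
    return [
        m.get("id")
        for m in root.findall("module")
        if m.find(f"tutor[@ref='{person_id}']") is not None
    ]
```

Because the tutor elements are ordinary repeated children, their document order is preserved, which is one way the "ordered on one side" property can be carried without an extra sequence column.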

We also make use of repeated elements as primary keys. For example, people are identified in our system by name (a key idea is to minimise the distance between real-world data and system data), but they can have multiple names, so people can add names and promote one to be the preferred name, but never 'change' a name.
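
As an illustration only, the following Python sketch shows one way such a repeated key could behave. The preferred attribute and the helper functions are assumptions made for this example, not a description of the actual system.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: a person may carry several names, one marked as preferred.
PEOPLE = ET.fromstring("""
<people>
  <person>
    <name preferred="true">Chris Smith</name>
    <name>Christine Jones</name>
  </person>
  <person>
    <name preferred="true">Sam Patel</name>
  </person>
</people>
""")


def find_person(name):
    """Any of a person's names works as a key."""
    for person in PEOPLE.findall("person"):
        if any(n.text == name for n in person.findall("name")):
            return person
    return None


def preferred_name(person):
    return person.findtext("name[@preferred='true']")


def promote(person, name):
    """Make a name the preferred one, adding it if needed; old names are kept."""
    target = None
    for n in person.findall("name"):
        n.attrib.pop("preferred", None)
        if n.text == name:
            target = n
    if target is None:
        target = ET.SubElement(person, "name")
        target.text = name
    target.set("preferred", "true")
```

Since names are only added or promoted, a reference that uses an older name still resolves to the same person after a new preferred name has been chosen.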

As far as integrity is concerned, I have a rather lax view. In my domain, the database is almost never consistent, since changes are made to different entities at different times by different people. I want the database to help in the concurrent development of the content, and if this leads to inconsistencies which have to be analysed on demand, so be it - the web is never consistent either! So I have concentrated on defining rules which cover simple integrity, business rules and data quality, and on analysing the database against them.
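
A rough sketch of what "analysed on demand" could look like, again in Python with invented element names: updates are never rejected, and a set of named rules is run over the data to report whatever is currently inconsistent. The two rules shown are hypothetical examples of an integrity rule and a data-quality rule.

```python
import xml.etree.ElementTree as ET

# Hypothetical data containing one dangling reference and one empty title.
DB = ET.fromstring("""
<faculty>
  <module id="M1"><title>Databases</title><tutor ref="P1"/><tutor ref="P9"/></module>
  <module id="M2"><title></title><tutor ref="P1"/></module>
  <person id="P1"><name>Ann</name></person>
</faculty>
""")


def dangling_tutors(db):
    """Integrity rule: every tutor reference should match a person id."""
    known = {p.get("id") for p in db.iter("person")}
    return [
        f"{m.get('id')}: unknown tutor {t.get('ref')}"
        for m in db.iter("module")
        for t in m.findall("tutor")
        if t.get("ref") not in known
    ]


def untitled_modules(db):
    """Data-quality rule: every module should have a non-empty title."""
    return [
        f"{m.get('id')}: missing title"
        for m in db.iter("module")
        if not (m.findtext("title") or "").strip()
    ]


RULES = {
    "referential integrity": dangling_tutors,
    "data quality": untitled_modules,
}


def audit(db):
    """Run every rule and keep only the rules that found problems."""
    report = {name: rule(db) for name, rule in RULES.items()}
    return {name: problems for name, problems in report.items() if problems}
```

The report is a snapshot: it describes the inconsistencies present at the time of the audit, leaving it to people to decide which ones matter.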

Thanks for the thought-provoking question - I will try to distill my experience for an article in the wiki or elsewhere. I don't know about asset-oriented modelling and will take a look.

Chris Wallace
UWE Bristol


Re: modeling challenges

by geiser ch.

Hi Chris
Thanks a lot for sharing those thoughts, that is really helpful.
For those who, like me, lack experience in XML modeling, I also came across this article, which helped me clarify some concepts:
http://www.xml.com/pub/a/2002/12/04/normalizing.html

Cheers
Christophe

Re: modeling challenges

by Chris Wallace

Christophe

Yes, Will's articles are very useful. I often encounter the view that XML data is un-normalised (in the sense of containing redundancy), which I assume arises because people have seen XML schemas for data transport, where de-normalisation is common. Read-only XML documents often trade document size and redundancy for ease of navigation, because the documents are not intended to be updatable. XML designed for data storage is quite different, and should be designed to avoid redundancy in just the same way as a relational database. It seems to me that XML databases are easier to normalise, because their non-first-normal-form structure means that the sneaky bits of denormalisation which creep in with 1 to 1.0001 relationships can instead be represented properly, without the pain of another table.

Will discusses the use of the XML Schema constructs key and keyref. These may be useful in a transport use case, where the whole of a domain is represented in one document and the task is to check that a generated XML document conforms to its schema before accepting it. For XML databases they are less useful, because they only work within the scope of a single document. If the XML database designer organises the logical data storage as files in collections, these constructs have the wrong scope; if she organises all data in a single document, the cost of revalidating the whole database on every update would be prohibitive. Hence a rather unsatisfactory set of ad hoc mechanisms seems to be used to check, if not enforce, integrity. eXist supports triggers, which are useful here, but I'm ashamed to say I have not used them myself.
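
The scope problem can be illustrated with a small Python sketch. This is not XML Schema validation; it only mimics the scope of a key/keyref check, and the document paths and element names are invented for the example. A check confined to one document cannot see keys stored in sibling documents, whereas a check over the whole collection can.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection: each entry stands in for one stored document.
COLLECTION = {
    "people/P1.xml": "<person id='P1'><name>Ann</name></person>",
    "modules/M1.xml": "<module id='M1'><tutor ref='P1'/><tutor ref='P9'/></module>",
}


def unresolved(docs_for_keys, docs_for_refs):
    """Return tutor references that match no person id among the key documents."""
    keys = {p.get("id") for d in docs_for_keys for p in d.iter("person")}
    return sorted(
        t.get("ref")
        for d in docs_for_refs
        for t in d.iter("tutor")
        if t.get("ref") not in keys
    )


parsed = {path: ET.fromstring(text) for path, text in COLLECTION.items()}
module = parsed["modules/M1.xml"]

# Document scope: only the module document itself is searched for keys,
# so even the valid reference to P1 looks broken.
document_scope = unresolved([module], [module])

# Collection scope: keys are gathered from every document in the collection,
# so only the genuinely dangling reference remains.
collection_scope = unresolved(parsed.values(), [module])
```

A collection-wide check like this is the kind of ad hoc mechanism referred to above; where and when to run it (on demand, or from a trigger) is a separate design decision.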

Chris


Re: modeling challenges

by Adam Retter-3

> I believe ER modelling is just as important for an XML database as it is for
> relational databases.  That is conceptual ER modelling at a high enough
> level to abstract over the way relationships are implemented.  A good tool
> should be able to generate either a relational model or a multiple-schema
> model although it would need some hinting. I haven't found such a tool
> although QSEE has some useful features. Largely I think of XML databases as
> object-oriented databases but with meaningful keys, no methods (namespaced
> functions have to do instead) and no inheritance.

Whilst not strictly ER, this may be of interest -
http://www.sparxsystems.com.au/products/ea/features.html


--
Adam Retter

eXist Developer
{ England }
adam@...
irc://irc.freenode.net/existdb
