parsing atom 1.0 document is incomplete

Hi Folks,

I'm brand new to Rome, so apologies if this is a newbie question. I'm trying to read an ATOM 1.0 document into the atom Feed object, but while it is partially successful, it does not appear to get all of the relevant parts of the document. I am unsure whether this is because there is a problem with the source document or a problem with the way that I have read it in.

I am reading and interrogating the document as follows:

    // "is" is a passed FileInputStream for the document on-disk
    XmlReader reader = new XmlReader(is);
    SyndFeedInput input = new SyndFeedInput();
    SyndFeed feed = input.build(reader);
    Feed atom = (Feed) feed.createWireFeed();

    // mine the atom feed
    URI uri_a = new URI(atom.getId());
    String title = atom.getTitle();
    List<Person> authors = atom.getAuthors();
    List<Link> otherLinks = atom.getOtherLinks();
    List<Category> categories = atom.getCategories();
    Date updated = atom.getUpdated();
    Generator generator = atom.getGenerator();
    String rights = atom.getRights();

Of all these variables, "categories", "otherLinks" and "generator" are null. I did some debugging, and it appears that further to this all the atom entries are also null, although I haven't yet tried to extract them through the API.

The short atom document I am parsing is as follows; it definitely has categories, a generator, a number of Other Links, and two entries:

    <?xml version="1.0"?>
    <atom:feed xmlns:atom="http://www.w3.org/2005/Atom"
               xmlns:ore="http://www.openarchives.org/ore/terms/"
               xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:dcterms="http://purl.org/dc/terms/"
               xmlns:dc="http://purl.org/dc/elements/1.1/"
               xmlns:c3fn="http://www.cheshire3.org/ns/xsl/"
               xmlns:mesur="http://www.mesur.org/schemas/2007-01/mesur#">

      <atom:id>http://foresite.cheshire3.org/jstor/j100000/i201523/3082143/ore/</atom:id>
      <atom:title>A Title</atom:title>
      <atom:author>
        <atom:name>An Author</atom:name>
      </atom:author>
      <atom:link rel="related" href="http://localhost/stable/view/3082143"/>
      <atom:category scheme="http://www.openarchives.org/ore/terms/"
                     term="http://www.openarchives.org/ore/terms/Aggregation"
                     label="Aggregation"/>
      <atom:category scheme="http://purl.org/eprint/type/"
                     term="http://purl.org/eprint/type/JournalItem"
                     label="Misc Journal Item"/>
      <rdf:Description about="http://localhost/j100000/i201523/3082143/ore/">
        <ore:isAggregatedBy>http://localhost/j100000/i201523//ore/</ore:isAggregatedBy>
        <dcterms:created>2001-01-005T00:00:00Z</dcterms:created>
      </rdf:Description>
      <atom:link rel="self" type="application/atom+xml"
                 href="http://localhost/j100000/i201523/3082143/ore/atom.xml"/>
      <atom:updated>2008-04-21T18:21:09Z</atom:updated>
      <atom:generator uri="http://localhost/xsl/">XSLT Stylesheet</atom:generator>
      <atom:rights>This resource map has unknown rights</atom:rights>
      <rdf:Description about="http://localhost/j100000/i201523/3082143/ore/atom.xml">
        <dcterms:created>2008-04-18T17:30:00Z</dcterms:created>
      </rdf:Description>

      <atom:entries>

        <atom:entry>
          <atom:id>http://localhost/j100000/i201523/3082143/ore/proxy/pdf</atom:id>
          <atom:title>the title (PDF)</atom:title>
          <atom:updated>2008-04-21T18:21:09Z</atom:updated>
          <atom:link rel="alternate" type="application/pdf"
                     href="http://localhost/3082143.pdf"/>
          <atom:category scheme="http://purl.org/dc/dcmitype/"
                         term="http://purl.org/dc/dcmitype/Text" label="Document"/>
        </atom:entry>

        <atom:entry>
          <atom:id>http://localhost/j100000/i201523/3082143/ore/proxy/page/1</atom:id>
          <atom:title>Page 1 of 'the title'</atom:title>
          <atom:updated>2008-04-21T18:21:09Z</atom:updated>
          <atom:link rel="alternate" href="http://localhost/3082143?seq=1"/>
          <atom:category scheme="http://purl.org/dc/dcmitype/"
                         term="http://purl.org/dc/dcmitype/Image" label="Image"/>
        </atom:entry>

      </atom:entries>

    </atom:feed>

Any clues as to where my problems lie would be gratefully received.

Cheers,

Richard

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@...
For additional commands, e-mail: users-help@...
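
A quick way to check the structure of a feed like the one above, independently of Rome, is to count how many atom:entry elements sit directly under atom:feed and how many exist anywhere in the document. The following is only a minimal sketch using the JDK's DOM parser; the class name AtomEntryCheck and its helper methods are made up for illustration and are not part of Rome. It assumes, as RFC 4287 describes, that entries are expected as direct children of the feed element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of Rome.
final class AtomEntryCheck {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    /** Parses the XML string with namespace awareness switched on. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Counts atom:entry elements that are direct children of the root element. */
    static int directEntries(Document doc) {
        int count = 0;
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && ATOM_NS.equals(n.getNamespaceURI())
                    && "entry".equals(n.getLocalName())) {
                count++;
            }
        }
        return count;
    }

    /** Counts every atom:entry element anywhere in the document. */
    static int allEntries(Document doc) {
        return doc.getElementsByTagNameNS(ATOM_NS, "entry").getLength();
    }

    public static void main(String[] args) throws Exception {
        // Cut-down version of the posted feed: two entries inside an atom:entries wrapper.
        String xml = "<?xml version=\"1.0\"?>"
                + "<atom:feed xmlns:atom=\"http://www.w3.org/2005/Atom\">"
                + "<atom:title>A Title</atom:title>"
                + "<atom:entries>"
                + "<atom:entry><atom:title>the title (PDF)</atom:title></atom:entry>"
                + "<atom:entry><atom:title>Page 1 of 'the title'</atom:title></atom:entry>"
                + "</atom:entries>"
                + "</atom:feed>";
        Document doc = parse(xml);
        // Prints: direct=0 total=2
        System.out.println("direct=" + directEntries(doc) + " total=" + allEntries(doc));
    }
}
```

If the two numbers differ, some entries are nested inside another element and a parser that only looks at the feed's direct children is unlikely to find them.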
Re: parsing atom 1.0 document is incomplete

Hi Jeff,

> I can't answer your question directly, because I simply use the SyndFeed object. I have no problem getting to title, authors, links, categories, and the rest, on that object.

I've put some debug on this object as well, and it is certain that the SyndFeed that I get back here:

    SyndFeedInput input = new SyndFeedInput();
    SyndFeed feed = input.build(reader);

is lacking in all the details as well, so nothing is being lost in conversion to the atom wirefeed.

Should I perhaps be configuring the SyndFeedInput class in some way in advance? It seems to correctly detect an atom 1.0 feed.

> One other point: I wrap the InputStream in something more basic like InputStreamReader/BufferedReader, not in XmlReader. But then again, people tell me my code has problems handling special characters ;-)

:) I tried switching to using an InputStreamReader, but it didn't make any difference.

I have now stripped all the actual content which is recognised from my atom source document, so I am just trying to parse the following:

    <?xml version="1.0"?>
    <atom:feed xmlns:atom="http://www.w3.org/2005/Atom"
               xmlns:ore="http://www.openarchives.org/ore/terms/"
               xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:dcterms="http://purl.org/dc/terms/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">

      <atom:link rel="related" href="http://localhost/3082143"/>
      <atom:category scheme="http://www.openarchives.org/ore/terms/"
                     term="http://www.openarchives.org/ore/terms/Aggregation"
                     label="Aggregation"/>
      <atom:category scheme="http://purl.org/eprint/type/"
                     term="http://purl.org/eprint/type/JournalItem"
                     label="Misc Journal Item"/>
      <atom:link rel="self" type="application/atom+xml"
                 href="http://localhost/j100000/i201523/3082143/ore/atom.xml"/>
      <atom:generator uri="http://localhost/xsl/">XSLT Stylesheet</atom:generator>

      <atom:entries>

        <atom:entry>
          <atom:id>http://localhost/j100000/i201523/3082143/ore/proxy/pdf</atom:id>
          <atom:title>The Title (PDF)</atom:title>
          <atom:updated>2008-04-21T18:21:09Z</atom:updated>
          <atom:link rel="alternate" type="application/pdf"
                     href="http://localhost/3082143.pdf"/>
          <atom:category scheme="http://purl.org/dc/dcmitype/"
                         term="http://purl.org/dc/dcmitype/Text" label="Document"/>
        </atom:entry>

        <atom:entry>
          <atom:id>http://localhost/j100000/i201523/3082143/ore/proxy/page/1</atom:id>
          <atom:title>Page 1 of 'The Title'</atom:title>
          <atom:updated>2008-04-21T18:21:09Z</atom:updated>
          <atom:link rel="alternate" href="http://localhost/3082143?seq=1"/>
          <atom:category scheme="http://purl.org/dc/dcmitype/"
                         term="http://purl.org/dc/dcmitype/Image" label="Image"/>
        </atom:entry>

      </atom:entries>

    </atom:feed>

I ran .toString() on both the SyndFeed and the atom Feed objects, with the following results.

SyndFeed:

    SyndFeedImpl.contributors=null
    SyndFeedImpl.link=null
    SyndFeedImpl.foreignMarkup=[]
    SyndFeedImpl.image=null
    SyndFeedImpl.links=null
    SyndFeedImpl.copyright=null
    SyndFeedImpl.interface=interface com.sun.syndication.feed.synd.SyndFeed
    SyndFeedImpl.descriptionEx=null
    SyndFeedImpl.supportedFeedTypes[0]=rss_0.91N
    SyndFeedImpl.supportedFeedTypes[1]=rss_0.93
    SyndFeedImpl.supportedFeedTypes[2]=rss_0.92
    SyndFeedImpl.supportedFeedTypes[3]=rss_1.0
    SyndFeedImpl.supportedFeedTypes[4]=rss_0.94
    SyndFeedImpl.supportedFeedTypes[5]=rss_2.0
    SyndFeedImpl.supportedFeedTypes[6]=rss_0.91U
    SyndFeedImpl.supportedFeedTypes[7]=rss_0.9
    SyndFeedImpl.supportedFeedTypes[8]=atom_1.0
    SyndFeedImpl.supportedFeedTypes[9]=atom_0.3
    SyndFeedImpl.uri=null
    SyndFeedImpl.titleEx=null
    SyndFeedImpl.author=null
    SyndFeedImpl.authors=null
    SyndFeedImpl.title=null
    SyndFeedImpl.feedType=atom_1.0
    SyndFeedImpl.description=null
    SyndFeedImpl.encoding=null
    SyndFeedImpl.entries=[]
    SyndFeedImpl.categories=[]
    SyndFeedImpl.publishedDate=null
    SyndFeedImpl.language=null
    SyndFeedImpl.modules[0].publishers=[]
    SyndFeedImpl.modules[0].subjects=[]
    SyndFeedImpl.modules[0].identifiers=[]
    SyndFeedImpl.modules[0].coverages=[]
    SyndFeedImpl.modules[0].subject=null
    SyndFeedImpl.modules[0].rights=null
    SyndFeedImpl.modules[0].date=null
    SyndFeedImpl.modules[0].type=null
    SyndFeedImpl.modules[0].descriptions=[]
    SyndFeedImpl.modules[0].sources=[]
    SyndFeedImpl.modules[0].creator=null
    SyndFeedImpl.modules[0].formats=[]
    SyndFeedImpl.modules[0].publisher=null
    SyndFeedImpl.modules[0].languages=[]
    SyndFeedImpl.modules[0].title=null
    SyndFeedImpl.modules[0].dates=[]
    SyndFeedImpl.modules[0].description=null
    SyndFeedImpl.modules[0].contributor=null
    SyndFeedImpl.modules[0].types=[]
    SyndFeedImpl.modules[0].contributors=[]
    SyndFeedImpl.modules[0].relation=null
    SyndFeedImpl.modules[0].format=null
    SyndFeedImpl.modules[0].interface=interface com.sun.syndication.feed.module.DCModule
    SyndFeedImpl.modules[0].uri=http://purl.org/dc/elements/1.1/
    SyndFeedImpl.modules[0].creators=[]
    SyndFeedImpl.modules[0].source=null
    SyndFeedImpl.modules[0].coverage=null
    SyndFeedImpl.modules[0].relations=[]
    SyndFeedImpl.modules[0].rightsList=[]
    SyndFeedImpl.modules[0].titles=[]
    SyndFeedImpl.modules[0].language=null
    SyndFeedImpl.modules[0].identifier=null

Atom Feed:

    Feed.rights=null
    Feed.alternateLinks=[]
    Feed.xmlBase=null
    Feed.titleEx=null
    Feed.info=null
    Feed.id=null
    Feed.authors=[]
    Feed.title=null
    Feed.updated=null
    Feed.encoding=null
    Feed.generator=null
    Feed.entries=[]
    Feed.icon=null
    Feed.logo=null
    Feed.contributors=[]
    Feed.otherLinks=[]
    Feed.foreignMarkup=[]
    Feed.copyright=null
    Feed.modified=null
    Feed.feedType=atom_1.0
    Feed.subtitle=null
    Feed.categories=null
    Feed.language=null
    Feed.modules[0].publishers=[]
    Feed.modules[0].subjects=[]
    Feed.modules[0].identifiers=[]
    Feed.modules[0].coverages=[]
    Feed.modules[0].subject=null
    Feed.modules[0].rights=null
    Feed.modules[0].date=null
    Feed.modules[0].type=null
    Feed.modules[0].descriptions=[]
    Feed.modules[0].sources=[]
    Feed.modules[0].creator=null
    Feed.modules[0].formats=[]
    Feed.modules[0].publisher=null
    Feed.modules[0].languages=[]
    Feed.modules[0].title=null
    Feed.modules[0].dates=[]
    Feed.modules[0].description=null
    Feed.modules[0].contributor=null
    Feed.modules[0].types=[]
    Feed.modules[0].contributors=[]
    Feed.modules[0].relation=null
    Feed.modules[0].format=null
    Feed.modules[0].interface=interface com.sun.syndication.feed.module.DCModule
    Feed.modules[0].uri=http://purl.org/dc/elements/1.1/
    Feed.modules[0].creators=[]
    Feed.modules[0].source=null
    Feed.modules[0].coverage=null
    Feed.modules[0].relations=[]
    Feed.modules[0].rightsList=[]
    Feed.modules[0].titles=[]
    Feed.modules[0].language=null
    Feed.modules[0].identifier=null
    Feed.tagline=null

I'm totally stumped :) Any other ideas?

Cheers,

Richard
Re: parsing atom 1.0 document is incomplete

Hi Jeff,

I have found my answer, which is to read in the InputStream like this instead:

    XmlReader reader = new XmlReader(is);
    WireFeedInput input = new WireFeedInput();
    Feed atom = (Feed) input.build(reader);

In addition, there was an unnecessary/illegal <atom:entries> wrapper tag around the entries, which was further messing things up!

Thanks for your help,

Richard
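
For anyone who cannot fix the feed at its source, the atom:entries wrapper in the sample document could in principle be removed before parsing by lifting its children up to the atom:feed level. What follows is a rough sketch with the JDK's DOM API, not something Rome provides; the class AtomEntriesUnwrap and its methods are hypothetical names chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of Rome.
final class AtomEntriesUnwrap {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    /** Parses the XML string with namespace awareness switched on. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** True if the node is an element in the Atom namespace with the given local name. */
    static boolean isAtom(Node n, String localName) {
        return n.getNodeType() == Node.ELEMENT_NODE
                && ATOM_NS.equals(n.getNamespaceURI())
                && localName.equals(n.getLocalName());
    }

    /**
     * Moves the children of every atom:entries child of the root up into the root,
     * removes the emptied wrappers, and returns the number of atom:entry elements moved.
     */
    static int unwrapEntries(Document doc) {
        Element feed = doc.getDocumentElement();
        // Collect the wrappers first, because the child NodeList is live.
        List<Node> wrappers = new ArrayList<>();
        NodeList children = feed.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (isAtom(children.item(i), "entries")) {
                wrappers.add(children.item(i));
            }
        }
        int moved = 0;
        for (Node wrapper : wrappers) {
            while (wrapper.getFirstChild() != null) {
                Node child = wrapper.getFirstChild();
                if (isAtom(child, "entry")) {
                    moved++;
                }
                // insertBefore detaches the child from the wrapper before re-inserting it.
                feed.insertBefore(child, wrapper);
            }
            feed.removeChild(wrapper);
        }
        return moved;
    }

    /** Counts atom:entry elements that are direct children of the root element. */
    static int directEntries(Document doc) {
        int count = 0;
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (isAtom(children.item(i), "entry")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>"
                + "<atom:feed xmlns:atom=\"http://www.w3.org/2005/Atom\">"
                + "<atom:entries>"
                + "<atom:entry><atom:title>The Title (PDF)</atom:title></atom:entry>"
                + "<atom:entry><atom:title>Page 1 of 'The Title'</atom:title></atom:entry>"
                + "</atom:entries>"
                + "</atom:feed>";
        Document doc = parse(xml);
        int moved = unwrapEntries(doc);
        // Prints: moved=2 direct=2
        System.out.println("moved=" + moved + " direct=" + directEntries(doc));
    }
}
```

The repaired Document would still have to be serialised again, or passed to a parser that accepts DOM input, before Rome sees it; that step is left out here. Correcting the stylesheet that generates the feed is likely the cleaner fix.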
Re: parsing atom 1.0 document is incomplete

I see the same problem concerning Atom 1.0 feeds using the Rome Fetcher subproject. Any ideas on how I can get the Atom 1.0 feeds to populate the SyndEntry items correctly?

> I have found my answer, which is to read in the InputStream like this instead:
>
> XmlReader reader = new XmlReader(is);
> WireFeedInput input = new WireFeedInput();
> Feed atom = (Feed) input.build(reader);
Re: parsing atom 1.0 document is incomplete

Hi,

> I see the same problem concerning Atom 1.0 feeds using the Rome Fetcher subproject, any ideas on how I can get the Atom 1.0 feeds to populate the SyndEntry items correctly?

I haven't been able to get Rome to ever correctly construct SyndFeeds. The only mechanism that seems to work is to use the WireFeed method below. There seems to be a problem during serialisation too, see upcoming email ...

>> I have found my answer, which is to read in the InputStream like this instead:
>>
>> XmlReader reader = new XmlReader(is);
>> WireFeedInput input = new WireFeedInput();
>> Feed atom = (Feed) input.build(reader);

--
Richard Jones
Research Engineer, HP Labs
Hewlett-Packard Limited
eml: richard.d.jones@...
blg: http://chronicles-of-richard.blogspot.com/