interacting with scraped pages?

Hi,

I have a simple piggybank scraper for half.ebay.com wishlist pages. In the last
year they've added a "feature" to these pages where items have an expiry date,
(typically 120 days out for new items), and each wishlist page has an [extend
expiration] button along with a (implied "overall") checkbox that selects each
individual item's (implied "applies to me too") checkbox.

What I want to do is have my scraper check the (implied "overall") checkbox,
and then click the [extend expiration] button on each page, before or after
scraping said page.

I'm not sure how to go about doing this -- anyone have any hints?

thanks,

=JeffH
_______________________________________________
General mailing list
General@...
http://simile.mit.edu/mailman/listinfo/general

Re: interacting with scraped pages?

=JeffH wrote:
> Hi,
>
> I have a simple piggybank scraper for half.ebay.com wishlist pages. In the last
> year they've added a "feature" to these pages where items have an expiry date,
> (typically 120 days out for new items), and each wishlist page has an [extend
> expiration] button along with a (implied "overall") checkbox that selects each
> individual item's (implied "applies to me too") checkbox.
>
> What I want to do is have my scraper check the (implied "overall") checkbox,
> and then click the [extend expiration] button on each page, before or after
> scraping said page.
>
> I'm not sure how to go about doing this -- anyone have any hints?

You probably want a greasemonkey script for such page interaction
scripting, not a scraper.

--
Stefano Mazzocchi
Digital Libraries Research Group        Research Scientist
Massachusetts Institute of Technology
E25-131, 77 Massachusetts Ave           skype: stefanomazzocchi
Cambridge, MA 02139-4307, USA           email: stefanom at mit . edu

Re: interacting with scraped pages?

Stefano Mazzocchi wrote:
>
> You probably want a greasemonkey script for such page interaction
> scripting, not a scraper.

yeah, that's sorta what I thought too, so I wonder if one can do both
operations via one script, i.e. do the page manipulation things via
greasemonkey api(s) and the scraping things via piggy bank api(s) from the same
script?

thanks,

=JeffH

Re: interacting with scraped pages?

=JeffH wrote:
> Stefano Mazzocchi wrote:
> >
> > You probably want a greasemonkey script for such page interaction
> > scripting, not a scraper.
>
> yeah, that's sorta what I thought too, so I wonder if one can do both
> operations via one script, i.e. do the page manipulation things via
> greasemonkey api(s) and the scraping things via piggy bank api(s) from the same
> script?

No. This can't be done because Greasemonkey scripts run inside a special
javascript sandbox that exposes the Greasemonkey APIs, while Piggy Bank
scrapers run in a different sandbox that exposes the Piggy Bank APIs.
There is no place where the two APIs are exposed at the same time.

--
Stefano Mazzocchi

Re: interacting with scraped pages?

Stefano Mazzocchi wrote:
>
> No. This can't be done because Greasemonkey scripts run inside a special
> javascript sandbox that exposes the Greasemonkey APIs while Piggy Bank
> scrapers run into another sandbox that exposes the Piggy Bank APIs.
> There is no place where the two APIs are exposed at the same time.

yeah, i was sorta afraid that might be the case.

So, I wonder if one can construct a meta-script that invokes both? or, could
the grease monkey sandbox be invoked from the other sandbox somehow? (heh,
security r0015 undoubtedly consider this thought blasphemous/heresy/etc)

I spose one could just have two buttons or whatever in browser chrome whatever,
and have, say, the PB (piggy bank) script put up a message (or the greasemonkey
(GM) script) about something like "u really oughta push that other button over
there too while yer at it cuz that'll ensure blah blah blah wrt yer
wishlist...", eh?

thanks,

=JeffH

Re: interacting with scraped pages?

Ok, I have an idea -- i was just looking into how to write greasemonkey (GM)
scripts and how to handle multiple pages...

I'd previously scrawled..
> I have a simple piggybank scraper for
> half.ebay.com wishlist pages. In the last
> year they've added a "feature" to these
> pages where items have an expiry date,
> (typically 120 days out for new items),
> and each wishlist page has an [extend
> expiration] button along with a (implied
> "overall") checkbox that selects each
> individual item's (implied "applies to me too") checkbox.

Seems to me it'd be possible to set up a greasemonkey script that is effective
on only individual half.ebay.com wishlist pages (even just my pages) and have
it, onLoad of the page, auto-check the implied "overall" checkbox, and then
(effectively) click on the [extend expiration] button. Thus when one goes to
such a wishlist page, the "extend expiration" business happens automagically,
and it'd seem to a user that the page takes ~2x longer to load.

And it ought to work for each page that PiggyBank (PB) processes. But I wonder
whether there'd be any timing issues between the PB script that's running
through a list of URLs to process, and the GM script running briefly on each
page at page load time.

thoughts?

thanks,

=JeffH
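For readers following along, here is a minimal sketch of the Greasemonkey idea described in the message above. It is only an illustration under stated assumptions: the `@include` pattern and both CSS selectors are hypothetical (nobody in the thread gives the real markup of the wishlist page), and the `store` guard is an addition that is not in the thread. It is there because clicking the button presumably reloads the page, which would otherwise re-trigger the script.

```javascript
// ==UserScript==
// @name      Auto-extend wishlist expiration (sketch)
// @include   http://half.ebay.com/*
// ==/UserScript==
// NOTE: the @include pattern above is a guess; narrow it to your wishlist URLs.

// Hypothetical selectors: inspect the real page and replace these.
var SELECT_ALL_CHECKBOX = 'input[type="checkbox"][name="selectAll"]';
var EXTEND_BUTTON = 'input[type="submit"][value="Extend Expiration"]';

// doc:   a document-like object providing querySelector() and location.href
// store: a sessionStorage-like object providing getItem()/setItem()
// Returns true if the extend button was clicked, false otherwise.
function extendExpiration(doc, store) {
  var key = 'extended:' + doc.location.href;
  if (store.getItem(key)) {
    return false; // already done for this URL in this session
  }
  var selectAll = doc.querySelector(SELECT_ALL_CHECKBOX);
  var extend = doc.querySelector(EXTEND_BUTTON);
  if (!selectAll || !extend) {
    return false; // not a wishlist page, or the markup differs from the guess
  }
  if (!selectAll.checked) {
    // click() instead of setting .checked, so the page's own handler
    // gets a chance to tick the per-item checkboxes
    selectAll.click();
  }
  store.setItem(key, String(Date.now())); // record before the page reloads
  extend.click();
  return true;
}

// Only wire this up when running inside a browser page.
if (typeof document !== 'undefined' && typeof window !== 'undefined') {
  window.addEventListener('load', function () {
    extendExpiration(document, window.sessionStorage);
  }, false);
}
```

Passing `doc` and `store` in as arguments keeps the logic testable outside the browser. The guard means the extension happens at most once per URL per browser session, which may or may not be the behaviour you want.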

Re: interacting with scraped pages?

=JeffH wrote:
> Ok, I have an idea -- i was just looking into how to write greasemonkey (GM)
> scripts and how to handle multiple pages...
>
> I'd previously scrawled..
> > I have a simple piggybank scraper for
> > half.ebay.com wishlist pages. In the last
> > year they've added a "feature" to these
> > pages where items have an expiry date,
> > (typically 120 days out for new items),
> > and each wishlist page has an [extend
> > expiration] button along with a (implied
> > "overall") checkbox that selects each
> > individual item's (implied "applies to me too") checkbox.
>
> Seems to me it'd be possible to set up a greasemonkey script that is effective
> on only individual half.ebay.com wishlist pages (even just my pages) and have
> it, onLoad of the page, auto-check the implied "overall" checkbox, and then
> (effectively) click on the [extend expiration] button. Thus when one goes to
> such a wishlist page, the "extend expiration" business happens automagically,
> and it'd seem that the page would take ~2x longer to load - to a user.
>
> And it ought to work for each page that PiggyBank (PB) processes. But I wonder
> about whether there'd be any timing issues between the PB script that's running
> through a list of URLs to process, and the GM script running briefly on each
> page at page load time.
>
> thoughts?

Not sure I understood your intentions completely, but just keep in mind
that anything a scraper can do can also be done after a delay, like this:

    function scrape() {
        var delay = 1000; // how many milliseconds of delay
        setTimeout(function() {
            // do the work here
        }, delay);
    }

--
Stefano Mazzocchi
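Building on the delayed-scrape suggestion above, one way to address the timing worry is to poll for a condition instead of waiting a single fixed delay. The following is a sketch only: `scrapeWhenReady`, `isReady`, `doScrape`, and the option names are all hypothetical, and it assumes, as the message above implies, that `setTimeout` is available where the scraper runs.

```javascript
// Poll until isReady() returns true, then call doScrape(true).
// If the condition never holds within maxTries checks, call doScrape(false)
// so the caller can decide whether to scrape anyway or skip the page.
function scrapeWhenReady(isReady, doScrape, options) {
  var opts = options || {};
  var interval = opts.interval || 500;  // milliseconds between checks
  var triesLeft = opts.maxTries || 20;  // total number of checks allowed
  // The scheduler is injectable so the logic can be exercised without real timers.
  var schedule = opts.schedule || function (fn, ms) { setTimeout(fn, ms); };

  function attempt() {
    if (isReady()) {
      doScrape(true);
      return;
    }
    triesLeft -= 1;
    if (triesLeft <= 0) {
      doScrape(false); // gave up waiting
      return;
    }
    schedule(attempt, interval);
  }

  attempt();
}
```

A scraper might call it as `scrapeWhenReady(function () { return pageLooksExtended(); }, function (ok) { /* do the work here */ });`, where `pageLooksExtended` is a placeholder for whatever check tells you the Greasemonkey script has finished with the page.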