interacting with scraped pages?

by =JeffH-3

Hi,

I have a simple piggybank scraper for half.ebay.com wishlist pages. In the last
year they've added a "feature" to these pages where items have an expiry date
(typically 120 days out for new items), and each wishlist page has an [extend
expiration] button along with an (implied "overall") checkbox that selects each
individual item's (implied "applies to me too") checkbox.

What I want to do is have my scraper check the (implied "overall") checkbox,
and then click the [extend expiration] button on each page, before or after
scraping said page.

I'm not sure how to go about doing this -- anyone have any hints?

thanks,

=JeffH

_______________________________________________
General mailing list
General@...
http://simile.mit.edu/mailman/listinfo/general

Re: interacting with scraped pages?

by Stefano Mazzocchi-2

=JeffH wrote:

> Hi,
>
> I have a simple piggybank scraper for half.ebay.com wishlist pages. In the last
> year they've added a "feature" to these pages where items have an expiry date,
> (typically 120 days out for new items), and each wishlist page has an [extend
> expiration] button along with a (implied "overall") checkbox that selects each
> individual item's (implied "applies to me too") checkbox.
>
> What I want to do is have my scraper check the (implied "overall") checkbox,
> and then click the [extend expiration] button on each page, before or after
> scraping said page.
>
> I'm not sure how to go about doing this -- anyone have any hints?

For that kind of page interaction you probably want a Greasemonkey script,
not a scraper.

--
Stefano Mazzocchi
Digital Libraries Research Group                 Research Scientist
Massachusetts Institute of Technology
E25-131, 77 Massachusetts Ave               skype: stefanomazzocchi
Cambridge, MA  02139-4307, USA         email: stefanom at mit . edu
-------------------------------------------------------------------


Re: interacting with scraped pages?

by =JeffH-3

Stefano Mazzocchi wrote:
 >
 > You probably want a greasemonkey script for such page interaction
 > scripting, not a scraper.

yeah, that's sorta what I thought too, so I wonder if one can do both
operations via one script, i.e. do the page manipulation things via
greasemonkey api(s) and the scraping things via piggy bank api(s) from the same
script?

thanks,

=JeffH




Re: interacting with scraped pages?

by Stefano Mazzocchi-2

=JeffH wrote:
> Stefano Mazzocchi wrote:
>  >
>  > You probably want a greasemonkey script for such page interaction
>  > scripting, not a scraper.
>
> yeah, that's sorta what I thought too, so I wonder if one can do both
> operations via one script, i.e. do the page manipulation things via
> greasemonkey api(s) and the scraping things via piggy bank api(s) from the same
> script?

No. This can't be done because Greasemonkey scripts run inside a special
JavaScript sandbox that exposes the Greasemonkey APIs, while Piggy Bank
scrapers run in another sandbox that exposes the Piggy Bank APIs.
There is no place where the two APIs are exposed at the same time.

--
Stefano Mazzocchi

Re: interacting with scraped pages?

by =JeffH-3

Stefano Mazzocchi wrote:
 >
 > No. This can't be done because Greasemonkey scripts run inside a special
 > javascript sandbox that exposes the Greasemonkey APIs while Piggy Bank
 > scrapers run in another sandbox that exposes the Piggy Bank APIs.
 > There is no place where the two APIs are exposed at the same time.

yeah, i was sorta afraid that might be the case.  So, I wonder if one can
construct a meta-script that invokes both? or, could the grease monkey sandbox
be invoked from the other sandbox somehow? (heh, security r0015 undoubtedly
consider this thought blasphemous/heresy/etc)

I spose one could just have two buttons or whatever in browser chrome whatever,
and have, say, the PB (piggy bank) script put up a message (or the greasemonkey
(GM) script) about something like "u really oughta push that other button
over there too while yer at it cuz that'll ensure blah blah blah wrt yer
wishlist...", eh?

thanks,

=JeffH




Re: interacting with scraped pages?

by =JeffH-3

Ok, I have an idea -- i was just looking into how to write greasemonkey (GM)
scripts and how to handle multiple pages...

I'd previously scrawled..
 > I have a simple piggybank scraper for
 > half.ebay.com wishlist pages. In the last
 > year they've added a "feature" to these
 > pages where items have an expiry date,
 > (typically 120 days out for new items),
 > and each wishlist page has an [extend
 > expiration] button along with a (implied
 > "overall") checkbox that selects each
 > individual item's (implied "applies to me too") checkbox.

Seems to me it'd be possible to set up a greasemonkey script that is effective
on only individual half.ebay.com wishlist pages (even just my pages) and have
it, on load of the page, auto-check the implied "overall" checkbox, and then
(effectively) click on the [extend expiration] button. Thus when one goes to
such a wishlist page, the "extend expiration" business happens automagically,
and to a user it'd just seem that the page takes ~2x longer to load.
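
A minimal sketch of the Greasemonkey side of this idea, under loud assumptions: the @include pattern, both CSS selectors, and the extendExpiration helper are hypothetical and not taken from the real half.ebay.com markup. Since pressing the button presumably reloads the page, the sketch also remembers which URLs it has already handled so that it does not loop forever.

```javascript
// ==UserScript==
// @name        Extend wishlist expiration (sketch)
// @include     http://half.ebay.com/*
// ==/UserScript==
// NOTE: the @include pattern and both selectors below are assumptions;
// inspect the real wishlist page and substitute the actual values.

var SELECT_ALL_SELECTOR = 'input[type="checkbox"][name="selectAll"]'; // hypothetical
var EXTEND_BUTTON_SELECTOR = 'input[type="submit"][name="extend"]';   // hypothetical

// Tick the "overall" checkbox, then press [extend expiration].
// `doc` is a Document; `seen` is any object used to remember handled URLs,
// so the reload triggered by the button does not start an endless loop.
// Returns true if the button was pressed, false otherwise.
function extendExpiration(doc, url, seen) {
  if (seen[url]) return false;               // already handled this URL
  var selectAll = doc.querySelector(SELECT_ALL_SELECTOR);
  var extend = doc.querySelector(EXTEND_BUTTON_SELECTOR);
  if (!selectAll || !extend) return false;   // not a wishlist page, or markup changed
  if (!selectAll.checked) selectAll.click(); // click() also fires the page's own handler
  seen[url] = true;
  extend.click();
  return true;
}

// Run automatically only inside a browser; elsewhere call the function directly.
if (typeof document !== 'undefined' && typeof sessionStorage !== 'undefined') {
  extendExpiration(document, location.href, sessionStorage);
}
```

The guard relies on the URL staying the same after the form is submitted; if the site redirects somewhere else, a different key would be needed.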

And it ought to work for each page that PiggyBank (PB) processes. But I wonder
about whether there'd be any timing issues between the PB script that's running
through a list of URLs to process, and the GM script running briefly on each
page at page load time.

thoughts?

thanks,

=JeffH


Re: interacting with scraped pages?

by Stefano Mazzocchi-2

=JeffH wrote:

> Ok, I have an idea -- i was just looking into how to write greasemonkey (GM)
> scripts and how to handle multiple pages...
>
> I'd previously scrawled..
>  > I have a simple piggybank scraper for
>  > half.ebay.com wishlist pages. In the last
>  > year they've added a "feature" to these
>  > pages where items have an expiry date,
>  > (typically 120 days out for new items),
>  > and each wishlist page has an [extend
>  > expiration] button along with a (implied
>  > "overall") checkbox that selects each
>  > individual item's (implied "applies to me too") checkbox.
>
> Seems to me it'd be possible to set up a greasemonkey script that is effective
> on only individual half.ebay.com wishlist pages (even just my pages) and have
> it, onLoad of the page, auto-check the implied "overall" checkbox, and then
> (effectively) click on the [extend expiration] button. Thus when one goes to
> such a wishlist page, the "extend expiration" business happens automagically,
> and it'd seem that the page would take ~2x longer to load - to a user.
>
> And it ought to work for each page that PiggyBank (PB) processes. But I wonder
> about whether there'd be any timing issues between the PB script that's running
> through a list of URLs to process, and the GM script running briefly on each
> page at page load time.
>
> thoughts?

Not sure I understood your intentions completely, but keep in mind that
anything a scraper can do can also be done after a delay, like this:

function scrape() {
    var delay = 1000; // delay in milliseconds
    // the standard setTimeout takes the callback first, then the delay
    setTimeout(function() {
        // do the work here
    }, delay);
}
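
One way to apply the delayed-work idea to the timing question is to poll until the page is in the expected state before scraping, rather than waiting a fixed time. The sketch below is an illustration only: scrapeWhenReady, isReady, and work are hypothetical names, and it assumes the scraper sandbox provides the standard setTimeout(callback, delay).

```javascript
// Poll until isReady() returns true, then run work() once.
// Gives up silently after maxTries checks.
// `schedule` defaults to the standard setTimeout(callback, delay);
// it is a parameter only so that another timer can be swapped in.
function scrapeWhenReady(isReady, work, delay, maxTries, schedule) {
  schedule = schedule || setTimeout;
  var tries = 0;
  function attempt() {
    tries += 1;
    if (isReady()) {
      work();                  // page is in the expected state: scrape now
      return;
    }
    if (tries < maxTries) {
      schedule(attempt, delay); // not ready yet: check again later
    }
  }
  schedule(attempt, delay);
}
```

A scraper could then call something like scrapeWhenReady(pageLooksExtended, doScrape, 500, 10), where pageLooksExtended and doScrape are whatever checks and scraping code fit the page.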

--
Stefano Mazzocchi