Extracting data (Parsing Rules)


Advanced mode: Download Web Link structures



The Download Web Link structure lets you process a detail page. It follows a link (href) in the current document and downloads the text of the linked web page, so you can process it further. An example:

Download Next Web Link Begin
   Get Text between <title> and </title> ...
Download Next Web Link End

Starting from the current page position, Download Next Web Link finds the first occurrence of an href, downloads the linked web page, and keeps it in memory for further processing. Any rules you place within a Download Web Link structure apply to the downloaded page, not the current page. The downloaded page keeps its own page position, so you can run rules on it just as you would on a normal page. In this example, the Get Text rule harvests only the page title from the detail page.
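
For readers who think in code, the behavior described above can be sketched in Python. This is only an illustration of the idea, not the program's actual implementation: the function names (fetch, get_text_between, download_next_web_link), the simple href pattern, and the choice to move the page position to just past the link that was followed are all assumptions made for the sketch.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

# Simplified pattern for a quoted href attribute (assumption: real pages may
# need a proper HTML parser instead).
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def fetch(url):
    """Download a page and return its text (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def get_text_between(page, start, end, position=0):
    """Return (text, new_position) for the first start...end pair found at
    or after position, or (None, position) if there is none."""
    begin = page.find(start, position)
    if begin == -1:
        return None, position
    begin += len(start)
    stop = page.find(end, begin)
    if stop == -1:
        return None, position
    return page[begin:stop], stop + len(end)


def download_next_web_link(page, position, base_url, fetcher=fetch):
    """Find the first href at or after position, download the linked page,
    and return (detail_page, new_position).

    Returns (None, position) when no further link is found. The fetcher
    argument lets you swap in a stand-in for testing without network access.
    """
    match = HREF_RE.search(page, position)
    if match is None:
        return None, position
    detail_url = urljoin(base_url, match.group(1))  # resolve relative links
    return fetcher(detail_url), match.end()


if __name__ == "__main__":
    # Stand-in for the web: maps URLs to page text, so the demo runs offline.
    fake_web = {"http://example.com/detail1.html": "<title>Order 1</title>"}
    master = '<a href="detail1.html">first order</a>'

    detail, pos = download_next_web_link(
        master, 0, "http://example.com/master.html", fake_web.get
    )
    # Rules inside the structure work on the detail page, which has its own
    # position starting at 0.
    title, _ = get_text_between(detail, "<title>", "</title>")
    print(title)
```

Note that the sketch keeps two separate positions: one for the current (master) page and one for the downloaded detail page, which is the point of the paragraph above.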

For a more elaborate example that harvests a master page and its detail pages, have a look at "example17.hhp". This rather advanced example gets the customer name from the master page, and various ordering info from a number of detail pages that are linked to particular customers.

