Extracting data (Parsing Rules)
Advanced mode: Download Web Link structures
The Download Web Link structure lets you process a detail page. It follows a link (href) from the current document and downloads the linked page's text so that you can process it further. An example:
Download Next Web Link Begin
Get Text between <title> and </title> ...
Download Next Web Link End
From the current page (the current page position, to be exact), Download Next Web Link finds the first occurrence of an href, downloads that web page, and keeps it in memory for further processing. Any rules you apply within a Download Web Link structure apply to the downloaded page, not the current page. The downloaded page keeps its own page position, so you can run rules on it just as you would on a normal page. In this case, the Get Text rule harvests only the page title from the detail page.
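To make the behavior described above more concrete, here is a minimal Python sketch of the same idea. It is not the program's own implementation, only an illustration under stated assumptions: the function names (find_next_link, get_text_between, harvest_detail_title), the href regular expression, the example URLs, and the choice to move the master page position past the link that was followed are all hypothetical. The download step is passed in as a fetch function so the sketch can run without network access.

```python
import re
from urllib.parse import urljoin

# Simplified href matcher (an assumption: quoted attribute values only).
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def find_next_link(page_text, position, base_url):
    """Return (absolute_url, new_position) for the first href found at or
    after position, or None if there is no further link."""
    match = HREF_RE.search(page_text, position)
    if match is None:
        return None
    return urljoin(base_url, match.group(1)), match.end()


def get_text_between(text, start_marker, end_marker, position=0):
    """Return (found_text, new_position), or None if a marker is missing."""
    start = text.find(start_marker, position)
    if start == -1:
        return None
    start += len(start_marker)
    end = text.find(end_marker, start)
    if end == -1:
        return None
    return text[start:end], end + len(end_marker)


def harvest_detail_title(page_text, position, base_url, fetch):
    """Follow the next link on the master page and return
    (detail_page_title, new_master_position)."""
    link = find_next_link(page_text, position, base_url)
    if link is None:
        return None, position
    url, new_position = link
    detail_text = fetch(url)  # the detail page is held in memory
    # Rules inside the structure run on the detail page, which starts
    # with its own page position (0), independent of the master page.
    found = get_text_between(detail_text, "<title>", "</title>")
    title = found[0] if found else None
    return title, new_position


# Usage with an in-memory stand-in for the download step (hypothetical data).
pages = {
    "http://example.com/customers/detail1.html":
        "<html><title>Order 1</title></html>",
}
master = '<a href="detail1.html">A</a> <a href="detail2.html">B</a>'
title, position = harvest_detail_title(
    master, 0, "http://example.com/customers/index.html", pages.__getitem__
)
print(title)  # Order 1
```

In a real run, fetch would perform an HTTP download. Keeping it as a parameter separates the link-following logic from the network access and makes the two page positions (master and detail) easy to see.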
For a more elaborate example that harvests a master page and its detail pages, have a look at "example17.hhp". This is a rather advanced example that gets the customer name from the master page and various ordering info from a number of detail pages that are linked to particular customers.