Extracting data (Parsing Rules)
Introduction
After you have set up the Data Sources, the next step is to extract data. HH provides two ways of doing that, normal mode and advanced mode, and both use "rules" to extract data. Each rule is covered separately later; first we explain how the two modes differ.
HH generates pages of text from the settings in Data Sources. These pages are presented to the parsing rules one at a time: first the rules are applied to page1.html, then to page2.html, and so on, until there are no more pages or files to harvest. The parsing rules extract data from each page in turn.
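The page-by-page flow described above can be pictured with a short Python sketch. It is an illustration only, not HH's implementation: the `harvest` function, the sample page contents, and the single lambda rule are all hypothetical stand-ins.

```python
def harvest(pages, rules):
    """Apply every rule to each page, one page at a time (illustrative only)."""
    harvested = []
    for page_name, page_text in pages:
        # All rules run against the current page before the next page is loaded.
        for rule in rules:
            harvested.append((page_name, rule(page_text)))
    return harvested


# Hypothetical pages standing in for what HH builds from the Data Sources.
pages = [("page1.html", "price: 10"), ("page2.html", "price: 20")]

# Hypothetical rule: take the text after "price: ".
rules = [lambda text: text.split(": ")[1]]

harvested = harvest(pages, rules)
print(harvested)
```

Each page is finished before the next one starts, which matches the "in turn" behaviour described in the text.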
Normal Mode
Normal mode is the most straightforward way to extract data with parsing rules. Each rule is applied from the beginning of the document. So, if there are two rules in Parsing rules (normal mode), HH will execute both rules from the start of the page. This may seem insignificant now, but bear it in mind, as it becomes important later on.
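To see why the starting position matters, here is a minimal Python sketch of normal-mode behaviour. It assumes a hypothetical "text between two markers" rule; `extract_between` and `run_normal_mode` are illustrative names, not part of HH.

```python
def extract_between(text, start_marker, end_marker, start_at=0):
    """Hypothetical rule: return the text between two markers, searching from start_at."""
    begin = text.find(start_marker, start_at)
    if begin == -1:
        return None
    begin += len(start_marker)
    end = text.find(end_marker, begin)
    if end == -1:
        return None
    return text[begin:end]


def run_normal_mode(page, rules):
    # Normal mode as described above: every rule searches from position 0.
    return [extract_between(page, start, end, 0) for start, end in rules]


page = "<b>Title A</b><i>10</i><b>Title B</b><i>20</i>"
rules = [("<b>", "</b>"), ("<b>", "</b>")]

normal_result = run_normal_mode(page, rules)
print(normal_result)  # both rules match the first <b>...</b> pair
```

Because both rules start at the top of the page, two identical rules return the same value; the second title is never reached.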
Advanced Mode
Harvesting in advanced mode is also done by applying rules. The main difference is that, as each rule is applied, the position in the page is remembered, rather than every rule starting again from the beginning of the document. This means that you can set up far more advanced harvests than would be possible in normal mode.
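The effect of remembering the position can be sketched in Python as follows. This is an illustration under stated assumptions, not HH's actual engine: it assumes that each rule continues from where the previous match ended, and the "text between two markers" rule and all function names are hypothetical.

```python
def extract_between(text, start_marker, end_marker, start_at=0):
    """Hypothetical rule: return (value, position after the match)."""
    begin = text.find(start_marker, start_at)
    if begin == -1:
        return None, start_at
    begin += len(start_marker)
    end = text.find(end_marker, begin)
    if end == -1:
        return None, start_at
    return text[begin:end], end + len(end_marker)


def run_advanced_mode(page, rules):
    # Assumption: the remembered position is where the previous match ended.
    position = 0
    values = []
    for start, end in rules:
        value, position = extract_between(page, start, end, position)
        values.append(value)
    return values


page = "<b>Title A</b><i>10</i><b>Title B</b><i>20</i>"
rules = [("<b>", "</b>"), ("<b>", "</b>")]

advanced_result = run_advanced_mode(page, rules)
print(advanced_result)
```

With the position carried forward, the same rule applied twice picks up the first title and then the second, which a search that always restarts at the top of the page cannot do.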
To get acquainted with HH, we suggest you learn about normal mode first, before you try advanced mode.