The new HappyHarvester version 2
Migration notes
The new HappyHarvester version 2 differs
in many ways from the previous 1.x
versions. If you are a HappyHarvester
1.x user, please read these notes to
learn about the most salient differences
between v2.x and v1.x.
Version 2 is a vastly improved version
of HappyHarvester 1. It is a major
upgrade in which we have changed the way
a harvest is done. Not only has the
interface changed; the extraction engine
itself has been set up in a new way.
This opens up possibilities for data
extraction that were previously
impossible.
Rule-based extraction engine
HH 1.x let you work with a set of pre
and post strings. HH simply harvested
all strings on a page, from top to
bottom, occurring between the pre and
post strings. This worked fine in most
cases, but fell short in cases where
data was incomplete or less
straightforward to harvest.
The new version 2 uses rules to extract
data. A rule can harvest between a pre
and post string, but it can also do many
other things, such as extracting an
image.
Rules in Normal Mode
There are two different rule engines,
named "Normal mode" and "Advanced mode".
Normal mode comes closest to harvesting
in version 1.x. In HH 1.x you could, for
example, use:
pre string: <td>, post string: </td>
pre string: <div>, post string: </div>
and it would harvest all text between
<td> and </td>, then start again at the
top of the page and get all text between
<div> and </div>. In Normal mode in
HH 2, you would use:
Get Text between <td> and </td>. Find 1000 instances.
Get Text between <div> and </div>. Find 1000 instances.
and it would do exactly the same.
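The behaviour described above can be illustrated with a short Python sketch. This is not HappyHarvester's own code or API: the function name `harvest_between`, the `max_instances` parameter, and the sample page are hypothetical, chosen only to show how a rule that restarts at the top of the page and stops after a fixed number of instances might behave.

```python
def harvest_between(page, pre, post, max_instances=1000):
    """Collect the text between each pre/post pair, scanning from the top.

    Hypothetical helper for illustration; not part of HappyHarvester.
    """
    results = []
    pos = 0  # every rule starts at the top of the page
    while len(results) < max_instances:
        start = page.find(pre, pos)
        if start == -1:
            break  # no further pre string on the page
        start += len(pre)
        end = page.find(post, start)
        if end == -1:
            break  # pre string without a closing post string
        results.append(page[start:end])
        pos = end + len(post)
    return results


# Made-up sample page, used only for this sketch.
page = "<table><tr><td>Apple</td><td>1.20</td></tr></table><div>Fresh fruit</div>"

# Two rules, applied one after the other, each from the top of the page.
rules = [("<td>", "</td>"), ("<div>", "</div>")]
harvested = []
for pre, post in rules:
    harvested.extend(harvest_between(page, pre, post))

print(harvested)  # ['Apple', '1.20', 'Fresh fruit']
```

Because each rule runs over the whole page independently, all `<td>` results come first and all `<div>` results follow, regardless of where they sit relative to each other on the page.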
The Advanced Mode
Advanced mode offers you much more.
Instead of starting the harvest anew
from the top of the page when a rule has
finished (as Normal mode does), the
engine remembers the page position.
This gives you a handle for harvesting
structured data: pieces of data that
belong together and need to stay
together, such as a product and its
price. Refer to the Help to find out
more about Advanced mode.
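The idea of remembering the page position can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the actual Advanced mode engine: the names `get_text_between` and `harvest_records` are hypothetical, and the assumption that the rule set is repeated until a rule no longer matches is ours; the Help describes the real behaviour.

```python
def get_text_between(page, pre, post, pos):
    """Return (text, new_pos) for the first match at or after pos.

    Returns (None, pos) when no complete pre/post pair is found.
    Hypothetical helper for illustration; not part of HappyHarvester.
    """
    start = page.find(pre, pos)
    if start == -1:
        return None, pos
    start += len(pre)
    end = page.find(post, start)
    if end == -1:
        return None, pos
    return page[start:end], end + len(post)


def harvest_records(page, rules):
    """Apply the rules in order, carrying the page position forward.

    Assumption: the rule set repeats until one rule fails to match.
    """
    records = []
    if not rules:
        return records
    pos = 0
    while True:
        record = []
        for pre, post in rules:
            # Continue from where the previous rule stopped.
            text, pos = get_text_between(page, pre, post, pos)
            if text is None:
                return records
            record.append(text)
        records.append(tuple(record))


# Made-up sample page: product names in <b>, prices in <i>.
page = "<b>Apple</b><i>1.20</i><b>Pear</b><i>0.95</i>"
records = harvest_records(page, [("<b>", "</b>"), ("<i>", "</i>")])

print(records)  # [('Apple', '1.20'), ('Pear', '0.95')]
```

Each product stays paired with the price that follows it. Running the same two rules separately from the top of the page would instead yield all products first and all prices afterwards, and the pairing would be lost.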
You need a new serial for version 2.x
Version 2 is a major upgrade, and
existing users of HH 1.x need a new
serial for it. Your old 1.x serial will
not work in version 2. As a registered
1.x user, you can upgrade at a reduced
price.