We need a crawler to crawl the following website:
[login to view URL]
We need the category structure, the product information, and all of the product pictures.
The data must be inserted into this MySQL table:
products
CREATE TABLE `products` (
`product_id` int(10) unsigned NOT NULL auto_increment,
`category` varchar(50) default NULL,
`sub_category` varchar(50) default NULL,
`part_number` varchar(30) NOT NULL,
`product_name` varchar(200) NOT NULL,
`product_description` text,
`price_rrp` decimal(7,2) NOT NULL default '0.00',
`qty_1_units` varchar(20) default NULL,
`qty_1_price` decimal(7,2) NOT NULL default '0.00',
`qty_2_units` varchar(20) default NULL,
`qty_2_price` decimal(7,2) NOT NULL default '0.00',
`qty_3_units` varchar(20) default NULL,
`qty_3_price` decimal(7,2) NOT NULL default '0.00',
`stock_level` varchar(50) NOT NULL,
`image_small` varchar(200) default NULL,
`image_large` varchar(200) default NULL,
`special` tinyint(3) unsigned NOT NULL default '0',
`limited_stock` tinyint(3) unsigned NOT NULL default '0',
`new_product` tinyint(3) unsigned NOT NULL default '0',
`brochure_1` varchar(200) default NULL,
`brochure_2` varchar(200) default NULL,
`brochure_3` varchar(200) default NULL,
`brochure_4` varchar(200) default NULL,
`created` datetime NOT NULL,
`last_updated` datetime NOT NULL,
PRIMARY KEY (`product_id`)
) ENGINE=MyISAM AUTO_INCREMENT=8268 DEFAULT CHARSET=latin1;
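As a rough illustration of how one scraped product could be mapped onto the table above, here is a minimal Python sketch. The function name `build_insert`, the shape of the `product` dict, and the choice to fill `stock_level` with an empty string when it is missing are assumptions made for this example, not part of the job description. The `%s` placeholders follow the parameter style used by common MySQL drivers for Python such as PyMySQL; the sketch only builds the statement and does not open a database connection.

```python
from datetime import datetime

# Columns a crawler would typically fill; product_id is auto_increment
# and the remaining columns fall back to their table defaults.
COLUMNS = [
    "category", "sub_category", "part_number", "product_name",
    "product_description", "price_rrp", "stock_level",
    "image_small", "image_large", "created", "last_updated",
]

# Columns declared NOT NULL without a default that the page must supply.
REQUIRED = ("part_number", "product_name")


def build_insert(product, now=None):
    """Return (sql, params) for one scraped product dict (hypothetical shape)."""
    for name in REQUIRED:
        if not product.get(name):
            raise ValueError("missing required field: " + name)

    row = dict(product)
    # Assumption: an unknown stock level is stored as an empty string,
    # because the column is NOT NULL and has no default.
    row.setdefault("stock_level", "")
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    row["created"] = stamp
    row["last_updated"] = stamp

    cols = [c for c in COLUMNS if c in row]
    sql = "INSERT INTO `products` ({}) VALUES ({})".format(
        ", ".join("`{}`".format(c) for c in cols),
        ", ".join(["%s"] * len(cols)),
    )
    return sql, tuple(row[c] for c in cols)
```

The returned pair could then be passed to a driver cursor, for example `cursor.execute(sql, params)`, so that values are escaped by the driver rather than pasted into the SQL string.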
A good crawler is hard to write in PHP, because PHP has no built-in multithreading for crawling more than one page at a time. You could use Python, Java, or Perl for this job.
If you decide to have an efficient crawler written in Python, hire me.
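To make the point about fetching more than one page at a time concrete, here is a minimal Python sketch that downloads several pages in parallel with a thread pool from the standard library. The names `fetch` and `fetch_all`, the worker count, and the timeout are assumptions chosen for illustration; a real crawler for this job would also need rate limiting, retries, and HTML parsing.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one page and return its raw bytes."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def _safe(fetcher, url):
    # Return the exception instead of raising it, so one failed
    # page does not abort the whole batch.
    try:
        return fetcher(url)
    except Exception as exc:
        return exc


def fetch_all(urls, fetcher=fetch, max_workers=8):
    """Fetch all URLs concurrently; return {url: bytes or Exception}."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outcomes = pool.map(lambda u: _safe(fetcher, u), urls)
        # pool.map yields results in the same order as the input URLs.
        return dict(zip(urls, outcomes))
```

Threads are a reasonable fit here because the work is dominated by waiting on the network. The `fetcher` parameter is there so the download step can be swapped out, for example for a stub in tests.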