Hi, I'm interested in a crawler/extractor with a very simple task. I want to be able to give it either a list of URLs to individual articles (e.g. [login to view URL]) OR a URL to a site that hosts many articles (e.g. [login to view URL]), where each article page has a comment section at the bottom. The crawler/extractor should collect all of the comments and export them into a spreadsheet with the following columns: URL, article title, commenter, comment, keywords. Keywords should be based on simple word frequencies, excluding common stop words, and should be counted by stem/type rather than by each individual token of a word.
It's important that there be no limit on how many URLs I can input or how many comments can be scraped - it should be a COMPLETE collection of all the user comments for each URL. It should also detect when an article's comments span multiple pages and grab the comments from all of those pages.
Hello sir,
I'm a C/C++/Python/AutoHotkey expert and have worked for Samsung and Huawei.
I can provide a sample before being hired.
I hope to hear from you.
Thank you very much.