- Parse a single webpage (HTML source) from any entered URL
(it must be a generic parser for _any_ _single_ web page...)
- Count all occurrences of each single word and each two-word combination on the page.
- Exclude words that appear on a short list of reserved words (we'll provide it).
- List results on the screen (as shown below)
- If no text can be parsed out, show "unable to parse" on the screen and set a variable that can be used by another program.
This must be done on the fly, after the URL is typed or pasted into an entry field and "GO" is clicked.
Fast performance is very important.
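The counting requirement above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: `RESERVED_WORDS` is a hypothetical stand-in for the list the buyer will provide, the names `count_terms` and `top_terms` are made up for this sketch, words are lowercased, a two-word combo is skipped when either word is reserved, and combos are formed across punctuation. The actual rules would need to be confirmed with the buyer.

```python
import re
from collections import Counter

# Hypothetical stand-in for the reserved-word list the buyer will provide.
RESERVED_WORDS = {"the", "a", "an", "and", "of", "to", "in"}

# Assumption: a "word" is a run of letters/digits, optionally with one
# internal apostrophe part (e.g. "don't").
WORD_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def count_terms(text, reserved=RESERVED_WORDS):
    """Count single words and adjacent two-word combos in the given text."""
    words = WORD_RE.findall(text.lower())
    # Single words, excluding reserved ones.
    counts = Counter(w for w in words if w not in reserved)
    # Adjacent pairs; assumption: skip a pair if either word is reserved.
    counts.update(
        f"{a} {b}"
        for a, b in zip(words, words[1:])
        if a not in reserved and b not in reserved
    )
    return counts


def top_terms(text, limit=10):
    """Return the `limit` most frequent words/combos as (term, count) pairs."""
    return count_terms(text, RESERVED_WORDS).most_common(limit)
```

Both passes are linear in the number of words, which should help with the performance requirement on typical page sizes.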
**Using an existing free, reliable open-source HTML parser in PHP or Python (like Beautiful Soup?) for this job is OK, and preferable if it leads to a lower-cost project.**
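As a rough idea of what the text-extraction step might involve, here is a minimal Python sketch. It uses the standard library's `html.parser` rather than Beautiful Soup, purely so the example has no third-party dependency; the class name `VisibleTextExtractor` and the function `extract_text` are hypothetical, and the choice to ignore only `<script>` and `<style>` content is an assumption.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}  # assumption: only these are skipped

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the page's text as one space-joined string ('' if none)."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

An empty return value is one possible signal for the "unable to parse" case. Note that joining chunks with spaces will split a word that is interrupted by inline markup, which a fuller implementation may want to handle.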
Bidders, please indicate relevant experience with web-page parsing using regular expressions.
The deliverable page would look like the example below.
Entering a URL in the field and clicking "GO" should produce the results shown.
Enter URL
---------
[ type or paste url here ] **GO**
Results (top 10)
------------------
word1 10
word2 9
word word1 4
word that 1
--------------------
END
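The layout above, together with the "unable to parse" fallback, might be produced along these lines. This is a sketch only: `render_results` is a hypothetical helper, its input is assumed to be a list of `(term, count)` pairs already sorted by count, and returning a boolean flag is just one guess at what "a variable that can be used by another program" could mean.

```python
def render_results(top, limit=10):
    """Return (screen_text, parse_ok) for a list of (term, count) pairs."""
    if not top:
        # Nothing could be parsed: show the message and report failure
        # through the flag so a calling program can react to it.
        return "unable to parse", False
    lines = [f"Results (top {limit})", "------------------"]
    lines += [f"{term} {count}" for term, count in top[:limit]]
    lines += ["--------------------", "END"]
    return "\n".join(lines), True
```

For example, `render_results([("word1", 10), ("word2", 9)])` would yield the header, one line per term, the closing rule and `END`, along with `True` as the status flag.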
## Deliverables
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
## Platform
Linux
PHP
HTML