In Progress

MapReduce / C++ programming

Problem 1

You are already given a sample MapReduce application that counts the number of occurrences of each word in a collection of text files. To try it out, run make, then run it with

./test-wordCount wordCount-input.txt

The input file wordCount-input.txt contains a list of filenames (in the input/ subdirectory) that are used for testing. Feel free to add more files or test with different ones; just remember to list them in wordCount-input.txt.

The output from the above is the list of words and the corresponding number of occurrences. Here is an excerpt:

...

wreck.: 1

write,: 1

writer,: 10

writer.: 10

writer...: 1

writing: 1

...

Note that the word "writer" appears as several separate entries because WordCount's implementation of MRmap is not aware of punctuation. As your warm-up task, fix the mr::WordCount::MRmap method to remove punctuation when it creates the multimap of intermediate results. Be careful with contractions (e.g., "I'm", "you're") -- you can consider a contraction to be a single word and not remove apostrophes. Your updated WordCount example should produce this type of output (corresponding to the original output above) when you run ./test-wordCount wordCount-input.txt.
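As an illustration of the punctuation handling described above, here is a minimal sketch. It does not use the provided mr:: classes, whose exact interface is not shown here. The names stripPunctuation and mapWords are hypothetical helpers, and the rule "keep an apostrophe only when it sits between two letters" is one possible reading of the contraction requirement.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the provided code): returns the token with
// punctuation removed. An apostrophe is kept only when it has a letter on
// both sides, so "I'm" survives while surrounding quotes are dropped.
std::string stripPunctuation(const std::string& token) {
    std::string word;
    for (std::size_t i = 0; i < token.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(token[i]);
        if (std::isalnum(c)) {
            word += static_cast<char>(c);
        } else if (c == '\'' && i > 0 && i + 1 < token.size() &&
                   std::isalpha(static_cast<unsigned char>(token[i - 1])) &&
                   std::isalpha(static_cast<unsigned char>(token[i + 1]))) {
            word += '\'';
        }
    }
    return word;
}

// Hypothetical stand-in for the map step: emits one (word, 1) pair per
// whitespace-separated token, skipping tokens that were punctuation only.
std::multimap<std::string, int> mapWords(const std::string& text) {
    std::multimap<std::string, int> intermediate;
    std::istringstream in(text);
    std::string token;
    while (in >> token) {
        std::string word = stripPunctuation(token);
        if (!word.empty()) {
            intermediate.insert({word, 1});
        }
    }
    assert(intermediate.count("") == 0);  // no empty keys are ever emitted
    return intermediate;
}
```

With this rule, "writer,", "writer." and "writer..." all map to the key "writer". Note that the sketch also drops hyphens, so "well-known" becomes "wellknown"; whether that is acceptable is a design decision the assignment leaves open.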

...

wreck: 1

write: 1

writer: 21

writing: 1

...

Required files: [url removed, login to view], [url removed, login to view]

Problem 2

In this problem you will create a new class, SentenceStats, derived from mr::MapReduce, that implements the MRmap and MRreduce methods to compute the following quantities:

• Maximum sentence length (number of words).

• Minimum sentence length (number of words).

• Average sentence length (number of words).
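The three quantities above can be sketched as a map step that emits one word count per sentence and a reduce step that folds those counts into a maximum, minimum, and average. This is only a sketch: sentenceLengths, reduceLengths, and Stats are hypothetical names, not part of the provided mr:: code, and treating '.', '!' and '?' as the only sentence terminators is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical map step: splits text on '.', '!' and '?' (an assumed
// definition of a sentence) and returns the word count of each sentence.
std::vector<int> sentenceLengths(const std::string& text) {
    std::vector<int> lengths;
    std::string sentence;
    auto flush = [&]() {
        std::istringstream in(sentence);
        std::string word;
        int n = 0;
        while (in >> word) ++n;
        if (n > 0) lengths.push_back(n);  // skip empty pieces, e.g. from "..."
        sentence.clear();
    };
    for (char c : text) {
        if (c == '.' || c == '!' || c == '?') {
            flush();
        } else {
            sentence += c;
        }
    }
    flush();  // trailing text without a terminator still counts as a sentence
    return lengths;
}

// Hypothetical result type for the reduce step.
struct Stats {
    int max;
    int min;
    double average;
};

// Hypothetical reduce step: folds the per-sentence counts into the stats.
Stats reduceLengths(const std::vector<int>& lengths) {
    Stats s{0, 0, 0.0};
    if (lengths.empty()) return s;  // no sentences: all values stay zero
    s.max = *std::max_element(lengths.begin(), lengths.end());
    s.min = *std::min_element(lengths.begin(), lengths.end());
    s.average = std::accumulate(lengths.begin(), lengths.end(), 0.0) /
                static_cast<double>(lengths.size());
    assert(s.min <= s.max);
    return s;
}
```

Abbreviations such as "Mr." would be split into separate sentences by this simple rule. If the reduce step is ever run over partial results, it is safer to carry a running sum and count than to average partial averages.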

Required files: [url removed, login to view], [url removed, login to view]

Problem 3

Create a test program for your SentenceStats class, using [url removed, login to view] as an example. The following is sample output you can produce, assuming [url removed, login to view] is a list of text files to process (as in Problem 1).

./test-sentenceStats [url removed, login to view]

Maximum sentence length: 12 words

Minimum sentence length: 3 words

Average sentence length: 5 words

(Note: these are sample values that do not correspond to any particular input; you can reuse the input file from Problem 1 or create a new one for testing this problem.)

Required files: [url removed, login to view], Makefile (add a test-sentenceStats target)

Problem 4

In this problem you will create a new class, Phrases, derived from mr::MapReduce, that implements the MRmap and MRreduce methods to compute the frequency of two-word phrases in a set of text files.
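One way to picture the two-word phrase count is as a map step that emits a (phrase, 1) pair for every pair of adjacent words, followed by a reduce step that sums the pairs per phrase. The sketch below is standalone and does not use the provided mr:: classes. The names mapPhrases and reducePhrases are hypothetical, and it assumes the input is a single sentence whose words have already been cleaned of punctuation.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical map step: emits ("w1 w2", 1) for each pair of adjacent words
// in one sentence. Input with fewer than two words emits nothing.
std::multimap<std::string, int> mapPhrases(const std::string& sentence) {
    std::multimap<std::string, int> intermediate;
    std::istringstream in(sentence);
    std::string prev;
    std::string word;
    if (!(in >> prev)) return intermediate;
    while (in >> word) {
        intermediate.insert({prev + " " + word, 1});
        prev = word;
    }
    return intermediate;
}

// Hypothetical reduce step: sums the emitted values for each phrase.
std::map<std::string, int> reducePhrases(
        const std::multimap<std::string, int>& intermediate) {
    std::map<std::string, int> totals;
    for (const auto& entry : intermediate) {
        assert(entry.second > 0);  // the map step only emits positive counts
        totals[entry.first] += entry.second;
    }
    return totals;
}
```

For the input "the cat saw the cat" this yields three distinct phrases, with "the cat" counted twice. Whether a phrase may span a sentence boundary is not stated in the problem; the sketch sidesteps the question by taking one sentence at a time.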

Required files: [url removed, login to view], [url removed, login to view]

Problem 5

Create a test program for your Phrases class, using [url removed, login to view] as an example. It should take as input a file containing a list of text files to process (as in Problem 1). The output is similar to that of Problem 1, but instead of single words, you should output all two-word phrases and the number of times they occur in the provided input.

Required files: [url removed, login to view], Makefile (add test-phrases target)

Problem 6

The same as Problems 4 and 5, but create a class AllPhrases derived from mr::MapReduce that computes the frequency of phrases of any length, not just two-word phrases (the maximum phrase size is the whole sentence). Sort the output from most to least frequent. Create a test program [url removed, login to view] to test your class.
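The idea of counting every phrase within a sentence and then ordering the result by frequency can be sketched as follows. This is a standalone illustration, not the AllPhrases class itself: countAllPhrases and sortByFrequency are hypothetical names, the input is assumed to be already split into cleaned sentences, and single words are counted as phrases of length one, which is an assumption about what "any length" means.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: counts every contiguous run of words (length 1 up to
// the whole sentence) in each sentence. Phrases never cross sentences.
std::map<std::string, int> countAllPhrases(
        const std::vector<std::string>& sentences) {
    std::map<std::string, int> counts;
    for (const std::string& sentence : sentences) {
        std::istringstream in(sentence);
        std::vector<std::string> words;
        for (std::string w; in >> w;) words.push_back(w);
        for (std::size_t start = 0; start < words.size(); ++start) {
            std::string phrase;
            for (std::size_t end = start; end < words.size(); ++end) {
                if (end > start) phrase += ' ';
                phrase += words[end];
                ++counts[phrase];
            }
        }
    }
    return counts;
}

// Hypothetical helper: returns (phrase, count) pairs ordered from most to
// least frequent; ties keep the map's alphabetical order (stable sort).
std::vector<std::pair<std::string, int>> sortByFrequency(
        const std::map<std::string, int>& counts) {
    std::vector<std::pair<std::string, int>> sorted(counts.begin(), counts.end());
    std::stable_sort(sorted.begin(), sorted.end(),
                     [](const std::pair<std::string, int>& a,
                        const std::pair<std::string, int>& b) {
                         return a.second > b.second;
                     });
    assert(sorted.size() == counts.size());
    return sorted;
}
```

A sentence of n words produces n(n+1)/2 phrases, so the intermediate data grows quadratically with sentence length; that may matter for long inputs.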

Required files: [url removed, login to view], [url removed, login to view], [url removed, login to view], Makefile (add test-allPhrases target)

Skills: C++ Programming


About the Employer:
( 0 reviews ) eugene, China

Project ID: #5477693

Awarded to:

vw8326147vw

I have been programming for more than 20 years and I have a B.S degree in computer science. I can start on this project immediately and finish it in one day.

$20 USD in 1 day
(0 Reviews)
0.0

4 freelancers are bidding on average $40 for this job

msabouri

Hello there, I can help you with this project. I can finish it within 24 hours. Please contact me if you are interested. Thank you.

$29 USD in 1 day
(30 Reviews)
5.2
achilli3st7

Hello Sir/Ma'am, I am a 21-year-old computer science engineer, currently working in a cyber forensics company as a digital evidence analyst. I have been programming for the past six years and have accumulated grea...

$77 USD in 2 days
(0 Reviews)
0.0
bilalgujjar203

Hello there, I have more than 2 years of experience in C++. I have read your program and understand your requirements, and I can deliver it very early. Since I am a new freelancer, and perhaps it will be my first project...

$35 USD in 3 days
(0 Reviews)
0.0