Cron scraper to get baseball stats from the web and place them in a CSV file (repost)

Cancelled Posted Sep 28, 2010 Paid on delivery

Previously I had a script made for me that worked quite well (see requirements below).

I have not been able to get the script to work for the last three months. I will provide the previously working script. Please update it so that it works again.

1. Acquire pitching and hitting stats for each of the 30 teams from:

[url removed, login to view][TEAM]

[url removed, login to view][TEAM]

[url removed, login to view][TEAM]

2. Create a .csv file with the columns intact (e.g. the header row for pitching would show: Name, GP, GS, W, L, etc.; for batting: Name, GP, BA, etc.).

3. Create a folder named with the time and date. Inside that folder, create a subfolder for each team (e.g. KAN, BAL, ATL, etc.). Inside each team folder, place the pitching, batting and split stats in .csv format.
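Steps 2 and 3 could be sketched roughly as follows. This is only an illustration using Ruby's standard library, not the original script: the helper name `write_team_stats`, the timestamp format, the file names and the sample rows are all assumptions, and the scraping step itself is left out.

```ruby
require "csv"
require "fileutils"
require "tmpdir"

# Hypothetical helper: writes one team's stat tables into
# <root>/<timestamp>/<TEAM>/<kind>.csv and returns the team folder path.
# `tables` maps a kind ("pitching", "batting", "splits") to an array of rows,
# where the first row is the header row taken from the page.
def write_team_stats(root, timestamp, team, tables)
  team_dir = File.join(root, timestamp, team)
  FileUtils.mkdir_p(team_dir)

  tables.each do |kind, rows|
    CSV.open(File.join(team_dir, "#{kind}.csv"), "w") do |csv|
      rows.each { |row| csv << row } # header row first, so columns stay intact
    end
  end

  team_dir
end

# Assumed folder name format. Colons are avoided because they are not
# allowed in Windows file names.
timestamp = Time.now.strftime("%Y-%m-%d_%H%M%S")

# Made-up sample rows, only to show the shape of the data.
sample = {
  "pitching" => [%w[Name GP GS W L], ["Example Pitcher", 30, 30, 12, 8]],
  "batting"  => [%w[Name GP BA], ["Example Batter", 150, ".300"]]
}

# A temporary directory stands in for the real output location.
Dir.mktmpdir do |root|
  team_dir = write_team_stats(root, timestamp, "KAN", sample)
  puts File.read(File.join(team_dir, "pitching.csv"))
end
```

Using the `csv` library rather than joining cells by hand means that names containing commas are quoted correctly.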

This is simply something that Excel can do for me manually. However, web queries are disabled for Mac users.

Other details:

1. This will be a cron job that runs weekly.

2. Cross platform is ++. I will be using a Mac environment most of the time.

3. Clean code, intelligent variable names (e.g. using $url instead of teds-favorite-variable) and use of comments are greatly appreciated.

## Deliverables

1. The inspiration comes from the following book: "Baseball Hacks: Tips & Tools for Analyzing and Winning with Statistics", Adler, Joseph. O'Reilly Media, 2006. On page 198: [url removed, login to view]

[url removed, login to view]:

    #!/usr/bin/perl

    # PERL MODULES TO USE
    use LWP::Simple;
    use HTML::TableExtract;

    # WHAT TEAM TO PULL?
    $TeamID = $ARGV[0];

    # CREATE FILE TO PLACE OUTPUT INTO
    $outfile = '[url removed, login to view]';
    open OUT,">$outfile" or die "can't open file $outfile for output!\n";

    # GRAB HTML OF ESPN WEBPAGE FOR GIVEN TEAM
    $URL = "[url removed, login to view]" . $TeamID;
    $html = get($URL);

    # PARSE HTML INTO NEW TABLEEXTRACT OBJECT
    $te = new HTML::TableExtract();
    $te->parse($html);

    # WE'RE INTERESTED IN THE 5TH HTML TABLE IN THE PAGE
    $ts = $te->table_state(0,5);
    @rows = $ts->rows;

    # HOW MANY HTML TABLE ROWS?
    $N = scalar(@rows);

    # NOTE: WE'RE ONLY INTERESTED IN ROWS 3 TO N-4. HTML TABLE ROWS 1-2 CONTAIN
    # MENU ITEMS AND ROWS N-4 TO N CONTAIN TOTALS AND OTHER FORMATTING.

    # ROW 3 IS HEADER ROW
    print OUT "TEAM|" . join("|", @{$rows[3]}) . "\n";

    # FOR REST OF ROWS, PIPE-DELIMIT DATA PLUS A LINEFEED
    for $i (4 .. $N-4) {
        print OUT "$TeamID|";
        print OUT join("|", @{$rows[$i]});
        print OUT "\n";
    }

    # CLOSE OUTPUT FILE
    close OUT;

On the errata site ([url removed, login to view]), they updated the code for lines 23-39 to now read:

The HTML table needed and which rows to grab in the Perl script on page 198 of your book should be as follows:

    # WE'RE INTERESTED IN THE 2ND HTML TABLE IN THE PAGE
    $ts = $te->table_state(0,1);
    @rows = $ts->rows;

    # HOW MANY HTML TABLE ROWS?
    $N = scalar(@rows);

    # PRINT OUT THE COLUMN HEADERS
    print OUT "TEAM|" . join("|", @{$rows[1]}) . "\n";

    # FOR REST OF ROWS, PIPE-DELIMIT DATA PLUS A LINEFEED
    for $i (2 .. $N-4) {
        print OUT "$TeamID|";
        print OUT join("|", @{$rows[$i]});
        print OUT "\n";
    }

2. The team codes are three letters: ARI, ATL, BAL, BOS, CHC, CHW, CIN, CLE, COL, DET, FLA, HOU, KAN, LAA, LAD, MIL, MIN, NYM, NYY, OAK, PHI, PIT, SAN, SDG, SEA, STL, TAM, TEX, TOR, WAS

3. When I run the Perl script on a Windows machine I get the following error:

    C:\baseball>[url removed, login to view] KAN
    Can't call method "rows" on an undefined value at C:\baseball\[url removed, login to view] line 24.
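The error above means that `$ts` was undefined when `rows` was called on it, that is, `table_state` found no table at the requested position. A likely cause, though not one confirmed here, is that the page layout changed so the hard-coded table index no longer matches. One way to make a rewrite less brittle is to pick the table by its header cells and fail with a clear message if none matches. The sketch below assumes the tables have already been parsed into arrays of rows (for example with Nokogiri); `find_stats_table` and the sample data are hypothetical.

```ruby
# Hypothetical helper: `tables` is an array of tables, each an array of rows
# (arrays of cell strings), as an HTML parser might yield them.
# Returns the first table that contains a row with every expected header,
# trimmed so that the header row comes first. Raises a descriptive error
# instead of returning nil when no table matches.
def find_stats_table(tables, expected_headers)
  tables.each do |rows|
    header_index = rows.index { |row| (expected_headers - row).empty? }
    return rows.drop(header_index) if header_index
  end
  raise ArgumentError,
        "no table with headers #{expected_headers.inspect}; the page layout may have changed"
end

# Made-up sample: a menu table followed by a stats table with a title row.
tables = [
  [%w[Home Scores Standings]],
  [["Pitching"], %w[NAME GP GS W L], %w[Example 30 30 12 8]]
]

stats = find_stats_table(tables, %w[NAME GP GS])
puts stats.first.join("|")
```

With this approach, a layout change produces an error that names the missing headers rather than a method call on an undefined value.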

* * *

This broadcast message was sent to all bidders on Tuesday Sep 28, 2010 10:47:44 AM:

Hello. I had a Ruby program that worked GREAT. About a month ago the program stopped working and came back with an error. I need the program written in Ruby, using Nokogiri, and workable on a Mac.

Engineering, Project Management, Software Architecture, Software Testing, Web Hosting, Website Management, Website Testing

Project ID: #3756258

About the project

3 proposals Remote project Active Oct 14, 2010

3 freelancers are bidding on average $11 for this job

taro: $12.75 USD in 14 days (17 Reviews, 4.6). See private message.

notjustacoder: $8.50 USD in 14 days (29 Reviews, 3.3). See private message.

chasecmiller: $12.75 USD in 14 days (9 Reviews, 2.8). See private message.