1- CUDA: matrix addition
Implement matrix addition C = A + B in CUDA C, where A, B, and C are NxN matrices and N is large. This extends the program in the book "CUDA by Example" that adds two long vectors of length N. Also refer to the [login to view URL] program, which uses 2-dimensional arrays.
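A minimal sketch of the kernel, assuming each matrix is stored row-major in one flat buffer (element (i, j) at index i * n + j) rather than as an array of row pointers. The names matAdd and launchMatAdd and the 16x16 block size are illustrative choices, not part of the assignment. Each thread computes one element of C, and a 2-D grid covers the whole matrix.

```cuda
#include <cuda_runtime.h>

// Element-wise C = A + B for N x N matrices stored row-major in flat
// device buffers: element (i, j) lives at index i * n + j.
// One thread computes one element; a 2-D grid of 2-D blocks covers the matrix.
__global__ void matAdd(const float *a, const float *b, float *c, int n)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // column index
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // row index
    if (i < n && j < n) {                           // skip threads past the matrix edge (n need not be a multiple of the block size)
        size_t idx = (size_t)i * n + j;             // size_t: i * n overflows int once n exceeds about 46340
        c[idx] = a[idx] + b[idx];
    }
}

// Hypothetical helper: launch matAdd with 16 x 16 thread blocks and enough
// blocks to cover n x n. The launch is asynchronous; this returns only the
// launch error, so synchronize (or cudaMemcpy back) before reading C.
cudaError_t launchMatAdd(const float *d_a, const float *d_b, float *d_c, int n)
{
    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    matAdd<<<grid, block>>>(d_a, d_b, d_c, n);
    return cudaGetLastError();
}
```

With 16-row blocks, the grid's y dimension (limited to 65535 blocks) stays within range for N up to about one million, far more than fits in GPU memory.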
In your main program, assign float values to the elements of A and B: a[i][j] = 2*i + j + 1 and b[i][j] = i + 4*j + 2.
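The initialization can be sketched on the host as below, again assuming a flat row-major layout; initMatrices and expectedC are illustrative names. Adding the two formulas gives c[i][j] = (2i + j + 1) + (i + 4j + 2) = 3i + 5j + 3, which makes a closed-form correctness check possible.

```cuda
#include <stddef.h>

// Fill host copies of A and B with the prescribed values, using a flat
// row-major layout (element (i, j) at index i * n + j).
void initMatrices(float *a, float *b, int n)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            size_t idx = (size_t)i * n + j;
            a[idx] = (float)(2 * i + j + 1);
            b[idx] = (float)(i + 4 * j + 2);
        }
    }
}

// Closed form of the expected sum: (2i + j + 1) + (i + 4j + 2) = 3i + 5j + 3.
float expectedC(int i, int j)
{
    return (float)(3 * i + 5 * j + 3);
}
```

Because every value involved is an integer below 2^24, float represents it exactly and the sum is exact, so comparing with == is safe for any N that fits in memory.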
Call your kernel. Then check whether all elements of C are correct; if they are, print "We did it!".
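One way to check the result after copying C back to the host is to compare every element against the closed form 3i + 5j + 3. This sketch assumes the flat row-major layout; checkResult is a hypothetical name, and reporting the first mismatch is an added convenience, not an assignment requirement.

```cuda
#include <stdio.h>
#include <stddef.h>

// Compare the host copy of C with c[i][j] = 3i + 5j + 3.
// Prints "We did it!" and returns 1 if every element matches;
// otherwise prints the first mismatch and returns 0.
int checkResult(const float *c, int n)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            float expected = (float)(3 * i + 5 * j + 3);
            float got = c[(size_t)i * n + j];
            if (got != expected) {
                printf("Mismatch at (%d, %d): got %f, expected %f\n", i, j, got, expected);
                return 0;
            }
        }
    }
    printf("We did it!\n");
    return 1;
}
```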
Also perform the matrix addition sequentially (as a nested loop) and time it with gettimeofday(). Compare this time with the execution time of the kernel plus the cudaMemcpy calls (excluding the malloc and cudaMalloc times), and calculate the speedup. Do this for 10 values of N, ranging from large to very large.
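A sketch of the two timed regions, assuming the device buffers were already allocated with cudaMalloc so that allocation stays outside the measurement. The names wallSeconds, addSequential, addKernel, and timeGpuPath are illustrative; addKernel repeats a minimal 2-D addition kernel only so the block compiles on its own.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>
#include <sys/time.h>

// Wall-clock time in seconds from gettimeofday().
double wallSeconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

// Sequential reference: the nested loop that is timed on the CPU.
void addSequential(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            c[(size_t)i * n + j] = a[(size_t)i * n + j] + b[(size_t)i * n + j];
}

// Minimal 2-D addition kernel so this block is self-contained.
__global__ void addKernel(const float *a, const float *b, float *c, int n)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < n && j < n) {
        size_t idx = (size_t)i * n + j;
        c[idx] = a[idx] + b[idx];
    }
}

// Time host->device copies + kernel + device->host copy.
// d_a, d_b, d_c must already be cudaMalloc'd (allocation is not timed).
double timeGpuPath(const float *a, const float *b, float *c,
                   float *d_a, float *d_b, float *d_c, int n)
{
    size_t bytes = (size_t)n * n * sizeof(float);
    dim3 block(16, 16);
    dim3 grid((n + 15) / 16, (n + 15) / 16);

    double t0 = wallSeconds();
    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);
    addKernel<<<grid, block>>>(d_a, d_b, d_c, n);
    cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel to finish before copying
    return wallSeconds() - t0;
}
```

The speedup for each N is then the sequential time divided by the value timeGpuPath returns. Because cudaMalloc runs first, CUDA context creation also falls outside the timed region. gettimeofday() has microsecond resolution, so small N gives noisy numbers; repeating each measurement a few times helps.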
Submit a typescript showing a listing of your source code (with "cat"), your compilation, and your executions with their output. Discuss your findings in your report.
2- Implement matrix multiplication.
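For part 2, a naive sketch, assuming the same flat row-major NxN layout: each thread computes one element of C as the dot product of row i of A and column j of B, so the total work is O(N^3). The name matMul is illustrative, and this is the straightforward version, not a shared-memory tiled one.

```cuda
#include <cuda_runtime.h>

// Naive C = A * B for N x N row-major matrices.
// Thread (i, j) computes C[i][j] = sum over k of A[i][k] * B[k][j].
__global__ void matMul(const float *a, const float *b, float *c, int n)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // column of C
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // row of C
    if (i < n && j < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; k++)
            sum += a[(size_t)i * n + k] * b[(size_t)k * n + j];
        c[(size_t)i * n + j] = sum;
    }
}
```

It launches with the same 2-D grid as the addition kernel. When checking against a sequential triple loop, the results may differ in the last bits (for example, because nvcc fuses multiply-adds by default), so a relative tolerance is safer than exact equality.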
Hi
I have a PhD in computational physics. My area of expertise is HPC programming, more precisely CUDA programming. I believe I can write clear, well-documented code for you. Chat with me to discuss this project in more detail. Hope to hear from you soon.
Thanks
Kyriakos Hadjiyiannakou