# Audio Visualization: Know Calculus?

Split an audio wave (.wav format) into a bitmap graph showing the decibel level at various sound frequency bands (FFT algorithm). This is called spectrogram/spectrograph software. Weight each decibel level according to the ISO 226:2003 graph (see links in the deliverables section). Each pixel will be colored according to a decibel color key, which is calculated from basic statistics, as detailed in the deliverables. Expect this to take at least several days, even if you are good at calculus. Textbooks on signaling may also explain the FFT algorithm, though they may not teach ISO 226:2003 weighting.

Bidders, please at least 1) explain your knowledge of calculus and 2) describe your past experience with an audio project (if any).

For an example with source code you can use, see the [url removed, login to view] application in the "LINKS" section at the bottom. If you play a .WAV music file in that app, you can see that the detail is horrible. For an example of a fairly well detailed spectrograph, see "spectrograph image" in the "LINKS" section below. I'm interested in a spectrograph that shows all the details the human ear can hear.

With the [url removed, login to view] example program, you can set "FFT points" to about 4000. I would guess this means the program splits the .WAV into 4,000 frequency bands. That value may need to be higher for excellent detail. I've seen other programs with up to 32,000 points, which is probably excessive detail.

[url removed, login to view] also allows you to adjust the number of FFTs per second. This allows more detail along the time axis. I believe a typical .WAV plays at about 11,000 Hz, so I guess you could calculate the FFT 11,000 times per second. But I think that beyond 4,000 FFTs per second, additional detail is unlikely. Again, you'll have to experiment to find a default that gives excellent detail. This could potentially take a long time to compute, and you might have to use the hard drive as extra memory space.

IMPORTANT: Very detailed instruction & links to help are in the deliverables section.
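
To make the relationship between "FFT points", "FFTs per second", and the sample rate concrete, here is a minimal Python sketch. It is an illustration, not part of the example program: the function name `fft_parameters` and the 44,100 Hz sample rate are assumptions (the real sample rate should be read from the WAV header). For real-valued audio, an N-point FFT yields N/2 + 1 distinct frequency bands, spaced sample rate / N apart, up to half the sample rate.

```python
def fft_parameters(sample_rate_hz, fft_points, ffts_per_second):
    """Relate the 'FFT points' and 'FFTs per second' settings to resolution.

    All names here are hypothetical; they only illustrate the arithmetic.
    """
    # A real-valued frame of N samples yields N/2 + 1 distinct frequency bins.
    bins = fft_points // 2 + 1
    # Adjacent bins are sample_rate / N apart.
    bin_width_hz = sample_rate_hz / fft_points
    # The highest representable frequency is half the sample rate (Nyquist).
    nyquist_hz = sample_rate_hz / 2
    # Number of samples to advance between successive FFT frames.
    hop_samples = sample_rate_hz / ffts_per_second
    # Fraction of each frame that is shared with the next frame.
    overlap = max(0.0, 1.0 - hop_samples / fft_points)
    return {
        "bins": bins,
        "bin_width_hz": bin_width_hz,
        "nyquist_hz": nyquist_hz,
        "hop_samples": hop_samples,
        "overlap": overlap,
    }


# Assumed example values: 44,100 Hz audio, 4096-point FFT, 100 FFTs per second.
params = fft_parameters(44100, 4096, 100)
```

With these assumed values, each band is roughly 10.8 Hz wide and successive frames overlap by about 89%. Raising "FFTs per second" shrinks the hop, but each frame still spans `fft_points` samples, which is one reason extra frames eventually stop adding visible time detail.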

## Deliverables

Full functionality includes:

A. Save (as .bmp) and open (as .wav). Place buttons to (1) save and (2) open on the toolbar. Note that I've labeled the toolbar elements (1) to (13).

B. As the file is opened, generate the spectrogram from the WAV file. If the file has multiple channels, merge them. Calculate the maximally detailed image, based on the detail settings (see section G).
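
Step B could be sketched roughly as follows, using only the Python standard library. This is an assumption-laden illustration, not the method of the example app: channels are merged by simple averaging, a Hann window is applied to each frame, the FFT is a plain recursive radix-2 version (so `fft_points` must be a power of two), and the decibel values are relative to an arbitrary reference. The names `mix_to_mono`, `fft`, and `spectrogram_db` are hypothetical.

```python
import cmath
import math


def mix_to_mono(interleaved, channels):
    """Average interleaved samples (L, R, L, R, ...) into a single channel."""
    return [sum(interleaved[i:i + channels]) / channels
            for i in range(0, len(interleaved) - channels + 1, channels)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def spectrogram_db(samples, fft_points, hop, floor_db=-120.0):
    """Return one list of dB values per frame (rows = time, columns = frequency)."""
    # The Hann window reduces leakage between neighbouring frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / fft_points)
              for i in range(fft_points)]
    rows = []
    for start in range(0, len(samples) - fft_points + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(fft_points)]
        # Keep only the N/2 + 1 non-redundant bins of a real-valued signal.
        spectrum = fft(frame)[:fft_points // 2 + 1]
        row = []
        for c in spectrum:
            mag = abs(c) / fft_points
            db = 20 * math.log10(mag) if mag > 0 else floor_db
            row.append(max(floor_db, db))  # clamp very quiet bins to the floor
        rows.append(row)
    return rows


# Usage: a synthetic stereo tone that falls exactly on bin 8 of a 64-point FFT.
tone = [math.sin(2 * math.pi * 8 * i / 64) for i in range(256)]
stereo = [s for v in tone for s in (v, v)]
rows = spectrogram_db(mix_to_mono(stereo, 2), fft_points=64, hop=32)
```

A pure-Python FFT like this is slow for long files; it is shown only to make the steps visible. A production version would likely use an optimized FFT library and stream frames to disk rather than hold the whole image in memory.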

C. From the detailed image, create two versions of the image that are resized to fit the available screen width: a black & white version, and a version color-coded according to the color key (see "COLOR KEY" section below). If the ISO 226 weighting option is checked, then as each screen-width image is being created, adjust each decibel value according to the ISO 226:2003 graph (see links section at bottom for help).

Unlike other spectrogram programs, this program scrolls from the top to the bottom of the screen. By scrolling from top to bottom, more frequency bands can be displayed on the screen, so more detail can be seen. This can be done by rotating the image or by any other means. The lowest frequencies should start on the left side of the screen.
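
The layout described above (time running top to bottom, lowest frequencies on the left, width fitted to the screen) could be sketched like this. The function names are hypothetical, and taking the loudest value in each group of merged bands is an assumed design choice (averaging would also work but can hide narrow peaks).

```python
def rotate_to_vertical_time(image):
    """Convert a conventional spectrogram layout to the vertical one.

    Input:  rows = frequency (highest at the top), columns = time.
    Output: rows = time (top to bottom), columns = frequency (lowest at left).
    """
    height = len(image)
    width = len(image[0])
    return [[image[height - 1 - f][t] for f in range(height)]
            for t in range(width)]


def fit_width(rows, screen_width):
    """Shrink every time row to screen_width columns.

    Each output column keeps the loudest value among the bands it covers.
    Rows that are already narrow enough are copied unchanged.
    """
    src_width = len(rows[0])
    if src_width <= screen_width:
        return [list(r) for r in rows]
    out = []
    for row in rows:
        new_row = []
        for x in range(screen_width):
            lo = x * src_width // screen_width
            hi = max(lo + 1, (x + 1) * src_width // screen_width)
            new_row.append(max(row[lo:hi]))
        out.append(new_row)
    return out


# Usage: three frequency bands by two time steps, in the conventional layout.
conventional = [[30, 31],   # highest band
                [20, 21],
                [10, 11]]   # lowest band
vertical = rotate_to_vertical_time(conventional)
narrow = fit_width([[1, 5, 2, 8, 3, 9]], 3)
```

If the spectrogram is generated one frame per row in the first place, the rotation step is unnecessary and only `fit_width` applies.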

D. The user will be able to scroll or zoom the image by entering into one of these four selection modes:

(8) Hand(hand icon) Hand icon mode allows the user to drag the image in the same way as popular graphic/photo apps.

(9) Select (pointer icon w/ arrow pointing left and right). The select (left/right) arrow mode allows the user to select a strip of sound just like popular audio applications do.

(10) Select(pointer icon with arrow pointing up and down). The select (up/down) arrow allows the user to select a range of frequencies.

(11) Select (pointer in box icon). The select (box) icon allows the user to select a range of frequencies and a range of time at the same time.

The selection will expand to fit the screen only after the magnify button (magnifying glass icon) is pressed. All available screen area should be used (the area excluding the toolbar, scrollbars, etc.).
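
One way to turn a selection into a new zoomed view is to map the selected pixel rectangle back to time and frequency ranges. The sketch below assumes a linear frequency axis running left to right and a time axis running top to bottom; the `view` dictionary and the function name `selection_to_view` are hypothetical. The single-axis selection modes can pass the full screen extent for the axis they do not restrict.

```python
def selection_to_view(view, screen_w, screen_h, x0, x1, y0, y1):
    """Map a pixel rectangle to the time/frequency range it covers.

    view holds the currently displayed range: f_min/f_max in Hz (left to
    right) and t_min/t_max in seconds (top to bottom).
    """
    # Accept drags in any direction.
    x0, x1 = sorted((x0, x1))
    y0, y1 = sorted((y0, y1))
    f_span = view["f_max"] - view["f_min"]
    t_span = view["t_max"] - view["t_min"]
    return {
        "f_min": view["f_min"] + f_span * x0 / screen_w,
        "f_max": view["f_min"] + f_span * x1 / screen_w,
        "t_min": view["t_min"] + t_span * y0 / screen_h,
        "t_max": view["t_min"] + t_span * y1 / screen_h,
    }


# Usage: box selection on a 1000 x 500 pixel drawing area.
current = {"f_min": 0.0, "f_max": 20000.0, "t_min": 0.0, "t_max": 10.0}
zoomed = selection_to_view(current, 1000, 500, x0=500, x1=250, y0=100, y1=200)
```

When the magnify button is pressed, the zoomed range would become the new `view`, and the screen-width images would be regenerated for it.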

E. Audio buttons available are: (3)rewind, (4)stop, (5)play, (6)pause, and (7)skip to end. These should be placed on the toolbar. Also placed on the toolbar should be a (13)slider that sets the play speed. The default play speed is one.

F. The (12)loop button (infinity icon) plays the time area currently on the screen over and over again. Wait 1 second before each replay.

G. The user may customize the following settings in the settings menu:

Time Scale (pixels per second). Sets how many pixels represent one second of time on the time axis.

Starting Frequency: 10 Hz default. Starting point for the frequency axis.

Ending Frequency: 22,000 Hz default. Ending point for the frequency axis.

Start Point (seconds): Blank/Null Default. Can customize starting point when play button is pressed. Value is in seconds.

End Point (seconds): Blank/Null Default. Can customize ending point for audio play. Value in seconds.

Preview (pixels): Default value 250. Defines number of pixel-lines that will show up ahead of currently playing pixel line. The played portion is in color. The unplayed portion will be in black and white. Because I want to use the over-write technique that the [url removed, login to view] sample program uses, most of the screen will display the played portion.

Show Time Ticmarks (checkbox): Enable the ticmarks on the time axis.

Show Frequency Ticmarks (checkbox): Enable the ticmarks on the frequency axis.

Max intensity starts at (decibel value): Default value is "auto" (see color key comments). However, the user may set it to any floating-point value.

Min intensity starts at (decibel value): Default value is "auto" (see color key comments). However, the user may set it to any floating-point value.

FFT Points: I don't know exactly what this is, but it is in most Spectrogram programs. I believe it is the frequency resolution variable which defines how many frequency bands to calculate. Default value 4,000?

FFTs per second: The higher this number, the more times the FFT algorithm is computed per second. This will increase the detail on the time axis up to a certain point. Default value 3,000?

Sample Frequency (Hz): Unsure of what this is exactly, but it's a common setting in almost any audio program.

Play Speed (float): A number > 0. A value of 0.5 means play audio at half speed. 2 means play audio at 2x speed.

Frequency adjusted for fast/slow playback (checkbox). When play speed is adjusted, this checkbox determines if the frequency is adjusted so that it does not sound distorted. I would imagine that when playing at twice the speed, the frequency needs to be cut in half to sound normal.

Volume Equalizer - ISO 226:2003 (checkbox): If checked, pixel color is adjusted by the ISO 226 chart.
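
The ISO 226 weighting option could be applied as a per-frequency decibel offset, interpolated from a lookup table. The sketch below shows only the interpolation mechanics: the table values are PLACEHOLDERS made up for illustration and are NOT taken from ISO 226:2003, and the function names are hypothetical. Note also that the standard defines a family of equal-loudness contours that vary with loudness level, so using a single offset curve is a simplifying assumption.

```python
import bisect
import math

# PLACEHOLDER (frequency in Hz, offset in dB) pairs. These are NOT values from
# ISO 226:2003; replace them with offsets derived from the standard.
PLACEHOLDER_OFFSETS_DB = [(20.0, -10.0), (1000.0, 0.0), (20000.0, -5.0)]


def weighting_db(freq_hz, table=PLACEHOLDER_OFFSETS_DB):
    """Interpolate the dB offset for freq_hz on a logarithmic frequency axis."""
    freqs = [f for f, _ in table]
    # Outside the table, hold the nearest end value.
    if freq_hz <= freqs[0]:
        return table[0][1]
    if freq_hz >= freqs[-1]:
        return table[-1][1]
    i = bisect.bisect_right(freqs, freq_hz)
    (f0, d0), (f1, d1) = table[i - 1], table[i]
    frac = (math.log10(freq_hz) - math.log10(f0)) / (math.log10(f1) - math.log10(f0))
    return d0 + frac * (d1 - d0)


def apply_weighting(row_db, bin_width_hz, table=PLACEHOLDER_OFFSETS_DB):
    """Add the frequency-dependent offset to every band of one time row."""
    return [db + weighting_db(i * bin_width_hz, table)
            for i, db in enumerate(row_db)]


# Usage: three bands at 0 Hz, 1000 Hz and 2000 Hz, all measured at 60 dB.
weighted = apply_weighting([60.0, 60.0, 60.0], bin_width_hz=1000.0)
```

Because the offset depends only on frequency, it can be computed once per screen column and reused for every row.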

H. COLOR KEY

Color each pixel according to this rainbow scheme:

Color: Black > Royal Blue > Blue > Cyan > Green > Yellow > Red > Ruby Red > White

Code : 0 0 0 > 128 0 255 > 0 0 255 > 0 255 255 > 0 255 0 > 255 255 0 > 255 0 0 > 255 0 128 > 255 255 255
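
A minimal sketch of mapping a decibel value onto this color key follows. It assumes the nine key colors are spaced evenly between the minimum and maximum intensity and that colors between key points are linearly interpolated; the brief does not specify either, so treat both as assumptions. The name `db_to_rgb` is hypothetical.

```python
# The color key from this section, from minimum to maximum intensity.
COLOR_KEY = [
    (0, 0, 0),        # Black
    (128, 0, 255),    # Royal Blue
    (0, 0, 255),      # Blue
    (0, 255, 255),    # Cyan
    (0, 255, 0),      # Green
    (255, 255, 0),    # Yellow
    (255, 0, 0),      # Red
    (255, 0, 128),    # Ruby Red
    (255, 255, 255),  # White
]


def db_to_rgb(db, min_db, max_db, key=COLOR_KEY):
    """Map a decibel value to an (R, G, B) tuple along the color key."""
    if max_db <= min_db:
        raise ValueError("max_db must be greater than min_db")
    # Normalize to 0..1 and clamp values outside the configured range.
    t = (db - min_db) / (max_db - min_db)
    t = min(1.0, max(0.0, t))
    # Locate the pair of key colors that surround this position.
    pos = t * (len(key) - 1)
    i = min(int(pos), len(key) - 2)
    frac = pos - i
    a, b = key[i], key[i + 1]
    return tuple(round(a[c] + (b[c] - a[c]) * frac) for c in range(3))


# Usage: the midpoint of a 0..80 dB range lands on the middle key color.
mid_color = db_to_rgb(40.0, 0.0, 80.0)
```

Values at or above `max_db` come out pure white and values at or below `min_db` come out black, which matches the intent of the maximum and minimum intensity settings.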

Before coloring pixels, remember to weight the decibel level according to the ISO 226:2003 graph (if user enables that option). If the maximum intensity decibel level is not given in the settings, then use the following method to calculate the maximum intensity:

Calculate decibel value for maximum intensity color:

First, calculate the "small value" decibel level by sorting a list of all decibel values over zero in the entire image (or .wav file). The "small value" is defined as 1 decibel under the 4th-percentile decibel value. Next, for each pixel line, find the 96th-percentile decibel value, ignoring any values under the "small value" level. So, if there are 10,000 pixel lines on the time axis, there should be 10,000 numbers, one 96th-percentile value per line. Use the median of that list as the highest-intensity decibel value. Any values over that level will be colored using the highest-intensity color (pure white in the case of the above key). So, roughly 4% of the *colored* pixels will be pure white.
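
The "auto" maximum intensity calculation could be sketched as follows. The percentile convention (nearest rank) is an assumption, since the brief does not say which definition to use, and the function names are hypothetical. Each inner list in `rows` is one pixel line on the time axis.

```python
import statistics


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list; pct is an integer 1..100."""
    # Ceiling division with integers avoids floating-point rounding surprises.
    rank = max(1, -(-pct * len(sorted_values) // 100))
    return sorted_values[rank - 1]


def auto_max_intensity(rows):
    """Return the decibel value that maps to the highest-intensity color."""
    # Step 1: "small value" = 1 dB under the 4th percentile of all values > 0.
    positive = sorted(v for row in rows for v in row if v > 0)
    if not positive:
        return None
    small_value = percentile(positive, 4) - 1.0
    # Step 2: 96th percentile of each pixel line, ignoring values under the
    # "small value" level.
    per_line = []
    for row in rows:
        kept = sorted(v for v in row if v >= small_value)
        if kept:
            per_line.append(percentile(kept, 96))
    # Step 3: the median of the per-line values is the maximum intensity.
    return statistics.median(per_line) if per_line else None


# Usage: three identical pixel lines holding the values 1..100 dB.
rows = [list(range(1, 101)) for _ in range(3)]
max_db = auto_max_intensity(rows)
```

For this toy input, the 4th-percentile value is 4, so the "small value" is 3; each line's 96th percentile over the remaining values is 97, and the median of those is 97.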

LINKS:

Example app w/ a free license, so you may use some of the code:

[url removed, login to view] (See Frequency Analyzer)

The FFT algorithm:

Note various communications signaling textbooks may have sections on the FFT algorithm.

ISO 226:2003 decibel weighting:

The ISO website has detailed information, but it costs \$90 USD. Factor that into your bid if you will need it.

Spectrograph image with good detail:

STANDARD RENTACODER TERMS:

1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.

2) Deliverables must be in ready-to-run condition, as follows:

a) For desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.

3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).

## Platform

Windows XP


Project ID: #3119732

## 9 freelancers are bidding on average \$352 for this job
