Assignment P4

Perl Programming Assignment P4

I recently did a psycholinguistics experiment where subjects were given sentences and asked to rate them on a 1-to-5 scale for how plausible they sound. My colleagues used Microsoft Excel to organize the stimuli and the data we collected into spreadsheets, but I wanted to manipulate the data myself using Perl on my Unix machine. So I got them to save the relevant files in "comma-separated values" format (.csv) and send them to me. The files they sent me are:

mapping.sv_umd.list1
Contains verb1, verb2, index1, index2 for the stimuli, showing which items should be paired up for the statistical analysis. (Each pair has a reciprocal verb, e.g. "kissed", paired up with a non-reciprocal verb, e.g. "predicted", because the study was looking at the effects of reciprocality in on-line sentence processing. See a recent paper we wrote if you're interested in that.)
plausibilitySV01.csv
Contains the columns Index,Source,Verb,1,2,3,4,5,SUM,Mean, where
- "Index" is the item number from the survey form,
- "Source" can be ignored,
- "Verb" is the verb we're testing,
- the columns titled 1 to 5 contain the number of people who gave that rating to this item,
- "SUM" is the sum of the values in columns 1 to 5, and
- "Mean" is the mean (= average) rating for this item.
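
One way to hold the ratings file in Perl is a hash keyed by item index. The sketch that follows is an illustration under stated assumptions, not a prescribed solution. The helper name read_means is hypothetical. It expects the header row listed above as the first line, and it splits on plain commas, so it would misread a field that Excel had wrapped in quotes because it contains a comma (a CPAN module such as Text::CSV handles that case).

```perl
use strict;
use warnings;

# Hypothetical helper: read plausibilitySV01.csv from an open filehandle and
# return a hash reference mapping each item's Index to its Mean rating.
# Assumes the first line is the header row and that no field contains a
# quoted comma (for such files a CSV module such as Text::CSV is safer).
sub read_means {
    my ($fh) = @_;

    # Find the Index and Mean columns by name, not by fixed position.
    my $header = <$fh>;
    die "ratings file is empty\n" unless defined $header;
    $header =~ s/\r?\n\z//;    # files saved by Excel may use CRLF line endings
    my @names = split /,/, $header;
    my %col;
    @col{@names} = 0 .. $#names;
    for my $name (qw(Index Mean)) {
        die "no '$name' column in header\n" unless exists $col{$name};
    }

    my %mean_for;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        next unless $line =~ /\S/;    # skip blank lines
        my @fields = split /,/, $line, -1;
        $mean_for{ $fields[ $col{Index} ] } = $fields[ $col{Mean} ];
    }
    return \%mean_for;
}
```

To use it, open the file first, for example with open my $fh, '<', 'plausibilitySV01.csv' or die "cannot open: $!\n"; and then call read_means($fh). Looking the columns up by header name keeps the code working even if a column such as Source is dropped or moved.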

The main Perl program I needed took each pair of items, as specified in mapping.sv_umd.list1, looked up both items in plausibilitySV01.csv to find their "mean rating" values, and printed the two means on one line. For example, the first pair specifies that item 83 was paired up with item 76 (for the verbs "battled" and "judged"), so the first line of my output file contained

  4   3.066666667
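
The pairing step itself can be sketched as a loop over the mapping lines. The description of mapping.sv_umd.list1 gives the field order but not the separator, so this sketch assumes that each line holds verb1, verb2, index1, index2 separated by commas and/or whitespace, and that index1's mean goes in the first column. The name pair_means is hypothetical, and the tab between the two columns is a choice, not a format known to be required by the statistics program.

```perl
use strict;
use warnings;

# Hypothetical helper: given a hash reference of item index => mean rating
# (built from plausibilitySV01.csv) and the lines of mapping.sv_umd.list1,
# return one "mean1<TAB>mean2" output line per pair.
# Assumes each mapping line lists verb1, verb2, index1, index2 in that order,
# separated by commas and/or whitespace.
sub pair_means {
    my ($mean_for, @mapping_lines) = @_;
    my @out;
    for my $line (@mapping_lines) {
        $line =~ s/\r?\n\z//;
        $line =~ s/^\s+//;
        my ($verb1, $verb2, $index1, $index2) = split /[,\s]+/, $line;

        # Skip blank lines and any header row (non-numeric indices).
        next unless defined $index2 && $index1 =~ /^\d+$/ && $index2 =~ /^\d+$/;

        if (!exists $mean_for->{$index1} || !exists $mean_for->{$index2}) {
            warn "no rating for item $index1 or $index2 ($verb1/$verb2); pair skipped\n";
            next;
        }
        push @out, "$mean_for->{$index1}\t$mean_for->{$index2}";
    }
    return @out;
}

# Demonstration using the first pair described above: item 83 ("battled")
# has mean 4 and item 76 ("judged") has mean 3.066666667.
my %demo_means = (83 => '4', 76 => '3.066666667');
print "$_\n" for pair_means(\%demo_means, "battled judged 83 76");
# prints "4", a tab, then "3.066666667"
```

Printing the returned lines and redirecting standard output to a file gives a two-column file of the kind described here. Pairs whose items are missing from the ratings file are reported on standard error rather than silently making one column shorter than the other.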

I was then able to feed this output (two columns of numbers) to a different program I have for computing the relevant statistic (it was a paired t-test) in order to discover whether or not one column has significantly higher ratings than the other column.
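
The statistics program is not part of this assignment, and its internals are not described here. For orientation only, the sketch that follows shows the standard paired t statistic on such two-column data: for each line take the difference d = first column minus second column, then compute t = mean(d) / (s_d / sqrt(n)) with n - 1 degrees of freedom, where s_d is the sample standard deviation of the differences. The helper names paired_t and read_columns are hypothetical, and turning t into a p-value (by comparing it with a t distribution) is left to a statistics package.

```perl
use strict;
use warnings;

# Hypothetical helper: paired t statistic for two equal-length columns.
# d_i = x_i - y_i;  t = mean(d) / (s_d / sqrt(n)), with n - 1 degrees of
# freedom, where s_d is the sample standard deviation of the differences.
sub paired_t {
    my ($x, $y) = @_;
    my $n = @$x;
    die "columns differ in length\n" unless $n == @$y;
    die "need at least two pairs\n" if $n < 2;

    my @d = map { $x->[$_] - $y->[$_] } 0 .. $n - 1;
    my $mean = 0;
    $mean += $_ for @d;
    $mean /= $n;

    my $ss = 0;
    $ss += ($_ - $mean) ** 2 for @d;
    my $sd = sqrt($ss / ($n - 1));
    die "differences have zero variance\n" if $sd == 0;

    return ($mean / ($sd / sqrt($n)), $n - 1);
}

# Hypothetical helper: read whitespace-separated two-column lines
# (such as the pairing program's output) into two array references.
sub read_columns {
    my ($fh) = @_;
    my (@x, @y);
    while (my $line = <$fh>) {
        my ($left, $right) = split ' ', $line;
        next unless defined $right;    # skip blank or incomplete lines
        push @x, $left;
        push @y, $right;
    }
    return (\@x, \@y);
}
```

A large positive t means the first column tends to be rated higher than the second; whether that difference is significant depends on the t distribution with n - 1 degrees of freedom.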

Your job is to do the same thing: write the program I've just described, which produces two columns of numerical output.