Jupyter at Bryn Mawr College |
|||
Public notebooks: /services/public/dblank / CS110 Intro to Computing / 2015-Spring / Notes |
We have seen that we have needed to sort data in a couple of places:
Sorting a set of items is a good place to dive into the idea of an algorithm.
An algorithm is a well-defined, complete, step-by-step description that performs a process in a finite amount of time and space.
Before, we have used a straightforward algorithm that is so stupid, it doesn't have a name. Let's call it squared_sort for a reason that will become obvious shortly.
The Squared sort algorithm, stated very abstractly:
Compare all combinations of values at positions i and j: Swap where item at i is greater than the item at j
The Squared sort algorithm, stated with more details about an implementation:
Starting at the first element and going to second-to-last item, call this i: Starting at i, and going to last item, call this j: if the item at location i is greater than the item at location j: Swap the items at i and j end if end j loop end i loop
An algorithm is not so specific that it requires that it be written in a computer language. However, it is specific enough that it can be written in a programming language.
squared_sort
looks like this in Java:
void squared_sort() {
for (int i = 0; i < array.length - 1; i++) {
for (int j = i; j < array.length; j++) {
comparisons++;
if (array[i] > array[j]) {
swap(i, j);
}
}
}
}
void swap(int i, int j) {
swaps++;
int tmp = array[i];
array[i] = array[j];
array[j] = tmp;
}
As we discussed in class, this sorting algorithm will compare all pairwise combinations of items, swapping the elements of an array if the first (array[i]) is bigger than the second (array[j]).
Things to note:
comparisons
, and swaps
About how many comparisons and swaps will be performed on an array of 10 items? On 1000 items?
Stop here and think.
Ok, now let's experiment!
Putting everything together to create an array of 10 random numbers, and sort them using squared_sort.
Shows the time, comparisons, and swaps for 10 random elements.
int [] array;
int comparisons = 0;
int swaps = 0;
void squared_sort() {
for (int i = 0; i < array.length - 1; i++) {
for (int j = i; j < array.length; j++) {
comparisons++;
if (array[i] > array[j]) {
swap(i, j);
}
}
}
}
void swap(int i, int j) {
swaps++;
int tmp = array[i];
array[i] = array[j];
array[j] = tmp;
}
void print_array() {
for (int i = 0; i < array.length; i++) {
print("" + array[i] + ", ");
}
println("");
}
println("-----------------------------------------------------------------");
array = new int [10];
for (int i = 0; i < array.length; i++) {
array[i] = int(random(array.length));
}
comparisons = 0;
swaps = 0;
println("Before sorting:");
if (array.length < 100)
print_array()
start = millis();
squared_sort();
stop = millis();
println("Squared sort took: " + (stop - start) + " ms, " + comparisons + " compares, " + swaps + " swaps on random " + array.length);
if (array.length < 100)
print_array()
You should see output that is similar to:
Before sorting: 6, 3, 9, 5, 4, 3, 1, 8, 5, 5, Squared took: 0 ms, 54 compares, 22 swaps on random 10 1, 3, 3, 4, 5, 5, 5, 6, 8, 9,
Does that match your expectations?
If we look at the algorithm (or code) we see that we run through all of the items in the array, and for each of those, we basically run through the items in the array again! You'll study these ideas much more in Data Structures, but as a rough approximation, we see that we'd make about $n ^ 2$ comparisons (where n is the number of items in an array). That would be about 100 in this example. (Actually, we see that since j starts at i, it is really $n^2/2$ which is closer to what we actually see, 54).
Do the following experiments:
How does it do on 100 items?
How does it do on 1000 items?
How does it do on 10,000 items?
This $n ^ 2$ is bad.
Write Processing code that draws a line graph where the x axis is n (number of items in array) and the y axis is the number of comparisons.
Bubble sort is the name given to a simple, and somewhat interesting, sorting algorithm.
The Bubblesort algorithm in abstract terms:
While not done: Run through all items in a set, i: if item i is greater than item i + 1: Swap items i and i + 1 If no swaps were made on last sweep, you are done
The Bubblesort algorithm, with more implementation details:
Let stop be the length of the array While stop is not at the beginning: Starting at the first element and going to stop - 1, call this i: Let new_stop be the beginning position if the item at location i is greater than the item at location i + 1: swap the items at i and i + 1 Save i as new_stop end if Let stop be new_stop end while
And written in Java:
void bubble_sort() {
int stop = array.length;
while (stop != 0) {
last_swap = 0;
for (int i = 0; i < stop - 1; i++) {
comparisons++;
if (array[i] > array[i + 1]) {
swap(i, i + 1);
last_swap = i + 1;
}
}
stop = last_swap;
}
}
How will this compare to Squared sort?
Let's try it!
int [] array;
int comparisons = 0;
int swaps = 0;
void bubble_sort() {
int stop = array.length;
while (stop != 0) {
last_swap = 0;
for (int i = 0; i < stop - 1; i++) {
comparisons++;
if (array[i] > array[i + 1]) {
swap(i, i + 1);
last_swap = i + 1;
}
}
stop = last_swap;
}
}
void swap(int i, int j) {
swaps++;
int tmp = array[i];
array[i] = array[j];
array[j] = tmp;
}
void print_array() {
for (int i = 0; i < array.length; i++) {
print("" + array[i] + ", ");
}
println("");
}
println("-----------------------------------------------------------------");
array = new int [10];
for (int i = 0; i < array.length; i++) {
array[i] = int(random(array.length));
}
comparisons = 0;
swaps = 0;
println("Before sorting:");
if (array.length < 100)
print_array()
start = millis();
bubble_sort();
stop = millis();
println("Bubble sort took: " + (stop - start) + " ms, " + comparisons + " compares, " + swaps + " swaps on random " + array.length);
if (array.length < 100)
print_array()
You probably see results, such as:
Before sorting: 2, 4, 5, 5, 9, 6, 7, 8, 0, 9, Bubble sort took: 0 ms, 37 compares, 11 swaps on random 10 0, 2, 4, 5, 5, 6, 7, 8, 9, 9,
Try Bubble sort on 10, 100, 1000, and 10,000 array sizes*
Describe how bubble sort differs in behavior compared to squared sort.
A big difference between bubble_sort and all other sorts is its behavior on an already sorted array.
Try it!
Do an experiment where you call bubble sort a second time. You can do the same for squared sort, but it will always give the same number of comparisons of a given $n$.
comparisons = 0;
swaps = 0;
start = millis();
bubble_sort();
stop = millis();
println("Bubble sort took: " + (stop - start) + " ms, " + comparisons + " compares, " + swaps + " swaps on random " + array.length);
if (array.length < 100)
print_array()
println("Second time:");
comparisons = 0;
swaps = 0;
start = millis();
bubble_sort();
stop = millis();
println("Bubble sort took: " + (stop - start) + " ms, " + comparisons + " compares, " + swaps + " swaps on sorted " + array.length);
if (array.length < 100)
print_array()
The final algorithm that we will look at today is Quicksort. Quicksort represents a class of algorithms in computer science that capture the motto "divide and conquer".
The Quicksort algorithm, at 30,000 feet (abstract view):
Pick a pivot value Break set of items into two parts: first part is all less than the pivot value second part is all greater than the pivot value Sort the first part Sort the second part
The Quicksort algorithm, with more implementation details:
Quicksort on start and end: if start less than end: Partition from start to end around p: Quicksort on start and p - 1 Quicksort on p and end
The partition picks a point,
Partition from start to end: Pick a pivot position, p, and pivot value Partition array into: those less than pivot value pivot value those less than pivot value return partition
In Java, the main quicksort function looks like:
void quicksort(int lo, int hi) {
if (lo < hi) {
int p = partition(lo, hi);
quicksort(lo, p - 1);
quicksort(p + 1, hi);
}
}
How will this differ from squared sort and bubble sort?
int [] array;
int comparisons = 0;
int swaps = 0;
void quicksort(int lo, int hi) {
if (lo < hi) {
int p = partition(lo, hi);
quicksort(lo, p - 1);
quicksort(p + 1, hi);
}
}
int partition(int lo, int hi) {
pivotIndex = choosePivot(lo, hi);
pivotValue = array[pivotIndex];
// put the chosen pivot at array[hi]
swap(pivotIndex, hi);
storeIndex = lo;
// Compare remaining array elements against pivotValue = array[hi]
for (int i = lo; i <= hi - 1; i++) {
comparisons++;
if (array[i] < pivotValue) {
swap(i, storeIndex);
storeIndex = storeIndex + 1;
}
}
swap(storeIndex, hi); // Move pivot to its final place
return storeIndex;
}
int choosePivot(int lo, int hi) {
return ceil((hi + lo) / 2);
}
void swap(int i, int j) {
swaps++;
int tmp = array[i];
array[i] = array[j];
array[j] = tmp;
}
void print_array() {
for (int i = 0; i < array.length; i++) {
print("" + array[i] + ", ");
}
println("");
}
println("-----------------------------------------------------------------");
array = new int [10];
for (int i = 0; i < array.length; i++) {
array[i] = int(random(array.length));
}
comparisons = 0;
swaps = 0;
println("Before sorting:");
if (array.length < 100)
print_array()
start = millis();
quicksort(0, array.length - 1);
stop = millis();
println("Quicksort took: " + (stop - start) + " ms, " + comparisons + " compares, " + swaps + " swaps on random " + array.length);
if (array.length < 100)
print_array()
You'll see something similar to:
Before sorting: 1, 9, 6, 8, 5, 9, 7, 7, 5, 0, Quicksort took: 0 ms, 32 compares, 26 swaps on random 10 0, 5, 6, 5, 1, 7, 7, 8, 9, 9,
Try Quicksort on 10, 100, 1000, and 10,000 array sizes*
Describe how quicksort differs in behavior compared to bubble and squared sort.
Create a plot with all of the following lines:
For different sizes of n (10, 100, 1,000, 10,000):
Here is some code that might help:
int [] array;
int comparisons = 0;
int swaps = 0;
void swap(int i, int j) {
swaps++;
int tmp = array[i];
array[i] = array[j];
array[j] = tmp;
}
void squared_sort() {
for (int i = 0; i < array.length - 1; i++) {
for (int j = i; j < array.length; j++) {
comparisons++;
if (array[i] > array[j]) {
swap(i, j);
}
}
}
}
void bubble_sort() {
int stop = array.length;
while (stop != 0) {
last_swap = 0;
for (int i = 0; i < stop - 1; i++) {
comparisons++;
if (array[i] > array[i + 1]) {
swap(i, i + 1);
last_swap = i + 1;
}
}
stop = last_swap;
}
}
void quicksort(int lo, int hi) {
if (lo < hi) {
int p = partition(lo, hi);
quicksort(lo, p - 1);
quicksort(p + 1, hi);
}
}
int partition(int lo, int hi) {
pivotIndex = choosePivot(lo, hi);
pivotValue = array[pivotIndex];
// put the chosen pivot at array[hi]
swap(pivotIndex, hi);
storeIndex = lo;
// Compare remaining array elements against pivotValue = array[hi]
for (int i = lo; i <= hi - 1; i++) {
comparisons++;
if (array[i] < pivotValue) {
swap(i, storeIndex);
storeIndex = storeIndex + 1;
}
}
swap(storeIndex, hi); // Move pivot to its final place
return storeIndex;
}
int choosePivot(int lo, int hi) {
return ceil((hi + lo) / 2);
}
void print_array() {
for (int i = 0; i < array.length; i++) {
print("" + array[i] + ", ");
}
println("");
}
println("-----------------------------------------------------------------");
for (int s = 5; s < 11; s++) {
int orig = new int [pow(2, s)];
array = new int [pow(2, s)];
// make a copy, to compare across all
for (int i = 0; i < array.length; i++) {
orig[i] = int(random(array.length));
}
// reset array:
for (int i = 0; i < array.length; i++) {
array[i] = orig[i];
}
comparisons = 0;
swaps = 0;
start = millis();
squared_sort();
stop = millis();
println("Squared took: " + (stop - start) + " ms, " + comparisons + " compares, " + swaps + " swaps on random " + array.length);
// reset:
comparisons = 0;
swaps = 0;
start = millis();
squared_sort();
stop = millis();
println("Squared took: " + (stop - start) + " ms, " + comparisons + " compares, " + swaps + " swaps on sorted " + array.length);
// reset array:
for (int i = 0; i < array.length; i++) {
array[i] = orig[i];
}
comparisons = 0;
swaps = 0;
start = millis();
bubble_sort();
stop = millis();
println("Bubble took: " + (stop - start) + " ms, " + comparisons + " compares, " + swaps + " swaps on random " + array.length);
// reset:
comparisons = 0;
swaps = 0;
start = millis();
bubble_sort();
stop = millis();
println("Bubble took: " + (stop - start) + " ms, " + comparisons + " compares, " + swaps + " swaps on sorted " + array.length);
// reset array:
for (int i = 0; i < array.length; i++) {
array[i] = orig[i];
}
comparisons = 0;
swaps = 0;
start = millis();
quicksort(0, array.length - 1);
stop = millis();
println("Quicksort took: " + (stop - start) + " ms, " + comparisons + " compares, " + swaps + " swaps on random " + array.length);
// reset:
comparisons = 0;
swaps = 0;
start = millis();
quicksort(0, array.length - 1);
stop = millis();
println("Quicksort took: " + (stop - start) + " ms, " + comparisons + " compares, " + swaps + " swaps on sorted " + array.length);
}