{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 1. Sorting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have seen that we have needed to sort data in a couple of places:\n", "\n", "* we needed to be able to sort genomes when we [evolved Taylor Swift](../Lectures/Genetic Algorithm.ipynb)\n", "* we needed to sort objects based on their depth (z value) to draw background items first\n", "* we need it in many other places in computing!\n", "\n", "Sorting a set of items is a good place to dive into the idea of an _algorithm_." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#Table of Contents\n", "* [1. Sorting](#1.-Sorting)\n", "\t* [1.1 Algorithms](#1.1-Algorithms)\n", "\t\t* [1.1.1 Squared sort](#1.1.1-Squared-sort)\n", "\t\t* [1.1.2 Bubble sort](#1.1.2-Bubble-sort)\n", "\t\t\t* [1.1.2.1 Bubble sort is best, at something!](#1.1.2.1-Bubble-sort-is-best,-at-something!)\n", "\t\t* [1.1.3 Quicksort](#1.1.3-Quicksort)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1.1 Algorithms" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An algorithm is a well-defined, complete, step-by-step description that performs a process in a finite amount of time and space.\n", "\n", "Before, we have used a straightforward algorithm that is so stupid, it doesn't have a name. Let's call it **squared_sort** for a reason that will become obvious shortly.\n", "\n", "The Squared sort algorithm, stated very abstractly:\n", "\n", "
```\n",
"Compare all combinations of values at positions i and j:\n",
"    Swap where item at i is greater than the item at j\n",
"```
\n", "\n", "The Squared sort algorithm, stated with more details about an implementation:\n", "\n", "
```\n",
"Starting at the first element and going to second-to-last item, call this i:\n",
"   Starting at i, and going to last item, call this j:\n",
"       if the item at location i is greater than the item at location j:\n",
"           Swap the items at i and j\n",
"       end if\n",
"   end j loop\n",
"end i loop\n",
"```
\n", "\n", "An algorithm is not so specific that it requires that it be written in a computer language. However, it is specific enough that it can be written in a programming language. \n", "\n", "`squared_sort` looks like this in Java:\n", "\n", "```java\n", "void squared_sort() {\n", " for (int i = 0; i < array.length - 1; i++) {\n", " for (int j = i; j < array.length; j++) {\n", " comparisons++;\n", " if (array[i] > array[j]) {\n", " swap(i, j);\n", " }\n", " }\n", " }\n", "}\n", "\n", "void swap(int i, int j) {\n", " swaps++;\n", " int tmp = array[i];\n", " array[i] = array[j];\n", " array[j] = tmp;\n", "}\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we discussed in class, this sorting algorithm will compare all pairwise combinations of items, swapping the elements of an array if the first (array[i]) is bigger than the second (array[j]).\n", "\n", "Things to note:\n", "\n", "1. the array operated on must be a global\n", "1. we keep track of the comparisons made, and the number of swapped items through global variables `comparisons`, and `swaps`\n", "1. i runs from 0 to length - 1\n", "1. j runs from i to length\n", "1. reported times are in milliseconds, ms. There are 1,000 ms in a second\n", "\n", "About how many comparisons and swaps will be performed on an array of 10 items? On 1000 items?\n", "\n", "**Stop here and think.**\n", "\n", "Ok, now let's experiment!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.1.1 Squared sort" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Putting everything together to create an array of 10 random numbers, and sort them using squared_sort.\n", "\n", "Shows the time, comparisons, and swaps for 10 random elements." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "application/javascript": [ "\n", " var component = document.getElementById(\"sketch_1\");\n", " if (component != undefined)\n", " component.remove();\n", " component = document.getElementById(\"state_1\");\n", " if (component != undefined)\n", " component.remove();\n", " component = document.getElementById(\"controls_div_1\");\n", " if (component != undefined)\n", " component.remove();\n", " require([window.location.protocol + \"//calysto.github.io/javascripts/processing/processing.js\"], function() {\n", " // FIXME: Stop all previously running versions (?)\n", " var processingInstance = Processing.getInstanceById(\"canvas_1\");\n", " if (processingInstance != undefined && processingInstance.isRunning())\n", " processingInstance.noLoop();\n", " });\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " Sketch #1:
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "int [] array;\n", "int comparisons = 0;\n", "int swaps = 0;\n", "\n", "void squared_sort() {\n", " for (int i = 0; i < array.length - 1; i++) {\n", " for (int j = i; j < array.length; j++) {\n", " comparisons++;\n", " if (array[i] > array[j]) {\n", " swap(i, j);\n", " }\n", " }\n", " }\n", "}\n", "\n", "void swap(int i, int j) {\n", " swaps++;\n", " int tmp = array[i];\n", " array[i] = array[j];\n", " array[j] = tmp;\n", "}\n", "\n", "void print_array() {\n", " for (int i = 0; i < array.length; i++) {\n", " print(\"\" + array[i] + \", \");\n", " }\n", " println(\"\");\n", "}\n", "\n", "void setup() {\n", " println(\"-----------------------------------------------------------------\");\n", "\n", " array = new int ;\n", " for (int i = 0; i < array.length; i++) {\n", " array[i] = int(random(array.length));\n", " }\n", "\n", " comparisons = 0;\n", " swaps = 0;\n", " println(\"Before sorting:\");\n", " if (array.length < 100)\n", " print_array();\n", "\n", " int start = millis();\n", " squared_sort();\n", " int stop = millis();\n", " println(\"Squared sort took: \" + (stop - start) + \" ms, \" + comparisons + \" compares, \" + swaps + \" swaps on random \" + array.length);\n", " if (array.length < 100)\n", " print_array();\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You should see output that is similar to:\n", "\n", "
```\n",
"Before sorting:\n",
"6, 3, 9, 5, 4, 3, 1, 8, 5, 5, \n",
"Squared took: 0 ms, 54 compares, 22 swaps on random 10\n",
"1, 3, 3, 4, 5, 5, 5, 6, 8, 9, \n",
"```
\n", "\n", "Does that match your expectations?\n", "\n", "If we look at the algorithm (or code) we see that we run through all of the items in the array, and for each of those, we basically run through the items in the array again! You'll study these ideas much more in Data Structures, but as a rough approximation, we see that we'd make about \$n ^ 2\$ comparisons (where n is the number of items in an array). That would be about 100 in this example. (Actually, we see that since j starts at i, it is really \$n^2/2\$ which is closer to what we actually see, 54).\n", "\n", "Do the following experiments:\n", "\n", "**How does it do on 100 items?**\n", "\n", "**How does it do on 1000 items?**\n", "\n", "**How does it do on 10,000 items?**\n", "\n", "This \$n ^ 2\$ is bad. \n", "\n", "**Sketch on paper what the graph looks like for squared-sort**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.1.2 Bubble sort" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Bubble sort is the name given to a simple, and somewhat interesting, sorting algorithm.\n", "\n", "The Bubblesort algorithm in abstract terms:\n", "\n", "
```\n",
"While not done:\n",
"    Run through all items in a set, i:\n",
"        if item i is greater than item i + 1:\n",
"            Swap items i and i + 1\n",
"    If no swaps were made on last sweep, you are done\n",
"```
\n", "\n", "The Bubblesort algorithm, with more implementation details:\n", "\n", "
```\n",
"Let stop be the length of the array\n",
"While stop is not at the beginning:\n",
"    Starting at the first element and going to stop - 1, call this i:\n",
"       Let new_stop be the beginning position\n",
"       if the item at location i is greater than the item at location i + 1:\n",
"           swap the items at i and i + 1\n",
"           Save i as new_stop\n",
"       end if\n",
"    Let stop be new_stop\n",
"end while\n",
"```
\n", "\n", "And written in Java:\n", "\n", "```java\n", "void bubble_sort() {\n", " int stop = array.length;\n", " while (stop != 0) {\n", " last_swap = 0;\n", " for (int i = 0; i < stop - 1; i++) {\n", " comparisons++;\n", " if (array[i] > array[i + 1]) {\n", " swap(i, i + 1);\n", " last_swap = i + 1;\n", " }\n", " }\n", " stop = last_swap;\n", " }\n", "}\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**How will this compare to Squared sort?**\n", "\n", "Let's try it!" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "application/javascript": [ "\n", " var component = document.getElementById(\"sketch_2\");\n", " if (component != undefined)\n", " component.remove();\n", " component = document.getElementById(\"state_2\");\n", " if (component != undefined)\n", " component.remove();\n", " component = document.getElementById(\"controls_div_2\");\n", " if (component != undefined)\n", " component.remove();\n", " require([window.location.protocol + \"//calysto.github.io/javascripts/processing/processing.js\"], function() {\n", " // FIXME: Stop all previously running versions (?)\n", " var processingInstance = Processing.getInstanceById(\"canvas_2\");\n", " if (processingInstance != undefined && processingInstance.isRunning())\n", " processingInstance.noLoop();\n", " });\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " Sketch #2:
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "int [] array;\n", "int comparisons = 0;\n", "int swaps = 0;\n", "\n", "void bubble_sort() {\n", " int stop = array.length;\n", " while (stop != 0) {\n", " int last_swap = 0;\n", " for (int i = 0; i < stop - 1; i++) {\n", " comparisons++;\n", " if (array[i] > array[i + 1]) {\n", " swap(i, i + 1);\n", " last_swap = i + 1;\n", " }\n", " }\n", " stop = last_swap;\n", " }\n", "}\n", "\n", "void swap(int i, int j) {\n", " swaps++;\n", " int tmp = array[i];\n", " array[i] = array[j];\n", " array[j] = tmp;\n", "}\n", "\n", "void print_array() {\n", " for (int i = 0; i < array.length; i++) {\n", " print(\"\" + array[i] + \", \");\n", " }\n", " println(\"\");\n", "}\n", "\n", "void setup() {\n", " println(\"-----------------------------------------------------------------\");\n", "\n", " array = new int ;\n", " for (int i = 0; i < array.length; i++) {\n", " array[i] = int(random(array.length));\n", " }\n", "\n", " comparisons = 0;\n", " swaps = 0;\n", " println(\"Before sorting:\");\n", " if (array.length < 100)\n", " print_array();\n", "\n", " int start = millis();\n", " bubble_sort();\n", " int stop = millis();\n", " println(\"Bubble sort took: \" + (stop - start) + \" ms, \" + comparisons + \" compares, \" + swaps + \" swaps on random \" + array.length);\n", " if (array.length < 100)\n", " print_array();\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You probably see results, such as:\n", "\n", "
```\n",
"Before sorting:\n",
"2, 4, 5, 5, 9, 6, 7, 8, 0, 9, \n",
"Bubble sort took: 0 ms, 37 compares, 11 swaps on random 10\n",
"0, 2, 4, 5, 5, 6, 7, 8, 9, 9, \n",
"```
\n", "\n", "**Try Bubble sort on 10, 100, 1000, and 10,000 array sizes***\n", "\n", "**Describe how bubble sort differs in behavior compared to squared sort. Draw on the graph Bubble sort results.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1.1.2.1 Bubble sort is best, at something!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A big difference between bubble_sort and all other sorts is its behavior on an **already sorted array**.\n", "\n", "**Try it!**\n", "\n", "Do an experiment where you call bubble sort a second time. You can do the same for squared sort, but it will always give the same number of comparisons for a given \$n\$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.1.3 Quicksort" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The final algorithm that we will look at today is Quicksort. Quicksort represents a class of algorithms in computer science that capture the motto \"divide and conquer\".\n", "\n", "The Quicksort algorithm, at 30,000 feet (abstract view):\n", "\n", "
```\n",
"Pick a pivot value\n",
"Break set of items into two parts:\n",
"    first part is all less than the pivot value\n",
"    second part is all greater than the pivot value\n",
"Sort the first part\n",
"Sort the second part\n",
"```
\n", "\n", "The Quicksort algorithm, with more implementation details:\n", "\n", "
```\n",
"Quicksort on start and end:\n",
"    if start less than end:\n",
"        Partition from start to end around p:\n",
"        Quicksort on start and p - 1\n",
"        Quicksort on p and end\n",
"```
\n", "\n", "The partition picks a point, \n", "\n", "
```\n",
"Partition from start to end:\n",
"   Pick a pivot position, p, and pivot value\n",
"   Partition array into:\n",
"       those less than pivot value\n",
"       pivot value\n",
"       those less than pivot value\n",
"   return partition\n",
"```
\n", "\n", "In Java, the main quicksort function looks like:\n", "\n", "```java\n", "void quicksort(int lo, int hi) {\n", " if (lo < hi) {\n", " int p = partition(lo, hi);\n", " quicksort(lo, p - 1);\n", " quicksort(p + 1, hi);\n", " }\n", "}\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**How will this differ from squared sort and bubble sort?**" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "application/javascript": [ "\n", " var component = document.getElementById(\"sketch_3\");\n", " if (component != undefined)\n", " component.remove();\n", " component = document.getElementById(\"state_3\");\n", " if (component != undefined)\n", " component.remove();\n", " component = document.getElementById(\"controls_div_3\");\n", " if (component != undefined)\n", " component.remove();\n", " require([window.location.protocol + \"//calysto.github.io/javascripts/processing/processing.js\"], function() {\n", " // FIXME: Stop all previously running versions (?)\n", " var processingInstance = Processing.getInstanceById(\"canvas_3\");\n", " if (processingInstance != undefined && processingInstance.isRunning())\n", " processingInstance.noLoop();\n", " });\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " Sketch #3:
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "int [] array;\n", "int comparisons = 0;\n", "int swaps = 0;\n", "\n", "void quicksort(int lo, int hi) {\n", " if (lo < hi) {\n", " int p = partition(lo, hi);\n", " quicksort(lo, p - 1);\n", " quicksort(p + 1, hi);\n", " }\n", "}\n", "\n", "int partition(int lo, int hi) {\n", " int pivotIndex = choosePivot(lo, hi);\n", " int pivotValue = array[pivotIndex];\n", " // put the chosen pivot at array[hi]\n", " swap(pivotIndex, hi);\n", " int storeIndex = lo;\n", " // Compare remaining array elements against pivotValue = array[hi]\n", " for (int i = lo; i <= hi - 1; i++) {\n", " comparisons++;\n", " if (array[i] < pivotValue) {\n", " swap(i, storeIndex);\n", " storeIndex = storeIndex + 1;\n", " }\n", " }\n", " swap(storeIndex, hi); // Move pivot to its final place\n", " return storeIndex;\n", "}\n", "\n", "int choosePivot(int lo, int hi) {\n", " return ceil((hi + lo) / 2);\n", "}\n", "\n", "void swap(int i, int j) {\n", " swaps++;\n", " int tmp = array[i];\n", " array[i] = array[j];\n", " array[j] = tmp;\n", "}\n", "\n", "void print_array() {\n", " for (int i = 0; i < array.length; i++) {\n", " print(\"\" + array[i] + \", \");\n", " }\n", " println(\"\");\n", "}\n", "\n", "void setup() {\n", " println(\"-----------------------------------------------------------------\");\n", "\n", " array = new int ;\n", " for (int i = 0; i < array.length; i++) {\n", " array[i] = int(random(array.length));\n", " }\n", "\n", " comparisons = 0;\n", " swaps = 0;\n", " println(\"Before sorting:\");\n", " if (array.length < 100)\n", " print_array();\n", "\n", " int start = millis();\n", " quicksort(0, array.length - 1);\n", " int stop = millis();\n", " println(\"Quicksort took: \" + (stop - start) + \" ms, \" + comparisons + \" compares, \" + swaps + \" swaps on random \" + array.length);\n", " if (array.length < 100)\n", " print_array();\n", "}\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You'll see something similar to:\n", "\n", "
```\n",
"Before sorting:\n",
"1, 9, 6, 8, 5, 9, 7, 7, 5, 0, \n",
"Quicksort took: 0 ms, 32 compares, 26 swaps on random 10\n",
"0, 5, 6, 5, 1, 7, 7, 8, 9, 9, \n",
"```
\n", "\n", "**Try Quicksort on 10, 100, 1000, and 10,000 array sizes***\n", "\n", "**Describe how quicksort differs in behavior compared to bubble and squared sort.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Create a plot with all of the following lines:**\n", "\n", "For different sizes of n (10, 100, 1,000, 10,000):\n", "\n", "1. Squared sort, on random data and on sorted data\n", "1. Bubble sort, on random data and on sorted data\n", "1. Quicksort, on random data and on sorted data\n", "\n", "Here is some code that might help:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "application/javascript": [ "\n", " var component = document.getElementById(\"sketch_5\");\n", " if (component != undefined)\n", " component.remove();\n", " component = document.getElementById(\"state_5\");\n", " if (component != undefined)\n", " component.remove();\n", " component = document.getElementById(\"controls_div_5\");\n", " if (component != undefined)\n", " component.remove();\n", " require([window.location.protocol + \"//calysto.github.io/javascripts/processing/processing.js\"], function() {\n", " // FIXME: Stop all previously running versions (?)\n", " var processingInstance = Processing.getInstanceById(\"canvas_5\");\n", " if (processingInstance != undefined && processingInstance.isRunning())\n", " processingInstance.noLoop();\n", " });\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " Sketch #5:
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", "