Hello, world!

Welcome to my web site. My name is Patrick Kreutzer and I am currently studying Computer Science at the Friedrich-Alexander-University Erlangen-Nuremberg, presumably finishing my master's degree in late 2015. On these pages, I write about whatever comes to my mind and is (hopefully) interesting to others.

Counting lines of code on the command line

Yes, I know, counting lines of code is an evil thing to do to assess a code base -- but hey, almost everybody does it nonetheless. Here is a bash function to count the lines of code in files with specific file extensions:

function loc
{
    if [ "$#" -lt 1 ]
    then
        local path="."
        local search_pattern=".*"
    else
        local path=$1
        shift

        if [ "$#" -lt 1 ]
        then
            local search_pattern=".*"
        else
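            # GNU find uses emacs-style regexes by default (\( \) groups, \| alternatives),
            # and the pattern has to match the whole path, e.g. ./src/Main.java
            local search_pattern=".*/\(.*\.$1\)"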

            shift

            for extension in "$@"
            do
                search_pattern="$search_pattern\|\(.*\.$extension\)"
            done
        fi
    fi

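    # -type f skips directories (wc would complain about them);
    # -print0 and --files0-from=- pass NUL-separated names, so spaces in file names are safe
    find "$path" -type f -regex "$search_pattern" -print0 | wc -l --files0-from=- | sort -n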
}
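Two properties of find -regex are easy to miss: GNU find uses emacs-style syntax by default (\( \) for groups, \| between alternatives), and the pattern must match the entire path, not just a part of it. As a sketch, here is the same pattern construction translated to Python's re syntax. The function names loc_pattern and find_would_match are made up for illustration, and unlike the bash version this one escapes the extensions with re.escape.

```python
import re


def loc_pattern(extensions):
    """Translate loc's emacs-style pattern into Python's re syntax (illustrative sketch)."""
    if not extensions:
        return r".*"  # no extensions given: match every path
    first, *rest = extensions
    # emacs \( \) become ( ) and \| becomes |; re.escape guards against special characters
    pattern = r".*/(.*\." + re.escape(first) + r")"
    for extension in rest:
        pattern += r"|(.*\." + re.escape(extension) + r")"
    return pattern


def find_would_match(path, extensions):
    # find -regex only succeeds if the pattern matches the *entire* path,
    # which corresponds to re.fullmatch (not re.search)
    return re.fullmatch(loc_pattern(extensions), path) is not None


print(loc_pattern(["java", "py"]))                               # prints .*/(.*\.java)|(.*\.py)
print(find_would_match("./src/Main.java", ["java", "py"]))       # True
print(find_would_match("./src/app.javascript", ["java", "py"]))  # False
```

Because the whole path has to match, a file like app.javascript is not picked up when you ask for java files.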

If you add this function to the .bashrc file in your home directory (and then open a new terminal or run source ~/.bashrc), you can type loc in a terminal to count the lines of code. If you do not provide any arguments when calling the function, all files in the current working directory (and, recursively, in its sub-directories) are counted. You can, however, specify a directory to search in as well as a list of file extensions to filter the files:

# count lines in all files in the current directory (and recursively in its sub-directories)
loc
# count lines in all files in the directory called 'src' (and its sub-directories)
loc src/
# as above, but count lines of files ending with '.java' or '.py' only
loc src/ java py
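For comparison, here is a rough Python counterpart of the whole pipeline, a sketch rather than a replacement: os.walk visits the directory tree like find, counting newline characters per file mimics wc -l, and sorting the pairs mimics sort -n. The function name loc and its return format are chosen for illustration; the extension filter uses a plain suffix check on the file name, which selects essentially the same files as the regex.

```python
import os


def loc(path=".", *extensions):
    """Rough Python counterpart of the loc shell function (illustrative sketch).

    Returns (line_count, file_path) pairs sorted in ascending order, plus the total.
    """
    counts = []
    # os.walk visits path and all of its sub-directories, like find
    for directory, _subdirectories, file_names in os.walk(path):
        for name in file_names:
            # keep only the requested extensions (if any were given)
            if extensions and not any(name.endswith("." + ext) for ext in extensions):
                continue
            file_path = os.path.join(directory, name)
            with open(file_path, "rb") as file:
                # wc -l counts newline characters, so a last line without "\n" is not counted
                counts.append((file.read().count(b"\n"), file_path))
    counts.sort()  # ascending, like sort -n
    total = sum(count for count, _ in counts)
    return counts, total
```

Calling results, total = loc("src", "java", "py") roughly corresponds to loc src/ java py in the shell.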

My bash prompt

If you work a lot on the Linux command line like I do, you probably want a nice, fancy-looking bash prompt that shows more information than the default one. Today, I am going to show you the prompt I am currently using. I got the ideas for it from different sources all over the internet, but I decided to implement it (and comment it!) on my own. If you want to give it a try, just copy the following lines to the .bashrc file in your home directory:

function update_prompt {
    # get information to be displayed in prompt
    hostname=$(hostname | tr -d '\n')
    username=$(whoami | tr -d '\n')
    working_directory=$(pwd | tr -d '\n' | sed "s:^$HOME:~:")
    date_time=$(date "+%H:%M")

    # compute size of prompt and number of fill characters
    local terminal_width=${COLUMNS}
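    local prompt_line="--( $working_directory )--( $username @ $hostname )--"
    # ${#...} counts characters rather than bytes (unlike wc -c), so in a UTF-8 locale
    # non-ASCII directory names are measured correctly
    local promptsize=${#prompt_line}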
    local fillsize=$(($terminal_width-$promptsize))

    fill=""

    # check if we have to truncate the current working directory
    if [ "$fillsize" -lt "0" ]
    then
        # working directory is too long to be fully displayed
        # -> cut off leading characters
        local cut_position=$((3-$fillsize))
        working_directory="...${working_directory:cut_position}"
    else
        # working directory is short enough to be fully displayed
        # -> create enough fill characters to align the username/hostname part to the right edge
        while [ "$fillsize" -gt "0" ]
        do
            fill="$fill-"
            fillsize=$(($fillsize-1))
        done
    fi

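    # \[ and \] mark the escape sequences as non-printing, so that bash
    # computes the visible length of the prompt correctly
    local col_none="\[\033[0m\]"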
    
    local col_yellow="\[\033[1;33m\]"
    local col_red="\[\033[0;31m\]"
    local col_green="\[\033[0;32m\]"

    local col_light_blue="\[\033[1;34m\]"
    local col_light_gray="\[\033[1;37m\]"
    local col_light_purple="\[\033[1;35m\]"
    local col_light_green="\[\033[1;32m\]"
    local col_light_turquois="\[\033[1;36m\]"

    PS1="$col_yellow--( $col_light_turquois$working_directory$col_yellow )-${fill}-( $col_red$username $col_yellow@ $col_light_purple$hostname$col_yellow )--\n$col_yellow--( $col_green$date_time$col_yellow )--> $col_none"
}

PROMPT_COMMAND=update_prompt

As you might have noticed, this is a two-line prompt whose first line scales to fit the width of the terminal. It shows the current working directory, the username and hostname, and the current time. The cool thing is that the working directory is truncated if it is too long to be fully displayed. This is what it looks like:

[Screenshot of the prompt] Note how the working directory is truncated in the second line.
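The truncation is plain arithmetic, so it can be checked outside of bash. The following Python sketch mirrors how update_prompt builds the first prompt line, leaving out the color codes; the function name first_prompt_line and the sample user, host and directories are made up for illustration.

```python
def first_prompt_line(width, working_directory, username, hostname):
    """Uncolored first prompt line, mirroring update_prompt (illustrative sketch)."""
    # size of the prompt line without any fill characters
    promptsize = len(f"--( {working_directory} )--( {username} @ {hostname} )--")
    fillsize = width - promptsize
    fill = ""
    if fillsize < 0:
        # too long: cut off leading characters, making room for the "..." marker
        cut_position = 3 - fillsize
        working_directory = "..." + working_directory[cut_position:]
    else:
        # short enough: pad with dashes so the user@host part ends at the right edge
        fill = "-" * fillsize
    return f"--( {working_directory} )-{fill}-( {username} @ {hostname} )--"


print(first_prompt_line(60, "~/projects", "patrick", "laptop"))
print(first_prompt_line(40, "~/projects/website/src/main", "patrick", "laptop"))
```

With a width of 40 columns, user patrick and host laptop, the directory ~/projects/website/src/main is shortened to ...rc/main, so the line is exactly 40 characters wide: --( ...rc/main )--( patrick @ laptop )--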

Static/Dynamic Typing? Strong/Weak Typing?

In a book I am currently reading (I do not want to name the book here, because I do not want to discredit it; but it doesn't matter anyway), I stumbled upon the following paragraph about typing in programming languages:

The terms strong and weak typing are sometimes used to refer to statically typed and dynamically typed languages respectively.

Although the two are mixed up quite often, this is just plain wrong. These are totally different concepts (well, "totally" is probably a bit exaggerated, but I'm trying to make a point here), so let me clear things up.

Static and Dynamic Typing

Let's consider static and dynamic typing first, because the differences are quite easy to see. In a statically typed language like Java, the type of a variable (as indicated in the variable's declaration) is fixed, so the variable may only hold values of this specific type. For example, consider the following Java code:

int foo = 13;

The variable foo is declared with type int, so foo can only hold values of this type. The following snippet is therefore illegal and produces a compiler error:

int foo = 13;
foo = "hello, world!";   // type 'String' does not match foo's declared type

In a dynamically typed language like Python, however, a variable's type may change and depends on the value the variable contains at a specific point in the code. For example, the following snippet is perfectly valid Python code:

foo = 13    # foo is now of type 'int'...
foo = "hello, world!"    # ... and now it is of type 'str'

If you print type(foo) after each assignment above, you will see that the type of foo changes depending on the value it currently holds.
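A minimal, runnable version of that experiment, recording the result of type(foo) after each assignment (the list observed_types exists only for this illustration):

```python
# Record the type of the value foo refers to after each assignment.
observed_types = []

foo = 13                # foo now holds an int ...
observed_types.append(type(foo))

foo = "hello, world!"   # ... and now a str
observed_types.append(type(foo))

print(observed_types)  # prints [<class 'int'>, <class 'str'>]
```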

Summing up, you can say that in a statically typed language the types are bound to variables, whereas in a dynamically typed language they are bound to the values. Both concepts have their own advantages and disadvantages. I myself definitely favor statically typed languages over dynamically typed ones, because if variables have a static type many errors can be detected at compile time, whereas with dynamic typing they are not detected until run time. However, programs written in dynamically typed languages tend to be more concise and less verbose.
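To make the compile-time versus run-time point concrete, here is a small Python sketch; the function describe is hypothetical. The mistake in it (adding a str and an int) is accepted when the function is defined and only surfaces as a TypeError once that branch actually runs.

```python
def describe(n):
    """Hypothetical helper with a type error hidden in one branch."""
    if n > 0:
        return "positive: " + str(n)
    # Bug: str + int. Python only notices this when the line actually runs.
    return "non-positive: " + n


print(describe(5))  # prints positive: 5, the faulty branch is never executed

try:
    describe(-1)
except TypeError as error:
    print("detected at run time:", error)
```

A test suite that never calls describe with a non-positive number would not find this bug.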

By the way: not having to write down a variable's type does not mean that the language is dynamically typed; some programming languages like Scala use type inference (where possible) to deduce the type of a variable automatically. For example, the compiler is able to find out that foo has to be of type String in this Scala code snippet:

var foo = "hello, world!"

However, Scala is a statically typed language like Java, so foo may only hold String values here.

Strong and Weak Typing

Unfortunately, there is no single perfect definition of strong and weak typing, and it is a gradual scale rather than a black-and-white classification. I will try to explain the differences by giving an example that shows how strongly and weakly typed languages differ. In general, a more strongly typed language makes it harder to "bypass" the type system, i.e. to apply operations to values of the "wrong" data type (there are other criteria for a language to be considered strongly or weakly typed, but I think this one is the most important). Therefore, strongly typed languages are considered to provide higher type safety than weakly typed languages.
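Because the two classifications are independent, a dynamically typed language can still be comparatively strongly typed; Python is often given as an example. This minimal sketch shows Python refusing to apply + to an int and a str instead of silently reinterpreting one of the operands:

```python
# Python does not let you apply + to an int and a str:
# instead of reinterpreting one operand, it raises a TypeError.
try:
    result = 13 + "3"
except TypeError as error:
    result = None
    print("rejected:", error)

# The sanctioned way around it is an explicit conversion of the value.
converted = 13 + int("3")
print(converted)  # prints 16
```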

We will now look at a code snippet written in C, a statically typed language that is considered weakly typed (see my point here?). If you write the following:

int foo = 13;
float bar = 3.;
float bazz = foo + bar;

the value of bazz becomes 16.0, as you would expect. The compiler added an implicit type conversion to convert the int value stored in foo to a corresponding temporary value of type float. However, if you write the following lines of code:

int foo = 13;
float bar = 3.;
float bazz = *((float *)(&foo)) + bar;

the value of bazz is no longer 16.0; from the program's point of view it is just garbage. So, what did I do here? By writing ((float *)(&foo)), I took the address of foo and converted it into a pointer to a value of type float (using the explicit cast (float *)). When this pointer is dereferenced with the * operator, the bits at this memory location are treated as a value of type float, although they actually hold an int value (the bit sequence at this location represents the int value 13, which is not the representation of the float value 13). Thus, if you add this value to another value, the result is not what you would expect. (Strictly speaking, reading an int object through a float pointer is undefined behavior in C. On a typical machine with 32-bit IEEE 754 floats, the bits of the int 13 encode a tiny denormalized float of about 1.8e-44, so bazz usually ends up as 3.0.)
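Python cannot reproduce the pointer cast itself, but its struct module can pack a value into raw bytes and unpack the same bytes as another type, which is enough to inspect what the C snippet does. This sketch assumes a 32-bit int and an IEEE 754 single-precision float (true on common platforms, but not guaranteed by the C standard); the helper to_float32 is made up for illustration and emulates C float arithmetic by rounding to single precision.

```python
import struct


def to_float32(x):
    """Round a Python float (a C double) to single precision, like a C float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Bit patterns of the int 13 and the float 13.0, viewed as unsigned 32-bit integers
int_bits = struct.unpack("<I", struct.pack("<i", 13))[0]
float_bits = struct.unpack("<I", struct.pack("<f", 13.0))[0]
print(hex(int_bits), hex(float_bits))  # prints 0xd 0x41500000

# Reading the int's bytes as a float, which is what *((float *)(&foo)) does
reinterpreted = struct.unpack("<f", struct.pack("<i", 13))[0]
print(reinterpreted == 13 * 2.0 ** -149)  # True: a tiny denormalized float

bar = 3.0
bazz_converted = to_float32(float(13) + bar)          # value conversion, as in the first snippet
bazz_reinterpreted = to_float32(reinterpreted + bar)  # bit reinterpretation, as in the second
print(bazz_converted, bazz_reinterpreted)  # prints 16.0 3.0
```

The two bit patterns differ completely, which is why converting the value and reinterpreting its bits give such different results.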

This example might look a bit contrived, but it shows an important point: a weakly typed language allows you to circumvent type safety, whether it makes sense or not. A more strongly typed language prevents you from doing so. For example, in Java a type cast is only possible if source and target type are in an inheritance relationship (or convertible, in the case of primitive types), and a cast throws a ClassCastException at run time if a type safety violation is detected.

tl;dr

Never mix up static/dynamic typing with strong/weak typing; they are different things. A language is considered statically typed if variables have a fixed type and may therefore only hold values of this specific type, whereas in a dynamically typed language a variable's type may change. On the other hand, strongly typed languages provide higher type safety than weakly typed languages by restricting the ways in which you can access the values in memory.
