Hoylen's Weblog

Fri, 28 May 2010

Newton's Y2K10 problem

The Apple iPad was released in Australia (as well as many other countries) today, but it is interesting to remember its ancestor--the Apple Newton. The Apple Newton was the original device that created the term Personal Digital Assistant (PDA), and many of the concepts in the iPad were already present in the Newton.

The Newton had its own date problem. It represented integers as signed 30-bit numbers. The processor does use 32-bit words, like nearly all modern microprocessors, but the Newton reserves two bits of each word for its own housekeeping. The clock on the Newton represents time as an integer number of seconds past 1 January 1993. Since the maximum integer is 2^29 - 1, the latest time the Newton could represent is 2010-01-05 18:48:31.
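The arithmetic can be checked with a few lines of Python. This is only a sketch: it assumes the largest signed 30-bit value is 2^29 - 1 and that the epoch is midnight on 1 January 1993, ignoring time zones and leap seconds. The constant names are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumption: a signed 30-bit integer, so the largest value is 2**29 - 1.
MAX_NEWTON_INT = 2**29 - 1

# Assumption: the epoch is midnight at the start of 1 January 1993.
NEWTON_EPOCH = datetime(1993, 1, 1, 0, 0, 0)

# The last moment the clock can represent before the integer overflows.
last_moment = NEWTON_EPOCH + timedelta(seconds=MAX_NEWTON_INT)

print(MAX_NEWTON_INT)  # 536870911
print(last_moment)     # 2010-01-05 18:48:31
```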

It is an interesting coincidence that the iPad was released in the same year that the Newton's clock ends. The iPad was announced on 27 January 2010, just 22 days after the Newton's clock reached its end. However, enterprising Newton programmers have created hacks to extend the Newton's clock.

The Apple Newton was a very innovative platform, with a unique and powerful object-oriented data model. Have a look at the Newton's technical documentation to see how it worked and why they deliberately made integers 30 bits long.

Thu, 13 May 2010

How good is facial recognition technology?

I recently renewed my passport and got one of the new ePassports which allows me to use the SmartGate passport control. This system uses facial recognition technology for self-processing at the border control.

How accurate is facial recognition technology? It is far from perfect, but it is very useful when used correctly in the right system.

Accuracy of biometric recognition systems is measured in two ways: False Acceptance Rate (FAR) where the system recognises a face when it should not have (e.g. letting an unknown person in), and False Rejection Rate (FRR) where the system does not recognise a face when it should have (e.g. keeping a legitimate person out).

These two values are related, because you can always improve one at the expense of the other. For example, tune the algorithm to accept more borderline cases and you improve the FRR but make the FAR worse. Letting everyone (including the bad guys) into the country is perfect FRR, but terrible FAR. Keeping everyone out (including legitimate people) is perfect FAR, but terrible FRR.
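The trade-off can be illustrated with a small Python sketch. The similarity scores below are invented for illustration and are not real measurements; the function name and the threshold values are also assumptions. A face is accepted when its score is at or above the threshold, so lowering the threshold accepts more borderline cases.

```python
# Hypothetical similarity scores (0.0 to 1.0); not real measurements.
genuine_scores = [0.95, 0.90, 0.85, 0.70, 0.55]   # same person
impostor_scores = [0.60, 0.45, 0.30, 0.20, 0.10]  # different person


def error_rates(threshold):
    """Return (FAR, FRR) when scores >= threshold are accepted."""
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    far = false_accepts / len(impostor_scores)
    frr = false_rejects / len(genuine_scores)
    return far, frr


# A threshold of 0.0 lets everyone in; a threshold above 1.0 keeps everyone out.
for threshold in (0.0, 0.5, 0.65, 1.01):
    far, frr = error_rates(threshold)
    print(f"threshold={threshold:.2f}  FAR={far:.1f}  FRR={frr:.1f}")
```

With these made-up scores, raising the threshold from 0.5 to 0.65 removes the one false acceptance but introduces one false rejection.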

To see some real numbers, I found the results from the Face Recognition Vendor Test (PDF) of 2006.

Their benchmark performance for 2006 technology, for a FAR of 0.001, was an FRR of 0.01. That is, incorrectly accepting 1 in 1000 faces means incorrectly rejecting 1 in 100 faces. That was the benchmark, but different algorithms achieved similar or worse results under different conditions (see the graphs on pages 14 and 16). An FRR of 0.01 sounds poor, but is significantly better than the FRR of 0.79 that was achieved using 1993 technology -- incorrectly rejecting about 8 out of 10 faces!

If the facial recognition algorithms are so poor why are they being used at border control? It is because it is not just the algorithm, but the entire system that counts.

In these systems, they probably tune the algorithms for a very low FAR, so that the computer is very unlikely to let the wrong person into the country. That means their FRR is poor, so more legitimate people will be refused entry by the machine. But those people can then be processed by a border control officer, who can then recognise them. So it is not the algorithm that works, but the entire system involving both computers and people that works.

This system actually uses the respective strengths of people and computers. Page 20 of the report shows the error rates of people and the algorithms. It shows that when the FRR is high, the algorithms generally achieve better FAR than people; but when the FAR is high, people achieve better FRR than the algorithms. That is, an algorithm is better than a person at correctly rejecting an impersonator; but a person is better than a computer at correctly recognising someone (even though they might look different).

It is what we expect: computers are not very good at recognising faces. People are better at recognising faces, but computers are better at rejecting faces. Together the system works. Perhaps researchers should really be claiming success at facial rejection technology rather than facial recognition technology!

Thu, 11 Mar 2010

TLS renegotiation security vulnerability

In November 2009, a security vulnerability in TLS was announced. This affects nearly all implementations of TLS, but the IETF is working quickly to revise the TLS specification to address the problem. A lot of the articles about the problem characterise it as a flaw in the TLS protocol, but actually the problem is not with TLS but with how it is (incorrectly) used.

I have been reading the original paper, Authentication Gap in TLS Renegotiation, and the vulnerability results from a combination of things. If you know something about the technical details of TLS, I recommend reading the paper for yourself.

The main problem comes from a connection consisting of an insecure session being renegotiated into a secure session. During the insecure session, a man-in-the-middle can inject some malicious data into the request sent to the server. This is fine according to the TLS protocol. TLS knows that the data sent over the first session is not to be trusted.

The problem comes about because the application incorrectly treats all the data as having the same security as the second session, after the renegotiation with the legitimate client. That is, it incorrectly treats the data received over the insecure session, before the renegotiation, as secure when it should not. So the vulnerability comes about because the application protocol was incorrectly using TLS. This is an example of where important information has been abstracted away--a common problem in system design: the presence of different sessions should not have been abstracted into a single connection with one level of security.

Einstein said, "everything should be made as simple as possible, but no simpler." Unfortunately, in this case they did make it simpler!

Before you panic: the vulnerability only allows the man-in-the-middle to inject its data into the beginning of the request. Although they could use that to inject their own requests, they can't see the real request or the response--those are still encrypted for the legitimate client.

So I would not be too hasty in blaming TLS itself for the vulnerability. Except that SSL/TLS was originally designed to secure HTTP, and it introduced sessions with different levels of security (a concept which HTTP does not support), so it could be argued that it did not completely meet the requirements. Unfortunately, this is also a common problem in system design.

It is desirable to design components as separate pieces, but when they come together there can be unintended problems.

Sun, 21 Feb 2010

File Set Diff

I wrote a utility to compare files from two directories.

A friend had a large directory of photos on their computer and some of it was backed up to an external hard disk. We suspected that some photos were not backed up, but which ones? This was made more difficult because they had renamed some of the files.

So I wrote a script to find all the files in a directory and calculate a SHA-1 hash on their contents. The script does the same to a second directory and compares the hashes. It then prints out the files that are in one directory but are missing from the other. It can also detect duplicate files in a directory, since for practical purposes the SHA-1 hash uniquely identifies the contents of a file (even if it has been moved or renamed).

The script can be obtained from the downloads page on this Web site.
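For readers who just want the idea, here is a minimal Python sketch of the approach. It is not the script from the downloads page; the function names are invented for this illustration, and it assumes every file under both directories is readable.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA-1 digest of a file's contents."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so large photos do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_directory(root):
    """Map each content hash to the list of files under root that have it."""
    by_hash = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_hash.setdefault(sha1_of_file(path), []).append(path)
    return by_hash


def missing_from(first, second):
    """Files under first whose contents appear nowhere under second."""
    first_hashes = hash_directory(first)
    second_hashes = hash_directory(second)
    return [p for h, paths in first_hashes.items()
            if h not in second_hashes for p in paths]


def duplicates(root):
    """Groups of files under root that share identical contents."""
    return [paths for paths in hash_directory(root).values() if len(paths) > 1]
```

Calling missing_from(photos, backup) would list the photos that were never backed up, regardless of what the backed-up copies were renamed to.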

Thu, 04 Feb 2010

QR codes

I've been experimenting with QR codes. These are two-dimensional bar codes that can contain a URL, phone number, email address, vCard contact information, location, SMS message, calendar event, or arbitrary text. They are popular in Japan and are being used in the Google Favorite Places business listings and Google Charts API.

ZXing QR code generator

Thu, 28 Jan 2010

Controlling URL line breaking with zero-width spaces

Line breaks for URLs often occur where you don't want them to. The solution is to use a zero-width space to suggest where it could have a line break.

In HTML, a zero-width space can be represented as "&#8203;". This character must only be used in the displayed URL and not in the URL in the href attribute.

Here are two examples. The first URL is unmodified. The second URL uses zero width spaces after the slashes and uses non-breaking hyphens. Resize the browser window to see how the line breaking behaves.

http://www.example.org/alphaBetaGamma/foo-bar/alphaBetaGamma/foo-bar/alphaBetaGamma/foo-bar/alphaBetaGamma/index.html

http://www.example.org/​alphaBetaGamma/​foo-bar/​alphaBetaGamma/​foo-bar/​alphaBetaGamma/​foo-bar/​alphaBetaGamma/​index.html

Looks good, but there is one big disadvantage: if someone copies-and-pastes the URL it will not work. This is less of a problem if the URL is a hyperlink which they will normally click, but it is something to keep in mind.
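If the HTML is generated by a program, the zero-width spaces can be added automatically. The following Python sketch is one possible way to do it, under these assumptions: the zero-width space is U+200B (written as &#8203; in HTML), it goes after every slash except those in the "://" after the scheme, and the function names are invented for this example. The href always keeps the unmodified URL.

```python
import html

ZWSP = "\u200b"  # zero-width space, written as &#8203; in HTML


def display_text(url):
    """Return url with a zero-width space after each slash following the scheme."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        # No scheme present: treat the whole string as the part to break up.
        scheme, rest = "", url
    return scheme + sep + rest.replace("/", "/" + ZWSP)


def link(url):
    """Build an anchor: the href keeps the real URL, only the visible text changes."""
    text = html.escape(display_text(url)).replace(ZWSP, "&#8203;")
    return f'<a href="{html.escape(url, quote=True)}">{text}</a>'


print(link("http://www.example.org/alphaBetaGamma/foo-bar/index.html"))
```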

The same trick can be used in Word documents. There are many ways to enter a zero-width space in Microsoft Word, but they are all very complicated. Instead, I think the simplest way is to copy it from another document. For example, save this Web page as HTML, open it in Microsoft Word, turn on hidden symbols, and copy the zero-width space character from it. With hidden symbols turned on, the zero-width space appears as a rectangle inside a rectangle. Or create a simple HTML document with &#8203; in it.

However, do not use non-breaking hyphens to further control the line breaks. If someone copies-and-pastes the URL, it will not work when there are non-breaking hyphens in it. They will be very confused, because the hyphen looks correct even though it is the wrong character.

Fri, 22 Jan 2010

Consumer password worst practices

How strong are your passwords? Despite lots of warnings, people still use weak passwords.

In December 2009, a cracker posted 32 million passwords onto the Internet. A security firm (Imperva) calculated some statistics on these passwords. In their report they say:

  • About 30% of passwords are 6 characters or shorter.
  • About 60% of passwords only contain alpha-numeric characters.
  • About 50% were easily guessed names or words.

The most common password was "123456", followed by "12345", "123456789", "password", "iloveyou" and "princess". Read the Consumer password worst practices report to see what the top 20 passwords were, and for tips on using strong passwords so you don't become (literally in this case) a statistic.
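The three weaknesses counted in the report are easy to test for. The Python sketch below is only an illustration, not anything from the report: the function name is made up, and the word list contains just the six passwords quoted above, where a real check would need a far longer list.

```python
# The six most common passwords quoted above; a real check needs a much longer list.
COMMON = {"123456", "12345", "123456789", "password", "iloveyou", "princess"}


def weaknesses(password):
    """Return a list of the report's weaknesses that apply to this password."""
    problems = []
    if len(password) <= 6:
        problems.append("6 characters or shorter")
    if password.isalnum():
        problems.append("only alphanumeric characters")
    if password.lower() in COMMON:
        problems.append("on the most-common list")
    return problems


print(weaknesses("123456"))
print(weaknesses("princess"))
```

An empty list only means none of these three checks fired; it does not prove the password is strong.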

Thu, 21 Jan 2010

Changing file line endings and encodings in emacs

Text files on Unix systems use a single line feed character (LF, 0x0A) to indicate the end of a line. Text files on MS-DOS and Microsoft Windows use a carriage return plus line feed pair (CR-LF, 0x0D 0x0A). The classical Macintosh used a single carriage return character (CR, 0x0D). Thankfully, the LF-CR pair has never been used!

One way to change the line ending convention is to use emacs with the set-buffer-file-coding-system function (mapped to C-x RET f). When it prompts you for the coding system, enter either "unix", "dos" or "mac".

This is easier than trying to remember cryptic commands like:

tr -d '\r'
sed 's/$/^M/'

It also avoids having to worry about getting them to work across the different variations of sed and shell environments (e.g. when using bash, the ^M is typed using Ctrl-v Ctrl-m).

If your system has the unix2dos and dos2unix commands installed (as Cygwin and most Linux distributions do), use them. Otherwise, emacs lives up to its reputation as the kitchen sink tool.
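If neither emacs nor those commands are at hand, the same conversion takes only a few lines of Python. This is a sketch rather than a replacement for dos2unix: the function name is invented, the style names simply mirror the "unix", "dos" and "mac" choices in emacs, and it works on bytes so the text encoding is left untouched.

```python
import re

# Style names chosen to mirror the emacs coding-system suffixes.
ENDINGS = {"unix": b"\n", "dos": b"\r\n", "mac": b"\r"}


def convert_line_endings(data, style):
    """Return data with every CR-LF, CR or LF ending replaced by the given style."""
    # CR-LF is listed first so a pair is not treated as two separate endings.
    return re.sub(rb"\r\n|\r|\n", ENDINGS[style], data)


mixed = b"one\r\ntwo\rthree\n"
print(convert_line_endings(mixed, "unix"))  # b'one\ntwo\nthree\n'
```

To convert a file, read it with open(name, "rb"), pass the bytes through the function, and write the result back with open(name, "wb").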

Sun, 10 Jan 2010

Cygwin rxvt: a better terminal

After installing Cygwin (a very powerful Unix-like environment for Microsoft Windows) I usually set up my home directory and create a shortcut to rxvt.

I make the Windows "My Documents" directory my Cygwin home directory:

cd /home
mv username username.bak
ln -s "/cygdrive/c/Documents and Settings/username/My Documents" username

Set up rxvt as the shell window, since it is much better than the default Windows Command Prompt:

  1. In Windows Explorer, go to C:\cygwin\bin.
  2. Right click on rxvt.exe and create a shortcut for it.
  3. Rename the shortcut to "Cygwin rxvt".
  4. Right click the shortcut and select "Pin to Start menu".
  5. Right click on the shortcut and select "Properties".
  6. Change the Target property of the shortcut to:
C:\cygwin\bin\rxvt.exe -sl 1500 -fn "Consolas-16" -bg black -fg orange -e bash --login -i

The -sl argument sets the number of lines in the history buffer. The -fn argument sets the font. If you haven't got the Consolas font, use "Courier New-16" instead. The -bg and -fg arguments set the colours. The -e bash --login -i runs the bash shell.

The rxvt here is a Cygwin Windows program. It does not require X11 to operate. But it does use the X11 method of copying and pasting (i.e. selecting the text copies it, and the middle mouse button is paste).

Note: Cygwin version 1.7 (or later) now installs a shortcut to rxvt called "rxvt-native", so the above instructions are no longer necessary. However, I still customise its font and colours by modifying the command as described above. There is also now Mintty, a terminal emulator written especially for Cygwin.

Storing Cygwin on an ISO image to install in a Parallels VM

Store Cygwin on an ISO image for easy re-installation onto virtual machines.

I'm installing Cygwin onto a Parallels virtual machine. I wanted to download Cygwin and its packages only once, and to install it onto multiple virtual machines. I tried storing it as a directory on the Mac, and attaching it to the Parallels virtual machine as a shared folder. Unfortunately, shared folders appear as a network drive on ".psf" under Parallels, and Cygwin has problems installing from it. Of course, I could have copied the files onto the (virtual) C: drive and installed it from there, but that would have needlessly used up space on the VM's drive.

The solution I found was to create an ISO disc image containing the Cygwin files and to mount that onto the virtual machine as a DVD disc. Cygwin installs fine from the virtual DVD-ROM and unnecessary file copying was avoided.

Creating the ISO still required the packages to be downloaded inside a VM running Windows, and then copied out of that VM into an ISO image. But after that, no more copying is required.

Tue, 05 Jan 2010

diff utilities

I've been reading the documentation for the diff command on Unix and have discovered lots of powerful options in it.

The diff command can show the changes side by side. You will need a very wide terminal, but you can still get a good indication of what has changed by setting its output to a narrower terminal width.

diff -y -W 80 file1 file2

Two directories can also be recursively compared:

diff -Naur dir1 dir2

There is also an interactive command called sdiff to merge two files together to create a third file. However, I think it is easier to use emerge mode in emacs.

If you are running Mac OS X, another option is to use the FileMerge application. If you install Xcode, it can be found in the /Developer/Applications/Utilities folder.

Security tips for the rest of us

Computer security is hard. Technical people have a hard time keeping up with all the issues, so what is the average computer user going to do?

The Security Now podcast, episode #229, describes a few simple rules that anyone can follow:

  1. Don't click on links in emails.
  2. Don't accept files or email attachments from people you don't know.
  3. Do keep your computer up to date with Windows Update or Mac Software Update.
  4. Do use good strong passwords.

These are easy enough for anyone to remember and follow. It is much better to follow a few simple rules than to have more and better rules that don't get followed.

For further details, see the paper So Long, And No Thanks for the Externalities: The Rational Rejection of Security Advice by Users. It describes how some traditional security advice is not worth following, because the benefits are outweighed by the cost of following it.