Tuesday, November 16, 2010

Interpret escape sequences in Python strings

In Python source files you can define "raw" string literals, in which escape sequences are mostly left uninterpreted. These string literals are introduced with an "r" prefix, like:
In [2]: s = r'\n'
In [3]: len(s)
Out[3]: 2
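
As a small sketch of what "mostly" means here (the variable names are just for illustration): a backslash in a raw literal is kept as an ordinary character, but a backslash in front of a quote still stops that quote from ending the literal.

```python
# A regular literal interprets the escape: one newline character.
regular = '\n'
# A raw literal keeps the backslash: two characters, '\' and 'n'.
raw = r'\n'

# The raw literal equals a regular literal with an escaped backslash.
same_as_escaped = (raw == '\\n')

# The "mostly": the backslash still protects the quote from closing
# the literal, yet it stays in the resulting string.
quoted = r'\''

print(len(regular), len(raw), same_as_escaped, len(quoted))
```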

As can be seen, s is not the newline character but a sequence of two characters, '\' and 'n'. Strings obtained with raw_input have much the same property. For example, if we type \x61, it is not interpreted as the character with code 0x61 (which is 'a') but as the sequence of 4 bytes '\', 'x', '6', '1'; that is probably the part saying "raw" input. ;)

If we want to evaluate the escape sequences as the Python compiler does when it encounters a regular string literal (e.g., 'hi\n'), then we can use the decode method of string objects with the 'string-escape' codec.

In [17]: s = raw_input()
\x61
In [18]: s
Out[18]: '\\x61'

In [19]: s.decode('string-escape')
Out[19]: 'a'
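
The session above is Python 2. As a hedged sketch for readers on Python 3, where str has no decode method and the 'string-escape' codec is gone, something similar can be done with the 'unicode_escape' codec. The helper name interpret_escapes is made up for this example, and the behaviour is not identical to 'string-escape': 'unicode_escape' also understands \u and \N escapes.

```python
def interpret_escapes(raw):
    """Hypothetical helper: evaluate backslash escapes in a Python 3 str.

    Assumption: the input is text such as what input() returns.
    Characters outside Latin-1 are first rewritten as backslash escapes
    so they survive the round trip through bytes.
    """
    as_bytes = raw.encode('latin-1', 'backslashreplace')
    return as_bytes.decode('unicode_escape')


s = r'\x61'                         # what the user typed: 4 characters
print(repr(s))                      # the backslash is still literal
print(repr(interpret_escapes(s)))   # the escape is now evaluated
```

The encode step is there because 'unicode_escape' decodes bytes, not text; passing through Latin-1 with backslashreplace keeps every character recoverable.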
