Python strip explained

Python Strip Usage

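In Python, we usually use the string methods strip, lstrip, and rstrip to strip characters.

s.strip('x') removes every leading and trailing 'x' from the string. The usage is: string.strip([chars]). The chars argument is treated as a set of characters, not as a prefix or suffix.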

rstrip() removes trailing characters only.

lstrip() removes leading characters only.

Note: strip/rstrip/lstrip remove whitespace characters such as '\n' and '\t' if called without an argument.

The usage of lstrip is similar to that of rstrip.

Example

string = "sscoderscatsss"
print(string.strip('s')) # remove the prefix and suffix 's' 
=> "coderscat" 

string = "sscoderscatsss"
print(string.strip('st')) # remove leading and trailing 's' and 't'
=> "codersca"

string = "coderscatssss"
print(string.rstrip('s')) # remove the trailing 's'
=> "coderscat"

string = "coderscat"
print(string.rstrip('s')) # no trailing 's', original string returned
=> "coderscat"

string = "coderscat  \n\t"
print(string.rstrip()) # no argument provided, remove trailing whitespace
=> "coderscat"
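The examples above do not cover lstrip, and it is easy to misread chars as a prefix or suffix. Here is a small sketch (the sample strings are only for illustration) showing that lstrip touches the left end only, and that chars behaves as a set of characters:

```python
# lstrip only removes characters from the left end.
s = "sscoderscatsss"
left_only = s.lstrip('s')
print(left_only)  # => "coderscatsss"

# chars is a set of characters, so the order inside it does not matter.
print(s.strip('st') == s.strip('ts'))  # => True

# Stripping stops at the first character that is not in chars.
url = "www.example.com"
stripped = url.strip('w.moc')
print(stripped)  # => "example"
```

If what you want is to drop an exact prefix or suffix rather than a set of characters, strip is the wrong tool.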

Explanation

As I elaborated in a previous post, How to learn data structures and algorithms, meeting a function we are not familiar with is a good opportunity to learn more.

The natural follow-up questions are: how is this implemented in Python? What’s the worst-case complexity of this operation?

So let’s dig into the code.

First we need to find the implementation of rstrip. Search for the keyword ‘rstrip’ in Python’s GitHub repo; it should be implemented in C, so we add a filter for the C programming language.

CPython stores strings as sequences of Unicode characters, so we should check the definition in Objects/clinic/unicodeobject.c.h. There, unicode_rstrip is a wrapper function that calls unicode_rstrip_impl to do the actual stripping.

Next we search the codebase for unicode_rstrip_impl. It is located in Objects/unicodeobject.c; from there, follow the function call flow do_argstrip -> do_strip.

Good coding style puts the implementation of all the strip functions into one logical unit, and that is really how it is coded. Have a look at do_strip:

static PyObject *
do_strip(PyObject *self, int striptype)
{
    Py_ssize_t len, i, j;

    if (PyUnicode_READY(self) == -1)
        return NULL;

    len = PyUnicode_GET_LENGTH(self);

    if (PyUnicode_IS_ASCII(self)) {
        // ASCII fast path, omitted here
    }
    else {
        int kind = PyUnicode_KIND(self);
        void *data = PyUnicode_DATA(self);

        i = 0;
        if (striptype != RIGHTSTRIP) {
            while (i < len) {
                Py_UCS4 ch = PyUnicode_READ(kind, data, i);
                if (!Py_UNICODE_ISSPACE(ch))
                    break;
                i++;
            }
        }

        j = len;
        if (striptype != LEFTSTRIP) {
            j--;
            while (j >= i) {
                Py_UCS4 ch = PyUnicode_READ(kind, data, j);
                if (!Py_UNICODE_ISSPACE(ch))
                    break;
                j--;
            }
            j++;
        }
    }

    return PyUnicode_Substring(self, i, j);
}

lstrip, rstrip, and strip all end up calling do_strip when they are called without a chars argument; only the striptype value differs.
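To make the control flow easier to follow, here is a rough Python paraphrase of the non-ASCII branch of do_strip. It is only a sketch: the name do_strip_sketch and the numeric values of the constants are chosen for illustration, and str.isspace stands in for Py_UNICODE_ISSPACE.

```python
# Illustrative constants; only their distinctness matters for this sketch.
LEFTSTRIP, RIGHTSTRIP, BOTHSTRIP = 0, 1, 2


def do_strip_sketch(text: str, striptype: int) -> str:
    """Paraphrase of do_strip: find the [i, j) window, then slice once."""
    length = len(text)

    # Scan from the left until a non-whitespace character is found.
    i = 0
    if striptype != RIGHTSTRIP:
        while i < length and text[i].isspace():
            i += 1

    # Scan from the right, never crossing the left boundary i.
    j = length
    if striptype != LEFTSTRIP:
        while j > i and text[j - 1].isspace():
            j -= 1

    # Counterpart of PyUnicode_Substring(self, i, j).
    return text[i:j]


sample = "  coderscat \n\t"
print(repr(do_strip_sketch(sample, BOTHSTRIP)))   # => 'coderscat'
print(repr(do_strip_sketch(sample, LEFTSTRIP)))   # => 'coderscat \n\t'
print(repr(do_strip_sketch(sample, RIGHTSTRIP)))  # => '  coderscat'
```

Each index only ever moves in one direction, so every character is inspected at most once before the final slice.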

It’s simple and elegant: each character is examined at most once, so the worst-case complexity is O(N). By the way, there are about 15,000 lines of code in unicodeobject.c, and GitHub doesn’t seem to index it by default.

Furthermore, if you have time, you can study the other string operations in Python; they are located in cpython/Objects/stringlib.

Going deeper

Because of all kinds of optimizations, strings in Python are actually quite complex.

For example, whether a new object is allocated depends on many conditions: if a string’s length is longer than 20, it will not be subject to constant folding (the exact limit depends on the CPython version). This sometimes leads to inconsistent behavior :(.
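A minimal sketch of why this matters in practice: two strings with the same content may or may not be the same object, so identity checks on strings are unreliable. The variable names are only for illustration; sys.intern is the standard-library way to get one canonical object per string value.

```python
import sys

a = "coderscat"
b = "".join(["coders", "cat"])  # same text, but built at runtime

print(a == b)  # => True: equality compares contents
print(a is b)  # identity is implementation-dependent, do not rely on it

# sys.intern returns one canonical object per distinct string value.
same_object = sys.intern(a) is sys.intern(b)
print(same_object)  # => True
```

In short, compare strings with == and treat any result of is on strings as an implementation detail.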

Please have a look at <Python Objects Part III: String Interning>.