Clean Spark environment in Zeppelin

After a lot of experimenting, a notebook accumulates many variables, and it is often unclear which ones are still defined. Zeppelin does not offer a good tool for resetting that state, so I wrote my own function to clean up Python's global variables:

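def clear(keep=("__builtins__", "clear", "completion", "z", "__zeppelin_completion__", "_zsc_", "__zSqlc__", "__zeppelin__", "__zSpark__", "sc", "spark", "sqlc", "sqlContext")):
    """Delete all global variables except the names listed in keep."""
    # Save the objects that must survive (Zeppelin and Spark handles, this function).
    # items() works in Python 2 and 3; iteritems() exists only in Python 2.
    keeps = {}
    for name, value in globals().items():
        if name in keep:
            keeps[name] = value
    globals().clear()
    # Restore the saved objects.
    for name, value in keeps.items():
        globals()[name] = value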

You can then call the function with clear().
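
To see the keep-list idea in isolation, here is a minimal sketch that applies the same logic to an ordinary dictionary instead of globals(). The name clear_namespace and the demo values are made up for illustration; in a notebook you would pass globals() as the namespace.

```python
def clear_namespace(namespace, keep=("__builtins__",)):
    """Remove every name from `namespace` except those listed in `keep`."""
    # Copy the keys first: deleting from a dict while iterating over it fails.
    for name in list(namespace):
        if name not in keep:
            del namespace[name]


# A stand-in for a cluttered notebook namespace (hypothetical values).
demo = {"spark": "session placeholder", "df_tmp": [1, 2, 3], "counter": 42}
clear_namespace(demo, keep=("spark",))
print(sorted(demo))  # ['spark']
```

Deleting the unwanted names one by one, instead of clearing everything and restoring the keepers, gives the same end state and avoids the moment in which the namespace is completely empty.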

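Additionally, if you are dealing with libraries, keep in mind that clearing the globals only removes the imported names; the modules themselves stay cached in sys.modules, so a later import will not reload them. If you want a fresh import, remove them explicitly with del sys.modules['modulename'] (after import sys).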
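
A small sketch of what that removal could look like as a helper. The function name forget_module is hypothetical, and colorsys is used only because it is a small standard-library module; substitute the library you are actually working on.

```python
import sys


def forget_module(module_name):
    """Drop `module_name` and its submodules from the import cache."""
    removed = []
    for name in list(sys.modules):
        if name == module_name or name.startswith(module_name + "."):
            del sys.modules[name]
            removed.append(name)
    return removed


import colorsys
old_module = colorsys

removed = forget_module("colorsys")

# The next import executes the module again and creates a new module object.
import colorsys
print(colorsys is old_module)  # False
```

Note that this only affects future imports. Names that were already bound, for example via from modulename import something, still point at the objects from the old module until you import them again.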

This entry was posted in Big data by swk.

About swk

I am a software developer, data scientist, computational linguist, teacher of computer science and above all a huge fan of LaTeX. I use LaTeX for everything, including things you never wanted to do with LaTeX. My latest love is LilyPond, aka LaTeX for music. I'll post at irregular intervals about cool stuff, stupid hacks and annoying settings I want to remember for the future.