Version conflict in Google protobuf on EMR cluster

Posted on May 15, 2020 by swk

I have written a Spark application that uses a library that uses the Google protobuf library in version 3.3.1. On my computer I can run it with my local Spark and everything is fine. But now, I want to run it on an EMR cluster in on AWS. And I get this:

java.lang.NoSuchMethodError: com.google.protobuf.CodedInputStream.readStringRequireUtf8()Ljava/lang/String;

It seems that this is a common incopatibility issue (says stackoverflow). Java libraries are loaded from the classpath. The classpath is searched from front to back. And in the EMR cluster, Spark and Hadoop libraries are inserted before the user libraries. So the old protobuf version is found.

But there is the possibility to change this behaviour described in the Spark 2.4.4 documentation. So I added two corresponding options to spark-submit in the EMR cluster step:

--conf spark.driver.userClassPathFirst=true 
--conf spark.executor.userClassPathFirst=true

And it worked!

NB: The proposed solution at stackoverflow is to create a “shaded” fat jar. That probably works as well and should be simple for Maven users. Maybe my next post is about that 😉

Invisible typing in scala console

Posted on May 13, 2020 by swk

When I start a scala console and type after the prompt, nothing is visible. This fixes the issue temporarily:

import sys.process._ 
"reset" !

This is a known issue for Ubuntu 18.04 and Scala 2.11 (see stackoverflow).

Login to AWS with the Java SDK and role assumption

Posted on May 12, 2020 by swk

The following seems to work for me as contents of .aws/credentials:

[default]
region = eu-west-1
role_arn = arn:aws:iam::12345:role/blubb
source_profile = blabla
[blabla]
aws_access_key_id = XXX
aws_secret_access_key = YYY
aws_session_token = ZZZ
valid_until = DDD

The content of .aws/config does not seem to matter. So it can be whatever it needs to be for the CLI.

Yesterday's Coffee

Too good to throw away – too hard to remember

Monthly Archives: May 2020

Version conflict in Google protobuf on EMR cluster

Invisible typing in scala console

Login to AWS with the Java SDK and role assumption