Run Spark locally and access S3

There are a few ways to make a Spark job run locally and read from S3.

By changing the code:
val sparkConfig = new SparkConf()
   // Hadoop/S3A options set on a SparkConf need the "spark.hadoop." prefix to reach the Hadoop configuration
   .set("spark.hadoop.fs.s3a.aws.credentials.provider", "com.amazonaws.auth.DefaultAWSCredentialsProviderChain")
   .setMaster("local[*]")
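A minimal end-to-end sketch of this variant, assuming hadoop-aws and the AWS SDK are on the classpath and that the bucket and object below are replaced with something you can actually read:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object LocalS3Example {
  def main(args: Array[String]): Unit = {
    // Run locally on all cores and resolve AWS credentials via the default provider chain
    val sparkConfig = new SparkConf()
      .set("spark.hadoop.fs.s3a.aws.credentials.provider", "com.amazonaws.auth.DefaultAWSCredentialsProviderChain")
      .setMaster("local[*]")

    val spark = SparkSession.builder().config(sparkConfig).appName("local-s3").getOrCreate()

    // Placeholder bucket and key
    val df = spark.read.text("s3a://my-bucket/some/path.txt")
    df.show(5)

    spark.stop()
  }
}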
By adding JVM arguments to the java command:
-Dspark.master=local[*]
-Dspark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.DefaultAWSCredentialsProviderChain
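For example, when launching a pre-built jar (the jar name and main class here are just placeholders):

java '-Dspark.master=local[*]' \
     -Dspark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.DefaultAWSCredentialsProviderChain \
     -cp my-spark-app.jar com.example.MySparkJob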
By setting the JVM properties from code (I have not tested whether this works for the credentials provider, but it should):
System.setProperty("spark.master", "local[*]")
System.setProperty("spark.hadoop.fs.s3a.aws.credentials.provider", "com.amazonaws.auth.DefaultAWSCredentialsProviderChain")
The AWS credentials will be taken from the default profile, or you can select a different profile with the environment variable AWS_PROFILE=<your profile>.
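For example, assuming a profile named dev exists in your AWS config (profile name, jar and class are placeholders again):

AWS_PROFILE=dev java '-Dspark.master=local[*]' \
     -Dspark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.DefaultAWSCredentialsProviderChain \
     -cp my-spark-app.jar com.example.MySparkJob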
