I got this exception while running an application on EMR:
org.apache.spark.SparkException: File ./myapplication.jar exists and does not match contents of //10.28.139.44:33084/jars/myapplication.jar
If you run into this exception, one possible cause is that the nodes have run out of disk space on one of their partitions, which is what I found in my case.
When I hit this problem, I ran df -h on the nodes and found that the root partition was full:
    [hadoop@myemr-node ~]$ df -h
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/xvda1      9.8G  9.8G     0 100% /
    devtmpfs         34G   20K   34G   1% /dev
    tmpfs            34G     0   34G   0% /dev/shm
    /dev/xvdb       840G  2.1G  838G   1% /mnt
    /dev/xvdc       840G  1.5G  839G   1% /mnt1
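Reading df output by eye gets tedious across several nodes. Below is a minimal sketch that prints only the filesystems at or above a given usage percentage. The function name full_partitions and the default threshold of 90 are assumptions made for illustration; they are not part of EMR or Spark.

```shell
# full_partitions is a hypothetical helper, not an EMR or Spark tool.
# Prints "<mount point> <use%>" for every filesystem at or above the threshold.
full_partitions() {
    threshold="${1:-90}"
    # df -P forces the POSIX one-line-per-filesystem layout, so the columns are:
    # Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on
    df -P | awk -v t="$threshold" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)          # strip the percent sign before comparing
            if (use + 0 >= t) print $6, $5
        }'
}

full_partitions 90
```

On a node in the state shown above, the root partition at 100% would be the one reported. Mount points containing spaces are not handled by this sketch.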
Checking the usage of each folder under ‘/’ showed /home/hadoop consuming 6.8GB of space. However, running du on the contents of /home/hadoop did not show anything interesting:
    [hadoop@myemr-node ~]$ du -sh /home/hadoop/*
    0       /home/hadoop/bin
    0       /home/hadoop/Cascading-2.5-SDK
    0       /home/hadoop/conf
    12K     /home/hadoop/contrib
    0       /home/hadoop/etc
    4.0K    /home/hadoop/hadoop-examples.jar
    0       /home/hadoop/hive
    572K    /home/hadoop/lib
    0       /home/hadoop/libexec
    0       /home/hadoop/mahout
    0       /home/hadoop/pig
    0       /home/hadoop/sbin
    0       /home/hadoop/scala
    0       /home/hadoop/share
    0       /home/hadoop/shark
    0       /home/hadoop/spark
Since the shell's * does not match names that begin with a dot, I then checked whether there were any hidden files or folders:
    [hadoop@myemr-node ~]$ ls -a
    .  ..  .bash_history  .bash_profile  .bashrc  bin  Cascading-2.5-SDK  conf  contrib  etc  hadoop-examples.jar  hive  lib  libexec  mahout  pig  sbin  scala  share  shark  spark  .ssh  .versions
    [hadoop@myemr-node ~]$ du -sh .versions
    6.8G    .versions
    [hadoop@myemr-node work]$ pwd
    /home/hadoop/.versions/spark-1.0.0-bin-hadoop2/work
    [hadoop@myemr-node work]$ du -sh *
    276M    app-20151015043847-0012
    550M    app-20151016053316-0013
    550M    app-20151016054953-0014
    276M    app-20151016055230-0015
    275M    app-20151016060102-0016
    276M    app-20151016060322-0017
    276M    app-20151016171448-0018
    276M    app-20151016172752-0019
    275M    app-20151017180735-0020
    276M    app-20151017180915-0021
    275M    app-20151017182043-0022
    276M    app-20151017182142-0023
    276M    app-20151017182907-0024
    276M    app-20151017183526-0025
    276M    app-20151017184015-0026
    276M    app-20151017184912-0027
    276M    app-20151017185454-0028
    44M     app-20151017190743-0029
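The earlier du -sh /home/hadoop/* missed .versions because the shell expands * before du runs, and by default that expansion skips names starting with a dot. A sketch of a one-step alternative is to let du walk the directory itself. The helper name du_with_hidden is made up for this example, and the --max-depth and sort -h options assume GNU coreutils.

```shell
# du_with_hidden is a hypothetical helper name used only for illustration.
# Shows the size of every direct child of a directory, hidden ones included,
# sorted from smallest to largest.
du_with_hidden() {
    dir="${1:-.}"
    # -a reports files as well as directories; --max-depth=1 limits the output
    # to the direct children of $dir plus the total. Hidden entries appear
    # because du reads the directory itself instead of relying on the * glob.
    du -ah --max-depth=1 "$dir" | sort -h
}
```

Running du_with_hidden /home/hadoop on the node above should put .versions at the bottom of the list, just above the total for the directory.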
Cleaning up these files solved the problem. This needs to be done on each of the clusters.
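The cleanup can be scripted. The following is a rough sketch, not the exact commands I ran: clean_spark_work is a hypothetical helper, the 7-day default is an arbitrary assumption, and it relies on the directory modification time as a proxy for "no longer in use". Make sure no application that old is still running before deleting anything.

```shell
# clean_spark_work is a hypothetical helper; the path and age are assumptions
# to adjust for your own cluster.
clean_spark_work() {
    work_dir="$1"
    max_age_days="${2:-7}"
    # Match only top-level app-* directories whose modification time is more
    # than max_age_days days old, and remove them with their contents.
    find "$work_dir" -mindepth 1 -maxdepth 1 -type d -name 'app-*' \
        -mtime +"$max_age_days" -exec rm -rf {} +
}
```

For the layout shown above, the call would be clean_spark_work /home/hadoop/.versions/spark-1.0.0-bin-hadoop2/work 7. Replacing -exec rm -rf {} + with -print first gives a dry run that only lists what would be removed.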
References: Stack Overflow question