
Permission issue in text table join leads to failure #747

@merrily01

Description


Describe the bug

When using Spark 3.5 with Blaze, a join between two tables stored in text format fails: during the shuffle-write phase of the executor's task, an AccessControlException is thrown because the file is read under the wrong account. This ultimately causes the query to fail.

To Reproduce
Steps to reproduce the behavior:

  1. Create test tables with text storage format:
CREATE TABLE IF NOT EXISTS test_table5 (
  id INT,
  key INT,
  value1 STRING
) STORED AS TEXTFILE;

CREATE TABLE IF NOT EXISTS test_table6 (
  id INT,
  key INT,
  value2 STRING
) STORED AS TEXTFILE;
  2. Generate test data:
INSERT INTO test_table5 VALUES
(1, 100, 'value1'),
(2, 200, 'value2'),
(3, 300, 'value3');

INSERT INTO test_table6 VALUES
(1, 400, 'value4'),
(2, 500, 'value5'),
(3, 600, 'value6');
  3. Execute the test SQL:
SELECT a.*, b.*
FROM test_table5 a
JOIN test_table6 b
ON a.id = b.id;
  4. See error:
  • Driver side:
Driver stacktrace:
25/01/07 20:41:56 INFO DAGScheduler: ShuffleMapStage 1 (main at NativeMethodAccessorImpl.java:0) failed in 20.353 s due to Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7) (tjtx16-110-38.58os.org executor 8): java.lang.RuntimeException: poll record batch error: Execution error: native execution panics: Execution error: Execution error: output_with_sender[Shuffle] error: Execution error: output_with_sender[FFIReader] error: Execution error: output_with_sender[FFIReader]: output() returns error: External error: Java exception thrown at native-engine/datafusion-ext-plans/src/ffi_reader_exec.rs:157: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at org.apache.spark.sql.blaze.JniBridge.nextBatch(Native Method)
	at org.apache.spark.sql.blaze.BlazeCallNativeWrapper$$anon$1.hasNext(BlazeCallNativeWrapper.scala:80)
	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at org.apache.spark.util.CompletionIterator.foreach(CompletionIterator.scala:25)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
	at org.apache.spark.util.CompletionIterator.to(CompletionIterator.scala:25)
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
	at org.apache.spark.util.CompletionIterator.toBuffer(CompletionIterator.scala:25)
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
	at org.apache.spark.util.CompletionIterator.toArray(CompletionIterator.scala:25)
	at org.apache.spark.sql.execution.blaze.shuffle.BlazeShuffleWriterBase.nativeShuffleWrite(BlazeShuffleWriterBase.scala:81)
	at org.apache.spark.sql.execution.blaze.plan.NativeShuffleExchangeExec$$anon$1.write(NativeShuffleExchangeExec.scala:159)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:106)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
	at org.apache.spark.scheduler.Task.run(Task.scala:143)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:662)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:95)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:682)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
  • Executor side:
Exception in thread "Thread-8" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at org.apache.hadoop.mapred.lib.CombineFileRecordReader.initNextRecordReader(CombineFileRecordReader.java:150)
	at org.apache.hadoop.mapred.lib.CombineFileRecordReader.next(CombineFileRecordReader.java:59)
	at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:355)
	at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:265)
	at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
	at org.apache.spark.sql.execution.blaze.arrowio.ArrowFFIExporter.$anonfun$exportNextBatch$3(ArrowFFIExporter.scala:67)
	at org.apache.spark.sql.execution.blaze.arrowio.ArrowFFIExporter.$anonfun$exportNextBatch$3$adapted(ArrowFFIExporter.scala:63)
	at org.apache.spark.sql.blaze.util.Using$.$anonfun$resources$2(Using.scala:313)
	at org.apache.spark.sql.blaze.util.Using$.resource(Using.scala:272)
	at org.apache.spark.sql.blaze.util.Using$.$anonfun$resources$1(Using.scala:312)
	at org.apache.spark.sql.blaze.util.Using$.resource(Using.scala:272)
	at org.apache.spark.sql.blaze.util.Using$.resources(Using.scala:311)
	at org.apache.spark.sql.execution.blaze.arrowio.ArrowFFIExporter.$anonfun$exportNextBatch$1(ArrowFFIExporter.scala:63)
	at org.apache.spark.sql.execution.blaze.arrowio.ArrowFFIExporter.$anonfun$exportNextBatch$1$adapted(ArrowFFIExporter.scala:60)
	at org.apache.spark.sql.blaze.util.Using$.resource(Using.scala:272)
	at org.apache.spark.sql.execution.blaze.arrowio.ArrowFFIExporter.exportNextBatch(ArrowFFIExporter.scala:60)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
	at org.apache.hadoop.mapred.lib.CombineFileRecordReader.initNextRecordReader(CombineFileRecordReader.java:142)
	... 22 more
Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=yarn, access=READ, inode="/home/hive/warehouse/test.db/test_table6/part-00001-d58e0437-f279-48fa-933a-63ce19829afd-c000.lzo":hdfs:hadoop:-rwx-wx-wx
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:400)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:261)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:193)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1993)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1977)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPathAccess(FSDirectory.java:1919)
	at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:162)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2061)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:833)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:288)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1072)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1073)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3085)
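The root cause is the AccessControlException above: user=yarn needs READ access to a file owned by hdfs:hadoop with mode -rwx-wx-wx (octal 0733), so neither the group bits (-wx) nor the other bits (-wx) grant read. A minimal sketch of the POSIX-style owner/group/other check that HDFS's FSPermissionChecker applies (the function and parameter names below are illustrative, not Hadoop APIs):

```python
# Illustrative model of an HDFS-style permission check.
# r=4, w=2, x=1; mode is a 9-bit value such as 0o733 (-rwx-wx-wx).
READ = 0o4

def has_access(user, user_groups, owner, group, mode, want):
    """Return True if `user` holds all permission bits in `want`."""
    if user == owner:
        bits = (mode >> 6) & 0o7   # owner class
    elif group in user_groups:
        bits = (mode >> 3) & 0o7   # group class
    else:
        bits = mode & 0o7          # other class
    return bits & want == want

# The failing file is owned by hdfs:hadoop with mode 0o733.
# user=yarn falls into "other" (-wx), so READ is denied:
print(has_access("yarn", ["yarn"], "hdfs", "hadoop", 0o733, READ))   # False
# The owner (hdfs) has rwx and can read:
print(has_access("hdfs", [], "hdfs", "hadoop", 0o733, READ))         # True
```

Even if yarn were in the hadoop group, the group bits are only -wx, so the read would still be denied; this matches vanilla Spark succeeding, since there the file is read as the submitting user rather than yarn.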

Expected behavior

  • With Spark 3.5 + Blaze:

    • The query fails with the error message and logs above.
  • With vanilla Spark 3.5, or with spark.blaze.enable=false:

    • The query executes successfully and returns the expected result data.

Screenshots

  • With Spark 3.5 + Blaze:
    (screenshot: query fails)

  • With vanilla Spark 3.5, or with spark.blaze.enable=false:
    (screenshot: query succeeds)

Additional Context:

  • The issue occurs when the master is set to YARN, but not when running in local mode.
  • Parquet-format tables do not have this problem.
