Description
Describe the bug
The Hive table is queried case-insensitively (spark.sql.caseSensitive=false), and the table location contains plain Parquet files whose schema has mixed-case column names (e.g. componentId, userName). After enabling Blaze, a Spark SQL query with a filter condition on one of these mixed-case columns returns no data.
To Reproduce
Steps to reproduce the behavior:
- spark.sql("set spark.sql.caseSensitive=false")
- val executSql = """ select dnum from report.tb_39e85e2e76e444e195c6db2df728751e_34b7dfe549 where dt between '2024-11-20' and '2024-11-27' and componentId='255' limit 50 """
- val df = spark.sql(executSql)
  println(df.schema)
  df.show(10)
- Package the Scala code into a jar.
- spark-submit --class com.***.myapp.Test --master yarn --conf spark.sql.hive.convertMetastoreParquet=true --conf spark.blaze.enable=true --conf spark.sql.extensions=org.apache.spark.sql.blaze.BlazeSparkSessionExtension --conf spark.shuffle.manager=org.apache.spark.sql.execution.blaze.shuffle.BlazeShuffleManager --conf spark.sql.caseSensitive=false cosn://dc-sh-prod-03-1323003688/tasklibs/spark3.2.2_myapp.jar
executor logs:
Test SQL:
select dnum, 3680 as moneys
from report.tb_39e85e2e76e444e195c6db2df728751e_34b7dfe549
where to_date_udf(year,month,day) between date_sub('2024-11-27',7) and '2024-11-27'
and componentId='255' limit 50
userGroupInfo.getUserField : dnum
StructType(StructField(dnum,StringType,true), StructField(moneys,IntegerType,false))
+----+------+
|dnum|moneys|
+----+------+
+----+------+
Obviously, no data is returned. The filter condition on componentId is the cause.
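A likely explanation: with spark.sql.caseSensitive=false, the Hive metastore stores the column as componentid, while the Parquet files carry componentId, so the reader has to match the two names case-insensitively; if Blaze's native Parquet reader matches case-sensitively, the filter column resolves to nothing and the scan yields no rows. A minimal sketch of the two matching strategies (illustrative Python only, not Blaze's actual code):

```python
def resolve(column, parquet_fields, case_sensitive):
    """Resolve a query column name against the Parquet file's field names."""
    if case_sensitive:
        return next((f for f in parquet_fields if f == column), None)
    return next((f for f in parquet_fields if f.lower() == column.lower()), None)

# Field names as written into the Parquet files (mixed case)
parquet_fields = ["componentId", "userName", "dnum"]

# The Hive metastore lower-cases column names, so the plan carries "componentid"
print(resolve("componentid", parquet_fields, case_sensitive=False))  # componentId
print(resolve("componentid", parquet_fields, case_sensitive=True))   # None -> filter matches nothing
```

With case-insensitive resolution the filter is pushed to the real componentId field and rows come back; with case-sensitive resolution the field lookup fails and the result is empty, which matches the observed behavior.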
Expected behavior (with spark.blaze.enable=false):
SQL imported by the automated analysis task:
select dnum, 3680 as moneys
from report.tb_39e85e2e76e444e195c6db2df728751e_34b7dfe549
where to_date_udf(year,month,day) between date_sub('2024-11-27',7) and '2024-11-27'
and componentId='255' limit 50
dataframe schema:
StructType(StructField(dnum,StringType,true), StructField(moneys,IntegerType,false))
+---------+------+
| dnum|moneys|
+---------+------+
|649409512| 3680|
|666687060| 3680|
|667198577| 3680|
|672462560| 3680|
|668511291| 3680|
|661643626| 3680|
|669103964| 3680|
|660927197| 3680|
|671793888| 3680|
|637719401| 3680|
+---------+------+
only showing top 10 rows
Appendix:
A: Hive table create script:
CREATE EXTERNAL TABLE report.tb_39e85e2e76e444e195c6db2df728751e_34b7dfe549(
android_id string,
systempid string,
appnm string,
appversion string,
appversioncode string,
biversion string,
cardstyleid string,
city string,
clientdatetime string,
componentcontentid string,
componentid string,
componentname string,
componentposition string,
componenttypeid string,
componentversion string,
datasource string,
dateofweek string,
datetime string,
dayofquarter string,
dayofyear string,
deviceid string,
devicetype string,
dnum string,
hour string,
id string,
imei string,
ip string,
launcherversionname string,
launcherdnum string,
launchervercode string,
mac string,
minute string,
nation string,
networktype string,
packagenm string,
phonetype string,
postconfigversion string,
projectid string,
province string,
region string,
remote_addr string,
scenetemplateid string,
scenetemplatename string,
second string,
sendtime string,
signature string,
systype string,
sysversion string,
systemvercode string,
tabposition string,
tclosversion string,
type string,
userid string,
weekofyear string,
wlanmac string,
xforwarded string,
packagename string,
componentstatus string,
musicstatus string,
componenttitle string,
vid string,
receipttime string)
PARTITIONED BY (
year bigint,
month bigint,
day bigint,
cleanhour bigint)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://xxxxxx/data/report/584f9c5bab31fb1d59e138e1/39e85e2e76e444e195c6db2df728751e/34B7DFE549'
