db_compute_boxplot in Hive is using percentile instead of percentile_approx #30

@skamanrev

I'm using dbplot v0.3.3 with Hive (version 1.2.1000.2.6.5.0-292).

db_compute_boxplot is using percentile rather than percentile_approx, which limits the function to integer columns only.

Here is some sample code:

    library(DBI)
    library(odbc)
    library(dplyr)
    library(dbplyr)
    library(ggplot2)
    library(dbplot)

    con <- dbConnect(odbc(), "QoE", user = "foo",
                     pwd = "bar", bigint = "numeric")

Write Iris to Hive table removing '.' from col names

    foo <- iris
    colnames(foo) <- gsub(".", "_", colnames(foo), fixed = TRUE)
    dbWriteTable(con, "iris", foo, overwrite = TRUE)

Run db_compute_boxplot - throws data type error for UDAF Percentile

    db_iris <- tbl(con, in_schema("default", "iris"))
    db_iris %>% db_compute_boxplot(species, sepal_length)

    Error in new_result(connection@ptr, statement) : nanodbc/nanodbc.cpp:1344: HY000: [Simba][Hardy] (80) Syntax or semantic analysis error thrown in server while executing query. Error message from server:
    Error while compiling statement:
    FAILED: NoMatchingMethodException
    No matching method for class org.apache.hadoop.hive.ql.udf.UDAFPercentile with (double, double). Possible choices: _FUNC_(bigint, array<double>) _FUNC_(bigint, double)
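
The error above shows that Hive's percentile only accepts bigint input. For comparison, here is a minimal sketch of the quartile part of the calculation written with percentile_approx, which accepts doubles. It is not dbplot's implementation. It assumes dbplyr's `simulate_hive()` and `lazy_frame()` so the SQL can be generated without a live connection, and it relies on dbplyr passing the unrecognised `percentile_approx()` call through to the SQL unchanged. `sim_iris` and `quartiles` are illustrative names, and the whiskers (ymin, ymax) are left out.

```r
library(dplyr)
library(dbplyr)

# Simulated Hive table (assumption): no live connection is needed,
# only SQL generation. Column names mirror the iris table in this issue.
sim_iris <- lazy_frame(
  species = "setosa",
  sepal_length = 5.1,
  con = simulate_hive()
)

# percentile_approx() is not an R function; dbplyr leaves calls it does
# not recognise untouched, so Hive receives them as written.
quartiles <- sim_iris %>%
  group_by(species) %>%
  summarise(
    lower  = percentile_approx(sepal_length, 0.25),
    middle = percentile_approx(sepal_length, 0.50),
    upper  = percentile_approx(sepal_length, 0.75)
  )

# Render the query as text to inspect what would be sent to Hive.
sql <- as.character(sql_render(quartiles))
cat(sql, "\n")
```

Running the same `summarise()` against `db_iris` instead of `sim_iris` should send the equivalent query to Hive, with no cast to integer needed.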

Run again casting Sepal_Length to Integer

    db_iris %>%
      db_compute_boxplot(species, as.integer(sepal_length)) %>%
      ggplot() +
      geom_boxplot(aes(x = species,
                       middle = middle,
                       lower = lower,
                       upper = upper,
                       ymin = ymin,
                       ymax = ymax,
                       color = species),
                   stat = "identity")

(image: the resulting boxplot)
