Skip to content

Conversation

@Willem-J-an
Copy link
Contributor

@Willem-J-an Willem-J-an commented Jun 19, 2025

Rationale for this change

#415

What changes are included in this PR?

Fix decoding of avro types that no longer worked: fixed bytes, timestamp, time, date & decimals

Are these changes tested?

Added tests to the avro/reader_test.go

Are there any user-facing changes?

No API changes, just broader compatability.

@zeroshade
Copy link
Member

Thanks for this, can you add a test for it?

@Willem-J-an
Copy link
Contributor Author

Thanks for this, can you add a test for it?

Will do when I get around to it!

@Willem-J-an
Copy link
Contributor Author

Thanks for this, can you add a test for it?

@zeroshade I've added a test with some sample data based on the pre-existing avro schema that was in the repo. It encodes the data to avro & to json, and validates when the avro is decoded that it matches the json. For the json encoder I aligned it to what I saw the arrow record as json produced.

I'm wondering what happened here, as quite a few types were not working. Maybe hamba had some breaking changes and due to no tests they were not detected?

What do you think about the reader_type switch cases that do not seem to be applicable anymore; are there valid cases where these are still needed or does it make sense to delete them?

@Willem-J-an Willem-J-an changed the title feat: support avro decoded time struct Fix(avro-reader): bunch of types that didn't work Jul 3, 2025
@zeroshade
Copy link
Member

Which cases do you see are no longer applicable and not working? the tests that are being removed seem to be types that should be supported

@Willem-J-an
Copy link
Contributor Author

Tests that are being removed? I think no tests are being removed?

As an example, the cases that I'm referring to is this:

func appendFixedSizeBinaryData(b *array.FixedSizeBinaryBuilder, data interface{}) {
	switch dt := data.(type) {
	case nil: # <- is fine of course 
		b.AppendNull()
	case []byte: # <- this case does not apply at least in my testing because hamba decoder returns array, not a slice
		b.Append(dt)
	case map[string]any: # <- this case also didn't apply in my testing
		switch v := dt["bytes"].(type) {
		case nil:
			b.AppendNull()
		case []byte:
			b.Append(v)
		}

Not sure if these are legacy or still relevant in cases that I cannot oversee!

@Willem-J-an
Copy link
Contributor Author

Willem-J-an commented Jul 7, 2025

Seems like my changes do not work on windows, I added the license info to the avro schema which seems a bit weird. Is there a better way to deal with the license for this file?

@zeroshade
Copy link
Member

You can add the file to the dev/release/rat_exclude_files.txt that should exclude it from the license check

@zeroshade
Copy link
Member

<- this case does not apply at least in my testing because hamba decoder returns array, not a slice

That's odd and is likely a bug in hamba then, according to the docs it should return []bytes correctly:
image

<- this case also didn't apply in my testing

Didn't apply because we aren't testing it? or a different case gets matched for the test that should hit it?

@Willem-J-an
Copy link
Contributor Author

Screenshot_2025-07-07-19-14-51-05_fd7367fd0afc7e864f00091a00b3d0b0

My example was fixed binary, notice it shows up twice in above screenshot. I observed the fixed binary to return an array, as is described in the screenshot as well.

The case didn't apply because when I use arrow reader it converts to array, not to map and not to slice. I don't know if there's a scenario where it would actually be used, that's my question! I suppose if hamba doesn't decode into slice or map for fixed bytes, is there any other decoder that users can use that do hit these code paths? If not than maybe these are only applicable to hamba v1 or so, and not needed anymore.

If you're unsure we can also just leave it. The most important part for me is to have the working cases added, rather than the not working cases removed.

And fixed binary was an example, the same applies to timestamps and the other types I fixed.

@zeroshade
Copy link
Member

A fixed should probably map to a fixed_len_byte_array[N] if we can easily detect it, if we can't then I think it's okay to remove it for now if it's not currently working.

@Willem-J-an
Copy link
Contributor Author

I added working fixes for all the types already, including for fixed!

@Willem-J-an
Copy link
Contributor Author

Added the schema file to the rat exclude, that should fix the windows tests. Any other feedback I'd need to look at? Thanks!

Copy link
Member

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!

@zeroshade zeroshade changed the title Fix(avro-reader): bunch of types that didn't work fix(avro-reader): bunch of types that didn't work Jul 8, 2025
@zeroshade zeroshade changed the title fix(avro-reader): bunch of types that didn't work fix(arrow/avro-reader): bunch of types that didn't work Jul 8, 2025
@zeroshade zeroshade merged commit b196d3b into apache:main Jul 8, 2025
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants