import pyarrow as pa
import pyarrow.parquet as pq

pq.write_table(
    pa.table(
        {
            "uint32_field": pa.array([None, None, 28], pa.uint32()),
            "int32_field": pa.array([None, 28, 28], pa.int32()),
        }
    ),
    "/tmp/my_test_messages.parquet",
)
I then try to read it using parquet-java (in Kotlin, but the language doesn't matter):
package org.apache.parquet.test

import org.apache.parquet.test.MyTestMessage
import com.google.protobuf.Int32Value
import io.kotest.matchers.shouldBe
import org.apache.hadoop.fs.Path
import org.apache.parquet.proto.ProtoConstants
import org.apache.parquet.proto.ProtoParquetReader
import org.apache.parquet.proto.ProtoReadSupport
import org.junit.jupiter.api.Test

class TestUInt32Value {
    @Test
    fun `test can not load UInt32Value`() {
        val reader = ProtoParquetReader.builder<MyTestMessage.Builder>(
            Path("file:///tmp/my_test_messages.parquet")
        )
            .set(ProtoReadSupport.PB_CLASS, MyTestMessage::class.java.canonicalName)
            .set(ProtoConstants.CONFIG_IGNORE_UNKNOWN_FIELDS, "true")
            .build()
        val firstMessage = reader.read().build()
        firstMessage shouldBe MyTestMessage.getDefaultInstance()
        val secondMessage = reader.read().build()
        secondMessage shouldBe
            MyTestMessage.newBuilder().setInt32Field(Int32Value.of(28)).build()
        val thirdMessage = reader.read()
    }
}
I get this error when reading the third message:
org.apache.parquet.io.ParquetDecodingException: Can not read value at 3 in block 0 in file file:/tmp/my_test_messages.parquet
    at app//org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:280)
    at app//org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
    at app//org.apache.parquet.test.TestUInt32Value.test can load bad not nested plain(TestUInt32Value.kt:29)
Caused by:
java.lang.UnsupportedOperationException: org.apache.parquet.proto.ProtoMessageConverter$ProtoUInt32ValueConverter
    at org.apache.parquet.io.api.PrimitiveConverter.addInt(PrimitiveConverter.java:101)
    at org.apache.parquet.column.impl.ColumnReaderBase$2$3.writeValue(ColumnReaderBase.java:321)
    at org.apache.parquet.column.impl.ColumnReaderBase.writeCurrentValueToConverter(ColumnReaderBase.java:486)
    at org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:30)
    at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:425)
    at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:249)
    ... 2 more
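A possible reading of this trace, offered as an assumption and not a confirmed diagnosis: the exception is thrown by the default PrimitiveConverter.addInt, so the uint32 column is apparently being delivered as a 32-bit physical value that ProtoUInt32ValueConverter does not accept. The Parquet format annotates unsigned 32-bit integers on the INT32 physical type, so a reader that receives them through a signed int has to reinterpret the bits as unsigned. The sketch below only illustrates that reinterpretation in plain Python; the helper names are hypothetical and this is not parquet-java code.

```python
import struct

UINT32_MASK = 0xFFFFFFFF


def int32_bits_to_uint32(signed_value: int) -> int:
    """Reinterpret the bit pattern of a signed 32-bit integer as unsigned.

    Hypothetical helper: mirrors what a reader would need to do when an
    unsigned 32-bit value arrives through a signed 32-bit physical type.
    """
    if not -(2**31) <= signed_value < 2**31:
        raise ValueError(f"{signed_value} does not fit in a signed 32-bit integer")
    return signed_value & UINT32_MASK


def uint32_to_int32_bits(unsigned_value: int) -> int:
    """Inverse direction: the signed value that shares the same 32 bits."""
    if not 0 <= unsigned_value <= UINT32_MASK:
        raise ValueError(f"{unsigned_value} does not fit in an unsigned 32-bit integer")
    return struct.unpack("<i", struct.pack("<I", unsigned_value))[0]


# Small values such as the 28 used in this report are unchanged either way.
small = int32_bits_to_uint32(28)

# Values above 2**31 - 1 are where the signed and unsigned views differ.
as_signed = uint32_to_int32_bits(3_000_000_000)
back_to_unsigned = int32_bits_to_uint32(as_signed)
```

For the value 28 in the file above the two views coincide, which suggests the failure is about the converter not accepting the call at all, not about a wrong numeric result.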
A few things to note:
This works for the second message, which means it is implemented correctly for the (signed) Int32Value.
It works if you generate the data using the JVM.
But this is because, when you do so, the Parquet table has a different structure (each message is a nested struct {"value": 28}).
Describe the bug, including details regarding any error messages, version, and platform.
TLDR: the parquet protobuf reader doesn't work for UInt32Value
I have a protobuf message using wrapped unsigned and signed integers:
I then generate a parquet file for that data:
This is basically generating a table that looks like this:
Component(s)
Protobuf