write.avro fails on data frame #3

Open
piccolbo opened this issue Mar 3, 2015 · 3 comments
@piccolbo
Collaborator

piccolbo commented Mar 3, 2015

The error is:
ravro:::write.avro(df, tf1)
Exception in thread "main" org.apache.avro.SchemaParseException: Enum has no symbols: {"name":"col_2","type":"enum","symbols":"d"}
at org.apache.avro.Schema.parse(Schema.java:1121)
at org.apache.avro.Schema.parse(Schema.java:1094)
at org.apache.avro.Schema$Parser.parse(Schema.java:927)
at org.apache.avro.Schema$Parser.parse(Schema.java:917)
at org.apache.avro.Schema.parse(Schema.java:966)
at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:91)
at org.apache.avro.tool.Main.run(Main.java:80)
at org.apache.avro.tool.Main.main(Main.java:69)

dump of df

df <-
structure(list(col_1 = 139.084976531123, col_2 = structure(1L, .Label = "d", class = "factor"),
col_3 = TRUE, col_4 = FALSE, col_5 = -11.3948273417181, col_6 = 90.2836501356233,
col_7 = structure(1L, .Label = "", class = "factor"), col_8 = structure(1L, .Label = "57be", class = "factor")), .Names = c("col_1",
"col_2", "col_3", "col_4", "col_5", "col_6", "col_7", "col_8"
), row.names = c(NA, -1L), class = "data.frame")

Another instance

Exception in thread "main" org.apache.avro.SchemaParseException: Enum has no symbols: {"name":"col_1","type":"enum","symbols":"_6f7a4bc347_ravro"}
at org.apache.avro.Schema.parse(Schema.java:1121)
at org.apache.avro.Schema.parse(Schema.java:1094)
at org.apache.avro.Schema$Parser.parse(Schema.java:927)
at org.apache.avro.Schema$Parser.parse(Schema.java:917)
at org.apache.avro.Schema.parse(Schema.java:966)
at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:91)
at org.apache.avro.tool.Main.run(Main.java:80)
at org.apache.avro.tool.Main.main(Main.java:69)

Dump

df <-
structure(list(col_1 = structure(1L, .Label = "6f7a4bc347", class = "factor"),
col_2 = structure(1L, .Label = "46f315f9", class = "factor"),
col_3 = -158.916518470489, col_4 = -72.4716823839384, col_5 = 34L,
col_6 = structure(1L, .Label = "6f7a", class = "factor"),
col_7 = -10L, col_8 = 10L), .Names = c("col_1", "col_2",
"col_3", "col_4", "col_5", "col_6", "col_7", "col_8"), row.names = c(NA,
-1L), class = "data.frame")

My theory, from several examples, is that the failure occurs iff the input is a data frame with a single row and at least one factor column.
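
One way to see why single-row inputs would trigger this: a factor built from a single value has exactly one level, so a factor column in such a one-row data frame would map to a one-symbol enum. Below is a minimal base-R sketch of that observation. The column names and values are made up for illustration, and it does not call ravro.

```r
# A one-row data frame with a factor column, similar in shape to the dumps above.
df <- data.frame(col_1 = 139.08, col_2 = factor("d"))

# The factor has a single level, so levels() is a length-1 character vector.
lv <- levels(df$col_2)
n_levels <- nlevels(df$col_2)

# A one-row frame can still carry several levels if they are declared explicitly.
df2 <- data.frame(col_2 = factor("d", levels = c("d", "e")))
n_levels2 <- nlevels(df2$col_2)
```

If the failure really comes from length-1 level vectors, the second frame might write fine despite having one row. That is untested here.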

@piccolbo piccolbo added the bug label Mar 3, 2015
@jamiefolson
Contributor

My two thoughts:

  1. Does Avro allow an enum with only one level?
  2. If an enum is allowed to have a single level, we might need to change
    the enum levels from a character vector to a list, so that toJSON will
    produce ["d"] instead of "d".
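
The second point can be sketched as follows. This assumes rjson-style serialization, where a length-1 atomic vector becomes a JSON scalar. The thread does not say which JSON package ravro uses, so treat the `library(rjson)` call as an assumption rather than ravro's actual code path.

```r
library(rjson)  # assumption: the thread does not name the JSON package

f <- factor("d")

# A length-1 character vector is written as a JSON scalar, not an array.
scalar_json <- toJSON(levels(f))          # "d" (a JSON string)

# Wrapping the levels in a list forces a JSON array, even with one element.
array_json <- toJSON(as.list(levels(f)))  # ["d"]

# With two or more levels, the plain character vector is already an array.
multi_json <- toJSON(levels(factor(c("a", "b"))))  # ["a","b"]
```

The scalar form matches the `"symbols":"d"` seen in the error message, which would explain why the Avro parser reports that the enum has no symbols.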

Jamie Olson


@piccolbo
Collaborator Author

piccolbo commented Mar 5, 2015

From reading the spec, I think a single-symbol enum is admissible, but I am not sure this should be very high on our priority list. How useful are single-level enums in real life? I modified my tests to generate at least two levels. I think we can reasonably delay this until there is a second request.

@piccolbo
Collaborator Author

piccolbo commented Mar 5, 2015

I mean, you can close this as "won't fix", AFAIK.
