Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allows values to be macro-aware transcoded value-by-value, adds support for creating macro-aware readers from InputStream, and improves testing. #1005

Merged
merged 1 commit into from
Dec 12, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 23 additions & 4 deletions src/main/java/com/amazon/ion/MacroAwareIonReader.kt
Original file line number Diff line number Diff line change
Expand Up @@ -11,9 +11,27 @@ import java.io.IOException
interface MacroAwareIonReader : Closeable {

/**
* Performs a macro-aware transcode of the stream being read by this reader.
* Performs a macro-aware transcode of all values in the stream. This is
* shorthand for calling [prepareTranscodeTo], then calling [transcodeNext]
* repetitively until it returns `false`.
* @param writer the writer to which the reader's stream will be transcoded.
*/
@Throws(IOException::class)
fun transcodeAllTo(writer: MacroAwareIonWriter)

/**
* Prepares the reader to perform a macro-aware transcode to the given
* writer. This must be called before calling [transcodeNext], but is not
* necessary if calling [transcodeAllTo].
* @param writer the writer to which the reader's stream will be transcoded.
*/
fun prepareTranscodeTo(writer: MacroAwareIonWriter)
Comment on lines +22 to +28
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What makes the prepareTranscodeTo(writer)/transcodeNext() preferable to transcodeTo(writer)? From a glance over it looks like either would work, so I likely missed something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a huge difference, but the current factoring allows for the setup (registering the IVM callback) to happen once, rather than before every value that is transcoded.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, I see it now. It's really more of a question of whether you know that the callback has been registered, right? So could that state also be carried implicitly, by virtue of a private method that assumes the callback has already been registered and a public method which does the registration then calls the private method? Or am I misunderstanding how this works?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like that may be possible. We can follow up in #1015. My main goal with this factoring was to avoid repetitive re-setting of the callback. It's not as easy as "is a callback already registered", because a callback is always registered so that the higher-level reader gets notified when IVMs are encountered. Here, we augment that functionality.


/**
* Performs a macro-aware transcode of the next value read by this reader
* to the writer previously provided to a call to [prepareTranscodeTo].
* For Ion 1.0 streams, this functions similarly to providing a system-level
* [IonReader] to [IonWriter.writeValues]. For Ion 1.1 streams, the transcoded
* [IonReader] to [IonWriter.writeValue]. For Ion 1.1 streams, the transcoded
* stream will include the same symbol tables, encoding directives, and
* e-expression invocations as the source stream. In both cases, the
* transcoded stream will be data-model equivalent to the source stream.
Expand All @@ -34,8 +52,9 @@ interface MacroAwareIonReader : Closeable {
* To get a [MacroAwareIonReader] use `_Private_IonReaderBuilder.buildMacroAware`.
* To get a [MacroAwareIonWriter] use [IonEncodingVersion.textWriterBuilder] or
* [IonEncodingVersion.binaryWriterBuilder].
* @param writer the writer to which the reader's stream will be transcoded.
* @return true if a value was transcoded; false if the end of the stream was reached.
* @throws IOException if thrown during writing.
*/
@Throws(IOException::class)
fun transcodeTo(writer: MacroAwareIonWriter)
fun transcodeNext(): Boolean
}
Original file line number Diff line number Diff line change
Expand Up @@ -160,6 +160,9 @@ class IonReaderContinuableCoreBinary extends IonCursorBinary implements IonReade
// Indicates whether the reader is currently evaluating an e-expression.
protected boolean isEvaluatingEExpression = false;

// The writer that will perform a macro-aware transcode, if requested.
private MacroAwareIonWriter macroAwareTranscoder = null;

/**
* Constructs a new reader from the given byte array.
* @param configuration the configuration to use. The buffer size and oversized value configuration are unused, as
Expand Down Expand Up @@ -1799,30 +1802,34 @@ private boolean evaluateNext() {
}

@Override
public void transcodeTo(MacroAwareIonWriter writer) throws IOException {
public void transcodeAllTo(MacroAwareIonWriter writer) throws IOException {
prepareTranscodeTo(writer);
while (transcodeNext());
}

@Override
public void prepareTranscodeTo(MacroAwareIonWriter writer) {
registerIvmNotificationConsumer((major, minor) -> {
resetEncodingContext();
// Which IVM to write is inherent to the writer implementation.
// We don't have a single implementation that writes both formats.
writer.startEncodingSegmentWithIonVersionMarker();
});
while (transcodeNextTo(writer) != Event.NEEDS_DATA);
macroAwareTranscoder = writer;
}

/**
* Transcodes the next value, and any encoding directives that may precede it,
* to the given writer.
* @param writer the writer to which the value will be transcoded.
* @return the result of the operation.
* @throws IOException if thrown during writing.
*/
Event transcodeNextTo(MacroAwareIonWriter writer) throws IOException {
@Override
public boolean transcodeNext() throws IOException {
if (macroAwareTranscoder == null) {
throw new IllegalArgumentException("prepareTranscodeTo must be called before transcodeNext.");
}
// NOTE: this method is structured very similarly to nextValue(). During performance analysis, we should
// see if the methods can be unified without sacrificing hot path performance. Performance of this method
// is not considered critical.
lobBytesRead = 0;
while (true) {
if (parent == null || state != State.READING_VALUE) {
boolean isEncodingDirective = false;
if (state != State.READING_VALUE && state != State.COMPILING_MACRO) {
boolean isEncodingDirectiveFromEExpression = isEvaluatingEExpression;
encodingDirectiveReader.readEncodingDirective();
Expand All @@ -1832,17 +1839,22 @@ Event transcodeNextTo(MacroAwareIonWriter writer) throws IOException {
// If the encoding directive was expanded from an e-expression, that expression has already been
// written. In that case, just make sure the writer is using the new context. Otherwise, also write
// the encoding directive.
writer.startEncodingSegmentWithEncodingDirective(
macroAwareTranscoder.startEncodingSegmentWithEncodingDirective(
encodingDirectiveReader.newMacros,
encodingDirectiveReader.isMacroTableAppend,
encodingDirectiveReader.newSymbols,
encodingDirectiveReader.isSymbolTableAppend,
isEncodingDirectiveFromEExpression
);
isEncodingDirective = true;
}
if (isEvaluatingEExpression) {
if (evaluateNext()) {
continue;
if (isEncodingDirective) {
continue;
}
// This is the end of a top-level macro invocation that expanded to a user value.
return true;
}
} else {
event = super.nextValue();
Expand All @@ -1854,16 +1866,18 @@ Event transcodeNextTo(MacroAwareIonWriter writer) throws IOException {
}
} else if (isEvaluatingEExpression) {
if (evaluateNext()) {
// This is the end of a contained macro invocation; continue iterating through the parent container.
continue;
}
} else {
event = super.nextValue();
}
if (valueTid != null && valueTid.isMacroInvocation) {
expressionArgsReader.beginEvaluatingMacroInvocation(macroEvaluator);
macroEvaluatorIonReader.transcodeArgumentsTo(writer);
macroEvaluatorIonReader.transcodeArgumentsTo(macroAwareTranscoder);
isEvaluatingEExpression = true;
if (evaluateNext()) {
// This macro invocation expands to nothing; continue iterating until a user value is found.
continue;
}
if (parent == null && isPositionedOnEvaluatedEncodingDirective()) {
Expand All @@ -1878,15 +1892,44 @@ Event transcodeNextTo(MacroAwareIonWriter writer) throws IOException {
}
break;
}
if (event != Event.NEEDS_DATA) {
if (minorVersion > 0 && isPositionedOnSymbolTable()) {
if (event == Event.NEEDS_DATA || event == Event.END_CONTAINER) {
return false;
}
transcodeValueLiteral();
return true;
}

/**
* Transcodes a value literal to the macroAwareTranscoder. The caller must ensure that the reader is positioned
* on a value literal (i.e. a scalar or container value not expanded from an e-expression) before calling this
* method.
* @throws IOException if thrown by the writer during transcoding.
*/
private void transcodeValueLiteral() throws IOException {
if (parent == null && isPositionedOnSymbolTable()) {
if (minorVersion > 0) {
// TODO finalize handling of Ion 1.0-style symbol tables in Ion 1.1: https://github.com/amazon-ion/ion-java/issues/1002
throw new IonException("Macro-aware transcoding of Ion 1.1 data containing Ion 1.0-style symbol tables not yet supported.");
}
// The reader is now positioned on an actual encoding value. Write the value.
writer.writeValue(asIonReader);
// Ion 1.0 symbol tables are transcoded verbatim for now; this may change depending on the resolution to
// https://github.com/amazon-ion/ion-java/issues/1002.
macroAwareTranscoder.writeValue(asIonReader);
} else if (event == Event.START_CONTAINER && !isNullValue()) {
// Containers need to be transcoded recursively to avoid expanding macro invocations at any depth.
if (isInStruct()) {
macroAwareTranscoder.setFieldNameSymbol(getFieldNameSymbol());
}
macroAwareTranscoder.setTypeAnnotationSymbols(asIonReader.getTypeAnnotationSymbols());
macroAwareTranscoder.stepIn(getEncodingType());
super.stepIntoContainer();
while (transcodeNext()); // TODO make this iterative.
super.stepOutOfContainer();
macroAwareTranscoder.stepOut();
} else {
// The reader is now positioned on a scalar literal. Write the value.
// Note: writeValue will include any field name and/or annotations on the scalar.
macroAwareTranscoder.writeValue(asIonReader);
}
return event;
}

@Override
Expand Down
61 changes: 41 additions & 20 deletions src/main/java/com/amazon/ion/impl/_Private_IonReaderBuilder.java
Original file line number Diff line number Diff line change
Expand Up @@ -183,26 +183,26 @@ public void close() throws IOException {
}

@FunctionalInterface
interface IonReaderFromBytesFactoryText {
IonReader makeReader(IonCatalog catalog, byte[] ionData, int offset, int length, _Private_LocalSymbolTableFactory lstFactory);
interface IonReaderFromBytesFactoryText<T> {
T makeReader(IonCatalog catalog, byte[] ionData, int offset, int length, _Private_LocalSymbolTableFactory lstFactory);
}

@FunctionalInterface
interface IonReaderFromBytesFactoryBinary {
IonReader makeReader(_Private_IonReaderBuilder builder, byte[] ionData, int offset, int length);
interface IonReaderFromBytesFactoryBinary<T> {
T makeReader(_Private_IonReaderBuilder builder, byte[] ionData, int offset, int length);
}

static IonReader buildReader(
static <T> T buildReader(
_Private_IonReaderBuilder builder,
byte[] ionData,
int offset,
int length,
IonReaderFromBytesFactoryBinary binary,
IonReaderFromBytesFactoryText text
IonReaderFromBytesFactoryBinary<T> binary,
IonReaderFromBytesFactoryText<T> text
) {
if (IonStreamUtils.isGzip(ionData, offset, length)) {
try {
return buildReader(
return (T) buildReader(
builder,
new GZIPInputStream(new ByteArrayInputStream(ionData, offset, length)),
_Private_IonReaderFactory::makeReaderBinary,
Expand Down Expand Up @@ -257,20 +257,20 @@ private static boolean startsWithGzipHeader(byte[] buffer, int length) {
}

@FunctionalInterface
interface IonReaderFromInputStreamFactoryText {
IonReader makeReader(IonCatalog catalog, InputStream source, _Private_LocalSymbolTableFactory lstFactory);
interface IonReaderFromInputStreamFactoryText<T> {
T makeReader(IonCatalog catalog, InputStream source, _Private_LocalSymbolTableFactory lstFactory);
}

@FunctionalInterface
interface IonReaderFromInputStreamFactoryBinary {
IonReader makeReader(_Private_IonReaderBuilder builder, InputStream source, byte[] alreadyRead, int alreadyReadOff, int alreadyReadLen);
interface IonReaderFromInputStreamFactoryBinary<T> {
T makeReader(_Private_IonReaderBuilder builder, InputStream source, byte[] alreadyRead, int alreadyReadOff, int alreadyReadLen);
}

static IonReader buildReader(
static <T> T buildReader(
_Private_IonReaderBuilder builder,
InputStream source,
IonReaderFromInputStreamFactoryBinary binary,
IonReaderFromInputStreamFactoryText text
IonReaderFromInputStreamFactoryBinary<T> binary,
IonReaderFromInputStreamFactoryText<T> text
) {
if (source == null) {
throw new NullPointerException("Cannot build a reader from a null InputStream.");
Expand Down Expand Up @@ -358,10 +358,31 @@ public IonTextReader build(String ionText) {
* @return a new MacroAwareIonReader instance.
*/
public MacroAwareIonReader buildMacroAware(byte[] ionData) {
// TODO make this work for text too.
if (!IonStreamUtils.isIonBinary(ionData)) {
throw new UnsupportedOperationException("MacroAwareIonReader is not yet implemented for text data.");
}
return new IonReaderContinuableCoreBinary(getBufferConfiguration(), ionData, 0, ionData.length);
return buildReader(
this,
ionData,
0,
ionData.length,
(builder, data, offset, length) -> new IonReaderContinuableCoreBinary(builder.getBufferConfiguration(), data, offset,length),
(catalog, data, offset, length, factory) -> {
throw new UnsupportedOperationException("MacroAwareIonReader is not yet implemented for text data.");
}
);
}

/**
* Creates a new {@link MacroAwareIonReader} over the given data.
* @param ionData the data to read.
* @return a new MacroAwareIonReader instance.
*/
public MacroAwareIonReader buildMacroAware(InputStream ionData) {
return buildReader(
this,
ionData,
(builder, source, alreadyRead, alreadyReadOff, alreadyReadLen) -> new IonReaderContinuableCoreBinary(builder.getBufferConfiguration(), source, alreadyRead, alreadyReadOff, alreadyReadLen),
(catalog, source, factory) -> {
throw new UnsupportedOperationException("MacroAwareIonReader is not yet implemented for text data.");
}
);
}
}
2 changes: 1 addition & 1 deletion src/test/java/com/amazon/ion/Ion_1_1_RoundTripTest.kt
Original file line number Diff line number Diff line change
Expand Up @@ -262,7 +262,7 @@ class Ion_1_1_RoundTripTest {
val reader: MacroAwareIonReader = (IonReaderBuilder.standard() as _Private_IonReaderBuilder).buildMacroAware(ion)
val writer: MacroAwareIonWriter = ION_1_1.textWriterBuilder().build(actual) as MacroAwareIonWriter

reader.transcodeTo(writer)
reader.transcodeAllTo(writer)

reader.close()
writer.close()
Expand Down
Loading
Loading