Distributed Resources and Dependency Injection
Mara greatly simplifies the distribution of resources by hiding the complexities of the distributed cache behind two annotations: @Distribute and @Resource. This pair of annotations distributes values from the Tool class to MapReduce components. Both static and dynamically built objects and values may be passed into your component classes.
Driver or context properties that you wish to make available in your Hadoop framework components may be annotated with the FIELD- or METHOD-level @Distribute annotation. If you annotate a method, it is invoked once during job initialization and its return value is distributed through the framework. Presently the annotation works with primitives, Strings, org.apache.hadoop.fs.Path, java.io.File, or any class that implements java.io.Serializable. Future enhancements will likely include support for serialization via Kryo.
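As a rough illustration of the Serializable requirement, the sketch below round-trips a HashSet<String> (the same kind of value the getBlacklist() driver method in the example further down returns) through standard Java serialization. The encode and decode helpers and the Base64 text encoding are hypothetical choices made for this sketch; they are not a description of how Mara actually stores values in the distributed cache, only a quick way to check that a value survives serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;
import java.util.HashSet;
import java.util.Set;

public class SerializableRoundTrip {

    // Hypothetical helper (not Mara's API): Java-serialize a value and
    // encode the bytes as Base64 text.
    static String encode(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Hypothetical helper (not Mara's API): reverse of encode().
    static Object decode(String encoded) throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        HashSet<String> blacklist = new HashSet<>();
        blacklist.add("spam");
        blacklist.add("junk");

        // HashSet implements java.io.Serializable, so it survives the round trip.
        String wire = encode(blacklist);
        @SuppressWarnings("unchecked")
        Set<String> restored = (Set<String>) decode(wire);

        System.out.println(restored.equals(blacklist)); // prints "true"
    }
}
```

A class that does not implement Serializable (or holds a non-serializable field) would fail in encode() with a NotSerializableException, which is a cheap way to catch an unsupported type before submitting a job.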
To access the object in your Hadoop component (currently mappers and reducers are supported), annotate the class with either the Spring @Service stereotype annotation or the appropriate framework-specific @MapperService, @CombinerService, or @ReducerService annotation. Then annotate the member you wish to inject with the @Resource annotation. Basic type conversion works as expected: for example, Strings from the context are converted to primitive values if they can be parsed into the target type. Legal casts also work as expected. Use the optional ‘name’ parameter if you want the distributed name to differ from the target resource bean property.
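To make the String-to-primitive conversion rule concrete, here is a minimal sketch of what "converted if they may be parsed into the target type" could look like. The StringConversion class and its convert method are hypothetical and written only for illustration; Mara's own conversion rules may accept more types or behave differently at the edges (for instance, this sketch rejects anything other than "true"/"false" for booleans rather than silently returning false).

```java
public class StringConversion {

    // Hypothetical converter (not Mara's API): parse a String value taken
    // from the job context into the declared type of the injection target.
    static Object convert(String raw, Class<?> target) {
        String s = raw.trim();
        if (target == String.class) return raw;
        if (target == int.class || target == Integer.class) return Integer.parseInt(s);
        if (target == long.class || target == Long.class) return Long.parseLong(s);
        if (target == double.class || target == Double.class) return Double.parseDouble(s);
        if (target == float.class || target == Float.class) return Float.parseFloat(s);
        if (target == short.class || target == Short.class) return Short.parseShort(s);
        if (target == byte.class || target == Byte.class) return Byte.parseByte(s);
        if (target == boolean.class || target == Boolean.class) {
            // Strict parse: only "true"/"false" (any case) are accepted.
            if (s.equalsIgnoreCase("true")) return Boolean.TRUE;
            if (s.equalsIgnoreCase("false")) return Boolean.FALSE;
            throw new IllegalArgumentException("Not a boolean: " + raw);
        }
        if (target == char.class || target == Character.class) {
            if (raw.length() != 1) throw new IllegalArgumentException("Not a single char: " + raw);
            return raw.charAt(0);
        }
        throw new IllegalArgumentException("No conversion from String to " + target.getName());
    }

    public static void main(String[] args) {
        System.out.println(convert("42", int.class));       // prints 42
        System.out.println(convert("true", boolean.class)); // prints true
        try {
            convert("forty-two", int.class);
        } catch (NumberFormatException e) {
            System.out.println("not parseable");            // prints not parseable
        }
    }
}
```

Failing loudly on an unparseable value (NumberFormatException is a subclass of IllegalArgumentException) is a design choice here: it surfaces a misconfigured option at injection time instead of injecting a default.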
In your driver:
// Simple FIELD-level annotation of a property
@Distribute
private String myProperty;
// Can apply at FIELD or METHOD level
@Distribute
public Set<String> getBlacklist() throws IOException {
Set<String> blacklist = null;
if (StringUtils.isNotBlank(context.blacklist)) {
blacklist = new HashSet<String>();
        // …<populate set>…
}
return blacklist;
}
// May also be applied to context-level elements
private static class MyContext extends DriverContextBase {
@Distribute
@Option(required=true, argName="string-value", description="A distributed cli option")
private String myCliOption;
}
In your mapreduce component:
@Resource(name="blacklist")
private Set<String> blacklistedWords;
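The pairing between a driver's @Distribute members and a component's @Resource fields can be pictured as a name-keyed map: collect values from the driver (invoking each annotated method once), then inject them into fields by name. The self-contained sketch below models that idea with plain reflection. It declares its own local Distribute and Resource annotations, and the Driver, Component, collect, inject, and propertyName names are all hypothetical stand-ins; none of this is Mara's implementation, which also handles the distributed cache, Spring wiring, and type conversion.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DistributeInjectSketch {

    // Local stand-in annotations for illustration only; NOT Mara's classes.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.FIELD, ElementType.METHOD})
    @interface Distribute { String name() default ""; }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Resource { String name() default ""; }

    // Hypothetical driver: one field-level and one method-level @Distribute.
    static class Driver {
        @Distribute
        private String myProperty = "hello";

        @Distribute
        public Set<String> getBlacklist() {
            return new HashSet<>(Arrays.asList("spam", "junk"));
        }
    }

    // Hypothetical component: 'name' maps "blacklist" onto a differently named field.
    static class Component {
        @Resource
        private String myProperty;

        @Resource(name = "blacklist")
        private Set<String> blacklistedWords;
    }

    // Collect @Distribute values; each annotated method is invoked exactly once.
    static Map<String, Object> collect(Object driver) throws ReflectiveOperationException {
        Map<String, Object> values = new HashMap<>();
        for (Field f : driver.getClass().getDeclaredFields()) {
            Distribute d = f.getAnnotation(Distribute.class);
            if (d == null) continue;
            f.setAccessible(true);
            values.put(d.name().isEmpty() ? f.getName() : d.name(), f.get(driver));
        }
        for (Method m : driver.getClass().getDeclaredMethods()) {
            Distribute d = m.getAnnotation(Distribute.class);
            if (d == null) continue;
            m.setAccessible(true);
            values.put(d.name().isEmpty() ? propertyName(m) : d.name(), m.invoke(driver));
        }
        return values;
    }

    // Derive a bean-style property name from a getter: "getBlacklist" -> "blacklist".
    static String propertyName(Method m) {
        String n = m.getName();
        if (n.startsWith("get") && n.length() > 3) {
            return Character.toLowerCase(n.charAt(3)) + n.substring(4);
        }
        return n;
    }

    // Inject into @Resource fields, keyed by the 'name' attribute or else the field name.
    static void inject(Object component, Map<String, Object> values) throws IllegalAccessException {
        for (Field f : component.getClass().getDeclaredFields()) {
            Resource r = f.getAnnotation(Resource.class);
            if (r == null) continue;
            String key = r.name().isEmpty() ? f.getName() : r.name();
            if (values.containsKey(key)) {
                f.setAccessible(true);
                f.set(component, values.get(key));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, Object> values = collect(new Driver());
        Component c = new Component();
        inject(c, values);
        System.out.println(c.myProperty);                        // prints hello
        System.out.println(c.blacklistedWords.contains("spam")); // prints true
    }
}
```

Because collect() and inject() only touch plain objects, the same pattern can be exercised without a cluster, which loosely mirrors why annotation-driven injection is easier to test than direct distributed cache access.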
Mara supports these annotations within unit tests as well, which is difficult to replicate when using the distributed cache mechanism directly. See Unit Testing with Dependency Injection for details and examples.