Pass a relation to a Pig UDF when using FOREACH on another relation
Tags: hadoop, apache-pig
We are using Pig 0.6 to process some data. One of the columns of our data is a space-separated list of ids (such as: 35 521 225). We are trying to map one of those ids using another file that contains two columns of mappings, like this (column 1 is our id, column 2 is a third party's id):
12 6129
We wrote a UDF that takes in the column value (so: "35 521 225") and the mappings from the file. We would then split the column value, iterate over the ids, and return the first mapped value from the passed-in mappings (thinking that is how it would logically work).
We are loading the data in Pig like this:
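To make the intended lookup concrete, here is a minimal sketch in plain Java. It is not our actual UDF and does not use the Pig API: the class and method names are hypothetical, and it assumes the mappings are already available as an in-memory java.util.Map, which is exactly the part we do not know how to get from Pig.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not the real com.example.ourudf.Mapper.
final class FirstMappedId {
    private FirstMappedId() {}

    /**
     * Returns the third-party id for the first of our ids that has a mapping,
     * or null when the input is null or none of the ids are mapped.
     */
    static String firstMapped(String categoryIds, Map<String, String> mappings) {
        if (categoryIds == null || mappings == null) {
            return null;
        }
        // Ids are separated by whitespace, e.g. "35 521 225".
        for (String id : categoryIds.trim().split("\\s+")) {
            String theirId = mappings.get(id);
            if (theirId != null) {
                return theirId; // stop at the first id that has a mapping
            }
        }
        return null;
    }
}
```

The method stops at the first hit, so later ids in the list are never looked up.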
data = LOAD 'input.txt' USING PigStorage() AS (title:chararray, category:chararray);
mappings = LOAD 'mappings.txt' USING PigStorage() AS (ourId:chararray, theirId:chararray);
Then our generate is:
output = FOREACH data GENERATE title, com.example.ourudf.Mapper(category, mappings);
However the error we get is:
'there is an error during parsing: Invalid alias mappings in [data::title: chararray,data::category, chararray]'
It seems that Pig is trying to find a column called "mappings" on our original data, which of course isn't there. Is there any way to pass a loaded relation into a UDF?
Is there any way the "Map" type in Pig will help us here? Or do we need to somehow join the values?
EDIT: To be more specific - we don't want to map ALL of the category ids to the third-party ids. We just want to map the first. The UDF will iterate over the list of our category ids and return as soon as it finds the first mapped value. So if the input looked like:
someProduct\t35 521 225
the output would be:
Oct 11, 2015