SPARQL join query explanation: how does it work?

Tags: hadoop, sparql, bigdata, jena

My query:

select ?x ?z where {
  ?x <> ?y .
  ?x <> ?z .
  ?x <> "176-186" .
}

I need to write a custom parser for this query.

When I run this query on a Jena model, it returns one record.
Can anyone explain how this query is evaluated?

I split the query into three parts:

select ?x ?y where { ?x <> ?y . }

Total Records Found : 3034

select ?x ?z where { ?x <> ?y . ?x <> ?z . }

 Total Records Found : 2679

select ?x ?z where { ?x <> ?y . ?x <> ?z . ?x <> "176-186" . }

 Total Records Found : 1

Please help me write a custom query parser.

asked Oct 11, 2015 by yashwantpinge

3 Answers

You are trying to calculate the join of the three triple patterns. Papers on join implementation over Apache Hadoop will be useful background.
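
As a rough illustration of what such a join looks like in map-reduce terms, here is a minimal sketch in plain Python (it is not Hadoop or Jena code). It simulates a reduce-side join: the map step tags each matching triple with the pattern it satisfies and keys it by the subject, and the reduce step combines the bindings collected for each subject. The toy data, the predicate names "name", "volume" and "pages", and all function names are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import product

# Toy triples; the predicate names are hypothetical stand-ins.
TRIPLES = [
    ("art1", "name", "Paper A"),
    ("art1", "volume", "12"),
    ("art1", "pages", "176-186"),
    ("art2", "name", "Paper B"),
    ("art2", "volume", "7"),
    ("art2", "pages", "1-10"),
    ("art3", "name", "Paper C"),
]

# Each pattern is (predicate, object); the subject is always ?x.
PATTERNS = [("name", "?y"), ("volume", "?z"), ("pages", "176-186")]


def map_phase(triples, patterns):
    """Emit (subject, (pattern_index, binding)) for every pattern a triple matches."""
    for s, p, o in triples:
        for idx, (pred, obj) in enumerate(patterns):
            if p != pred:
                continue
            if obj.startswith("?"):
                yield s, (idx, {obj: o})   # variable object: bind it
            elif obj == o:
                yield s, (idx, {})         # constant object: must be equal


def shuffle(pairs):
    """Group mapper output by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values, n_patterns):
    """Combine bindings for one subject; yields nothing if any pattern is unmatched."""
    per_pattern = [[] for _ in range(n_patterns)]
    for idx, binding in values:
        per_pattern[idx].append(binding)
    for combo in product(*per_pattern):
        row = {"?x": key}
        for binding in combo:
            row.update(binding)
        yield row


def run_job(triples, patterns):
    groups = shuffle(map_phase(triples, patterns))
    rows = []
    for key in sorted(groups):
        rows.extend(reduce_phase(key, groups[key], len(patterns)))
    return rows


print(run_job(TRIPLES, PATTERNS))
```

Because all three patterns share ?x as their subject, a single shuffle on ?x is enough in this sketch; a query that joins on different variables would generally need more than one pass.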

It may be helpful to look at Apache Spark and the Resilient Distributed Dataset (RDD) concept.

It is also important to consider the likely selectivity of each pattern. As Joshua says, the "pages" pattern may well yield a unique solution, and using that solution to look up "name" and "volume" is not a demanding task.

ARQ's in-memory algorithm does not aim for maximum independent parallelism, which is what you want on Hadoop. Merge joins (or sort-merge joins) make two parallelizable accesses to the data.
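
To make the sort-merge idea concrete, here is a minimal sketch in plain Python. It is not ARQ's algorithm; it only shows the technique. It assumes each pattern has already been evaluated to a list of bindings and that the join variable is the only variable the two inputs share. The sample bindings and the function name are hypothetical.

```python
def sort_merge_join(left, right, key):
    """Join two lists of bindings (dicts) on the shared variable `key`.

    Assumes `key` is the only variable the two sides have in common.
    """
    # The two sorts are independent of each other, so they could run in parallel.
    left = sorted(left, key=lambda b: b[key])
    right = sorted(right, key=lambda b: b[key])
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side, then emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][key] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            for l_row in left[i:i_end]:
                for r_row in right[j:j_end]:
                    out.append({**l_row, **r_row})
            i, j = i_end, j_end
    return out


# Hypothetical per-pattern results for "name", "volume" and "pages".
names = [
    {"?x": "art2", "?y": "Paper B"},
    {"?x": "art1", "?y": "Paper A"},
    {"?x": "art3", "?y": "Paper C"},
]
volumes = [{"?x": "art1", "?z": "12"}, {"?x": "art2", "?z": "7"}]
pages = [{"?x": "art1"}]  # the most selective pattern: a single solution

# Start from the most selective input so intermediate results stay small.
step1 = sort_merge_join(pages, volumes, "?x")
result = sort_merge_join(step1, names, "?x")
print(result)
```

Joining "pages" first keeps the intermediate result at one row, which is the selectivity point made above; the final answer is the same whichever order is used.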

You can extend ARQ at the basic pattern level, at the whole algebra execution level, or at any point in between, by extending the class OpExecutor.

answered Oct 11, 2015 by rajesh

It sounds like you're asking why

select ?x ?z where {
  ?x <> ?y .          # (a)
  ?x <> ?z .          # (b)
  ?x <> "176-186" .   # (c)
}

returns just one result, while each line alone returns more. Triple patterns in SPARQL are conjunctive: non-optional patterns must be matched by the data in order for results to be returned. Thus, you're asking for the values of ?x and ?z where ALL of the following hold:

  • ?x has the name ?y, AND
  • ?x has some value for volume, AND
  • ?x has the specific value "176-186" for pages.
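
The conjunction above can be sketched in a few lines of plain Python. This is a naive illustration of the semantics, not how Jena evaluates the query: each pattern in turn filters and extends the solutions found so far. The toy triples and the predicate names "name", "volume" and "pages" are assumptions, since the real data is not shown.

```python
# Toy data: (subject, predicate, object). Predicate names are hypothetical.
TRIPLES = [
    ("art1", "name", "Paper A"),
    ("art1", "volume", "12"),
    ("art1", "pages", "176-186"),
    ("art2", "name", "Paper B"),
    ("art2", "volume", "7"),
    ("art2", "pages", "1-10"),
    ("art3", "name", "Paper C"),  # no volume, so pattern (b) fails for art3
]


def is_var(term):
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable must keep the value it was already bound to, if any.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result


def evaluate_bgp(patterns, triples):
    """Every pattern must match: each one narrows the current solutions."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


def select(variables, solutions):
    return [tuple(s[v] for v in variables) for s in solutions]


A = ("?x", "name", "?y")
B = ("?x", "volume", "?z")
C = ("?x", "pages", "176-186")

print(len(evaluate_bgp([A], TRIPLES)))        # 3
print(len(evaluate_bgp([A, B], TRIPLES)))     # 2
print(select(["?x", "?z"], evaluate_bgp([A, B, C], TRIPLES)))  # [('art1', '12')]
```

Each added pattern can only keep or reduce the number of distinct ?x values, which mirrors the shrinking counts seen in the question.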

Based on the names of the properties, it sounds like you're querying some bibliographic information. It's not surprising that in a given bibliographic database, there might be only one article whose pages are exactly "176-186", as that's a very specific value.

answered Oct 11, 2015 by abhimca2006

Edited to include the correct algebra link

The best advice that I can offer is to look at the Jena documentation for ARQ's SPARQL Algebra and derive your custom evaluation engine at that level. Another reference that may be informative is the W3C SPARQL Algebra.

It seems (from the tags that you have selected) that you intend to perform query operations distributed throughout a map-reduce job, and you are looking at a specific example of the application of the algebra as a proof-of-concept. If your intent is to integrate this into Jena's query evaluation, then you will need to manually explore Jena's existing system in order to understand why it behaves the way it does.

answered Oct 11, 2015 by jekbishnoi