In Solr, how do I search for blank values in a multivalued facet field while still returning proper matches?

I have an application where users can pick car parts. They pick their vehicle and then pick vehicle attributes as facets. After selecting a vehicle, they can pick facets such as engine size to narrow down the list of results. The problem was that not all documents have an engine size (it's an empty value in Solr), because it doesn't matter for all parts. For example, engine size rarely matters for an air filter. So even if a user picked 3.5L as their engine size, I still wanted to show the air filters as parts the user could pick. I did some searching, and the following facet query worked perfectly:

 enginesize:"3.5" OR enginesize:(*:* AND -enginesize:[* TO *])

This query matches either 3.5 or records with no value in the engine size field (no value meant the engine size didn't matter and the part fit the car). Perfect...

The problem: I recently made the vehicle attribute fields multivalued, so I could store the attributes for each part as a list. I then applied faceting to them, and it worked fine. The trouble started when I applied the query above. Selecting the enginesize facet narrowed the results down to only the documents that have that engine size; records (I use "record" and "document" interchangeably) with empty values (i.e. "") for enginesize no longer appeared. The same query does not behave for multivalued fields the way it did when enginesize was a single-valued field.


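  Document 1: engine mount   enginesize = 3.5, 3.5, 3.5, 3.5, 3.5
  Document 2: engine bolt    enginesize = 6, 6, 6, 6, 6
  Document 3: air filter     enginesize = (empty values only)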

So what I am looking for is a query that pulls back documents 1 and 3 above when I do a facet search for engine size 3.5. The first document (the engine mount) matches because one of the values in its multivalued enginesize field is 3.5. The third document (the air filter), however, is not returned because of its empty values. I do not want the second document returned at all, because it doesn't match the facet value.

I basically want a query that matches empty string values for a given facet and also matches the actual value, so that both documents are returned.
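
To pin down the behaviour I'm after, here is a small Python sketch of the matching rule. It is not Solr code; the function name is made up for illustration, and I'm assuming the air filter holds five empty strings, like the five values on the other two documents.

```python
def matches_selection(values, selected):
    """Return True if a document should be shown for the selected facet value.

    A document qualifies when it carries the selected value, or when every
    value it holds is blank (empty string or whitespace), meaning the
    attribute does not matter for that part.
    """
    non_blank = [v.strip() for v in values if v.strip()]
    return not non_blank or selected in non_blank


# The three sample documents (the number of empty values is an assumption).
docs = {
    "engine mount": ["3.5"] * 5,
    "engine bolt": ["6"] * 5,
    "air filter": [""] * 5,
}

wanted = [name for name, sizes in docs.items() if matches_selection(sizes, "3.5")]
print(wanted)  # ['engine mount', 'air filter']
```

The same rule treats a single space as blank, which is the second situation described further down.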

Does someone have a query that would return document 1 and document 3 (the engine mount and the air filter), but not the engine bolt document?

I tried the following without success (in addition to the query at the very top of this question):

// Returns everything
enginesize:"3.5" OR (enginesize:[* TO *])
// Only returns document 1
enginesize:"3.5" OR (enginesize:["" TO ""] AND -enginesize:"3.5")
// Only returns document 1
enginesize:"3.5" OR (enginesize:"")

I imported the data above from a CSV file, with keepEmpty=true set on the field. I then tried manually inserting a space into the field when generating the CSV file (so each blank value became a single space rather than an empty string) and retried the queries. Doing that, I got the following results:

// Returns document 1
enginesize:"3.5" OR enginesize:(*:* AND -enginesize:[* TO *])
// Returns all documents
enginesize:"3.5" OR (enginesize:["" TO ""] AND -enginesize:"3.5")
// Returns all documents
enginesize:"3.5" OR (enginesize:"")

Does anyone have a query that would work in either situation, whether the blank value is a space or simply no value at all?

2 Answers

How about changing how you index, instead of how you query?

Instead of trying to index "engine size doesn't matter" as an empty value, index it as "any".

Then your query simply becomes enginesize:"3.5" OR (enginesize:any)
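
As a rough sketch of the query side, assuming the marker is the literal token "any" and that the application builds the filter string itself, something like the following Python helper would do. The function name `facet_filter` is made up for this example and is not part of any Solr client.

```python
def facet_filter(field: str, value: str, marker: str = "any") -> str:
    """Build a filter that matches the selected value or the marker.

    The marker stands for "this attribute does not matter for the part".
    Backslashes and double quotes in the value are escaped so it stays a
    valid quoted phrase.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}" OR {field}:{marker}'


print(facet_filter("enginesize", "3.5"))
# enginesize:"3.5" OR enginesize:any
```

Note that the boolean operator is written in upper case, since lower-case "or" is not treated as an operator by the standard query parser.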

I've just been playing with this and found a hint that seems to do the trick for me. Translated to your query, it should be:

enginesize:"3.5" OR (-enginesize:["" TO *])


Update: after some more testing I don't think this works reliably. For some indexes it had to be the other way round and without the minus sign, i.e. enginesize:[* TO ""]. This might depend on the index type, on whether the field is multivalued, or even on the actual values.

In any case it seems too much of a hack. I'll probably resort to substituting the empty value with a special marker.
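
A minimal sketch of that substitution, done in Python on the CSV before it is imported. The marker "any", the "|" separator between the values of a multivalued cell, and the column names are all assumptions made for illustration; adjust them to the real file layout.

```python
import csv
import io

MARKER = "any"   # hypothetical marker meaning "attribute does not matter"
SEPARATOR = "|"  # assumed separator between values inside one CSV cell


def mark_empty_values(cell: str) -> str:
    """Drop blank values from a cell; if nothing is left, return the marker."""
    values = [v.strip() for v in cell.split(SEPARATOR)]
    non_empty = [v for v in values if v]
    return SEPARATOR.join(non_empty) if non_empty else MARKER


def rewrite_csv(text: str, field: str = "enginesize") -> str:
    """Return the CSV text with blank cells in `field` replaced by the marker."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[field] = mark_empty_values(row[field])
        writer.writerow(row)
    return out.getvalue()


sample = "part,enginesize\nengine mount,3.5|3.5\nair filter,||\nwiper, \n"
print(rewrite_csv(sample))
```

Because whitespace-only values are treated as blank, this covers both the "space" and the "no value at all" variants, and the query no longer has to match empty strings.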
