Partial match in document property with query parameter not working

Hello all,

I have a document type with a property firstname. The document repository should be filtered/queried by an HTML form input parameter. I am having trouble setting up an HstQuery with filters: addContains() and addLike() do not behave as I expect. I also tried various XPath expressions and JCR SQL queries, which seemingly are not supported anyway.

Let’s say there is a document containing ‘theo’ in the firstname property.

Using the Repository servlet the following queries do not work:

//*[@hippostd:state = 'published' and jcr:like(@starterstoreboot:firstname,'%theo%')]
//*[@hippostd:state = 'published' and jcr:contains(@starterstoreboot:firstname,'the')]

The first query only works if ‘theo’ is reduced to ‘he’, most likely so that the % will match a char at the start/end of the property value and produce a result. The second query only works for an exact match ‘theo’ but not a partial match.

How should a substring / partial match against document contents be implemented properly in brXM? I have tried all combinations I could think of, but without any luck.

Thanks in advance,
Frank

Hi Frank,

Can you check if this works?

//*[@hippostd:state = 'published' and jcr:like(fn:lower-case(@starterstoreboot:firstname),'%theo%')]

Regards,

Woonsan

Most likely ‘theo’ in the document has non-breaking space characters before or after it, so the term ‘theo’ does not occur on its own. Or you do not have sufficient read access. Either way, use addContains rather than addLike, since the latter is really not needed and is discouraged in general.

The search really does work (this is the most basic search there is; if it did not work, every customer would have a problem), so there must be something odd in your document.

Regards Ard


woonsanko wrote on August 7:

> Hi Frank,
>
> Can you check if this works?
>
> //*[@hippostd:state = 'published' and jcr:like(fn:lower-case(@starterstoreboot:firstname),'%theo%')]

As indicated, jcr:like really is discouraged, as it easily blows up a repository. This is because of how inverted indexes like Lucene handle query expansion. The fn:lower-case should not be needed either.

Rather, first test:

//*[@hippostd:state = 'published' and jcr:contains(.,'theo')]

If that works, then try to find out why it doesn’t work for the starterstoreboot property.

Regards Ard

Hi Woonsan,

thanks for your support. I tried working with lower-case/fn:lower-case yesterday, but without success. Actually your query using fn:lower-case within the jcr:like expression does return the expected results - which is great.

Could you please quickly explain the difference between lower-case and fn:lower-case to me?

However, I would like to avoid jcr:like, but jcr:contains currently only ever produces results for an exact match. I did see the corresponding notes in the Filter class documentation.

Best regards,
Frank

Hi Ard,

thanks for your support as well. Yes, I really would like to avoid addLike() and use addContains(). The problem being that it is not producing results for a partial match, but only for a case-sensitive exact match: given the search phrase e.g. ‘he’ no results show up for documents containing ‘theo’ - which in my understanding should definitely work.

Are there differences with respect to the property types? In my case it is a String property, not a Text, Date or Compound etc.

Kind regards,
Frank


FvM wrote on August 8:

> Hi Ard,
>
> thanks for your support as well. Yes, I really would like to avoid addLike() and use addContains(). The problem being that it is not producing results for a partial match, but only for a case-sensitive exact match:

addContains matches case-insensitively.

> given the search phrase e.g. ‘he’ no results show up for documents containing ‘theo’ - which in my understanding should definitely work.

Why? That is an awful search experience. If you really want that, jcr:like does it, but be warned: you will very easily get OOM exceptions. You can google this for Lucene with prefix wildcards.

> Are there differences with respect to the property types? In my case it is a String property, not a Text, Date or Compound etc.

jcr:contains should just work on matches. You can use jcr:contains with postfix wildcards if you want, but we don’t support prefix wildcards for OOM reasons. Also, I really think the search experience will be awful with prefix wildcards.
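
As a rough illustration of why a postfix wildcard is cheap and a prefix wildcard is not, here is a toy sketch in Java. It models the index as a sorted set of terms and only counts how many entries each lookup has to inspect. It is not Lucene or Jackrabbit code, the class and method names are made up for this example, and the real memory problems mentioned above come from query expansion, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model of a sorted term dictionary (hypothetical, for illustration only). */
class TermDictionaryDemo {
    private final TreeSet<String> terms = new TreeSet<>();

    /** Number of dictionary entries inspected by the most recent lookup. */
    int inspected;

    void add(String term) {
        terms.add(term.toLowerCase());
    }

    /** Postfix wildcard such as "the*": seek to the prefix, stop at the first non-match. */
    List<String> prefixMatch(String prefix) {
        inspected = 0;
        List<String> hits = new ArrayList<>();
        for (String t : terms.tailSet(prefix, true)) {
            inspected++;
            if (!t.startsWith(prefix)) {
                break; // sorted order guarantees no later term can match
            }
            hits.add(t);
        }
        return hits;
    }

    /** Leading wildcard such as "*he*": no seek is possible, so every term is tested. */
    List<String> infixMatch(String infix) {
        inspected = 0;
        List<String> hits = new ArrayList<>();
        for (String t : terms) {
            inspected++;
            if (t.contains(infix)) {
                hits.add(t);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TermDictionaryDemo dict = new TermDictionaryDemo();
        for (String name : new String[] {"Anna", "Bert", "Theo", "Theodora", "Zoe"}) {
            dict.add(name);
        }
        System.out.println(dict.prefixMatch("the") + " inspected=" + dict.inspected);
        System.out.println(dict.infixMatch("he") + " inspected=" + dict.inspected);
    }
}
```

With these five names both lookups find the same two terms, but the prefix lookup inspects three entries while the infix lookup inspects all five. On a term dictionary with millions of entries that gap is what makes leading wildcards costly.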

Hi Ard,

ok, our use case is really simple: given a list of contact persons, a user wants to search that document collection of contact persons with search parameters, e.g. by firstname. How would you approach this other than with a partial match as outlined above? What would be a best practice advised by Bloomreach?

If jcr:contains behaves as expected I will use it, but I can’t figure out why it doesn’t. There are no non-visible chars, simply the plain string ‘Theo’.

[image]

This query works:

[image]

These don’t:

[image]

[image]

As we are new to Bloomreach XP we might be missing some basic concept. Any pointers in the right direction are highly appreciated.

Kind regards,
Frank


FvM wrote on August 8:

> Hi Ard,
>
> ok, our use case is really simple: given a list of contact persons a user wants to search that document collection of contact persons with search parameters, e.g. by firstname. How would you approach this other than a partial match as outlined above? What would be a best practice advised by bloomreach?
>
> If jcr:contains behaves as expected I will use it, but I can’t figure out why it doesn’t. There are no non-visible chars, simply the plain string ‘Theo’.
>
> [image]
>
> This query works:

You see, the above is a case-insensitive match.

> These don’t:
>
> As we are new to bloomreach XP we might be missing some basic concept. Any points in the right direction are highly appreciated.

Use ‘the*’ instead of ‘the%’. The % is for jcr:like.
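
To make the wildcard difference concrete, here is a minimal sketch of how a form parameter could be turned into a jcr:contains term with a postfix wildcard. ContainsQueryBuilder and its methods are hypothetical helpers written for this example, not part of the brXM or Jackrabbit API, and the cleaning rule (keep letters and digits only) is an assumption that is deliberately strict.

```java
/** Hypothetical helper for illustration; not part of the brXM or Jackrabbit API. */
class ContainsQueryBuilder {

    /** Turns raw user input into one prefix term such as "the*", or "" if nothing usable remains. */
    static String toPrefixTerm(String input) {
        if (input == null) {
            return "";
        }
        // Keep letters and digits only. This drops %, *, ?, quotes and whitespace,
        // so the result is always a single term without a leading wildcard.
        String cleaned = input.trim().replaceAll("[^\\p{L}\\p{N}]", "");
        return cleaned.isEmpty() ? "" : cleaned + "*";
    }

    /** Builds the XPath statement; scope is "." for all properties or "@ns:prop" for one property. */
    static String buildXPath(String scope, String input) {
        String term = toPrefixTerm(input);
        if (term.isEmpty()) {
            // No usable search term: fall back to the unfiltered published set.
            return "//*[@hippostd:state = 'published']";
        }
        return "//*[@hippostd:state = 'published' and jcr:contains(" + scope + ",'" + term + "')]";
    }

    public static void main(String[] args) {
        System.out.println(buildXPath("@starterstoreboot:firstname", "the%"));
        System.out.println(buildXPath(".", " *theo* "));
    }
}
```

The same term string could be passed to addContains() on an HST filter instead of being pasted into XPath. Note that this cleaning also removes hyphens and spaces, so names such as ‘Jean-Pierre’ would need a more careful rule.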

Regards Ard

Hi Ard,

you are right, I confused the wildcards. Partial matching with jcr:contains now works, given that the search phrase is located at the start of the searched content properties.

From what I see in the FilterImpl class, something like '*theo*' is not supported. I guess we’ll have to build our own search logic for that.

Thanks again for your help and kind regards,
Frank

> From what I see in the FilterImpl class, something like '*theo*' is not supported. I guess we’ll have to build our own search logic for that.

As Ard already mentioned, avoid this unless you want your server to go OOM after just a few user requests.