I have a document type with a property firstname. The document repository should be filtered/queried by an HTML form input parameter. I am having trouble setting up an HstQuery and Filters, i.e. addContains() or addLike() do not behave as expected. I tried various XPath expressions and JCR SQL queries, which seemingly are not supported anyway.
Let’s say there is a document containing ‘theo’ in the firstname property.
Using the Repository servlet, the following queries do not work:
//*[@hippostd:state = 'published' and jcr:like(@starterstoreboot:firstname,'%theo%')]
//*[@hippostd:state = 'published' and jcr:contains(@starterstoreboot:firstname,'the')]
The first query only works if ‘theo’ is reduced to ‘he’, most likely so that the % will match a char at the start/end of the property value and produce a result. The second query only works for an exact match ‘theo’ but not a partial match.
How should a substring / partial match against document contents be implemented properly in brXM? I have tried all combinations I could think of, but without any luck.
Most likely ‘theo’ in the document contains non-breaking space chars before or after it. Therefore the term ‘theo’ does not occur. Or you do not have sufficient read access. Either way, use addContains and not addLike, since the latter is really not needed and is discouraged in general.
The search really works (you have the most basic search there is; if that didn’t work, every customer would have a problem), so there must be something odd in your document.
//*[@hippostd:state = 'published' and jcr:like(fn:lower-case(@starterstoreboot:firstname),'%theo%')]
As indicated, jcr:like really is discouraged as it easily blows up a repository. This is because of how inverted indexes like Lucene work with respect to query expansion. The fn:lower-case should also not be needed.
Rather, first test:
//*[@hippostd:state = 'published' and jcr:contains(.,'theo')]
If that works, then try to find out why it doesn’t work for the starterstoreboot property.
Thanks for your support. I tried working with lower-case/fn:lower-case yesterday, but without success. Actually, your query using fn:lower-case within the jcr:like expression does return the expected results, which is great.
Could you please quickly explain the difference between lower-case and fn:lower-case to me?
I would still prefer to avoid jcr:like, but jcr:contains currently only ever produces results if there is an exact match. I did see the corresponding notes in the Filter class documentation.
Thanks for your support as well. Yes, I really would like to avoid addLike() and use addContains(). The problem is that it does not produce results for a partial match, but only for a case-sensitive exact match: given the search phrase e.g. ‘he’, no results show up for documents containing ‘theo’, which in my understanding should definitely work.
Are there differences with respect to the property types? In my case it is a String property, not a Text, Date or Compound etc.
addContains matches case-insensitively, so case should not be the issue.
Why should the search phrase ‘he’ return documents containing ‘theo’? That is an awful search experience. If you really want that, jcr:like does it, but be warned: you will very easily get OOM exceptions. You can google this for Lucene with prefix wildcards.
As for property types, jcr:contains should just work on matches. You can use jcr:contains with postfix wildcards if you want, but we don’t support prefix wildcards, for OOM reasons. I also really think the search experience will be awful with prefix wildcards.
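To illustrate why a wildcard at the start of the search term is so much more expensive than one at the end, here is a toy sketch in plain Java. It is not how Lucene or the repository is implemented; the class TermDictionarySketch and its methods are made up for illustration and only mimic a sorted term dictionary. A trailing wildcard ('the*') can jump straight to a small sorted range, while a leading wildcard ('*he') has to look at every term before the query can be expanded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical toy model of a sorted term dictionary; not a Lucene or brXM class.
public class TermDictionarySketch {
    private final TreeSet<String> terms = new TreeSet<>();

    // Terms are lower-cased on the way in so that lookups are case-insensitive.
    public void add(String term) {
        terms.add(term.toLowerCase(Locale.ROOT));
    }

    // 'the*': all matches sit next to each other in sorted order,
    // so only the range [prefix, prefix + U+FFFF) has to be visited.
    public List<String> expandTrailingWildcard(String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        SortedSet<String> range = terms.subSet(p, p + Character.MAX_VALUE);
        return new ArrayList<>(range);
    }

    // '*he' or '*he*': matches can be anywhere in the dictionary,
    // so every single term has to be inspected.
    public List<String> expandLeadingWildcard(String infix) {
        String needle = infix.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String term : terms) {
            if (term.contains(needle)) {
                matches.add(term);
            }
        }
        return matches;
    }
}
```

With the terms theo, theresa, matheo and anna, the lookup for 'the*' only touches theo and theresa, whereas the lookup for '*he*' walks all four terms. On a real index with millions of terms, that full walk plus the resulting list of expanded terms is what can exhaust memory.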
OK, our use case is really simple: given a list of contact persons, a user wants to search that document collection by search parameters, e.g. by firstname. How would you approach this other than with a partial match as outlined above? What would be the best practice advised by Bloomreach?
If jcr:contains behaves as expected I will use it, but I can’t figure out why it doesn’t. There are no non-visible chars, simply the plain string ‘Theo’.
This query works:
These don’t:
As we are new to Bloomreach XP, we might be missing some basic concept. Any pointers in the right direction are highly appreciated.
The query that works shows that jcr:contains is a case-insensitive match.
For the ones that don’t: use ‘the*’ instead of ‘the%’. The % is for jcr:like.
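As a rough sketch of how the form input could be turned into such a query string, the helper below builds a jcr:contains expression with a postfix '*' wildcard. Everything in it is an assumption for illustration: the class ContainsQuerySketch and its methods are hypothetical, it only produces a string to paste into the Repository servlet, and it does not call any HST API. Stripping everything except letters and digits is one simple way to keep quotes, '%' and other operators out of the expression; it also merges multi-word input into a single term, which is only acceptable for a single-name field.

```java
import java.util.Locale;

// Hypothetical helper, not part of the brXM/HST API.
public class ContainsQuerySketch {

    // Turns raw form input into a term with a postfix wildcard, e.g. "The%" -> "the*".
    public static String toPostfixWildcardTerm(String input) {
        if (input == null) {
            return "";
        }
        // Keep letters and digits only; drops quotes, '%', '*' and whitespace.
        String cleaned = input.replaceAll("[^\\p{L}\\p{N}]", "").toLowerCase(Locale.ROOT);
        return cleaned.isEmpty() ? "" : cleaned + "*";
    }

    // Builds the XPath string; falls back to the unfiltered query for empty input.
    public static String buildXPath(String property, String input) {
        String term = toPostfixWildcardTerm(input);
        if (term.isEmpty()) {
            return "//*[@hippostd:state = 'published']";
        }
        return "//*[@hippostd:state = 'published' and jcr:contains(@"
                + property + ",'" + term + "')]";
    }
}
```

Calling buildXPath("starterstoreboot:firstname", "the") yields a query that ends in jcr:contains(@starterstoreboot:firstname,'the*').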
You are right, I confused the wildcards. Partial matching with jcr:contains now works, provided the search phrase is located at the start of the searched property value.
From what I see in the FilterImpl class, something like '*theo*' is not supported. I guess we’ll have to build our own search logic for that.
Thanks again for your help and kind regards,
Frank
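One possible shape for such custom search logic, as a minimal sketch only: if the contact list is small enough to load in full, a '*theo*' style match can be done in memory after the repository query has returned. The class InfixNameFilter is hypothetical and works on plain strings; mapping documents to their firstname values is left out, and this approach is an assumption rather than something recommended in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical in-memory filter; assumes the full list of first names fits in memory.
public class InfixNameFilter {

    // Returns the names that contain the phrase anywhere, ignoring case.
    // A null or blank phrase keeps every non-null name.
    public static List<String> filter(List<String> firstNames, String phrase) {
        String needle = phrase == null ? "" : phrase.trim().toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String name : firstNames) {
            if (name != null && name.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(name);
            }
        }
        return hits;
    }
}
```

For the names Theo, Anna and Matheo, the phrase 'he' keeps Theo and Matheo. The cost grows linearly with the number of contacts, so this only makes sense for small collections.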