Filter duplicates (based on property) in HstQuery

Yves_Viegen · November 22, 2018, 2:27pm

Hi there,

We have the following use case. We have two scopes:

containing documents of type x in the default language (e.g. “en_US”).
for the locale (e.g. “nl_NL”) we have a folder containing translations of those documents from the default language as well as documents which only exists in this locale. Note: not all documents from the default language have translations in the other locale.

I want to do a query which gives me alle unique documents based on hippotranslation:id with a preference for the translated documents (“nl_NL”), so:

all documents from the default language (“en_US”) which don’t have a translation in the other locale (“nl_NL”)
all documents which have a translation (“nl_NL”)
all documents in the other locale (“nl_NL”) which don’t exist in default language

I currently do an HstQuery on both scopes and retrieve all documents of a certain type. Afterwards I perform a custom method to eliminate duplicate documents. A more elegant solution would be to do so in ‘one go’. So i’m looking for a way to add a filter/constraint (e.g. with a custom jcrExpressiony/xpath query) to eliminate the duplicates in one go.

E.g.

final HstQuery hstQuery = HstQueryBuilder.create(baseScope, translationScope)
    .ofPrimaryTypes(doctype)
    .limit(queryLimit)
    .where(
        // pseudocode
            if (current hippotranslation:locale equals hippotranslation:locale of defaultScope){
                if (hippotanslatiohippotranslation:id does not exist in translationScope){
                valid;
                }
                else {
                    not  valid;
                }
            }
            else {
                valid;
            }
          )
    .build();

Or visualized…

Is this possible? And if so can you point me in the right direction?

Hope you can advise.

Thank you!

Yves

Lef_Karamoulas · November 22, 2018, 3:58pm

Hello Yves,

when im reading the conditions it is a bit confusing with regards what exactly you are trying to retrieve.

if i rely on your visualisation this is a query i would try out.

SELECT * FROM hippotranslation:translated WHERE hippotranslation:locale=“en_US”

__ AND hippotranslation:id NOT IN (SELECT * FROM hippotranslation:translated WHERE hippotranslation:locale=“nl_NL”)__

UNION

SELECT * FROM hippotranslation:translated WHERE hippotranslation:locale=“nl_NL”

i havent tested it in the repository and maybe the syntax is not right but you get the point.

Hope this helps.

Kind regards.

Yves_Viegen · November 23, 2018, 9:30am

Thank you. It’s seems JCR SQL 1 doens’t allow for ‘NOT IN’ operations. Would it be possible using xpath you think?

Lef_Karamoulas · November 23, 2018, 12:33pm

Hello Yves,

I’ve tried possible approaches and combinations (both xpath and sql) but couldn’t come up with something that works. If someone else with extent knowledge could clarify if this is even possible.

Otherwise I guess your best choice here is programmatically as you described.

Kind regards

jasper.floor · November 23, 2018, 1:26pm

JCR sql syntax is very limited. SQL wasn’t designed for a tree structure but for relational data.
https://docs.adobe.com/docs/en/spec/jcr/1.0/8.5.3_SQL_EBNF.html

You can’t UNION or JOIN.

XPATH is mostly complete (afaik) but inherently more limited. It was built for a tree structure, but the limitations will be the same for the most part as SQL.

You can’t do comparison between node properties.

the following would cover case 2 and 3
//element(*, myhippoproject:newsdocument)[@hippotranslation:locale=‘nl_NL’ ]

Case 1 isn’t possible to query directly. You can query all english documents and get the translations from the bean via bean.getAvailableTranslations()getTranslation(“nl_NL”). Which would cover case 1 and 2. Though each call to getAvailableTranslations is actually a query in the background. So also not perfect.

Yves_Viegen · November 27, 2018, 1:19pm

Ok thank you all for thinking along. I will solve / have solved it programmatically.

Before asking the question I solved it by quering both scopes and remove the ‘duplicates’ afterwards. For this I used the getAvailableTranslations on every node so I was executing a lot of queries.

My new/current method of solving this is first use a query to get all hippotranslation:id’s from the translationScope. And use these id’s as constraint in the final query on both scopes. Not perfect, but now I only need two queries.

woonsanko · November 27, 2018, 6:39pm

Perhaps can you live even with single query?

/jcr:root/content/documents//element(*, hippotranslation:translated)[@hippo:availability='live'] order by @hippotranslation:id, @hippotranslation:locale

Regards,

Woonsan

Yves_Viegen · November 29, 2018, 10:33am

Thank you Woosan. But even if I query the entire repository I still need some way to programmatically filter the ‘duplicate’ translationid’s. For now i’m happy with the solution I’ve got.

Topic		Replies	Views
Duplicates in front end Experience Manager (PaaS/OnPrem)	5	512	February 1, 2019
Hst query to get distinct result based on document attribute Experience Manager (PaaS/OnPrem)	1	342	August 13, 2019
Partial match in document property with query parameter not working Experience Manager (PaaS/OnPrem)	10	849	August 8, 2019
Search by HSTQuery Experience Manager (PaaS/OnPrem)	2	399	January 18, 2021
How to query linked documents by property Experience Manager (PaaS/OnPrem)	1	608	September 3, 2018

Filter duplicates (based on property) in HstQuery

Related topics