Basic Indexing
This section describes how to send an object from a data source to Mindbreeze. You will learn how to implement a custom crawler and what is needed to send objects to an index.
Send objects to Mindbreeze
Before objects can be found by a search, they must be in the index. This chapter explains how to send objects from your data source to Mindbreeze. Making an object searchable is very easy: the following lines are all you need to put an object with the title “title” and the key “1” into the index:
Indexable indexable = new Indexable();
indexable.setKey("1");
indexable.setTitle("title");
client.filterAndIndex(indexable);
A few points about these lines need clarification. First and foremost, you have to decide which documents from your data source are relevant for the search.
What objects are there in my data source?
When you want to add a new data source to the search, you should always consider what content is interesting for the users.
This example uses CMIS services as a data source. CMIS offers four different types of objects: folders, documents, relationships and policies. Here only documents are sent.
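To illustrate "only documents are sent", here is a minimal sketch that walks a folder tree and keeps only document objects. The types `CmisObjectType` and `Node` and the method `collectDocuments` are hypothetical stand-ins (Java 16+ for the record), not the OpenCMIS or Mindbreeze API; the example project performs the real traversal on CMIS folders and documents.

```java
import java.util.ArrayList;
import java.util.List;

class DocumentCollector {
    // Recursively walks the tree and returns the ids of all documents.
    static List<String> collectDocuments(Node root) {
        List<String> result = new ArrayList<>();
        walk(root, result);
        return result;
    }

    private static void walk(Node node, List<String> result) {
        if (node.type() == CmisObjectType.DOCUMENT) {
            result.add(node.id());   // only documents are sent
        }
        for (Node child : node.children()) {
            walk(child, result);     // descend into subfolders
        }
    }

    public static void main(String[] args) {
        Node root = new Node("root", CmisObjectType.FOLDER, List.of(
                Node.leaf("doc-1", CmisObjectType.DOCUMENT),
                Node.leaf("policy-1", CmisObjectType.POLICY),
                new Node("sub", CmisObjectType.FOLDER, List.of(
                        Node.leaf("doc-2", CmisObjectType.DOCUMENT),
                        Node.leaf("rel-1", CmisObjectType.RELATIONSHIP)))));
        System.out.println(collectDocuments(root)); // [doc-1, doc-2]
    }
}

// Hypothetical stand-in for the four CMIS object types.
enum CmisObjectType { FOLDER, DOCUMENT, RELATIONSHIP, POLICY }

// Hypothetical tree node: folders have children, other objects have none.
record Node(String id, CmisObjectType type, List<Node> children) {
    static Node leaf(String id, CmisObjectType type) {
        return new Node(id, type, List.of());
    }
}
```

Folders, relationships and policies are still traversed or skipped, but never collected, which mirrors the decision to send documents only.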
How are objects sent? Which process takes care of it?
Mindbreeze uses crawlers to send objects to the index. A crawler knows the data source and sends the objects it contains for indexing. There is a crawler for each data source type; for Mindbreeze InSpire these include, among others, a Microsoft Exchange crawler and a Microsoft SharePoint crawler. The SDK provides the same plug-in interface that we use for our own crawlers.
As a first step, package the example crawler as a plug-in and load it onto your appliance. Right-click the file build.xml and select Run As > Ant Build. The plug-in archive cmis-datasource.zip is created in the directory build.
Next, upload the plug-in to the appliance: open the Mindbreeze configuration user interface, go to the Plug-ins tab, select the Zip file and confirm with “Upload”.
The plug-in is now installed.
Now set up an index and add a new data source.
Send an object
The most important class of the CMIS crawler is mindbreeze.example.cmis.crawler.CmisCrawler. It implements the crawler plug-in interface com.mindbreeze.enterprisesearch.mesapi.crawler.Crawler:
public interface Crawler {
    public void init(Configuration configuration);
    public void performCrawlRun(FilterAndIndexClient client) throws Exception;
    public void shutdown();
}
The init method prepares the crawler for the crawl runs; this is where, for example, database connections can be established. Its counterpart is the shutdown method, which is invoked before the crawler terminates.
performCrawlRun is invoked for every crawl run; this is where the objects of the data source are indexed. The following call sends an object:
client.filterAndIndex(indexable);
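The crawler lifecycle can be sketched with local stand-in types. The `Crawler` interface below mirrors the one shown above, but `Configuration`, `FilterAndIndexClient`, `Indexable` and `RecordingClient` are simplified hypothetical stand-ins so that the sketch compiles without the SDK; a real plug-in uses the SDK classes instead.

```java
import java.util.ArrayList;
import java.util.List;

class LifecycleDemo {
    public static void main(String[] args) throws Exception {
        RecordingClient client = new RecordingClient();
        Crawler crawler = new ExampleCrawler();
        crawler.init(new Configuration());   // once, before the first run
        crawler.performCrawlRun(client);     // once per crawl run
        crawler.performCrawlRun(client);
        crawler.shutdown();                  // once, before the crawler ends
        System.out.println(client.sent.size()); // 2
    }
}

// Hypothetical stand-in for the SDK configuration (empty here).
class Configuration {}

// Hypothetical stand-in for the SDK Indexable with only key and title.
class Indexable {
    String key;
    String title;
    void setKey(String key) { this.key = key; }
    void setTitle(String title) { this.title = title; }
}

// Hypothetical stand-in for the SDK client interface.
interface FilterAndIndexClient {
    void filterAndIndex(Indexable indexable);
}

// Test double that records every object it receives.
class RecordingClient implements FilterAndIndexClient {
    final List<Indexable> sent = new ArrayList<>();
    public void filterAndIndex(Indexable indexable) { sent.add(indexable); }
}

// Mirrors the plug-in interface shown above.
interface Crawler {
    void init(Configuration configuration);
    void performCrawlRun(FilterAndIndexClient client) throws Exception;
    void shutdown();
}

class ExampleCrawler implements Crawler {
    private boolean initialized;

    public void init(Configuration configuration) {
        initialized = true; // e.g. open connections to the data source here
    }

    public void performCrawlRun(FilterAndIndexClient client) {
        if (!initialized) throw new IllegalStateException("init not called");
        Indexable indexable = new Indexable();
        indexable.setKey("1");
        indexable.setTitle("title");
        client.filterAndIndex(indexable); // sends one object per run
    }

    public void shutdown() {
        initialized = false; // e.g. close connections here
    }
}
```

Keeping connection setup in init and teardown in shutdown means each performCrawlRun only has to enumerate objects and hand them to the client.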
The Indexable object collects the relevant data for a transmitted object. The following minimal example prepares an object for indexing:
Indexable indexable = new Indexable();
indexable.setKey("1");
indexable.setTitle("my title");
setKey sets the key of the object in the data source, and setTitle sets the title of the object.
You can now search for the indexed document in the Mindbreeze Web Client.
Important: When defining the key, keep in mind that the crawler can be configured for multiple data sources of the same type. This example can index objects from CMIS data sources, so multiple crawlers can be configured, for example for Fabasoft Folio Cloud Austria and Fabasoft Folio Cloud Germany. To identify an object uniquely across multiple data sources, the key in the index consists of three parts: the type of the data source (category), the data source (category instance) and the key. The category is defined by the plug-in. The category instance can be configured in the configuration user interface.
For the example you can use the default settings. However, when you create an additional data source, you must configure a different category instance for it; otherwise objects from, e.g., Fabasoft Folio Cloud Austria will overwrite objects from Fabasoft Folio Cloud Germany.
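The effect of the three-part key can be modeled with a map keyed by (category, category instance, key). `IndexKey`, `index` and the category/instance names are hypothetical; the sketch only shows why two data sources sharing a category instance overwrite each other's objects, not how Mindbreeze stores keys internally.

```java
import java.util.HashMap;
import java.util.Map;

class CompositeKeyDemo {
    // Hypothetical model of the three-part key of an indexed object.
    record IndexKey(String category, String categoryInstance, String key) {}

    static final Map<IndexKey, String> index = new HashMap<>();

    static void put(String category, String instance, String key, String title) {
        index.put(new IndexKey(category, instance, key), title); // equal key -> overwrite
    }

    public static void main(String[] args) {
        // Different category instances: both objects with key "1" survive.
        put("CMIS", "folio-at", "1", "Austria doc");
        put("CMIS", "folio-de", "1", "Germany doc");
        System.out.println(index.size()); // 2

        // Same category instance: the second object replaces the first.
        put("CMIS", "folio-at", "1", "Replaced doc");
        System.out.println(index.size()); // 2
        System.out.println(index.get(new IndexKey("CMIS", "folio-at", "1"))); // Replaced doc
    }
}
```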
Displayed date and change date
To set the displayed date of the indexed object, use setDate:
indexable.setDate(cmisDocument.getCreationDate());
This is also the date used when you sort by date.
In addition to the displayed date, there is also the change date of the document. This date is used to determine whether a document has been changed.
indexable.setModificationDate(cmisDocument.getLastModificationDate());
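To illustrate how a modification date lets you decide whether a document has changed, here is a minimal sketch that remembers the last modification date per key and reports only newer ones. `ChangeTracker` is hypothetical; it models the principle on the client side and is not how Mindbreeze evaluates the change date internally.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

class ChangeTracker {
    // Last modification date seen for each document key.
    private final Map<String, Instant> lastSeen = new HashMap<>();

    // Returns true if the document is new or was modified since the last call,
    // and remembers the given date for the next comparison.
    boolean hasChanged(String key, Instant modificationDate) {
        Instant previous = lastSeen.get(key);
        if (previous != null && !modificationDate.isAfter(previous)) {
            return false; // same or older modification date: unchanged
        }
        lastSeen.put(key, modificationDate);
        return true;
    }

    public static void main(String[] args) {
        ChangeTracker tracker = new ChangeTracker();
        Instant t1 = Instant.parse("2023-05-01T10:00:00Z");
        Instant t2 = Instant.parse("2023-05-02T08:30:00Z");
        System.out.println(tracker.hasChanged("doc-1", t1)); // true (new)
        System.out.println(tracker.hasChanged("doc-1", t1)); // false
        System.out.println(tracker.hasChanged("doc-1", t2)); // true (modified)
    }
}
```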
All date values must be converted to UTC before being sent. Only this ensures that the values are displayed correctly and that no problems arise with time zones or daylight saving time.
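A minimal sketch of the conversion with `java.time`, assuming for illustration that the source system delivers local timestamps in the Europe/Vienna time zone. `GregorianCalendar.from` turns the UTC value into a calendar, the value type that the CMIS getters used above typically return; check the SDK's API documentation for the exact parameter type of setDate in your version.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.GregorianCalendar;

class UtcConversion {
    // Interprets a local timestamp in the given zone and converts it to UTC.
    static ZonedDateTime toUtc(LocalDateTime local, ZoneId sourceZone) {
        return local.atZone(sourceZone).withZoneSameInstant(ZoneOffset.UTC);
    }

    // Converts the UTC value into a GregorianCalendar whose time zone is UTC.
    static GregorianCalendar toUtcCalendar(LocalDateTime local, ZoneId sourceZone) {
        return GregorianCalendar.from(toUtc(local, sourceZone));
    }

    public static void main(String[] args) {
        ZoneId vienna = ZoneId.of("Europe/Vienna");
        // Summer time (UTC+2) and winter time (UTC+1) give different offsets.
        System.out.println(toUtc(LocalDateTime.of(2023, 7, 1, 12, 0), vienna));  // 2023-07-01T10:00Z
        System.out.println(toUtc(LocalDateTime.of(2023, 1, 15, 12, 0), vienna)); // 2023-01-15T11:00Z
    }
}
```

The same local clock time maps to different UTC hours in summer and winter, which is exactly the ambiguity that converting before sending avoids.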
Send contents and files
In the previous section, a single object was indexed with a title and a key. Mindbreeze also makes it very easy to index an object’s content: you can send existing files directly and let the Mindbreeze Filter Service extract all the content from the document. The methods setContent and setExtension are important for this. With setContent you transmit the bytes of the content; with setExtension you define the file type of the content. If you specify doc here, for example, and the content holds the data of a Word document, the filter automatically extracts the text and additional information such as the author.
But you can also simply send a string as content.
indexable.setContent("content".getBytes("utf-8"));
indexable.setExtension("txt");
To sum up, an object’s content is determined by the properties content and extension. The content’s data is transmitted in content; the extension txt means that the content is treated as a text document.
Index all CMIS documents
The example project runs through all available folders and invokes the method buildIndexable for each document, which creates an Indexable object with the desired values:
private Indexable buildIndexable(Document cmisDocument) throws IOException {
    Indexable indexable = new Indexable();
    indexable.setKey(cmisDocument.getId());
    indexable.setTitle(cmisDocument.getName());
    indexable.setDate(cmisDocument.getCreationDate());
    indexable.setModificationDate(cmisDocument.getLastModificationDate());
    // getContentBytes and getExtension are helper methods of the example project
    byte[] content = getContentBytes(cmisDocument.getContentStream());
    indexable.setContent(content);
    String extension = getExtension(cmisDocument);
    indexable.setExtension(extension);
    return indexable;
}
The search in the Mindbreeze Web Client now delivers all the demo user’s documents in Fabasoft Folio Cloud.
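The helpers getContentBytes and getExtension used in buildIndexable belong to the example project and are not shown here. One possible shape, sketched with the standard library only (Java 9+), reads an `InputStream` and derives the extension from a file name. Both signatures are assumptions: the project's versions take CMIS types (a content stream and a document) rather than a plain stream and a string.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class ContentHelpers {
    // Reads the whole stream into a byte array and closes it;
    // returns an empty array if there is no stream.
    static byte[] getContentBytes(InputStream stream) throws IOException {
        if (stream == null) {
            return new byte[0];
        }
        try (InputStream in = stream) {
            return in.readAllBytes();
        }
    }

    // Returns the lower-case extension without the dot, e.g. "docx", or "" if none.
    static String getExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) throws IOException {
        InputStream stream = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(getContentBytes(stream).length); // 5
        System.out.println(getExtension("Report.DOCX"));    // docx
    }
}
```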
Index and search properties
The example currently indexes the title, content and date. However, Mindbreeze lets you store as many properties per object as you want. Properties are specified with indexable.putProperty:
indexable.putProperty(NamedValue.newBuilder()
    .setName("name")
    .addValue(ValueHelper.newBuilder("value"))
);
To map a list value, simply invoke addValue multiple times:
indexable.putProperty(NamedValue.newBuilder()
    .setName("name")
    .addValue(ValueHelper.newBuilder("value"))
    .addValue(ValueHelper.newBuilder("value"))
);
The added properties are already contained in the index and can be searched. For example, a search for name:value returns only those objects whose property name has the value value.
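To make the semantics of a name:value restriction on a multi-valued property concrete, here is an in-memory model. `Doc` and `search` are hypothetical and illustrate exact matching only; they do not reproduce the Mindbreeze query language (no ranges, quoting or case handling).

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PropertyMatch {
    // Hypothetical document: key plus multi-valued string properties.
    record Doc(String key, Map<String, List<String>> properties) {}

    // Returns the keys of documents whose property `name` contains `value`.
    static List<String> search(List<Doc> docs, String restriction) {
        int colon = restriction.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected name:value");
        }
        String name = restriction.substring(0, colon);
        String value = restriction.substring(colon + 1);
        return docs.stream()
                .filter(d -> d.properties().getOrDefault(name, List.of()).contains(value))
                .map(Doc::key)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("1", Map.of("name", List.of("value"))),
                new Doc("2", Map.of("name", List.of("other", "value"))), // list value
                new Doc("3", Map.of("name", List.of("other"))));
        System.out.println(search(docs, "name:value")); // [1, 2]
    }
}
```

A list value matches as soon as any one of its entries equals the searched value, which is why document 2 is found.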
Display of properties in the Mindbreeze client
For the properties to be displayed in the Mindbreeze client as well, they must be entered in the CategoryDescriptor. Each property to be displayed must be entered as a metadatum element. You can specify translations of its label with name elements:
<metadatum id="name">
  <name xml:lang="de">Name de</name>
  <name xml:lang="en">Name en</name>
</metadatum>
After changing the CategoryDescriptor, you must repackage the plug-in and upload it again.
Display of properties with SearchService
The SearchService returns the values that are also displayed in the client service. You can therefore adapt and customize the CategoryDescriptor as described in section "Index and search properties". Alternatively, you can request properties for the search with addRequestedProperty:
QueryExpr query = QueryExpr.newBuilder()
    .setKind(QueryExpr.Kind.EXPR_UNPARSED)
    .setUnparsedExpr(userQuery)
    .build();
SearchRequest searchRequest = SearchRequest.newBuilder()
    .setRankingStrategy(RankingStrategy.RANK)
    .setDetailLimit(DetailLimit.CONTENT)
    .setUserQuery(query)
    .addRequestedProperty(PropertyDefinition.newBuilder().setName("title"))
    .addView(View.newBuilder()
        .setId("View") // the view ID is a string
        .setCount(10)
        .setRankingStrategy(mindbreeze.query.ViewProtos.View.RankingStrategy.RELEVANCE)
    )
    .build();
The returned value objects are identical to those you specified when indexing.
Instead of a simple unparsed QueryExpr, you can also use a labeled QueryExpr. This allows you to specify the property name separately from the search term:
QueryExpr query = QueryExpr.newBuilder()
    .setKind(QueryExpr.Kind.EXPR_LABELED)
    .setNamedExpr(Labeled.newBuilder()
        .setLabel("name")
        .setExpr(QueryExpr.newBuilder()
            .setKind(QueryExpr.Kind.EXPR_UNPARSED)
            .setUnparsedExpr("value")
        )
    ).build();
This is useful if, for example, there is an entry field for a property: the property name is fixed, and the user’s entry is used as the search term.
Data types
The previous examples show how string values can be added. The Mindbreeze InSpire SDK automatically recognizes numbers in such properties. For example, you can set the property version to 12 and Mindbreeze interprets the value as a number. You can then search, e.g., for version:[12 TO 24]. This also works if version contains the value “Version 12”; the restriction expression stays the same.
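The idea of recognizing a number inside a string value such as “Version 12” and checking it against a range like 12 TO 24 can be sketched as follows. The regular expression (first run of digits) and the inclusive bounds are assumptions for illustration; Mindbreeze’s own number recognition is not described here and may behave differently.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberInString {
    // First run of decimal digits; signs and very large numbers are not handled.
    private static final Pattern FIRST_INTEGER = Pattern.compile("\\d+");

    // Extracts the first non-negative integer found in the value, if any.
    static OptionalLong firstNumber(String value) {
        Matcher m = FIRST_INTEGER.matcher(value);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group())) : OptionalLong.empty();
    }

    // True if the value contains a number within [from, to] (inclusive).
    static boolean inRange(String value, long from, long to) {
        OptionalLong n = firstNumber(value);
        return n.isPresent() && n.getAsLong() >= from && n.getAsLong() <= to;
    }

    public static void main(String[] args) {
        System.out.println(inRange("12", 12, 24));         // true
        System.out.println(inRange("Version 12", 12, 24)); // true
        System.out.println(inRange("Version 30", 12, 24)); // false
        System.out.println(inRange("beta", 12, 24));       // false
    }
}
```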
Numbers can also be specified as numeric values. This speeds up range searches, and the SearchService returns a numeric value. The following example indexes an integer value:
indexable.putProperty(NamedValue.newBuilder()
    .setName("version")
    .addValue(Value.newBuilder()
        .setKind(Value.Kind.INTEGER)
        .setIntegerValue(12)
    )
);
To make quantity values searchable, the attribute regexmatchable must be set to true for the property in the CategoryDescriptor.
You can then search with version:[12] or version:[10 TO 20].
In addition to simple numerical values, date values are also possible:
indexable.putProperty(NamedValue.newBuilder()
    .setName("mydate")
    .addValue(Value.newBuilder()
        .setKind(Value.Kind.QUANTITY)
        .setQuantityValue(Quantity.newBuilder()
            .setKind(Quantity.Kind.TIME)
            .setUnit(Unit.MS_SINCE_01_01_1970_00_00)
            .setIntValue(Indexable.getCalendarFromDate(new Date()).getTime().getTime())
        )
    )
);
The search mydate:[2003-01-01 TO 2023-01-01] restricts the results to the period between 2003 and 2023.
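The date example stores milliseconds since 1970-01-01 00:00 (the unit MS_SINCE_01_01_1970_00_00). To see which raw values a restriction such as mydate:[2003-01-01 TO 2023-01-01] covers, you can compute the bounds with `java.time`. This sketch assumes that the range boundaries correspond to midnight UTC and are inclusive; `DateRangeMillis` is a hypothetical helper.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

class DateRangeMillis {
    // Milliseconds since 1970-01-01T00:00Z for midnight UTC of the given day.
    static long startOfDayUtcMillis(LocalDate day) {
        return day.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        long from = startOfDayUtcMillis(LocalDate.of(2003, 1, 1));
        long to = startOfDayUtcMillis(LocalDate.of(2023, 1, 1));
        System.out.println(from); // 1041379200000
        System.out.println(to);   // 1672531200000
        long value = startOfDayUtcMillis(LocalDate.of(2010, 6, 15));
        System.out.println(value >= from && value <= to); // true
    }
}
```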
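Storage sizes can also be indexed as quantities, which allows searches to restrict results by size. The following example indexes the property availablespace with a value of 1024 * 1024 bytes (1 MiB):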
indexable.putProperty(NamedValue.newBuilder()
    .setName("availablespace")
    .addValue(Value.newBuilder()
        .setKind(Value.Kind.QUANTITY)
        .setQuantityValue(Quantity.newBuilder()
            .setKind(Quantity.Kind.INFORMATION_STORAGE)
            .setUnit(Unit.BYTE)
            .setIntValue(1024 * 1024)
        )
    )
);
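The example passes 1024 * 1024, i.e. 1 MiB expressed in Unit.BYTE. When you compute larger sizes in Java, use `long` arithmetic: an `int` expression such as `4 * 1024 * 1024 * 1024` silently overflows to 0. The `ByteSizes` helper below is hypothetical, and whether setIntValue accepts a `long` should be checked in the SDK’s API documentation.

```java
class ByteSizes {
    // Binary size units in bytes, declared as long to avoid int overflow.
    static final long KIB = 1024L;
    static final long MIB = 1024L * KIB;
    static final long GIB = 1024L * MIB;

    // Multiplies amount by unit; throws ArithmeticException on long overflow.
    static long bytes(long amount, long unit) {
        return Math.multiplyExact(amount, unit);
    }

    public static void main(String[] args) {
        System.out.println(bytes(1, MIB));          // 1048576, as in the example above
        System.out.println(bytes(4, GIB));          // 4294967296
        System.out.println(4 * 1024 * 1024 * 1024); // 0: int overflow
    }
}
```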
Facets
Facets allow the user to easily refine the search results using the content of the index.
To use indexed properties as facets, set the attribute aggregatable to true for the property in the CategoryDescriptor.
Important: Currently, only string values can be used for facets.
Facets are automatically displayed in the Mindbreeze client but can be deactivated with the option aggregations.
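Conceptually, a facet groups the values of one property and counts the objects per value. The sketch below computes such entries in memory for a multi-valued string property, sorted by count. `FacetEntry` and the name:value restriction string are hypothetical stand-ins for the SDK’s Aggregation.Entry and its QueryExpr; Mindbreeze computes facets from the index itself.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FacetSketch {
    // Hypothetical facet entry: value, object count and a restriction.
    record FacetEntry(String value, int count, String restriction) {}

    // Counts, for one property, how many objects carry each value.
    static List<FacetEntry> facet(String property, List<List<String>> valuesPerObject) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by value for stable ties
        for (List<String> values : valuesPerObject) {
            for (String v : new LinkedHashSet<>(values)) { // count each object once per value
                counts.merge(v, 1, Integer::sum);
            }
        }
        List<FacetEntry> entries = new ArrayList<>();
        counts.forEach((v, c) -> entries.add(new FacetEntry(v, c, property + ":" + v)));
        entries.sort(Comparator.comparingInt(FacetEntry::count).reversed());
        return entries;
    }

    public static void main(String[] args) {
        List<List<String>> objects = List.of(
                List.of("pdf"), List.of("docx"), List.of("pdf", "pdf"), List.of("pdf"));
        for (FacetEntry e : facet("extension", objects)) {
            System.out.println(e.value() + " (" + e.count() + ") -> " + e.restriction());
        }
        // pdf (3) -> extension:pdf
        // docx (1) -> extension:docx
    }
}
```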
The SearchService provides the same facets as the client service. However, you can also request any facets you want with addAggregation for a View:
…
View.newBuilder()
    .setId("View")
    .setCount(10)
    .addAggregation("name")
…
The facets are contained in the ResultSet as a list of Aggregation objects, which the following JSP fragment renders:
<% for (Aggregation aggregation : resultSet.getAggregationList()) {
     if (aggregation.getEntryCount() == 0) continue; %>
  <div class="facet">
    <div class="facetName">
      <b><%= "mes:date".equals(aggregation.getName()) ? "Date" : aggregation.getName() %></b>
    </div>
    <div class="facetEntries">
      <% for (Aggregation.Entry entry : aggregation.getEntryList()) {
           // Serialize the entry's QueryExpr and pass it as a URL-safe Base64 parameter
           byte[] queryExpr = entry.getQueryExpr().toByteArray();
           String facetQuery = Base64.encodeBase64URLSafeString(queryExpr); %>
        <div><a href="<%= Nav.self(request, "start", "0") %>&facet=<%= facetQuery %>"><%= entry.getName() %> (<%= entry.getCount() %>)</a></div>
      <% } %>
    </div>
  </div>
<% } %>
Each facet entry contains the value of the property, the number of objects and a QueryExpr object that restricts the results to those objects. You can use this QueryExpr object in a search to restrict the search results:
QueryExpr.And.Builder constraints = QueryExpr.And.newBuilder();
String[] facets = request.getParameterValues("facet");
if (facets != null) {
    for (String facet : facets) {
        if (facet.isEmpty()) continue;
        // Decode the URL-safe Base64 parameter back into a QueryExpr
        QueryExpr queryExpr = QueryExpr.newBuilder().mergeFrom(Base64.decodeBase64(facet)).build();
        constraints.addExpr(queryExpr);
    }
}
if (constraints.getExprCount() > 0) {
    // Combine all selected facets with AND
    searchRequest.setQueryConstraints(QueryExpr.newBuilder()
        .setKind(QueryExpr.Kind.EXPR_AND)
        .setAndExpr(constraints)
    );
}
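The facet links pass each serialized QueryExpr through the URL as URL-safe Base64 (commons-codec `Base64.encodeBase64URLSafeString`, which omits padding), and the code above decodes it again. If you prefer not to depend on commons-codec, `java.util.Base64` offers a comparable URL-safe encoding. The sketch round-trips arbitrary bytes; in the real code these would be the result of `toByteArray()` on the QueryExpr and the input to `mergeFrom(...)`. `FacetParamCodec` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Base64;

class FacetParamCodec {
    // URL-safe Base64 without padding, comparable to encodeBase64URLSafeString.
    static String encode(byte[] queryExprBytes) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(queryExprBytes);
    }

    // The URL decoder accepts input with or without '=' padding.
    static byte[] decode(String facetParam) {
        return Base64.getUrlDecoder().decode(facetParam);
    }

    public static void main(String[] args) {
        // Stand-in for serialized QueryExpr bytes.
        byte[] bytes = {(byte) 0xfb, (byte) 0xff, 0x10, 0x42};
        String param = encode(bytes);
        System.out.println(param);                               // -_8QQg
        System.out.println(Arrays.equals(bytes, decode(param))); // true
    }
}
```

The URL-safe alphabet uses '-' and '_' instead of '+' and '/', so the parameter needs no further escaping in the link.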
Download
Access the Mindbreeze InSpire SDK download area.