An Incentro client running Alfresco on Google Cloud, with more than 50 million documents under management, needed improvements to search and to its data model as usage grew. We tell you how we solved it.
In the business world, efficient document management is crucial for success. An Incentro client, running an Alfresco installation on Google Cloud, currently manages more than 50 million documents with a complex data model. With the increased use of Alfresco, there were new requirements that demanded improvements in search functionality and changes to the data model.
In this article I explain how we dealt with these challenges by implementing sharding, achieving efficient reindexing and meeting customer expectations.
Context: Increased requirements
The client needed to implement several critical functionalities:
Search across different languages
Indexing of document content
Activation of search suggestions
Redesign of the data model
Reindexing challenge
To implement these functionalities and changes to the data model, a full reindex was required, which meant that during the process, searches would return inconsistent results.
Tests were carried out in different environments to estimate the time required for a full reindex. With the architecture in place at the time, the projected times were unacceptably long and would have severely affected the customer's operation, since access to documentation is critical to its business. It was therefore mandatory to find a more efficient solution.
Research
After investigating different architectures, we chose to implement sharding, as the results obtained in the tests were promising. Sharding involves dividing the index into smaller, more manageable parts (shards) and distributing the workload among multiple servers or nodes. Because each shard indexes only its own portion of the repository and the shards work in parallel, this distributed architecture significantly improves reindexing times and the capacity to handle large volumes of data.
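To make the idea concrete, here is a minimal sketch of how documents could be routed to shards. It is a generic illustration, not the configuration used in this project: the modulo rule and the names `shard_for` and `distribution` are assumptions made for the example, and the sharding method actually offered by Alfresco Search Services may assign documents differently.

```python
from collections import Counter


def shard_for(doc_id: int, shard_count: int) -> int:
    """Return the 0-based shard index for a document id.

    Hypothetical routing rule: plain modulo on the numeric id.
    """
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    return doc_id % shard_count


def distribution(doc_ids, shard_count: int) -> Counter:
    """Count how many documents land on each shard."""
    return Counter(shard_for(doc_id, shard_count) for doc_id in doc_ids)


if __name__ == "__main__":
    # Illustrative only: one million sequential ids spread over four shards.
    counts = distribution(range(1, 1_000_001), shard_count=4)
    for shard, total in sorted(counts.items()):
        print(f"shard {shard}: {total} documents")
```

With sequential ids, a rule like this spreads documents evenly, so each node only has to index roughly its share of the repository.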
Analysis phase
As the maintenance window was very tight, we had to make sure that the proposed solution could complete the reindexing within the time available. To this end, we configured the sharding architecture in different test environments. We closely monitored indexing times with sharding enabled and, once we had fine-tuned the configuration, we were ready to make the leap to Production.
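The kind of back-of-the-envelope check behind this analysis can be sketched as follows. Every figure except the 50 million documents is hypothetical, and the model assumes throughput grows linearly with the number of shards, which is optimistic; real measurements from test environments should always take precedence over an estimate like this one.

```python
import math


def estimated_reindex_hours(total_docs: int,
                            docs_per_hour_per_shard: float,
                            shard_count: int) -> float:
    """Estimate full-reindex duration, assuming linear scaling across shards."""
    if shard_count <= 0 or docs_per_hour_per_shard <= 0:
        raise ValueError("shard_count and throughput must be positive")
    return total_docs / (docs_per_hour_per_shard * shard_count)


def min_shards_for_window(total_docs: int,
                          docs_per_hour_per_shard: float,
                          window_hours: float) -> int:
    """Smallest shard count whose estimated reindex fits in the window."""
    if window_hours <= 0 or docs_per_hour_per_shard <= 0:
        raise ValueError("window_hours and throughput must be positive")
    return math.ceil(total_docs / (docs_per_hour_per_shard * window_hours))


if __name__ == "__main__":
    TOTAL_DOCS = 50_000_000   # document count mentioned in the article
    THROUGHPUT = 500_000      # hypothetical docs/hour indexed by one shard
    WINDOW_HOURS = 24         # hypothetical maintenance window

    shards = min_shards_for_window(TOTAL_DOCS, THROUGHPUT, WINDOW_HOURS)
    hours = estimated_reindex_hours(TOTAL_DOCS, THROUGHPUT, shards)
    print(f"{shards} shards -> about {hours:.1f} hours")
```

An estimate like this is only a starting point for choosing which configurations to test; the measured indexing times are what decide whether the window can be met.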
Results and benefits
The implementation of sharding was a success: it cut the reindexing time by more than a third compared with the time projected for the previous architecture, which enabled us to meet the established KPIs.
This architecture allows us to scale more easily to handle future increases in document volume and provides greater robustness and efficiency in the long term. Given that most customers perform massive recurring uploads, achieving stability and reliability in a key service like Solr is a significant result.
An old acquaintance
With sharding, we were able to address one of the biggest problems that every Alfresco user has run into while operating the service: Solr.
Alfresco is aimed at organisations that need to manage large volumes of data, so content indexing is one of the biggest headaches for any installation. Thanks to the implementation of sharding, we have made a qualitative leap in the operation of the service, and we have also observed that the system returns search results more quickly.
Conclusion
This Incentro success story with the implementation of sharding in Alfresco on Google Cloud highlights the importance of innovation and adaptability in the management of large volumes of documentation. Faced with significant reindexing challenges, we were able to design and implement an efficient solution that dramatically improved processing times and system functionality.
This approach not only met the customer's requirements, but also improved the stability and scalability of the system, ensuring that Alfresco can efficiently handle the customer's needs in the future.