For day-to-day work, document storage is typically handled through online productivity software and cloud storage. That approach gets harder to manage as an application needs to process, store, and retrieve larger volumes of documents. An Electronic Document Management System (EDMS) is a better fit: it is designed to store, index, and retrieve documents with high performance and availability, and some systems include features like customizable metadata and version control.
While there are many SaaS-based EDMS solutions available, you can deploy your own open source EDMS to maintain complete control over your data. In this post, you’ll learn how to set up a highly available Mayan EDMS backed by a PostgreSQL database.
EDMS Benefits
This setup is ideal if you store and process a large number of documents and need an EDMS that is attached to a web-based application, removing the need for any client-side installations. Running an EDMS as a central hub ensures:
- security, privacy, and total control of your data;
- easy integration with third-party software; and
- automation of document workflows for business processes.
Why PostgreSQL?
PostgreSQL is a powerful, open source object-relational database management system that is highly valued for its scalability, security, and performance. In order to support end-to-end scaling for your application, your database also needs to be highly available, so this architecture example incorporates a replication tool specifically for PostgreSQL.
Getting Started with Mayan EDMS
Mayan is a web-based, open source EDMS written in Python. By design, Mayan defaults to installing and running on a single system; all of your application and database components can live on a single server or within several Docker containers. Though this is great for testing or trivial environments, a production environment calls for high availability and for the widely adopted separation of concerns (SoC) principle, a crucial best practice for building layered and scalable applications. This reference architecture demonstrates how to do that with Mayan.
Pros
- Open source means no licensing fees
- Easily store, view, and revert document versions
- Full text search of documents using customizable user-defined metadata
- Flexible access controls to design effective user roles and permissions
- Customizable workflows with event triggers to keep documents up to date
Cons
- Complex for smaller use cases
- User interface is less intuitive than other solutions
- Resource heavy for CPUs running optical character recognition (OCR)
Application Reference Architecture
To optimize Mayan’s capabilities in a real-world application, our architecture utilizes:
- NGINX: Web server
- Prometheus & Grafana: Monitoring and observability tools
- PostgreSQL: Database
- Bucardo: PostgreSQL bi-directional database replication
- Linode Object Storage: S3-compatible and highly available storage
- keepalived: IP failover
A NodeBalancer distributes traffic to our application nodes. If one application server goes down, the load balancing service directs traffic only to the healthy node. As soon as the unhealthy node recovers, the NodeBalancer resumes balancing connections across both nodes as before. This makes it easy to add, remove, or update application servers without downtime, all while maintaining connections to the PostgreSQL database nodes.
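The behavior described above can be illustrated with a small round-robin sketch. This is a conceptual toy, not how the NodeBalancer is implemented; the `Balancer` class and the node names `app1` and `app2` are hypothetical.

```python
class Balancer:
    """Toy round-robin balancer that skips nodes marked unhealthy."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._next_index = 0

    def mark(self, node, is_healthy):
        # A real load balancer would update this from periodic health checks.
        if is_healthy:
            self.healthy.add(node)
        else:
            self.healthy.discard(node)

    def next_node(self):
        # Try each node at most once per call, in rotation order.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next_index % len(self.nodes)]
            self._next_index += 1
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy application nodes")


lb = Balancer(["app1", "app2"])
lb.mark("app2", False)   # app2 fails its health check
only_app1 = [lb.next_node() for _ in range(3)]
lb.mark("app2", True)    # app2 recovers and rejoins the rotation
```

While `app2` is marked unhealthy, every request goes to `app1`; once it recovers, requests alternate between the two nodes again.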
For the “brain” of the application, Mayan and NGINX are deployed on the same virtual machines. We can then leverage Mayan’s support for s3boto3 as a storage backend to upload our documents to Linode’s S3-compatible Object Storage.
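As a rough sketch of what the storage backend needs, the helper below builds the keyword arguments that the django-storages `S3Boto3Storage` class accepts (`access_key`, `secret_key`, `bucket_name`, `endpoint_url`, `default_acl`). The function name, the environment variable names, the bucket name, and the `us-east-1` region are assumptions for illustration. The Mayan setting that receives these arguments differs between Mayan versions, so check the documentation for your release.

```python
import json
import os


def object_storage_arguments(bucket, region="us-east-1", env=os.environ):
    """Build S3Boto3Storage keyword arguments for a Linode Object Storage bucket.

    OBJ_ACCESS_KEY and OBJ_SECRET_KEY are hypothetical variable names;
    use whatever your deployment tooling exports.
    """
    return {
        "access_key": env.get("OBJ_ACCESS_KEY", ""),
        "secret_key": env.get("OBJ_SECRET_KEY", ""),
        "bucket_name": bucket,
        # Linode Object Storage endpoints follow <cluster>.linodeobjects.com
        "endpoint_url": f"https://{region}.linodeobjects.com",
        "default_acl": "private",
    }


# Serialize the arguments so they can be pasted into a settings file.
arguments_json = json.dumps(object_storage_arguments("mayan-documents"), indent=2)
```

Keeping the credentials in environment variables rather than in the settings file means the same configuration can be rolled out to every application node.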
If your application is mission-critical and uses PostgreSQL as a primary backend database, incorporating Bucardo provides a better uptime guarantee and makes your database fault-tolerant.
You can also achieve high availability and replication with a managed database service that supports PostgreSQL, but keep in mind that most DBaaS offerings focus on updating PostgreSQL versions and keeping your database cluster online and available. Implementing Bucardo gives your PostgreSQL database bi-directional replication between two or more database nodes, so every node can accept writes and the database stays available if one node fails.
In this example, all nodes are secured with Cloud Firewalls for protection from the public internet and communicate internally via private VLAN. The application servers connect to the databases via a shared floating VLAN IP address with keepalived to facilitate failover.
Keepalived, or another IP failover system like FRRouting (FRR), is implemented at the database level so that your application nodes are always connected to a healthy database node.
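For failover to pick a healthy node, keepalived needs a way to tell whether the local database is up. Keepalived can run an external check through a `vrrp_script` block and treats a non-zero exit status as a failure. The sketch below is one possible check, assuming PostgreSQL listens on the default port 5432 on the local host. It only confirms that the port accepts TCP connections, not that queries succeed, and the function names are hypothetical.

```python
import socket


def postgres_is_reachable(host="127.0.0.1", port=5432, timeout=2.0):
    """Return True if a TCP connection to the PostgreSQL port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def exit_code(host="127.0.0.1", port=5432):
    """Map the check to a process exit status: 0 is healthy, 1 is failed."""
    return 0 if postgres_is_reachable(host, port) else 1

# When saved as a standalone script, finish with: sys.exit(exit_code())
```

A stricter check could open a real database session and run a trivial query, at the cost of needing credentials on each node.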
Achieving Fault Tolerance for Critical Files
An EDMS will often serve as a central hub for day-to-day operations and host some of your organization’s most critical files. Our application is built with redundancy at every level to provide baseline fault tolerance and optimized performance:
- Documents are stored on Linode’s highly available Object Storage.
- The database is on a separate node to increase performance and prevent having a single point of failure.
- Bucardo performs automatic database replication between the Postgres nodes.
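To make the idea of bi-directional replication concrete, here is a toy model in which two nodes each accept writes and a sync step reconciles them. It uses a last-write-wins policy, which is one common way to resolve conflicts. This is a conceptual sketch only: it does not use Bucardo, and the `sync` function and the row layout are hypothetical.

```python
def sync(node_a, node_b):
    """Reconcile two {key: (timestamp, value)} stores in both directions.

    After the call, both nodes hold the newest version of every row.
    """
    for key in node_a.keys() | node_b.keys():
        versions = [node[key] for node in (node_a, node_b) if key in node]
        newest = max(versions, key=lambda version: version[0])
        node_a[key] = newest
        node_b[key] = newest


# Each node received different writes while they were out of sync.
db1 = {"doc-1": (1, "draft")}
db2 = {"doc-1": (2, "final"), "doc-2": (1, "new upload")}
sync(db1, db2)
```

After `sync` runs, both nodes hold the newer `doc-1` row and the `doc-2` row that was written only to the second node, so either node can serve the application if the other fails.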
Explore More Technical Content and Architectures
Our Solutions Engineering team shares frameworks, guides, and tools like this one to make it easier for developers to build applications that follow best practices for software architecture. Check out our Galera cluster reference architecture for a highly available MySQL/MariaDB architecture, or browse our available reference architecture examples on Linode Docs.
Comments (2)
How much does it cost to implement Mayan EDMS per month and per year?
A swift response would be much appreciated.
If you’re using the Terraform script in our guide, you will deploy four 2GB compute instances ($48.00) and an Object Storage Bucket ($5.00). Additionally, as mentioned in the guide, you will want to deploy an additional node for Prometheus and Grafana ($5.00) as well as a NodeBalancer ($10.00). These services together would be roughly $68.00/month before taxes. This assumes that you store no more than 250GB in Object Storage and that you stay within your Network Transfer Allowance. Again, based on these assumptions, your yearly cost would be roughly $816.00.
You have the option to edit the Terraform script and change the default compute instance to a Nanode. However, I can’t guarantee the performance of the deployment with that plan.
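If you want to adjust the estimate for your own deployment, the arithmetic is easy to reproduce. The sketch below uses only the prices quoted in this reply, which may have changed since it was written; the dictionary name and labels are just for illustration.

```python
# Monthly prices in USD, as quoted above, before taxes.
MONTHLY_COSTS = {
    "four 2GB compute instances": 48.00,
    "Object Storage bucket": 5.00,
    "monitoring node (Prometheus and Grafana)": 5.00,
    "NodeBalancer": 10.00,
}

monthly = sum(MONTHLY_COSTS.values())
yearly = monthly * 12

print(f"monthly: ${monthly:.2f}, yearly: ${yearly:.2f}")
# prints "monthly: $68.00, yearly: $816.00"
```

Changing the compute line, for example after switching plans in the Terraform script, updates both totals.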