I am having a difficult time getting memcached to work reliably. I have set it up successfully, and it works as expected for a few days, but eventually one node fails over to the other, and then that one fails too and can't fail back. If left running long enough, this eventually causes a Jamf Pro outage. I assume I need to adjust some of the service parameters, but I am not sure where to start. One challenge is that my nodes are AWS ElastiCache instances. If you haven't used this service before: there are no logs and there is no SSH access, so I don't have much data to reference when troubleshooting. My cache nodes have 2 CPUs and 15GB of RAM. My JSS is two Tomcat servers, each with 16 CPUs and 30GB of RAM. My MySQL server has 16 CPUs and 60GB of RAM. My JSS serves 36,000 Macs and our database is 20GB in size.
We are currently using an AWS ElastiCache instance for memcached as well, but have not really seen these issues. A few things you may want to double-check. First, make sure memcached is configured correctly on each node. We ran into issues with IP addresses and ended up just using FQDN:PORT in the Tomcat/ROOT/WEB-INF/classes/dal/memcached.properties file. Also, are all the nodes in the same VPC, and can they communicate properly with the ElastiCache instance? The only other thing I can think of is that your clustering configuration may be having issues. If a node fails over and does not fail back, it makes me think that Jamf is not seeing the cluster properly from that specific node.
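Since there are no logs or SSH access, one way to get some data is to query the memcached endpoint directly from each Tomcat server. Below is a minimal Python sketch, not something Jamf or AWS ships: it splits an FQDN:PORT string, opens a TCP connection, and sends the standard memcached text-protocol "stats" command. The function names and the example hostname are made up for illustration, and it assumes the default memcached port 11211 when none is given.

```python
import socket


def parse_endpoint(endpoint, default_port=11211):
    """Split 'host:port' into (host, port); use the default memcached port if none is given."""
    host, sep, port = endpoint.strip().rpartition(":")
    if not sep:
        return endpoint.strip(), default_port
    return host, int(port)


def parse_stats(raw):
    """Turn a raw 'stats' reply (lines of 'STAT name value', ending with 'END') into a dict."""
    stats = {}
    for line in raw.decode("ascii", errors="replace").splitlines():
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def fetch_stats(host, port, timeout=3.0):
    """Connect to a memcached node and return its stats; raises OSError if it is unreachable."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        buf = b""
        while not buf.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
    return parse_stats(buf)


# Example (hypothetical hostname; replace with your own node endpoint):
#   host, port = parse_endpoint("cache.example.internal:11211")
#   stats = fetch_stats(host, port)
#   print(stats.get("curr_connections"), stats.get("evictions"), stats.get("uptime"))
```

If the connection fails or times out from one Tomcat server but not the other, that points at networking (VPC, security groups) rather than memcached itself. If it connects, values such as curr_connections, evictions, and uptime give at least a rough picture of what the node is doing around the time of a failover.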