SQL Replicator stops syncing in both Live or Scheduled modes, 18.1

SUGGESTED

Hey everyone,

First time poster here because we are in desperate need for assistance. Here's the rundown...

- We had been using Sage 17.1 for 2 years until Nov. 2019. With it, we have the SQL Replicator running all the time, and though  "Scheduled" mode was the only option with 17.1, it would replicate close to real time (every 15 mins best guess). I did not set this up, but it ran beautifully for 2 years with no issues.

- We use replicator to send PSQL data to a SQL database that we run BI reports off of via ODBC to gather important accounting information.

- We upgraded to Sage 18.1 in Nov., at which time, Sage added the "Live" replication option. I thought this was weird because ours of seemingly live already, but based on Sage's recommendation, we set the replicator to "Live" replication.

- Since this upgrade, we've had NOTHING BUT PROBLEMS with the replicator. At first, we found it failed to replicate the database fully, and we were missing data. After a week and endless hours of working with Sage support, we successfully rebuilt the SQL database and got replication to work Live. After about a week, we found replication would randomly stop and we'd have to manually restart the replication service (which takes up to 60 mins to fully come back) multiple times a day.

- We have since tried to change it to Scheduled replication mode and have it run every 30 mins, and this worked for a day, then all of the sudden, it stops working again. The odd thing about this setup is that it will try to sync every 30 mins, but not always get data into SQL. Sometimes the sync works, other times, it does not. For example, yesterday, no new data made it into SQL, but today as of 9:30am, things finally started to populate. The replicator sync logs show NO ERRORS...

Our goal is to replicator our changes Live, maximum amount of wait time being 30 mins for new changes. Does anyone have any recommendations? Anyone else experiencing this headache?

Steven

Parents
  • 0
    SUGGESTED

    Good day everyone! Sorry I've been MIA on this post for 2 weeks as I've been slammed scheduling-wise. I've been working with Suron Woods (awesome guy to work with) from Sage off and on regarding this issue, and for everyone's knowledge, I'd like to outline what we've done so far and what issues we're still having.

    Fixed issues pertaining to our setup:

    1. Suron pointed out some authentication struggles with our setup. I'll do my best to remember all the details but what we ended up discovering is that if you have a Sage account that matched a Windows Auth user account in SQL, these two accounts will conflict, causing some replication issues. For example, if a user's Sage account name is JDoe and their domain login is also JDoe, SQL Replicator tries to create a SQL login matching the Sage username, but if there's already a Windows Auth account in SQL matching username (even with the domain\ in front of the name), it sees it as the same account and fails to create the account, thus causing authentication issues. Our fix for this was to change the name of the 1-2 users matching this case within Sage itself. We only had to do this for users that needed to authenticate with the SQL database. Those that used Sage but never needed to pull data from SQL did not need to be touched.

    2. Our PSQL database was about 10GB. Replication to SQL will expand this data due to overhead in the replication process, so it's expected the size of SQL is 1.6 - 2.0x the size of your PSQL database. In our case, our 10GB PSQL database grew to be 45GB in SQL! This was the first indicator something was wrong. Suron had to go in, delete the old database (we detached it just in case rather than deleting it), clean out some other Sage-specific things in SQL (not sure on the specifics), then kick off a fresh replication into a fresh SQL database. This ran overnight and the following morning, we had a ~14GB database instead of a 45GB one.

    3. Suron installed a very new tool that monitors the health of replication, detects if there's been no changes after 20 mins, then will work to restart the replication service on its own to make sure it continues without manual intervention. He stressed this is super new and still in testing, so this is more of a band-aid than a fix, but I wanted to at least mention it.

    4. Terry, regarding your instruction to upgrade from 18.3.1 to 18.4.1, we ended up not doing this because according to Suron, there were no updates to the replicator between versions, so there was no reason to update. The version we're on is the most reliable at this time, and many of Suron's other clients he worked with have said upgrading to 18.3.1 have fixed a lot of their issues already.

    Remaining issues we're having:

    1. Since doing the above three fixes, our Live replication has been much more reliable, however we're not out of the woods yet. I'm still finding the SQL replicator switching from Live replication mode to Scheduled mode on its own every 2-4 days due to "Downshifting" when Live replication can't catch up. We don't know why it's downshifting... could be CPU, RAM, network... something. Our server is a VM with 8 vCPUs, 32GB RAM, and 10k SAS drives in RAID6. Our SQL database is right on the same server, so network speeds are not a factor. I doubt we are running out of resources for replication to run, so we're working to see why this is happening. The health monitor Suron installed is not helping this situation as the monitor only checks to confirm Live replication isn't hanging up, and since the mode will shift to Scheduled after a downshift, the monitor stops monitoring the replication status.

    What the future holds:

    - Side note for everyone, Suron let me know that their developers are working on releasing "SQL Replicator 2.0" in the next few months (take that with a grain of salt, it's all uncertain right now). This 2.0 version is a complete rewrite of the existing replicator with the goal of fixing all of our problems with the current one. The stakes are high on this version fixing things because Suron also mentioned to me that it's very unlikely we'll ever get a full SQL version of Sage 300 CRE because the original code is so poor. They would need to rewrite the entire problem, which they're not going to do. Instead, they're working on a cloud-only version of Sage 300 CRE. This new SQL replicator is as close as we're going to get when it comes to CRE being SQL-based.

    I'll update this as things unfold, but so far, that's the status of everything.

    PS: I didn't proof read, I spend more time on this than I thought I would, sorry if some things don't make sense, haha.

Reply
  • 0
    SUGGESTED

    Good day everyone! Sorry I've been MIA on this post for 2 weeks as I've been slammed scheduling-wise. I've been working with Suron Woods (awesome guy to work with) from Sage off and on regarding this issue, and for everyone's knowledge, I'd like to outline what we've done so far and what issues we're still having.

    Fixed issues pertaining to our setup:

    1. Suron pointed out some authentication struggles with our setup. I'll do my best to remember all the details but what we ended up discovering is that if you have a Sage account that matched a Windows Auth user account in SQL, these two accounts will conflict, causing some replication issues. For example, if a user's Sage account name is JDoe and their domain login is also JDoe, SQL Replicator tries to create a SQL login matching the Sage username, but if there's already a Windows Auth account in SQL matching username (even with the domain\ in front of the name), it sees it as the same account and fails to create the account, thus causing authentication issues. Our fix for this was to change the name of the 1-2 users matching this case within Sage itself. We only had to do this for users that needed to authenticate with the SQL database. Those that used Sage but never needed to pull data from SQL did not need to be touched.

    2. Our PSQL database was about 10GB. Replication to SQL will expand this data due to overhead in the replication process, so it's expected the size of SQL is 1.6 - 2.0x the size of your PSQL database. In our case, our 10GB PSQL database grew to be 45GB in SQL! This was the first indicator something was wrong. Suron had to go in, delete the old database (we detached it just in case rather than deleting it), clean out some other Sage-specific things in SQL (not sure on the specifics), then kick off a fresh replication into a fresh SQL database. This ran overnight and the following morning, we had a ~14GB database instead of a 45GB one.

    3. Suron installed a very new tool that monitors the health of replication, detects if there's been no changes after 20 mins, then will work to restart the replication service on its own to make sure it continues without manual intervention. He stressed this is super new and still in testing, so this is more of a band-aid than a fix, but I wanted to at least mention it.

    4. Terry, regarding your instruction to upgrade from 18.3.1 to 18.4.1, we ended up not doing this because according to Suron, there were no updates to the replicator between versions, so there was no reason to update. The version we're on is the most reliable at this time, and many of Suron's other clients he worked with have said upgrading to 18.3.1 have fixed a lot of their issues already.

    Remaining issues we're having:

    1. Since doing the above three fixes, our Live replication has been much more reliable, however we're not out of the woods yet. I'm still finding the SQL replicator switching from Live replication mode to Scheduled mode on its own every 2-4 days due to "Downshifting" when Live replication can't catch up. We don't know why it's downshifting... could be CPU, RAM, network... something. Our server is a VM with 8 vCPUs, 32GB RAM, and 10k SAS drives in RAID6. Our SQL database is right on the same server, so network speeds are not a factor. I doubt we are running out of resources for replication to run, so we're working to see why this is happening. The health monitor Suron installed is not helping this situation as the monitor only checks to confirm Live replication isn't hanging up, and since the mode will shift to Scheduled after a downshift, the monitor stops monitoring the replication status.

    What the future holds:

    - Side note for everyone, Suron let me know that their developers are working on releasing "SQL Replicator 2.0" in the next few months (take that with a grain of salt, it's all uncertain right now). This 2.0 version is a complete rewrite of the existing replicator with the goal of fixing all of our problems with the current one. The stakes are high on this version fixing things because Suron also mentioned to me that it's very unlikely we'll ever get a full SQL version of Sage 300 CRE because the original code is so poor. They would need to rewrite the entire problem, which they're not going to do. Instead, they're working on a cloud-only version of Sage 300 CRE. This new SQL replicator is as close as we're going to get when it comes to CRE being SQL-based.

    I'll update this as things unfold, but so far, that's the status of everything.

    PS: I didn't proof read, I spend more time on this than I thought I would, sorry if some things don't make sense, haha.

Children
  • 0 in reply to GraniteStateGlassSage

    Hello, thanks for your long run down of what you went through.  We found the same issue as you with Sage/Windows accounts but only for SQL accounts we created manually. Replicartion faield if there was a dupe account. 

    The replicator does create a SQL account with the same exact name as our users Windows accounts during replication.  Unsure if you read the related security issue we have seen and confirmed that when a replication happens it disables any SQL created accounts except the SA account.  So if I go into SQL Manager and create an account out side of Sage, during the next replication that account is disabled and cannot be reused as I have not figured out how to enable it again.  Even if you delete it and try to create a similar account in Sage the Sage replication will fail and say the old enabled account is still there when it is not.  In addition we have found the replication only properly creates SQL accounts on first replication.  On any subsequent replications where a new Sage user was created it creates a matching SQL account but does not properly apply the password nor does it give it the proper rights to the replicated DB.  One has to go into SQL Manager and manually set these properties on the new user.  There is a statement in the Replicator instructions that speaks to something similar to this saying a new user must log into Sage before replication but I have tried that and it did not help the situation. 

    We are on 18.4 so even though you were told it is no different it could be different.  The issues I have were confirmed to exist in 3 other 18.4 replication scenarios (one person spoke about it here...Jstone).  So if you have your replication and security working I would not upgrade to 18.4.