Error running recon: Connection Failure

johnnasset
Contributor

With every policy that installs a package, we always have a handful of machines that will successfully install the package but fail to run a Recon post-install. These are not the same machines each time. Here is a sample log:

Executing Policy Adobe Flash 11.9.900.170...
Downloading Adobe Flash Player_11.9.900.170.pkg...
This package is a PKG or an MPKG, and the index.bom file is not found. Attempting to open the package as a flat package...
Downloading http://xxx.xxx.org/CasperShare/Packages/Adobe%20Flash%20Player_11.9.900.170.pkg...
Installing Adobe Flash Player_11.9.900.170.pkg...
Successfully installed Adobe Flash Player_11.9.900.170.pkg.
Running Recon...
Retrieving inventory preferences from https://xxx.xxx.org:8443/...
Locating accounts...
Searching path: /Applications
Locating package receipts...
Gathering application usage information...
Locating printers...
Locating software updates...
Locating plugins...
Error running recon: Connection failure: "The host xxx.xxx.org is not accessible."

If it was a network issue I would assume that the package would fail to install as well. Anybody else seeing this? We are on version 9.22.

92 REPLIES

Chris_Hafner
Valued Contributor II

I'll jump in on the fun. I see this from time to time but have attributed it to laptops being disconnected or put to sleep before being able to submit the report. I can't promise it, but it's fairly minor in occurrence here:

9.63
AFP

DBrowning
Valued Contributor II

I keep getting: Error running recon: Connection failure: "The Internet connection appears to be offline." or Error running recon: Connection failure: "The network connection was lost."

This is happening after a script runs.

JSS 9.63

Achilles
New Contributor II

I am also seeing these issues.

Error running recon: Connection failure: "The host xxx.xxx.xxx.xxx is not accessible."
and/or..
Error running recon: Device Signature Error - A valid device signature is required to perform the action.
and/or..
Error running recon: Connection failure: "The request timed out."

Achilles
New Contributor II

Has there been any movement on this? I don't want to have to add a script to run recon.

Master = 10.9.5 / server 3.2.2 / JSS 9.6.2 mysql 5.6

Does upgrading to 9.65 fix this issue?

yellow
Contributor

Same here. And my JSS is fully up. And no 9.65 does not fix this issue.
I'm starting to wonder if it's the number of connections Tomcat is getting, or the number the database is getting from Tomcat, that is causing this error. Not accessible because it doesn't answer in a timely fashion?

Achilles
New Contributor II

I was thinking the same thing. About to open the JSS database and check the Tomcat settings.

Achilles
New Contributor II

Nope, I'm at 151 allowed connections.

Chris_Hafner
Valued Contributor II

What is the frequency of these for everyone? In a daily inventory report I'll get, say, 600 successful inventory reports and about 20 failures. Those failures are generally a mix of:

Error running recon: Connection failure: "The host xxx.xxx.xxx.xxx is not accessible."

For those, I am able to verify that the units are connecting through a non-local network (whether it's a hotspot or home Wi-Fi).

The rest are:

Error running recon: Connection failure: "The request timed out."

...which, given the small numbers I'm seeing, I assume means my laptop-wielding user closed the laptop or otherwise disconnected from the network. I am NOT seeing the device signature errors. That said, I am running MySQL 5.5 because of early issues we had with 5.6. This was about 2 years ago and I never felt the need to move back up.
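For anyone wanting to answer the frequency question with numbers, here is a minimal Python sketch that tallies recon failures by message and computes a failure rate. It assumes you have exported policy log lines to text with one message per line and that the wording matches the errors quoted in this thread; the function names are made up for illustration.

```python
import re
from collections import Counter

# Matches the policy-log wording quoted in this thread (assumed stable).
RECON_ERROR = re.compile(r"Error running recon: (.+)")


def tally_recon_errors(lines):
    """Count recon failures, keyed by the text after 'Error running recon: '."""
    counts = Counter()
    for line in lines:
        match = RECON_ERROR.search(line)
        if match:
            counts[match.group(1).strip()] += 1
    return counts


def failure_rate(successes, failures):
    """Fraction of inventory reports that failed (0.0 if there were none)."""
    total = successes + failures
    return failures / total if total else 0.0
```

With the figures above, failure_rate(600, 20) comes out at roughly 3%.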

scottb
Honored Contributor

Is there a recommendations table for MySQL connections? I have one for Tomcat, but not MySQL...after 9.65, ours was set to minimum and I bumped it up a little as I thought we had done so previously.

Chris_Hafner
Valued Contributor II

I'm not sure about a table. I set Max threads in ROOT/WEB-INF/xml/DataBase.xml to 400 (401 in MySQL, that last one is for MySQL itself) and then 2.5 times that number for Max Connections in /Tomcat/conf/server.xml. So 1002 in my case.
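The arithmetic in that rule of thumb can be sketched in a few lines of Python. This is only one reading of the post: it assumes the 2.5 factor is applied to the MySQL figure (401) and truncated, which is what yields 1002. It is not an official sizing table, and the function name is hypothetical.

```python
def connection_sizing(db_pool_threads, tomcat_factor=2.5):
    """Derive MySQL and Tomcat limits from the JSS database pool size.

    Assumption: MySQL gets one extra connection for itself, and the Tomcat
    limit is the MySQL figure times the factor, rounded down.
    """
    mysql_max_connections = db_pool_threads + 1
    tomcat_max_connections = int(mysql_max_connections * tomcat_factor)
    return {
        "db_pool_threads": db_pool_threads,
        "mysql_max_connections": mysql_max_connections,
        "tomcat_max_connections": tomcat_max_connections,
    }
```

Calling connection_sizing(400) reproduces the 400 / 401 / 1002 values from the post.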

yellow
Contributor

So, this is happening on desktops as well as laptops, so I don't think it's a sleep issue.

scottb
Honored Contributor

Thanks @Chris_Hafner - will use that to see what we have in there now as a guideline.
We currently manage ~765 Macs globally.

Achilles
New Contributor II

It looks like all the policies are working. It's just when it tries to recon that I'm getting the failures across the board: laptops, desktops, on wireless, with a network cable, bound, not bound, it doesn't matter. And it's not all the time, only a few.

Achilles
New Contributor II

I'm going through all my policies and converting the built-in Update Inventory to Files and Processes (Execute Command): jamf recon. I will see if that stops the false reporting. By the time I get in I have hundreds of "Failed" policies, none of which actually failed.

Chris_Hafner
Valued Contributor II

@boettchs as a followup we manage a pretty similar number of units. It's anywhere between 650 and 750 OS X devices. When I talk to other engineers about this particular setup they tend to look at me funny. I do keep a very large number of potential connections and threads. It gives me the overhead to have a runaway process or memory leak so that I can catch it before the service bogs down.

Achilles
New Contributor II

Just upgraded to 9.65 and still having the same issue. I also added a script in place of "Update Inventory" and I'm still getting these errors all day. When I inspect the error, the package installed and/or the policy worked; it's only the recon part that fails. Any help?

yellow
Contributor

I've been playing with threads, pool sizes, dB connections, changed from every15 to every30... Nothing really seems to make a difference. Still getting:

Error running recon: Connection failure: "The request timed out."
Error running recon: Connection failure: "The host hostname.company.com is not accessible."
Error running recon: Connection failure: "Could not connect to the server."

Yet during this time, there are multiple Macs imaging. The JSS is totally accessible. It's weird. The computer objects in Casper are indeed showing as connected at today's date, but they aren't getting an inventory update (which is obviously critical). Can't get accurate info for Smart Groups if I can't get recons to run.

yellow
Contributor

So, curiously, I might have found the issue, at least on my set up.

I've been through a jillion configs on the backend with Tomcat & MySQL, changes to policies, changes to default check-in rules, and nothing seemed to help. In going back to the email errors I was getting, it finally dawned on me that 95% of the errors were being generated from an "update inventory" policy that we had running, once a day, "Login, Check-in, Network State Change", All computers. This was a policy that had existed when I started working with this iteration of Casper when it was v8. I thought nothing of it and it had been migrated into v9. Since I've disabled it to test, all my errors of this ilk have gone away.

mpermann
Valued Contributor II

@yellow, if you've disabled that once a day update inventory policy, how are you doing your inventory updates on the computers? Did you create a brand new daily update inventory policy scoped to all computers or are you doing something different?

yellow
Contributor

Most of our policies already include a recon, for the moment I'm playing my odds to see how stale my info gets.

Chris_Hafner
Valued Contributor II

Well... couldn't you try creating a recurring policy that runs "jamf recon" at the same frequency as your former inventory policy? Actually, I'm going to swap the checkbox for the command on my recurring inventory just to test the difference.

yellow
Contributor

Just did.. mainly because the errors have stopped coming so fast and furious. I deleted the original and created a new one that I limited to once a day and recurring check-in only. We'll see how that goes.

Chris_Hafner
Valued Contributor II

... eh, scratch that. Looks like I already tried that some time ago and have been running that way for about the past year. I still see these errors FYI. Not a lot, but a few and usually off site.

Chuey
Contributor III

Anytime I get a connection failure but the JSS is available I just remove the framework and re-enroll and I don't seem to have any more issues with that machine. Running JSS 9.65

bollman
Contributor II

Hm, whatever happened here? We just migrated to our brand new Casper 9.72 setup using RHEL servers and JDS distribution.
There are a lot of errors coming in on the recon part, whether it's the recurring once-a-day recon or the one after running a policy. It's a tad annoying that the recon part fails, as that means the computer doesn't leave the smart group and the policy executes again. So far nothing bad has happened other than some users asking why the same software wants to update more than once at almost the same time (like Java, Flash and so on, where we have user notification).
I have upped our MySQL and Tomcat settings to "a lot" (according to posts found here on JAMFNation) so there should be no memory problems, and the virtual server is on a big pipe so there should be no network problems either.
Did this go away magically for people or are you all just "coping"?

aamjohns
Contributor II

After updates (to JSS) I will go back and see if update inventory works. It still does not. So I am still using a script to do recon. That works. But update inventory is spotty at best so I cannot rely on it.

dmw3
Contributor III

We are getting similar random failures with the Update Inventory policy.

Three of the computers that sometimes fail are on the same network segment, though, so I have to check with the network people to see if anything in the switch logs shows why the connection dropped.

The errors we are getting are below:

"Actions from policy log: Executing Policy Update Inventory... Running Recon... Error running recon: Connection failure: "The host jss-server is not accessible." The results of this policy were not logged at the time of execution. The actual execution time was Tue Jun 16 16:18:29 EST 2015."

and

"Actions from policy log: Executing Policy Update Inventory... Running Recon... Retrieving inventory preferences from https://jss-server:8443/... Locating package receipts... Locating accounts... Searching path: /Applications Locating printers... Gathering application usage information... Error running recon: Connection failure: "The request timed out.""

Either error shows up on only five of the 150 enrolled computers, but at random times. Three of these computers are desktops on wired connections; the other two are laptops that can be on either wired or wireless connections. If the update failed all the time I could track it down, but it seems so random, and nothing really shows in the JSS logs, server or client.

bollman
Contributor II

Considering the quality of the campus network here, I cannot believe that this would actually be a network problem. We have around 800 computers at the moment and I'm seeing around 20-30 reports of failed recons every day.

nzmacgeek
New Contributor III

Agreed here.

I've bumped up the maxThreads in our tomcat instance to 1000 on both the connector/executors. I have also done some work to ensure MySQL's setup is optimal (turn off binary logging, ensure packet sizes are appropriate, etc). One could assume the issue is a network connectivity issue, but then the recon is sending an inventory report to the very same box that is accepting the policy log? ;-)

Our JSS is due for an upgrade to 9.73 this week, including updating Java, etc. I'll check back with you all if this still happens. We're on RHEL6.5, on a VM configured with two cores and 16GB of RAM.

dmw3
Contributor III

We are getting random computers giving similar errors on "Update Inventory", usually one of two errors:

"Error running recon: Unknown Error - An unknown error has occurred." or "Error running recon: Connection failure: "The request timed out."". Any suggestions? JSS 9.73

Running "Update Inventory" via Casper Remote on these computers always works.

bollman
Contributor II

This is still happening on 9.81, even on computers connected to Ethernet. So far I haven't really been able to catch it as it happens.

dpertschi
Valued Contributor

Back on my radar here, this issue is now causing me to develop a tic.

During patch management cycles every month, software-installs-but-recon-fails subsequently throws my compliance reporting off by a few percent (and YES it matters).

My inventory collection is pretty sparse. I've deleted unnecessary EAs, but I'm loath to delete the 35 that are left and add them back one at a time. Where is the damn EA disable button anyway!

I can't find a way in the JSS to capture the recon failure and re-run it. Thinking now about a launch daemon that runs periodically looking for the failure in the jamf.log and re-run if found.

Thoughts, comments?
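The log-watching idea above could be sketched roughly like this in Python. Treat it as a starting point under assumptions, not a tested fix: it assumes the failure appears in /var/log/jamf.log with the same "Error running recon:" wording seen in the policy logs, that the jamf binary is on the PATH, and the state file path and function names are made up for illustration.

```python
import subprocess
from pathlib import Path

# Assumed to appear verbatim in jamf.log, as it does in the policy logs.
ERROR_MARKER = "Error running recon:"


def last_recon_error(lines):
    """Return the most recent line containing the error marker, or None."""
    found = None
    for line in lines:
        if ERROR_MARKER in line:
            found = line.rstrip("\n")
    return found


def needs_rerun(lines, last_handled):
    """True if the newest recon error has not been retried yet."""
    error = last_recon_error(lines)
    return error is not None and error != last_handled


def retry_if_needed(log_path="/var/log/jamf.log",
                    state_path="/var/tmp/recon_retry_state.txt"):
    """Re-run recon once per new failure; returns True if a retry was started."""
    log, state = Path(log_path), Path(state_path)
    if not log.exists():
        return False
    lines = log.read_text(errors="replace").splitlines()
    handled = state.read_text() if state.exists() else None
    if not needs_rerun(lines, handled):
        return False
    # Remember this failure first so the same line is never retried twice.
    state.write_text(last_recon_error(lines))
    subprocess.run(["jamf", "recon"], check=False)
    return True
```

A launch daemon could call retry_if_needed() on an interval. Note that it does not check whether a later recon already succeeded, so it may occasionally run one extra recon; if the retry itself fails and logs a new error line, the next run picks that up.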

tcandela
Valued Contributor II

I am so fed up with this issue. I was just about to create a new discussion until I saw this one had already been created.

I get this Error running recon: Connection Failure all the time, sporadically, no matter if the computer is on the local network or if the computer is running recon from off the local network (at home).

the package runs successfully but the ensuing 'recon' results in Error running recon: Connection Failure

makes no sense

were_wulff
Valued Contributor II

@dpertschi , @tcandela

When you get this error on the client, if you look at your JAMFSoftwareServer.log, do you happen to see any errors from around the same time as the failure on the client that start with, "Exception javax.xml.bind.UnmarshalException" at all?

Thanks!
Amanda Wulff
JAMF Software Support
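For anyone checking this by hand, a small Python sketch along these lines may help find that exception near a known failure time. It assumes the server log lines start with a timestamp like "2016-03-25 23:53:59,351" and that exception lines without their own timestamp belong to the last timestamped line before them; the function names and the five-minute window are arbitrary choices.

```python
from datetime import datetime, timedelta

MARKER = "Exception javax.xml.bind.UnmarshalException"


def parse_timestamp(line):
    """Parse a leading 'YYYY-MM-DD HH:MM:SS,mmm' stamp, or return None."""
    try:
        return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
    except ValueError:
        return None


def find_marker_near(lines, failure_time, window_minutes=5):
    """Return lines containing MARKER logged within the window of failure_time."""
    window = timedelta(minutes=window_minutes)
    current = None  # timestamp of the most recent stamped line
    hits = []
    for line in lines:
        stamp = parse_timestamp(line)
        if stamp is not None:
            current = stamp
        if MARKER in line and current is not None:
            if abs(current - failure_time) <= window:
                hits.append(line.rstrip("\n"))
    return hits
```

Pass in the lines of JAMFSoftwareServer.log and the failure time taken from the client; an empty list means nothing matching was logged in that window.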

tcandela
Valued Contributor II

@amanda.wulff where exactly is the JAMFSoftwareServer.log ? on the JSS itself ?

on the clients i usually look at /var/log/jamf.log

where do i go to look at JAMFSoftwareServer.log ?

thanks

Syracuse Orange Elite 8 !!

dmw3
Contributor III

@tcandela logs are on the JSS server

/usr/local/jss/logs

mscottblake
Valued Contributor

@amanda.wulff I am seeing some of the same issues. After talking with my TAM, we upgraded the RAM on the DP, thinking it might just not be able to handle the requests. (I'm also having issues with packages failing part-way through sometimes.)

I had 5 failures overnight and during that same span, this is the Jamf Server Log:

2016-03-25 23:53:59,351 [WARN ] [Tomcat-184 ] [CRUDHelper               ] - User does not have UPDATE privileges for Package
2016-03-26 01:51:48,641 [INFO ] [oolThread-0] [icKeyInfrastructureHelper] - Refreshing pki information
2016-03-26 07:51:48,641 [INFO ] [oolThread-9] [icKeyInfrastructureHelper] - Refreshing pki information

were_wulff
Valued Contributor II

@mscottblake

Issues with a distribution point (and its amount of RAM) likely have nothing to do with Recon connection failures, so I'd suggest you keep working with your TAM to get them resolved; the error you posted is pretty clear in pointing out the problem you're having, however.

"User does not have UPDATE privileges for Package" is pretty self-explanatory and points toward there being a permissions problem in your JSS.

Even if the user you're logged in as says it's an Administrator, we've seen permissions go strange sometimes for no good reason.
In those cases, switching the permissions set to Custom, saving, then setting it back to Administrator and saving tends to get it back to where it needs to be.

If the permissions are meant to be custom, just go and check and make sure that user has privileges to update packages.

If the user in question is not supposed to have permission to update packages, then the error is not an error at all and is expected behavior.

In this case, I don't think the problem you're seeing is related to what we're looking for in this thread; in my post from 3/25 I was asking about a very specific error that points to a very specific problem that can cause Recon failures.
If you're not seeing that exact "Exception javax.xml.bind.UnmarshalException" error in the JAMFSoftwareServer.log when a recon fails, then that particular issue isn't affecting you and it's likely something else that I'd recommend continuing working with your TAM on to figure out what's going on.

Amanda Wulff
JAMF Software Support

mscottblake
Valued Contributor

@amanda.wulff That's actually the point I was trying to illustrate. I saw a handful of the same policy errors come through overnight, but there was nothing in the JSS Server logs to indicate any problems.

were_wulff
Valued Contributor II

@mscottblake

There are numerous things that can cause Recon failures; the one I was asking about specifically, that contains the "Exception javax.xml.bind.UnmarshalException" is a very specific one that is related to an open issue we're aware of, which is why I'd been asking specifically about that.

If that isn't what you're seeing in your JAMFSoftwareServer.log, then there is something else going on and your TAM will be able to dig into it a bit with you to figure out what's going on. Most times, if you cross-reference the time of failure in the jamf.log with the system.log you can find out what the client machine thinks happened.

The permissions error you'd posted in your comment, however, likely doesn't have anything to do with Recon failures, as nothing in a Recon should be requesting privileges to update packages in the JSS. There may be a slim possibility that they're related (for example, if you have an EA that needs to update package information in the JSS), but that would require your TAM digging into it a bit deeper with you to figure it out.

Amanda Wulff
JAMF Software Support
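The jamf.log / system.log cross-reference suggested above could be roughed out as follows in Python. This is a sketch under assumptions: it expects both logs to carry syslog-style "Mon DD HH:MM:SS" stamps with no year (so the year is passed in), it looks for the "Error running recon" wording from this thread, and the function names and 60-second window are arbitrary.

```python
import re
from datetime import datetime, timedelta

# Month abbreviation, day, and clock time, e.g. "Mar 26 01:51:48".
STAMP = re.compile(r"([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})")


def stamp_of(line, year):
    """Return the first syslog-style timestamp in the line, or None."""
    match = STAMP.search(line)
    if match is None:
        return None
    month, day, clock = match.groups()
    try:
        return datetime.strptime(f"{year} {month} {day} {clock}",
                                 "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def system_lines_near_failures(jamf_lines, system_lines, year,
                               window_seconds=60):
    """Return system.log lines stamped within the window of any recon failure."""
    window = timedelta(seconds=window_seconds)
    failures = []
    for line in jamf_lines:
        if "Error running recon" in line:
            when = stamp_of(line, year)
            if when is not None:
                failures.append(when)
    nearby = []
    for line in system_lines:
        when = stamp_of(line, year)
        if when is not None and any(abs(when - f) <= window for f in failures):
            nearby.append(line.rstrip("\n"))
    return nearby
```

The output is just the raw system.log lines around each failure, which is usually enough to spot a sleep event or a network change at the same moment.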