***************************
Update 2
This case is now closed. The only advice I can give at this point is to be VERY careful when you are updating the certificates in the MOB. If you do it wrong, you will be rebuilding your environment.
The notes below do work for fixing the MOB; just make sure you know exactly what your syntax is supposed to be and which certificates you are replacing...
Back to the drawing board for me.....
*****SIGH*****
$#@%^&*&%$##^%^&!!!
***************************
UPDATE!!!
After going through all of this again (time has finally permitted me to get back to this), I have finally got MOST of this working. I say most because I reset my PSC and vCenter certs to the same thing, and now I have to call support to see if I can change this! *Yes, I am an idiot* I will update again as I get this last part figured out.
VMware has updated KB2121701 so many times over the last couple of months that they must really be sick of that page.
The ROOT of the problem is the certificates that are associated with the PSC and the vCenter servers. If they get changed, for some reason VMware does not change them properly in the MOB *or wherever the info is stored*, and it then falls on you, the admin, to be clever enough to know this is the issue......
If you follow the instructions (very carefully, I might add), the KB shows you how to view the certificate that the MOB currently lists for your PSCs and vCenters, how to download a copy of those certs to get a thumbprint, how to download your current certificates, and finally how to use the ls_update_certs.py script *you have to install a new copy from the KB article* to modify what is in the MOB pages. Below is the example from the article of the command you will run. I want to point out that if you have multiple PSCs and vCenters, you will need to do this for ALL of them! You also have to run this from the PSC server.
"%VMWARE_PYTHON_BIN%" ls_update_certs.py --url https://psc.vmware.com/lookupservice/sdk --fingerprint 13:1E:60:93:E4:E6:59:31:55:EB:74:51:67:2A:99:F8:3F:04:83:88 --certfile c:\certificates\new_machine.crt --user Administrator@vsphere.local --password Password
You would need to do the above for:
1.) Your Production PSC *get the thumbprint for the old cert and download the new cert to a central location*
2.) Your Production vCenter *get the thumbprint for the old cert and download the new cert to a central location*
3.) Your DR PSC *get the thumbprint for the old cert and download the new cert to a central location*
4.) Your DR vCenter *get the thumbprint for the old cert and download the new cert to a central location*
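The four runs listed above can be sketched as a small Python helper that only builds the command lines, so the syntax can be reviewed before anything touches the MOB. This is a rough sketch and not part of the KB: the host names, thumbprints, and certificate paths are hypothetical placeholders, and which lookup service URL goes with each entry is an assumption to confirm against KB2121701. The only flags used are the ones in the example command.

```python
# Sketch: build (but do not run) one ls_update_certs.py command per server.
# Every host name, thumbprint, path, and password below is a hypothetical placeholder.

def build_command(lookup_url, old_fingerprint, new_cert_path, user, password):
    """Return the argument list for a single ls_update_certs.py run."""
    return [
        '"%VMWARE_PYTHON_BIN%"',
        "ls_update_certs.py",
        "--url", lookup_url,
        "--fingerprint", old_fingerprint,   # thumbprint of the OLD certificate
        "--certfile", new_cert_path,        # file holding the NEW certificate
        "--user", user,
        "--password", password,
    ]

# Assumption: each vCenter entry points at the lookup service of its own site's PSC.
PROD_URL = "https://prod-psc.example.com/lookupservice/sdk"
DR_URL = "https://dr-psc.example.com/lookupservice/sdk"

targets = [
    ("Production PSC", PROD_URL, "<old prod PSC thumbprint>", r"c:\certificates\prod_psc.crt"),
    ("Production vCenter", PROD_URL, "<old prod vCenter thumbprint>", r"c:\certificates\prod_vc.crt"),
    ("DR PSC", DR_URL, "<old DR PSC thumbprint>", r"c:\certificates\dr_psc.crt"),
    ("DR vCenter", DR_URL, "<old DR vCenter thumbprint>", r"c:\certificates\dr_vc.crt"),
]

commands = {
    label: build_command(url, thumb, cert, "Administrator@vsphere.local", "<password>")
    for label, url, thumb, cert in targets
}

for label, args in commands.items():
    # Print each command so it can be checked by eye before it is run on the PSC.
    print(label + ":")
    print("  " + " ".join(args))
```

Printing the commands first, instead of running them, is deliberate: a wrong thumbprint or certificate file is exactly the kind of mistake that leads to a rebuild.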
I don't know how to make KB 2121701 easier to read but there has to be a way....it is a wealth of knowledge but....it is not easy to obtain that knowledge!
****************************
I am trying to love VMware vSphere 6 and Site Recovery Manager 6 (SRM). I am trying to show my confidence in VMware. It's not working though.....and I know, I broke the cardinal rule of IT “never adopt early.”
VMware has been my favorite technology for a long time! I drank the Kool-Aid, and in my mind there is not another company that is doing the kinds of things that they are. Let's face it though....nobody is perfect.
I have now had a case open with them since June 8th about Site Recovery Manager 6 and vCenter 6, about 2 months. I have talked to some great techs there at VMware, but I am beginning to sense that there is a lot of confusion among their ranks about the new products. I have had techs tell me that I had to have the same certificate for both the protected and recovery site in order for things to work, and yet their install and configure manual clearly says otherwise. I have had technicians who did not know what the VMCA is or what its function was, going as far as to tell me that I needed to do individual certs for each of my vCenter servers, Platform Services Controller (PSC) servers and my ESXi servers. I still have not gotten a good answer as to whether SRM and the VMCA work together or whether they will sometime in the future. Heck, the first month of my case was spent calling and begging their support team to call me back; it wasn’t until my VP called and started screaming that I started getting any serious traction on the case.
The frustrating part? I have done a bog standard install of SRM. I have set up my environment with VMware’s best practices. I have even gone so far as to ask the technicians to verify the install.
The PSCs are External. The vCenter Servers and the SRM servers are stand-alone VM servers. I made my VMCAs subordinate Certificate Authorities to my in-house Certificate Authority so that all of my clients would trust the sites and we would not have issues. It is exactly as VMware shows it in a standard Two-Site Topology with one vCenter Server instance per Platform Services Controller (PSC).
My issue?? Here goes: when I go to Site Recovery>Sites from my production server, I immediately get the message below:
Error: Failed to connect to Lookup Service at HTTPS://DRPSCSERVER.DOMAIN.COM:443/lookupservice/sdk. Reason: com.vmware.vim.vmomi.core.exception.CertificateValidationException: Server certificate chain not verified.
Simple, right? The certificates on my vCenter must not trust that PSC chain, right? One of the servers must not have the chain or the certificate for the DR site….but they do. VMware has verified they do. I can go to the DR PSC server from my Production vCenter Server and it shows the site as trusted…
Now, if I try the same exact thing from the DR side, what happens, you ask? Same exact thing, but the error message says that the certificate chain is not valid for the Production PSC server. Which is really weird….because I can see both vCenter servers on both the Production and DR sites. Oh, and once again I can go to the Production PSC from the vCenter server and it shows the site as trusted as well.
Ahh….so it must be the PSCs don’t trust each other…..NOPE. I can
go to each of the PSCs and they both trust the other.
Well, so that leaves the SRM servers, right? One of them must be the culprit. Well, as before…the vCenter servers all look trusted, and so do the PSC servers. The certificates that the SRM servers have are actually from the parent CA. So they are trusted all the way through….
I am bumfuzzled….
If anyone has any advice on this PLEASE speak up! Once I get a
solution I promise I will append it to this entry….
Further to my last comment.....can you also list any KB articles you've followed without success? For example, this one is similar to your issue, but I'm not sure what you have tried so far.
http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=2121701&sliceId=1&docTypeID=DT_KB_1_1&dialogID=720030279&stateId=0%200%20720038727
Thank you so much for the reply! There have been several articles that we have looked at...problem is that at the time they could not be run on Windows-based External PSC servers. KB2109074 and KB2121701 were two articles that we looked at previously. Interestingly enough, KB2121701 was updated again two days ago with a file to download and replace the existing one, just like the article that you pointed out. My only concern now is whether it is going to work, seeing as how most of the certificates that we have in house right now are SHA256, not SHA1.
Just tried KB2121701, still no joy. The last time I looked at that article it said you could not run it on Windows External PSCs.
........not sure where last comment went.......anyway sorry to hear of your issues please mail me SR number to lee at vmware dot com. I'll take a look into this with my SRM colleagues.
Echoing Lee's comment, according to a colleague of mine, you might want to have a look at KB 2121689 and KB 2121701, we are looking at re-writing those though.
Valentin,
Looks like some of those articles have been edited within the last two days. I'm continuing to hope for a solution!
Glad I'm not the only one...I've been fighting with this all day
Nope, and unfortunately I still don't have a solution. My VP at this point is beyond livid. It has been almost 3 months now that we have been trying to get an answer to this problem from VMware. If you find a way around it, let me know please!
Any updates on this issue? I am having the same problem now.
I WISH!! I am getting weekly updates from SRM Tech Support saying that nothing further yet. The last one that I got they said "Still no fix from Engineering but there's been a lot of progress.
Thanks was this last Thursday....
Sorry to hear you are having the issue. As promised I will update with a fix as soon as I have one!
Martin
Martin...I was able to fix this during this week. I followed KB2121701. I am at 6.0 U1. The first time I tried, it did not work. I rolled back to the snaps I took ahead of time. This time I did a more thorough investigation of all the PSC entries in the MOB. I found more than one certificate on both sites that needed to be replaced. I went through both sites' PSCs and created, ahead of time, the command to run for each certificate I needed to replace (with the fingerprint for each). Then I ran those in each site without rebooting until I had run them all on both sites. Then I rebooted the vCenter, SRM, and PSC VMs on both sites all at once. When everything came back up, my SRM was back up and working in vCenter. I could see all the sites and protection groups, etc. It was a pain, but it fixes it.
Hi CSCMan. Could you be more specific as to what exactly you have done? You say "each certificate", but which certificates are we talking about? If I follow the KB, they mention only one certificate, not multiple.
Any help would be much appreciated! I'm planning on going through the same KB one more time (already opened a SR with VMware).
Martin,
Try using Notepad++ and make sure that after every 64th character of the certificate you hit Enter to start a new line.
Martin, we've resolved this issue following the steps in VMware KB2121689. The only difference with last time was the administrator@vsphere.local password. It contained a special character, which the Python script didn't like. We temporarily changed the password and all was well. We did this for both sites and now SRM is working like a charm.
Please let me know if this helps. And if not, let me know where you are stuck, as I'm sure we can resolve your little issue!
Sweet, there might be hope then. I figured it would come from the community before it did anywhere else. Any clue as to what the special character was that was killing it? Was it an @ or a $ by chance?
Well, it was a question mark. Go figure. It simply made it impossible to even run the script. So the first time, we created a tempadmin@vsphere.local account, with a simpler password (only letters and numbers), but the script failed. I don't remember the exact error message we got, but Googling didn't give us any new insights.
Does your password contain any character that isn't a simple letter or number? Then try changing it, it might help. If not, let me know. If we can tackle this issue with this KB article, then so can you :)
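A quick way to answer that question before running the script is to list every character in the password that is not a plain letter or number. This is only a sketch: the thread confirms trouble with a question mark, and treating every other non-alphanumeric character as suspect is an assumption, not something the KB states. The function name is made up for illustration.

```python
import string

# Characters the comment above calls "simple": ASCII letters and digits.
SAFE_CHARS = set(string.ascii_letters + string.digits)

def risky_characters(password):
    """Return the characters that are not ASCII letters or digits,
    in order of first appearance and without duplicates."""
    found = []
    for ch in password:
        if ch not in SAFE_CHARS and ch not in found:
            found.append(ch)
    return found

# Example with made-up passwords.
for candidate in ["Passw0rd", "Pa?s$w?rd"]:
    flagged = risky_characters(candidate)
    print(candidate, "->", flagged if flagged else "letters and numbers only")
```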
Same issue with SRM and certificate...
Cause:
Failed to connect to vCenter Server at https://fqdn:443/sdk. Reason: com.vmware.vim.vmomi.core.exception.CertificateValidationException: Server certificate chain not verified
Currently working with support, but no solutions for now.
I did not have password issues with mine.....for me I found that I had more than one URL in the MOB that had an incorrect ssltrust on it...i.e. an incorrect cert. So I ran the script on both sites for the different fingerprints to the correct certificate. Then I rebooted the servers for both sides at the same time. I also needed to make sure I followed step 9 and added the carriage return after the 64th character, or it did not work.
Ah, so it was the carriage return! We simply imported the certificate in Windows and then exported it again. Probably does the same, because that's what helped us out with that step :)
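The 64-character line break mentioned in the comments above can also be applied with a few lines of Python instead of by hand in an editor. This is a minimal sketch that assumes the file holds a single PEM certificate with the usual BEGIN/END markers; the function name and file paths are hypothetical, and it is not the method the KB itself describes.

```python
BEGIN = "-----BEGIN CERTIFICATE-----"
END = "-----END CERTIFICATE-----"

def rewrap_pem(pem_text, width=64):
    """Re-wrap the Base64 body of one PEM certificate at `width` characters per line."""
    start = pem_text.index(BEGIN) + len(BEGIN)
    end = pem_text.index(END)
    # Remove every existing line break and space inside the Base64 body.
    body = "".join(pem_text[start:end].split())
    # Cut the body into fixed-size chunks; the last chunk may be shorter.
    chunks = [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join([BEGIN] + chunks + [END]) + "\n"

# Usage (paths are placeholders):
#   with open(r"c:\certificates\new_machine.crt") as f:
#       fixed = rewrap_pem(f.read())
#   with open(r"c:\certificates\new_machine_wrapped.crt", "w") as f:
#       f.write(fixed)
```

Writing to a new file and keeping the original untouched makes it easy to compare the two before feeding either one to the script.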
Martin, did you ever come to resolution on this? Google brought me here. My certificates are proper, even an embedded PSC to keep things simple. Yet, I too get the dreaded "Failed to connect to lookup service" in SRM due to Server certificate chain not verified.
Getting tired of trial and error trying to fix what appears to be a bug, even in 6.0u1b :(
YES!!!
But Ugg! What a pain in the rear it was!
The article that is listed above, KB2121701, is what you have to follow. You have to follow it step by step to the letter too! Don't assume you already have anything. The long and the short of it is that after you install your PSC and vCenter and change your certificates from what they originally were, you have to go into the MOB and modify the certificates that are being seen in there. Again, KB2121701 is going to be your best friend on this, but OMG is it confusing! I finally did this about 3 weeks ago and got it all working....If you want to talk further about it, let me know.
you can email me at mcwells1974 at h o t m a i l . c o m
It is a junk mail email account but I will make sure I check it.
Good to hear! Both KB2121701 and KB2121689 (embedded psc) indicate that this is resolved with 6.0u1b but not until you place the certificates again.
I'll give it another go & be sure to report back here.
Thanks!
Yeah....they lie.
You can verify in the MOB what certificates are being used. If you save your certificate for the vCenter and open it in notepad, you can see what hash it is using and compare it to the MOB.
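For that comparison it can help to compute the thumbprint of the saved certificate directly. The sketch below assumes the certificate was saved as a single PEM (Base64) file and that the thumbprint wanted is the colon-separated SHA-1 form used by the --fingerprint value in the example command; the function name and file path are hypothetical. It uses only Python's standard ssl and hashlib modules.

```python
import hashlib
import ssl

def sha1_thumbprint(pem_text):
    """Return the SHA-1 thumbprint of a PEM certificate as AA:BB:CC:... (uppercase)."""
    # Convert the Base64 PEM text to the raw DER bytes the thumbprint is computed over.
    der = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    digest = hashlib.sha1(der).hexdigest().upper()
    # Insert a colon between every pair of hex digits.
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))

# Usage (path is a placeholder):
#   with open(r"c:\certificates\old_machine.crt") as f:
#       print(sha1_thumbprint(f.read()))
```

If the printed value differs from what the MOB shows for an entry, that entry is a candidate for the script.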
Well sir, I owe you a beer! What a rather irritating path to implementation, I'd much prefer the 5.5 way of doing certs as it was a pain in the ass, but at least you knew when it was going to work.
So glad I ran across your blog, and I'm serious about the beer if I come across you at a future vmworld!
All I did was replace my machine cert at the HQ and DR site to get rid of the nasty certificate errors when using vSphere web client. My SRM seems paired still, but vSphere replication is no longer working. When I go to manage each vR server at https://ipaddress:5490 and go to the configuration page, enter the SSO credentials , save and restart service - I do get the pop up about trusting the new certificate and it does show the thumbprint. I obviously hit accept, but then after a few moments it says Bad exit code: 1 at the top in red. Also VUM (vSphere Update Manager) is broken as well!
All I did was replace the machine (reverse http proxy) certificate with one generated by our MS CA as described in this excellent blog here:
http://www.virtually-limitless.com/certificates/replacing-or-implementing-ssl-certificates-in-vsphere-6/
We are running 6.0 update 2, and noticed all the KB's I've found that remotely resemble this issue with VR or other services say that it was an issue that was fixed with 6.0 1b, which we jumped right past (upgraded from 5.0).
Keith, Did you check the MOB website? Did you check which certificate it is trying to use for those functions? I used an in-house cert to make my PSCs into Subordinate CAs then from there I did the rest of the work on my SRM stuff.
Ok found a "Known issue" for vSphere replication 6.0 that also seems to apply to vSphere replication 6.1.1 (that we are using). You apparently have to power off the vSphere replication appliance from vmware web client, and then power it back on. It does something with the registration of the ovf doing it that way, rather than restarting it through the web interface port 5490.
Once this was done I was able to go back in the configuration page in the web interface and accept the certificate.
Hi Martin, I found this page looking for information on integrating my VMCA (default mode) with external PSC's to SRM 6. Can you provide any information on how to do this, or is it done automatically when installing SRM?
The VMCA is a completely separate thing. As it has been a while since I installed it, I went looking again, and this is what I found for the install:
https://www.derekseaman.com/2015/04/vsphere-6-0-install-pt-11-vmca-as-subordinate.html
I installed the VMCA as a subordinate to my MS AD integrated certificate server. I had to publish subordinate root certificates from the root CA. Is there something in particular you are needing help with?
https://kallesplayground.wordpress.com/2018/06/06/vmware-vcenter-6-5-u2-and-srm-8-1-server-certificate-chain-is-not-trusted-and-thumbprint-verification-is-not-configured/