- an end user complains that he/she can't access blah-blah-blah, with the blah-blah-blah being a server, printer, or some other resource on the network
- your firewall guys shrug their shoulders or are just plain "not helpful", and are quick to point the finger at you ( the networking team )
- the application team claims the server is up and happy, but that's about it, and "leave me alone"
- the location is a remote branch and you have no technical resource who can log in to the server for a netstat -an or any review of the local connection table
- the end user/client is leaning on you for a fix, direction, or escalation, and they are frustrated
- you have no SPAN, no port tap, no sniffer; basically you are limited
- everybody believes it's your problem, but the rest of the IT staff offers little or no help in isolating the issue
- you are called out of bed at 02:00 to look into the matter
Okay, we have all been in the above situations at one time or another in our careers, right? Or at least had a little bit of this going on?
The above is a typical day for any IT network gal/dude. And without the means to debug and diagnose your network topology, we are stuck in the maze of finger pointing.
Have no fear,
I have a tip that can make your life easier when you have different departments and teams that don't work together effectively, or work independently with bad collaboration and a whole bunch of finger pointing.
Take this drawing that I drew up, and follow me along the journey of network diagnostics, where you are blinded and limited in what you can do.
In this scenario, it's a simple VPN firewall managed by an outside group that's a good player at "point the finger", a simple core ( your network gear ), and the server. Okay, very simple indeed.
The business application server is managed by another team that also points the finger. They are never around, and just plain not competent in their duties. I do a lot of debugging for them because of this.
The end user/client { 172.16.17.15 } is actually a payroll lady who updates payroll details or whatever she does on this server. It was working correctly last week, but today she's having problems. The firewall group in this case gave her a dedicated webvpn ip_address of { --> 172.16.17.15 } and the business app server that she accesses is located at { --> 172.16.17.88:356/tcp }. All she sees in her dashboard is that she can't connect to the server. She's upset and can't do her work.
Okay, simple. Or so we think! But really it is :)
You want to quickly rule your network out as the cause of the lack of connectivity, and you want to validate that the firewall team hasn't made any changes to the previous fwpolicies.
We have a way of doing this using just our cisco layer2/3 switches, with no attached sniffer.
It involves a little bit of hacking around with the cisco debug function against an access-list. The ACL that we will create is NOT applied to any interface, nor do we modify any existing ACLs.
Okay, watch and learn;
1st:
I like to create a debug file for holding the data. You can set this up on all switches/routers in the path between the end user/client and the server(s).
You could let it run to the logging buffer and/or remote syslog, but a small file in flash will not hurt anything.
On my bigger switches, like a 6500/7600 with a spare CF slot, I purposely install bigger CF memory in these slots for a backup IOS image, backups of the running-config, and for things like what I'm posting here.
Here's my small restricted debug file;
-->
config t
logging file flash:debug 4096 debugging
end
Notice I made it 4096 bytes and told it to log messages of severity debugging or higher. I could have made it bigger, but 4096 bytes was ideal in this case.
2nd:
Next, I validate the file with a cli dir cmd;
-->
dir flash:debug
Directory of flash:/debug
3 -rwx 42 Jan 11 2013 07:25:48 -04:00 debug
It's also a good idea to run "show logging" and validate the log services. You should see a line that looks like the following;
-->
show logging
( output reduced to show only the important information )
File logging: file flash:debug, max size 4096, min size
3rd:
Okay, we are almost ready, and now the fun part. You craft a simple ACL.
Key things to think about during this process;
- be specific in your src/dst/protocol/port ( basically, what are you looking for )
- DO NOT do an "ip any any"
- DO NOT apply this ACL to any interface ( it's not required for this level of diagnostic )
- make pretty damn sure the number you pick is NOT currently in use; basically, find a new unused number in the extended ACL range
- as of the Cisco 12.4 ios-train, you cannot debug a named ACL
- install remarks in this ACL, like the case/ticket/date, so you or other engineers know what it's for if you should happen to leave it in accidentally or for any extended time ( gives you a clue later on, so you aren't saying WTF )
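The "find an unused number" bullet can be checked off-box before you touch the switch. Here is a minimal Python sketch, assuming you have saved the output of "show running-config" as text. The function names and the sample config are hypothetical, and it only looks at the numbered extended ranges ( 100-199 and 2000-2699 );

```python
import re

# Numbered extended IP ACL ranges on Cisco IOS.
EXTENDED_RANGES = (range(100, 200), range(2000, 2700))

# Matches "access-list 111 ..." as well as "ip access-list extended 111".
ACL_LINE = re.compile(
    r"^\s*(?:ip\s+)?access-list\s+(?:extended\s+|standard\s+)?(\d+)\b",
    re.MULTILINE,
)


def used_acl_numbers(config_text):
    """Return the set of ACL numbers that appear in the saved config text."""
    return {int(m.group(1)) for m in ACL_LINE.finditer(config_text)}


def first_free_extended_acl(config_text):
    """Return the lowest unused extended ACL number, or None if all are taken."""
    used = used_acl_numbers(config_text)
    for acl_range in EXTENDED_RANGES:
        for number in acl_range:
            if number not in used:
                return number
    return None


if __name__ == "__main__":
    # Hypothetical config snippet, for illustration only.
    sample = """
access-list 100 permit ip any any
access-list 101 remark hypothetical existing entry
ip access-list extended 102
ip access-list extended PAYROLL
"""
    print(first_free_extended_acl(sample))
```

Named ACLs such as PAYROLL are skipped on purpose, since they don't consume a number.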
In my case and for this example, I was only interested in the TCP protocol and the 1st part of the 3-way tcp handshake, the SYN.
If I can see the SYN in my debug messages, then I know the client's connection is being allowed through, so I can rule out the firewall team, and the network is obviously correct up to wherever my debug ACL sits, on either sw1 or sw2.
Here's my ACL ( substitute your own client and server addresses );
-->
access-list 111 remark ticket326783 reqst pty-dept-bs-app Charlie O'riley 20130102
access-list 111 permit tcp host 172.16.99.55 host 172.16.129.10 syn
Other match options are available at the end of the entry ( from the IOS "?" help );
-->
ack          Match on the ACK bit
dscp         Match packets with given dscp value
eq           Match only packets on a given port number
established  Match established connections
fin          Match on the FIN bit
fragments    Check non-initial fragments
gt           Match only packets with a greater port number
log          Log matches against this entry
log-input    Log matches against this entry, including input interface
lt           Match only packets with a lower port number
neq          Match only packets not on a given port number
option       Match packets with given IP Options value
precedence   Match packets with given precedence value
psh          Match on the PSH bit
range        Match only packets in the range of port numbers
rst          Match on the RST bit
syn          Match on the SYN bit
time-range   Specify a time-range
tos          Match packets with given TOS value
urg          Match on the URG bit
It's up to you as to what information you need to inspect for.
Since this lady only gave me the ip_address of the source and destination, that's what I installed into my ACL. I could just as well have written it like any of the following;
access-list 111 permit tcp host 172.16.99.55 host 172.16.129.10 eq 356
or
access-list 111 permit ip host 172.16.99.55 host 172.16.129.10
or
access-list 111 permit tcp host 172.16.99.55 gt 1024 host 172.16.129.10 eq 356
Whatever you do, be very careful in selecting the information that you want to debug on. Also, you're not going to capture data, so I personally couldn't care less about looking at the full flow of data between client and server. The syn/syn-ack would be ideal in this setup.
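If you build these entries often, the "be specific" rule can be enforced before anything is pasted into a switch. Here is a rough Python sketch under my own assumptions: build_debug_acl and its parameters are hypothetical names, it only handles host-to-host TCP entries, and it just returns the config lines as text;

```python
import ipaddress

# Numbered extended IP ACL ranges on Cisco IOS.
EXTENDED_RANGES = (range(100, 200), range(2000, 2700))


def build_debug_acl(number, src, dst, dport=None, syn_only=False, remark=None):
    """Return the config lines for a narrow, host-to-host TCP debug ACL."""
    if not any(number in acl_range for acl_range in EXTENDED_RANGES):
        raise ValueError("ACL number must be in the extended range")

    # IPv4Address raises ValueError for anything that is not a single
    # address, which also rejects "any".
    src_ip = ipaddress.IPv4Address(src)
    dst_ip = ipaddress.IPv4Address(dst)

    entry = f"access-list {number} permit tcp host {src_ip} host {dst_ip}"
    if dport is not None:
        if not 0 < dport < 65536:
            raise ValueError("destination port out of range")
        entry += f" eq {dport}"
    if syn_only:
        entry += " syn"

    lines = []
    if remark:
        # Remark first, so the ticket/date sits above the entry it explains.
        lines.append(f"access-list {number} remark {remark}")
    lines.append(entry)
    return lines


if __name__ == "__main__":
    for line in build_debug_acl(111, "172.16.99.55", "172.16.129.10",
                                dport=356, syn_only=True,
                                remark="ticket326783 20130102"):
        print(line)
```

Passing "any" for either address raises an error instead of quietly producing a wide-open entry, which is the whole point.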
Now, after the ACL is crafted with the fields/ports/src+dst addresses we selected, we can execute our debug against the ACL.
( i.e router#debug ip packet 111 detail )
When done, your screen should echo something similar to the following;
-->
debug ip packet 111 detail
IP packet debugging is on (detailed) for access list 111
term mon
And now you monitor the debug file for any output that it collects when the user attempts to make a session.
cmd: more flash:filename
e.g.
core01:
-->more flash:debug
( a snippet of the information in that file )
Jan 11 11:25:21.402: IP: s=172.16.17.15(Vlan123), d=172.16.17.88.356, len 64, input feature
.Jan 11 11:25:21.402: TCP src=49212, dst=356, seq=2432073100, ack=0, win=8192 SYN, MCI Check(63), rtype 0, forus FALSE, sendself FALSE, mtu 0
.Jan 11 11:25:21.402: IP: s=172.16.17.15(Vlan123), d=172.16.17.88.356, len 64, rcvd 1
.Jan 11 11:25:21.402: TCP src=49212, dst=356, seq=2432073100, ack=0, win=8192 SYN
.Jan 11 11:25:34.019: IP: s=172.16.17.15(Vlan123), d=172.16.17.88.356, len 64, input feature
.Jan 11 11:25:34.019: TCP src=49213, dst=356, seq=2938999768, ack=0, win=8192 SYN, MCI Check(63), rtype 0, forus FALSE, sendself FALSE, mtu 0
.Jan 11 11:25:34.019: IP: s=172.16.17.15(Vlan123), d=172.16.17.88.356, len 64, rcvd 1
.Jan 11 11:25:34.019: TCP src=49213, dst=356, seq=2938999768, ack=0, win=8192 SYN
.Jan 11 11:25:35.789: IP: s=172.16.17.15(Vlan123), d=172.16.17.88.356, len 64, input feature
.Jan 11 11:25:35.789: TCP src=49214, dst=356, seq=880617259, ack=0, win=8192 SYN, MCI Check(63), rtype 0, forus FALSE, sendself FALSE, mtu 0
.Jan 11 11:25:35.789: IP: s=172.16.17.15(Vlan123), d=172.16.17.88.356, len 64, rcvd 1
.Jan 11 11:25:35.789: TCP src=49214, dst=356, seq=880617259, ack=0, win=8192 SYN
.Jan 11 11:25:36.796: IP: s=172.16.17.15(Vlan123), d=172.16.17.88.356, len 64, input feature
.Jan 11 11:25:36.796: TCP src=49215, dst=356, seq=2038341671, ack=0, win=8192 SYN, MCI Check(63), rtype 0, forus FALSE, sendself FALSE, mtu 0
.Jan 11 11:25:36.796: IP: s=172.16.17.15(Vlan123), d=172.16.17.88.356, len 64, rcvd 1
.Jan 11 11:25:36.796: TCP src=49215, dst=356, seq=2038341671, ack=0, win=8192 SYN
.Jan 11 11:25:37.794: IP: s=172.16.17.15(Vlan123), d=172.16.17.88.356, len 64, input feature
.Jan 11 11:25:37.794: TCP src=49216, dst=356, seq=489952225, ack=0, win=8192 SYN, MCI Check(63), rtype 0, forus FALSE, sendself FALSE, mtu 0
.Jan 11 11:25:37.794: IP: s=172.16.17.15(Vlan123), d=172.16.17.88.356, len 64, rcvd 1
.Jan 11 11:25:37.794: TCP src=49216, dst=356, seq=489952225, ack=0, win=8192 SYN
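If you copy the logfile off the switch, a few lines of Python can boil output like the above down to "how many distinct SYNs did I see". This is only a sketch: the regex is written against the line format shown in this post ( other IOS versions may print it differently ), and the function name is my own. Each SYN shows up twice in the debug, so the sketch de-duplicates on source port, destination port, and sequence number;

```python
import re

# Written against the "TCP src=..., dst=..., seq=..., ack=..., win=... FLAGS"
# lines shown in this post; treat the exact format as an assumption.
TCP_LINE = re.compile(
    r"TCP src=(\d+), dst=(\d+), seq=(\d+), ack=(\d+), win=\d+ (.*)$"
)


def distinct_syns(log_text):
    """Return sorted (src_port, dst_port, seq) tuples for bare SYN packets."""
    seen = set()
    for line in log_text.splitlines():
        match = TCP_LINE.search(line)
        if not match:
            continue
        sport, dport, seq, _ack, rest = match.groups()
        # The flags are the words before the first comma, e.g. "SYN".
        flags = rest.split(",")[0].split()
        if "SYN" in flags and "ACK" not in flags:
            seen.add((int(sport), int(dport), int(seq)))
    return sorted(seen)


if __name__ == "__main__":
    sample = """\
.Jan 11 11:25:34.019: IP: s=172.16.17.15(Vlan123), d=172.16.17.88.356, len 64, input feature
.Jan 11 11:25:34.019: TCP src=49213, dst=356, seq=2938999768, ack=0, win=8192 SYN, MCI Check(63), rtype 0, forus FALSE, sendself FALSE, mtu 0
.Jan 11 11:25:34.019: IP: s=172.16.17.15(Vlan123), d=172.16.17.88.356, len 64, rcvd 1
.Jan 11 11:25:34.019: TCP src=49213, dst=356, seq=2938999768, ack=0, win=8192 SYN
"""
    for sport, dport, seq in distinct_syns(sample):
        print(f"SYN from port {sport} to port {dport}, seq {seq}")
```

A new source port and sequence number on every SYN, as in the output above, means the client keeps opening fresh attempts rather than retransmitting one.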
Once again, if you execute the above at various points throughout the network infrastructure, you can quickly prove or disprove that the client is allowed, and whether any upstream ACL, rule, or other device has prevented access. If you have multiple systems in the path, you can take a systematic approach and start close to the src ( firewall ) or the dest ( server ). I did the above debug on the last switch, #2, in this example.
In this scenario, it was found that the business-app software had a pseudo internal error, causing the service to show as running and the task manager window to indicate all was fine, but in reality the service was NOT listening or responding to client requests. So the application team was dragged out of bed and the issue was escalated to them.
Okay, I'm sorry, firewall guys. It wasn't your fault. This time :)
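The hop-by-hop reasoning can be written down as a tiny Python sketch. It assumes you have already run the debug on each device and noted, in path order from the client side toward the server, whether the SYN was seen there. The function name and hop names are hypothetical;

```python
def localize(path_observations):
    """Walk the path in order and return (last_hop_seen, first_hop_missing).

    path_observations is a list of (hop_name, syn_seen) pairs ordered from
    the client side toward the server. first_hop_missing is None when the
    SYN was seen on every hop that was checked.
    """
    last_seen = None
    for hop, syn_seen in path_observations:
        if not syn_seen:
            return last_seen, hop
        last_seen = hop
    return last_seen, None


if __name__ == "__main__":
    seen_at, missing_at = localize([("sw1", True), ("sw2", True)])
    if missing_at is None:
        print(f"SYN reached {seen_at}; look past the network, toward the server")
    else:
        print(f"SYN last seen at {seen_at}, missing at {missing_at}")
```

If the SYN is seen on the last switch in front of the server, the firewall and the network have both done their job, and the place to look is the server or the application.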
On a final note: the logfile shown in this example logs all messages, including other console messages. You could play with tcl scripting ( aka tickle ) to clean up the logfile output.
For example, here's a very basic tcl script;
more flash:logfilter.tcl
# filter to show only the log messages that don't have the TIMEZONE in them
#
dir flash:debug
#
# exclude any line that contains the EST timezone
more flash:debug | exclude EST: ;
The script does a "dir" of the flash:debug file and then "more"s out the contents, but excludes anything with the EST timezone in it. It can be executed using any one of the following means;
Router#tclsh flash:logfilter.tcl
or
Router#tclsh logfilter.tcl
If I ever get un-lazy, I will add to that tcl script so it first checks that the file exists and is more than 0 bytes before displaying the content. My tcl scripting experience is just as bad as my perl scripting :)
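For what it's worth, here is that "check it exists and isn't empty first" idea sketched in Python rather than tcl, for a copy of the logfile pulled off the switch. It does not run on the router, and the function names and the local file name are my own assumptions;

```python
import os


def filter_lines(text, exclude="EST:"):
    """Keep only the lines that do not contain the exclude string."""
    return [line for line in text.splitlines() if exclude not in line]


def read_filtered(path, exclude="EST:"):
    """Return the filtered lines, or an empty list if the file is missing or empty."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return []
    with open(path, encoding="utf-8", errors="replace") as handle:
        return filter_lines(handle.read(), exclude)


if __name__ == "__main__":
    # "debug" is assumed to be a local copy of flash:debug.
    for line in read_filtered("debug"):
        print(line)
```

The filter is a plain substring match, the same as the "exclude EST:" in the tcl script.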
Remember that the logfile we created will roll over at 4096 bytes, and will collect debug messages ( severity 7 ) and anything more severe. You can increase the file size, but beware that it will chew up storage space on the flash/disk/bootflash device. Chassis-style switches or routers with spare storage slots can benefit from my earlier suggestion of populating the spare slot with a CF ( Compact Flash ) card.
I hope you found this post interesting, and I challenge you to explore debugging w/ACL when all else fails and you are limited by location, available personnel, or the lack of a sniffer for inspecting packets.
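To get a feel for how far 4096 bytes goes, here is a back-of-the-envelope Python sketch. It assumes each connection attempt produces the four debug lines shown earlier in this post, and that each line costs its length plus one byte for the line ending. The function name is hypothetical, and other console messages landing in the same file will eat into the budget;

```python
def attempts_that_fit(max_bytes, lines_per_attempt):
    """Estimate how many logged connection attempts fit in max_bytes."""
    # Each logged line is counted as its length plus one byte of line ending.
    bytes_per_attempt = sum(len(line) + 1 for line in lines_per_attempt)
    if bytes_per_attempt == 0:
        raise ValueError("need at least one non-empty line per attempt")
    return max_bytes // bytes_per_attempt


if __name__ == "__main__":
    one_attempt = [
        ".Jan 11 11:25:34.019: IP: s=172.16.17.15(Vlan123), d=172.16.17.88.356, len 64, input feature",
        ".Jan 11 11:25:34.019: TCP src=49213, dst=356, seq=2938999768, ack=0, win=8192 SYN, MCI Check(63), rtype 0, forus FALSE, sendself FALSE, mtu 0",
        ".Jan 11 11:25:34.019: IP: s=172.16.17.15(Vlan123), d=172.16.17.88.356, len 64, rcvd 1",
        ".Jan 11 11:25:34.019: TCP src=49213, dst=356, seq=2938999768, ack=0, win=8192 SYN",
    ]
    print(attempts_that_fit(4096, one_attempt))
```

If the estimate looks too tight for the amount of testing you plan to do, that is the cue to bump the file size or tighten the ACL.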
Ken Felix
Freelance Network/Security Engineer
kfelix at hyperfeed d-o-t com