That's a much better case for using the url mapper, yes.
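
As a rough illustration of the host rewrite being asked about, here is a minimal sketch in plain Java using java.util.regex. It is not the ManifoldCF URL mapper itself and does not use its configuration syntax; the class name HostMapSketch, the method mapHost, and the pattern are assumptions made up for this example, and only the two host names come from the message below.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ManifoldCF: rewrites the staging host
// to the public, load-balanced host and leaves every other URL alone.
final class HostMapSketch {
    // Assumed pattern: the staging host followed by an optional path.
    private static final Pattern STAGING =
        Pattern.compile("^http://mystaging-server\\.myco\\.com(/.*)?$");

    static String mapHost(String url) {
        Matcher m = STAGING.matcher(url);
        if (!m.matches()) {
            return url; // not a staging URL, so return it unchanged
        }
        // A bare host with no path is mapped to the root of the public site.
        String path = (m.group(1) == null) ? "/" : m.group(1);
        return "https://www.myco.com" + path;
    }

    public static void main(String[] args) {
        System.out.println(mapHost("http://mystaging-server.myco.com/index.html"));
    }
}
```

The same idea carries over to a mapper rule: one capture group for the path, and a replacement that prepends the public scheme and host.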

On Thu, May 28, 2020 at 1:40 PM Michael Cizmar <michael.ciz...@mcplusa.com> wrote:

> Right. Another case that I'm exploring: crawling an internal site and
> wanting a load-balanced url. So you would crawl something like this:
>
> http://mystaging-server.myco.com/index.html
>
> and then want to change it to:
>
> https://www.myco.com/index.html
>
> Is that better for the url mapper?
>
> --
> Michael Cizmar
> Managing Director
>
> ------------------------------
> *From:* Karl Wright <daddy...@gmail.com>
> *Sent:* Thursday, May 28, 2020 12:03 PM
> *To:* user@manifoldcf.apache.org <user@manifoldcf.apache.org>
> *Subject:* Re: URL Mapping
>
> Thanks! It's far better to implement this than to try and hack it. A
> general way of removing session information with regular expressions is
> probably not going to cut it either, so for now it's got to be in Java.
>
> Karl
>
>
> On Thu, May 28, 2020 at 12:47 PM Michael Cizmar <michael.ciz...@mcplusa.com> wrote:
>
> The "!ut" followed by a bunch of session information is from WebSphere Portal.
> Some information about it here:
>
> https://books.google.com/books?id=bqAXnpmj5LwC&pg=PA180&lpg=PA180&dq=%22!ut%22+session+variables+websphere#v=onepage&q=%22!ut%22%20session%20variables%20websphere&f=false
>
> I'll look at making a change to the web crawler to support this, like the BV
> and ASP.NET sessions.
>
> ------------------------------
> *From:* Karl Wright <daddy...@gmail.com>
> *Sent:* Thursday, May 28, 2020 11:41 AM
> *To:* user@manifoldcf.apache.org <user@manifoldcf.apache.org>
> *Subject:* Re: URL Mapping
>
> Hi,
>
> There are provisions in the URL canonicalization part of the world for
> removal of session information from the URL. It only knows about some
> kinds of widely used sessions: Java app server sessions, for example,
> Broadvision sessions, etc. If you can convince me that your session
> information is (a) uniquely identifiable, and (b) commonly used, the proper
> approach is to incorporate session removal in this framework. Please let
> me know.
>
> Karl
>
>
> On Thu, May 28, 2020 at 12:11 PM Michael Cizmar <michael.ciz...@mcplusa.com> wrote:
>
> I've got a really long url with a bunch of unnecessary session query
> string parameters. I've been trying unsuccessfully to map it to the same
> url without the session.
>
> An example of the url is below. I thought I could do this:
>
> url map regular expression:
>
> (.*)\/!ut
>
> replacement configuration:
>
>
>
> So the goal would be for the url to be:
>
> http://localhost:8080/mcplusa/myportal/agents/portal/quoteenroll/digs%20-%20quoting%20%20enrollment%20(individual)/
>
> But the url gets rejected.
>
> Sample Crawl Url
>
>
> http://localhost:8080/mcplusa/myportal/agents/portal/quoteenroll/digs%20-%20quoting%20%20enrollment%20(individual)/!ut/p/a1/rZHLTsMwEEV_hS6yjDx5OWZpdRFImzYCAYk3lZM6D5TYSWoqPh8HFu2GQhHejEeae-aOLmIoQ0zyY1tz3SrJu7lneLfdBtTxI1iRhzsMFEfrpZ_6AFFoBnIzAN88Cj_pXxBDrJR60A3KeS2kvimV1KZaMKhJ886C8U1pIeSkOtNM3Pz5QewO3IJG9WIGDGW7RzkB7hZFIWxyyx3bL8LAJo6L7QoELitMPAH7r4WXLefmpvBkOoqfiTHth6vYTRxIAT1eufMy8D74Z2DqXg2Mf5Fz-zqOjJq05nzeNcr-FpchuVOyTGpjkOvGbmWlUHYmQtmZCGWfoqF_6omHq83G5gUBL-iOa0oXiw9FOxLu/dl5/d5/L0lJS2FZcHBpbW1LYVlwcGltbVlwcGchIS9vSHd3QUFBSXdpRUFJSkRBQ1VZaUVJVTVCZ09DbFFBQUlBQVNvU0FyUnFBQURBQWF0QXdMTzlRQUFFQUJ3WWVBR0tTQUFDa0k1Z21HU3dTaXJTQUFDZ0s5ZzBIUS80SmlHcGhxRWFoR29ScUVhbEdwaC9aNl9PTzVBMTRHMEs4Ukg2MEE2R0xDNFA0MDBHNy9hZ2VudCBjb250ZW50JTBwb3J0YWwlMHF1b3RlZW5yb2xsJTBkaWdzIC0gcXVvdGluZyAgZW5yb2xsbWVudCAoaW5kaXZpZHVhbCkvZjQ0YmEyOWUtODQwOC00YjFlLTg4MzktMTFlMjI4NDgxYTVhL2RpZ3MgLSBxdW90aW5nICBlbnJvbGxtZW50IChpbmRpdmlkdWFsKQ
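
Since the thread concludes that the "!ut" removal has to be done in Java rather than with a mapping regex, here is a minimal sketch of what such a step could look like. It is not ManifoldCF code: the class PortalStateSketch and the method stripPortalState are hypothetical names, and treating everything from "/!ut/" onward as removable state is an assumption based only on the sample url above. The long state blob is replaced by the placeholder "p/a1/EXAMPLE" to keep the example readable.

```java
// Hypothetical helper, not part of ManifoldCF: drops the "!ut" segment
// and everything after it, keeping the trailing slash of the page path.
final class PortalStateSketch {
    // Assumption: the navigational state always begins at "/!ut/".
    private static final String MARKER = "/!ut/";

    static String stripPortalState(String url) {
        int idx = url.indexOf(MARKER);
        // Keep the slash that precedes "!ut"; leave other URLs untouched.
        return (idx < 0) ? url : url.substring(0, idx + 1);
    }

    public static void main(String[] args) {
        String sample = "http://localhost:8080/mcplusa/myportal/agents/portal/"
            + "quoteenroll/digs%20-%20quoting%20%20enrollment%20(individual)/"
            + "!ut/p/a1/EXAMPLE"; // placeholder for the real state blob
        System.out.println(stripPortalState(sample));
    }
}
```

A real implementation would also need to confirm that "!ut" uniquely identifies this kind of state, which is the condition raised earlier in the thread.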