Hi,

sorry, I haven't had time yet, but I will have a look at it this weekend.


Best


Peter


On 10.08.2022 at 09:00, Md Azaz Ali wrote:
Hi Dr. Peter,

Did you have time to look at the gist? If any more information is required,
please let me know.

We also recently found a text that throws a stack overflow on a local system
with the same Ruta script. The text is shared below. It is part of an email
that gets converted to Base64; maybe some special symbol or something else
is present in the email body. The script breaks on this text, but we are
still not sure which texts actually caused the OOM in prod.


IkVtcGxveWVlIE5hbWUiLCJFbXBsb3llZSBDb2RlIiwiRW1wbG95ZWUgU3RhdHVzIiwi
RHJpdmVyIElEIiwiRGVmaWNpZW5jeSBDb3VudCIsIkxvY2F0aW9uIiwiRm9ybSIsIkZp
bGUgTmFtZSIsIk1lc3NhZ2UgTnVtYmVyIiwiTWVzc2FnZSBEZXNjcmlwdGlvbiIsIkV4
cGlyZWQgRGF0ZSINCiJSdWJlbiBFc2NvYmVkbyIsIjE3MDY2NjkiLCJJbiBQcm9jZXNz
IiwiMTQ2NjQyMSIsIjYiLCJEZXBlbmRhYmxlIExvZ2lzdGljcyBMTENfRElJMyIsIkFN
QkdDIiwiQW1hem9uIEJhY2tncm91bmQgU3RhdHVzIEZpbGUiLCI5MjUiLCJBbWF6b24g
QmFja2dyb3VuZCBDaGVjayBTdGF0dXMgUGVuZGluZyIsIiINCiJSdWJlbiBFc2NvYmVk
byIsIjE3MDY2NjkiLCJJbiBQcm9jZXNzIiwiMTQ2NjQyMSIsIjYiLCJEZXBlbmRhYmxl
IExvZ2lzdGljcyBMTENfRElJMyIsIkFNWk9UIiwiQW1hem9uIENvbmR1Y3RlZCBUcmFp
bmluZyBSZXF1aXJlbWVudHMiLCIwIiwiRG9jdW1lbnQgTWlzc2luZyIsIiINCiJSdWJl
biBFc2NvYmVkbyIsIjE3MDY2NjkiLCJJbiBQcm9jZXNzIiwiMTQ2NjQyMSIsIjYiLCJE
ZXBlbmRhYmxlIExvZ2lzdGljcyBMTENfRElJMyIsIk1FTlJWIiwiRHJpdmVyIFF1YWxp
ZmljYXRpb24iLCIwIiwiRG9jdW1lbnQgTWlzc2luZyIsIiINCiJSdWJlbiBFc2NvYmVk
byIsIjE3MDY2NjkiLCJJbiBQcm9jZXNzIiwiMTQ2NjQyMSIsIjYiLCJEZXBlbmRhYmxl
IExvZ2lzdGljcyBMTENfRElJMyIsIlJUUlRDIiwiRHJpdmVyIFF1YWxpZmljYXRpb24i
LCIwIiwiRG9jdW1lbnQgTWlzc2luZyIsIiINCiJSdWJlbiBFc2NvYmVkbyIsIjE3MDY2
NjkiLCJJbiBQcm9jZXNzIiwiMTQ2NjQyMSIsIjYiLCJEZXBlbmRhYmxlIExvZ2lzdGlj
cyBMTENfRElJMyIsIkFQUCIsIkRyaXZlciBRdWFsaWZpY2F0aW9uIiwiMTE3IiwiUFJF
VklPVVMgRU1QTE9ZTUVOVCBBRERSRVNTIElORk9STUFUSU9OIE1JU1NJTkcvSU5DT01Q
TEVURSIsIiINCiJSdWJlbiBFc2NvYmVkbyIsIjE3MDY2NjkiLCJJbiBQcm9jZXNzIiwi
MTQ2NjQyMSIsIjYiLCJEZXBlbmRhYmxlIExvZ2lzdGljcyBMTENfRElJMyIsIk1FQyIs
IkRyaXZlciBRdWFsaWZpY2F0aW9uIiwiMCIsIkRvY3VtZW50IE1pc3NpbmciLCIiDQo=
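To see what a payload like the one above contains before feeding it to a Ruta script, it can be decoded locally. The following is a minimal Python sketch, not part of the original message: the function names `decode_payload` and `summarize` are hypothetical, and the sample is only the first line of the Base64 block above. It assumes the payload is UTF-8 text.

```python
import base64

# First line of the Base64 block quoted above (68 characters, so no padding is needed).
SAMPLE = "IkVtcGxveWVlIE5hbWUiLCJFbXBsb3llZSBDb2RlIiwiRW1wbG95ZWUgU3RhdHVzIiwi"


def decode_payload(b64_text: str) -> str:
    """Decode Base64 text (line wrapping allowed) into a string, assuming UTF-8."""
    compact = "".join(b64_text.split())  # remove line breaks added by the mail client
    return base64.b64decode(compact).decode("utf-8", errors="replace")


def summarize(text: str) -> dict:
    """Collect a few properties that help when building a minimal reproduction."""
    return {
        "characters": len(text),
        "lines": len(text.splitlines()),
        # Characters outside ASCII are candidates for "special symbols".
        "non_ascii": sorted({ch for ch in text if ord(ch) > 127}),
    }


if __name__ == "__main__":
    decoded = decode_payload(SAMPLE)
    print(decoded)
    print(summarize(decoded))
```

The sample line decodes to the beginning of a quoted, comma-separated header row. Running the same two functions over the complete block would show whether the text contains any non-ASCII characters at all.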



Thanks

On Sun, Aug 7, 2022 at 11:59 AM Md Azaz Ali <[email protected]> wrote:

Hi Dr. Peter,


Sorry for not being able to clarify it. I have created a gist.

The gist below has the address.ruta file, with one example attached to each
of the two rules.

https://gist.github.com/azazali30/635c3b80e02908e9f8387db3fda865db


Many Thanks





On Sat, Aug 6, 2022 at 4:11 PM Peter Klügl <[email protected]> wrote:

Hi,


I had a quick look at the rules. Given the examples you provided, only
the first rule matches three times, the second rule not once.

So I have to ask before I can refactor the rules: what should the rules
annotate exactly?


Best


Peter


On 05.08.2022 at 11:44, Md Azaz Ali wrote:
Hi Dr. Peter Klügl,

Yes, they are the same as on StackOverflow.

On Fri, Aug 5, 2022 at 12:48 PM Peter Klügl <[email protected]> wrote:

Hi,


the attachments are removed by the mailing list. Are the rules the same
as in the StackOverflow question?


Best,


Peter

On 04.08.2022 at 20:15, Md Azaz Ali wrote:
Hi Dr. Peter,

Here are some example addresses that the attached Ruta script is able to
find.
Two Ruta rules are used: one for multiline addresses and the other for
single-line addresses.
We are also using a prepopulated EntityType annotation with the feature
location_indicator.



//Annotation EntityType with feature location_indicator is already present = Georgia

11175 Cicero Drive
Suite 200
Alpharetta, Georgia 30022



//EntityType with feature location_indicator is already present = Cambridge;MA;U.S.A

One Rogers Street
Cambridge, MA
02142-1209
U.S.A

//EntityType with feature location_indicator is already present = Cambridge, MA, U.S.A.
1120 Avenue of the Americas
4th Floor
New York, NY 10036
U.S.A.


//EntityType with feature location_indicator is already present = U.S.A
11175 Cicero Drive
Suite 200
Alpharetta, Georgia 30022
U.S.A

//EntityType with feature location_indicator is already present = U.S.A
My new address is
8 Commerce Dr.
Suite 3B
Bedford, NH 03110
U.S.A


//EntityType with feature location_indicator is already present = U.S.A.
400 Renaissance Center Drive
Suite 2600
Detroit, MI 48243
U.S.A.

//EntityType with feature location_indicator is already present = U.S.A.
125 Wacker Drive
Suite 300
Chicago, IL 60606
U.S.A.

//EntityType with feature location_indicator is already present = U.S.A.

1120 Avenue of the Americas
4th Floor
New York, NY 10036
U.S.A.


222 West Las Colinas Blvd. Suite 1650 North Tower Millennium Center
Irving, TX 75039 U.S.A.


Block No. 9A, Pritech Park SEZ, RMZ Ecospace Internal Road, Bellandur,
Bengaluru, Karnataka 560103, India



Thanks & Regards
Md Azaz Ali

On Thu, Aug 4, 2022 at 5:42 PM Peter Klügl <[email protected]> wrote:

      Hi,


      yes, I can suggest some refactored rules.

      However, I do not know the common input data and the use cases. It is
      easier for me if I have a few representative input snippets I can test
      the refactored rules against. Can you provide some (artificial) example
      text snippets?


      Best


      Peter


      On 04.08.2022 at 11:33, Md Azaz Ali wrote:
      > Hi Dr. Peter Klügl,
      >
      >
      > 1. We are not able to upgrade to Ruta 3.x because we would have to
      > upgrade uimaj-core as well, and for that we need a stable version of
      > cleartk-ml (which is not working with UIMA 3.x).
      >
      > 2. For PARAM_MAX_RULE_MATCHES and PARAM_MAX_RULE_ELEMENT_MATCHES, we
      > are not sure what number will be good enough.
      >
      > 3. If possible, can you please suggest an improved version of the
      > above script? It would really help.
      >
      > 4. Getting a new build from main-v2 is also not possible because we
      > can only use GA versions that are available directly in the Maven
      > repository.
      >
      > I am attaching one script file. If you can suggest possible
      > improvements, it will be really helpful.
      >
      > Note: I am new to Ruta, and these Ruta scripts were written by former
      > developers in my company who are no longer associated with the
      > company.
      >
      > Many Thanks
      >
      >
      > On Tue, Aug 2, 2022 at 8:35 PM Peter Klügl <[email protected]> wrote:
      >
      >     Hi,
      >
      >
      >     thanks for the pointer. I added an answer.
      >
      >     Let me know if you want to have more information about the rule
      >     refactoring.
      >
      >
      >     In my experience, the life of a Ruta rule engineer is much easier
      >     if the Ruta rules stay small :-)
      >
      >
      >     Best,
      >
      >
      >     Peter
      >
      >
      >     On 31.07.2022 at 21:09, Md Azaz Ali wrote:
      >     >
      >     > https://stackoverflow.com/questions/73147822/getting-oom-issue-while-running-ruta-script-with-large-texts
      >     >
      >     >
      >     >
      >     > Many Thanks
      >     >
      >     --
      >     Dr. Peter Klügl
      >     Head of Text Mining/Machine Learning
      >
      >     Averbis GmbH
      >     Salzstr. 15
      >     79098 Freiburg
      >     Germany
      >
      >     Fon: +49 761 708 394 0
      >     Fax: +49 761 708 394 10
      >     Email: [email protected]
      >     Web: https://averbis.com
      >
      >     Headquarters: Freiburg im Breisgau
      >     Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
      >     Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
      >