Hi Dr. Peter,

Did you have time to look at the gist? If any more information is
required, please let me know.

Also, we recently found a text that throws a stack overflow on a local
system with the same Ruta script. The text is shared below. It is part of
an email that gets translated to Base64; it may contain some special
symbol or something else present in the email body. The script breaks
with this text, but we are still not sure which actual texts caused the
OOM in prod.


IkVtcGxveWVlIE5hbWUiLCJFbXBsb3llZSBDb2RlIiwiRW1wbG95ZWUgU3RhdHVzIiwi
RHJpdmVyIElEIiwiRGVmaWNpZW5jeSBDb3VudCIsIkxvY2F0aW9uIiwiRm9ybSIsIkZp
bGUgTmFtZSIsIk1lc3NhZ2UgTnVtYmVyIiwiTWVzc2FnZSBEZXNjcmlwdGlvbiIsIkV4
cGlyZWQgRGF0ZSINCiJSdWJlbiBFc2NvYmVkbyIsIjE3MDY2NjkiLCJJbiBQcm9jZXNz
IiwiMTQ2NjQyMSIsIjYiLCJEZXBlbmRhYmxlIExvZ2lzdGljcyBMTENfRElJMyIsIkFN
QkdDIiwiQW1hem9uIEJhY2tncm91bmQgU3RhdHVzIEZpbGUiLCI5MjUiLCJBbWF6b24g
QmFja2dyb3VuZCBDaGVjayBTdGF0dXMgUGVuZGluZyIsIiINCiJSdWJlbiBFc2NvYmVk
byIsIjE3MDY2NjkiLCJJbiBQcm9jZXNzIiwiMTQ2NjQyMSIsIjYiLCJEZXBlbmRhYmxl
IExvZ2lzdGljcyBMTENfRElJMyIsIkFNWk9UIiwiQW1hem9uIENvbmR1Y3RlZCBUcmFp
bmluZyBSZXF1aXJlbWVudHMiLCIwIiwiRG9jdW1lbnQgTWlzc2luZyIsIiINCiJSdWJl
biBFc2NvYmVkbyIsIjE3MDY2NjkiLCJJbiBQcm9jZXNzIiwiMTQ2NjQyMSIsIjYiLCJE
ZXBlbmRhYmxlIExvZ2lzdGljcyBMTENfRElJMyIsIk1FTlJWIiwiRHJpdmVyIFF1YWxp
ZmljYXRpb24iLCIwIiwiRG9jdW1lbnQgTWlzc2luZyIsIiINCiJSdWJlbiBFc2NvYmVk
byIsIjE3MDY2NjkiLCJJbiBQcm9jZXNzIiwiMTQ2NjQyMSIsIjYiLCJEZXBlbmRhYmxl
IExvZ2lzdGljcyBMTENfRElJMyIsIlJUUlRDIiwiRHJpdmVyIFF1YWxpZmljYXRpb24i
LCIwIiwiRG9jdW1lbnQgTWlzc2luZyIsIiINCiJSdWJlbiBFc2NvYmVkbyIsIjE3MDY2
NjkiLCJJbiBQcm9jZXNzIiwiMTQ2NjQyMSIsIjYiLCJEZXBlbmRhYmxlIExvZ2lzdGlj
cyBMTENfRElJMyIsIkFQUCIsIkRyaXZlciBRdWFsaWZpY2F0aW9uIiwiMTE3IiwiUFJF
VklPVVMgRU1QTE9ZTUVOVCBBRERSRVNTIElORk9STUFUSU9OIE1JU1NJTkcvSU5DT01Q
TEVURSIsIiINCiJSdWJlbiBFc2NvYmVkbyIsIjE3MDY2NjkiLCJJbiBQcm9jZXNzIiwi
MTQ2NjQyMSIsIjYiLCJEZXBlbmRhYmxlIExvZ2lzdGljcyBMTENfRElJMyIsIk1FQyIs
IkRyaXZlciBRdWFsaWZpY2F0aW9uIiwiMCIsIkRvY3VtZW50IE1pc3NpbmciLCIiDQo=
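
To see what the text looks like before it reaches the Ruta script, the block above can be decoded locally. Below is a minimal sketch using only the JDK; the class name Base64BodyInspector and its methods are hypothetical helpers for debugging and are not part of Ruta or UIMA. It assumes the original email body was UTF-8 encoded.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for local debugging; not part of Ruta or UIMA.
class Base64BodyInspector {

    // Decodes a line-wrapped Base64 block. The MIME decoder skips line
    // breaks and other characters outside the Base64 alphabet.
    static String decode(String wrappedBase64) {
        byte[] bytes = Base64.getMimeDecoder().decode(wrappedBase64);
        // Assumption: the original email body was UTF-8 encoded.
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Counts how often a single character occurs in the decoded text.
    static long count(String text, char wanted) {
        return text.chars().filter(c -> c == wanted).count();
    }

    public static void main(String[] args) {
        // First two lines of the block above; paste the full block to
        // inspect the complete text.
        String sample =
            "IkVtcGxveWVlIE5hbWUiLCJFbXBsb3llZSBDb2RlIiwiRW1wbG95ZWUgU3RhdHVzIiwi\n"
          + "RHJpdmVyIElEIiwiRGVmaWNpZW5jeSBDb3VudCIsIkxvY2F0aW9uIiwiRm9ybSIsIkZp\n";
        String text = decode(sample);
        System.out.println(text);
        System.out.println("length=" + text.length()
            + " quotes=" + count(text, '"')
            + " commas=" + count(text, ','));
    }
}
```

The decoded text can then be fed to the script in a unit test, which may help narrow down which part of the input triggers the deep recursion.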



Thanks

On Sun, Aug 7, 2022 at 11:59 AM Md Azaz Ali <[email protected]> wrote:

> Hi Dr. Peter,
>
>
> Sorry for not being able to clarify it; I have created a gist.
>
> The gist below has the address.ruta file, with one example attached to
> each of the rules.
>
> https://gist.github.com/azazali30/635c3b80e02908e9f8387db3fda865db
>
>
> Many Thanks
>
>
>
>
>
> On Sat, Aug 6, 2022 at 4:11 PM Peter Klügl <[email protected]>
> wrote:
>
>> Hi,
>>
>>
>> I had a quick look at the rules. Given the examples you provided, only
>> the first rule matches three times, the second rule not once.
>>
>> So I have to ask before I can refactor the rules: what should the rules
>> annotate exactly?
>>
>>
>> Best
>>
>>
>> Peter
>>
>>
>> Am 05.08.2022 um 11:44 schrieb Md Azaz Ali:
>> > Hi   Dr. Peter Klügl,
>> >
>> > Yes, it is the same as in the StackOverflow question.
>> >
>> > On Fri, Aug 5, 2022 at 12:48 PM Peter Klügl <[email protected]>
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >>
>> >> the attachments are removed by the mailing list. Are the rules the same
>> >> as in the StackOverflow question?
>> >>
>> >>
>> >> Best,
>> >>
>> >>
>> >> Peter
>> >>
>> >> Am 04.08.2022 um 20:15 schrieb Md Azaz Ali:
>> >>> Hi Dr. Peter,
>> >>>
>> >>> Here are some example addresses that the attached Ruta script is able
>> >>> to find.
>> >>>
>> >>> There are two Ruta rules in use: one for multiline addresses and the
>> >>> other for single-line addresses.
>> >>> We are also using a prepopulated EntityType annotation with the feature
>> >>> location_indicator.
>> >>>
>> >>>
>> >>>
>> >>> //Annotation EntityType with feature location_indicator is already
>> >>> present = Georgia
>> >>>
>> >>> 11175 Cicero Drive
>> >>> Suite 200
>> >>> Alpharetta, Georgia 30022
>> >>>
>> >>>
>> >>>
>> >>> //EntityType with feature location_indicator is already present =
>> >>> Cambridge;MA;U.S.A
>> >>>
>> >>> One Rogers Street
>> >>> Cambridge, MA
>> >>> 02142-1209
>> >>> U.S.A
>> >>>
>> >>> //EntityType with feature location_indicator is already present  =
>> >>> Cambridge, MA, U.S.A.
>> >>> 1120 Avenue of the Americas
>> >>> 4th Floor
>> >>> New York, NY 10036
>> >>> U.S.A.
>> >>>
>> >>>
>> >>> //EntityType with feature location_indicator is already present = U.S.A
>> >>>
>> >>> 11175 Cicero Drive
>> >>> Suite 200
>> >>> Alpharetta, Georgia 30022
>> >>> U.S.A
>> >>>
>> >>> //EntityType with feature location_indicator is already present = U.S.A
>> >>>
>> >>> My new address is
>> >>> 8 Commerce Dr.
>> >>> Suite 3B
>> >>> Bedford, NH 03110
>> >>> U.S.A
>> >>>
>> >>>
>> >>> //EntityType with feature location_indicator is already present = U.S.A.
>> >>>
>> >>> 400 Renaissance Center Drive
>> >>> Suite 2600
>> >>> Detroit, MI 48243
>> >>> U.S.A.
>> >>>
>> >>> //EntityType with feature location_indicator is already present = U.S.A.
>> >>>
>> >>> 125 Wacker Drive
>> >>> Suite 300
>> >>> Chicago, IL 60606
>> >>> U.S.A.
>> >>>
>> >>> //EntityType with feature location_indicator is already present = U.S.A.
>> >>>
>> >>>
>> >>> 1120 Avenue of the Americas
>> >>> 4th Floor
>> >>> New York, NY 10036
>> >>> U.S.A.
>> >>>
>> >>>
>> >>> 222 West Las Colinas Blvd. Suite 1650 North Tower Millennium Center
>> >>> Irving, TX 75039 U.S.A.
>> >>>
>> >>>
>> >>> Block No. 9A, Pritech Park SEZ, RMZ Ecospace Internal Road, Bellandur,
>> >>> Bengaluru, Karnataka 560103, India
>> >>>
>> >>>
>> >>>
>> >>> Thanks & Regards
>> >>> Md Azaz Ali
>> >>>
>> >>> On Thu, Aug 4, 2022 at 5:42 PM Peter Klügl <[email protected]>
>> >>> wrote:
>> >>>
>> >>>      Hi,
>> >>>
>> >>>
>> >>>      yes, I can suggest some refactored rules.
>> >>>
>> >>>      However, I do not know the common input data and the use cases.
>> >>>      It is easier for me if I have a few representative input snippets
>> >>>      I can test the refactored rules against. Can you provide some
>> >>>      (artificial) example text snippets?
>> >>>
>> >>>
>> >>>      Best
>> >>>
>> >>>
>> >>>      Peter
>> >>>
>> >>>
>> >>>      Am 04.08.2022 um 11:33 schrieb Md Azaz Ali:
>> >>>      > Hi Dr. Peter Klügl,
>> >>>      >
>> >>>      >
>> >>>      > 1. We are not able to upgrade to Ruta 3.x because we would also
>> >>>      > have to upgrade uimaj-core, and for that we need a stable
>> >>>      > version of cleartk-ml (which is not working with UIMA 3.x).
>> >>>      >
>> >>>      > 2. For PARAM_MAX_RULE_MATCHES and PARAM_MAX_RULE_ELEMENT_MATCHES,
>> >>>      > we are not sure what number will be good enough.
>> >>>      >
>> >>>      > 3. If possible, can you please suggest an improved version of the
>> >>>      > above script? It will really help.
>> >>>      >
>> >>>      > 4. Getting a new build from main-v2 is also not possible because
>> >>>      > we can only use GA versions that are available directly in the
>> >>>      > Maven repository.
>> >>>      >
>> >>>      > I am attaching one script file. If you can suggest possible
>> >>>      > improvements, it will be really helpful.
>> >>>      >
>> >>>      > Note: I am new to Ruta, and these Ruta scripts were written by
>> >>>      > former developers at my company who are no longer associated
>> >>>      > with the company.
>> >>>      >
>> >>>      > Many Thanks
>> >>>      >
>> >>>      >
>> >>>      > On Tue, Aug 2, 2022 at 8:35 PM Peter Klügl
>> >>>      <[email protected]>
>> >>>      > wrote:
>> >>>      >
>> >>>      >     Hi,
>> >>>      >
>> >>>      >
>> >>>      >     thanks for the pointer. I added an answer.
>> >>>      >
>> >>>      >     Let me know if you want to have more information about the
>> >>>      >     rule refactoring.
>> >>>      >
>> >>>      >
>> >>>      >     In my experience, the life of a Ruta rule engineer is much
>> >>>      >     easier if the Ruta rules stay small :-)
>> >>>      >
>> >>>      >
>> >>>      >     Best,
>> >>>      >
>> >>>      >
>> >>>      >     Peter
>> >>>      >
>> >>>      >
>> >>>      >     Am 31.07.2022 um 21:09 schrieb Md Azaz Ali:
>> >>>      >     >
>> >>>      >     > https://stackoverflow.com/questions/73147822/getting-oom-issue-while-running-ruta-script-with-large-texts
>> >>>      >     >
>> >>>      >     >
>> >>>      >     >
>> >>>      >     > Many Thanks
>> >>>      >     >
>> >>>      >     --
>> >>>      >     Dr. Peter Klügl
>> >>>      >     Head of Text Mining/Machine Learning
>> >>>      >
>> >>>      >     Averbis GmbH
>> >>>      >     Salzstr. 15
>> >>>      >     79098 Freiburg
>> >>>      >     Germany
>> >>>      >
>> >>>      >     Fon: +49 761 708 394 0
>> >>>      >     Fax: +49 761 708 394 10
>> >>>      >     Email: [email protected]
>> >>>      >     Web: https://averbis.com
>> >>>      >
>> >>>      >     Headquarters: Freiburg im Breisgau
>> >>>      >     Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
>> >>>      >     Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
>> >>>      >
>> >>>
>> >>
>>
>>
