Hi,

Thanks for all this info. I still have to read it through carefully. One thing though: I had already tried the Limit checkbox under the JCasGen button, but it doesn't seem to work. The ruta types still get generated. I also set it as a preference (Window -> Preferences -> UIMA Preferences, then checked the box that says "Limit JCasGen to types defined in this project"), but that didn't work either.
I will check on the RutaDescriptorFactory. So far, what I am doing works, but my rules are not yet very complex.

Thanks,
Bonnie MacKellar

On Mon, Jan 18, 2016 at 10:09 AM, Peter Klügl <peter.klu...@averbis.com> wrote:

> Hi,
>
> comments inline...
>
> Some background information first:
> The UIMA Ruta Workbench and its project layout were initially designed to
> support the user in developing rules. Thus, the project structure is "as
> simple as possible" and tailored for this task. However, the deployment
> and integration of the rules in actual applications was neglected, since
> there is no direct support for generic build tools, e.g., for packaging
> the rules and descriptors correctly in a jar. This often leads to
> situations where users need to perform manual steps, which is a no-go for
> larger projects.
>
> In my opinion, Maven has more advantages than disadvantages and is a
> must for development of larger Java projects. Thus, I created a Maven
> plugin that does the things the Workbench was responsible for: it
> creates the descriptors and compiles the wordlists. I also adapted the
> Workbench to work with normal Maven-based projects. So now, at least
> when I write serious Ruta rules, I do not use the old project layout in
> the Workbench, but a common Maven project with additional plugins. There
> you can specify the packaging of your Ruta-based application or just see
> the Ruta files as some resources in a java file, and you can add
> automatic unit tests. It's much more flexible and powerful and I think
> it's worth the trouble. Sometimes I also use the old project layout, but
> just for some quick testing.
>
> I cannot provide a general introduction to Maven. It is best to start by
> using an existing project and learn by modifying and adapting it to your
> use case.
>
> On 14.01.2016 at 18:45, Bonnie MacKellar wrote:
> > Hi,
> >
> > Sorry that the images did not go through!
> > I had hoped to avoid typing.
> >
> > Here it is.
> > The top level:
> >
> > ECClassifier
> > - bin
> > --desc
> > -descriptor
> > -utils
> > -BasicEngine.xml
> > -BasicTypeSystem.xml
> > -InternalTypeSystem.xml
> > -testrulesEngine.xml
> > -testrulesTypeSystem.xml
> > -input
> > -metadata
> > -output
> > -script
> > -testrules.ruta
> > -src
> > -desc
> > -types
> > -BasicMetaMapTypeSystem.xml
> > -BasicTypeSystem.xml
> > -InternalTypeSystem.xml
> > -MetaMapApiAE.xml
> > -MetaMapApiTypeSystem.xml
> > -testrulesTypeSystem.xml
> > -ec
> > -metamap
> > -PipelineSystem.java (this is the main application)
> > -gov (all of the Metamap types were copied here for some reason)
> > -META-INF
> > -org.apache.uima.fit
> > -types.txt
> > -ruta
> > -ec
> > -testrules.ruta (copied from above)
> > -testrules (these are the types generated by JCasGen, which come
> > from my Ruta script)
> > -Relational_Type.java
> > -Relational.java
> > -UMLSConcept_Type.java
> > -UMLSConcept.java
> >
> > Unfortunately I am rather new to all of this, so I am not totally
> > following some of your answers. I put my comments and questions inline.
> > The description of integrating Ruta into Java in the tutorial doesn't say
> > much about project layout in Eclipse, so I had to do lots of internet
> > searches and copy what I found, so I am sure I am not doing things in the
> > best way.
>
> What did you miss? I should link/add more examples. I will add more
> documentation when I find the time.
>
> > On Thu, Jan 14, 2016 at 11:19 AM, Peter Klügl <peter.klu...@averbis.com>
> > wrote:
> >
> >> Hi,
> >>
> >> just a few short first comments... more tomorrow...
> >>
> >> - Unfortunately, the images did not make it (due to the mailing list
> >> settings?). You can send me the mail directly if you want.
> >> - I now really prefer to develop Ruta scripts in Maven-built projects. Is
> >> Maven an option for you?
> >>
> > I don't know Maven very well and really did not want to add another layer
> > of complexity to this already very complex system. How does Maven help?
>
> It automates the manual steps like generating the JCas classes and
> copying the descriptors. There is a Maven plugin for JCasGen and one for
> Ruta. Thus the build process results in exactly the jar you want. It's
> really terrible in the beginning, but I would not want to miss it now.
>
> >> - You can limit JCasGen to the current project. Then, only local type
> >> systems are used to generate the classes and the problem with overriding
> >> RutaBasic is avoided. However, if you copy the descriptors, that does
> >> not help.
> >>
> > I copied the descriptors, and then used JCasGen on the descriptor down in
> > the src folder. How do I limit JCasGen?
>
> There is a checkbox right below the JCasGen button in the component
> descriptor editor. The Maven plugin also supports this option. If you
> want to generate JCas classes from Ruta scripts, then you have to take
> care that the types of BasicTypeSystem are not generated as well. The
> only way I see right now to avoid that is this option, but it only works
> if the type system is located in a different project. Thus, no copying
> and no old project layout (or some hacks). With the Maven plugin, the
> descriptors can be loaded from, e.g., the classpath of the project, and
> do not need to be located within the project.
>
> >> - JCasGen on generated type systems of Ruta scripts can be tricky
> >> (because Ruta imports the BasicTypeSystem by default and this one should
> >> not be generated anew). I rather recommend defining JCas cover class
> >> types in separate type systems.
> >>
> > Not sure I follow what this means
>
> I normally have additional type system descriptors in my Ruta projects
> which contain the types that are used to generate the Java classes.
> The types defined in the Ruta scripts are not used for generating the JCas
> classes but only for intermediate annotations. This is probably only a
> personal preference.
>
> >> - Copying descriptors should be avoided in general
> >>
> > So how do I develop using the workbench but get the results into Java?
> > All of the posts I read stressed that you have to have the type
> > descriptors and script under the src folder, but the workbench doesn't
> > want them there. The only online example I could find that uses both
> > MetaMap and Ruta had a layout kind of similar to mine.
>
> The Maven plugin is able to generate the build path that is required by
> the Workbench to work correctly. It contains the Ruta-specific source
> paths. So the Workbench can also be used in normal Java projects if they
> are built with Maven. Then, everything is automatically usable in Java.
>
> >> - Do you need the descriptors of Ruta at all? Did you define new types
> >> in Ruta scripts? The Java code does not make use of the Ruta descriptors
> >>
> > Yes, I define new types, and eventually there will be lots of types. The
> > current Ruta script is a test case only. I am not sure what you mean when
> > you say the Java code does not make use of the descriptors. If I don't
> > have them, I get lots of runtime errors.
>
> Hmmm... I missed the types.txt, which is responsible for making the types
> available.
>
> Your code lines
>
>     AnalysisEngine rae =
>         AnalysisEngineFactory.createEngine(RutaEngine.class,
>             RutaEngine.PARAM_MAIN_SCRIPT, "testrules");
>     AnalysisEngineDescription rutaEngineDesc =
>         AnalysisEngineFactory.createEngineDescription(RutaEngine.class,
>             RutaEngine.PARAM_MAIN_SCRIPT, "testrules");
>
> use the RutaEngine implementation in order to create the analysis engine
> description. This description does not refer to the types defined in the
> Ruta scripts.
> This works for simple script projects but will fail for more
> complicated ones, e.g., when you have several scripts importing each
> other. There is a helper class in Ruta for generating descriptors from
> scripts including types:
> org.apache.uima.ruta.descriptor.RutaDescriptorFactory
>
> >> - The way you create the Ruta descriptors in the Java example does not
> >> support all Ruta functionality, e.g., new types
> >>
> > I am probably doing it completely wrong :-). I couldn't find many
> > examples.
>
> No, a lot of functionality was added but not as much documentation. Let
> me know if you want to have more information about the
> RutaDescriptorFactory. The Maven plugin uses this class to generate the
> descriptors.
>
> I hope this helps a bit. Just ask if you have more questions or if
> something is not clear.
>
> Best,
>
> Peter
>
> >> - The duplicate import is fixed in the next release
> >>
> > OK, good to know that this is not something I am doing wrong.
> >
> >> - Is the code open source somewhere, e.g., on GitHub?
> >>
> > No, because it is test code only right now.
> >
> >> Best,
> >>
> >> Peter
> >>
> >> On 14.01.2016 at 16:13, Bonnie MacKellar wrote:
> >>> Hi,
> >>>
> >>> I just spent the last 4 days stumbling through the documentation,
> >>> tutorials, posts to this mailing list, and any code examples I could
> >>> find on the Internet, so I could integrate the Metamap annotator and a
> >>> RUTA script in Java using UimaFit. I succeeded, and I have something
> >>> that runs, but I doubt I am organizing things the best way in Eclipse,
> >>> and in particular, I am noticing some odd things if I try to build and
> >>> test the script first in the Ruta development environment in Eclipse
> >>> and then move the script to my Java environment. I suspect my workflow
> >>> is not the best possible, so I am looking for advice on how to manage
> >>> this.
> >>>
> >>> My project was created as a Ruta project so I could have the
> >>> development environment support. I then added Uima nature to the
> >>> project to get the Java development folders. I set up the type
> >>> descriptors for Metamap, and after much reading, realized I needed a
> >>> types.txt file in my source folder that tells the system how to find
> >>> the Metamap type descriptors. I then added the Ruta script to the
> >>> pipeline in my Java class and then copied the type descriptor for that
> >>> down to my source folders as well. Finally, I realized I needed Java
> >>> classes for the types, and that pressing a JCasGen button in the
> >>> ComponentDescriptorEditor was the way to do that. However, there are
> >>> some anomalies when I do this.
> >>>
> >>> So, my project has this structure at the top level
> >>>
> >>> [Inline image 1]
> >>>
> >>> and at the src level, this is the structure. Notice that the Ruta
> >>> script and types have been copied down to this level
> >>>
> >>> [Inline image 2]
> >>>
> >>> The code that creates the AnalysisEngineDescriptors and runs the
> >>> pipeline looks like this (it is in PipelineSystem.java)
> >>>
> >>> try {
> >>>     ae = AnalysisEngineFactory.createEngine(
> >>>         gov.nih.nlm.nls.metamap.uima.MetaMapAnnotator.class);
> >>>     AnalysisEngineDescription mmEngineDesc =
> >>>         AnalysisEngineFactory.createEngineDescription(
> >>>             gov.nih.nlm.nls.metamap.uima.MetaMapAnnotator.class);
> >>>     AnalysisEngine rae =
> >>>         AnalysisEngineFactory.createEngine(RutaEngine.class,
> >>>             RutaEngine.PARAM_MAIN_SCRIPT, "testrules");
> >>>     AnalysisEngineDescription rutaEngineDesc =
> >>>         AnalysisEngineFactory.createEngineDescription(RutaEngine.class,
> >>>             RutaEngine.PARAM_MAIN_SCRIPT, "testrules");
> >>>     JCas jCas = ae.newJCas();
> >>>     jCas.setDocumentText("serum albumin greater or equal 2g/dL");
> >>>     SimplePipeline.runPipeline(jCas, mmEngineDesc, rutaEngineDesc);
> >>>     displayResults(jCas);
> >>>     displayRutaResults(jCas);
> >>>
> >>> and the types.txt file contains this
> >>>
> >>> classpath*:desc/types/MetaMapApiTypeSystem.xml
> >>> classpath*:desc/types/BasicTypeSystem.xml
> >>> classpath*:desc/types/InternalTypeSystem.xml
> >>> classpath*:desc/types/testrulesTypeSystem.xml
> >>>
> >>> If I want to use the Ruta Workbench to develop my Ruta script, it
> >>> appears that I have to regenerate the Java type files, such as
> >>> Relational.java, each time I make a change. Is that correct?
> >>> And when I do this, I notice that it completely regenerates the
> >>> org.apache.uima.ruta.type hierarchy, which leads to an odd runtime
> >>> error (NoSuchMethodException, caused by trying to call
> >>> setLowMemoryProfile). I read a thread on this list about this error
> >>> which recommended deleting the regenerated uima type hierarchy. This
> >>> worked, but it seems I have to go through these steps every time I
> >>> regenerate the Ruta types, which is a pain.
> >>>
> >>> Also, I notice that the metamap type hierarchy is also regenerated
> >>> inside my project.
> >>> I theorize it is because of the import in my Ruta
> >>> type descriptor
> >>>
> >>> TYPESYSTEM BasicTypeSystem;
> >>> TYPESYSTEM BasicMetaMapTypeSystem;
> >>> TYPESYSTEM MetaMapApiTypeSystem;
> >>> DECLARE Relational, UMLSConcept;
> >>> Candidate{ -> MARK(UMLSConcept)};
> >>>
> >>> Is this not the right way to make my script aware of the Metamap types?
> >>>
> >>> I also notice that in the type descriptor, this import is generated
> >>> twice
> >>>
> >>> <imports>
> >>>   <import location="BasicTypeSystem.xml"/>
> >>>   <import location="BasicTypeSystem.xml"/>
> >>> </imports>
> >>>
> >>> In general, is it a good or bad idea to develop the Ruta script in the
> >>> workbench and then copy its pieces into the Java source folder? It
> >>> seems like a very convoluted process.
> >>>
> >>> Thanks for your help
> >>>
> >>> Bonnie MacKellar
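
A note on the duplicated BasicTypeSystem.xml import reported in this thread: until a release with the fix is in use, a small check can flag such duplicates in a generated type system descriptor. What follows is only a minimal sketch that uses the JDK's built-in XML parser and nothing from UIMA or Ruta. The class name DuplicateImportCheck and the method findDuplicateImports are hypothetical names chosen for illustration, and the sketch assumes that imports appear as unprefixed <import> elements carrying either a location or a name attribute, as in the descriptor fragment quoted in the thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of UIMA or Ruta.
final class DuplicateImportCheck {

    /** Returns each import target that occurs more than once, in document order. */
    static List<String> findDuplicateImports(String descriptorXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(descriptorXml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Descriptor is not well-formed XML", e);
        }

        // Assumption: imports are plain <import .../> elements without a prefix.
        NodeList imports = doc.getElementsByTagName("import");
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (int i = 0; i < imports.getLength(); i++) {
            Element imp = (Element) imports.item(i);
            // Prefer the location attribute, fall back to name.
            String target = imp.getAttribute("location");
            if (target.isEmpty()) {
                target = imp.getAttribute("name");
            }
            // Record a target the first time it repeats, not on every repeat.
            if (!seen.add(target) && !duplicates.contains(target)) {
                duplicates.add(target);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        String xml = "<typeSystemDescription><imports>"
                + "<import location=\"BasicTypeSystem.xml\"/>"
                + "<import location=\"BasicTypeSystem.xml\"/>"
                + "</imports></typeSystemDescription>";
        System.out.println(findDuplicateImports(xml)); // prints [BasicTypeSystem.xml]
    }
}
```

Such a check only reports the duplicates; it does not modify the descriptor, so whether the second entry is removed by hand or left alone is up to the user.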