How to ingest files when metadata contain non standard characters?

2014-10-07 Thread Konstantinos Mavrommatis
Hi, I am trying to ingest a large number of files. The metadata for these files exist in .met files. Many of the metadata fields contain characters like '<>&$' etc. Running the crawler on these metadata results in failure. When I try to escape the characters using HTML encoding, e.g. '>' becomes &gt; et
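
For context on which of those characters matter: in XML element text only `&` and `<` must be escaped, and `>` is usually escaped as well by convention. `$` has no special meaning in XML. A minimal sketch using Python's standard-library `xml.sax.saxutils.escape`, applied to the `--libtype` value that appears in the sailfish command quoted later in the thread. `escape_met_value` is a hypothetical helper, not part of OODT.

```python
from xml.sax.saxutils import escape, unescape


def escape_met_value(value: str) -> str:
    """Escape a raw metadata value for use as XML element text.

    escape() replaces &, < and > with &amp;, &lt; and &gt;.
    '$' is not special in XML and is left unchanged.
    """
    return escape(value)


raw = "T=PE:O=><:S=AS"
escaped = escape_met_value(raw)
print(escaped)                                # T=PE:O=&gt;&lt;:S=AS
print(escape_met_value("cost: $5 & <tax>"))   # cost: $5 &amp; &lt;tax&gt;

# Unescaping restores the original value exactly.
assert unescape(escaped) == raw
```

Escaping by hand like this is only appropriate if nothing downstream escapes the value a second time.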

Re: How to ingest files when metadata contain non standard characters?

2014-10-08 Thread Lewis John Mcgibbney
Hi Kos, I take you up on your challenge ;) However I don't know if this will fix it. On Tue, Oct 7, 2014 at 11:31 PM, Konstantinos Mavrommatis < kmavromma...@celgene.com> wrote: > > sailfish quant --index > /reference/v1/Homo-sapiens/GRCh37.p12/SailFishIndex --libtype > 'T=PE:O=><:S=AS' -1 <(gunz

RE: How to ingest files when metadata contain non standard characters?

2014-10-08 Thread Konstantinos Mavrommatis
2014 1:43 PM > To: dev@oodt.apache.org > Subject: Re: How to ingest files when metadata contain non standard > characters? > > Hi Kos, > I take you up on your challenge ;) However I don't know if this will > fix it. > > On Tue, Oct 7, 2014 at 11:31 PM

Re: How to ingest files when metadata contain non standard characters?

2014-10-08 Thread Lewis John Mcgibbney
Hi Kos, Thanks for reply On Wed, Oct 8, 2014 at 5:16 PM, Konstantinos Mavrommatis < kmavromma...@celgene.com> wrote: > I escaped the characters using the CGI::escapeHTML function from the CGI > perl module. > Wow. I am surprised at this one. I wonder if this is a bug which results in the discrepa
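
When escaped files still fail, one way to narrow things down is to check each .met file for XML well-formedness on its own, before involving the crawler. A minimal sketch using only Python's standard library. `check_well_formed` and the two snippets are hypothetical. A file that passes this check can still fail later for OODT-specific reasons that this sketch does not cover.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def check_well_formed(xml_text: str) -> Optional[str]:
    """Return None if xml_text is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


# Hypothetical snippets: the same value with and without escaping.
escaped = "<val>T=PE:O=&gt;&lt;:S=AS</val>"
unescaped = "<val>T=PE:O=><:S=AS</val>"

print(check_well_formed(escaped))    # None
print(check_well_formed(unescaped))  # prints the parser's error message
```

For files on disk, `ET.parse(path)` can replace `ET.fromstring(xml_text)` in the same `try` block.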

Re: How to ingest files when metadata contain non standard characters?

2014-10-08 Thread Lewis John Mcgibbney
In addition, if you can get to the bottom of what you think the intended behaviour is here, please feel free to log a ticket in Jira https://issues.apache.org/jira/browse/OODT/?selectedTab=com.atlassian.jira.jira-projects-plugin:issues-panel On Wed, Oct 8, 2014 at 5:59 PM, Lewis John Mcgibbney < l

Re: How to ingest files when metadata contain non standard characters?

2014-10-08 Thread Chris Mattmann
: Date: Thursday, October 9, 2014 at 2:59 AM To: "dev@oodt.apache.org" Subject: Re: How to ingest files when metadata contain non standard characters? >Hi Kos, >Thanks for reply > >On Wed, Oct 8, 2014 at 5:16 PM, Konstantinos Mavrommatis < >kmavromma...@celgene.co

RE: How to ingest files when metadata contain non standard characters?

2014-10-08 Thread Konstantinos Mavrommatis
, October 08, 2014 11:38 AM > To: dev@oodt.apache.org > Subject: Re: How to ingest files when metadata contain non standard > characters? > > cas-metadata should handle this escaping/unescaping in its SerDe > capabilities. > > Kostsas, can you provide the exact file that
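
The point about SerDe is that a serializer escapes values on write and the parser unescapes them on read, so callers pass raw strings. A minimal sketch of that round trip, with Python's `xml.etree.ElementTree` standing in for the cas-metadata SerDe. The thread does not confirm that cas-metadata behaves identically. The sketch also shows a common pitfall: escaping a value by hand and then handing it to a serializer that escapes again.

```python
import xml.etree.ElementTree as ET


def serialize_value(value: str) -> bytes:
    """Write one value as element text; ElementTree escapes &, < and > itself."""
    elem = ET.Element("val")
    elem.text = value
    return ET.tostring(elem)


def deserialize_value(data: bytes) -> str:
    """Parse the element back; the parser turns entities into characters again."""
    return ET.fromstring(data).text or ""


raw = "T=PE:O=><:S=AS"
wire = serialize_value(raw)
print(wire)  # b'<val>T=PE:O=&gt;&lt;:S=AS</val>'
assert deserialize_value(wire) == raw

# Escaping by hand *and* letting the serializer escape double-escapes the value,
# so the reader gets the entity text back instead of the original characters.
pre_escaped = "T=PE:O=&gt;&lt;:S=AS"
print(serialize_value(pre_escaped))  # b'<val>T=PE:O=&amp;gt;&amp;lt;:S=AS</val>'
assert deserialize_value(serialize_value(pre_escaped)) == pre_escaped
```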

Re: How to ingest files when metadata contain non standard characters?

2014-10-08 Thread Chris Mattmann
To: "dev@oodt.apache.org" Subject: RE: How to ingest files when metadata contain non standard characters? >Thanks Chris, > >attached is an offending file before escape. >For the record perl module HTML::Entities does provide an escapeHTML >alternative that produces accep
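
The thread does not say why the CGI::escapeHTML output failed where the HTML::Entities output was accepted. One general caveat when reusing an HTML escaper for XML: XML predefines only five named entities (`&lt;`, `&gt;`, `&amp;`, `&quot;`, `&apos;`). HTML-only names such as `&eacute;` make an XML parser reject the document unless a DTD declares them, whereas numeric character references are always accepted. Whether either Perl module emitted such entities for these files is not established here. A minimal check with Python's standard library; `parses` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """True if xml_text is well-formed XML without any DTD."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(parses("<val>&lt;&gt;&amp;&quot;&apos;</val>"))  # True: the five predefined XML entities
print(parses("<val>&#233;</val>"))                      # True: numeric character reference
print(parses("<val>&eacute;</val>"))                    # False: HTML-only named entity
```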

RE: How to ingest files when metadata contain non standard characters?

2014-10-08 Thread Konstantinos Mavrommatis
Here is the offending file before escape: http://oodt.jpl.nasa.gov/1.0/cas derived_from /gpfs/celgene/reference/v1/Homo-sapiens/GRCh37.p12/SailFishIndex /gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HM1_1_R1.fastq.gz
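
Generating .met files with an XML library, instead of concatenating strings, leaves escaping to the serializer. A minimal sketch with Python's `xml.etree.ElementTree`. It assumes a layout inferred from the fragments above and not confirmed in this thread: a `cas:metadata` root in the `http://oodt.jpl.nasa.gov/1.0/cas` namespace, one `keyval` element per key holding a `key` element, and one `val` element per value. `build_met` and the `libtype` key are hypothetical. Check the element names and any required attributes against the OODT version in use.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

CAS_NS = "http://oodt.jpl.nasa.gov/1.0/cas"
ET.register_namespace("cas", CAS_NS)


def build_met(metadata: Dict[str, List[str]]) -> bytes:
    """Build a .met-style document (assumed layout); values are escaped by the serializer."""
    root = ET.Element(f"{{{CAS_NS}}}metadata")
    for key, values in metadata.items():
        keyval = ET.SubElement(root, "keyval")
        ET.SubElement(keyval, "key").text = key
        for value in values:
            ET.SubElement(keyval, "val").text = value
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


met = build_met({
    "derived_from": [
        "/gpfs/celgene/reference/v1/Homo-sapiens/GRCh37.p12/SailFishIndex",
        "/gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HM1_1_R1.fastq.gz",
    ],
    "libtype": ["T=PE:O=><:S=AS"],  # hypothetical key holding a value with < and >
})

# Reading the document back yields the original, unescaped values.
values = [v.text for v in ET.fromstring(met).iter("val")]
assert "T=PE:O=><:S=AS" in values
```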

Re: How to ingest files when metadata contain non standard characters?

2014-10-09 Thread Lewis John Mcgibbney
Thanks Kos, Please see https://issues.apache.org/jira/browse/OODT-759 We will track it there from now on and determine what needs to be done. On Wed, Oct 8, 2014 at 8:55 PM, Konstantinos Mavrommatis < kmavromma...@celgene.com> wrote: > Here is the offending file before escape: > > > > http://oodt