Hi,
> extended the Value interface instead of InputStream.
That would work as well: JackrabbitValue.
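If the identity is exposed on the value rather than on the stream, a module can decide whether to process a binary before opening any stream at all. The following is a minimal sketch of that idea under stated assumptions: `IdentifiedValue`, `getContentIdentity()`, `BytesValue` and `IdentityAwareModule` are hypothetical names chosen for illustration, not the actual JackrabbitValue API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a value that exposes the identity of its binary
// content (the "extend Value" idea); not an actual Jackrabbit interface.
interface IdentifiedValue {
    /** Identifier of the binary content, or null if it is not known. */
    String getContentIdentity();

    /** Opens the binary content; reading it is the expensive part. */
    InputStream getStream() throws IOException;
}

// In-memory implementation, for illustration only.
final class BytesValue implements IdentifiedValue {
    private final String identity;
    private final byte[] data;

    BytesValue(String identity, byte[] data) {
        this.identity = identity;
        this.data = data.clone();
    }

    @Override
    public String getContentIdentity() {
        return identity;
    }

    @Override
    public InputStream getStream() {
        return new ByteArrayInputStream(data);
    }
}

// A module that decides on the identity alone and opens the stream only
// for content it has not processed yet.
final class IdentityAwareModule {
    private final Set<String> processed = new HashSet<>();
    private int streamsOpened = 0;

    void process(IdentifiedValue value) {
        String id = value.getContentIdentity();
        if (id != null && processed.contains(id)) {
            return; // same content already handled: nothing is read
        }
        try (InputStream in = value.getStream()) {
            streamsOpened++;
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // placeholder for the real work (text extraction, virus scan, ...)
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (id != null) {
            processed.add(id);
        }
    }

    int streamsOpened() {
        return streamsOpened;
    }
}
```

A null identity is treated as "unknown", so such values are always processed; this keeps a module correct even with an implementation that cannot provide an identity.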
> if we make it like you wrote ...
> every module must handle this internally.
Yes. The modules can use standard JCR event listeners, so they are
backward compatible and implementation independe
Some more thoughts...
Thomas, if we make it like you wrote...
class VirusScanner {
    public void scan(InputStream in) throws VirusFoundException {
        if (in instanceof DataStoreInputStream) {
            DataIdentifier di = ((DataStoreInputStream) in).getDataIdentifier();
            if (
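The VirusScanner snippet above is cut off. One hedged way to complete the idea is sketched below, assuming a minimal `DataStoreInputStream` (a `FilterInputStream` with `getDataIdentifier()`, as proposed in this thread) and a simple stand-in `DataIdentifier` class; both are illustrative, and the "VIRUS" signature check is a placeholder, not a real scanning technique.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

// Minimal stand-in for a data store identifier (for example a content hash).
final class DataIdentifier {
    private final String id;

    DataIdentifier(String id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        return o instanceof DataIdentifier && ((DataIdentifier) o).id.equals(id);
    }

    @Override
    public int hashCode() { return id.hashCode(); }
}

// Proposed stream type: an ordinary InputStream that also exposes the
// identifier of the binary it reads from.
class DataStoreInputStream extends FilterInputStream {
    private final DataIdentifier identifier;

    DataStoreInputStream(InputStream in, DataIdentifier identifier) {
        super(in);
        this.identifier = identifier;
    }

    DataIdentifier getDataIdentifier() { return identifier; }
}

class VirusFoundException extends Exception {
    VirusFoundException(String message) { super(message); }
}

class VirusScanner {
    // Identifiers of binaries already scanned and found clean (not thread-safe).
    private final Set<DataIdentifier> clean = new HashSet<>();
    private int bytesScanned = 0;

    public void scan(InputStream in) throws VirusFoundException, IOException {
        DataIdentifier di = null;
        if (in instanceof DataStoreInputStream) {
            di = ((DataStoreInputStream) in).getDataIdentifier();
            if (clean.contains(di)) {
                return; // already scanned: the binary is not read again
            }
        }
        // Buffers the whole binary in memory, which is fine only for a sketch.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        for (int n; (n = in.read(chunk)) != -1; ) {
            buffer.write(chunk, 0, n);
            bytesScanned += n;
        }
        // Placeholder check; a real scanner uses signatures and heuristics.
        String content = new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
        if (content.contains("VIRUS")) {
            throw new VirusFoundException("signature found");
        }
        if (di != null) {
            clean.add(di);
        }
    }

    int bytesScanned() { return bytesScanned; }
}
```

Because `DataStoreInputStream` is still an ordinary `InputStream`, the scanner also accepts plain streams; it just cannot skip them.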
Hi guys,
just to check my understanding...
>Probably not in the Lucene index files themselves. Text extraction could be used
>without using the Lucene index, for example to display the text content of a
>PDF file. The text extraction module could store the DataIdentifier together
>with the extracted text
Hi,
On Mon, Nov 17, 2008 at 11:07 AM, Thomas Müller <[EMAIL PROTECTED]> wrote:
> Currently we don't detect that the binary already exists when using
> the regular JCR API.
We do for things like workspace.copy(...) or
propertyA.setValue(propertyB.getValue()). The only case where we don't
do that i
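The `propertyA.setValue(propertyB.getValue())` case can avoid a copy because the value only needs to carry a reference to the stored binary. A minimal sketch of that idea, assuming hypothetical `SimpleDataStore`, `BinaryRef` and `BinaryProperty` classes (they are illustrative, not Jackrabbit's `InternalValue` or data store API):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified data store: each binary is stored once
// and referenced by an identifier.
final class SimpleDataStore {
    private final Map<String, byte[]> records = new HashMap<>();
    private long bytesWritten = 0;

    String addRecord(InputStream in) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        try {
            for (int n; (n = in.read(chunk)) != -1; ) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String id = "record-" + (records.size() + 1);
        records.put(id, buffer.toByteArray());
        bytesWritten += buffer.size();
        return id;
    }

    long bytesWritten() { return bytesWritten; }
}

// Hypothetical counterpart of an internal binary value: it holds only the
// identifier of a data store record, never the bytes.
final class BinaryRef {
    final String dataIdentifier;

    BinaryRef(String dataIdentifier) { this.dataIdentifier = dataIdentifier; }
}

final class BinaryProperty {
    private BinaryRef value;

    BinaryRef getValue() { return value; }

    // propertyA.setValue(propertyB.getValue()): only the identifier is copied.
    void setValue(BinaryRef other) { this.value = other; }

    // Setting a value from a plain stream has to read and store the bytes.
    void setValue(InputStream in, SimpleDataStore store) {
        this.value = new BinaryRef(store.addRecord(in));
    }
}
```

In this sketch, copying a node internally would follow the same pattern: each binary property of the copy gets the same identifier, so the store is neither read nor written again.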
Hi,
>> Exactly. The DataStore should also check if the InputStream is a
>> DataStoreInputStream, so maybe it doesn't need to copy the binary:
>
> IMHO we should (and currently do) handle that on a higher level, by
> tracking the DataIdentifier in InternalValue.
Currently we don't detect that the
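The DataStore-level check can be sketched as follows, assuming a content-addressed store that uses the SHA-1 hash of the binary as its identifier, plus a minimal `DataStoreInputStream` stand-in. `HashingDataStore` and its methods are hypothetical names, not the Jackrabbit `DataStore` interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Stand-in for the proposed stream type that knows which record it reads.
class DataStoreInputStream extends FilterInputStream {
    private final String identifier;

    DataStoreInputStream(InputStream in, String identifier) {
        super(in);
        this.identifier = identifier;
    }

    String getDataIdentifier() { return identifier; }
}

// Hypothetical content-addressed store: the identifier is the SHA-1 hash of the content.
final class HashingDataStore {
    private final Map<String, byte[]> records = new HashMap<>();
    private long bytesRead = 0;

    String addRecord(InputStream in) throws IOException {
        if (in instanceof DataStoreInputStream) {
            String id = ((DataStoreInputStream) in).getDataIdentifier();
            if (records.containsKey(id)) {
                return id; // binary already stored: no read, no copy
            }
        }
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        for (int n; (n = in.read(chunk)) != -1; ) {
            digest.update(chunk, 0, n);
            buffer.write(chunk, 0, n);
            bytesRead += n;
        }
        String id = toHex(digest.digest());
        records.putIfAbsent(id, buffer.toByteArray()); // identical content is stored once
        return id;
    }

    DataStoreInputStream getStream(String id) {
        byte[] data = records.get(id);
        if (data == null) {
            throw new IllegalArgumentException("unknown record " + id);
        }
        return new DataStoreInputStream(new ByteArrayInputStream(data), id);
    }

    long bytesRead() { return bytesRead; }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Hashing alone already maps identical content to one record, but the bytes must still be read once to compute the hash; the `instanceof` shortcut avoids even that read when the stream came from the store itself.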
Hi,
On Mon, Nov 17, 2008 at 10:02 AM, Thomas Müller <[EMAIL PROTECTED]> wrote:
>> But what will you do if you try to copy
>> a node internally? The DataStore should know that it must not read the
>> binary, to prevent an extra read and write to the DataStore.
>
> Exactly. The DataStor
Hi,
> Would you store the DataIdentifier in the index,
> and so in all modules?
Probably not in the Lucene index files themselves. Text extraction could
be used without using the Lucene index, for example to display the
text content of a PDF file. The text extraction module could store the
DataIdenti
Hi Thomas,
>Instead of returning an InputStream, Jackrabbit would return a
>DataStoreInputStream with the additional method getDataIdentifier().
>Then the module can read the identifier, check if the item is already
>processed, and avoid reading the data itself if this identifier is
>already proce
Hi,
The problem is: "process the binary only once".
By 'process' we meant 'text extraction', but it could be 'virus
scan', 'index', 'create a thumbnail', 'transfer' (to the client or
from the client), or 'backup': any expensive task. I believe a good
solution is to provide the object identity t
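Since "process" can mean any expensive task, the per-identity bookkeeping could be factored into one generic helper shared by text extraction, virus scanning, thumbnails and so on. A minimal sketch, assuming the binary's identity is available as a `String` and the task can load the binary from that identity; `ProcessOnce` is a hypothetical name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical helper: runs an expensive task (text extraction, virus scan,
 * thumbnail, ...) at most once per binary, keyed by the binary's identity.
 */
final class ProcessOnce<R> {
    private final Map<String, R> results = new ConcurrentHashMap<>();
    private final Function<String, R> task;
    private final AtomicInteger runs = new AtomicInteger();

    ProcessOnce(Function<String, R> task) {
        this.task = task;
    }

    /** Returns the cached result, or runs the task for an identity not seen before. */
    R get(String identity) {
        return results.computeIfAbsent(identity, id -> {
            runs.incrementAndGet();
            return task.apply(id);
        });
    }

    /** Number of times the task actually ran. */
    int runs() {
        return runs.get();
    }
}
```

`ConcurrentHashMap.computeIfAbsent` applies the function at most once per key, but other updates to the map may block while it runs, and a task that returns null leaves no entry (so it would run again). For long tasks such as extracting text from large PDFs, caching a `Future` instead may be preferable.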
Hi,
On Tue, Nov 11, 2008 at 10:06 AM, Thomas Müller <[EMAIL PROTECTED]> wrote:
> It's an interesting use case, and probably quite common. It would be
> good if the text extraction would be run only once for each binary.
> However I'm not sure how this should be implemented... One solution is
> to
ue to write down the problems ..
Greetings,
Claus
-Original Message-
From: Thomas Müller [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 11, 2008 10:07
To: dev@jackrabbit.apache.org
Subject: Re: Workspace.copy() Question ...
Hi,
> I have a nice use case: I have a fileno
Hi,
> I have a nice use case: I have a file node in my workspace and I need to create
> about 70 copies of this node.
> It's a fairly large PDF file (10 MB), and I am using the DataStore, so storage is no
> problem: the binary exists only once. But the problem is the text extractor. It will
> be called 7
Hi there,
I have a nice use case: I have a file node in my workspace and I need to create
about 70 copies of this node.
It's a fairly large PDF file (10 MB), and I am using the DataStore, so storage is no
problem: the binary exists only once. But the problem is the text extractor. It will
be call