Hi,

I'm trying to understand the configuration parameters for IGFS. My use case
is IGFS with a secondary file system, so that it acts as a cache for a
Hadoop file system without modifying any existing application (only the
input and output paths change, as they will now use the igfs scheme). In the
javadoc for FileSystemConfiguration I see:

int getPerNodeBatchSize()
Gets number of file blocks buffered on local node before sending batch to
remote node.
int getPerNodeParallelBatchCount()
Gets number of batches that can be concurrently sent to remote node.
int getPrefetchBlocks()
Get number of pre-fetched blocks if specific file's chunk is requested.

What is the remote node here? My understanding is that this is unrelated to
other Ignite nodes holding backup copies, since that would be set in the
cache configuration.
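
To make my current reading of the first two parameters concrete, here is a
minimal sketch. It assumes, based only on the javadoc text above, that a batch
holds up to perNodeBatchSize blocks and that up to
perNodeParallelBatchCount batches can be in flight to one remote node at a
time. BatchBound is a hypothetical helper of mine, not an Ignite class, and
the numbers in main are illustrative values, not Ignite defaults.

```java
// Hypothetical helper, not part of Ignite: it only restates the javadoc
// wording as arithmetic, under the assumptions described above.
final class BatchBound {
    /** Upper bound on blocks buffered or in flight toward one remote node. */
    static long maxInFlightBlocks(int perNodeBatchSize, int perNodeParallelBatchCount) {
        if (perNodeBatchSize <= 0 || perNodeParallelBatchCount <= 0)
            throw new IllegalArgumentException("batch size and batch count must be positive");
        return Math.multiplyExact((long) perNodeBatchSize, (long) perNodeParallelBatchCount);
    }

    /** Same bound expressed in bytes for a given IGFS block size. */
    static long maxInFlightBytes(int perNodeBatchSize, int perNodeParallelBatchCount,
                                 int blockSizeBytes) {
        if (blockSizeBytes <= 0)
            throw new IllegalArgumentException("block size must be positive");
        return Math.multiplyExact(
            maxInFlightBlocks(perNodeBatchSize, perNodeParallelBatchCount),
            (long) blockSizeBytes);
    }

    public static void main(String[] args) {
        // Illustrative values only: 64 blocks per batch, 4 parallel batches,
        // 64 KiB blocks.
        System.out.println(maxInFlightBytes(64, 4, 64 * 1024) + " bytes");
    }
}
```

If this reading is wrong, I would appreciate a correction, since it changes
how much memory I should expect each node to use for buffering.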

I have also taken a look at
http://apache-ignite-users.70518.x6.nabble.com/IGFS-Data-cache-size-td2875.html
but that post seems to refer to a deprecated field,
FileSystemConfiguration.maxSpaceSize, that I have not been able to find
either in the javadoc or in the 2.3.0 source:
https://github.com/apache/ignite/blob/2.3.0/modules/core/src/main/java/org/apache/ignite/configuration/FileSystemConfiguration.java

Other questions that I have regarding Ignite configuration in the context of
this use case:

 - When I use ATOMIC for the atomicityMode of metaCacheConfiguration I get
a launch exception: "Failed to start grid: IGFS metadata cache should be
transactional: igfs". So I understand that TRANSACTIONAL is required for
metaCacheConfiguration. However, I get no error when using ATOMIC for
dataCacheConfiguration. Is there any reason to use TRANSACTIONAL for
dataCacheConfiguration? My understanding is that ATOMIC gives better
performance if you don't use the transaction features.

 - Do the readThrough, writeThrough, and writeBehind fields of the
CacheConfiguration objects used for dataCacheConfiguration and
metaCacheConfiguration have any effect? Or does IGFS set them itself
according to the IgfsMode configured in the defaultMode field of
FileSystemConfiguration?

 - Similarly, does setExpiryPolicyFactory in dataCacheConfiguration and
metaCacheConfiguration have any effect? I'd be interested in using
DUAL_ASYNC as the defaultMode, and I thought that the ExpiryPolicy might
give an upper bound on the time it takes for a record to be written to the
secondary file system, because by then it would have expired from the cache.
That way I could safely tear down the IGFS cluster after that time without
any data loss. Is there some way of achieving that? Otherwise I think
DUAL_ASYNC could only be used in a long-lived cluster, because as far as I
understand there is no functionality to flush the IGFS caches to the
secondary file system.

 - Similarly, does the eviction policy configured for dataCacheConfiguration
and metaCacheConfiguration have any effect? In any case, my understanding is
that IGFS can never fail for lack of space in the caches, because it will
evict the required entries, saving them to the secondary file system first
if needed to avoid data loss.
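
For reference, the kind of configuration these questions are about looks
roughly like the sketch below. It is only an illustration: the cache names
"igfs-meta" and "igfs-data", the numeric values, and the 10-minute expiry are
placeholders I made up, the secondary file system setup is left out, and
whether the expiry policy on the data cache has any effect at all is exactly
what I am asking.

```java
import java.util.concurrent.TimeUnit;

import javax.cache.expiry.CreatedExpiryPolicy;
import javax.cache.expiry.Duration;

import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.FileSystemConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.igfs.IgfsMode;

public class IgfsConfigSketch {
    static IgniteConfiguration build() {
        // Metadata cache: ATOMIC here fails at startup with
        // "IGFS metadata cache should be transactional".
        CacheConfiguration<Object, Object> metaCfg = new CacheConfiguration<>("igfs-meta");
        metaCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);

        // Data cache: ATOMIC is accepted; is TRANSACTIONAL ever needed?
        CacheConfiguration<Object, Object> dataCfg = new CacheConfiguration<>("igfs-data");
        dataCfg.setAtomicityMode(CacheAtomicityMode.ATOMIC);

        // Placeholder expiry of 10 minutes; unclear whether IGFS honors it.
        dataCfg.setExpiryPolicyFactory(
            CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.MINUTES, 10)));

        FileSystemConfiguration igfsCfg = new FileSystemConfiguration();
        igfsCfg.setName("igfs");
        igfsCfg.setDefaultMode(IgfsMode.DUAL_ASYNC);
        igfsCfg.setMetaCacheConfiguration(metaCfg);
        igfsCfg.setDataCacheConfiguration(dataCfg);

        // Illustrative values, not defaults or recommendations.
        igfsCfg.setPerNodeBatchSize(64);
        igfsCfg.setPerNodeParallelBatchCount(4);
        igfsCfg.setPrefetchBlocks(8);

        // The secondary (Hadoop) file system would be set here via
        // igfsCfg.setSecondaryFileSystem(...); omitted in this sketch.

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setFileSystemConfiguration(igfsCfg);
        return cfg;
    }

    public static void main(String[] args) {
        // Only builds the configuration object; it does not start a node.
        IgniteConfiguration cfg = build();
        System.out.println(cfg.getFileSystemConfiguration()[0].getName());
    }
}
```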

It would be nice if someone could point me to a webinar or documentation
specific to IGFS. I have already watched
https://www.youtube.com/watch?v=pshM_gy7Wig and I think it is a good
introduction, but I would like to get more details. I have also read the
book "High-Performance In-Memory Computing With Apache Ignite".
Thanks a lot for all your help.

Best Regards,

Juan
