Hi Sam,
cool that you made the modifications ;-)
I think the overall changes look good. There are only a few (minor) comments
you might consider if you like..
I would rather avoid storing the encode & decode functions explicitly in the
PVFS_hint_s struct, because it is contained in the PINT_hi
Hi,
I completely agree with the proposed parameter and the potential orderings
(none, random-start, etc...). But I also agree with Walt:
>distribution, thus that really should be its own thing.
Therefore, I propose to have a new interface and new modules which do the
job, kind of a DatafileSel
Hi,
In general I like the proposed solution better than the current one, but an
extension which allows the distribution to see the aliases of the servers
would be neat as well. This certainly would solve the issue of placing
datafiles.
>Julian, how would this stuff fit in with your migration/hints st
Hi,
> Whoops, one other thing to report; apparently not all db libraries have
> the get_pagesize() function either. I happen to be trying this on a box
> with version 4.1.25 of db.
I like that you guys spend a lot of time ensuring that pvfs2 works with all
kinds of different environments. However
Hi guys,
I found another unexpected behavior :(
This time I get in trouble when I create an unbalanced distribution over the
datafiles with MPI_Type_struct. I tried with 5 dataservers and with 2
dataservers, the example I will give here is for 2 dataservers.
The datatype I use for the view places
Hi,
We see rather strange and wrong behavior with PVFS2 using a file view with
MPI-IO at different levels :)
mpiexec -np 2 ./MPI-IO -i 4 -f pvfs2://pvfs2/test -s 10 level0
000 0101 0101 0101
010 0101 0101 0101 0101 0101 0101 0101 0101
*
030 0101 000
Hi,
Maybe you remember my old email where I talked about the visualization
environment? Maybe I should have sent it to the developers list instead of
the internal list :-)
Well, I thought it is nice to visualize the state transitions with jumpshot,
e.g. create an event for each new state. In orde
Hi,
I found some unexpected behavior in the distribution if I set the MPI hint
striping_unit to (whatever) value in MPI programs. This can be
reproduced with an arbitrary MPI-program creating the file with the create
mode.
For example if I run the MPI-IO.c program (which is attached)
Hi,
cool patch ;)
> This still requires that the server be passed in a command line
> parameter which is the alias for that particular server. I just can't
> think of any way around that.
I think for most configurations (where only one pvfs2-server is running on
each node) the server might guess
Hi,
> Do you mean Pete's IB 4x numbers or my MX-10G numbers? Or both?
both are good; sorry, I meant especially the throughput of the new
MX-10G implementation.
> I am not sure that I follow you here. Ideally, I only want to measure
> network activity and PVFS2 overhead. I would prefer to avoid
> me
Hi,
These results are quite interesting and the possible throughput you get with
IB is amazing.
However, I think you will have trouble finding the bottleneck of the operation
and the reason for the observed gaps due to the complexity of the software
stack.
Maybe you could benefit from using T
Hi,
you can find an excerpt of the documentation I made and still make for PVFS2
here:
http://www.rzuser.uni-heidelberg.de/~jkunkel2/pvfs2-doc.pdf
Although it is not finished yet and contains some typos it documents parts of
the interaction between system call and server statemachine invocation..
Hi,
is there a specific reason to let the states
create_setattr_setup_msgpair transit by default to cleanup instead of
create_setattr_failure
and datafiles_setup_msgpair_array transit to cleanup instead of
datafiles_failure
For the other states like dspace_create_setup_msgpair and
dspace_cre
Hi,
> So, the plan is to put all of the hints in one long string and just pass
> that string in through the create interface?
not necessarily, I think it could be separated also. I think the example I
gave was similar to the way of passing in distribution parameters into MPI-IO
some guys have imp
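As a rough illustration of the single-string idea, here is a small C sketch that counts `type:value` pairs in one semicolon-separated hint string. The encoding is an assumption for illustration only, not the actual PVFS2 hint format.

```c
#include <string.h>

/* Hypothetical sketch: count "type:value" pairs in a single
 * semicolon-separated hint string.  The encoding is an assumption,
 * not the real PVFS2 wire format. */
static int count_hint_pairs(const char *hints)
{
    int count = 0;
    const char *p = hints;

    while (p && *p) {
        const char *colon = strchr(p, ':');
        const char *semi  = strchr(p, ';');

        /* a pair needs a ':' before the next ';' (or end of string) */
        if (colon && (!semi || colon < semi))
            count++;
        p = semi ? semi + 1 : NULL;
    }
    return count;
}
```

Passing one flat string like this keeps the sysint signature small, at the cost of re-parsing on the server side.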
Hi,
>So I think we're mostly trying to work out what our API should really
>be, whether we should extend the distro functionality vs. going totally
>to hints, and if we go to hints what that API should look like, right?
probably the distribution needs a function which initializes/sets the interna
Hi,
> Any other ideas?
The whole issue would be resolved naturally by using the filename as key
instead of the number. Also, the pcache would no longer be necessary...
I know I'm really persistent on this one :)
Julian
___
Pvfs2-developers mailing list
Hi,
> The server-settable-dist would be implemented to store the indices
> for the IO servers in the params field of the PVFS_sys_dist
> structure. I used server indices, because the PINT_dist_* interfaces
> allow for that (through the PINT_request_file_data struct), but its a
> bit ugly and proba
Hi,
> This way callers wouldn't be able to muck with the internals of the
> hint struct.
Ok, I will definitely do this.
> As I said, I prefer letting the hint struct be defined
> externally and requiring an array of them to the system interfaces.
> It seems to match what we have throughout
Hi,
thanks for your reply, no problem :)
> * After some discussion with RobR, I think we'd prefer a little
> different hint structure:
>
> typedef struct
> {
>     char * type; /* null terminated */
>     char * hint;
>     int length;
> } PVFS_hint;
> This essentially replaces the type with a
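For reference, the quoted struct could be filled roughly like this; `make_hint` is a hypothetical helper for illustration, not part of the proposal.

```c
#include <stdlib.h>
#include <string.h>

/* The struct as quoted in the proposal above. */
typedef struct
{
    char *type;   /* null terminated */
    char *hint;
    int   length;
} PVFS_hint;

/* Hypothetical helper (not part of the proposal): fill one hint
 * entry with copies of the given strings. */
static PVFS_hint make_hint(const char *type, const char *value)
{
    PVFS_hint h;
    size_t tlen = strlen(type) + 1;
    size_t vlen = strlen(value) + 1;

    h.type = malloc(tlen);
    h.hint = malloc(vlen);
    memcpy(h.type, type, tlen);
    memcpy(h.hint, value, vlen);
    h.length = (int) vlen;   /* length includes the terminating NUL */
    return h;
}
```

An array of these, passed to the system interfaces, matches the "externally defined hint struct" idea discussed above.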
Hi,
> calling any of the coalescing code. Julian and I had talked about this
> being a problem a while back, but I guess it never got looked at. In
> any case, I was able to cleanup the sync coalescing code some, so it was
> probably worth it.
I remember that I actually committed a fix to my branc
Hi,
> We were thinking of going about solving this in a 3 step process.
This is really cool ;)
I think it might be a good idea to think about the zero-conf issue and about
extending the server farm during this process. It actually might not be
much more effort to think about it and make plans
Hi,
> Indeed, this would be a really nice feature to have.
> But like many other ideas, this depends on server-to-server :)
> So I guess we would have to wait until we have that in place..
> thanks,
Server to server already seems to work well with minor changes, I use it in
my migration branch (m
Hi,
> I have checked in a patch that will allow the pvfs2-ping utility to verify
> if the fs.conf files obtained from all the servers for a given fsid
Going further toward an (almost) zeroconf version of PVFS2, I was thinking about a
long-term alternative where each server fetches the fs.conf from a (ma
Hi guys,
I want to bother you again with the hint stuff ;-)
As far as I can tell, only Pete has said something about the hints so far. So I
have no idea whether you like it or not. Probably you want (me) to modify it...
Pete mentioned that it is not good to add the hints to all operations, e.g.
the pvfs2 noop operati
Hi,
> On gentoo the man page actually lists _XOPEN_SOURCE 500 as the magic
> #define to add when using pread/pwrite, so that was the route that I
> tried first. This breaks the pvfs2 build, though- apparently that flag
> also turns off some features that we already use in addition to adding
> new
Hi,
> In any case, this looks like it was storing the admin mode variable on
> client-side rather than on the server and could be removed...
Oh, now I see what you meant; I partly misunderstood the question.
Ok, I can't see the connection between the client side variable g_admin_mode
and the ser
Hi,
looks like that is still true.
fchk sets it, and the request scheduler seems to enforce that a non-management
operation cannot be done while it is set. It also enforces that admin
mode cannot be set while other operations are in progress.
julian
Hi,
> Okay. I think I understood. Thanks for the clarification!
Glad that makes sense; I was thinking about how to do it for a while...
> Can you send the migration tool as well or check that in to your branch?
It is already in my branch, the tool is pvfs2-migrate
syntax: pvfs2-migrate -d , you
Hi,
> Could you rephrase the above? I dont think I understand what it means..
Sorry for the weird phrasing. I'll give it another shot from the client's
point of view.
At the beginning of an I/O request, the client already requests the file's
array of datafiles from the metadata server.
Now the I/O start
Hi,
Maybe the old datafiles are not deleted properly? Maybe we could try the
following to figure out if that is the case: choose a file on the old setting
(4 servers) (or create a new one, let's say 100 MByte), print out the real file
names in the storage space with pvfs2-viewdist, and in addition d
Hi,
> Hmm.. problem with ENOENT is
> how would you differentiate between migration and lost dfiles?
> What if you keep getting dfile array and retries with ENOENTS from the
> same server again and again? Will you conclude after the second retry that
> it is a corrupted FS?
The client could store a
Hi,
> of jumping back to io_datafile_post_msgpairs, I think you'll want to
> jump all the way back to io_init. Its probably easier to create
> another return code (IO_REINIT or something), and return that from
> io_datafile_complete_operations. I think there will be some cleanup
> that yo
Hi,
I want to adapt the I/O statemachines to reread the dfile array in case an I/O
server responds with PVFS_ENOENT during the flow or within the initial I/O
ACK. This might happen if the file is migrated away and the client does not
have the updated dfile array before it initiates the I/O.
Thus, I
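The reread-on-ENOENT idea could be sketched as follows; all function names and the error value are stand-ins for illustration, not the actual PVFS2 state machine code. The stubs simulate one migration: the first attempt hits a stale dfile array, the second succeeds.

```c
/* Stand-in error value, not the real PVFS_ENOENT code. */
#define FAKE_ENOENT (-2)

static int stale = 1;                       /* dfile array is outdated */
static void reread_dfile_array(void) { stale = 0; }
static int  do_io(void) { return stale ? FAKE_ENOENT : 0; }

/* On ENOENT, refetch the dfile array and restart the I/O; a retry cap
 * keeps a genuinely lost datafile from looping forever. */
static int io_with_reinit(int max_retries)
{
    int ret = FAKE_ENOENT;
    int tries;

    for (tries = 0; tries <= max_retries; tries++) {
        ret = do_io();
        if (ret != FAKE_ENOENT)
            break;                          /* success or other error */
        reread_dfile_array();
    }
    return ret;
}
```

The retry cap is one way to distinguish a migration (succeeds after a reread) from a corrupted FS (keeps failing).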
Hey Murali,
> Would you want to differentiate between threads and processes?
> Storing current->pid will mean you cannot differentiate amongst threads...
> just a thought.
I couldn't figure out how that is supposed to work in the small period of
time. If that can be done easily I think it would be
Whoops, line 103 has to be MPI_COMM_WORLD;
I attached the modified MPE.patch
Sorry,
Julian
diff -r 9e1df1d6fa98 src/wrappers/src/log_mpi_io.c
--- a/src/wrappers/src/log_mpi_io.c Mon Aug 21 12:37:31 2006 +0200
+++ b/src/wrappers/src/log_mpi_io.c Mon Aug 21 14:50:03 2006 +0200
@@ -8,7 +8,6 @@
Hi,
I attached a MPE patch which attaches the request ID to most calls for pvfs2.
The patch has to be applied in the directory mpich/src/mpe2
The example output of the hacked create and io.sm returns the following when
running with a mpi-io-test of 2 bytes:
Create request ID received:
host:
Hi,
I commited a patch (to the kunkel branch) which adds hint support to the
client-core and the kernel. Therefore each request creates the request id
hint if the core parameter "--create-request-id" was given.
It adds the hostname and the program's pid to the request callid.
I think it is good t
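Composing such a request id from hostname and pid might look like this minimal sketch; the `host:pid` layout is an assumption, not necessarily the format used in the branch.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Sketch only: build a "host:pid" request-id string; the layout is
 * an assumption for illustration. */
static void make_request_id(char *buf, size_t len)
{
    char host[256];

    if (gethostname(host, sizeof(host)) != 0)
        strncpy(host, "unknown", sizeof(host));
    host[sizeof(host) - 1] = '\0';          /* ensure termination */
    snprintf(buf, len, "%s:%d", host, (int) getpid());
}
```

Such an id is cheap to generate on the client and lets server-side logs be correlated back to the originating process.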
Hi,
I attached a patch which supports the PVFS-hint in MPI,
however, it looks like the adio delete functions do not give the hint to any
implementation. I haven't seen any which uses the hint for delete, but
shouldn't the hint be given to the implementation ? If there are no reasons
not to give
Hi,
I uploaded an unfinished I/O testsuite for PVFS2 to
http://www.rzuser.uni-heidelberg.de/~jkunkel2/direct-io-0.1-dist.tgz.
This testsuite tries several I/O strategies.
There is still a lot to do, but I wanted to get some feedback from you guys,
and maybe one of you can run the test on a RA
Hi,
> I don't like having a PVFS_hint argument on every sysint call.
> Seems like some selectivity would be warranted. What kind of hint
> will one ever want to pass to PVFS_mgmt_noop() for instance?
> But I'll let others say if they think this is the way to go.
Right for the noop maybe, but I was
Hi,
I attached a patch which adds PVFS_hint to most system interface calls,
these hints can be used on the client side only or are automatically
transferred to the server if a flag is set in the hint_transfer_to_server
array for the appropriate hint. The hint itself consists of a number (hint
typ
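The per-hint transfer flag could work roughly like this sketch; the hint type names are made up for illustration and are not the actual identifiers in the patch.

```c
/* Illustrative hint type numbers; the names are hypothetical. */
enum { HINT_REQUEST_ID = 0, HINT_CLIENT_ONLY = 1, HINT_TYPE_COUNT = 2 };

/* Per-type flag mirroring the hint_transfer_to_server array mentioned
 * above: 1 = ship the hint to the server, 0 = client side only. */
static const int hint_transfer_to_server[HINT_TYPE_COUNT] = { 1, 0 };

static int should_send_to_server(int hint_type)
{
    if (hint_type < 0 || hint_type >= HINT_TYPE_COUNT)
        return 0;                /* unknown hints stay client-side */
    return hint_transfer_to_server[hint_type];
}
```

Keeping the flag in a static table means adding a new hint type only touches one array entry.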
Hi Murali,
> Aren't the two issues (adding a new parameter to the system interface and
> the ability to create dfiles on specific nodes) sufficiently
> orthogonal/different?
Yes they are, but if we extend the system interface then we could easily add
another parameter if necessary and pending chan
Hi,
we talked about adding a request ID to the system interface. Therefore I think
we discussed adding a new parameter to all system interface calls which can be
set from MPI or the client core. I think when we extend the list of
parameters then we could easily add another parameter as well, if ne
Hi,
really a cool patch !
Also thanks for pointing out that glibc schedules only one I/O request at a
time. This seems to nullify all the possible benefits of using aio.
To avoid redundant work I want to post my current plans regarding I/O to the
list.
Right now I'm experimenting / building a p
Hi,
> I think we should process the unprocessed requests before we have the
> server exit.
By unprocessed requests, do you mean unprocessed unexpected messages, job, bmi
and/or trove requests?
I think unprocessed unexpected messages shouldn't be processed; the client will
restart them once the server is
Hi,
between linux 2.6.16 and 2.6.17 a new kernel option has been integrated into
the kernel (kernel help):
CONFIG_BLK_DEV_IO_TRACE:
Say Y here, if you want to be able to trace the block layer actions
Hi,
> status? The state of the operation is already gone, so you can't
> distinguish between the caller giving you a bogus ID and the caller
> giving you the ID of something that recently completed. The thread that
> called test() assuming that the operation would be there will probably
> be conf
Hi,
> I would prefer to see this handled by the configure script, if
> possible. No O_DIRECT in fcntl.h => no support in pvfs2; we
> shouldn't get in the business of defining these constants if
> possible to avoid it.
right, I agree. At the moment I'm trying to get a working O_DIRECT version in order
Hi,
Also, another note I forgot to mention in the mail: why the non-threaded
version is a bit faster for some cases like creates.
Due to the randomness of the requests of 15 clients, it can happen that there
are a couple of lookup operations ahead of a create operation in the pending
queue. Of course lookup
Hi,
I did some tests in order to compare the multi threaded (separated read
thread) with the single threaded version of my branch.
Unfortunately, I have no nice diagrams this time :(
In this test each modifying op syncs the db.
I used 15 clients and one server. I modified the request scheduler to
Hi,
Right now I'm working to get a threaded I/O dbpf implementation (later it will
support O_Direct).
Right now we have the configure option --disable-aio-threaded-callbacks; I
wonder if somebody still uses it? Should we get rid of it? That still will
give people the option to use the new dbp
Hi,
on http://www.rzuser.uni-heidelberg.de/~jkunkel2/150GB.tgz you will find a tar
archive comparing O_DIRECT disk access on a cool RAID machine, accessing a
precreated file with a total size of 150 GByte.
Up to a size of 128 KByte all threads iterate 1 time and access the
given blocksize ra
Hi,
> Let me know if you have a suggestion on how to run the sequential
> tests that Walt was thinking would be a good idea to try.
Yeah, you could try the flag "-M" for the test run; each thread will then
randomly pick a start location and then process the file
sequentially.
Surprisingly th
Hi,
cool link for the ext3 implementation :)
>gen_safe just stores the pointer in a hashtable and returns a hash
>key in replace of it. There's no reference counting done on the
>pointer, so its not a way of implementing smart pointers.
The same issue with test calls and cancel calls directly
Hi,
I think that is exactly what happens: unprocessed requests are discarded. I
thought the reason it is the way it is must be that server failure /
crash and shutdown are handled in the same way? The client might retry the
operations after the server is back up.
Julian
Hi,
> Maybe if we could differentiate between user errors (ENOENT) and
> unexpected system errors (ENOMEM, etc.), we could do that.
> Unfortunately, right now we don't differentiate, so some fellow might
> come along and do an ls on a directory or file that doesn't exist,
> get an error, causing a
Hi,
> Can you change the immediate completion flag and send another?
good idea, this stripped some stuff out from the statemachines...
> The case you're probably looking for is if
> the low watermark is set to two, and the queue never gets higher than
Right, that was the case I was looking for.
Its
Hi,
Thanks, hope the patch works alright...
> I'm not convinced this is the best value for the default case.
> If the server can handle operations fast enough that the queue
> remains at 1, coalescing doesn't seem necessary, and it will give
> widely varied timings for some of the operations
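The effect of the low watermark on sync counts can be sketched with a small model; this is illustrative arithmetic, not the actual coalescing code. Below the watermark every operation syncs on its own; at or above it, all queued operations share one sync.

```c
/* Illustrative model of sync coalescing: with a short queue every op
 * pays its own sync; with a deep queue, one sync covers the batch. */
static int syncs_needed(int ops, int queue_depth, int low_watermark)
{
    if (queue_depth < low_watermark)
        return ops;                               /* one sync per op  */
    return (ops + queue_depth - 1) / queue_depth; /* one sync per batch */
}
```

This captures the case discussed above: with the watermark at 2 and a queue that never grows past 1, coalescing never kicks in and every operation syncs individually.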
Hi,
I think it would be nice if pvfs2-genconfig generated this value automatically
for you in case --notrovesync is specified, instead of requiring the user to
add the coalesync values in these cases as well...
Thanks,
Julian
Hi,
I looked a bit around the implementation of the data sync mode;
currently PINT_flow_setinfo is called, which sets the sync mode for each
write operation of a flow. That means if 100 MByte are transferred in blocks
of 256 KByte, a sync happens for each block, which ends up in quite a lot of syncs.
Maybe
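Quick arithmetic for the example above, as a one-liner:

```c
/* Number of syncs when every flow buffer of block_kbyte KByte is
 * synced individually: 100 MByte / 256 KByte = 400 syncs. */
static int syncs_per_flow(int total_mbyte, int block_kbyte)
{
    return (total_mbyte * 1024) / block_kbyte;
}
```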
Hi,
enclosed you will find a patch adding the configuration flags:
TCPBufferSend and TCPBufferReceive to the default part of the configuration
file.
I set the default values to receive = 65535 and send = 131071, which were used
when no configuration options are set; 131071 is the max value (set by
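For illustration, the resulting options might appear in the configuration file roughly like this (a sketch; the exact section layout of your fs.conf may differ):

```text
<Defaults>
    TCPBufferSend 131071
    TCPBufferReceive 65535
</Defaults>
```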
Hi,
I modified apps/karma + statfs and the request protocol to add the number of
active bmi connections and the used amount of swap to the server fs details.
Therefore I added a new value to the perf counter which is modified in
bmi_tcp. I'm not sure how to figure out the active connections for
Hi,
> > 1. rmdirent(object name,parent handle) => object handle
> > 2. getattr(object handle) => attr
> > 3. remove(object handle) => ENOTEMPTY
> > 4. crdirent(object_handle)
> Ahh, you are right. I didn't realize that the attributes weren't read
> until after removing the directory entry. Wi
Hi,
I try to document the handling of some operations.
However, the IO handling with flows is a bit complicated ;-)
I try to summarize only important steps of the IO process (including states of
the state-machines) and would be very happy if you could have a look and give
me hints if something is
Hi,
I have a question: if a client wants to remove a handle, first the directory
entry is removed, and then the client verifies whether a directory is going to
be removed.
If a non-empty directory is going to be removed, the client just creates the
directory entry again in the parent directory, because
Hi,
I generated some diagrams for TAS using multiple databases for the key/value
pairs.
Last week I upgraded to the newest CVS version; it looks like performance did
change between the 4-week-old CVS version (last patch
Date: 2005/11/14 20:43:46) and the 1-week-old CVS version (last patch:
2005
Hi,
I attached some benchmark results for mpi-create on chiba. Every node
creates 1000 files. The clients and servers are disjoint. In the x-direction
the number of clients and servers increases. Every server is both a meta and a
data server. I think it looks like the creation rate is constant for