Re: Out of memory while reading excel file

2017-05-12 Thread codewizard
On Thursday, May 11, 2017 at 5:01:57 AM UTC-4, Mahmood Naderan wrote: > Excuse me, I changed > > csv.writer(outstream) > > to > > csv.writer(outstream, delimiter =' ') > > > It puts space between cells and omits "" around some content. However, > between two lines there is a new empty

Re: Out of memory while reading excel file

2017-05-12 Thread eryk sun
On Fri, May 12, 2017 at 8:03 PM, Peter Otten <__pete...@web.de> wrote: > I don't have a Windows system to test, but doesn't that mean that on Windows > > with open("tmp.csv", "w") as f: > csv.writer(f).writerows([["one"], ["two"]]) > with open("tmp.csv", "rb") as f: > print(f.read()) > >
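Peter's question can be checked without a Windows machine. Passing `newline="\r\n"` to `open()` makes Python apply, on any platform, the same `\n` to `\r\n` translation on write that Windows text mode applies by default. The sketch below compares that with `newline=""`. The helper name `written_bytes` is hypothetical and the file lives in a throwaway temporary directory:

```python
import csv
import os
import tempfile


def written_bytes(newline):
    """Write two one-cell rows with csv.writer and return the raw file bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tmp.csv")
        with open(path, "w", newline=newline) as f:
            csv.writer(f).writerows([["one"], ["two"]])
        with open(path, "rb") as f:
            return f.read()


# newline="\r\n" mimics Windows text mode: the "\n" of the writer's "\r\n"
# is translated again, giving a doubled terminator that shows up as a blank line.
windows_style = written_bytes("\r\n")
print(windows_style)  # b'one\r\r\ntwo\r\r\n'

# newline="" disables translation, so each row ends with exactly one "\r\n".
plain = written_bytes("")
print(plain)  # b'one\r\ntwo\r\n'
```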

Re: Out of memory while reading excel file

2017-05-12 Thread Peter Otten
Pavol Lisy wrote: > On 5/11/17, Peter Otten <__pete...@web.de> wrote: >> Mahmood Naderan via Python-list wrote: >>> between two lines there is a new empty line. In other words, the first >>> line is the first row of the Excel file. The second line is empty ("\n") and >>> the third line is the second
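If plain `\n` line endings are preferred over the `\r\n` that `csv.writer` emits by default, the writer's `lineterminator` parameter can be changed. A minimal sketch, using an in-memory buffer created with `newline=""` so that no translation happens (the file equivalent would be `open(path, "w", newline="")`):

```python
import csv
import io

buf = io.StringIO(newline="")  # no newline translation, like open(..., newline="")
csv.writer(buf, lineterminator="\n").writerows([["one"], ["two"]])
print(repr(buf.getvalue()))  # 'one\ntwo\n'
```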

Re: Out of memory while reading excel file

2017-05-12 Thread Pavol Lisy
On 5/11/17, Peter Otten <__pete...@web.de> wrote: > Mahmood Naderan via Python-list wrote: > >> Excuse me, I changed >> >> csv.writer(outstream) >> >> to >> >> csv.writer(outstream, delimiter =' ') >> >> >> It puts space between cells and omits "" around some content. > > If your data doesn't
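With the default quoting (`csv.QUOTE_MINIMAL`), the writer wraps any field that contains the delimiter in quotes. Space-separated output therefore still round-trips, provided the reader is given the same delimiter. A small sketch with made-up sample rows:

```python
import csv
import io

rows = [["CHR1", "11,232,445"], ["gene name", "NM_198576.3"]]  # hypothetical sample data

buf = io.StringIO(newline="")
csv.writer(buf, delimiter=" ").writerows(rows)
text = buf.getvalue()
print(repr(text))  # 'CHR1 11,232,445\r\n"gene name" NM_198576.3\r\n'

# Reading back with the same delimiter removes the quotes again.
parsed = list(csv.reader(io.StringIO(text, newline=""), delimiter=" "))
print(parsed == rows)  # True
```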

Re: Out of memory while reading excel file

2017-05-11 Thread Mahmood Naderan via Python-list
Thanks a lot for suggestions. It is now solved. Regards, Mahmood

Re: Out of memory while reading excel file

2017-05-11 Thread Peter Otten
Mahmood Naderan via Python-list wrote: > Excuse me, I changed > > csv.writer(outstream) > > to > > csv.writer(outstream, delimiter =' ') > > > It puts space between cells and omits "" around some content. If your data doesn't contain any spaces that's fine. Otherwise you need a way to
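One common way to sidestep the question of spaces inside cells is to pick a delimiter that does not occur in the data. The `csv` module ships an `excel-tab` dialect for exactly this. This is offered as one option, not necessarily the one Peter goes on to suggest; the cell values are made up:

```python
import csv
import io

rows = [["gene name", "11,232,445"]]  # hypothetical cells containing a space and commas

buf = io.StringIO(newline="")
csv.writer(buf, dialect="excel-tab").writerows(rows)
# Neither field contains a tab, so neither needs quoting.
print(repr(buf.getvalue()))  # 'gene name\t11,232,445\r\n'
```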

Re: Out of memory while reading excel file

2017-05-11 Thread Mahmood Naderan via Python-list
Excuse me, I changed csv.writer(outstream) to csv.writer(outstream, delimiter=' '). It puts a space between cells and omits "" around some content. However, between two lines there is a new empty line. In other words, the first line is the first row of the Excel file. The second line is empty
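A likely cause of the empty line, assuming Python 3 on Windows: `csv.writer` ends every row with `\r\n`, and a file opened in text mode without `newline=""` translates the `\n` once more, producing `\r\r\n`. A minimal sketch of the usual fix follows. The sample rows and the temporary output path are illustrative only:

```python
import csv
import os
import tempfile

# Hypothetical rows standing in for the values read from the worksheet.
rows = [["CHR1", "11,232,445"], ["CHR2", "5,000"]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.csv")  # hypothetical output path
    # newline="" stops Python from translating the "\r\n" that csv.writer
    # already emits, which is what produces the extra empty line on Windows.
    with open(path, "w", newline="") as outstream:
        csv.writer(outstream, delimiter=" ").writerows(rows)
    with open(path, "rb") as f:
        data = f.read()

print(data)  # b'CHR1 11,232,445\r\nCHR2 5,000\r\n'
```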

Re: Out of memory while reading excel file

2017-05-11 Thread Mahmood Naderan via Python-list
Thanks. That code is so simple and works. However, there are things to be considered. With the CSV format, cells in a row are separated by ',' and for some cells it writes "" around the cell content. So, if the Excel sheet looks like CHR1 11,232,445 the output file looks like
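The quotes come from the writer's default `QUOTE_MINIMAL` policy. A field such as `11,232,445` contains the delimiter `,`, so it must be quoted to stay a single cell, and a CSV reader strips the quotes again. A short sketch using the cell values from this message:

```python
import csv
import io

buf = io.StringIO(newline="")
csv.writer(buf).writerow(["CHR1", "11,232,445"])
line = buf.getvalue()
print(repr(line))  # 'CHR1,"11,232,445"\r\n'

# Reading the line back restores the original two cells.
back = next(csv.reader(io.StringIO(line, newline="")))
print(back)  # ['CHR1', '11,232,445']
```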

Re: Out of memory while reading excel file

2017-05-11 Thread Peter Otten
Mahmood Naderan via Python-list wrote: > I wrote this: > > a = np.zeros((p.max_row, p.max_column), dtype=object) > for y, row in enumerate(p.rows): > for cell in row: > print (cell.value) > a[y] = cell.value In the line above you overwrite the row in the numpy
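A sketch of the corrected loop, which assigns each cell to its own position `a[y, x]` instead of assigning to the whole row `a[y]`. To keep it self-contained, a plain list of value rows stands in for the worksheet. With openpyxl you would iterate `p.rows` and read `cell.value`, or, as in Peter's earlier suggestion, assign the full list `a[y] = [cell.value for cell in row]` in one step:

```python
import numpy as np

# Hypothetical stand-in for the worksheet: rows of cell values.
rows = [["CHR1", "11,232,445", "NM_198576.3"],
        ["CHR2", "5,000", "NM_000001.1"]]

a = np.empty((len(rows), len(rows[0])), dtype=object)
for y, row in enumerate(rows):
    for x, value in enumerate(row):
        a[y, x] = value  # one cell per assignment, not the whole row a[y]

print(a[0].tolist())  # ['CHR1', '11,232,445', 'NM_198576.3']
```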

Re: Out of memory while reading excel file

2017-05-11 Thread Mahmood Naderan via Python-list
I wrote this: a = np.zeros((p.max_row, p.max_column), dtype=object) for y, row in enumerate(p.rows): for cell in row: print (cell.value) a[y] = cell.value print (a[y]) For one of the cells, I see NM_198576.3 ['NM_198576.3' 'NM_198576.3' 'NM_198576.3'
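The repeated value in the printed row is NumPy broadcasting at work. Assigning a single string to `a[y]` fills every column of that row with it, and because the loop does this once per cell, the row ends up holding copies of the last value. A minimal reproduction with made-up cell values:

```python
import numpy as np

a = np.zeros((1, 3), dtype=object)
row = ["CHR1", "11,232,445", "NM_198576.3"]  # hypothetical cell values of one row

# Same pattern as the loop above: a[y] (a whole row) receives one scalar per iteration.
for cell_value in row:
    a[0] = cell_value  # the scalar is broadcast to every column of row 0

print(a[0].tolist())  # ['NM_198576.3', 'NM_198576.3', 'NM_198576.3']
```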

Re: Out of memory while reading excel file

2017-05-11 Thread Peter Otten
Mahmood Naderan via Python-list wrote: >>a = numpy.zeros((ws.max_row, ws.max_column), dtype=float) >>for y, row in enumerate(ws.rows): >> a[y] = [cell.value for cell in row] > > > > Peter, > > As I used this code, it gave me an error that cannot convert string to > float for the first cell.

Re: Out of memory while reading excel file

2017-05-11 Thread Mahmood Naderan via Python-list
Hi, I used the old-fashioned coding style to create a matrix and read/add the cells. W = load_workbook(fname, read_only = True) p = W.worksheets[0] m = p.max_row n = p.max_column arr = np.empty((m, n), dtype=object) for r in range(1, m): for c in range(1, n): d = p.cell(row=r,
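One thing worth checking in this loop is the range. openpyxl's `cell(row=..., column=...)` counts from 1, so `range(1, m)` stops at `m - 1` and never reads the last row or the last column. The sketch below uses a hypothetical 1-based accessor `cell_value` in place of the worksheet to show the gap and a 1-based inclusive loop stored into 0-based Python indices:

```python
# Hypothetical 1-based accessor standing in for worksheet.cell(row=r, column=c).value.
data = [["a1", "b1"], ["a2", "b2"], ["a3", "b3"]]


def cell_value(row, column):
    return data[row - 1][column - 1]


m, n = len(data), len(data[0])  # like max_row and max_column

# range(1, m) and range(1, n) visit only 2 of the 6 cells here.
visited = [(r, c) for r in range(1, m) for c in range(1, n)]
print(len(visited))  # 2

# 1-based inclusive loops, written into 0-based list positions.
arr = [[None] * n for _ in range(m)]
for r in range(1, m + 1):
    for c in range(1, n + 1):
        arr[r - 1][c - 1] = cell_value(r, c)
print(arr == data)  # True
```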

Re: Out of memory while reading excel file

2017-05-10 Thread Mahmood Naderan via Python-list
>a = numpy.zeros((ws.max_row, ws.max_column), dtype=float) >for y, row in enumerate(ws.rows): > a[y] = [cell.value for cell in row] Peter, When I used this code, it gave me an error saying it cannot convert string to float for the first cell. All cells are strings. Regards, Mahmood
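The error follows from `dtype=float`, which makes NumPy try to convert every assigned value to a float. For string cells, an array created with `dtype=object` stores the values as they are. A small sketch with made-up cells:

```python
import numpy as np

row = ["CHR1", "11,232,445"]  # hypothetical string cells

a = np.zeros((1, 2), dtype=float)
float_failed = False
try:
    a[0] = row  # NumPy tries to convert "CHR1" to a float and fails
except ValueError as exc:
    float_failed = True
    print("float array rejected the strings:", exc)

b = np.empty((1, 2), dtype=object)  # object arrays hold references to any Python object
b[0] = row
print(b[0].tolist())  # ['CHR1', '11,232,445']
```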

Re: Out of memory while reading excel file

2017-05-10 Thread Mahmood Naderan via Python-list
Hi, I am confused by that. If you say that numpy is not suitable for my case and may have a large overhead, what is the alternative then? Or do you mean that numpy is a good choice here and we can reduce its overhead? Regards, Mahmood
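If the end goal is a text file (the thread later settles on CSV output), one alternative is to skip the in-memory matrix entirely and stream rows straight to the writer. The function name `rows_to_csv` is hypothetical. It accepts any iterable of row sequences. With newer openpyxl versions, that iterable could be `worksheet.iter_rows(values_only=True)` on a workbook loaded with `read_only=True`, but this is an assumption about your openpyxl version, and the sketch below uses a plain generator so it runs without openpyxl:

```python
import csv
import io


def rows_to_csv(rows, outstream, delimiter=","):
    """Write rows one at a time and return how many were written.

    `rows` can be any iterable of sequences of cell values; this function
    holds only the current row, never the whole table.
    """
    writer = csv.writer(outstream, delimiter=delimiter)
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


# Usage with a generator, so the full table never exists as a list.
generated = (("CHR%d" % i, str(i * 1000)) for i in range(1, 4))
buf = io.StringIO(newline="")
count = rows_to_csv(generated, buf)
print(count)                # 3
print(repr(buf.getvalue()))  # 'CHR1,1000\r\nCHR2,2000\r\nCHR3,3000\r\n'
```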

Re: Out of memory while reading excel file

2017-05-10 Thread Mahmood Naderan via Python-list
On Wed, 5/10/17, Peter Otten <__pete...@web.de> wrote: Subject: Re: Out of memory while reading excel file To: python-list@python.org Date: Wednesday, May 10, 2017, 6:30 PM Mahmood Naderan via Python-list wrote: > Well actually cells are treated as strings and not integer

Re: Out of memory while reading excel file

2017-05-10 Thread Peter Otten
Mahmood Naderan via Python-list wrote: > Well actually cells are treated as strings and not integer or float > numbers. May I ask why you are using numpy when you are dealing with strings? If you provide a few details about what you are trying to achieve someone may be able to suggest a

Re: Out of memory while reading excel file

2017-05-10 Thread Irmen de Jong
On 10-5-2017 17:12, Mahmood Naderan wrote: > So, I think numpy is unable to manage the memory. That assumption is very likely to be incorrect. >> np.array([[i.value for i in j] for j in p.rows]) I think the problem is in the way you feed your excel data into the numpy array constructor. The
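One plausible contributor when `np.array` receives a nested list made only of strings: without an explicit `dtype`, NumPy infers a fixed-width Unicode type sized for the longest string and uses 4 bytes per character for every cell. A single long cell therefore inflates the whole array. The sketch below illustrates this with made-up cells. Passing `dtype=object` keeps references to the existing `str` objects instead:

```python
import numpy as np

# Hypothetical cells: short strings plus one long one.
cells = [["CHR1", "NM_198576.3"], ["CHR2", "x" * 200]]

arr = np.array(cells)
print(arr.dtype)     # <U200: every cell is sized for the longest string
print(arr.itemsize)  # 800 bytes per cell (4 bytes per character)

obj = np.array(cells, dtype=object)
print(obj.dtype)     # object: cells are references to the existing str objects
```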

Re: Out of memory while reading excel file

2017-05-10 Thread Mahmood Naderan via Python-list
rows. Mine is about 100k. Currently, the task manager shows about 4GB of ram usage while working with numpy. Regards, Mahmood On Wed, 5/10/17, Peter Otten <__pete...@web.de> wrote: Subject: Re: Out of memory while reading excel file To: pytho
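A back-of-the-envelope check of the numbers in this thread (100K rows, 50 columns, about 4 GB in the task manager), assuming decimal gigabytes and a 64-bit Python where each reference in an object array takes 8 bytes. It suggests that the array's own reference table is small and that most of the memory goes to whatever the references point at, or to intermediate copies:

```python
rows, cols = 100_000, 50        # sizes given in the thread
cells = rows * cols
print(cells)                    # 5000000

observed_bytes = 4 * 10**9      # "about 4GB", taken as decimal gigabytes (an approximation)
per_cell = observed_bytes // cells
print(per_cell)                 # 800 bytes per cell

pointer_bytes = cells * 8       # one 8-byte reference per cell in an object array (64-bit build)
print(pointer_bytes)            # 40000000, i.e. about 40 MB
```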

Re: Out of memory while reading excel file

2017-05-10 Thread Peter Otten
Mahmood Naderan via Python-list wrote: > Thanks for your reply. The openpyxl part (reading the workbook) works > fine. I printed some debug information and found that when it reaches the > np.array, after some 10 seconds, the memory usage goes high. > > > So, I think numpy is unable to manage

Re: Out of memory while reading excel file

2017-05-10 Thread Mahmood Naderan via Python-list
Thanks for your reply. The openpyxl part (reading the workbook) works fine. I printed some debug information and found that when it reaches the np.array, after some 10 seconds, the memory usage goes high. So, I think numpy is unable to manage the memory. Regards, Mahmood On Wednesday,
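Instead of first building a complete list of lists and then handing it to `np.array`, the array can be preallocated (as in Peter's later `numpy.zeros(...)` suggestion) and filled one row at a time from an iterator. Then only one row of Python values is pending at any moment. The helper name `fill_object_array` is hypothetical, and a generator stands in for the worksheet rows:

```python
import numpy as np


def fill_object_array(rows, n_rows, n_cols):
    """Copy cell values into a preallocated object array one row at a time.

    `rows` is any iterable of sequences of cell values (e.g. a generator),
    so no intermediate list of lists is built before the array exists.
    """
    a = np.empty((n_rows, n_cols), dtype=object)
    for y, row in enumerate(rows):
        a[y, :len(row)] = list(row)
    return a


rows = (("CHR%d" % i, "NM_%06d.1" % i) for i in range(3))
a = fill_object_array(rows, 3, 2)
print(a.tolist())  # [['CHR0', 'NM_000000.1'], ['CHR1', 'NM_000001.1'], ['CHR2', 'NM_000002.1']]
```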

Re: Out of memory while reading excel file

2017-05-10 Thread Peter Otten
Mahmood Naderan via Python-list wrote: > Hello, > > The following code, which uses openpyxl and numpy, fails to read large > Excel (xlsx) files. The file is 20 MB and contains 100K rows and 50 > columns. > > > > W = load_workbook(fname, read_only = True) > > p = W.worksheets[0] > > a=[] >

Out of memory while reading excel file

2017-05-10 Thread Mahmood Naderan via Python-list
Hello, The following code, which uses openpyxl and numpy, fails to read large Excel (xlsx) files. The file is 20 MB and contains 100K rows and 50 columns. W = load_workbook(fname, read_only = True) p = W.worksheets[0] a=[] m = p.max_row n = p.max_column np.array([[i.value for i in j]