Python: ambiguity in the pandas DataFrame / NumPy array "axis" definition
I've been very confused about how Python axes are defined, and whether they refer to a DataFrame's rows or columns. Consider the code below:
>>> import pandas as pd
>>> df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]], columns=["col1", "col2", "col3", "col4"])
>>> df
   col1  col2  col3  col4
0     1     1     1     1
1     2     2     2     2
2     3     3     3     3

So if I call df.mean(axis=1), I get a mean across the rows:

>>> df.mean(axis=1)
0    1.0
1    2.0
2    3.0
dtype: float64

However, if I call df.drop(name, axis=1), I actually drop a column, not a row:

>>> df.drop("col4", axis=1)
   col1  col2  col3
0     1     1     1
1     2     2     2
2     3     3     3

Can someone help me understand what is meant by "axis" in pandas/NumPy/SciPy?
As a side note, DataFrame.mean might be defined wrong. The documentation for DataFrame.mean says that axis=1 is supposed to mean a mean over the columns, not the rows...
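One way to read the documentation's wording is that it names the axis, not the shape of the result: pandas accepts the string "columns" as an alias for axis=1 (and "index" for axis=0). The sketch below rebuilds the DataFrame from the question and assumes a reasonably recent pandas; it shows that axis=1 and axis="columns" give the same result, with one mean per row label.

```python
import pandas as pd

# The DataFrame from the question
df = pd.DataFrame(
    [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
    columns=["col1", "col2", "col3", "col4"],
)

# axis=1 and axis="columns" are interchangeable in pandas
row_means = df.mean(axis=1)
same = row_means.equals(df.mean(axis="columns"))

# The "columns" axis is the one averaged over, so the result is
# labelled by the remaining axis: the row labels 0, 1, 2
print(same)                      # True
print(row_means.tolist())        # [1.0, 2.0, 3.0]
print(row_means.index.tolist())  # [0, 1, 2]
```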
It's perhaps simplest to remember it as 0=down and 1=across.
This means:
- Use axis=0 to apply a method down each column, or to the row labels (the index).
- Use axis=1 to apply a method across each row, or to the column labels.
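Applied to the DataFrame from the question, the "0=down, 1=across" rule can be checked with a short sketch (assuming a reasonably recent pandas). The same axis number covers both kinds of method: aggregations such as mean, and label-based methods such as drop.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
    columns=["col1", "col2", "col3", "col4"],
)

# axis=0: work down each column -> one result per column label
col_means = df.mean(axis=0)
# axis=1: work across each row -> one result per row label
row_means = df.mean(axis=1)

# axis=0 with a label-based method targets the row labels (the index)
without_row_0 = df.drop(0, axis=0)
# axis=1 with a label-based method targets the column labels
without_col4 = df.drop("col4", axis=1)

print(col_means.tolist())             # [2.0, 2.0, 2.0, 2.0]
print(row_means.tolist())             # [1.0, 2.0, 3.0]
print(without_row_0.index.tolist())   # [1, 2]
print(without_col4.columns.tolist())  # ['col1', 'col2', 'col3']
```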
Here's a picture showing the parts of a DataFrame that each axis refers to:
[Image: the parts of a DataFrame referred to by axis 0 and axis 1]
It's also useful to remember that pandas follows NumPy's use of the word axis. The usage is explained in NumPy's glossary of terms:
Axes are defined for arrays with more than one dimension. A 2-dimensional array has two corresponding axes: the first running vertically downwards across rows (axis 0), and the second running horizontally across columns (axis 1).
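The glossary definition can be seen directly on a plain NumPy array holding the same numbers as the question's DataFrame. In this sketch, reducing along an axis removes that axis from the shape: summing along axis 0 (downwards across rows) leaves one value per column, and summing along axis 1 (horizontally across columns) leaves one value per row.

```python
import numpy as np

# Same values as the question's DataFrame; shape is (rows, columns) = (3, 4)
a = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]])

# Axis 0 runs vertically downwards across rows; summing along it
# collapses the rows and leaves one value per column
down = a.sum(axis=0)
# Axis 1 runs horizontally across columns; summing along it
# collapses the columns and leaves one value per row
across = a.sum(axis=1)

print(a.shape, down.shape, across.shape)  # (3, 4) (4,) (3,)
print(down.tolist())    # [6, 6, 6, 6]
print(across.tolist())  # [4, 8, 12]
```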
So, concerning the method in the question, df.mean(axis=1) seems to be correctly defined. It takes the mean of entries horizontally across columns, that is, along each individual row. On the other hand, df.mean(axis=0) would be an operation acting vertically downwards across rows, producing one mean per column.
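To make "along each individual row" concrete, the sketch below compares the built-in reductions with explicit loops over the question's DataFrame. The loops (using iterrows and column selection) are only for illustration and are not how pandas computes the mean internally.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
    columns=["col1", "col2", "col3", "col4"],
)

# axis=1: for each row, average the entries lying horizontally across the columns
by_row = df.mean(axis=1)
manual_by_row = [row.mean() for _, row in df.iterrows()]

# axis=0: for each column, average the entries lying vertically down the rows
by_col = df.mean(axis=0)
manual_by_col = [df[name].mean() for name in df.columns]

print(by_row.tolist() == manual_by_row)  # True
print(by_col.tolist() == manual_by_col)  # True
# The axis averaged over disappears; the other axis labels the result
print(by_row.index.tolist())  # [0, 1, 2]
print(by_col.index.tolist())  # ['col1', 'col2', 'col3', 'col4']
```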
Similarly, df.drop(name, axis=1) refers to an action on the column labels, because they intuitively run along the horizontal axis. Specifying axis=0 would make the method act on the rows (the index labels) instead.
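A quick way to see that axis only chooses which set of labels drop searches is to ask for "col4" along each axis. In this sketch, axis=1 finds it among the column labels, while axis=0 looks among the row labels, where it doesn't exist, so pandas raises KeyError. The sketch also uses the columns= keyword, which assumes pandas 0.21 or later and expresses the same intent without an axis number.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
    columns=["col1", "col2", "col3", "col4"],
)

# axis=1: "col4" is looked up among the column labels
by_axis = df.drop("col4", axis=1)
# The columns= keyword says the same thing without an axis number
by_keyword = df.drop(columns="col4")
print(by_axis.equals(by_keyword))  # True

# axis=0: "col4" is looked up among the row labels (the index)
try:
    df.drop("col4", axis=0)
    found_as_row = True
except KeyError:
    # "col4" is not a row label, so pandas refuses to drop it
    found_as_row = False
print(found_as_row)  # False

# A label that does exist in the index can be dropped along axis=0
print(df.drop(0, axis=0).index.tolist())  # [1, 2]
```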