python - Ambiguity in Pandas DataFrame / NumPy array "axis" definition
I've been confused about how Python axes are defined, and whether they refer to a DataFrame's rows or columns. Consider the code below:
>>> df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]], columns=["col1", "col2", "col3", "col4"])
>>> df
   col1  col2  col3  col4
0     1     1     1     1
1     2     2     2     2
2     3     3     3     3
So if we call df.mean(axis=1), we'll get a mean across the rows:
>>> df.mean(axis=1)
0    1
1    2
2    3
However, if we call df.drop(name, axis=1), we actually drop a column, not a row:
>>> df.drop("col4", axis=1)
   col1  col2  col3
0     1     1     1
1     2     2     2
2     3     3     3
Can someone help me understand what is meant by an "axis" in pandas/numpy/scipy?
As a side note, DataFrame.mean might just be defined wrong. The documentation for DataFrame.mean says that axis=1 is supposed to mean a mean over the columns, not the rows...
It's perhaps simplest to remember it as 0=down and 1=across.
This means:
- Use axis=0 to apply a method down each column, or to the row labels (the index).
- Use axis=1 to apply a method across each row, or to the column labels.
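As a minimal sketch of the rule above, reusing the DataFrame from the question (the names col_means and row_means are only illustrative), the two axis values give one result per column and one result per row respectively:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
    columns=["col1", "col2", "col3", "col4"],
)

# axis=0: walk down the rows, producing one value per column.
col_means = df.mean(axis=0)

# axis=1: walk across the columns, producing one value per row.
row_means = df.mean(axis=1)

print(col_means.tolist())  # [2.0, 2.0, 2.0, 2.0]
print(row_means.tolist())  # [1.0, 2.0, 3.0]
```

Note that the result of axis=0 is indexed by the column labels, while the result of axis=1 is indexed by the row labels.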
Here's a picture to show the parts of a DataFrame that each axis refers to:
It's also useful to remember that Pandas follows NumPy's use of the word axis. The usage is explained in NumPy's glossary of terms:
Axes are defined for arrays with more than one dimension. A 2-dimensional array has two corresponding axes: the first running vertically downwards across rows (axis 0), and the second running horizontally across columns (axis 1). [my emphasis]
So, concerning the method in the question, df.mean(axis=1) seems to be correctly defined. It takes the mean of entries horizontally across columns, that is, along each individual row. On the other hand, df.mean(axis=0) would be an operation acting vertically downwards across rows.
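The same behaviour can be seen in plain NumPy. The sketch below uses an illustrative array with the same values as the question's DataFrame. One way to read it is that the axis you name is the dimension that gets collapsed by the reduction:

```python
import numpy as np

# 3 rows, 4 columns: axis 0 has length 3, axis 1 has length 4.
arr = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]])

# Summing along axis 0 runs down the rows and collapses that axis,
# leaving one value per column.
down = arr.sum(axis=0)

# Summing along axis 1 runs across the columns and collapses that axis,
# leaving one value per row.
across = arr.sum(axis=1)

print(arr.shape)                       # (3, 4)
print(down.shape, down.tolist())       # (4,) [6, 6, 6, 6]
print(across.shape, across.tolist())   # (3,) [4, 8, 12]
```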
Similarly, df.drop(name, axis=1) refers to an action on column labels, because they intuitively go across the horizontal axis. Specifying axis=0 would make the method act on rows instead.
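A short sketch of drop with both axis values, again on the question's DataFrame (the variable names are illustrative). For drop, the axis argument decides which set of labels the name is looked up in. The columns= keyword used in the last line is an alternative spelling available in newer pandas versions (0.21 and later), not something the question itself uses:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
    columns=["col1", "col2", "col3", "col4"],
)

# axis=1: "col4" is looked up among the column labels.
no_col4 = df.drop("col4", axis=1)

# axis=0: the label 0 is looked up in the index (row labels).
no_row0 = df.drop(0, axis=0)

# Equivalent to axis=1, but names the target explicitly.
no_col4_kw = df.drop(columns="col4")

print(no_col4.columns.tolist())  # ['col1', 'col2', 'col3']
print(no_row0.index.tolist())    # [1, 2]
```

drop returns a new DataFrame in each case, so df itself is left unchanged.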