# Rows and columns and histograms

Warning: the post below is more technical than usual. I finally figured out something that had been confusing me for a long time, although I’m sure it’s obvious to others. My explanation below is partly to solidify my own understanding and partly in case I’m not the only person confused about this.

I have been working with astronomical images for just about half my life now.
Usually these come in FITS format, with the conventions that the
pixels are identified in a Cartesian way: *x* increases to the right along
the horizontal axis, and *y* increases to the top along the vertical axis, with pixel
(1,1) in the lower left of the image. Pixel centers fall at integer coordinates.
My understanding is that neither of these conventions is universal.
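To make the contrast with Python concrete, here is a minimal sketch converting 1-based FITS pixel coordinates to 0-based numpy indices. It assumes the image was read with `astropy.io.fits`, which stores the data with the axis order reversed relative to the FITS header (shape `(NAXIS2, NAXIS1)`), so FITS pixel (*x*, *y*) becomes `data[y-1, x-1]`. The helper `fits_pixel_to_index` is a hypothetical name used only for illustration.

```python
import numpy as np

def fits_pixel_to_index(x, y):
    """Hypothetical helper: map 1-based FITS pixel (x, y) to a 0-based
    numpy (row, column) index, assuming the array has shape (NAXIS2, NAXIS1)
    as astropy.io.fits returns it."""
    return (y - 1, x - 1)

# Stand-in image with NAXIS1 = 4 (x, columns) and NAXIS2 = 3 (y, rows).
image = np.arange(12).reshape(3, 4)

# FITS pixel (1, 1), the lower-left pixel, is numpy element [0, 0]
# (which numpy prints in the *upper* left).
print(image[fits_pixel_to_index(1, 1)])  # 0
# FITS pixel (4, 1), the right end of the bottom row, is numpy element [0, 3].
print(image[fits_pixel_to_index(4, 1)])  # 3
```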

Recently I was trying to plot a 2-D histogram using the `histogram2d`
function in `numpy`. The idea of a
2-D histogram is that you have a set of points in the *xy* plane, you divide that plane
into bins (which you can also think of as pixels), and count how many points are in
each bin (this number would be the pixel’s value). The documentation explains how to
use the function and gives the helpful information that

> the histogram does not follow the Cartesian convention where `x` values are on the abscissa and `y` values on the ordinate axis. Rather, `x` is histogrammed along the first dimension of the array (vertical), and `y` along the second dimension of the array (horizontal).
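A tiny example makes the documentation's warning visible. This sketch uses three made-up points that all have small *x* and large *y*, so on a Cartesian plot they sit in the upper left; in the array returned by `np.histogram2d(x, y, ...)`, element `H[i, j]` counts the points in *x* bin `i` and *y* bin `j`, so the count lands in row 0, column 1, which numpy prints in the upper right.

```python
import numpy as np

# Three points with small x and large y (upper left on a Cartesian plot).
x = np.array([0.1, 0.2, 0.3])
y = np.array([0.9, 0.8, 0.7])

# Two bins per axis on [0, 1]: edges at 0, 0.5 and 1.
H, xedges, yedges = np.histogram2d(x, y, bins=2, range=[[0, 1], [0, 1]])

print(H.shape)  # (2, 2): axis 0 indexes the x bins, axis 1 the y bins
print(H)
# [[0. 3.]
#  [0. 0.]]
```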

What does this helpful information mean? It’s a reminder that arrays in Python (`numpy` arrays) do not follow
the image conventions that I described above. In Python, array indices are written in
this order: `[row,column]`, and when an array is printed, position `[0,0]` is in the upper left.
The array axes are defined such that axis 0 indexes the rows and axis 1 indexes the columns, meaning that if `data`
is an array and you do something like `data.mean(axis=0)`, you are averaging each column over all
rows, in other words along the vertical direction. If you do `data.mean(axis=1)`, you are averaging
each row over all columns, in other words along the horizontal direction.
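A quick check of the two `axis` choices on a small 2×3 array, as a sketch (the values are arbitrary):

```python
import numpy as np

# 2 rows, 3 columns; data[0, 0] == 1 is printed in the upper left.
data = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(data.mean(axis=0))  # average each column over the rows (vertical): [2.5 3.5 4.5]
print(data.mean(axis=1))  # average each row over the columns (horizontal): [2. 5.]
```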

This lovely figure from the Software Carpentry Python lesson helps explain:

`histogram2d` constructs an array by the procedure described above.
But the helpful information tells you that you can’t just plot the resulting array as an image over the (x,y) data points,
because the array has the axes interchanged, and when it is displayed, element `[0,0]` ends up in the upper left rather than the lower left.
To plot the output of `histogram2d` with the data points, you have to either:

- plot it as an image but accept that you get your `y` values on the horizontal axis,
- transpose it using `T` (see Dave’s Matplotlib Basic Examples), or
- rotate it and flip it along one axis (same as transposing; see 2-D Histogram).
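As a sketch of why the last two options are interchangeable: rotating by 90 degrees counterclockwise with `np.rot90` and then flipping top-to-bottom with `np.flipud` gives exactly the transpose. The random data and bin counts here are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = rng.normal(loc=2.0, size=500)

# 8 bins in x and 5 in y, so the (x bins, y bins) orientation is easy to see.
H, xedges, yedges = np.histogram2d(x, y, bins=(8, 5))

# Transpose: rows now follow y and columns follow x.
H_t = H.T

# Rotate 90 degrees counterclockwise, then flip top-to-bottom.
H_rf = np.flipud(np.rot90(H))

print(H.shape, H_t.shape)         # (8, 5) (5, 8)
print(np.array_equal(H_t, H_rf))  # True
```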

Regardless of which of these methods is used, you also need to use `origin='lower'`
in `matplotlib.pyplot.imshow` to get
(0,0) in the lower left.
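Putting the pieces together, here is a minimal sketch of overplotting the data points on the transposed histogram. The `extent` argument (set from the bin edges) is an addition not mentioned above; it maps the image onto data coordinates so that the bins line up with the points. The synthetic data, bin counts, and the `Agg` backend are assumptions made so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(scale=0.5, size=1000)

# 20 bins in x, 10 in y: H has shape (20, 10), i.e. (x bins, y bins).
H, xedges, yedges = np.histogram2d(x, y, bins=(20, 10))

fig, ax = plt.subplots()
# Transpose so y runs along rows, and use origin="lower" so row 0 is at the bottom.
# extent is (left, right, bottom, top) in data coordinates, taken from the bin edges.
im = ax.imshow(H.T, origin="lower", aspect="auto",
               extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
ax.plot(x, y, ",", color="white")  # the points should fall on the bright bins
ax.set_xlabel("x")
ax.set_ylabel("y")
# fig.savefig("hist2d.png") or plt.show() to look at the result
plt.close(fig)
```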

I figured this out because I was trying to make a fancy plot with the 2-D histogram in the middle
and the associated 1-D histograms along its sides, a bit like
this,
which is based on this matplotlib example.
The way to make the 1-D histograms is by summing
the 2-D histogram along one of the axes. Since the rows of the `histogram2d` output correspond to `x` bins,
summing on `axis=0` sums each column over the rows (that is, over `x`),
so it gives you the number of objects as a function of your `y` variable, and summing on
`axis=1` gives you the number as a function of `x`.
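As a sanity check on which axis is which, this sketch compares the two sums with 1-D histograms computed directly by `np.histogram` using the same bin edges; the uniform random data and bin counts are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=300)
y = rng.uniform(-1, 1, size=300)

H, xedges, yedges = np.histogram2d(x, y, bins=(10, 4))

# Sum over axis 0 (the x bins): one count per y bin.
counts_y = H.sum(axis=0)
# Sum over axis 1 (the y bins): one count per x bin.
counts_x = H.sum(axis=1)

# Both agree with 1-D histograms built on the same edges.
print(np.array_equal(counts_x, np.histogram(x, bins=xedges)[0]))  # True
print(np.array_equal(counts_y, np.histogram(y, bins=yedges)[0]))  # True
```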

Here is a figure which uses this approach, and the code that produced it (this code actually required two 2-D histograms because of what I’m computing here; here is a somewhat simpler version).

I hope you find this useful; comments and/or corrections are welcome, either below or via pull request here.