How to plot Gaia astrometry data onto TESS images using Python?

First, I have to say: great question! It is very detailed and reproducible. I went through it and redid the exercise, starting from your git repo and downloading the catalogue from the Gaia archive.

EDIT

Programmatically, your code is fine (see OLD PART below for a slightly different approach). The points are missing because the CSV file exported from the Gaia archive contains only 500 rows. That is why all the points from the query look as if they are crammed into a weird shape. If you restrict the search to a smaller radius, you can see that there are points that lie within the TESS image:

[Figure: Gaia sources from the smaller-radius query plotted over the TESS image]

Compare this with the version shown in the OLD PART below. The code is the same; only the downloaded CSV file covers a smaller radius. So it seems that you downloaded only part of the available data when exporting to CSV from the Gaia archive. To get around this, run the search as you did. Then, on the result page, click "Show query in ADQL form" at the bottom and, in the query that is displayed, change:

Select Top 500

to

Select

at the beginning of the query.
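If you build the query string in a script rather than editing it by hand, the same change can be made with a small helper. This is only a sketch: `strip_top_clause` is a hypothetical name, and it assumes the row limit appears as `TOP <n>` directly after `SELECT`, as in the query the archive displays.

```python
import re


def strip_top_clause(adql: str) -> str:
    """Remove a 'TOP <n>' row limit that directly follows SELECT (hypothetical helper)."""
    # Match SELECT, then TOP and an integer, ignoring case and extra whitespace.
    return re.sub(r"^(\s*SELECT)\s+TOP\s+\d+\b", r"\1", adql,
                  count=1, flags=re.IGNORECASE)


limited = "SELECT TOP 500 * FROM gaiadr2.gaia_source"
print(strip_top_clause(limited))
```

A query without a `TOP` clause is returned unchanged, so the helper can be applied unconditionally.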

OLD PART (the code works, but my conclusion is wrong):

For plotting I used aplpy, which uses matplotlib in the background, and ended up with the following code:

from astropy.io import fits
import aplpy
import matplotlib.pyplot as plt
import pandas as pd
from astropy.coordinates import SkyCoord
import astropy.units as u


fits_file = fits.open("4687500098271761792_med.fits")
central_coordinate = SkyCoord(fits_file[0].header["CRVAL1"],
                              fits_file[0].header["CRVAL2"], unit="deg")

figure = plt.figure(figsize=(10, 10))
fig = aplpy.FITSFigure("4687500098271761792_med.fits", figure=figure)
cmap = "gist_heat"
stretch = "log"

fig.show_colorscale(cmap=cmap, stretch=stretch)
fig.show_colorbar()

df = pd.read_csv("4687500098271761792_within_1000arcsec.csv")

# Gaia DR2 positions are ICRS at reference epoch J2015.5. That is an
# observation epoch, so it is passed as obstime rather than equinox.
coords = SkyCoord(df["ra"].to_numpy(), df["dec"].to_numpy(), unit="deg",
                  frame="icrs", obstime="J2015.5")
ra_values = coords.ra.degree
dec_values = coords.dec.degree

width = (40*u.arcmin).to(u.degree).value
height = (40*u.arcmin).to(u.degree).value
fig.recenter(x=central_coordinate.ra.degree, y=central_coordinate.dec.degree, 
             width=width, height=height)
fig.show_markers(central_coordinate.ra.degree,central_coordinate.dec.degree, 
                 marker="o", c="white", s=15, lw=1)
fig.show_markers(ra_values, dec_values, marker="o", c="blue", s=15, lw=1)
fig.show_circles(central_coordinate.ra.degree,central_coordinate.dec.degree, 
                 radius=(1000*u.arcsec).to(u.degree).value, edgecolor="black")
fig.save("GAIA_TESS_test.png")

However this results in a plot similar to yours:

[Figure: Gaia positions plotted over the TESS image, with a black circle of 1000 arcsec radius around the image center]

To check that the coordinates from the Gaia archive are displayed correctly, I drew a circle with a radius of 1000 arcsec around the center of the TESS image. As you can see, it roughly matches the circular outer edge (seen from the center of the image) of the cloud of Gaia positions. I simply think that these are all the points in the Gaia DR2 archive that fall within the region you searched. The cloud even seems to have a squarish boundary on the inside, which might come from something like a square field of view.
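The same check can be done numerically instead of by eye. The sketch below uses only the standard library and the haversine formula; the function names and the example coordinates are made up for illustration. With astropy, `SkyCoord.separation` gives the same quantity.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec between two positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, which stays accurate for small angles.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600


def count_within(center, sources, radius_arcsec=1000.0):
    """Count the (ra, dec) pairs that lie within radius_arcsec of center."""
    return sum(
        angular_separation_arcsec(center[0], center[1], ra, dec) <= radius_arcsec
        for ra, dec in sources
    )


# Made-up positions in degrees, only to show the call.
center = (10.0, -70.0)
sources = [(10.0, -70.1), (10.0, -71.0)]
print(count_within(center, sources))
```

If the count equals the number of rows in the CSV file, every downloaded source lies inside the search circle, which is what the plot suggests.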


Really nice example. Just to mention that you can also run the Gaia archive query directly from Python with the astroquery.gaia module. It is part of astroquery, a separate package from the Astropy ecosystem:

https://astroquery.readthedocs.io/en/latest/gaia/gaia.html

This way you can run the same queries as in the Gaia archive UI and switch to a different source more easily:

from astroquery.simbad import Simbad
import astropy.units as u
from astropy.coordinates import SkyCoord
from astroquery.gaia import Gaia

result_table = Simbad.query_object("Gaia DR2 4687500098271761792")
raValue = result_table['RA']
decValue = result_table['DEC']

coord = SkyCoord(ra=raValue, dec=decValue, unit=(u.hour, u.degree), frame='icrs')

# The search radius 0.2777777777777778 deg corresponds to 1000 arcsec.
query = """SELECT TOP 1000 * FROM gaiadr2.gaia_source
           WHERE CONTAINS(POINT('ICRS',gaiadr2.gaia_source.ra,gaiadr2.gaia_source.dec),
           CIRCLE('ICRS',{ra},{dec},0.2777777777777778))=1
           ORDER BY random_index""".format(ra=str(coord.ra.deg[0]),
                                           dec=str(coord.dec.deg[0]))


job = Gaia.launch_job_async(query)  
r = job.get_results()

ralist = r['ra'].tolist()
declist = r['dec'].tolist()

import matplotlib.pyplot as plt
plt.scatter(ralist,declist,marker='+')
plt.show()

First 1000 rows, ordered by random index

Please note that I have added ORDER BY random_index, which eliminates the strange non-circular pattern. This index is quite useful for initial tests, because the first rows then form a random subset and you do not have to request the full output.

Also, I have declared the right ascension returned by Simbad to be in hours.

Finally, I have used the asynchronous query, which has fewer limitations on execution time and on the maximum number of rows in the response.

You can also change the query to

query = """SELECT * FROM gaiadr2.gaia_source 
               WHERE CONTAINS(POINT('ICRS',gaiadr2.gaia_source.ra,gaiadr2.gaia_source.dec), 
               CIRCLE('ICRS',{ra},{dec},0.2777777777777778))=1""".format(ra=str(coord.ra.deg[0]),dec=str(coord.dec.deg[0]))

This removes the limit of 1000 rows, so the server returns the full result. In this case the random index is not needed.

Of course, this query takes some time to execute (around 1.5 minutes). The full query returns 103574 rows.

All sources. 103574 rows
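The two queries in this answer differ only in the TOP and ORDER BY parts, and the radius 0.2777777777777778 is 1000 arcsec expressed in degrees. A small helper could build either variant. This is a sketch only: `cone_search_query` is a hypothetical name, and the coordinates in the example are placeholders, not the position of the actual source.

```python
def cone_search_query(ra_deg, dec_deg, radius_arcsec=1000.0, top=None):
    """Build an ADQL cone search on gaiadr2.gaia_source (hypothetical helper).

    With top=None the query is unlimited; otherwise it returns `top` rows
    ordered by random_index, so the subset is a random sample of the cone.
    """
    radius_deg = radius_arcsec / 3600.0  # ADQL CIRCLE expects degrees
    top_clause = f"TOP {int(top)} " if top is not None else ""
    order_clause = " ORDER BY random_index" if top is not None else ""
    return (
        f"SELECT {top_clause}* FROM gaiadr2.gaia_source "
        f"WHERE CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {float(ra_deg)!r}, {float(dec_deg)!r}, {radius_deg!r}))=1"
        f"{order_clause}"
    )


# Placeholder coordinates in degrees, for illustration only.
print(cone_search_query(10.0, -70.0, top=1000))
print(cone_search_query(10.0, -70.0))
```

The resulting string can be passed to `Gaia.launch_job_async` in the same way as the hand-written queries above.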