Docker NLTK Download

I tried all the suggested methods, but nothing worked. I then realized that the nltk module was searching in /root/nltk_data.

Step 1: I downloaded punkt on my machine using:

python3
>>> import nltk
>>> nltk.download('punkt')

punkt ended up in /root/nltk_data/tokenizers.

Step 2: I copied the tokenizers folder into my project directory, which then looked something like this:

.
|-app/
|-tokenizers/
|--punkt/
|---all those pkl files
|--punkt.zip

Step 3: I then modified the Dockerfile to copy that folder into the image:

COPY ./tokenizers /root/nltk_data/tokenizers

Step 4: Containers started from the new image had punkt.
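
To confirm that the copy landed where it should, a small check can be run inside the container. This is only a sketch using the standard library: find_nltk_resource is a hypothetical helper name, and /root/nltk_data is the directory from the steps above. With NLTK installed, nltk.data.path lists the directories that are actually searched.

```python
from pathlib import Path


def find_nltk_resource(relative_path, search_dirs):
    """Return the first existing location of relative_path under search_dirs, or None."""
    for base in search_dirs:
        candidate = Path(base) / relative_path
        if candidate.exists():
            return candidate
    return None


# /root/nltk_data is the directory named in the steps above; adjust it if your
# container runs as a different user.
found = find_nltk_resource("tokenizers/punkt", ["/root/nltk_data"])
print(found if found else "punkt not found")
```

If this prints "punkt not found", the COPY destination and the directory NLTK searches probably do not match.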


In your Dockerfile, try adding this instead:

RUN python -m nltk.downloader punkt

This runs the command while the image is being built and installs the requested files to //nltk_data/

The problem is most likely related to using CMD vs. RUN in the Dockerfile. Documentation for CMD:

The main purpose of a CMD is to provide defaults for an executing container.

CMD is applied at docker run <image>, not during the build. Only the last CMD in a Dockerfile takes effect, so any earlier CMD lines were probably overridden by the final CMD python app.py line.
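
The "last CMD wins" rule can be illustrated with a toy model. This is a deliberately simplified sketch, not how Docker parses a Dockerfile: effective_cmd is a hypothetical helper, it ignores line continuations and multi-stage builds, and the example Dockerfile text is made up to resemble the situation described above.

```python
def effective_cmd(dockerfile_text):
    """Return the last CMD instruction found in the text, or None if there is none."""
    last = None
    for line in dockerfile_text.splitlines():
        stripped = line.strip()
        parts = stripped.split(None, 1)
        # Dockerfile instruction names are case-insensitive.
        if parts and parts[0].upper() == "CMD":
            last = stripped
    return last


# Hypothetical Dockerfile: the download was written as CMD instead of RUN.
example = """\
FROM python:3
CMD python -m nltk.downloader punkt
CMD python app.py
"""
print(effective_cmd(example))  # CMD python app.py
```

In this model the download line never runs at all, which matches the symptom: the data is missing both at build time and at run time.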


I faced the same issue when building a Docker image for a Django application from an Ubuntu base image with Python 3.

I resolved it as shown below.

# start from an official image
FROM ubuntu:16.04

RUN apt-get update \
  && apt-get install -y python3-pip python3-dev \
  && apt-get install -y libmysqlclient-dev python3-virtualenv

# arbitrary location choice: you can change the directory
RUN mkdir -p /opt/services/djangoapp/src
WORKDIR /opt/services/djangoapp/src

# copy our project code
COPY . /opt/services/djangoapp/src

# install dependency for running service
RUN pip3 install -r requirements.txt
RUN python3 -m nltk.downloader punkt
RUN python3 -m nltk.downloader wordnet

# Setup supervisord
RUN mkdir -p /var/log/supervisor
COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf

# Start processes
CMD ["/usr/bin/supervisord"]