OvoTools · Commit 7b2c99c7 authored Oct 02, 2019 by Ilya Ovodov
Threading dataloader
parent 298083c6
Showing 2 changed files with 68 additions and 0 deletions (+68 -0):

  ovotools/pytorch/__init__.py             +2  -0
  ovotools/pytorch/threading_dataloader.py +66 -0
ovotools/pytorch/__init__.py  (new file, mode 100644)

from .threading_dataloader import BatchThreadingDataLoader, ThreadingDataLoader
(no newline at end of file)
ovotools/pytorch/threading_dataloader.py  (new file, mode 100644)

'''
Multithreaded pytorch dataloaders.
Useful due to improper behavior of the standard DataLoader in multiprocess (num_workers > 0) mode.
See issue https://github.com/pytorch/pytorch/issues/12831 etc.
Based on https://github.com/lopuhin/kaggle-imet-2019/blob/f1ec0827149a8218430a6884acf49c27ba6fcb1f/imet/utils.py#L35-L56
'''
import torch
from multiprocessing.pool import ThreadPool


class BatchThreadingDataLoader(torch.utils.data.DataLoader):
    '''
    Prepares each batch in a separate thread, including collate, so no processing is done in the caller thread.
    Requires a map-style dataset (i.e. a dataset with indexing) and automatic batching.
    Ignores the worker_init_fn parameter.
    '''
    def __iter__(self):
        sample_iter = iter(self.batch_sampler)
        if self.num_workers == 0:
            for indices in sample_iter:
                yield self._get_batch(indices)
        else:
            prefetch = self.num_workers
            with ThreadPool(processes=self.num_workers) as pool:
                futures = []
                for indices in sample_iter:
                    futures.append(pool.apply_async(self._get_batch, args=(indices,)))
                    if len(futures) > prefetch:
                        yield futures.pop(0).get()
                for batch_futures in futures:
                    yield batch_futures.get()

    def _get_item(self, i):
        return self.dataset[i]

    def _get_batch(self, indices):
        return self.collate_fn([self._get_item(i) for i in indices])
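The prefetch pattern used by BatchThreadingDataLoader can be sketched independently of torch. In this minimal sketch, `make_batch` and `iter_batches` are hypothetical stand-ins for `_get_batch` and `__iter__` (they are not part of the commit): each batch, including collation, is built inside a pool thread, at most `num_workers` batches are in flight, and results are yielded in sampler order.

```python
from multiprocessing.pool import ThreadPool

def make_batch(indices):
    # Stands in for _get_batch: fetch items and "collate" them in the worker thread.
    return [i * 10 for i in indices]

def iter_batches(batch_sampler, num_workers):
    # Mirrors the num_workers > 0 branch of BatchThreadingDataLoader.__iter__:
    # keep at most `num_workers` batch futures in flight, yield oldest first.
    prefetch = num_workers
    with ThreadPool(processes=num_workers) as pool:
        futures = []
        for indices in batch_sampler:
            futures.append(pool.apply_async(make_batch, args=(indices,)))
            if len(futures) > prefetch:
                yield futures.pop(0).get()
        # Drain the remaining in-flight batches in order.
        for batch_future in futures:
            yield batch_future.get()

batches = list(iter_batches([[0, 1], [2, 3], [4]], num_workers=2))
print(batches)  # [[0, 10], [20, 30], [40]]
```

Because futures are popped from the front of the list, batch order matches the sampler even though batch construction overlaps in the pool threads.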
class ThreadingDataLoader(torch.utils.data.DataLoader):
    '''
    Original dataloader from https://github.com/lopuhin/kaggle-imet-2019
    Prepares each dataset item in a separate thread, but collation is processed in the caller thread.
    Requires a map-style dataset (i.e. a dataset with indexing) and automatic batching.
    Ignores the worker_init_fn parameter.
    '''
    def __iter__(self):
        sample_iter = iter(self.batch_sampler)
        if self.num_workers == 0:
            for indices in sample_iter:
                yield self.collate_fn([self._get_item(i) for i in indices])
        else:
            prefetch = 1
            with ThreadPool(processes=self.num_workers) as pool:
                futures = []
                for indices in sample_iter:
                    futures.append([pool.apply_async(self._get_item, args=(i,)) for i in indices])
                    if len(futures) > prefetch:
                        yield self.collate_fn([f.get() for f in futures.pop(0)])
                for batch_futures in futures:
                    yield self.collate_fn([f.get() for f in batch_futures])

    def _get_item(self, i):
        return self.dataset[i]
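The difference from BatchThreadingDataLoader is the granularity: here one future is submitted per dataset item, and collation runs in the caller thread. A minimal torch-free sketch of that item-level pattern (the names `get_item`, `collate`, and `iter_batches` are hypothetical stand-ins, not part of the commit):

```python
from multiprocessing.pool import ThreadPool

def get_item(i):
    # Stands in for _get_item: dataset indexing runs in a worker thread.
    return i * i

def collate(items):
    # Runs in the caller thread, matching ThreadingDataLoader's behavior.
    return sum(items)

def iter_batches(batch_sampler, num_workers, prefetch=1):
    # Mirrors the num_workers > 0 branch of ThreadingDataLoader.__iter__:
    # one future per item, one extra batch of futures kept in flight.
    with ThreadPool(processes=num_workers) as pool:
        futures = []
        for indices in batch_sampler:
            futures.append([pool.apply_async(get_item, args=(i,)) for i in indices])
            if len(futures) > prefetch:
                yield collate([f.get() for f in futures.pop(0)])
        for batch_futures in futures:
            yield collate([f.get() for f in batch_futures])

print(list(iter_batches([[1, 2], [3, 4]], num_workers=2)))  # [5, 25]
```

With `prefetch = 1`, at most one batch beyond the current one is being fetched, so this variant trades throughput for a smaller memory footprint than the batch-level loader, whose prefetch depth equals `num_workers`.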