林帅浩 / OvoTools · Commits
Commit a34104ba, authored Oct 02, 2019 by iovodov

    Merge remote-tracking branch 'remotes/origin/master' into adaptiveLR

Parents: 93d8841b, 7b2c99c7
Showing 2 changed files, with 68 additions and 0 deletions:
ovotools/pytorch/__init__.py: +2, -0
ovotools/pytorch/threading_dataloader.py: +66, -0
ovotools/pytorch/__init__.py (new file, mode 100644):

from .threading_dataloader import BatchThreadingDataLoader, ThreadingDataLoader
(no newline at end of file)
ovotools/pytorch/threading_dataloader.py (new file, mode 100644):

'''
Multithreaded pytorch DataLoaders.
Useful because the standard DataLoader misbehaves in multiprocess (num_workers > 0) mode.
See https://github.com/pytorch/pytorch/issues/12831 etc.
Based on https://github.com/lopuhin/kaggle-imet-2019/blob/f1ec0827149a8218430a6884acf49c27ba6fcb1f/imet/utils.py#L35-L56
'''
import torch
from multiprocessing.pool import ThreadPool


class BatchThreadingDataLoader(torch.utils.data.DataLoader):
    '''
    Prepares each batch in a separate thread, including collate, so no processing is done in the caller thread.
    Requires a map-style dataset (i.e. a dataset with indexing) and automatic batching.
    Ignores the worker_init_fn parameter.
    '''
    def __iter__(self):
        sample_iter = iter(self.batch_sampler)
        if self.num_workers == 0:
            for indices in sample_iter:
                yield self._get_batch(indices)
        else:
            prefetch = self.num_workers
            with ThreadPool(processes=self.num_workers) as pool:
                futures = []
                for indices in sample_iter:
                    futures.append(pool.apply_async(self._get_batch, args=(indices,)))
                    if len(futures) > prefetch:
                        yield futures.pop(0).get()
                for batch_futures in futures:
                    yield batch_futures.get()

    def _get_item(self, i):
        return self.dataset[i]

    def _get_batch(self, indices):
        return self.collate_fn([self._get_item(i) for i in indices])


class ThreadingDataLoader(torch.utils.data.DataLoader):
    '''
    Original version from https://github.com/lopuhin/kaggle-imet-2019
    Prepares each dataset item in a separate thread, but collation is done in the caller thread.
    Requires a map-style dataset (i.e. a dataset with indexing) and automatic batching.
    Ignores the worker_init_fn parameter.
    '''
    def __iter__(self):
        sample_iter = iter(self.batch_sampler)
        if self.num_workers == 0:
            for indices in sample_iter:
                yield self.collate_fn([self._get_item(i) for i in indices])
        else:
            prefetch = 1
            with ThreadPool(processes=self.num_workers) as pool:
                futures = []
                for indices in sample_iter:
                    futures.append([pool.apply_async(self._get_item, args=(i,)) for i in indices])
                    if len(futures) > prefetch:
                        yield self.collate_fn([f.get() for f in futures.pop(0)])
                for batch_futures in futures:
                    yield self.collate_fn([f.get() for f in batch_futures])

    def _get_item(self, i):
        return self.dataset[i]
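The prefetch scheme both loaders share — submit work to a ThreadPool, keep a bounded queue of futures, and yield completed batches in order — can be illustrated without torch. This is a minimal stdlib-only sketch; `threaded_batches`, the toy dataset, and the batch index lists are hypothetical names introduced here for illustration, not part of the commit.

```python
from multiprocessing.pool import ThreadPool

def threaded_batches(batch_indices, get_batch, num_workers=2):
    """Yield batches prepared by worker threads, keeping at most
    num_workers + 1 batches in flight, mirroring the bounded-prefetch
    loop in BatchThreadingDataLoader.__iter__ above."""
    prefetch = num_workers
    with ThreadPool(processes=num_workers) as pool:
        futures = []
        for indices in batch_indices:
            # Each batch is fetched and assembled entirely in a worker thread.
            futures.append(pool.apply_async(get_batch, args=(indices,)))
            if len(futures) > prefetch:
                # Yield the oldest batch first, so output order matches input order.
                yield futures.pop(0).get()
        for f in futures:
            yield f.get()

# Toy "dataset": squares of 0..9; a batch is the list of items at given indices.
data = [i * i for i in range(10)]
batches = list(threaded_batches(
    [[0, 1], [2, 3], [4, 5]],
    lambda idx: [data[i] for i in idx],
))
print(batches)  # [[0, 1], [4, 9], [16, 25]]
```

Because results are popped from the front of the futures list, batch order is deterministic even though preparation runs concurrently — the same property the commit's loaders rely on.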